1 Research question

We measured visual distributional learning in children with DLD, as well as different types of lexical proficiency. We want to test whether the ability to pick up distributional cues when learning about novel object categories correlates with the abilities that underlie building up a mental lexicon. After variable reduction, we have three dependent variables and three predictor variables. The dependent variables are Vocabulary size (passive and active), Word Categories performance and Word Associations performance. The predictor of interest is mean accuracy on the visual distributional learning task (DLMeanAcc). As control variables we include Phonological processing, Verbal working memory, Non-verbal intelligence and Age.

2 Load data

CAT_regr <- read.delim("Data/CAT_regr.txt")
CAT_Test <- read.delim("Data/CAT_Test.txt")

3 Variable reduction: Principal component analysis

Can we reduce the number of predictor variables in our data? We run a PCA on four variables: Digit span forwards, Digit span backwards, Non-word repetition and Non-verbal intelligence.

3.1 Standardize variables

CAT_regr$DLMeanAcc <- scale(CAT_regr$DLMeanAcc, center = T, scale = T)
CAT_regr$ActVocab_raw <- scale(CAT_regr$ActVocab_raw, center = T, scale = T)
CAT_regr$PassVocab_raw <- scale(CAT_regr$PassVocab_raw, center = T, scale = T)
CAT_regr$WordCatTot_norm <- scale(CAT_regr$WordCatTot_norm, center = T, scale = T)
CAT_regr$WordAss_raw <- scale(CAT_regr$WordAss_raw, center = T, scale = T)
CAT_regr$DigitSpanFW_raw <- scale(CAT_regr$DigitSpanFW_raw, center = T, scale = T)
CAT_regr$DigitSpanBW_raw <- scale(CAT_regr$DigitSpanBW_raw, center = T, scale = T)
CAT_regr$NWR_raw <- scale(CAT_regr$NWR_raw, center = T, scale = T)
CAT_regr$NonVerbInt_raw <- scale(CAT_regr$NonVerbInt_raw, center = T, scale = T)
CAT_regr$Age_months <- scale(CAT_regr$Age_months, center = T, scale = T)
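As an illustration of what the standardization step does, here is a minimal sketch on made-up toy data (the values are not from the study). It shows that scale() computes an ordinary z-score and returns a one-column matrix, which as.numeric() turns back into a plain vector; the toy data frame and the variable names z_by_scale, z_by_hand and cols are assumptions for this example only.

```r
# Toy data standing in for two columns of CAT_regr (values are made up).
toy <- data.frame(DigitSpanFW_raw = c(4, 6, 5, 7, 3),
                  NWR_raw         = c(10, 14, 9, 16, 11))

# scale() returns a one-column matrix; as.numeric() drops the matrix attributes.
z_by_scale <- as.numeric(scale(toy$DigitSpanFW_raw, center = TRUE, scale = TRUE))

# The same z-score by hand: subtract the mean, divide by the sample SD.
z_by_hand <- (toy$DigitSpanFW_raw - mean(toy$DigitSpanFW_raw)) /
  sd(toy$DigitSpanFW_raw)

# Standardize several columns in one step.
cols <- c("DigitSpanFW_raw", "NWR_raw")
toy[cols] <- lapply(toy[cols], function(x) as.numeric(scale(x)))
```

After this step every standardized column has mean 0 and standard deviation 1, so the regression coefficients reported later can be read as standardized effects.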

3.2 PCA

#Make dataframe with predictor values
Pred <- subset(CAT_regr, select=c("DigitSpanFW_raw","DigitSpanBW_raw", "NWR_raw", "NonVerbInt_raw"))
fit <- princomp(Pred, cor=TRUE)
summary(fit) # print variance accounted for
## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4
## Standard deviation     1.3375491 1.1938901 0.7772680 0.42596153
## Proportion of Variance 0.4472594 0.3563434 0.1510364 0.04536081
## Cumulative Proportion  0.4472594 0.8036028 0.9546392 1.00000000
plot(fit,type="lines") # scree plot

The four components explain 45%, 36%, 15% and 5% of the variance, respectively. The first three together account for 95%, so we run a varimax-rotated PCA with 3 components.
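The proportions in the summary output follow directly from the component standard deviations: each squared standard deviation is an eigenvalue of the correlation matrix, and the eigenvalues sum to the number of variables. The sketch below recomputes them from the printed values; the variable names are chosen for this example.

```r
# Standard deviations copied from the summary(fit) output.
sdev <- c(Comp.1 = 1.3375491, Comp.2 = 1.1938901,
          Comp.3 = 0.7772680, Comp.4 = 0.42596153)

# Squared standard deviations are the eigenvalues of the correlation matrix.
eigenvalues <- sdev^2

# Share of the total variance per component, and the running total.
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)

# Percentages rounded to whole numbers.
round(100 * prop_var)
```

The cumulative share after three components is what motivates keeping three components in the rotated solution.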

3.3 Varimax rotated PCA for 3 components

library(psych) # principal() comes from the psych package
pca <- principal(Pred, nfactors=3, rotate="varimax")
pca
## Principal Components Analysis
## Call: principal(r = Pred, nfactors = 3, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                   RC1   RC2  RC3   h2      u2 com
## DigitSpanFW_raw  0.93 -0.22 0.05 0.91 0.08792 1.1
## DigitSpanBW_raw  0.05  0.20 0.98 1.00 0.00093 1.1
## NWR_raw          0.95  0.13 0.03 0.92 0.07983 1.0
## NonVerbInt_raw  -0.05  0.97 0.21 0.99 0.01276 1.1
## 
##                        RC1  RC2  RC3
## SS loadings           1.77 1.05 1.00
## Proportion Var        0.44 0.26 0.25
## Cumulative Var        0.44 0.70 0.95
## Proportion Explained  0.46 0.27 0.26
## Cumulative Proportion 0.46 0.74 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 3 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.04 
##  with the empirical chi square  0.47  with prob <  NA 
## 
## Fit based upon off diagonal values = 0.99

The first component seems to represent phonological short-term memory (digit span forwards and non-word repetition both load on it). The second component represents non-verbal intelligence (Raven) and the third represents verbal working memory (digit span backwards). We save the component scores for further analyses.
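To make the interpretation of the loadings concrete, the following sketch rebuilds the h2 column and the "SS loadings" row from the printed loading matrix. Because the loadings are entered as rounded to two decimals, the results match the printed output only approximately; the object names are assumptions for this example.

```r
# Rotated loadings copied (rounded) from the principal() output.
loadings <- matrix(c( 0.93, -0.22, 0.05,
                      0.05,  0.20, 0.98,
                      0.95,  0.13, 0.03,
                     -0.05,  0.97, 0.21),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("DigitSpanFW_raw", "DigitSpanBW_raw",
                                     "NWR_raw", "NonVerbInt_raw"),
                                   c("RC1", "RC2", "RC3")))

# Communality (h2): variance of each variable explained by the 3 components.
h2 <- rowSums(loadings^2)

# SS loadings: variance accounted for by each rotated component.
ss_loadings <- colSums(loadings^2)
prop_var    <- ss_loadings / nrow(loadings)

# Component on which each variable loads most strongly.
dominant <- colnames(loadings)[apply(abs(loadings), 1, which.max)]
```

The dominant component per variable reproduces the reading given above: both digit span forwards and non-word repetition load on RC1, non-verbal intelligence on RC2, and digit span backwards on RC3.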

3.4 Save and rename component scores

CAT_regr <- cbind(CAT_regr, pca$scores)
colnames(CAT_regr)[colnames(CAT_regr)=="RC1"] <- "C_PhonProc"
colnames(CAT_regr)[colnames(CAT_regr)=="RC2"] <- "C_NonVerbInt"
colnames(CAT_regr)[colnames(CAT_regr)=="RC3"] <- "C_VerbWM"

4 Regression analyses

4.1 Histogram of the predictor variable: accuracy on the distributional learning task

Visualization of the predictor variable reflecting distributional learning in children with DLD
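The plotting code itself is not shown in this report. As a rough sketch of how such a histogram could be produced, the example below uses base R on simulated accuracy values (not the study data) and writes a 7 x 5 inch file. The simulated vector dl_acc, the bin width and the output file name are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical accuracy proportions for 25 children (NOT the study data).
dl_acc <- round(rbeta(25, shape1 = 6, shape2 = 4), 2)

# Bin the accuracies in steps of 0.1 without drawing yet.
h <- hist(dl_acc, breaks = seq(0, 1, by = 0.1), plot = FALSE)

# Draw the histogram into a 7 x 5 inch PDF in the session's temp directory.
out_file <- file.path(tempdir(), "DLMeanAcc_hist.pdf")
pdf(out_file, width = 7, height = 5)
plot(h, main = "Accuracy on the distributional learning task",
     xlab = "Mean accuracy (proportion correct)",
     ylab = "Number of children")
invisible(dev.off())
```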

[Figure: histogram of mean accuracy on the visual distributional learning task (DLMeanAcc)]

4.2 Check correlations between predictors

Check whether the five predictors (distributional learning ability, age, non-verbal intelligence, phonological processing and verbal working memory) are significantly correlated.

Pred <- subset(CAT_regr, select=c("DLMeanAcc","C_NonVerbInt", "C_PhonProc","C_VerbWM", "Age_months"))
library(Hmisc) # rcorr() comes from the Hmisc package
rcorr(as.matrix(Pred), type="pearson")
##              DLMeanAcc C_NonVerbInt C_PhonProc C_VerbWM Age_months
## DLMeanAcc         1.00        -0.05      -0.17    -0.24       0.09
## C_NonVerbInt     -0.05         1.00       0.00     0.00      -0.21
## C_PhonProc       -0.17         0.00       1.00     0.00       0.09
## C_VerbWM         -0.24         0.00       0.00     1.00       0.26
## Age_months        0.09        -0.21       0.09     0.26       1.00
## 
## n= 25 
## 
## 
## P
##              DLMeanAcc C_NonVerbInt C_PhonProc C_VerbWM Age_months
## DLMeanAcc              0.8189       0.4256     0.2558   0.6771    
## C_NonVerbInt 0.8189                 1.0000     1.0000   0.3227    
## C_PhonProc   0.4256    1.0000                  1.0000   0.6748    
## C_VerbWM     0.2558    1.0000       1.0000              0.2165    
## Age_months   0.6771    0.3227       0.6748     0.2165

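None of the correlations between predictors is significant (all p > 0.21, n = 25). The three component scores are uncorrelated with each other by construction, because varimax is an orthogonal rotation.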
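For reference, the p-value of a Pearson correlation can be recovered from r and n with the standard t-test that cor.test() also performs. The sketch below applies it to the strongest correlation involving DLMeanAcc (r = -0.24 with C_VerbWM, n = 25). The helper cor_p_value is a name introduced for this example, and because r is entered as rounded, the result is only close to the printed 0.2558.

```r
# Two-sided p-value for a Pearson correlation r based on n observations.
cor_p_value <- function(r, n) {
  df     <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df)
}

# DLMeanAcc vs. C_VerbWM, using the rounded r from the correlation table.
p_dl_wm <- cor_p_value(-0.24, 25)
p_dl_wm
```

With only 25 children, a correlation has to be fairly large (roughly |r| > 0.40) before it reaches p < 0.05, so the absence of significant correlations here says little about small associations.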

4.3 Multiple linear regression models

We run four separate linear models, one for each dependent variable: Passive vocabulary, Active vocabulary, Word Categories and Word Associations. Each model contains the same predictors: Age, the three component scores and DLMeanAcc.
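Since all models share the same right-hand side, they could also be fitted in a loop. The sketch below does this on simulated data with the same column names as CAT_regr (the values are random, not the study data) and collects the DLMeanAcc row from each model. The objects sim, predictors, outcomes, models and dl_rows are assumptions for this example.

```r
set.seed(42)
n <- 25

predictors <- c("Age_months", "C_NonVerbInt", "C_PhonProc", "C_VerbWM", "DLMeanAcc")
outcomes   <- c("PassVocab_raw", "ActVocab_raw", "WordCatTot_norm", "WordAss_raw")

# Simulated stand-in for CAT_regr: independent standard-normal columns.
sim <- as.data.frame(matrix(rnorm(n * 9), nrow = n,
                            dimnames = list(NULL, c(predictors, outcomes))))

# Fit one linear model per outcome with the same set of predictors.
models <- lapply(outcomes, function(dv) {
  lm(reformulate(predictors, response = dv), data = sim)
})
names(models) <- outcomes

# Estimate, SE, t and p for DLMeanAcc in each model (one row per outcome).
dl_rows <- t(sapply(models, function(m) coef(summary(m))["DLMeanAcc", ]))
dl_rows
```

Collecting the coefficient of interest in one table makes it easier to compare the effect of distributional learning across outcomes than reading four separate summaries.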

4.3.1 Passive vocabulary

model.passvocab <- lm(PassVocab_raw~Age_months+C_NonVerbInt+C_PhonProc+C_VerbWM+DLMeanAcc, data=CAT_regr)
summary(model.passvocab)
## 
## Call:
## lm(formula = PassVocab_raw ~ Age_months + C_NonVerbInt + C_PhonProc + 
##     C_VerbWM + DLMeanAcc, data = CAT_regr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.48212 -0.71245 -0.08842  0.50206  1.70571 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -2.948e-16  2.065e-01   0.000    1.000
## Age_months    1.932e-01  2.274e-01   0.850    0.406
## C_NonVerbInt  1.688e-01  2.158e-01   0.783    0.444
## C_PhonProc    2.659e-01  2.155e-01   1.234    0.232
## C_VerbWM      9.625e-02  2.272e-01   0.424    0.677
## DLMeanAcc     1.509e-01  2.237e-01   0.675    0.508
## 
## Residual standard error: 1.032 on 19 degrees of freedom
## Multiple R-squared:  0.1562, Adjusted R-squared:  -0.06589 
## F-statistic: 0.7033 on 5 and 19 DF,  p-value: 0.6279

The model is not significant (F(5, 19) = 0.703, p = 0.628), and none of the predictors is significant.
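The overall F-test and the adjusted R-squared in the output above can both be derived from the multiple R-squared, the sample size and the number of predictors. The sketch below redoes that arithmetic for the Passive vocabulary model; the inputs are the rounded values from the output, so the results agree with the printed ones up to rounding.

```r
r2 <- 0.1562   # Multiple R-squared from the output
n  <- 25       # number of children
k  <- 5        # number of predictors (excluding the intercept)

df1 <- k
df2 <- n - k - 1   # residual degrees of freedom (19)

# F statistic: explained variance per predictor over residual variance per df.
f_stat  <- (r2 / df1) / ((1 - r2) / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Adjusted R-squared penalizes for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df2

c(F = f_stat, p = p_value, adj_R2 = adj_r2)
```

The negative adjusted R-squared indicates that, after correcting for the five predictors, the model explains no more variance than an intercept-only model would.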

4.3.2 Active vocabulary

model.actvocab <- lm(ActVocab_raw~Age_months+C_NonVerbInt+C_PhonProc+C_VerbWM+DLMeanAcc, data=CAT_regr)
summary(model.actvocab)
## 
## Call:
## lm(formula = ActVocab_raw ~ Age_months + C_NonVerbInt + C_PhonProc + 
##     C_VerbWM + DLMeanAcc, data = CAT_regr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.94129 -0.50274  0.09347  0.36316  1.87645 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -5.203e-18  1.964e-01   0.000    1.000
## Age_months    3.645e-01  2.163e-01   1.685    0.108
## C_NonVerbInt  1.558e-01  2.052e-01   0.759    0.457
## C_PhonProc    2.953e-01  2.050e-01   1.440    0.166
## C_VerbWM     -1.183e-01  2.161e-01  -0.547    0.591
## DLMeanAcc     6.756e-02  2.127e-01   0.318    0.754
## 
## Residual standard error: 0.9819 on 19 degrees of freedom
## Multiple R-squared:  0.2368, Adjusted R-squared:  0.03594 
## F-statistic: 1.179 on 5 and 19 DF,  p-value: 0.3559

The model is not significant (F(5, 19) = 1.179, p = 0.356), and none of the predictors is significant.
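Each row of a coefficient table follows the same recipe: the t value is the estimate divided by its standard error, and the p-value comes from a t distribution with the residual degrees of freedom. The sketch below reproduces the Age_months row of the Active vocabulary model and adds a 95% confidence interval, which is not part of the original output.

```r
estimate <- 0.3645   # Age_months estimate (3.645e-01)
se       <- 0.2163   # its standard error (2.163e-01)
df_resid <- 19       # residual degrees of freedom

# t value and two-sided p-value, as in the coefficient table.
t_value <- estimate / se
p_value <- 2 * pt(-abs(t_value), df_resid)

# 95% confidence interval for the coefficient (extra, not in the output).
ci <- estimate + c(-1, 1) * qt(0.975, df_resid) * se

round(c(t = t_value, p = p_value, lower = ci[1], upper = ci[2]), 3)
```

The interval includes zero, which matches the non-significant p-value, but it is also wide, so the data are compatible with a fairly large positive effect of age.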

4.3.3 Word Categories

model.wordcat <- lm(WordCatTot_norm~Age_months+C_NonVerbInt+C_PhonProc+C_VerbWM+DLMeanAcc, data=CAT_regr)
summary(model.wordcat)
## 
## Call:
## lm(formula = WordCatTot_norm ~ Age_months + C_NonVerbInt + C_PhonProc + 
##     C_VerbWM + DLMeanAcc, data = CAT_regr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8608 -0.3215 -0.1166  0.5350  1.7561 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -7.959e-18  1.816e-01   0.000   1.0000  
## Age_months   -1.882e-01  2.000e-01  -0.941   0.3587  
## C_NonVerbInt  4.218e-01  1.898e-01   2.223   0.0386 *
## C_PhonProc   -2.529e-01  1.896e-01  -1.334   0.1980  
## C_VerbWM      1.722e-02  1.999e-01   0.086   0.9322  
## DLMeanAcc    -1.749e-01  1.967e-01  -0.889   0.3852  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9081 on 19 degrees of freedom
## Multiple R-squared:  0.3472, Adjusted R-squared:  0.1754 
## F-statistic: 2.021 on 5 and 19 DF,  p-value: 0.1215

The overall model is not significant (F(5, 19) = 2.021, p = 0.122). Non-verbal intelligence is the only significant predictor of the Word Categories total score (estimate = 0.42, t = 2.223, p = 0.039).

4.3.4 Word Associations

model.wordass <- lm(WordAss_raw~Age_months+C_NonVerbInt+C_PhonProc+C_VerbWM+DLMeanAcc, data=CAT_regr)
summary(model.wordass)
## 
## Call:
## lm(formula = WordAss_raw ~ Age_months + C_NonVerbInt + C_PhonProc + 
##     C_VerbWM + DLMeanAcc, data = CAT_regr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.64500 -0.58969 -0.01297  0.53016  2.40840 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -2.191e-16  2.050e-01   0.000    1.000
## Age_months    1.679e-01  2.258e-01   0.744    0.466
## C_NonVerbInt  1.343e-01  2.142e-01   0.627    0.538
## C_PhonProc    1.052e-01  2.140e-01   0.491    0.629
## C_VerbWM     -3.596e-01  2.256e-01  -1.594    0.127
## DLMeanAcc     8.031e-02  2.221e-01   0.362    0.722
## 
## Residual standard error: 1.025 on 19 degrees of freedom
## Multiple R-squared:  0.1681, Adjusted R-squared:  -0.05081 
## F-statistic: 0.7679 on 5 and 19 DF,  p-value: 0.5843

The model is not significant (F(5, 19) = 0.768, p = 0.584), and none of the variables significantly predicts the dependent variable.

4.4 Report

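We found no evidence for a relationship between visual distributional learning and our measures of vocabulary knowledge: DLMeanAcc was not a significant predictor in any of the four models. Because non-significant results in a sample of 25 children cannot establish that an effect is absent, these results are not evidence against such a relationship either. Non-verbal intelligence significantly predicted Word Categories performance, which suggests a relationship between non-verbal intelligence and semantic knowledge, although the overall model was not significant.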