This file belongs to the dataset published in:
Van Boven, Cindy. 2023. Annotations of plural reduplication in NGT
(corpus & elicited data). University of Amsterdam / Amsterdam
University of Applied Sciences. DOI: 10.21942/uva.23260814.
Title: Phonological restrictions on nominal pluralization in Sign Language of the Netherlands: Evidence from corpus and elicited data
Author: C. van Boven
Contact: c.m.j.vanboven@uva.nl (Cindy van Boven)
Funding: This study is part of the project “Morphological reduplication in Sign Language of the Netherlands: A typological and theoretical perspective”, part of the research programme PhDs in the Humanities with project number PGW19.003, funded by the Dutch Research Council (NWO) (PhD student: C. van Boven; supervisors: dr. R. Pfau, dr. S. Hamann)
Data collection period: January - February 2020
Data collected by: C. van Boven
This study investigates nominal reduplication as a means of expressing plurality in Sign Language of the Netherlands (NGT). The aim is to offer a comprehensive description of nominal pluralization processes in the language, focusing mostly on reduplication and addressing potential restrictions on this process. Three research questions are addressed:
1) Do phonological noun types in NGT differ with respect to the pluralization strategies they undergo, i.e., does NGT pluralization involve phonologically triggered allomorphy?
2) Do numerals/quantifiers ‘block’ reduplication, i.e., does NGT allow for NP-internal number agreement?
3) Does the number of syllables in the mouthing influence the number of repetitions in reduplication?
In addition to statistical analyses aiming to answer these questions, this file contains an analysis of the inter-rater agreement for the annotations of mouthings.
This study combines two methods: corpus analysis and data elicitation.
Corpus search: The starting point is a corpus search in the Corpus NGT (Crasborn et al. 2008; Crasborn and Zwitserlood 2008). The annotated part of the corpus was searched for plural nouns, which, according to the Corpus NGT Annotation Conventions, are annotated for ‘.PL’ in their gloss (Crasborn et al. 2015: 15). I therefore searched for “.pl” on the Gloss tier. I also searched for signs that appear in a plural context, but are not glossed for .PL: I searched for the plural of 12 frequent Dutch nouns on the Translation tier.
Data elicitation: For the purpose of this study, a gap-filling task was designed: Participants were presented with signed (carrier) sentences in which the plural noun was omitted and replaced by a question mark sign. Participants were asked to repeat the sentence and fill in the gap, based on a picture that shows the targeted plural noun. 21 nouns with different phonological specifications were targeted. Each noun was targeted twice: once in a sentence without a numeral/quantifier, and once preceded by a numeral/quantifier, resulting in a total of 42 carrier sentences for plural nouns. Moreover, 11 sentences eliciting singular nouns were added, in order to (i) ensure that participants did not simply reduplicate all signs, because they realized that the task targets plurals, and (ii) elicit the singular forms of the nouns that have an inherent repetition in their citation form for comparison.
References:
Crasborn, Onno & Inge Zwitserlood. 2008. The Corpus NGT: An online corpus for professionals and laymen. In Onno Crasborn, Thomas Hanke, Eleni Efthimiou, Inge Zwitserlood & Ernst Thoutenhoofd (eds.), Construction and exploitation of sign language corpora. 3rd workshop on the representation and processing of sign languages, 44–49. Paris: ELDA.
Crasborn, Onno, Inge Zwitserlood & Johan Ros. 2008. The Corpus NGT: A digital open access corpus of movies and annotations of Sign Language of the Netherlands. http://hdl.handle.net/hdl:1839/00-0000-0000-0004-DF8E-6 (accessed 26 March 2020).
Crasborn, Onno, Richard Bank, Inge Zwitserlood, Els van der Kooij, Anne Meijer & Anna Sáfár. 2015. Annotation conventions for the Corpus NGT, version 3. https://doi.org/10.13140/RG.2.1.1779.4649.
In total, 297 plural nouns were extracted from the corpus and divided into four phonological noun types: 88 body-anchored nouns, 194 lateral nouns, 11 midsagittal nouns, and 4 complex movement nouns. A further 189 nominal plurals were elicited from five deaf native NGT signers (one male, four female, age range 25–62, mean age 38.4). The elicited nominal plurals were divided into the same categories: 97 body-anchored nouns, 30 lateral nouns, 26 midsagittal nouns, and 36 complex movement nouns.
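The counts above can be cross-checked with a few lines of base R. This is only a bookkeeping sketch: the numbers are copied from the paragraph above, and the single-letter codes (B = body-anchored, L = lateral, M = midsagittal, C = complex movement) are inferred from the levels of the Noun_type column rather than stated explicitly in the text.

```r
# Counts per phonological noun type, copied from the description above.
# The letter codes are an inferred mapping to the Noun_type levels.
noun_counts <- rbind(
  corpus   = c(B = 88, L = 194, M = 11, C = 4),
  elicited = c(B = 97, L = 30,  M = 26, C = 36)
)

totals_by_data_type <- rowSums(noun_counts)  # one total per data type
totals_by_noun_type <- colSums(noun_counts)  # one total per noun type
grand_total <- sum(noun_counts)              # all annotated plurals

print(totals_by_data_type)
print(totals_by_noun_type)
print(grand_total)
```

The grand total corresponds to the number of rows one would expect in the combined data set.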
Loading data
data.plurals.all <- read.csv("plural_reduplication _annotations.csv", header = TRUE, sep = ";")
class(data.plurals.all)
## [1] "data.frame"
head(data.plurals.all)
## Participant Data_type Noun Noun_type Noun_type..spec Strategy1 Strategy2 Zero Num Number_repetitions Mouthing Syllables_mouthing Corpus.NGT.file.number Corpus.NGT.time.code
## 1 s001 corpus DING.PL L <NA> simple simple 0 0 NA none 0 CNGT0098 00:01:13.520-00:01:14.720
## 2 s001 corpus KIND.PL L <NA> sim sim 0 0 0 kind' 1 CNGT0098 00:04:56.360-00:04:56.805
## 3 s001 corpus SCHOOL L <NA> zero zero 1 0 0 school' 1 CNGT0099 00:04:09.760-00:04:10.280
## 4 s001 corpus SCHOOL L <NA> zero zero 1 0 0 unclear NA CNGT0099 00:04:17.280-00:04:17.400
## 5 s001 corpus KIND.PL L <NA> sim sim 0 0 0 kind' 1 CNGT0099 00:00:42.960-00:00:43.960
## 6 s002 corpus KIND.PL L <NA> sideward sideward 0 0 2 unclear NA CNGT0098 00:02:45.160-00:02:45.600
colnames(data.plurals.all)
## [1] "Participant" "Data_type" "Noun" "Noun_type" "Noun_type..spec" "Strategy1" "Strategy2" "Zero" "Num" "Number_repetitions" "Mouthing" "Syllables_mouthing" "Corpus.NGT.file.number" "Corpus.NGT.time.code"
data.plurals.all$Participant <- as.factor(data.plurals.all$Participant)
data.plurals.all$Data_type <- as.factor(data.plurals.all$Data_type)
data.plurals.all$Noun <- as.factor(data.plurals.all$Noun)
data.plurals.all$Noun_type <- as.factor(data.plurals.all$Noun_type)
data.plurals.all$Noun_type..spec <- as.factor(data.plurals.all$Noun_type..spec)
data.plurals.all$Strategy1 <- as.factor(data.plurals.all$Strategy1)
data.plurals.all$Strategy2 <- as.factor(data.plurals.all$Strategy2)
data.plurals.all$Zero <- as.factor(data.plurals.all$Zero)
data.plurals.all$Num <- as.factor(data.plurals.all$Num)
data.plurals.all$Number_repetitions <- as.numeric(data.plurals.all$Number_repetitions)
data.plurals.all$Mouthing <- as.factor(data.plurals.all$Mouthing)
data.plurals.all$Syllables_mouthing <- as.numeric(data.plurals.all$Syllables_mouthing)
data.plurals.all$Corpus.NGT.file.number <- as.factor(data.plurals.all$Corpus.NGT.file.number)
data.plurals.all$Corpus.NGT.time.code <- as.factor(data.plurals.all$Corpus.NGT.time.code)
#Check structure in data.plurals.all
str(data.plurals.all)
## 'data.frame': 486 obs. of 14 variables:
## $ Participant : Factor w/ 64 levels "p02","p03","p04",..: 6 6 6 6 6 7 7 7 7 8 ...
## $ Data_type : Factor w/ 2 levels "corpus","elicited": 1 1 1 1 1 1 1 1 1 1 ...
## $ Noun : Factor w/ 58 levels "(KLEIN)KIND",..: 14 25 47 47 25 25 25 25 47 51 ...
## $ Noun_type : Factor w/ 4 levels "B","C","L","M": 3 3 3 3 3 3 3 3 3 3 ...
## $ Noun_type..spec : Factor w/ 5 levels "alt","circ","contact",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Strategy1 : Factor w/ 6 levels "other","sideward",..: 5 4 6 6 4 2 2 6 5 2 ...
## $ Strategy2 : Factor w/ 5 levels "other","sideward",..: 4 3 5 5 3 2 2 5 4 2 ...
## $ Zero : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 2 1 1 ...
## $ Num : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Number_repetitions : num NA 0 0 0 0 2 2 0 1 3 ...
## $ Mouthing : Factor w/ 119 levels "'meisjesss' (I think s spreads over PTround)",..: 85 47 100 110 47 110 53 47 100 38 ...
## $ Syllables_mouthing : num 0 1 1 NA 1 NA 3 1 1 4 ...
## $ Corpus.NGT.file.number: Factor w/ 131 levels "CNGT0004","CNGT0007",..: 17 17 18 18 18 17 17 17 18 1 ...
## $ Corpus.NGT.time.code : Factor w/ 290 levels "00:00:04.640-00:00:06.720",..: 136 271 257 260 76 214 235 233 191 231 ...
summary(data.plurals.all)
## Participant Data_type Noun Noun_type Noun_type..spec Strategy1 Strategy2 Zero Num Number_repetitions Mouthing Syllables_mouthing Corpus.NGT.file.number Corpus.NGT.time.code
## p02 : 41 corpus :297 KIND.PL :105 B:185 alt : 10 other : 2 other : 2 0:338 0:330 Min. :0.0000 none : 68 Min. :0.000 none :189 none :189
## p03 : 40 elicited:189 MENS.PL : 43 C: 40 circ : 17 sideward :169 sideward:197 1:148 1:156 1st Qu.:0.0000 unclear : 56 1st Qu.:1.000 CNGT0256: 10 00:02:22.400-00:02:22.720: 3
## p04 : 39 DING.PL : 36 L:224 contact : 60 sideward.sim: 28 sim : 6 Median :1.0000 kinderen': 38 Median :1.000 CNGT0298: 10 00:00:14.370-00:00:14.700: 2
## p06 : 36 SCHOOL : 24 M: 37 nocontact: 37 sim : 6 simple :133 Mean :0.7824 kind' : 32 Mean :1.484 CNGT0014: 9 00:00:33.040-00:00:33.240: 2
## p05 : 33 PROBLEEM: 23 rep : 9 simple :133 zero :148 3rd Qu.:1.0000 mens' : 26 3rd Qu.:2.000 CNGT1684: 7 00:00:34.800-00:00:35.000: 2
## s014 : 22 VROUW : 15 NA's :353 zero :148 Max. :4.0000 mensen' : 19 Max. :4.000 CNGT0136: 6 00:01:42.360-00:01:42.800: 2
## (Other):275 (Other) :240 NA's :77 (Other) :247 NA's :56 (Other) :255 (Other) :286
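The column-by-column type conversions above could also be written more compactly. The following is a minimal sketch on a hypothetical three-row stand-in (the object `toy` and its values are illustrative, not part of the data set); it assumes that only the two count columns should be numeric and that everything else should be a factor.

```r
# Hypothetical stand-in for data.plurals.all; values are illustrative only.
toy <- data.frame(
  Participant = c("s001", "s001", "s002"),
  Data_type = c("corpus", "corpus", "corpus"),
  Number_repetitions = c(NA, "0", "2"),
  Syllables_mouthing = c("0", "1", NA),
  stringsAsFactors = FALSE
)

# Columns that hold counts; all remaining columns are treated as categorical.
numeric_cols <- c("Number_repetitions", "Syllables_mouthing")
factor_cols <- setdiff(names(toy), numeric_cols)

# Convert each group of columns in one step.
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)
toy[numeric_cols] <- lapply(toy[numeric_cols], as.numeric)

str(toy)
```

Listing the numeric columns explicitly keeps the conversion rule visible in one place, which may be easier to audit than fourteen separate assignments.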
Trimming the data: Removing nouns for which number of repetitions and/or mouthing was unclear
#First remove irrelevant column with many NA's:
data.plurals.trimmed <- subset (data.plurals.all, select = -Noun_type..spec)
#data.plurals.trimmed
#Then remove all rows with NA's:
data.plurals.withoutNA <- na.omit(data.plurals.trimmed)
#data.plurals.withoutNA
#Check structure in data.plurals.withoutNA
str(data.plurals.withoutNA)
## 'data.frame': 363 obs. of 13 variables:
## $ Participant : Factor w/ 64 levels "p02","p03","p04",..: 6 6 6 7 7 7 8 8 8 8 ...
## $ Data_type : Factor w/ 2 levels "corpus","elicited": 1 1 1 1 1 1 1 1 1 1 ...
## $ Noun : Factor w/ 58 levels "(KLEIN)KIND",..: 25 47 25 25 25 47 51 25 25 14 ...
## $ Noun_type : Factor w/ 4 levels "B","C","L","M": 3 3 3 3 3 3 3 3 3 3 ...
## $ Strategy1 : Factor w/ 6 levels "other","sideward",..: 4 6 4 2 6 5 2 3 2 6 ...
## $ Strategy2 : Factor w/ 5 levels "other","sideward",..: 3 5 3 2 5 4 2 2 2 5 ...
## $ Zero : Factor w/ 2 levels "0","1": 1 2 1 1 2 1 1 1 1 2 ...
## $ Num : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Number_repetitions : num 0 0 0 2 0 1 3 1 3 0 ...
## $ Mouthing : Factor w/ 119 levels "'meisjesss' (I think s spreads over PTround)",..: 47 100 47 53 47 100 38 53 53 85 ...
## $ Syllables_mouthing : num 1 1 1 3 1 1 4 3 3 0 ...
## $ Corpus.NGT.file.number: Factor w/ 131 levels "CNGT0004","CNGT0007",..: 17 18 18 17 17 18 1 2 4 6 ...
## $ Corpus.NGT.time.code : Factor w/ 290 levels "00:00:04.640-00:00:06.720",..: 271 257 76 235 233 191 231 63 102 221 ...
## - attr(*, "na.action")= 'omit' Named int [1:123] 1 4 6 13 16 19 22 31 33 40 ...
## ..- attr(*, "names")= chr [1:123] "1" "4" "6" "13" ...
summary(data.plurals.withoutNA)
## Participant Data_type Noun Noun_type Strategy1 Strategy2 Zero Num Number_repetitions Mouthing Syllables_mouthing Corpus.NGT.file.number Corpus.NGT.time.code
## p02 : 31 corpus :220 KIND.PL : 60 B:164 other : 1 other : 1 0:238 0:244 Min. :0.0000 none : 45 Min. :0.000 none :143 none :143
## p03 : 29 elicited:143 MENS.PL : 41 C: 22 sideward :105 sideward:120 1:125 1:119 1st Qu.:0.0000 mens' : 26 1st Qu.:1.000 CNGT0014: 7 00:00:14.370-00:00:14.700: 2
## p04 : 29 PROBLEEM: 21 L:148 sideward.sim: 15 sim : 6 Median :1.0000 kinderen': 24 Median :1.000 CNGT1684: 7 00:00:34.800-00:00:35.000: 2
## p05 : 28 SCHOOL : 21 M: 29 sim : 6 simple :111 Mean :0.8058 kind' : 20 Mean :1.485 CNGT0136: 5 00:01:42.360-00:01:42.800: 2
## p06 : 26 DING.PL : 15 simple :111 zero :125 3rd Qu.:1.0000 mensen' : 19 3rd Qu.:2.000 CNGT0254: 5 00:02:22.400-00:02:22.720: 2
## s008 : 12 VROUW : 14 zero :125 Max. :4.0000 school' : 17 Max. :4.000 CNGT0137: 4 00:02:48.120-00:02:49.000: 2
## (Other):208 (Other) :191 (Other) :212 (Other) :192 (Other) :210
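The two trimming steps can be illustrated on a small example. The sketch below uses a hypothetical four-row data frame (`toy`) whose NA pattern mirrors the first four rows printed by head() above; it shows why the mostly empty Noun_type..spec column is dropped before na.omit() is applied, and where the removed row numbers are recorded.

```r
# Hypothetical four-row stand-in mirroring the NA pattern of the first rows.
toy <- data.frame(
  Noun = c("DING.PL", "KIND.PL", "SCHOOL", "SCHOOL"),
  Noun_type..spec = c(NA, NA, NA, NA),
  Number_repetitions = c(NA, 0, 0, 0),
  Syllables_mouthing = c(0, 1, 1, NA)
)

# Without dropping the all-NA column first, na.omit() would remove every row.
rows_if_column_kept <- nrow(na.omit(toy))

# Drop the column, then remove rows with an NA in any remaining column.
toy_trimmed <- subset(toy, select = -Noun_type..spec)
toy_complete <- na.omit(toy_trimmed)

# na.omit() records the removed rows in the "na.action" attribute.
dropped_rows <- attr(toy_complete, "na.action")
print(names(dropped_rows))

# Row bookkeeping for the full data: 486 rows minus 123 omitted rows.
remaining_rows <- 486 - 123
```

The same attribute appears at the end of the str() output above, where it lists 123 omitted rows.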
Loading annotations of mouthings by both annotators to check inter-rater agreement
Mouthings.both.annotators <- read.csv("Mouthings_bothannotators.csv", header = TRUE, sep = ";")
Mouthings.both.annotators #This data set shows the type of mouthing each annotator observed: no mouthing (none); a singular Dutch word (singular); a plural Dutch word (plural); or a reduplicated Dutch word (reduplication). The label "other" also occurs (row 8).
## Source Rater1 Rater2
## 1 CNGT0299, s017, 00:00:23.040 none singular
## 2 CNGT1684, s069, 00:02:53.320 none none
## 3 CNGT0099, s001, 00:04:09.760 singular singular
## 4 CNGT0255, s014, 00:06:26.800 singular plural
## 5 CNGT0438, s022, 00:01:59.280 plural plural
## 6 CNGT1628, s067, 00:00:52.305 plural plural
## 7 CNGT0333, s015, 00:00:37.600 reduplication reduplication
## 8 CNGT0065, s005, 00:00:05.720 other other
## 9 p06 none none
## 10 p06 none none
## 11 p05 plural plural
## 12 p03 plural plural
## 13 p02 reduplication singular
## 14 p06 reduplication reduplication
## 15 p02 singular singular
## 16 p06 singular singular
#Removing column with source file, which is not necessary to check inter-rater agreement
Mouthings.ratings <- subset (Mouthings.both.annotators, select = -Source)
Mouthings.ratings
## Rater1 Rater2
## 1 none singular
## 2 none none
## 3 singular singular
## 4 singular plural
## 5 plural plural
## 6 plural plural
## 7 reduplication reduplication
## 8 other other
## 9 none none
## 10 none none
## 11 plural plural
## 12 plural plural
## 13 reduplication singular
## 14 reduplication reduplication
## 15 singular singular
## 16 singular singular
Set contrasts
#setting contrasts for noun type
contrasts.nountype <- cbind(c(-1/4, +3/4, -1/4, -1/4), c(+1/4, +1/4, -1/4, -1/4), c(+3/4, -1/4, -1/4, -1/4)) # B, C, L, M
colnames (contrasts.nountype) <- c("-BLM+C", "-LM+BC", "-CLM+B")
contrasts (data.plurals.all$Noun_type) <- contrasts.nountype
contrasts (data.plurals.all$Noun_type)
## -BLM+C -LM+BC -CLM+B
## B -0.25 0.25 0.75
## C 0.75 0.25 -0.25
## L -0.25 -0.25 -0.25
## M -0.25 -0.25 -0.25
#setting contrasts for data type
contrasts.datatype <- cbind(c(-0.5, +0.5))
colnames (contrasts.datatype) <- c("-corpus+elicited")
contrasts (data.plurals.all$Data_type) <- contrasts.datatype
contrasts (data.plurals.all$Data_type)
## -corpus+elicited
## corpus -0.5
## elicited 0.5
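To see what the -0.5/+0.5 coding does to the predictor that enters the model, one can inspect the design matrix for a small factor. This is a sketch on a hypothetical four-observation factor (`data_type`), not on the actual data.

```r
# Hypothetical factor with the same two levels as Data_type.
data_type <- factor(c("corpus", "elicited", "corpus", "elicited"))

# Same coding as above: corpus = -0.5, elicited = +0.5.
contrasts(data_type) <- cbind("-corpus+elicited" = c(-0.5, 0.5))

# The design matrix has an intercept column and one contrast column.
X <- model.matrix(~ data_type)
print(X)
```

With this coding, the coefficient of the contrast column estimates the elicited-minus-corpus difference (here on the logit scale), and the intercept refers to the midpoint between the two levels rather than to a reference level.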
Run model
library(lme4) # provides glmer() and glmerControl()
Zero.nountype <- glmer(Zero ~ Noun_type + Data_type + (Noun_type | Participant), data = data.plurals.all, family = binomial, control = glmerControl(calc.derivs = FALSE, optCtrl = list(maxfun = 1e6)))
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
summary(Zero.nountype)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: Zero ~ Noun_type + Data_type + (Noun_type | Participant)
## Data: data.plurals.all
## Control: glmerControl(calc.derivs = FALSE, optCtrl = list(maxfun = 1000000))
##
## AIC BIC logLik deviance df.resid
## 579.2 637.8 -275.6 551.2 472
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2279 -0.6729 -0.3790 0.8809 3.4473
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.3990 0.6317
## Noun_type-BLM+C 0.7889 0.8882 -0.86
## Noun_type-LM+BC 0.8635 0.9293 -0.50 0.14
## Noun_type-CLM+B 0.3594 0.5995 -0.70 0.96 -0.10
## Number of obs: 486, groups: Participant, 64
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8425 0.1657 -5.086 0.000000366 ***
## Noun_type-BLM+C 0.6250 0.3936 1.588 0.112
## Noun_type-LM+BC 2.3263 0.5495 4.233 0.000023031 ***
## Data_type-corpus+elicited 0.2088 0.3407 0.613 0.540
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) N_-BLM N_-LM+
## Nn_ty-BLM+C 0.122
## Nn_ty-LM+BC -0.465 -0.033
## Dt_typ-crp+ 0.281 -0.390 -0.330
## fit warnings:
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
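The rank-deficiency message can be examined by checking the contrast matrix for Noun_type directly. The sketch below rebuilds the matrix from the values given in the contrast-setting step (the object names here are new) and computes its rank in base R. It suggests, without claiming to reproduce the internals of glmer(), that the three contrast columns are linearly dependent, which would explain why only two Noun_type coefficients are estimated.

```r
# Contrast matrix as defined above; rows are the levels B, C, L, M.
contrasts_nountype <- cbind(
  "-BLM+C" = c(-1/4,  3/4, -1/4, -1/4),
  "-LM+BC" = c( 1/4,  1/4, -1/4, -1/4),
  "-CLM+B" = c( 3/4, -1/4, -1/4, -1/4)
)
rownames(contrasts_nountype) <- c("B", "C", "L", "M")

# Three columns, but the numerical rank is lower.
rank_contrasts <- qr(contrasts_nountype)$rank

# The first and third columns add up to twice the second column.
dependency <- contrasts_nountype[, 1] + contrasts_nountype[, 3] -
  2 * contrasts_nountype[, 2]

# Rank of the fixed-effects part including the intercept column.
rank_with_intercept <- qr(cbind(1, contrasts_nountype))$rank

print(rank_contrasts)
print(dependency)
print(rank_with_intercept)
```

Note also that the rows for L and M are identical in this matrix, so under this coding the model does not distinguish lateral from midsagittal nouns.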
Calculate odds ratio & confidence intervals
#Odds ratio of the logit scale coefficients
Zero.nountype_coef <- round(coef(summary(Zero.nountype)), 3)
Zero.nountype_OR <- exp(Zero.nountype_coef[,1])
Zero.nountype_OR
## (Intercept) Noun_type-BLM+C Noun_type-LM+BC Data_type-corpus+elicited
## 0.430848 1.868246 10.236912 1.232445
#Confidence intervals
Zero.nountype_CI <- exp(confint(Zero.nountype, method="Wald"))
Zero.nountype_CI
## 2.5 % 97.5 %
## .sig01 NA NA
## .sig02 NA NA
## .sig03 NA NA
## .sig04 NA NA
## .sig05 NA NA
## .sig06 NA NA
## .sig07 NA NA
## .sig08 NA NA
## .sig09 NA NA
## .sig10 NA NA
## (Intercept) 0.3112589 0.5958415
## Noun_type-BLM+C 0.8637362 4.0412373
## Noun_type-LM+BC 3.4877573 30.0657228
## Data_type-corpus+elicited 0.6319469 2.4027282
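The odds ratio and its Wald interval for a single coefficient can be reproduced by hand from the estimate and standard error in the model summary. The sketch below does this for Noun_type-LM+BC; the numbers are copied from the summary table, and small differences in the last digits are expected because the summary prints rounded values.

```r
# Estimate and standard error for Noun_type-LM+BC, copied from the summary.
estimate <- 2.3263
std_error <- 0.5495

# Critical value of the standard normal for a 95% interval (about 1.96).
z_crit <- qnorm(0.975)

# Exponentiate the logit-scale values to obtain the odds-ratio scale.
odds_ratio <- exp(estimate)
ci_lower <- exp(estimate - z_crit * std_error)
ci_upper <- exp(estimate + z_crit * std_error)

print(round(c(OR = odds_ratio, lower = ci_lower, upper = ci_upper), 2))
```

In lme4, passing parm = "beta_" to confint() restricts the output to the fixed effects, which would leave out the NA rows for the variance parameters.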
Set contrasts
#setting contrasts for numeral/quantifier
contrasts.NQ <- cbind(c(-0.5, +0.5))
colnames (contrasts.NQ) <- c("-no+yes")
contrasts (data.plurals.all$Num) <- contrasts.NQ
contrasts (data.plurals.all$Num)
## -no+yes
## 0 -0.5
## 1 0.5
Run model
Zero.NQ <- glmer(Zero ~ Num + Data_type + (Num | Participant), data = data.plurals.all, family = binomial, control = glmerControl(calc.derivs = FALSE, optCtrl = list(maxfun = 1e6))) #specify family binomial
summary(Zero.NQ)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
## Family: binomial ( logit )
## Formula: Zero ~ Num + Data_type + (Num | Participant)
## Data: data.plurals.all
## Control: glmerControl(calc.derivs = FALSE, optCtrl = list(maxfun = 1000000))
##
## AIC BIC logLik deviance df.resid
## 590.1 615.2 -289.1 578.1 480
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.0739 -0.7160 -0.4788 1.0921 2.9115
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Participant (Intercept) 0.3030 0.5504
## Num-no+yes 0.5101 0.7142 -0.85
## Number of obs: 486, groups: Participant, 64
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.88281 0.15014 -5.880 0.00000000411 ***
## Num-no+yes 0.06348 0.29285 0.217 0.828
## Data_type-corpus+elicited 0.51206 0.34929 1.466 0.143
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Nm-n+y
## Num-no+yes -0.127
## Dt_typ-crp+ 0.293 -0.522
Calculate odds ratio & confidence intervals
#Odds ratio of the logit scale coefficients
Zero.NQ_coef <- round(coef(summary(Zero.NQ)), 3)
Zero.NQ_OR <- exp(Zero.NQ_coef[,1])
Zero.NQ_OR
## (Intercept) Num-no+yes Data_type-corpus+elicited
## 0.4135404 1.0650268 1.6686251
#Confidence intervals
Zero.NQ_CI <- exp(confint(Zero.NQ, method="Wald"))
Zero.NQ_CI
## 2.5 % 97.5 %
## .sig01 NA NA
## .sig02 NA NA
## .sig03 NA NA
## (Intercept) 0.3081747 0.5551369
## Num-no+yes 0.6002021 1.8916376
## Data_type-corpus+elicited 0.8415312 3.3089914
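For interpretation, a logit-scale intercept can be converted to a probability with the inverse logit. The sketch below applies this to the intercept of the numeral/quantifier model, copied from the summary above. Under the -0.5/+0.5 coding of both predictors, this value refers to the midpoint of the conditions for a participant whose random effects are zero; it is an illustration, not an estimate of the overall proportion of zero marking.

```r
# Intercept of the numeral/quantifier model, copied from the summary.
intercept_logit <- -0.88281

# Odds and probability implied by the logit-scale intercept.
intercept_odds <- exp(intercept_logit)
intercept_prob <- plogis(intercept_logit)  # same as odds / (1 + odds)

print(round(c(odds = intercept_odds, probability = intercept_prob), 3))
```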
#Correlation test with trimmed data:
cor.test(data.plurals.withoutNA$Number_repetitions, data.plurals.withoutNA$Syllables_mouthing)
##
## Pearson's product-moment correlation
##
## data: data.plurals.withoutNA$Number_repetitions and data.plurals.withoutNA$Syllables_mouthing
## t = 3.8268, df = 361, p-value = 0.0001529
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.09647504 0.29439758
## sample estimates:
## cor
## 0.1974476
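The t statistic and p-value reported by cor.test() follow directly from the correlation coefficient and the degrees of freedom. The sketch below recomputes them from the printed values, using the standard relation between Pearson's r and t, and also shows the proportion of shared variance.

```r
# Values copied from the cor.test() output above.
r <- 0.1974476
df <- 361

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom.
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Proportion of variance shared by the two variables.
r_squared <- r^2

print(round(c(t = t_stat, r_squared = r_squared), 4))
print(signif(p_value, 4))
```

The correlation is statistically reliable but small: roughly 4% of the variance in the number of repetitions is shared with the number of syllables in the mouthing.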
#Calculate percentage agreement (agree() and kappa2() are provided by the irr package):
library(irr)
agree(Mouthings.ratings)
## Percentage agreement (Tolerance=0)
##
## Subjects = 16
## Raters = 2
## %-agree = 81.2
#Percentage agreement is 81.2%
#Calculating Cohen's kappa:
kappa2(Mouthings.ratings)
## Cohen's Kappa for 2 Raters (Weights: unweighted)
##
## Subjects = 16
## Raters = 2
## Kappa = 0.756
##
## z = 5.69
## p-value = 0.0000000127
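Both agreement measures can be recomputed in base R from the 16 rating pairs listed earlier, which makes explicit how Cohen's kappa corrects the raw agreement for agreement expected by chance. This is a sketch of the textbook unweighted formula, with the ratings typed in from the table; the variable names are new.

```r
# Ratings of the 16 items, copied from the table of both annotators.
rater1 <- c("none", "none", "singular", "singular", "plural", "plural",
            "reduplication", "other", "none", "none", "plural", "plural",
            "reduplication", "reduplication", "singular", "singular")
rater2 <- c("singular", "none", "singular", "plural", "plural", "plural",
            "reduplication", "other", "none", "none", "plural", "plural",
            "singular", "reduplication", "singular", "singular")

# Cross-tabulate with a shared set of categories so the table is square.
categories <- union(rater1, rater2)
tab <- table(factor(rater1, levels = categories),
             factor(rater2, levels = categories))
n <- sum(tab)

# Observed agreement: share of items on the diagonal.
p_observed <- sum(diag(tab)) / n

# Chance agreement: product of the raters' marginal proportions, summed.
p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2

# Cohen's kappa (unweighted).
kappa_manual <- (p_observed - p_expected) / (1 - p_expected)

print(round(c(agreement = p_observed, kappa = kappa_manual), 3))
```

The raters agree on 13 of the 16 items; the three disagreements are items 1, 4, and 13.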