This folder contains all the data compiled for the study, organized in CSV (comma-separated values) files. The contents are structured as follows:
- cldf: dataset in the CLDF format, generated with this Python script
- verb_stem_data.csv
  - Language_ID: Reference to languages:`ID`.
  - Form: Derivational morphology segmented with `+`; elements in brackets only surface sometimes.
  - Source: bibkey[page] referencing references.bib, multiple separated by `;`, `pc` for personal communication.
  - Cognateset_ID: cognate_sets:`ID`, with multiple IDs joined by `+` (`ID+ID`).
  - Class: Class of the verb: `S_A_` or `S_P_`. `–` when no split-S, `?` when class unknown, `S_A_ / S_P_` for mixed Wayana verbs.
  - Comment: Comments.
  - Cog_Cert: Cases which do not seem fully cognate are marked with `0.5` here.
  - Meaning_ID: Reference to values of cognate_sets:`Meaning`.
- split_s_data.csv
  - Language_ID: Reference to languages:`ID`.
  - Construction: Kind of verb form.
  - Form: Form with `-` separating morphemes.
  - Meaning: Direct translation, not a cognate_sets:`Meaning`.
  - Class: Verb class, `S_A_` or `S_P_`.
  - Source: bibkey[page] referencing references.bib, multiple separated by `;`, `pc` for personal communication.
- other_lexemes.csv
  - Language_ID: Reference to languages:`ID`.
  - Form: Form.
  - Meaning: Translation.
  - Source: bibkey[page] referencing references.bib, multiple separated by `;`, `pc` for personal communication.
  - Full_Form: Full form as it appeared in the cited source.
  - Cognateset_ID: cognate_sets:`ID`.
- examples.csv
  - ID: An ID, usually consisting of languages:`ID` followed by `-X`.
  - Language_ID: Reference to languages:`ID`.
  - Sentence: Either identical to `Segmentation`, an orthographical form, and/or the form in the source.
  - Segmentation: `-` separates morphemes, spaces separate phonological words.
  - Gloss: Corresponding to `Segmentation`.
  - Translation: Free English translation.
  - Source: bibkey[page] referencing references.bib, `pc` for personal communication.
  - Orig_Segmentation: Segmentation of the form as it appears in the source.
  - Orig_Glossing: Glossing as it appears in the source.
  - Orig_Translation: Translation as it appears in the source.
  - Comment: Comments.
- extensions.csv
  - ID: IDs referring to extensions.
  - Language_ID: Reference to languages:`ID`.
  - Form: Form of the innovative prefix.
  - Cognateset_ID: cognate_sets:`ID`.
  - Comment: Comments.
- bathe_data.csv
  - Language_ID: Reference to languages:`ID`.
  - Form: `+` separates (etymological) morphemes.
  - Source: bibkey[page] referencing references.bib, multiple separated by `;`, `pc` for personal communication.
  - Cognateset_ID: cognate_sets:`ID`, with multiple IDs joined by `+` (`ID+ID`).
  - Transitivity: Either transitive or intransitive 'to bathe'.
  - Comment: Comments.
- apalai_sa_verb_stats.csv
  - Form: Form.
  - Meaning: Meaning.
  - ID: cognate_sets:`ID` or generic `reg_sa`.
  - Count: Number of times the verb occurred.
  - % Sa: Share of the verb's tokens among all Sa verb tokens.
  - % Words: Overall ratio.
  - High_Frequency: Defined as more frequent than average; used for classifying cognate verbs.
- cognate_sets.csv
- languages.csv
- inflection_data.csv
  - Meaning_ID: Reference to values of cognate_sets:`Meaning`.
  - Verb_Cognateset_ID: cognate_sets:`ID`, with multiple IDs joined by `+` (`ID+ID`).
  - Language_ID: Reference to languages:`ID`.
  - Inflection: Person inflection of the form.
  - Form: `-` marks boundaries between productive morphemes, `+` between unproductive morphemes.
  - Prefix_Cognateset_ID: cognate_sets:`ID`, with multiple IDs joined by `+` (`ID+ID`).
  - Source: bibkey[page] referencing references.bib, multiple separated by `;`, `pc` for personal communication.
  - Full_Form: The full form in the source, if applicable.
  - Comment: Comments.
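
The Source and Form conventions described above can be unpacked with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `parse_source` and `split_plus` are hypothetical, the bibkeys in the usage lines are invented placeholders, and the regular expression assumes that each Source entry is either a bare key (such as `pc`) or a key followed by one bracketed page reference.

```python
import re

# Assumed shape of one Source entry: "bibkey[page]", where "[page]" is optional.
SOURCE_ENTRY = re.compile(r"^(?P<key>[^\[\]]+?)(?:\[(?P<pages>[^\]]*)\])?$")


def parse_source(cell):
    """Split a Source cell into (bibkey, pages) pairs.

    Entries are separated by ";". An entry without brackets, such as
    "pc" (personal communication), is returned with pages set to None.
    """
    entries = []
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. from a trailing ";"
        match = SOURCE_ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognized source entry: {part!r}")
        entries.append((match.group("key").strip(), match.group("pages")))
    return entries


def split_plus(value):
    """Split a `+`-separated Form or Cognateset_ID value into its parts."""
    return [piece for piece in value.split("+") if piece]


# Hypothetical values, for illustration only.
print(parse_source("smith2000[12]; pc"))
print(split_plus("a+b"))
```

Returning `None` for missing pages keeps personal communications and page-less citations distinguishable from entries with an empty bracket pair, which would come back as an empty string.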