Najdi Arabic phoneme inventory is missing items #332
Comments
@camoverride -- thanks for pointing this out and sending some reproducible code. I'll look into it.
@camoverride I'm chiming in here to provide a clarification: "ground truth" in this case is not Wikipedia, but rather Ingram 1994 (https://phoible.org/sources/67053). Phoneme inventories in PHOIBLE are not meant to represent a language but rather a particular instance of language documentation. It certainly can happen that we make a mistake in converting the analysis in Ingram 1994 into a PHOIBLE entry, but if the "mistake" here is that Ingram disagrees with other scholars about Najdi Arabic's phonology, that is a disagreement that we're interested in preserving.
This issue is a bit more complicated than that, I think. I looked into the grammar by Ingram and indeed it does not list nasals among its consonants, but you nevertheless find them in the word forms in the grammar. After some discussion with @macleginn (this particular inventory is from (an earlier version of) EURPhon https://eurphon.info/languages/html?lang_id=135), he told me that he encountered some systematic omission of nasals (and liquids and rhotics) from descriptions of Arabic dialects. He put it rather succinctly to me: "There are doculects that demand some amount of hermeneutics, unfortunately." A point for discussion.
Gotcha, Ingram as ground truth definitely makes sense - and it's reasonable not to want to play around with the source material too much. However, have you all developed a general strategy for tracking known "errors" in Ingram? Maybe it could act as a secondary source to augment phoible?
Just a somewhat technical note regarding a "secondary source to augment phoible": I think that's a good idea - some sort of curated errata. And that's exactly one of the use cases we had in mind when designing CLDF to allow for easy merging: such an errata list could be distributed in the same overall format as PHOIBLE, and then be transparently used to override PHOIBLE data in specified cases. However, in this particular case, I'd be a bit hesitant. I think the strength of PHOIBLE lies in it being principled and complete. So for any use case that looks at all of PHOIBLE, an augmented phoible would also have to be complete so as not to diminish that strength. "Some amount of hermeneutics" doesn't really sound like systematic errata which can be fixed wholesale.
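To make the override idea concrete, here is a minimal sketch of applying an errata list on top of base inventories. It is only an illustration under stated assumptions: the inventory IDs, the field names (`inventory`, `segment`, `action`) and the `apply_errata` helper are all hypothetical, and this is not CLDF's or PHOIBLE's actual merging mechanism.

```python
from typing import Dict, List, Set


def apply_errata(base: Dict[str, Set[str]], errata: List[dict]) -> Dict[str, Set[str]]:
    """Return a copy of `base` with the errata applied; `base` itself is not modified."""
    merged = {inv: set(segments) for inv, segments in base.items()}
    for entry in errata:
        segments = merged.setdefault(entry["inventory"], set())
        if entry["action"] == "add":
            segments.add(entry["segment"])
        elif entry["action"] == "remove":
            segments.discard(entry["segment"])
        else:
            raise ValueError(f"unknown errata action: {entry['action']!r}")
    return merged


# Hypothetical toy data: inventory IDs and segments are made up for illustration.
base = {"inv-1": {"b", "t", "k"}, "inv-2": {"m", "p"}}
errata = [
    {"inventory": "inv-1", "segment": "m", "action": "add"},
    {"inventory": "inv-1", "segment": "n", "action": "add"},
]

merged = apply_errata(base, errata)
print(sorted(merged["inv-1"]))  # ['b', 'k', 'm', 'n', 't']
```

Keeping the errata as a separate list means the base data stays true to the source, and the override is an explicit, optional layer.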
@camoverride -- if you mean tracking down errors, it depends on what one means by errors. As @drammock notes above, inventories in phoible reflect doculects and in this case, at least for the missing nasals in Ingram's grammar, we would still be true to the original source because it does not list them in the consonants. We have always had the issue of systematic gaps in full database sources, e.g. UPSID purposely contains no tones. But since this inventory is from EURPhon, if it gets updated by their editors to address systematic gaps in certain areal linguistic practice (e.g. some Semitic language descriptions systematically leaving out nasals, etc.) then EURPhon becomes more like UPSID in the sense that multiple doculects may be used for a single inventory and some typologicalization may occur. I think we will need to be clearer about such cases in our documentation moving forward, especially if some source editors identify systematic gaps and fill them without attributing multiple doculects.
Yes, PHOIBLE already being the second-level aggregator makes things trickier. And since one of the big advantages of PHOIBLE is machine-readable data, it would be nice if documentation about systematic gaps, along the lines of the "no tones in UPSID", were machine-readable, too. But I don't really have an idea how to do that. There doesn't seem to be established terminology for "complete inventory" or "complete inventory without loans" or "complete inventory without tones" which could serve as the basis for some sort of ontology.
Sounds like an ontology is in order. :)
@bambooforest you're the aggregator, you get to choose the categories :)
@xrotwang sounds good. And you have / will have a place for them in CLDF :)
Ground truth wikipedia vs Phoible Najdi Arabic Inventory page
I also confirmed this by inspecting the appropriate lines in `data/phoible.csv`.
I discovered this with the following SQL query, where I searched for languages lacking phonemes with the `+ nasal` feature (Najdi Arabic will be the last entry returned by this query): ('Najdi Arabic', 0)
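The query itself is not reproduced above, so here is a minimal sketch of that kind of check. It is not the original query: the table name `phoible`, the column names (`LanguageName`, `Phoneme`, `nasal`) and the rows are assumptions made up for illustration, loaded into an in-memory SQLite database rather than read from `data/phoible.csv`.

```python
import sqlite3

# Toy rows standing in for data/phoible.csv. The column layout and the
# rows themselves are assumptions for illustration, not real PHOIBLE data.
rows = [
    ("Najdi Arabic", "b", "-"),
    ("Najdi Arabic", "t", "-"),
    ("Toy Language A", "m", "+"),
    ("Toy Language A", "b", "-"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phoible (LanguageName TEXT, Phoneme TEXT, nasal TEXT)")
conn.executemany("INSERT INTO phoible VALUES (?, ?, ?)", rows)

# Count the phonemes marked '+' for the nasal feature in each language,
# and keep only the languages where that count is zero.
query = """
    SELECT LanguageName,
           SUM(CASE WHEN nasal = '+' THEN 1 ELSE 0 END) AS nasal_count
    FROM phoible
    GROUP BY LanguageName
    HAVING SUM(CASE WHEN nasal = '+' THEN 1 ELSE 0 END) = 0
    ORDER BY LanguageName
"""
no_nasals = conn.execute(query).fetchall()
print(no_nasals)  # [('Najdi Arabic', 0)]
```

Against the real CSV, the same query shape should apply once the file is loaded into a table, provided the actual column names are substituted.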