Najdi Arabic phoneme inventory is missing items #332

camoverride · 2021-03-28T00:32:56Z

Ground truth: Wikipedia vs. PHOIBLE's Najdi Arabic inventory page.

I also confirmed this by inspecting the relevant lines in data/phoible.csv.

I discovered this with the following SQL query, which searches for languages lacking phonemes with the [+nasal] feature (Najdi Arabic will be the last entry returned by this query):

```sql
SELECT x.LanguageName, SUM(x.nasal) AS num_nasals
FROM (SELECT InventoryID, LanguageName,
             CASE WHEN nasal = '-' THEN 0 ELSE 1 END AS nasal
      FROM phoible) AS x
GROUP BY InventoryID, x.LanguageName
ORDER BY num_nasals ASC
LIMIT 16
```

```
('Najdi Arabic', 0)
```
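For anyone wanting to reproduce this locally, here is a minimal Python sketch (an assumption on my part, not PHOIBLE tooling) that loads a PHOIBLE-style CSV such as data/phoible.csv into an in-memory SQLite table and runs the query. It keeps the query's column names (InventoryID, LanguageName, nasal) but swaps in a stricter nasal test: only values containing "+" count, whereas `CASE WHEN nasal = '-' THEN 0 ELSE 1` also counts values such as "0" or an empty/missing value as nasal. The function name `inventories_by_nasal_count` is hypothetical.

```python
import csv
import io
import sqlite3

# Variant of the query above. Only feature values containing "+" (e.g. "+" or a
# contour value such as "+,-") count as nasal; "-", "0" and NULL count as 0.
QUERY = """
SELECT x.LanguageName, SUM(x.nasal) AS num_nasals
FROM (SELECT InventoryID, LanguageName,
             CASE WHEN nasal LIKE '%+%' THEN 1 ELSE 0 END AS nasal
      FROM phoible) AS x
GROUP BY x.InventoryID, x.LanguageName
ORDER BY num_nasals ASC
LIMIT ?
"""


def inventories_by_nasal_count(csv_text, limit=16):
    """Load PHOIBLE-style CSV text into SQLite and return (LanguageName, count) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Only the three columns the query needs are copied into the table;
        # any extra CSV columns are ignored.
        conn.execute(
            "CREATE TABLE phoible (InventoryID TEXT, LanguageName TEXT, nasal TEXT)"
        )
        reader = csv.DictReader(io.StringIO(csv_text))
        conn.executemany(
            "INSERT INTO phoible VALUES (?, ?, ?)",
            ((row["InventoryID"], row["LanguageName"], row["nasal"]) for row in reader),
        )
        return conn.execute(QUERY, (limit,)).fetchall()
    finally:
        conn.close()
```

Usage, assuming the CSV sits at data/phoible.csv: `inventories_by_nasal_count(open("data/phoible.csv", encoding="utf-8").read())`. Note that rows tied at the same count come back in no guaranteed order, so Najdi Arabic's position among the zero-nasal inventories is not fixed.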


bambooforest · 2021-03-28T07:49:20Z

@camoverride -- thanks for pointing this out and sending some reproducible code. I'll look into it.

drammock · 2021-03-28T19:39:07Z

@camoverride I'm chiming in here to provide a clarification: "ground truth" in this case is not Wikipedia, but rather Ingram 1994 (https://phoible.org/sources/67053). Phoneme inventories in PHOIBLE are not meant to represent a language but rather a particular instance of language documentation. It certainly can happen that we make a mistake in converting the analysis in Ingram 1994 into a PHOIBLE entry, but if the "mistake" here is that Ingram disagrees with other scholars about Najdi Arabic's phonology, that is a disagreement that we're interested in preserving.

bambooforest · 2021-03-28T20:11:34Z

This issue is a bit more complicated than that, I think. I looked into the grammar by Ingram and indeed it does not list nasals among its consonants, but you nevertheless find them in the word forms in the grammar.

After some discussion with @macleginn (this particular inventory comes from an earlier version of EURPhon, https://eurphon.info/languages/html?lang_id=135), I learned that he has encountered systematic omissions of nasals (as well as liquids and rhotics) in descriptions of Arabic dialects.

He put it rather succinctly to me: "There are doculects that demand some amount of hermeneutics, unfortunately."

A point for discussion.

camoverride · 2021-03-29T02:22:00Z

Gotcha, Ingram as ground truth definitely makes sense - and it's reasonable not to want to play around with the source material too much.

However, have you all developed a general strategy for tracking known "errors" in Ingram? Maybe such a list could act as a secondary source to augment PHOIBLE?

xrotwang · 2021-03-29T05:51:32Z

Just a somewhat technical note regarding a "secondary source to augment phoible": I think that's a good idea, some sort of curated errata. And that's exactly one of the use cases we had in mind when designing CLDF to allow for easy merging: such an errata list could be distributed in the same overall format as PHOIBLE and then be used to transparently override PHOIBLE data in specified cases.

However, in this particular case, I'd be a bit hesitant. I think the strength of PHOIBLE lies in its being principled and complete. So for any use case that looks at all of PHOIBLE, an augmented PHOIBLE would also have to be complete so as not to diminish that strength. "Some amount of hermeneutics" doesn't really sound like systematic errata that can be fixed wholesale.
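The errata idea might look something like this: a separate list of corrections applied on top of the inventories at load time, leaving the source data untouched. This is purely illustrative; the `Erratum` record, its "add"/"remove" actions, the inventory IDs and `apply_errata` are all hypothetical and are not part of PHOIBLE or the CLDF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Erratum:
    """One hypothetical errata entry (not a PHOIBLE or CLDF structure)."""
    inventory_id: str
    phoneme: str
    action: str  # "add" or "remove"
    note: str = ""


def apply_errata(inventories, errata):
    """Return a patched copy of `inventories` (InventoryID -> set of phonemes).

    The input mapping is not modified, so the unpatched doculect data stays available.
    """
    patched = {inv_id: set(phonemes) for inv_id, phonemes in inventories.items()}
    for e in errata:
        if e.inventory_id not in patched:
            raise KeyError(f"erratum refers to unknown inventory {e.inventory_id!r}")
        if e.action == "add":
            patched[e.inventory_id].add(e.phoneme)
        elif e.action == "remove":
            patched[e.inventory_id].discard(e.phoneme)
        else:
            raise ValueError(f"unknown action {e.action!r}")
    return patched
```

Keeping the errata separate means that analyses over all of PHOIBLE can still be run on the unmodified data, which speaks to the completeness concern raised in this comment.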

bambooforest · 2021-03-29T08:19:31Z

@camoverride -- as for tracking errors, it depends on what one means by "errors". As @drammock notes above, inventories in PHOIBLE reflect doculects, and in this case, at least regarding the missing nasals in Ingram's grammar, we are still true to the original source, because it does not list nasals among its consonants.

We have always had the issue of systematic gaps in full database sources, e.g. UPSID purposely contains no tones. But since this inventory is from EURPhon, if its editors update it to address systematic gaps stemming from certain areal descriptive practices (e.g. some Semitic language descriptions systematically leaving out nasals), then EURPhon becomes more like UPSID, in the sense that multiple doculects may be used for a single inventory and some typologicalization may occur.

I think we will need to be clearer about such cases in our documentation moving forward, especially if some source editors identify systematic gaps and fill them without attributing multiple doculects.

xrotwang · 2021-03-29T08:29:49Z

Yes, PHOIBLE already being a second-level aggregator makes things trickier. And since one of the big advantages of PHOIBLE is its machine-readable data, it would be nice if documentation about systematic gaps, along the lines of "no tones in UPSID", were machine-readable, too. But I don't really have an idea how to do that. There doesn't seem to be established terminology for "complete inventory", "complete inventory without loans", or "complete inventory without tones" that could serve as the basis for some sort of ontology.
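One possible shape for such machine-readable gap documentation is a set of flags per inventory recording what its source systematically leaves out, so that a query (such as the nasal count in this issue) can skip inventories that cannot answer it. This is only a sketch: the `Omits` categories, the `usable_for` helper and the inventory IDs are placeholders, since no established ontology exists yet.

```python
from enum import Flag, auto


class Omits(Flag):
    """Hypothetical labels for what a source systematically leaves out."""
    NOTHING = 0
    TONES = auto()
    LOANS = auto()
    NASALS = auto()


def usable_for(gaps, inventory_id, needed):
    """True unless the inventory is annotated as omitting any category in `needed`.

    `gaps` maps InventoryID -> Omits; unannotated inventories count as complete.
    """
    return not (gaps.get(inventory_id, Omits.NOTHING) & needed)
```

For example, with `gaps = {"upsid-1": Omits.TONES}`, an inventory from that source would be excluded from a tone survey but still used when counting nasals.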

bambooforest · 2021-03-29T08:32:56Z

Sounds like an ontology is in order. :)

xrotwang · 2021-03-29T08:35:00Z

@bambooforest you're the aggregator, you get to choose the categories :)

bambooforest · 2021-03-29T08:36:59Z

@xrotwang sounds good. And you have / will have a place for them in CLDF :)

bambooforest self-assigned this Mar 28, 2021

bambooforest mentioned this issue Mar 29, 2021

Develop categories for known source gaps #333

Open
