Metadata: Different kinds of missingness #171

Open
dpprdan opened this issue Jan 29, 2020 · 2 comments

Comments


dpprdan commented Jan 29, 2020

Destatis reports different kinds of missingness on regionalstatistik.de (e.g. 0 apparently does not mean zero).

[image]

For example, regionalstatistik.de reports "." for statistic 61511:BAU001 (Veräußerungsfälle von Bauland, i.e. sales of building land), region Flensburg (01001), year 2018.

[image]

api.datengui.de and tabular.genesapi.org report this as 0.

r <- purrr::partial(read.csv, colClasses = "character")
r("https://tabular.genesapi.org/?data=61511:BAU001&time=2018&region=01001&labels=id")
#>   region_id year measure value statistic
#> 1     01001 2018  BAU001     0     61511

Would it be possible to report different kinds of missingness? And is this even desirable, given that one would have to switch from numeric to character values, for example? If not, should there at least be a distinction between NULL and 0?


dpprdan commented Feb 28, 2020

The Genesis SOAP API returns not only a value but also a quality indicator. The SOAP API query that corresponds to the example above is:

https://www.regionalstatistik.de/genesisws/services/ExportService_2010?method=DatenExport&kennung=USERNAME&passwort=PASSWORD&namen=61511KJ001&bereich=Alle&format=csv&werte=true&metadaten=false&zusatz=false&startjahr=2018&endjahr=2018&zeitscheiben=&inhalte=&regionalmerkmal=&regionalschluessel=01001&sachmerkmal=&sachschluessel=&sachmerkmal2=&sachschluessel2=&sachmerkmal3=&sachschluessel3=&stand=&sprache=en

which returns the following quaderDaten:

<quaderDaten>
* Der Benutzer USERNAME der Benutzergruppe USERNAME hat am 28.02.2020 um 16:37:34 diesen Export angestossen.
K;DQ;FACH-SCHL;GHH-ART;GHM-WERTE-JN;GENESIS-VBD;REGIOSTAT;EU-VBD;"mit Werten"
D;61511KJ001;;N;J;N;N
K;DQ-ERH;FACH-SCHL
D;61511
K;DQA;NAME;RHF-BSR;RHF-ACHSE
D;KREISE;1;1
K;DQZ;NAME;ZI-RHF-BSR;ZI-RHF-ACHSE
D;JAHR;2;2
K;DQI;NAME;ME-NAME;DST;TYP;NKM-STELLEN
D;BAU001;Anzahl;GANZ;FALL;0
D;BAU002;1000 qm;GANZ;FALL;0
D;BAU003;Tsd. EUR;GANZ;FALL;0
D;BAU004;EUR;FEST;FALL;2
K;QEI;FACH-SCHL;ZI-WERT;WERT;QUALITAET;GESPERRT;WERT-VERFAELSCHT
D;01001;2018;0;.;;0;0;.;;0;0;.;;0;0.00;.;;0.00
</quaderDaten>

Notice the .s on the last line, which are the quality indicators.
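To make the layout of that last line concrete, here is a minimal Python sketch that splits it into one group per measure. It rests on an assumption read off the export above, not on any Genesis documentation: the four QEI fields (WERT, QUALITAET, GESPERRT, WERT-VERFAELSCHT) repeat once per measure, in the order of the DQI records. The function name `parse_qei_row` is made up for illustration.

```python
# Assumption: after region and year, the QEI row holds one group of
# four fields per measure, in the order the DQI records list the measures.
MEASURES = ["BAU001", "BAU002", "BAU003", "BAU004"]
QEI_FIELDS = ["WERT", "QUALITAET", "GESPERRT", "WERT-VERFAELSCHT"]


def parse_qei_row(line, measures):
    """Split one 'D;...' data row into (region, year, {measure: fields})."""
    parts = line.split(";")
    if parts[0] != "D":
        raise ValueError("expected a data record starting with 'D'")
    region, year, rest = parts[1], parts[2], parts[3:]
    width = len(QEI_FIELDS)
    if len(rest) != width * len(measures):
        raise ValueError("unexpected number of fields in QEI row")
    cells = {}
    for i, measure in enumerate(measures):
        chunk = rest[i * width:(i + 1) * width]
        cells[measure] = dict(zip(QEI_FIELDS, chunk))
    return region, year, cells


row = "D;01001;2018;0;.;;0;0;.;;0;0;.;;0;0.00;.;;0.00"
region, year, cells = parse_qei_row(row, MEASURES)
print(region, year, cells["BAU001"])
```

Under that assumption, BAU001 comes out with WERT "0" and QUALITAET ".", which matches what the web interface shows for Flensburg.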

The potential categories of the quality indicator are available via the SOAP API as well (ZeichenKatalog). Most of these categories represent (different reasons for) missingness.

If the quality indicator is present, the value is reported as 0, not as NULL. This is why it is necessary (IMHO) for datengui.de to return the quality indicator as well (optionally, on request).
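A rough Python sketch of what "value plus optional quality indicator" could look like on the consuming side. Everything here is hypothetical: `to_record`, the `include_quality` switch and the `MISSING_INDICATORS` set are not part of any existing API, and the set contains only the "." seen above. A real list would have to come from the ZeichenKatalog, since not every category necessarily means "missing".

```python
# Hypothetical set of indicators that mean "no value"; only "." is taken
# from the example above, the full list would come from ZeichenKatalog.
MISSING_INDICATORS = {"."}


def to_record(value, quality, include_quality=False):
    """Turn a (WERT, QUALITAET) pair into a record with a numeric value.

    The value becomes None when the indicator marks it as missing, so the
    value column stays numeric and a 0 placeholder is never taken for a real zero.
    """
    is_missing = quality in MISSING_INDICATORS
    record = {"value": None if is_missing else float(value)}
    if include_quality:
        # An empty indicator is reported as None rather than "".
        record["quality"] = quality or None
    return record


print(to_record("0", "."))                        # flagged 0 becomes missing
print(to_record("0", ""))                         # unflagged 0 stays a number
print(to_record("0", ".", include_quality=True))  # indicator returned on request
```

Keeping the indicator in a separate field would avoid the switch from numeric to character values raised in the first comment.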

To make things a bit more ... interesting, some missing values are indeed missing (deliberately) from the SOAP API, but not from the web interface. The following is the same query for another region (AGS 08316):

https://www.regionalstatistik.de/genesisws/services/ExportService_2010?method=DatenExport&kennung=USERNAME&passwort=PASSWORD&namen=61511KJ001&bereich=Alle&format=csv&werte=true&metadaten=false&zusatz=false&startjahr=2018&endjahr=2018&zeitscheiben=&inhalte=&regionalmerkmal=&regionalschluessel=08316&sachmerkmal=&sachschluessel=&sachmerkmal2=&sachschluessel2=&sachmerkmal3=&sachschluessel3=&stand=&sprache=en

which returns

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<DatenExportResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<DatenExportReturn>
<quader xmlns:ns1="daten.methods.webservice.genesis" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" soapenc:arrayType="ns1:Quader[1]" xsi:type="soapenc:Array">
<quader>
<format/>
<name>61511KJ001</name>
<quaderDaten/>
<returnInfo>
<code>61</code>
<inhalt>There are no values available.</inhalt>
<typ>Fehler</typ>
</returnInfo>
<stand/>
<status>Aktualisierte Daten</status>
</quader>
</quader>
<quaderAuswahl>
<bereich>Alle</bereich>
<namen>61511KJ001</namen>
</quaderAuswahl>
<quaderOptionen>
<endjahr>2018</endjahr>
<format>csv</format>
<inhalte/>
<metadaten>false</metadaten>
<regionalMerkmal/>
<regionalSchluessel>08316</regionalSchluessel>
<sachMerkmal/>
<sachMerkmal2/>
<sachMerkmal3/>
<sachSchluessel/>
<sachSchluessel2/>
<sachSchluessel3/>
<sprache>en</sprache>
<stand/>
<startjahr>2018</startjahr>
<werte>true</werte>
<zeitscheiben>0</zeitscheiben>
<zusatz>false</zusatz>
</quaderOptionen>
<returnInfo>
<code>1</code>
<inhalt>
At least one object has reported a warning or error.
</inhalt>
<typ>Information</typ>
</returnInfo>
</DatenExportReturn>
</DatenExportResponse>
</soapenv:Body>
</soapenv:Envelope>

If one omits the 08316 Regionalschlüssel from the query above, it returns values for all available regions but nothing for 08316.

If one specifies a different startjahr, e.g. 2017, it returns data for 2017 but not for 2018.

According to the Team Regionaldatenbank Deutschland this is the intended behaviour.
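Since an empty quaderDaten is apparently intended, a client could treat it as "no rows" instead of failing. The following Python sketch uses only the standard library and a trimmed copy of the response above (attributes and the quaderOptionen block are left out). `read_quader` is a hypothetical helper, and the only thing assumed about code 61 is what the response itself says: there are no values available.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above.
SAMPLE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body><DatenExportResponse><DatenExportReturn>
<quader><quader>
<name>61511KJ001</name>
<quaderDaten/>
<returnInfo><code>61</code><inhalt>There are no values available.</inhalt><typ>Fehler</typ></returnInfo>
</quader></quader>
</DatenExportReturn></DatenExportResponse></soapenv:Body>
</soapenv:Envelope>"""


def read_quader(xml_text):
    """Return one dict per cube; 'data' is None when quaderDaten is empty."""
    root = ET.fromstring(xml_text)
    cubes = []
    for quader in root.iter("quader"):
        name = quader.findtext("name")
        if name is None:
            # The outer <quader> is only the array wrapper.
            continue
        data = (quader.findtext("quaderDaten") or "").strip()
        cubes.append({
            "name": name,
            "data": data or None,
            "code": quader.findtext("returnInfo/code"),
            "message": quader.findtext("returnInfo/inhalt"),
        })
    return cubes


result = read_quader(SAMPLE)
for cube in result:
    if cube["data"] is None:
        print(f"{cube['name']}: no data (code {cube['code']}: {cube['message']})")
```

A caller can then return an empty table for such a cube and pass the message along, rather than raising an error.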

The web interface ("Abruftabellen"), however, returns the following, i.e. it does not omit the data for 2018:

[image]

The latter, i.e. the SOAP API not returning any values for the query above, probably does not have any real consequences for (api.)datengui.de. At the moment it leads to an Internal Server Error on tabular.genesapi.org though:

https://tabular.genesapi.org/?data=61511:BAU001&time=2018&region=08316

So from my point of view, both api.datengui.de and tabular.genesapi.org should return the quality indicator (on request), and tabular.genesapi.org should be able to handle empty responses instead of throwing an error.

sjockers (Member) commented

Hi Daniel, thanks for the detailed description of this problem and sorry for the extremely slow reply. We will definitely support quality indicators and other "footnotes". We are currently in the process of figuring out how we will represent this in the API. We will let you know once we know more.

sjockers added the API label Feb 29, 2020
sjockers changed the title from "Different kinds of missingness" to "Metadata: Different kinds of missingness" Sep 22, 2020