Bugfix 412 replace numeric value in combustion #413

Merged
merged 4 commits into develop from bugfix-412-replace-numeric-value-in-combustion
Jan 28, 2023


Collaborator

@nesnoj nesnoj commented Jan 20, 2023

Fixes #412

Here's a start. However, adding the column name as in #392 does not work, because some values appear to be comma-separated lists of IDs stored as strings:

Table 'combustion_extended' is filled with data 'einheitenverbrennung' from the bulk download.
File 'EinheitenVerbrennung.xml' is parsed.
Data is cleansed.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/mastr.py", line 227, in download
    write_mastr_xml_to_database(
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/xml_download/utils_write_to_database.py", line 58, in write_mastr_xml_to_database
    df = cleanse_bulk_data(df, zipped_xml_file_path)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/xml_download/utils_cleansing_bulk.py", line 14, in cleanse_bulk_data
    df = replace_mastr_katalogeintraege(
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/open-MaStR/open_mastr/xml_download/utils_cleansing_bulk.py", line 41, in replace_mastr_katalogeintraege
    df[column_name].astype("float").astype("Int64").map(katalogwerte)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/home/marry_poppins/bnetza_mastr/bnetza_open_mastr_2023-01-20/venv/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: '2442, 2442'

Example numeric data from CSV:

units.WeitereBrennstoffe.unique()
array([nan, '2473', '2469', '2445', '2467', '2466', '2442, 2442', '2421',
       '2479', '2448', '2436', '2445, 2469, 2473', '2424', '2437', 2473.0,
       2416.0, 2469.0, 2442.0, 2467.0, 2445.0, 2472.0, 2448.0, 2477.0,
       2436.0, 2419.0, 2457.0, '2467, 2414', '2414', '2442', '2447',
       ...
       '2469, 2473, 2474, 2477', 3035.0, 2468.0], dtype=object)

Is there already a function that handles comma-separated lists inside the string values? I guess other columns show similar patterns. @FlorianK13
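
One possible way to handle the mixed values shown above (NaN, floats, single-ID strings, and comma-separated ID strings) is sketched below. This is only an illustration, not the fix that was merged: `replace_catalog_ids` is a hypothetical helper, the catalog entries are made up, and keeping unknown IDs as text (rather than turning them into NaN) is an assumption.

```python
import math


def replace_catalog_ids(value, katalogwerte):
    """Map one raw cell to catalog names (hypothetical helper)."""
    # Missing values (None or float NaN) pass through unchanged.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return value
    # Split on commas; a float such as 2473.0 yields the single part "2473.0".
    parts = [p.strip() for p in str(value).split(",")]
    names = []
    for part in parts:
        if not part:
            continue
        key = int(float(part))  # "2473", "2473.0" and 2473.0 all give 2473
        # Assumption: unknown IDs are kept as text instead of being dropped.
        names.append(katalogwerte.get(key, str(key)))
    return ", ".join(names)


# Made-up catalog entries for illustration only; not real MaStR values.
katalogwerte = {2442: "fuel A", 2473: "fuel B"}
```

Applied to a column, this could replace the failing `astype` chain with something like `df[column_name].map(lambda v: replace_catalog_ids(v, katalogwerte))`. Duplicates such as '2442, 2442' are kept as they are; deduplicating them would be a separate decision.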

@deniztepe deniztepe requested a review from FlorianK13 January 27, 2023 14:23
@deniztepe deniztepe marked this pull request as ready for review January 27, 2023 14:23
@deniztepe deniztepe merged commit 0a3d86e into develop Jan 28, 2023
@deniztepe deniztepe deleted the bugfix-412-replace-numeric-value-in-combustion branch January 28, 2023 10:02
@nesnoj
Collaborator Author

nesnoj commented Jan 28, 2023

Thx!

Development

Successfully merging this pull request may close these issues.

Replace numeric value in column combustion.WeitereBrennstoffe with name
3 participants