json_normalize does not normalize subrecords properly if any subrecords values are NoneType #20030
Labels
Bug
IO JSON
read_json, to_json, json_normalize
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Comments
I would personally prefer the second option. Feel free to post the PR!
@gfyoung Were you referring to opening this as two separate issues?
@aerymilts : I was referring to the second output being preferable to the first.
aerymilts added a commit to aerymilts/pandas that referenced this issue on Mar 18, 2018
…0030) TST: additional coverage for the test cases from (pandas-dev#20030) DOC: added changes to whatsnew/v0.23.0.txt (pandas-dev#20030)
4 tasks
aerymilts added a commit to aerymilts/pandas that referenced this issue on Mar 20, 2018
jreback pushed a commit that referenced this issue on Mar 20, 2018
nehiljain added a commit to nehiljain/pandas that referenced this issue on Mar 21, 2018
…ame_describe * upstream/master: (158 commits) Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431) BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399) BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412) DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264) DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155) DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402) Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408) DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392) MAINT: Remove weird pd file DOC: update the Index.isin docstring (pandas-dev#20249) BUG: Handle all-NA blocks in concat (pandas-dev#20382) DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379) BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067) DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336) DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265) DOC: update the api.types.is_number docstring (pandas-dev#20196) Fix linter (pandas-dev#20389) DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142) DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181) DOC: update the window.Rolling.min docstring (pandas-dev#20263) ...
dworvos pushed a commit to dworvos/pandas that referenced this issue on Apr 2, 2018
This was referenced May 24, 2018
Code Sample, a copy-pastable example if possible
Output 1
Output 2
Problem description
I expected json_normalize to take into account the presence of NoneType values in the dictionaries. It currently does not, which leads to two separate issues (if I should open these as two separate issues, let me know).
I have already written a fix that solves this issue - if anyone else can validate that this is not working as intended, I can set up a PR.
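For illustration only, here is a minimal pure-Python sketch of the behavior I have in mind for the second case: a flattener that drops a key whose value is None instead of carrying it into the output. The helper name flatten_record is hypothetical, and this is not the pandas implementation of nested_to_record. The "." separator is assumed to match the json_normalize default.

```python
def flatten_record(record, prefix="", sep="."):
    """Flatten nested dicts into one level, skipping None values.

    Hypothetical helper for illustration; not part of pandas.
    """
    flat = {}
    for key, value in record.items():
        # Build the dotted column name, e.g. "k.alpha".
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into subrecords.
            flat.update(flatten_record(value, prefix=name, sep=sep))
        elif value is not None:
            # Keep ordinary values; a None value is dropped entirely.
            flat[name] = value
    return flat


# Same shape as the data described in this issue: one subrecord is None.
records = [
    {"k": {"alpha": "foo", "beta": "bar"}},
    {"k": None},
]
flattened = [flatten_record(r) for r in records]
```

With this approach the second record contributes no "k" entry at all, so no extra column would be created for it.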
Output 1
Does not unnest the JSON after encountering a NoneType at the first instance of a subrecord; see line 192 of pandas/io/json/normalize.py.
Output 2
Keeps the None value when encountered, e.g. [{k: {'alpha': 'foo', 'beta': 'bar'}}, {k: None}]; see the nested_to_record function. This creates an additional column of nans that would not occur if that particular key were removed.
Expected Output
Output 1
Output 2
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-104-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None