Split nested dict columns #96
Hey, @cuducos, I am back.
Would you like to suggest any other?
I think flatten is more commonly used, but that might just be my impression :)
No problem. I believe your impression has much more context than mine... By the way, I was reviewing my proposal and realized that another approach would be a function that flattens the columns according to the class of its input, without having to fetch the data again. What do you think?
@cuducos, before I start implementing the
What is your opinion?
My suggestion would be slightly different:
Great. I will work on that.
I'll leave this just as a provocation:
def add(x: int, y: int):
    return x + y

def add(x: str, y: str):
    return int(x) + int(y)

print(add(40, 2))
print(add("40", "2"))
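Python does not overload functions on type hints: the second def add simply replaces the first, so both calls above go through int(x) + int(y). If the goal is a function that behaves differently according to the class of its input (for example, a list of dicts versus a DataFrame), the standard library's functools.singledispatch dispatches on the type of the first argument. A minimal sketch reusing the same toy add:

```python
from functools import singledispatch


@singledispatch
def add(x, y):
    # Fallback used when no implementation is registered for type(x)
    raise TypeError(f"unsupported type: {type(x).__name__}")


@add.register
def _(x: int, y: int) -> int:
    # Selected when the first argument is an int
    return x + y


@add.register
def _(x: str, y: str) -> int:
    # Selected when the first argument is a str
    return int(x) + int(y)


print(add(40, 2))      # 42
print(add("40", "2"))  # 42
```

Only the first argument's type drives the choice; y is never inspected.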
@cuducos, before merging I decided to try the solution and run some experiments, and I came up with two concerns. One of them may be an actual issue.
I am not sure whether this could (or is expected to) happen with other nested columns. I will check it with the tech lead of Fogo Cruzado.
What do you think? Here is the code snippet with both points:
I am dying to see a test with this in
It makes sense, but be aware that I am not a heavy user of Pandas and GeoPandas… maybe flattening could be optional?
Don't worry, it will be done ASAP.
But this is not a
from crossfire import occurrences
from crossfire.clients.occurrences import flatten
occs = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
initial_date='2018-04-01')#, format='geodf')
flattened_occs = flatten(occs)
# Traceback (most recent call last):
# File "/opt/pycharm-community-2022.3.2/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
# coro = func()
# File "<input>", line 1, in <module>
# File "/home/felipe/repos/crossfire/crossfire/clients/occurrences.py", line 231, in flatten
# data = _flatten_list(data, nested_columns)
# File "/home/felipe/repos/crossfire/crossfire/clients/occurrences.py", line 210, in _flatten_list
# item.update({f"{key}_{k}": v for k, v in item.get(key).items()})
# AttributeError: 'NoneType' object has no attribute 'items'
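The traceback comes from calling .items() on None: item.get(key) returns None when a record has no value for one of the nested columns. A minimal None-safe sketch, assuming the helper receives a list of dicts plus the names of the nested columns; flatten_records is a hypothetical name, not crossfire's actual code, and unlike the snippet in the traceback it also drops the original nested key:

```python
def flatten_records(data, nested_columns):
    """Return new dicts where each nested column becomes key_subkey entries.

    A missing or None nested value yields no expanded entries instead of
    raising AttributeError.
    """
    flattened = []
    for item in data:
        row = dict(item)  # shallow copy so the caller's records are not mutated
        for key in nested_columns:
            nested = row.pop(key, None) or {}  # None (or absent) becomes an empty dict
            row.update({f"{key}_{k}": v for k, v in nested.items()})
        flattened.append(row)
    return flattened
```

If the result is passed to pandas.DataFrame, rows without a given expanded key simply show NaN in that column.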
Yes. What about the second point?
That is precisely why I said that.
@cuducos, I think I didn't get your point when you mentioned that. Could you explain? Just to confirm my understanding (and what I think I have learned):
Where is it documented? I think I made wrong assumptions; I tried to check the docs and found nothing.
Ouch. It is not documented yet.
@cuducos, documentation added. Please let me know if it is well written or if I missed something.
OK. I think this works but is suboptimal:
occurrences('813ca36b-91e3-4a18-b408-60b27a1942ef', format='df')  # gives nested fields
occurrences('813ca36b-91e3-4a18-b408-60b27a1942ef', format='df', flat=True)  # gives flattened data
occurrences('813ca36b-91e3-4a18-b408-60b27a1942ef', format='df', flat=False)  # gives nested fields
I don't see a reason for us to expose the
But if someone, for some reason, gets the data without enabling
Also, using its functionality in
The way I said before.
From the Zen of Python:
OK, I have just added a test with a list of dicts with and without nested columns. The test is not passing yet, because I would like to see how you would guide me in handling this situation. Let's discuss the potential solution in PR #103, right?
Makes sense. But I am not convinced that the way you suggested is THE way of doing it.
My point is that the function
So, a user unfamiliar with the data, API, and package might do what you said. However, another unfamiliar user might try to use it with
So, my point is:
But, still… you don't have to agree with me, and you don't have to design the API the way I would :)
That is a really good point. The fact is that I thought about
from crossfire import cities
c = cities(format='df')
c.state.head()
#0 {'id': 'b112ffbe-17b3-4ad0-8f2a-2038745d1d14',...
#1 {'id': 'b112ffbe-17b3-4ad0-8f2a-2038745d1d14',...
#2 {'id': 'b112ffbe-17b3-4ad0-8f2a-2038745d1d14',...
#3 {'id': 'b112ffbe-17b3-4ad0-8f2a-2038745d1d14',...
#4 {'id': 'b112ffbe-17b3-4ad0-8f2a-2038745d1d14',...
#Name: state, dtype: object
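For a single dict column such as state, one way to get separate columns without touching the fetching code is to run pandas.json_normalize on that column's values and join the result back. A sketch on a made-up frame: the id value is copied from the output above, while the name key inside each state dict is an assumption, since that output is truncated:

```python
import pandas as pd

# Made-up sample shaped like cities(format='df'): "state" holds one dict per row
cities = pd.DataFrame(
    {
        "name": ["City A", "City B"],
        "state": [
            {"id": "b112ffbe-17b3-4ad0-8f2a-2038745d1d14", "name": "State X"},
            {"id": "b112ffbe-17b3-4ad0-8f2a-2038745d1d14", "name": "State X"},
        ],
    }
)

# One column per dict key, prefixed so "name" does not clash with the city name
state = pd.json_normalize(cities["state"].tolist()).add_prefix("state_")
state.index = cities.index  # align rows with the original frame before joining

flat = cities.drop(columns="state").join(state)
print(flat.columns.tolist())  # ['name', 'state_id', 'state_name']
```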
Thanks, @cuducos, I really appreciate your comments. I always feel my code is somehow messy, and I think this is not only because of the code structure. Perhaps it is related to [always] considering those worst-case scenarios.
Sure. But to make a good decision I need to confirm that my point of view is not wrong, or whether there is a better approach. Thanks for being patient. I will work on your comments in PR #103.
I was reading about the occurrences endpoint and playing around with the package when I realized that some columns are returned as Python dictionaries with additional data related to the violence case. For example, after getting some data for Rio de Janeiro:
If I were interested in analyzing the main reasons related to the violence cases, I would need to "unpack" the contextInfo column and then the mainReason, transforming it into a pandas.Series, joining it with the "original" DataFrame, dropping the flattened column, and renaming it to mainReason to keep the meaning of the column:
So I thought that perhaps a method that does that "automagically" would be interesting.
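The manual route described above could look roughly like the following pandas sketch. The sample data is invented: only the contextInfo and mainReason names come from this issue, and the id/name keys inside mainReason are assumptions about the payload's shape.

```python
import pandas as pd

# Invented sample: contextInfo is a dict whose mainReason is itself a dict
occs = pd.DataFrame(
    {
        "id": ["occ-1", "occ-2"],
        "contextInfo": [
            {"mainReason": {"id": "r-1", "name": "Reason A"}},
            {"mainReason": {"id": "r-2", "name": "Reason B"}},
        ],
    }
)

# 1. Unpack contextInfo: one column per key, keeping the original index
context = pd.DataFrame(occs["contextInfo"].tolist(), index=occs.index)

# 2. Unpack mainReason the same way and keep only its name, as a Series
main_reason = pd.DataFrame(context["mainReason"].tolist(), index=occs.index)["name"]

# 3. Join it back, drop the nested column, and rename to keep the meaning
result = (
    occs.join(main_reason)
    .drop(columns="contextInfo")
    .rename(columns={"name": "mainReason"})
)
print(result)
```

Repeating these steps for every nested column is the chore the proposed method would hide.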
Beyond contextInfo, transport, victims, and animalVictims could be candidates for this process. So I thought of something like occs.flatten() to have all columns with dicts transformed into separate columns.
Also, this potential method could be applied to a specific column, like occs.flatten("contextInfo") to flatten only the contextInfo column.
And also something like occs.flatten("contextInfo", flatten_all=True) to have all nested dicts inside contextInfo flattened into different columns.
What is your opinion, @cuducos?
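To make the three proposed behaviours concrete, here is a hypothetical flatten written as a plain function over a list of dicts rather than as the proposed DataFrame method occs.flatten(). The column and flatten_all parameters mirror the proposal; the sample record, including the contents of transport, is invented:

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def _expand(prefix: str, value: Dict[str, Any], recursive: bool) -> Record:
    """Turn a dict stored under `prefix` into prefix_key entries."""
    expanded: Record = {}
    for key, inner in value.items():
        name = f"{prefix}_{key}"
        if recursive and isinstance(inner, dict):
            expanded.update(_expand(name, inner, recursive))  # go one level deeper
        else:
            expanded[name] = inner  # kept as-is, even if it is still a dict
    return expanded


def flatten(
    records: List[Record], column: Optional[str] = None, flatten_all: bool = False
) -> List[Record]:
    """Expand every dict column (or only `column`) into separate keys.

    With flatten_all=True, dicts nested inside those columns are expanded too.
    """
    result: List[Record] = []
    for record in records:
        row: Record = {}
        for key, value in record.items():
            if isinstance(value, dict) and (column is None or key == column):
                row.update(_expand(key, value, flatten_all))
            else:
                row[key] = value
        result.append(row)
    return result


occs = [
    {
        "id": "occ-1",
        "contextInfo": {"mainReason": {"name": "Reason A"}},
        "transport": {"type": "bus"},
    }
]
print(flatten(occs, "contextInfo", flatten_all=True))
# [{'id': 'occ-1', 'contextInfo_mainReason_name': 'Reason A', 'transport': {'type': 'bus'}}]
```

Without flatten_all, a dict nested one level down (mainReason here) stays in a single contextInfo_mainReason column holding a dict.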