Hi there,
If I understand correctly, there is no algorithm that implements this need exactly. One workaround, however, is to transform your data into two dimensions. Keep in mind that our algorithms process each entity as id -> set(concatenated attribute info). Hence if you add for example
Cheers,
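A minimal pandas sketch of the suggested transformation. It assumes, as the reply describes, that each entity is processed as its id plus the concatenated text of its attributes, so every multi-valued cell can be joined into one string before the frame is loaded. The column names, sample records and the helper `flatten_multivalued` are hypothetical; this is plain pandas preprocessing, not a pyJedAI API.

```python
import pandas as pd

# Hypothetical dirty-ER records: some cells hold a list of values
# (name variants, birth dates reported by different sources).
people = pd.DataFrame({
    "id": [0, 1],
    "name": [["Maria Schmidt", "M. Schmidt"], ["Schmidt, Maria"]],
    "birth_date": [["1901-05-02", "1901-05-12"], ["1901-05-12"]],
})


def flatten_multivalued(df, id_col="id", sep=" "):
    """Return a copy where every list/tuple/set cell is joined into one string."""
    def join_values(cell):
        if isinstance(cell, (list, tuple, set)):
            # dict.fromkeys drops exact duplicates while keeping first-seen order
            return sep.join(dict.fromkeys(str(v) for v in cell))
        return cell  # single-valued cells are left untouched

    out = df.copy()
    for col in out.columns:
        if col != id_col:
            out[col] = out[col].apply(join_values)
    return out


flat = flatten_multivalued(people)
print(flat.to_string(index=False))
```

The trade-off of this design is that the joined string no longer records which value came from which source: if the downstream processing is token-based, two conflicting birth dates simply become two separate tokens of the same attribute.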
Hey,
We are trying to use (py)jedAI for dirty ER. A lot of our data has multiple values per attribute: e.g., a person has different name variants, different places they lived in, different works they authored, and even different birth dates (as sources differ). It would make a huge qualitative difference if we could tell the jedAI algorithms to make use of that information. I wonder whether jedAI supports such data?
The documentation uses pandas DataFrames. In the examples there always seems to be exactly one value per attribute, since the DataFrame is two-dimensional: each entity is represented by one row, and its attribute values are written in the columns of that row.
So my question is: are the algorithms inside the jedAI toolkit able to handle attributes with a higher cardinality (several values per attribute)? If so, how should such data be fed to them? Or is there a subset of the algorithms that handles higher-cardinality attributes?
One idea that comes to mind is to use multiple rows per entity, i.e., give them the same id and different values, but of course the underlying algorithms would need to be implemented to understand that.
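If the data already comes in this long format (one row per entity/value pair, with the id repeated), a minimal pandas sketch can collapse it into one row per id, so the algorithms still see an ordinary two-dimensional frame. The sample data and the helper `collapse_rows` are hypothetical, and joining values with a space is an assumption about how the concatenated text should look; this is plain pandas, not a pyJedAI feature.

```python
import pandas as pd

# Hypothetical long-format input: the same id is repeated once per
# name variant / place, as proposed in the question.
long_df = pd.DataFrame({
    "id": [7, 7, 7, 8],
    "name": ["Maria Schmidt", "M. Schmidt", "Maria Schmidt", "Hans Weber"],
    "place": ["Berlin", "Leipzig", None, "Hamburg"],
})


def collapse_rows(df, id_col="id", sep=" "):
    """Aggregate all rows sharing an id into a single row of joined strings."""
    def join_unique(values):
        # Skip missing values, drop exact duplicates, keep first-seen order.
        return sep.join(dict.fromkeys(str(v) for v in values if pd.notna(v)))

    # agg applies join_unique to each non-key column within each id group.
    return df.groupby(id_col, sort=False).agg(join_unique).reset_index()


wide_df = collapse_rows(long_df)
print(wide_df.to_string(index=False))
```

Collapsing in preprocessing keeps the repeated-id idea usable without changes to the underlying algorithms, at the cost of flattening the individual values into one text per attribute.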
Best,