Data attributes with multiple values / cardinality of attributes #18

mrckzgl · 2024-04-09T12:05:29Z

Hey,

we are trying to use (py)jedAI for dirty ER. A lot of our data has multiple values per attribute. For example, a person may have several name variants, several places they lived in, and several authored works; even different birth dates may be present (as sources differ). It would make a huge qualitative difference if we could tell the jedAI algorithms to make use of that information. Does jedAI support such data?

The documentation uses pandas DataFrames. In the examples there always seems to be exactly one value per attribute (the DataFrame is two-dimensional, i.e. each entity is represented by one row, and its attribute values are written in the columns of that row).

So my question is: are the algorithms inside the jedAI toolkit able to handle attributes with a higher cardinality? If so, how do we feed them such data? Or is there a subset of the algorithms that understands higher-cardinality attributes?

One thing that comes to mind is to use multiple rows per entity: give them the same id and different values. But of course the underlying algorithms would need to be implemented to understand that.

best

Nikoletos-K (Maintainer) · 2024-04-10T07:54:40Z

Hi there,
thanks for your interest in pyJedAI!

If I understand correctly: algorithmically, there is no dedicated implementation for this need. One solution, though, is to flatten your data into two dimensions. Our algorithms process each entity as id -> set(concatenated attribute information), so if you add columns such as id, firstname_1, firstname_2, ... it will certainly work.
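
A minimal sketch of that flattening, in plain Python. It assumes the multi-valued data arrives as (id, attribute, value) triples; the sample records, the `flatten_multivalued` helper, and the choice to pad missing values with empty strings are all hypothetical and only for illustration, not part of pyJedAI.

```python
from collections import defaultdict

# Hypothetical long-format input: one (entity id, attribute, value) triple per row.
records = [
    ("p1", "firstname", "Jon"),
    ("p1", "firstname", "John"),
    ("p1", "birthdate", "1901-02-03"),
    ("p2", "firstname", "Maria"),
    ("p2", "birthdate", "1950-05-01"),
    ("p2", "birthdate", "1950-05-10"),
]


def flatten_multivalued(records):
    """Turn (id, attribute, value) triples into one flat row per entity."""
    # Collect every value of every attribute, grouped by entity id.
    values = defaultdict(lambda: defaultdict(list))
    for entity_id, attribute, value in records:
        values[entity_id][attribute].append(value)

    # Find the longest value list per attribute so all rows share the same columns.
    width = defaultdict(int)
    for attrs in values.values():
        for attribute, vals in attrs.items():
            width[attribute] = max(width[attribute], len(vals))

    # Emit numbered columns (firstname_1, firstname_2, ...), padding with "" (an assumption).
    rows = []
    for entity_id, attrs in values.items():
        row = {"id": entity_id}
        for attribute, n in width.items():
            vals = attrs.get(attribute, [])
            for i in range(n):
                row[f"{attribute}_{i + 1}"] = vals[i] if i < len(vals) else ""
        rows.append(row)
    return rows


rows = flatten_multivalued(records)
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` to get the one-row-per-entity frame the documentation examples use. Since the entity is treated as a set of concatenated attribute information, the numbering of the columns should matter less than having every variant present in the row.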

Cheers,
Konstantinos
