Add set key to data.table operations. #90

Open
ricardonovaes opened this issue Jul 25, 2024 · 0 comments

setkey makes joins much faster in data.table. The code in the join benchmark does not set keys properly, which can affect the main results.

It is also important in other kinds of data manipulation, such as deduplication. For instance:
setkey(DT, key)
unique(DT, by = 'key')

is much faster than the same call on a table with no key set:
unique(DT, by = 'key')

For 100GB+ datasets, this can go from 15 minutes to seconds.

Joins work the same way:

setkey(DTA, key)
setkey(DTB, key)

DTA[DTB, on = .(key)]
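
A minimal sketch of how the difference could be measured, assuming small synthetic tables. The column names `id`, `x`, `y` and the table sizes are made up for illustration and are not taken from the benchmark (the join column is called `id` here because `key` is also an argument name of `data.table()`). The cost of `setkey` itself is timed separately, since sorting is part of the total work:

```r
library(data.table)

set.seed(1)
n <- 1e6

# Hypothetical test tables; ids are unique in both.
DTA <- data.table(id = sample(n), x = runif(n))
DTB <- data.table(id = sample(n, n / 10), y = runif(n / 10))

# Join without keys, on copies so the originals stay unkeyed.
A0 <- copy(DTA); B0 <- copy(DTB)
t_unkeyed <- system.time(r0 <- A0[B0, on = .(id)])

# Set keys first (this sorts both tables by reference), then join.
A1 <- copy(DTA); B1 <- copy(DTB)
t_setkey <- system.time({ setkey(A1, id); setkey(B1, id) })
t_keyed  <- system.time(r1 <- A1[B1, on = .(id)])

# Row order differs between the two results, so sort both before comparing.
same <- isTRUE(all.equal(setkey(copy(r0), id), setkey(copy(r1), id)))

print(rbind(unkeyed = t_unkeyed, setkey = t_setkey, keyed = t_keyed))
print(same)
```

How large the gap is will depend on the data size, the key cardinality and the data.table version, so this only shows how to compare the two paths, not what the result will be.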

I hope this helps make the benchmark better!
