
Speed comparison with DataPusher #25

Closed
davidread opened this issue Nov 10, 2017 · 0 comments


davidread commented Nov 10, 2017

Summary

Express Loader (ckanext-xloader) loads the data 11.4 times faster than DataPusher.
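The headline ratio follows directly from the two reported totals (2350s for DataPusher, 206s for ckanext-xloader). The short Python sketch below reproduces that arithmetic. The constant names are illustrative only. Note that the four ckanext-xloader steps, as listed, add up to 216s rather than the reported 206s. Using 216s would give a ratio of roughly 10.9.

```python
# Reported end-to-end load times in seconds (from the stats below).
XLOADER_TOTAL_S = 206
DATAPUSHER_TOTAL_S = 2350

# Per-step ckanext-xloader timings in seconds, as listed below.
XLOADER_STEPS_S = {
    "retrieve file": 12,
    "convert to UTF-8": 23,
    "COPY into PostgreSQL": 21,
    "create search index": 160,
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate run is than the baseline run."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    # Ratio from the reported totals.
    print(f"{speedup(DATAPUSHER_TOTAL_S, XLOADER_TOTAL_S):.1f}")  # → 11.4
    # Sum of the individually listed xloader steps.
    steps_total = sum(XLOADER_STEPS_S.values())
    print(steps_total)  # → 216
    # Ratio if the step sum is used instead of the reported total.
    print(f"{speedup(DATAPUSHER_TOTAL_S, steps_total):.1f}")  # → 10.9
```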

Test conditions:

  • Load of the Boston 311 dataset (1,033,882 rows, 475 MB)
  • Run locally on a MacBook Pro (i7, 2013 model)

Stats with ckanext-xloader

12s - retrieve the file (over HTTP) from the local FileStore
23s - convert to UTF-8
21s - copy the CSV file into a PostgreSQL table (one COPY command)
160s - create the search index

Total: 206 seconds

At this point the full data is made available to the user.
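The ckanext-xloader path above amounts to a streaming re-encode followed by a single bulk COPY. The following is a minimal sketch of those two steps under stated assumptions, not ckanext-xloader's actual code. The function names are hypothetical. The source encoding (`latin-1` here) is an assumption. The cursor is assumed to be a psycopg2 cursor, whose `copy_expert(sql, file)` method streams a file object to `COPY ... FROM STDIN`. The search-index step is CKAN-specific and is omitted.

```python
import codecs
import io
import shutil
from typing import BinaryIO


def convert_to_utf8(src: BinaryIO, dst: BinaryIO,
                    src_encoding: str = "latin-1",
                    chunk_size: int = 1 << 20) -> None:
    """Stream-transcode bytes in src_encoding from src into UTF-8 bytes in dst.

    Works chunk by chunk, so a large file never has to fit in memory at once.
    """
    reader = codecs.getreader(src_encoding)(src)  # bytes -> str
    writer = codecs.getwriter("utf-8")(dst)       # str -> UTF-8 bytes
    shutil.copyfileobj(reader, writer, chunk_size)


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_csv_into_table(cursor, table: str, csv_utf8: BinaryIO) -> None:
    """Load the whole CSV with one COPY ... FROM STDIN (hypothetical helper).

    Assumes a psycopg2-style cursor exposing copy_expert(sql, file) and an
    existing table whose columns match the CSV header.
    """
    sql = (f"COPY {quote_ident(table)} FROM STDIN "
           "WITH (FORMAT csv, HEADER true, ENCODING 'UTF8')")
    cursor.copy_expert(sql, csv_utf8)


if __name__ == "__main__":
    # Demo with in-memory files; a real run would open the downloaded file.
    raw = io.BytesIO("case_id,name\n1,Jos\u00e9\n".encode("latin-1"))
    utf8 = io.BytesIO()
    convert_to_utf8(raw, utf8, "latin-1")
    print(utf8.getvalue().decode("utf-8"))
```

In a real load, you would rewind the UTF-8 output with `seek(0)` and pass it to `copy_csv_into_table`. A single COPY lets PostgreSQL ingest all rows in one statement instead of many separate INSERT round trips.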

Afterwards, the column indexes are generated. These only speed up common queries, and they take a further 1262s. However, we exclude this from the load time, because it is merely an optimization.
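Deferring the column indexes until after the data is already queryable can be sketched as follows. The helper names and the one-index-per-column policy are assumptions for illustration. ckanext-xloader's actual choice of indexed columns may differ. `CREATE INDEX ON table (column)` without an explicit index name is valid PostgreSQL; in that case, the server picks the name.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_index_statements(table: str, columns: Iterable[str]) -> List[str]:
    """Build one CREATE INDEX statement per column (PostgreSQL names them)."""
    return [f"CREATE INDEX ON {quote_ident(table)} ({quote_ident(col)})"
            for col in columns]


def create_column_indexes(cursor, table: str, columns: Iterable[str]) -> None:
    """Run after the COPY has finished, so users can query the data meanwhile.

    Assumes a DB-API cursor (e.g. psycopg2) with an execute(sql) method.
    """
    for stmt in column_index_statements(table, columns):
        cursor.execute(stmt)
```

Building each index once over the fully loaded table is generally cheaper than updating it row by row during the load. PostgreSQL's own bulk-loading advice is to create indexes after the data is in, which is the opposite of the DataPusher setup below.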

Stats with DataPusher

12s - retrieve the file (over HTTP) from the local FileStore
2338s - convert to UTF-8 and then to JSON, set up PostgreSQL indexes to be built during the load, and load the JSON into the table (4000 INSERT statements)

Total: 2350s
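By contrast, the DataPusher path serializes rows to JSON and loads them in many small batches. Each batch becomes its own INSERT while the indexes are being maintained. Below is a minimal sketch of that batching pattern, not DataPusher's actual code. The batch size of 250 rows is a hypothetical value, chosen only because it gives a statement count in the same range as the ~4000 reported.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def json_batches(rows: Iterable[dict], size: int) -> Iterator[str]:
    """Serialize each chunk of rows to a JSON array (one request/INSERT each)."""
    for chunk in chunked(rows, size):
        yield json.dumps(chunk)


def batches_needed(n_rows: int, size: int) -> int:
    """Number of batches needed for n_rows (ceiling division)."""
    return (n_rows + size - 1) // size


if __name__ == "__main__":
    # Hypothetical batch size of 250 rows; not taken from DataPusher's source.
    print(batches_needed(1_033_882, 250))  # → 4136
```

Every batch pays for JSON encoding and decoding, a separate statement, and incremental index updates. The single COPY in the ckanext-xloader path avoids these per-batch costs.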

amercader pushed a commit that referenced this issue Nov 30, 2022
[QOL-7596] use 'six' instead of assuming features will exist