
Minify crosslinks JSON #9

Closed
stscoundrel opened this issue Jan 22, 2023 · 3 comments · Fixed by #31
Comments

@stscoundrel
Owner

stscoundrel commented Jan 22, 2023

The script produces more crosslinks than expected, which is a good problem to have. However, the resulting JSON file is large, so it would do the planet some good to make it smaller. It is currently around 7 MB (EDIT: now almost 9 MB with added content), which is a lot for supplementary metadata.

Possible avenues:

  • Basic JSON minification -> collapsing it onto one line should save some space when there are 20K+ entries.
  • Minify keys -> repeating the "url" and "source" keys 20K+ times takes space. They could just be "a" and "b", moving the key parsing to the TypeScript code.
  • Consider dropping URLs from the master data. They tend to be long and, combined with the "source" key, they do not add any extra information, just convenience. Another possible structure would be to ship only the slug and a list of sources, which are predefined strings. The "truth" about URLs could still be hosted in the NPM module, so individual dictionary websites don't have to keep up-to-date info about that.
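A minimal Go sketch for comparing these avenues on a single entry. The struct shapes, the `slug`/`sources` field names, and the sample word, URLs, and source names are all hypothetical, not the crosslinker's actual data model; the point is only to measure how each representation changes the encoded size.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Crosslink is an assumed version of the current shape: readable keys.
type Crosslink struct {
	URL    string `json:"url"`
	Source string `json:"source"`
}

// ShortCrosslink holds the same data under one-letter keys.
type ShortCrosslink struct {
	URL    string `json:"a"`
	Source string `json:"b"`
}

// SlugEntry is a hypothetical URL-free shape: one slug plus the names of
// the sources that contain it. Consumers would rebuild URLs from these.
type SlugEntry struct {
	Slug    string   `json:"slug"`
	Sources []string `json:"sources"`
}

// encodedLen returns the byte length of v encoded as JSON, indented or compact.
func encodedLen(v any, indent bool) int {
	var b []byte
	var err error
	if indent {
		b, err = json.MarshalIndent(v, "", "  ")
	} else {
		b, err = json.Marshal(v)
	}
	if err != nil {
		panic(err)
	}
	return len(b)
}

// sizes encodes the same hypothetical word ("hestr", found in two
// sources) in each representation and returns the byte counts.
func sizes() (indented, compact, shortKeys, slugOnly int) {
	links := []Crosslink{
		{URL: "https://dictionary-a.example/word/hestr", Source: "dictionary-a"},
		{URL: "https://dictionary-b.example/entry/hestr", Source: "dictionary-b"},
	}
	short := make([]ShortCrosslink, len(links))
	for i, l := range links {
		short[i] = ShortCrosslink{URL: l.URL, Source: l.Source}
	}
	slug := SlugEntry{Slug: "hestr", Sources: []string{"dictionary-a", "dictionary-b"}}

	return encodedLen(links, true), encodedLen(links, false),
		encodedLen(short, false), encodedLen(slug, false)
}

func main() {
	indented, compact, shortKeys, slugOnly := sizes()
	fmt.Println("indented:  ", indented)
	fmt.Println("compact:   ", compact)
	fmt.Println("short keys:", shortKeys)
	fmt.Println("slug only: ", slugOnly)
}
```

In this sample the one-letter keys save a fixed 7 bytes per crosslink, while the URL-free shape removes the longest strings entirely.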

Minifying keys does add overhead on the Node.js side, so consider whether it gives as much benefit here as it did in the Old Swedish dictionary. There we had many more keys, and almost all of them were longer than the ones we have here. The first and last avenues may be the most beneficial without adding too much processing time.
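A rough back-of-the-envelope estimate of what key minification saves, assuming each crosslink carries exactly one `"url"` and one `"source"` key (quotes included in the byte counts) and using the 20K repetitions and 9 MB total mentioned above:

```latex
\[
\underbrace{(5 - 3)}_{\texttt{"url"} \to \texttt{"a"}}
+ \underbrace{(8 - 3)}_{\texttt{"source"} \to \texttt{"b"}}
= 7 \text{ bytes per crosslink}
\]
\[
20\,000 \times 7 = 140\,000 \text{ bytes} \approx 0.14\ \text{MB} \approx 1.6\% \text{ of } 9\ \text{MB}
\]
```

Under these assumptions the saving is under 2% of the file, which fits the expectation that short keys matter less here than in the Old Swedish dictionary.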

stscoundrel self-assigned this Jan 22, 2023
@stscoundrel
Owner Author

Just using json.Compact in Go might be a good enough starting point to see how much basic minification saves.
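A minimal sketch of that starting point. `json.Compact` from Go's `encoding/json` removes insignificant whitespace without touching keys, values, or their order. The helper name `compactJSON` and the sample input are illustrative only; a real script would read the crosslinker's output file instead (for example with `os.ReadFile`).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// compactJSON strips insignificant whitespace (indentation, newlines)
// from src. It returns an error if src is not valid JSON.
func compactJSON(src []byte) ([]byte, error) {
	var dst bytes.Buffer
	if err := json.Compact(&dst, src); err != nil {
		return nil, err
	}
	return dst.Bytes(), nil
}

func main() {
	// Hypothetical pretty-printed input in the assumed crosslink shape.
	pretty := []byte(`[
  {
    "url": "https://dictionary-a.example/word/hestr",
    "source": "dictionary-a"
  }
]`)
	out, err := compactJSON(pretty)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d -> %d bytes\n%s\n", len(pretty), len(out), out)
}
```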

@stscoundrel
Owner Author

stscoundrel commented Feb 4, 2023

Then again, minifying the output in the Go / Crosslinker step might make diffs harder to read when adding new entries. Maybe minification should be either a step of its own or something done in the NPM module before release.

Edit: and should we opt for minifying keys, that should also be automated by this build step. Maybe just a script that converts the human-readable keys to minified ones and then minifies the whole thing for the NPM module. That would still keep the crosslinker's JSON output readable and debuggable.
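One possible shape for such a script, sketched in Go: the crosslinker keeps writing readable keys, and a separate step renames them via struct tags and emits compact JSON. The `readable`/`minified` types, the one-letter keys, and `minifyKeys` are hypothetical. `SetEscapeHTML(false)` is used because `json.Marshal` would otherwise encode `&` in URL query strings as `\u0026`, making the output larger.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// readable is the hypothetical shape the crosslinker keeps writing.
type readable struct {
	URL    string `json:"url"`
	Source string `json:"source"`
}

// minified carries the same fields under one-letter keys for publishing.
type minified struct {
	URL    string `json:"a"`
	Source string `json:"b"`
}

// minifyKeys decodes readable crosslinks, renames the keys through the
// struct tags, and re-encodes them as compact single-line JSON.
// Fields not declared in readable are silently dropped by Unmarshal.
func minifyKeys(src []byte) ([]byte, error) {
	var in []readable
	if err := json.Unmarshal(src, &in); err != nil {
		return nil, err
	}
	out := make([]minified, len(in))
	for i, link := range in {
		out[i] = minified{URL: link.URL, Source: link.Source}
	}

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep "&" in URLs instead of "\u0026"
	if err := enc.Encode(out); err != nil {
		return nil, err
	}
	// Encode appends a newline; drop it so the output is one bare value.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	src := []byte(`[
  {"url": "https://dictionary-a.example/word/hestr?id=42&v=2", "source": "dictionary-a"}
]`)
	out, err := minifyKeys(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the readable file stays the crosslinker's output, diffs remain reviewable and only the published artifact is minified.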

@stscoundrel
Owner Author

A preliminary test suggests that simple line-based minification only shaves a couple of MB off the 9 MB source set. So:

  • Use a separate minifier script.
  • Drop the repeating parts of the URLs.
  • Move the repeated part (= base URL) into something the NPM module knows. It can reconstruct the full URLs in its response if needed.
  • Test whether minifying keys helps: it shouldn't make a huge difference, as the keys are already very short (source, url).
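A sketch of the base-URL round trip, assuming each source has a single fixed URL prefix followed by a varying tail such as the slug. The `baseURLs` table, its entries, and both helpers are hypothetical; per the plan, this mapping would live in the NPM module, and it is written in Go here only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// baseURLs is a hypothetical table of per-source URL prefixes.
var baseURLs = map[string]string{
	"dictionary-a": "https://dictionary-a.example/word/",
	"dictionary-b": "https://dictionary-b.example/entry/",
}

// stripBase removes the known prefix for source, leaving only the varying
// tail. ok is false if the source is unknown or the URL lacks the prefix.
func stripBase(source, url string) (tail string, ok bool) {
	base, known := baseURLs[source]
	if !known || !strings.HasPrefix(url, base) {
		return "", false
	}
	return strings.TrimPrefix(url, base), true
}

// expand rebuilds the full URL from a source name and a stored tail.
func expand(source, tail string) (string, bool) {
	base, known := baseURLs[source]
	if !known {
		return "", false
	}
	return base + tail, true
}

func main() {
	tail, ok := stripBase("dictionary-a", "https://dictionary-a.example/word/hestr")
	fmt.Println(tail, ok)
	full, _ := expand("dictionary-a", tail)
	fmt.Println(full)
}
```

Returning `ok` instead of silently keeping the full URL makes entries that do not fit a source's prefix easy to detect during the minify step.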


1 participant