feat: add tmx support and change output #9

Merged
merged 1 commit into from
Mar 11, 2023

SethFalco

This makes significant changes to the arguments and output of the project.

Adds TMX Support

There are a few benefits to this.

  • This format is designed for programmatically handling translation alignment.
  • It's one of the formats explicitly mentioned on OPUS.

File per Language Combo

Before, I only output a single file that contained the English to Xyz alignments for every language. This has a few issues.

  • Having a single file means it's not possible to download data for only specific languages. As our dataset grows, the single file might get excessively bloated.
  • We need more than just English to French, for example. We also want to output all the implicit alignments: if there's English to French and English to German, we can derive French to German, and this should be in the exported dataset.
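
To illustrate the implicit alignments mentioned above, here is a minimal sketch of how every language pair could be derived from translations that share the same source entry. The `Translations` and `AlignedPair` types, the `deriveAlignments` function, and the sample strings are all hypothetical and are not taken from this PR's code. The sketch also assumes each unordered pair is emitted once, which may differ from what the project actually does.

```typescript
// Hypothetical shape: translations of one entry, keyed by language code.
type Translations = Record<string, string>;

// Hypothetical shape: one aligned segment between two languages.
interface AlignedPair {
  source: string;
  target: string;
  sourceText: string;
  targetText: string;
}

// Emit every unordered language pair once, so that English-French and
// English-German entries also yield the implicit French-German pair.
function deriveAlignments(translations: Translations): AlignedPair[] {
  // Sort the codes so the output order is deterministic.
  const languages = Object.keys(translations).sort();
  const pairs: AlignedPair[] = [];

  for (let i = 0; i < languages.length; i++) {
    for (let j = i + 1; j < languages.length; j++) {
      const source = languages[i];
      const target = languages[j];
      pairs.push({
        source,
        target,
        sourceText: translations[source],
        targetText: translations[target],
      });
    }
  }

  return pairs;
}

// Made-up sample data, for illustration only.
const examplePairs = deriveAlignments({
  en: "List files.",
  fr: "Lister les fichiers.",
  de: "Dateien auflisten.",
});

for (const pair of examplePairs) {
  console.log(`${pair.source}-${pair.target}`);
}
```

With n languages available for an entry, this yields n(n-1)/2 pairs, which is part of why a single combined file would grow quickly.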

Related to the change above, the --output argument no longer takes a file. Instead, it takes a directory, which it then populates with all generated files. This means users no longer have control over output file names, but it also means we no longer have to guess file formats, which is nice imo.

Change XML Library

Before, we used fast-xml-parser, but I don't see a clear way to set attributes, namespaces, etc. with it. I've opted to switch to xmlbuilder2, which is very well documented and makes building the TMX file straightforward.

@SethFalco SethFalco force-pushed the revamp-tmx-support branch from 39007ad to e897615 Compare March 11, 2023 16:47
@SethFalco SethFalco merged commit 66a5f5b into tldr-pages:main Mar 11, 2023