Old text format importer #62

Open
zverok opened this issue Oct 21, 2017 · 2 comments

@zverok
Collaborator

zverok commented Oct 21, 2017

I am not sure what this format is properly called (investigate?), but it is pretty common for scientific and international standardization data. Example (from the official Unicode tables; the official timezone tables are also published in this format):

# Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase!

1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1F83; 1F83; 1F8B; 1F0B 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI
1F84; 1F84; 1F8C; 1F0C 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI
1F85; 1F85; 1F8D; 1F0D 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI

I.e., it is a bit like CSV with a ; separator, but:

  • # starts a comment;
  • spaces after/before separator are ignored;
  • empty lines are ignored.
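
The three rules above can be sketched in plain Ruby. This is only a rough illustration, not an existing importer: `parse_semicolon_table` and its `separator:` and `comment:` parameters are hypothetical names, and it assumes that `#` never appears inside a data field and that a trailing separator before a comment does not introduce an extra empty field.

```ruby
# Hypothetical helper, not part of any existing importer.
# Assumption: the comment marker never occurs inside a data field.
def parse_semicolon_table(text, separator: ';', comment: '#')
  text.each_line.filter_map do |line|
    # Drop everything from the first comment marker to the end of the line.
    data = line.split(comment, 2).first.to_s.strip
    # Blank lines and comment-only lines produce no row.
    next if data.empty?

    # String#split discards the empty field left by a trailing separator;
    # whitespace around each remaining field is stripped.
    data.split(separator).map(&:strip)
  end
end

sample = <<~DATA
  # Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase!

  1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
  1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI
DATA

rows = parse_semicolon_table(sample)
rows.each { |row| p row }
```

Each row comes back as an array of stripped strings, so turning it into a dataframe would be a separate step on top of this.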

It would be a nice showcase to have such "standard" data parsed out of the box.

@athityakumar
Member

I initially thought that simply using the CSV Importer with the col_sep: '; ' option would work. But the Importer won't be able to ignore empty lines. After looking at one of the Unicode tables, I think we'd also need this Importer to support something like :start_row and :end_row (rather than :skiprows) to crop the data in a better way.

@zverok
Collaborator Author

zverok commented Oct 21, 2017

  1. This is NOT a job for the :csv importer, because this format is not valid CSV.
  2. It does NOT need a "skiprows" option; it needs to ignore comments (comments can appear between data lines, not only at the beginning of the file, and also at the end of a line that contains data).

I believe that the :plaintext importer was initially meant to be the handler for this format, just not finished. So, let's probably enhance it?
