character maps to <undefined> #5

Closed
willturn001 opened this issue Mar 30, 2022 · 6 comments

Comments

@willturn001

Hi,
I am trying to use the plugin in the command line, and I'm not sure what the fix is for this error:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python310\Scripts\twarc2.exe\__main__.py", line 7, in <module>
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\twarc_timeline_archive.py", line 27, in timeline_archive
    click.echo(f'\U0001f31f  fetching timeline for {line} since {since_id}')
  File "C:\Users\osint\AppData\Roaming\Python\Python310\site-packages\click\utils.py", line 298, in echo
    file.write(out)  # type: ignore
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f31f' in position 0: character maps to <undefined>
@igorbrigadir
Contributor

What exact command did you run to get this? It looks like the file you are reading may not be in UTF-8, so the error could be in how that input file was saved.

@willturn001
Author

twarc2 timeline-archive Users.txt /UserTweets

Users.txt is in utf8

@igorbrigadir
Contributor

Can you attach the Users.txt file here, please? I'll dig a bit more.

@willturn001
Author

Here it is - I have tried different files, with different users, and I keep encountering the same issue.
Users.txt

@igorbrigadir
Contributor

Alright, I think I figured it out: the file has a BOM (byte order mark), and that breaks the input for twarc. The file is UTF-8, but it's UTF-8 with a BOM character at the start. This usually happens on Windows when you save a file from the command line or with certain editors. How you get rid of the BOM depends on how you created the file.

Screenshot from 2022-03-30 19-25-09

Screenshot from 2022-03-30 19-25-24

Ideally twarc should check for this and either strip the BOM or show a clearer warning.
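
As a rough idea of what such a check could look like, here is a minimal sketch, not twarc's actual code. It assumes the plugin reads the input file line by line; the function name `read_lines_without_bom` is made up for illustration. Python's built-in `utf-8-sig` codec decodes plain UTF-8 as usual and silently drops a leading BOM if there is one.

```python
def read_lines_without_bom(path):
    """Return the non-empty lines of a UTF-8 text file, ignoring a leading BOM.

    Hypothetical helper for illustration; not part of twarc.
    """
    # "utf-8-sig" behaves like "utf-8" but skips the BOM when one is present,
    # so files saved with or without a BOM are read the same way.
    with open(path, encoding="utf-8-sig") as fh:
        return [line.strip() for line in fh if line.strip()]
```

With this, the first user name in a file like Users.txt would come back without an invisible `\ufeff` character stuck to the front of it.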

The fix is to re-save the file as UTF8, without a BOM, like this one:
Users_fixed.txt
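
If re-saving from an editor is not convenient, the same thing can be done with a few lines of Python. This is only a sketch: the function name `strip_utf8_bom` and the file names in the usage note are illustrative, and it assumes the file is otherwise valid UTF-8.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM. Returns True if one was found.

    Hypothetical helper for illustration.
    """
    data = Path(src).read_bytes()
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(data)
    return had_bom
```

For example, `strip_utf8_bom("Users.txt", "Users_fixed.txt")` writes a copy that can be passed to `twarc2 timeline-archive` in place of the original.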

@willturn001
Author

Yes, that solved the problem. Thanks!
