twarc2 timeline --no-context-annotations not pulling 500 tweets #687

Open
mr-devs opened this issue Feb 7, 2023 · 2 comments
Labels: bug, cli (Issues with command line interface)

Comments


mr-devs commented Feb 7, 2023

I am running the following command with twarc version 2.13.0 (with academic level access):

# Pulls my own timeline
twarc2 timeline --no-context-annotations 1312850357555539972 test.json

Based on running

twarc2 timeline --help

setting --no-context-annotations "makes --max-results 500 the default." Unfortunately, the twarc.log output shows that the max-results parameter is still equal to 100. Below is a screenshot of the entire process (aborted after a few calls).

image
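
For a text-based check alongside the screenshot, the page size that was actually requested can be tallied from the log. This is a minimal sketch: the function name `count_page_sizes` is made up for illustration, and it assumes each request line in twarc.log contains `max_results` followed within a few characters by the number (for example `max_results=100` or `'max_results': 100`). Adjust the pattern if your log looks different.

```shell
# Hypothetical helper: tally the max_results values found in a twarc log.
# Assumes each request line contains "max_results" followed, within a few
# non-digit characters, by the number.
count_page_sizes() {
    grep -oE 'max_results[^0-9]{1,4}[0-9]+' "$1" \
        | grep -oE '[0-9]+$' \
        | sort -n \
        | uniq -c \
        | awk '{print $2, $1}'   # print "<page size> <number of requests>"
}

# Usage: count_page_sizes twarc.log
```

Each output line pairs a page size with the number of requests that used it, so a log where every request shows 100 confirms the behaviour described here.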

Based on the code here, I think this is only true when using the full-archive search method.

That said, I tested using the --use-search flag as well, which doesn't seem to correct the issue. See screenshot below.

image

I think the help message may just need to be updated, since 500 no longer appears to be an option (based on the Twitter API reference).

Thoughts?

igorbrigadir (Contributor) commented

Ha! I literally just noticed the same "bug" yes.

So, --no-context-annotations is a shortcut to remove context_annotations from tweet.fields. Requesting context_annotations is what limits the search endpoint to 100 results per page, which is slow given the 1 request per second limit in academic access.

The twarc2 timeline command uses the timeline API endpoint, which likewise has a limit of 100 tweets per page, but can NOT return 500 per page, with or without context annotations.
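
The per-page limits described above can be summarised in a small sketch. The function `max_page_size` and its endpoint labels (`timeline`, `search-all`) are hypothetical names for illustration, not part of twarc; the numbers are the ones discussed in this thread.

```shell
# Hypothetical helper, not part of twarc: largest page size per request.
#   $1 = endpoint label: "timeline" or "search-all" (full-archive search)
#   $2 = "yes" if context_annotations is requested in tweet.fields, else "no"
max_page_size() {
    case "$1:$2" in
        search-all:no) echo 500 ;;  # full-archive search without context annotations
        search-all:*)  echo 100 ;;  # context annotations cap the page at 100
        timeline:*)    echo 100 ;;  # timeline endpoint: 100 either way
        *) echo "unknown endpoint: $1" >&2; return 1 ;;
    esac
}
```

Only the full-archive search case without context annotations reaches 500, which is why the flag has no effect on a plain timeline call.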

twarc2 timeline --use-search will use the search API instead, to get around the last 3200 tweets limit of the timelines API, if you have academic access. However, --no-context-annotations doesn't seem to work here either, which I think is the actual bug that needs fixing.

The temporary workaround for this is to use the search command explicitly, as these two should end up equivalent:

If you had:

twarc2 timeline --use-search --no-context-annotations 1312850357555539972 results.jsonl

run this instead:

twarc2 search --archive --no-context-annotations "from:1312850357555539972" results.jsonl
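
If you use this workaround often, it can be wrapped in a small shell function. This is only a sketch: `timeline_via_search` and the `TWARC2` override variable are hypothetical names introduced here, and the command it runs is exactly the search invocation shown above.

```shell
# Hypothetical wrapper around the workaround; not part of twarc.
#   $1 = user ID, $2 = output file
# TWARC2 is an assumed override for the executable (handy for dry runs).
timeline_via_search() {
    "${TWARC2:-twarc2}" search --archive --no-context-annotations "from:$1" "$2"
}

# Usage:   timeline_via_search 1312850357555539972 results.jsonl
# Dry run: TWARC2=echo timeline_via_search 1312850357555539972 results.jsonl
```

The dry run only prints the arguments that would be passed, which is a quick way to confirm the query string before spending rate-limited requests.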

igorbrigadir added the bug and cli (Issues with command line interface) labels on Feb 7, 2023

mr-devs commented Feb 7, 2023

This makes perfect sense. Thanks for the workaround!
