close the driver at the end of get_users_follow #134

Open
wants to merge 1 commit into
base: master

Conversation

@NuLL3rr0r NuLL3rr0r commented Jun 20, 2022

The driver window stays open after a call to either get_users_following or get_users_followers. This simple fix closes the driver once it has finished scraping a user's following/followers.
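As a rough illustration of the cleanup pattern described above (not the actual change in this PR), the idea is to release the driver in a `finally` block so it is closed even if scraping raises. `FakeDriver`, `managed_driver`, `scrape_with_cleanup`, and the `fetch` callback are hypothetical names made up for this sketch; `FakeDriver` only stands in for a Selenium WebDriver, whose `quit()` method ends the session and closes all of its windows.

```python
from contextlib import contextmanager


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; it only records quit()."""

    def __init__(self):
        self.closed = False

    def quit(self):
        # A real WebDriver.quit() ends the browser session and closes its windows.
        self.closed = True


@contextmanager
def managed_driver(factory):
    """Create a driver via `factory` and always quit it when the block exits."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


def scrape_with_cleanup(factory, users, fetch):
    """Hypothetical wrapper: call fetch(driver, user) per user, then release the driver."""
    results = {}
    with managed_driver(factory) as driver:
        for user in users:
            results[user] = fetch(driver, user)
    return results
```

With this shape, each call owns exactly one driver and gives it back on both the success and the error path, so repeated calls should not accumulate open windows.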

Why is this troublesome? I run a script like the one below to scrape the following/followers of hundreds of accounts. Because the driver is never cleaned up, the script ends up opening many windows, and at some point the OS kills it due to high memory usage.

Here is an example to reproduce:

#!/usr/bin/env python3

# Only the two functions used below are imported; the original also imported
# Scweet's scrape, which the local scrape() defined below would shadow.
from Scweet.user import get_users_following, get_users_followers

import errno
import multiprocessing
import os
import os.path

out_directory="/home/mamadou/fwsync/"

users = [
            'User0',
            'User1',
            '...',
            'User999'
        ]

env_path = ".env"

def mkdir_p(path):
    try:
        os.makedirs(path, 0o775)
    except OSError as exc:  # Python ≥ 2.5
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        # possibly handle other errno cases here, otherwise finally:
        else:
            raise

def scrape_following(user):
    try:
        following = get_users_following(users=[ user ], env=env_path, verbose=0, headless=False, wait=2, file_path=out_directory)
    except Exception as ex:
        pass

def scrape_followers(user):
    try:
        followers = get_users_followers(users=[ user ], env=env_path, verbose=0, headless=False, wait=2, file_path=out_directory)
    except Exception as ex:
        pass

def scrape(user):
    if not os.path.isfile(f"{out_directory}{user}_{user}_following.json"):
        scrape_following(user)
    else:
        print(f"'{user}' following file already exists! Skipping...")

    if not os.path.isfile(f"{out_directory}{user}_{user}_followers.json"):
        scrape_followers(user)
    else:
        print(f"'{user}' followers file already exists! Skipping...")

def main():
    mkdir_p(out_directory)

    #threads_count=multiprocessing.cpu_count()
    threads_count=4
    p = multiprocessing.Pool(processes=threads_count)

    result = p.map(scrape, users)


if __name__ == '__main__':
    main()
