This repository has been archived by the owner on Nov 16, 2022. It is now read-only.
Harvest more email addresses from GitHub #1226
https://github.com/gratipay/logs/commit/8dc857ccb520e0cda5444da102b1d2a22f0daf4d, main script
So that's 592 accounts for which we were able to harvest a previously unseen email address from public GitHub commits: 296 yielded one new address, 126 yielded two, etc.
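A breakdown like this one (how many accounts yielded one new address, how many yielded two, and so on) can be tallied with a `Counter`. This is only a sketch: `tally_by_count` is a hypothetical helper name, and the `sample` mapping is made-up data, not the real harvest.

```python
from collections import Counter


def tally_by_count(harvested):
    """Map 'number of new addresses' to 'number of accounts with that many'."""
    return Counter(len(addresses) for addresses in harvested.values())


# Made-up sample data, for illustration only.
sample = {
    'alice': ['alice@example.com'],
    'bob': ['bob@example.com', 'bob@example.org'],
    'carol': ['carol@example.net'],
}

tally = tally_by_count(sample)
for n, accounts in sorted(tally.items()):
    print('%d account(s) yielded %d new address(es)' % (accounts, n))
```

Keying the tally by address count makes it easy to see how much manual review each bucket implies.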
Now to manually review those ...
#!/usr/bin/env python
import csv
from collections import defaultdict

import lib

# Load harvested.csv, listing Gmail addresses first for each username.
harvested = defaultdict(list)
with open('harvested.csv') as f:
    for _, username, _, _, _, address, _ in csv.reader(f):
        if address.endswith('gmail.com'):
            harvested[username].insert(0, address)
        else:
            harvested[username].append(address)

# Add harvested addresses to payouts.csv.
payouts_header, payouts = lib.load_payouts()
for row in payouts:
    username = row[4]
    if username not in harvested:
        continue
    status = row[2]
    print(status)
    addresses = [a for a in lib.get_addresses(row) if a]
    addresses += harvested[username]
    addresses += [''] * (4 - len(addresses))  # pad to four address columns
    assert len(addresses) == 4, addresses  # fails if a row has more than four
    row[5:] = addresses

# Write the updated rows back out, closing the file when done.
with open('payouts.csv', 'w+') as f:
    csv.writer(f).writerows([payouts_header] + payouts)
Reticketed from #1205.
The idea is to harvest email addresses from commit messages. You end up with a bunch of emails that aren't the person you're after (merging a PR means someone else's email shows up in your commit events). But it's a start! The manual work would be a slog but maybe worth it?
See harvest-email-from-github.py for a starting point.
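To make the idea concrete, here is a rough sketch of pulling candidate addresses out of commit events. It is not the script referenced above. It assumes events shaped like GitHub's public `PushEvent` payloads, where `payload['commits']` is a list of commits that each carry an `author` dict with `name` and `email`. The function name `harvest_emails`, the `known` parameter, and the sample data are all hypothetical. Author names are kept next to each address because, as noted, merged PRs bring in other people's emails, and the names help with manual review.

```python
def harvest_emails(events, known=()):
    """Collect unseen commit author emails from a list of event dicts.

    Assumes each PushEvent has payload['commits'], and each commit has an
    author dict with 'name' and 'email' keys. Returns a dict mapping each
    candidate email to the set of author names seen with it.
    """
    known = {address.lower() for address in known}
    found = {}
    for event in events:
        if event.get('type') != 'PushEvent':
            continue  # only push events are assumed to carry commits
        for commit in event.get('payload', {}).get('commits', []):
            author = commit.get('author') or {}
            email = (author.get('email') or '').lower()
            if not email or email in known:
                continue  # nothing new to learn from this commit
            if email.endswith('noreply.github.com'):
                continue  # placeholder addresses are useless for contact
            found.setdefault(email, set()).add(author.get('name', ''))
    return found


# Made-up sample events, for illustration only.
sample_events = [
    {'type': 'PushEvent', 'payload': {'commits': [
        {'author': {'name': 'Alice', 'email': 'alice@example.com'}},
        {'author': {'name': 'Bob', 'email': 'bob@example.org'}},  # merged PR
    ]}},
    {'type': 'WatchEvent', 'payload': {}},
]

candidates = harvest_emails(sample_events, known=['alice@example.com'])
for email, names in sorted(candidates.items()):
    print(email, sorted(names))
```

In this sample, Bob's address is the only candidate, and it belongs to a different person than the account owner, which is exactly the case the manual review has to catch.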