Create demo repository to test our scripts against? #3

Closed
penyuan opened this issue May 5, 2020 · 5 comments

@penyuan
Contributor

penyuan commented May 5, 2020

A major part of our effort is to develop a robust set of Python 3 scripts for mining GitHub repositories.

I've been testing parts of our scripts (e.g. get_commits) on repositories such as Safecast/bGeigieNanoKit, but they have so many forks, each with so many commits, that downloading all the data takes a lot of time and storage space.

To make testing those scripts easier, what if we made a demo repository to test the scripts against? The demo repository would contain all cases that we might run into including commits, branches, merges, and forks (meaning there will be demo forks, too).

Alternatively, if we can identify an existing project that has all the cases but is also small, then we can use their repository.

@penyuan penyuan added the question Further information is requested label May 5, 2020
@chrvoigt
Member

chrvoigt commented May 5, 2020

To make testing those scripts easier, what if we made a demo repository to test the scripts against?

... should that be repositories from the pilots, e.g. SODAQ's development of a soil sensor thingy?

@penyuan
Contributor Author

penyuan commented May 5, 2020

A SODAQ repository would be great if there is a good candidate. I've looked at the list of their repositories, but don't know if there is one that has the following situations that I'd like to test our scripts against, e.g.:

  1. Forks
  2. Forks of forks
  3. Different forks have their own branches and differing sets of commits and merges
  4. Not too many forks (maybe <=10) so that the mining script wouldn't take too long to run
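As a rough illustration of what cases 1, 2, and 4 imply for a mining script, here is a minimal sketch of a bounded walk over forks and forks of forks. It is not taken from the project's scripts: `collect_forks`, `github_list_forks`, and the `max_forks` cap are hypothetical names, and the unauthenticated call to the GitHub REST endpoint `GET /repos/{owner}/{repo}/forks` is an assumption about how the listing might be done.

```python
import json
import urllib.request
from collections import deque
from typing import Callable, Dict, Iterable, List

# Hypothetical type: given "owner/repo", return the full names of its direct forks.
ForkLister = Callable[[str], Iterable[str]]


def github_list_forks(full_name: str) -> List[str]:
    """List direct forks via the GitHub REST API (unauthenticated, so rate-limited)."""
    forks: List[str] = []
    page = 1
    while True:
        url = (
            f"https://api.github.com/repos/{full_name}/forks"
            f"?per_page=100&page={page}"
        )
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means there are no more forks
            return forks
        forks.extend(item["full_name"] for item in batch)
        page += 1


def collect_forks(
    root: str, list_forks: ForkLister, max_forks: int = 10
) -> Dict[str, List[str]]:
    """Breadth-first walk over forks and forks of forks.

    Returns a mapping from each visited repository to the direct forks that
    were recorded for it. No new forks are recorded once max_forks is reached.
    """
    tree: Dict[str, List[str]] = {}
    seen = {root}
    queue = deque([root])
    while queue:
        repo = queue.popleft()
        children: List[str] = []
        for fork in list_forks(repo):
            if fork in seen:
                continue
            if len(seen) - 1 >= max_forks:  # the root itself is not a fork
                break
            seen.add(fork)
            children.append(fork)
            queue.append(fork)
        tree[repo] = children
    return tree
```

Passing the lister in as a parameter means the traversal can be exercised against a small in-memory fork graph before pointing it at a real repository with `collect_forks("owner/repo", github_list_forks)`.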

@jbon
Contributor

jbon commented May 5, 2020

That is a great idea. @penyuan would you set up the repo? I would then modify the scripts so they can work either in append mode (reading info from files if they are already there) or in overwrite mode.
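One way the two modes could look, as a minimal sketch only: `load_or_fetch`, its `mode` parameter, and the JSON file format are all assumptions made for illustration, not the scripts' actual interface.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(path: Path, fetch: Callable[[], Any], mode: str = "append") -> Any:
    """Return mined data, reusing a saved file when the mode allows it.

    "append":    if the file already exists, read it and skip the download.
    "overwrite": always call fetch() and replace the file.
    """
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")

    if mode == "append" and path.exists():
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    data = fetch()  # hypothetical callable that does the expensive download
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

With something like this, repeated test runs against the demo repository would only hit the network the first time, unless overwrite mode is requested.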

@penyuan
Contributor Author

penyuan commented May 7, 2020

Yes, I would be happy to set up the demo, or "reference", repository which we can test our scripts against. There are a few details I'd like to go over during our meeting tomorrow to help implement this.

In case we identify a real repository that has the characteristics we are interested in mining - SODAQ or otherwise - we can test our scripts against that in parallel.

@penyuan
Contributor Author

penyuan commented Sep 24, 2020

The reference repository works, and we can still use it. But I think I was short-sighted when creating it, because wp2.2_dev is quickly becoming even more complex than the reference repository, so we could just use wp2.2_dev instead.

The only catch is that wp2.2_dev doesn't have forks or merged/unmerged pull requests yet, so we could still create those artificially if needed.

@penyuan penyuan closed this as completed Sep 24, 2020

3 participants