Sorting by citations #3

Open
dgallichan opened this issue May 15, 2019 · 15 comments
Comments

@dgallichan
Member

dgallichan commented May 15, 2019

Currently we are using the number of citations that Semantic Scholar finds on the main paper associated with the software to allow sorting by a proxy metric related to 'impact' of the software. We realise that this is not a perfect solution, not least because not everyone who cites a paper uses the software - and not everyone who uses software has cited the paper.

For now this seems like a reasonable solution - but feel free to use this space to discuss any issues that arise due to this choice, along with any suggestions you might have for how to improve it.

Note that Semantic Scholar was chosen because of its free-to-use API. We probably can't use Google Scholar as they seem to regularly change their site to block scraping attempts, and Scopus and WoS both have APIs - but at a premium. It seems that CrossRef might be a viable alternative - but I haven't had the time to read more to see if it offers something beyond what now seems to work with the Semantic Scholar option.

@uecker
Member

uecker commented Aug 18, 2020

A good idea would be to add a randomize option.

@dgallichan
Member Author

Thanks @uecker for the suggestion - I spent a little time trying to work out how this could be implemented, but my web-coding knowledge wasn't sufficient to get it to work! If anyone wants to implement this, please go ahead :)
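
For anyone picking this up, here is a minimal sketch of what a randomize option could look like in plain JavaScript, using a Fisher-Yates shuffle. The function name `shuffled`, the injectable `random` parameter, and the entry layout are assumptions for illustration; how it would be wired into the MR Hub page is left open.

```javascript
// Fisher-Yates shuffle: returns a new array with the items in random order.
// `random` is injectable (defaults to Math.random) so the result can be
// made deterministic in tests.
function shuffled(items, random = Math.random) {
  const result = items.slice(); // copy, so the input array is left untouched
  for (let i = result.length - 1; i > 0; i--) {
    // Pick an index j with 0 <= j <= i and swap positions i and j.
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Hypothetical entries: the package names come from this thread,
// the object layout is an assumption.
const examplePackages = [{ name: "BART" }, { name: "SigPy" }, { name: "torchkbnufft" }];
console.log(shuffled(examplePackages).map((p) => p.name));
```

Each permutation is equally likely as long as `random` is uniform on [0, 1), which is why the shuffle draws `j` from the shrinking range `0..i` rather than from the whole array.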

@mmuckley
Contributor

mmuckley commented Nov 1, 2021

Hello @dgallichan et al. - this seems like a great initiative. I have some feedback. I don't have any expectations for how the feedback is used - feel free to use or ignore as you wish :).

In terms of citation sorting, I think it would be really good or even necessary to use something besides Semantic Scholar. The main reason is Semantic Scholar is doing a poor job of indexing ISMRM abstracts from the main conference and workshops. One option would be for the ISMRM to work on getting its proceedings indexed, but if that doesn't happen I think it might be necessary to move away from Semantic Scholar as many people are publishing their packages at ISMRM. The current status of MR Hub leaves a great community project like SigPy at the very bottom of the list.

In the meantime as long as MR Hub is sticking with Semantic Scholar I would change the default sorting mechanism. Citations is pretty good, but at the moment many packages are linked to research papers rather than software papers, which just reinforces an author's scientific contributions rather than their software contributions. There are many ways to highlight scientific contributions. We have Google Scholar pages, research awards, prestigious positions, etc. MR Hub is a little more unique in that it can showcase software work that isn't naturally promoted as much.

I think a better option vs. the status quo would be to sort by most recent software update. This would also have the added benefit of highlighting projects that are actively being updated and maintained. Also, it would help promote new projects that might benefit the most from promotion.

Disclaimer: I am looking to PR my project, torchkbnufft, for which the associated paper was at the 2020 ISMRM Sedona Workshop.

@uecker
Member

uecker commented Nov 2, 2021

I agree, sorting by last update would also be good, but may be difficult to automate. I think we need somebody to implement it...

@mmuckley
Contributor

mmuckley commented Nov 2, 2021

According to the README there is currently a mechanism for querying BitBucket and GitHub for the last update. A commit seems a reasonable surrogate. Were you thinking to use releases for the date? Or is the concern about software not kept on GitHub/BitBucket?

@uecker
Member

uecker commented Nov 2, 2021

I was thinking about software not on GitHub/BitBucket, e.g. a repository maintained by some institution. But maybe those could also be polled automatically.

@notZaki
Contributor

notZaki commented Nov 2, 2021

For reference, the default sorting option is defined here:

featureList.sort('ncitations', { order: "desc" });

If it is decided that the default sort should be by the most recent commit, then the 'ncitations' part can be replaced with 'dateupdated'.
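
To make the effect of that change concrete, here is a rough plain-JavaScript equivalent of a newest-first sort on a `dateupdated` field. It is only a sketch: the helper name `sortByLastUpdate`, the assumption that dates are ISO strings such as `2021-07-07`, and the choice to put entries without a date last are all mine, not something the MR Hub code is confirmed to do.

```javascript
// Sort entries newest-first by their `dateupdated` field.
// Assumes ISO date strings (YYYY-MM-DD); entries with a missing or
// unparseable date are placed at the end.
function sortByLastUpdate(entries) {
  const time = (entry) => {
    const t = Date.parse(entry.dateupdated);
    return Number.isNaN(t) ? -Infinity : t;
  };
  // Copy first so the caller's array is not reordered in place.
  return entries.slice().sort((a, b) => {
    const ta = time(a);
    const tb = time(b);
    if (tb > ta) return 1;  // b is newer, so it goes first
    if (tb < ta) return -1; // a is newer, so it goes first
    return 0;
  });
}

// Hypothetical data: apart from BART's 2021-07-07 (quoted later in this
// thread), the names and dates are made up for illustration.
const exampleEntries = [
  { name: "older-package", dateupdated: "2020-01-15" },
  { name: "self-hosted-package" }, // no date available
  { name: "BART", dateupdated: "2021-07-07" },
];
console.log(sortByLastUpdate(exampleEntries).map((e) => e.name));
```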

@uecker
Member

uecker commented Nov 2, 2021

I am not sure what this does. For BART it says 2021-07-07, but the latest release was in March and the latest public commit a couple of days ago. Maybe this is the random number I asked for....

@notZaki
Contributor

notZaki commented Nov 2, 2021

That might be because the update script was last run ~3 months ago, so the project info could be out of date.
A GitHub Action can be set up to automatically run the update script every day, but that's likely a separate issue.

@mmuckley
Contributor

mmuckley commented Nov 3, 2021

@dgallichan
Member Author

My impression is that if a Github action could be set to run, say, once a week, this would be a nice solution. I think the main problems in getting it working would be making sure you don't exceed the daily API queries without logging in (although I may already be out of date on this, as these kinds of limits have a habit of changing as well...)

@notZaki
Contributor

notZaki commented Nov 3, 2021

Each instance of github actions should have 60 API queries per hour. If that becomes a bottleneck, then it should be possible to use the builtin GITHUB_TOKEN to make authenticated requests which have a limit of 1,000 queries per hour.
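
As an illustration of the authenticated route, the sketch below builds a request for a repository's most recent commit and reads the commit date from the response. It is not the MR Hub update script (which is not shown in this thread); the function names are hypothetical, and it assumes an environment with a global `fetch` (for example a recent Node.js) and a token supplied by the caller, such as the workflow's `GITHUB_TOKEN`.

```javascript
// Build the URL and headers for asking the GitHub REST API for the
// most recent commit of a repository. Without a token the request is
// unauthenticated and falls under the lower rate limit.
function latestCommitRequest(owner, repo, token) {
  const url = `https://api.github.com/repos/${owner}/${repo}/commits?per_page=1`;
  const headers = { Accept: "application/vnd.github+json" };
  if (token) headers.Authorization = `Bearer ${token}`;
  return { url, headers };
}

// Extract the commit date from the parsed JSON response, which is an
// array of commits with the newest first. Returns null if it is empty.
function latestCommitDate(commits) {
  if (!Array.isArray(commits) || commits.length === 0) return null;
  return commits[0].commit.committer.date;
}

// Hypothetical wrapper that performs the request; defined but not called here.
async function fetchLatestCommitDate(owner, repo, token) {
  const { url, headers } = latestCommitRequest(owner, repo, token);
  const response = await fetch(url, { headers });
  if (!response.ok) throw new Error(`GitHub API returned ${response.status}`);
  // The x-ratelimit-remaining header reports how many queries are left this hour.
  console.log("remaining:", response.headers.get("x-ratelimit-remaining"));
  return latestCommitDate(await response.json());
}
```

Keeping the request construction and the response parsing as separate pure functions means both can be checked without touching the network or spending any of the hourly quota.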

@mmuckley
Contributor

mmuckley commented Nov 3, 2021

@dgallichan I will try to test a draft of an Action on my fork now that my PR is merged. I'll open a PR if everything works.

@mmuckley
Contributor

mmuckley commented Nov 3, 2021

Sorry - I see it is already merged!

@dgallichan
Member Author

So the default sorting option is now 'last update' - I think there are only a few repositories that don't use Github or Bitbucket, so it's mostly pretty good (we could do with adding API querying for Gitlab as well though, but again, hardly any packages affected at the moment). For those hosting themselves, then I guess the onus is on them to submit a PR to the MRHub whenever they want to manually update the date for their package.

Thanks so much notZaki for the Github Action - it ran successfully this lunchtime, and is definitely a good way to keep the MRHub 'fresh'! :)


4 participants