Pull license information from CrossRef #12

Open
Daniel-Mietchen opened this issue Feb 18, 2014 · 15 comments

Comments

@Daniel-Mietchen
Member

CrossRef plans to make this license information available in (Northern Hemisphere) spring 2014.

@notconfusing notconfusing added this to the Integration with CrossRef milestone Mar 25, 2014
@notconfusing notconfusing modified the milestones: Phase 1 - Wikisource & Selected Articles, Integration with CrossRef May 5, 2014
@wrought
Member

wrought commented May 8, 2014

Any progress on this data service?

@wrought
Member

wrought commented May 20, 2014

Ping.

Moving to Phase 1B milestone.

@gbilder

gbilder commented May 21, 2014

The API call is:

http://api.crossref.org/works/{doi}

So, for example:

http://api.crossref.org/works/10.1155/2013/530651

See the license section of the resulting JSON.

See the API documentation for more details.

Report problems at the issue tracker.
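A minimal Python sketch of the lookup described above, assuming the JSON response wraps the work record in a top-level `message` object (the comment only says to look at the license section). `works_url` and `fetch_licenses` are illustrative names, not part of any CrossRef client library, and the `http://` base URL is copied from this thread; it may need to be `https://` today.

```python
import json
import urllib.parse
import urllib.request

# Base URL as written in this thread (assumption: may need https:// today).
API_BASE = "http://api.crossref.org"


def works_url(doi: str) -> str:
    """Build the /works/{doi} URL, keeping the DOI's slashes as path separators."""
    return f"{API_BASE}/works/{urllib.parse.quote(doi, safe='/')}"


def fetch_licenses(doi: str, timeout: float = 30.0) -> list:
    """Fetch a work record and return its "license" list (empty if absent).

    Assumes the record sits under a top-level "message" key.
    """
    with urllib.request.urlopen(works_url(doi), timeout=timeout) as resp:
        record = json.load(resp)
    return record.get("message", {}).get("license", [])
```

Calling `fetch_licenses("10.1155/2013/530651")` should return the license entries for the example DOI, or an empty list if the record carries none.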

@Daniel-Mietchen
Member Author

Thanks, Geoffrey.

@wrought
Member

wrought commented Jul 30, 2014

It seems Hindawi is fully compliant, while others are a long way off:
http://participation.labs.crossref.org/features/tdm

Another example with license information: http://api.crossref.org/works/10.1155/2014/945364

"license":[{"content-version":"vor","delay-in-days":0,"start":{"date-parts":[[2014,1,1]],"timestamp":1388534400000},"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}]

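The `license` array quoted above can be read with the standard `json` module. This sketch (function and variable names are hypothetical) pulls out the license URL, the content version (`vor` denotes the version of record), the start date from `date-parts`, and `delay-in-days`, and checks that the millisecond `timestamp` agrees with `date-parts`. It deliberately does not interpret what a non-zero delay means for access; padding partial dates to the first day is an assumption.

```python
import json
from datetime import date, datetime, timezone

# The "license" array copied from the comment above (\/ is a valid JSON escape for /).
SAMPLE_LICENSE_JSON = (
    r'[{"content-version":"vor","delay-in-days":0,'
    r'"start":{"date-parts":[[2014,1,1]],"timestamp":1388534400000},'
    r'"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}]'
)


def summarize_licenses(licenses: list) -> list:
    """Reduce each license entry to the fields needed to judge reuse."""
    summary = []
    for entry in licenses:
        start = entry["start"]
        # date-parts holds one [year, month, day] list; pad year-only or
        # year-month dates to the first day (assumption).
        year, month, day = (list(start["date-parts"][0]) + [1, 1])[:3]
        start_date = date(year, month, day)
        # timestamp is milliseconds since the Unix epoch, in UTC.
        ts_date = datetime.fromtimestamp(start["timestamp"] / 1000, tz=timezone.utc).date()
        summary.append({
            "url": entry["URL"],
            "content_version": entry.get("content-version"),
            "start": start_date,
            "timestamp_matches": start_date == ts_date,
            "delay_in_days": entry.get("delay-in-days", 0),
        })
    return summary
```

For the sample above, `summarize_licenses(json.loads(SAMPLE_LICENSE_JSON))` yields one entry pointing at the CC BY 3.0 URL, starting 2014-01-01, with no delay.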
@gbilder

gbilder commented Jul 31, 2014

Hindawi is always pretty agile in adopting new features, as they control all their own tech. Other publishers will start submitting this info soon. This hockey-stick pattern is typical for CrossRef initiatives, as many publishers need to modify their third-party production systems before they can supply the data in bulk. I know most of the bigger publishers (Elsevier, Springer, PLOS, T&F) are working on this now. I am guessing that about half of CrossRef metadata will have this info in the next 6-9 months.

@wrought
Member

wrought commented Jul 31, 2014

@gbilder Good to hear, thanks for the update!

I think we're all coming to this realistically too; nothing will happen overnight. Indeed, there are many parties that have to change systems and processes on their own respective schedules.

Curious to hear if you have any info on sources of scraped license data.

@gbilder

gbilder commented Jul 31, 2014

Nothing specific, just that scraping is always fragile and error-prone. But I do it myself in the absence of an API or metadata, so I can hardly complain. I would generally advise trying several passes: first through any existing APIs (e.g. CrossRef/DataCite), second through screen scraping, third through supervised screen scraping (allowing a human to confirm), and lastly through manual updating. I expect that, relatively quickly, the last three techniques will become fallback exceptions, at least for formal scholarly articles.
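One way to structure the multi-pass approach suggested above is a simple fallback chain: try each source in order and record which source answered, so that scraped and publisher-supplied results can be told apart later. This is a sketch only; the sources are plain callables you would supply, and none of them is a real CrossRef or DataCite client.

```python
from typing import Callable, Optional, Sequence, Tuple

# A source takes a DOI and returns a license URL, or None when it finds nothing.
LicenseSource = Callable[[str], Optional[str]]


def resolve_license(
    doi: str, sources: Sequence[Tuple[str, LicenseSource]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, source) pair in order; return (license_url, name) of the first hit."""
    for name, source in sources:
        try:
            license_url = source(doi)
        except Exception:
            # Scraping is fragile: treat any failure as a miss and fall through.
            license_url = None
        if license_url:
            return license_url, name
    return None, None
```

A call might look like `resolve_license(doi, [("api", from_crossref_api), ("scrape", from_landing_page), ("supervised", from_review_queue), ("manual", from_manual_entry)])`, where all four source functions are hypothetical placeholders for the passes listed above.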

@wrought
Member

wrought commented Jul 31, 2014

Ah, indeed, I was thinking that at the very least scraped data could be helpful for naive verification of publisher-submitted license data. What's the hit rate? What is the relative coverage?

It's possible others have aggregated some of this data already, would be interesting to see.

@gbilder

gbilder commented Jul 31, 2014

You can see coverage fairly easily:

compare:

http://api.crossref.org/members/98
http://api.crossref.org/members/78

@wrought
Member

wrought commented Jul 31, 2014

Ah, cool, that is handy. However, I meant the relative coverage of license data that is available (and discoverable) via public access (scraping) versus the data submitted by the publisher, rather than the coverage of submitted license data relative to the works those publishers have registered with DOIs.

Perhaps it is an exercise in futility, but it could give you a better idea of the range of articles for which there is currently no easily obtainable license information, for which publisher submissions would reveal new information, and of the rate at which that changes over time.

@Daniel-Mietchen
Member Author

@gbilder The link http://participation.labs.crossref.org/features/tdm provided above to track progress on providing license information does not seem to be persistent. Any pointers on how to get an update?

@gbilder

gbilder commented Oct 6, 2014

Hmm. We had a server die and are still migrating links.

You can get the same data via the API like this:

http://api.crossref.org/members/78/works?filter=has-license:true,has-full-text:true&rows=0

We will fix the link ASAP.
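The filtered query above can also be built and read programmatically. A minimal sketch, assuming list responses report the hit count in `message.total-results` (with `rows=0` no records are returned, only the count); the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://api.crossref.org"  # scheme as used in this thread


def member_works_url(member_id: int, filters: dict, rows: int = 0) -> str:
    """Build /members/{id}/works?filter=k:v,k:v&rows=N (rows=0 requests only the count)."""
    filter_value = ",".join(f"{key}:{value}" for key, value in filters.items())
    # safe=":," keeps the filter readable instead of percent-encoding it.
    query = urllib.parse.urlencode({"filter": filter_value, "rows": rows}, safe=":,")
    return f"{API_BASE}/members/{member_id}/works?{query}"


def total_results(response: dict) -> int:
    """Read the hit count, assuming it sits at message.total-results."""
    return int(response["message"]["total-results"])


def count_member_works(member_id: int, filters: dict, timeout: float = 30.0) -> int:
    """Fetch the filtered query and return its count."""
    url = member_works_url(member_id, filters)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return total_results(json.load(resp))
```

Filter values are passed as strings (`"true"`), since a Python `True` would be rendered as `True` in the URL.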

@Daniel-Mietchen
Member Author

@gbilder any news on this? The API call is nice, but not very handy for getting an overview of progress across publishers.
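For an overview across publishers, one option is to loop over member IDs and compare each member's total work count with its count under `has-license:true`. This is a sketch under assumptions: `count_works` and `license_overview` are hypothetical helpers, counts are read from `message.total-results`, and which member IDs to pass (for example 98 and 78 from the comparison earlier in the thread) is up to you. The counting function is injectable so the aggregation can be checked without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List, Tuple

API_BASE = "http://api.crossref.org"


def count_works(member_id: int, filter_value: str = "") -> int:
    """Return message.total-results for /members/{id}/works, optionally filtered."""
    params = {"rows": 0}
    if filter_value:
        params["filter"] = filter_value
    query = urllib.parse.urlencode(params, safe=":,")
    url = f"{API_BASE}/members/{member_id}/works?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return int(json.load(resp)["message"]["total-results"])


def license_overview(
    member_ids: Iterable[int], count: Callable[[int, str], int] = count_works
) -> List[Tuple[int, int, int, float]]:
    """For each member: (id, total works, works with license data, share with license data)."""
    rows = []
    for member_id in member_ids:
        total = count(member_id, "")
        licensed = count(member_id, "has-license:true")
        share = licensed / total if total else 0.0
        rows.append((member_id, total, licensed, share))
    # Highest share of works carrying license metadata first.
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Printing the result as a table, e.g. `for member_id, total, licensed, share in license_overview([98, 78]): print(f"{member_id}\t{licensed}/{total}\t{share:.1%}")`, gives a rough per-publisher progress view.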

@Daniel-Mietchen
Member Author

Just had a chat with @gbilder who pointed me to
https://github.com/CrossRef/rest-api-doc/blob/master/rest_api_tour.md ,
which explains the API in a very digestible fashion.

For instance,
http://api.crossref.org/licenses
provides an overview of licenses used,
whereas the number of articles available under
http://creativecommons.org/licenses/by/3.0/
can be gauged from
http://api.crossref.org/works?rows=0&filter=license.url:http://creativecommons.org/licenses/by/3.0/ ,
and
http://api.crossref.org/works?rows=100&filter=license.url:http://creativecommons.org/licenses/by/3.0/
provides the first 100 of these.
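The two `license.url` queries above differ only in `rows`, so they can come from one helper. A minimal sketch (names hypothetical) that reproduces exactly those URLs; `safe=":/"` keeps the filter value readable rather than percent-encoding it. Reading the count from `message.total-results` is an assumption about the response layout.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://api.crossref.org"
LICENSES_URL = f"{API_BASE}/licenses"  # overview of the license URLs in use


def works_by_license_url(license_url: str, rows: int = 0) -> str:
    """Build /works?rows=N&filter=license.url:<license_url>.

    rows=0 asks only for the count; rows=100 returns the first 100 records.
    """
    params = {"rows": rows, "filter": f"license.url:{license_url}"}
    return f"{API_BASE}/works?{urllib.parse.urlencode(params, safe=':/')}"


def count_works_under_license(license_url: str, timeout: float = 30.0) -> int:
    """Fetch the rows=0 query and return its count (assumes message.total-results)."""
    url = works_by_license_url(license_url, rows=0)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return int(json.load(resp)["message"]["total-results"])
```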

To check performance, see http://search.crossref.org/help/status .

pinging @wrought @notconfusing
