Notes

TODO

  • PDF extracts
  • DOCX extracts
  • Text extracts
  • RTF extracts
  • Implement lazy validation of DOI via doi.org APIs
  • Address partial DOI extracts from PDF split text fields
  • Consider DOI validation to improve accurate extraction

Live validation of DOI

Valid: https://doi.org/api/handles/10.1177/0020720920940575

{
  "responseCode": 1,
  "handle": "10.1177/0020720920940575",
  "values": [
    {
      "index": 1,
      "type": "URL",
      "data": {
        "format": "string",
        "value": "http://journals.sagepub.com/doi/10.1177/0020720920940575"
      },
      "ttl": 86400,
      "timestamp": "2020-07-29T04:52:30Z"
    },
    {
      "index": 700050,
      "type": "700050",
      "data": {
        "format": "string",
        "value": "2020072821515400292"
      },
      "ttl": 86400,
      "timestamp": "2020-07-29T04:52:30Z"
    },
    {
      "index": 100,
      "type": "HS_ADMIN",
      "data": {
        "format": "admin",
        "value": {
          "handle": "0.na/10.1177",
          "index": 200,
          "permissions": "111111110010"
        }
      },
      "ttl": 86400,
      "timestamp": "2020-07-29T04:52:30Z"
    }
  ]
}

Or narrow the request with the "type" query parameter, though that's probably not important.

Valid, URL only: https://doi.org/api/handles/10.1177/0020720920940575?type=URL

{
  "responseCode": 1,
  "handle": "10.1177/0020720920940575",
  "values": [
    {
      "index": 1,
      "type": "URL",
      "data": {
        "format": "string",
        "value": "http://journals.sagepub.com/doi/10.1177/0020720920940575"
      },
      "ttl": 86400,
      "timestamp": "2020-07-29T04:52:30Z"
    }
  ]
}

Probably all that matters is that we don't get the invalid ("responseCode": 100) response.

Invalid: https://doi.org/api/handles/10.1177/5555555555555555

{
  "responseCode": 100,
  "handle": "10.1177/5555555555555555"
}
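The valid and invalid payloads above can be told apart with a couple of small helpers. This is only a sketch that works on already-parsed JSON; the function names `is_not_found` and `resolved_url` are made up for illustration and are not part of any existing module.

```python
from typing import Any, Optional

# responseCode seen in the "Invalid" example above.
HANDLE_NOT_FOUND = 100


def is_not_found(payload: dict[str, Any]) -> bool:
    """True when the proxy answered with the 'invalid' shape (responseCode 100)."""
    return payload.get("responseCode") == HANDLE_NOT_FOUND


def resolved_url(payload: dict[str, Any]) -> Optional[str]:
    """Return the first value of type "URL", or None if the payload has none."""
    for value in payload.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None
```

Because an invalid payload has no "values" key at all, `resolved_url` falls through to `None` rather than raising.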

Documentation: https://www.doi.org/the-identifier/resources/factsheets/doi-resolution-documentation

Response Codes

1 : Success. (HTTP 200 OK)
2 : Error. Something unexpected went wrong during handle resolution. (HTTP 500 Internal Server Error)
100 : Handle Not Found. (HTTP 404 Not Found)
200 : Values Not Found. The handle exists but has no values (or no values according to the types and indices specified). (HTTP 200 OK)
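A rough sketch of what lazy validation against these codes might look like, using only the standard library. Everything named here (`handle_url`, `fetch_payload`, `classify`, `doi_status`, and the status labels) is hypothetical. One detail worth noting: since code 100 arrives with HTTP 404, `urllib` raises `HTTPError`, so the sketch reads the JSON body from the exception instead of treating it as a failure.

```python
import json
from functools import lru_cache
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://doi.org/api/handles/"

# Labels are our own; the numeric codes come from the list above.
STATUS_BY_CODE = {
    1: "registered",
    2: "error",
    100: "not_found",
    200: "registered_no_values",
}


def handle_url(doi: str) -> str:
    """Build the lookup URL, percent-encoding anything unsafe except '/'."""
    return API_ROOT + quote(doi, safe="/")


def fetch_payload(doi: str) -> dict:
    """Fetch the JSON payload; 404/500 responses still carry a body."""
    try:
        with urlopen(handle_url(doi), timeout=10) as response:
            return json.load(response)
    except HTTPError as err:
        try:
            return json.load(err)
        except ValueError:
            return {}  # body was not JSON; classify() reports "unknown"


def classify(payload: dict) -> str:
    """Map a payload's responseCode to one of the labels above."""
    return STATUS_BY_CODE.get(payload.get("responseCode"), "unknown")


@lru_cache(maxsize=None)
def doi_status(doi: str) -> str:
    """Look a DOI up only when asked, and at most once per process."""
    return classify(fetch_payload(doi))
```

Caching with `lru_cache` is one way to keep the check lazy: nothing is requested until a DOI is actually queried, and repeated citations of the same DOI cost a single request.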

Formats

Reporting

Do we want to give...

  • Page number (pdf/word)?
  • Line number (text)?
  • Stringed context with highlighting?
  • Retraction Watch records, for each entry:
    • Sort by date
    • RetractionNature
    • RetractionDate -- change from datetime
    • RetractionDOI
    • RetractionReasons?
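One possible shape for the per-entry record report, as a sketch only. It assumes each record is a dict keyed by the field names listed above, that `RetractionDate` has already been parsed into a `datetime`, and that "change from datetime" means reporting a plain date; `as_date` and `report_rows` are hypothetical names.

```python
from datetime import date, datetime


def as_date(value: datetime) -> date:
    """Drop the time component, which carries no information for a retraction."""
    return value.date()


def report_rows(records: list[dict]) -> list[dict]:
    """Sort records by RetractionDate and keep only the fields to report."""
    ordered = sorted(records, key=lambda record: record["RetractionDate"])
    return [
        {
            "RetractionDate": as_date(record["RetractionDate"]).isoformat(),
            "RetractionNature": record["RetractionNature"],
            "RetractionDOI": record["RetractionDOI"],
            # Still an open question above, so tolerate its absence.
            "RetractionReasons": record.get("RetractionReasons"),
        }
        for record in ordered
    ]
```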

Demo

  • Server in separate repository (in progress)
  • Binder/colab notebook for easy Python sampling

API

We could alternatively query the Crossref API once we have DOIs in hand.
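As a starting point, a Crossref lookup URL could be built like this. The sketch assumes the public `https://api.crossref.org/works/{doi}` route and the optional `mailto` query parameter that Crossref asks polite clients to send; `crossref_work_url` is a hypothetical helper name, and which fields of the response would be useful for retraction checks is left open.

```python
from typing import Optional
from urllib.parse import quote, urlencode

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_work_url(doi: str, mailto: Optional[str] = None) -> str:
    """Build the works URL for one DOI, optionally identifying the caller."""
    url = CROSSREF_WORKS + quote(doi, safe="/")
    if mailto:
        url += "?" + urlencode({"mailto": mailto})
    return url
```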

Discussion of citation practice & policy

COPE Case number 15-17, "Citing a retracted paper," https://publicationethics.org/case/citing-retracted-paper:

They are presumably asking whether a paper citing a retracted paper is to be considered sound? We think this should be a question for the peer reviewers. Perhaps, as a responsible editor, they should point out to the reviewers that one of the references has been retracted. The reviewers could then decide whether this was a key reference supporting the crux of the current paper or whether it was merely something that could be deleted or replaced with something more suitable.

On the somewhat more philosophical question of whether a retracted paper should ever be cited, there may be legitimate cases where one would want to cite a retracted article. It comes down to why you cite something; as a way of noting something that happened previously, would be fine. If writing a paper about retractions, for example, one might quite reasonably want to cite some key retracted papers to illustrate the issues involved. However, it is very important to mark the paper as retracted in the reference section so this is clearly marked for readers (eg, Author AB, et al. RETRACTED: Title of article. Journal name. 2015, 100: 1-7.)

Analogous use of RW database at Wikipedia

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2024-06-08/Special_report