Add filter doi2cite #178

Merged
merged 9 commits into from
Jun 15, 2021
22 changes: 22 additions & 0 deletions doi2cite/Makefile
@@ -0,0 +1,22 @@
DIFF ?= diff --strip-trailing-cr -u

test:
	@pandoc --lua-filter=doi2cite.lua --wrap=preserve --output=output.md sample1.md
	@$(DIFF) expected1.md output.md
	@rm -f output.md

expected1.md: sample1.md doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --wrap=preserve --output $@ $<

expected1.pdf: sample1.md sample1.csl doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc --csl=sample1.csl --output $@ $<

expected2.md: sample2.md doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --wrap=preserve --output $@ $<

clean:
	@rm -f expected1.md
	@rm -f expected2.md
	@rm -f expected1.pdf

.PHONY: test clean
74 changes: 74 additions & 0 deletions doi2cite/README.md
@@ -0,0 +1,74 @@
# pandoc-doi2cite
This pandoc Lua filter helps users insert references into a document
using DOI (Digital Object Identifier) tags. With this filter, users
do not need to write a BibTeX file themselves. Instead, the filter
automatically generates a bib file from the DOI tags and converts the DOI
tags into citation keys that `--citeproc` can use.

<img src="https://user-images.githubusercontent.com/30950088/121386635-209e2300-c985-11eb-8b1d-8d941e29d98d.png" width="960">

The filter does the following:
1. Search for citations with DOI tags in the document
2. Look up the corresponding BibTeX data in the `__from_DOI.bib` file
3. If not found, get the BibTeX data for the DOI from
http://api.crossref.org
4. Add the reference data to the `__from_DOI.bib` file
5. Check for duplicate reference keys
6. Replace the DOI tags with the corresponding citation keys
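
The duplicate check in step 5 can be sketched roughly as follows. This is a minimal illustration, not the filter's exact code: `unique_key` and the `used` table are hypothetical names, and the sketch assumes that a clashing key is disambiguated by appending the DOI.

```lua
-- Hypothetical helper: return a citation key that is not yet in `used`.
-- Assumption: a clashing key is made unique by appending "_" and the DOI.
local function unique_key(key, doi, used)
  if used[key] then
    key = key .. "_" .. doi
  end
  used[key] = true
  return key
end

local used = {}
print(unique_key("Einstein_1905", "10.1002/andp.19053220607", used))
--> Einstein_1905
print(unique_key("Einstein_1905", "10.1002/andp.19053220806", used))
--> Einstein_1905_10.1002/andp.19053220806
```

Appending the DOI keeps the generated keys stable across runs, because the suffix depends only on the reference itself and not on the order in which citations appear.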

# Prerequisites
- Pandoc version 2.0 or newer
- This filter does not need any external dependencies
- This filter should be executed before `pandoc-crossref` or
`--citeproc`

# DOI tags
The following DOI tags can be used:
- @https://doi.org/
- @doi.org/
- @DOI:
- @doi:

The first one (@https://doi.org/) may be the most useful because it is
the same as the accessible URL.
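
As a rough sketch of how such tags might be recognized, the snippet below strips a known prefix from a citation id and lowercases the remainder. `extract_doi` and `prefixes` are hypothetical names used only for illustration; ids without a DOI prefix are assumed to be ordinary citation keys and are left alone.

```lua
-- Hypothetical helper, for illustration only.
-- The prefix list mirrors the DOI tags listed above (without the "@").
local prefixes = { "https://doi.org/", "doi.org/", "DOI:", "doi:" }

local function extract_doi(id)
  for _, prefix in ipairs(prefixes) do
    if id:sub(1, #prefix) == prefix then
      -- DOIs are case-insensitive, so normalize to lower case
      return id:sub(#prefix + 1):lower()
    end
  end
  return nil  -- not a DOI tag; treat as a normal citation key
end

print(extract_doi("https://doi.org/10.1002/andp.19053220607"))
--> 10.1002/andp.19053220607
print(extract_doi("LAEMMLI_1970"))
--> nil
```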

# YAML header
The file **name** of the auto-generated bibliography file **MUST** be
`__from_DOI.bib`, but the **location** of the file can be changed (e.g.
`'./refs/__from_DOI.bib'` or `'refs\\__from_DOI.bib'` on Windows). You
can set the file path in the document's YAML header. The YAML key
is `bibliography`, which is also used by `--citeproc`.
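
One way to picture the file-name rule is the sketch below, which picks the first `bibliography` entry whose file name is `__from_DOI.bib`, whatever its directory. `basename` and `find_generated_bib` are hypothetical helpers, and the sketch assumes the paths have already been read from the YAML header into a plain Lua list.

```lua
-- Hypothetical helpers, for illustration only.
-- Return the part of a path after the last "/" or "\".
local function basename(path)
  return path:match("[^/\\]+$")
end

-- Return the first path whose file name is "__from_DOI.bib", or nil.
local function find_generated_bib(paths)
  for _, path in ipairs(paths) do
    if basename(path) == "__from_DOI.bib" then
      return path
    end
  end
  return nil
end

print(find_generated_bib({ "my_refs.bib", "./refs/__from_DOI.bib" }))
--> ./refs/__from_DOI.bib
```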

# Example
example1.md:
```{.md}
---
bibliography:
- 'my_refs.bib'
- '__from_DOI.bib'
---

# Introduction
The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
By the way, Einstein is genius.[@https://doi.org/10.1002/andp.19053220607; @doi.org/10.1002/andp.19053220806; @doi:10.1002/andp.19053221004]
```

Example command 1 (.md -\> .md)

``` {.sh}
pandoc --lua-filter=doi2cite.lua --wrap=preserve \
-s example1.md -o expected1.md
```

Example command 2 (.md -\> .pdf with
[ACS](https://pubs.acs.org/journal/jacsat) style):

``` {.sh}
pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc \
--csl=sample1.csl -s example1.md -o expected1.pdf
```

Example result
![expected1](https://user-images.githubusercontent.com/30950088/119964566-4d952200-bfe4-11eb-90d9-ed2366c639e8.png)
36 changes: 36 additions & 0 deletions doi2cite/__from_DOI.bib
@@ -0,0 +1,36 @@
@article{Einstein_1905,
doi = {10.1002/andp.19053220607},
url = {https://doi.org/10.1002%2Fandp.19053220607},
year = 1905,
publisher = {Wiley},
volume = {322},
number = {6},
pages = {132--148},
author = {A. Einstein},
title = {Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt},
journal = {Annalen der Physik}
}
@article{Einstein_1905_10.1002/andp.19053220806,
doi = {10.1002/andp.19053220806},
url = {https://doi.org/10.1002%2Fandp.19053220806},
year = 1905,
publisher = {Wiley},
volume = {322},
number = {8},
pages = {549--560},
author = {A. Einstein},
title = {Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen},
journal = {Annalen der Physik}
}
@article{Einstein_1905_10.1002/andp.19053221004,
doi = {10.1002/andp.19053221004},
url = {https://doi.org/10.1002%2Fandp.19053221004},
year = 1905,
publisher = {Wiley},
volume = {322},
number = {10},
pages = {891--921},
author = {A. Einstein},
title = {Zur Elektrodynamik bewegter Körper},
journal = {Annalen der Physik}
}
252 changes: 252 additions & 0 deletions doi2cite/doi2cite.lua
@@ -0,0 +1,252 @@
--------------------------------------------------------------------------------
-- Copyright © 2021 Takuro Hosomi
-- This library is free software; you can redistribute it and/or modify it
-- under the terms of the MIT license. See LICENSE for details.
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
-- Global variables --
--------------------------------------------------------------------------------
base_url = "http://api.crossref.org"
mailto = "pandoc.doi2cite@gmail.com"
bibname = "__from_DOI.bib"
key_list = {};
doi_key_map = {};
doi_entry_map = {};
error_strs = {};
error_strs["Resource not found."] = 404
error_strs["No acceptable resource available."] = 406
error_strs["<html><body><h1>503 Service Unavailable</h1>\n"
.."No server is available to handle this request.\n"
.."</body></html>"] = 503


--------------------------------------------------------------------------------
-- Pandoc Functions --
--------------------------------------------------------------------------------
-- Get bibliography filepath from yaml metadata
function Meta(m)
local bib_data = m.bibliography
local bibpaths = get_paths_from(bib_data)
bibpath = find_filepath(bibname, bibpaths)
bibpath = verify_path(bibpath)
local f = io.open(bibpath, "r")
if f then
local entries_str = f:read('*all')
if entries_str then
doi_entry_map = get_doi_entry_map(entries_str)
doi_key_map = get_doi_key_map(entries_str)
for doi,key in pairs(doi_key_map) do
key_list[key] = true
end
end
f:close()
else
make_new_file(bibpath)
end
end

-- Get bibtex data of doi-based citation.id and make bibliography.
-- Then, replace "citation.id"
function Cite(c)
for _, citation in pairs(c.citations) do
local id = citation.id:gsub('%s+', ''):gsub('%%2F', '/')
-- Keep the DOI local so that it does not leak between citations
local doi = nil
if id:sub(1,16) == "https://doi.org/" then
doi = id:sub(17):lower()
elseif id:sub(1,8) == "doi.org/" then
doi = id:sub(9):lower()
elseif id:sub(1,4) == "DOI:" or id:sub(1,4) == "doi:" then
doi = id:sub(5):lower()
end
if doi then
if doi_key_map[doi] then
citation.id = doi_key_map[doi]
else
local entry_str = get_bibentry(doi)
if entry_str == nil or error_strs[entry_str] then
print("Failed to get ref from DOI: " .. doi)
else
entry_str = tex2raw(entry_str)
local entry_key = get_entrykey(entry_str)
if key_list[entry_key] then
entry_key = entry_key.."_"..doi
entry_str = replace_entrykey(entry_str, entry_key)
end
key_list[entry_key] = true
doi_key_map[doi] = entry_key
citation.id = entry_key
local f = io.open(bibpath, "a+")
if f then
f:write(entry_str .. "\n")
f:close()
else
error("Unable to open file: "..bibpath)
end
end
end
end
end
return c
end


--------------------------------------------------------------------------------
-- Common Functions --
--------------------------------------------------------------------------------
-- Get bib of DOI from http://api.crossref.org
function get_bibentry(doi)
local entry_str = doi_entry_map[doi]
if entry_str == nil then
print("Request DOI: " .. doi)
local url = base_url.."/works/"
..doi.."/transform/application/x-bibtex"
.."?mailto="..mailto
-- The MIME type is not used; keep it local instead of leaking a global
local _mimetype
_mimetype, entry_str = pandoc.mediabag.fetch(url)
end
return entry_str
end

-- Extract designated filepaths from 1 or 2 dimensional metadata
function get_paths_from(metadata)
local filepaths = {};
if metadata then
if metadata[1] and metadata[1].text then
filepaths[metadata[1].text] = true
elseif type(metadata) == "table" then
for _, datum in pairs(metadata) do
if datum[1] then
if datum[1].text then
filepaths[datum[1].text] = true
end
end
end
end
end
return filepaths
end

-- Extract filename and dirname from a given a path
function split_path(filepath)
local delim = nil
local len = filepath:len()
local reversed = filepath:reverse()
if filepath:find("/") then
delim = "/"
elseif filepath:find([[\]]) then
delim = [[\]]
else
return {filename = filepath, dirname = nil}
end
local pos = reversed:find(delim)
local dirname = filepath:sub(1, len - pos)
local filename = reversed:sub(1, pos - 1):reverse()
return {filename = filename, dirname = dirname}
end

-- Find filename in a given filepath list and return the filepath if found
function find_filepath(filename, filepaths)
for path, _ in pairs(filepaths) do
-- Compare against the requested filename, not the global bibname
local candidate = split_path(path)["filename"]
if candidate == filename then
return path
end
end
return nil
end

-- Make some TeX descriptions processable by citeproc
function tex2raw(str)
local symbols = {};
-- Backslashes are doubled: "\t" alone would be read as a tab character
symbols["{\\textendash}"] = "–"
symbols["{\\textemdash}"] = "—"
symbols["{\\textquoteright}"] = "’"
symbols["{\\textquoteleft}"] = "‘"
for tex, raw in pairs(symbols) do
-- Assign back to str; a new local here would discard the result
str = str:gsub(tex, raw)
end
return str
end

-- get bibtex entry key from bibtex entry string
function get_entrykey(entry_string)
local key = entry_string:match('@%w+{(.-),') or ''
return key
end

-- get bibtex entry doi (lowercased) from bibtex entry string
function get_entrydoi(entry_string)
local doi = entry_string:match('doi%s*=%s*["{]*(.-)["}],?') or ''
-- Cite() looks up lowercased DOIs, so normalize the stored DOI as well
return doi:lower()
end

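-- Replace entry key of "entry_string" with newkey
function replace_entrykey(entry_string, newkey)
-- Escape "%" so that it is not read as a capture reference by gsub
local safe_key = newkey:gsub('%%', '%%%%')
-- Only the first match (the entry header) is replaced
entry_string = entry_string:gsub('(@%w+{).-(,)', '%1'..safe_key..'%2', 1)
return entry_string
end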

-- Make hashmap which key = DOI, value = bibtex entry string
function get_doi_entry_map(bibtex_string)
local entries = {};
for entry_str in bibtex_string:gmatch('@.-\n}\n') do
local doi = get_entrydoi(entry_str)
entries[doi] = entry_str
end
return entries
end

-- Make hashmap which key = DOI, value = bibtex key string
function get_doi_key_map(bibtex_string)
local keys = {};
for entry_str in bibtex_string:gmatch('@.-\n}\n') do
local doi = get_entrydoi(entry_str)
local key = get_entrykey(entry_str)
keys[doi] = key
end
return keys
end

-- function to make directories and files
function make_new_file(filepath)
if filepath then
print("doi2cite: creating "..filepath)
local dirname = split_path(filepath)["dirname"]
if dirname then
-- Quote the directory name so that paths containing spaces work
os.execute('mkdir "'..dirname..'"')
end
local f = io.open(filepath, "w")
if f then
f:close()
else
error("Unable to make bibtex file: "..filepath..".\n"
.."This error may come from the missing directory. \n"
)
end
end
end

-- Verify that the given filepath is correct.
-- Catch common Pandoc user mistakes about Windows-formatted filepath.
function verify_path(bibpath)
if bibpath == nil then
print("[WARNING] doi2cite: "
.."The given file path is incorrect or empty. "
.."In Windows-formatted filepath, Pandoc recognizes "
.."double backslash ("..[[\\]]..") as the delimiters."
)
return "__from_DOI.bib"
else
return bibpath
end
end

--------------------------------------------------------------------------------
-- The main function --
--------------------------------------------------------------------------------
return {
{ Meta = Meta },
{ Cite = Cite }
}
4 changes: 4 additions & 0 deletions doi2cite/expected1.md
@@ -0,0 +1,4 @@
# Introduction

The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
By the way, Einstein is genius.[@Einstein_1905; @Einstein_1905_10.1002/andp.19053220806; @Einstein_1905_10.1002/andp.19053221004]
Binary file added doi2cite/expected1.pdf
Binary file not shown.
3 changes: 3 additions & 0 deletions doi2cite/expected2.md
@@ -0,0 +1,3 @@
# Introduction

People sometimes make mistakes.[@DOI:10.1002/THIS.IS.NOT.VALID.DOI.SAMPLE]