Text extraction from PDF to annotation not working #6011

Closed
funnym0nk3y opened this issue Feb 24, 2020 · 5 comments
Labels
status: stale (Issues marked by a bot as "stale". All issues need to be investigated manually.)
status: waiting-for-feedback (The submitter or other users need to provide more information about the issue.)

Comments

@funnym0nk3y

JabRef version:
JabRef 5.0-beta.438--2020-02-20--5fa1dcf
Windows 10 10.0 amd64
Java 13.0.2

Hi there,

While using JabRef I found a bug in the annotations feature. I highlighted several passages in my PDF file, and they appeared in the annotations tab as expected. However, the text extracted from those annotations did not match the highlighted text: characters are missing throughout the marked area, and passages that were not marked are included. Copying the same text directly from Adobe Reader DC (right click > copy text) and pasting it into Notepad works fine.

Example:
While these results are excellent from an academic perspective, scaling up the overall cell device area is crucial for achieving practical utility for hybrid perovskite based thin-film solar cells. In this paper, we present a comprehensive study on the use of temperature-controlled doctor blading technique for the growth of large island, crystalline perovskite thin-films. Specifically, we elucidate the physical conditions such as substrate temperature, solution volume, and blade speed under ambient conditions that control the growth of large area perovskite thin-films with desired island size, thickness, uniformity and crystallinity. Using these doctor-bladed thin-films we fabricated devices of ∼1 cm² area in air that yielded an average efficiency of 7.32% with negligible hysteresis in the current-voltage scans. Further improvements in

Displayed in JabRef:
ng up the overall rea is crucial for achieving practical utility for hybrid perovskite based thin-film solar cells.
r, we present a comprehensive study on the use of temperature-controlled doctor blading r the growth of large island

When exported via Acrobat Reader:
Specifically, we elucidate the
physical conditions such as substrate temperature, solution volume, and blade speed under ambient
conditions that control the growth of large area perovskite thin-films with desired island size, thickness,
uniformity and crystallinity

Regards,
funnym0nk3y

@Siedlerchr
Member

Could you attach the PDF or give us a link to it? Is this the only PDF where you noticed this, or do you experience it with others as well?

Siedlerchr added the status: waiting-for-feedback label Feb 25, 2020
@funnym0nk3y
Author

I experienced this behavior with several PDFs.
Unfortunately I can't attach the PDF here because of copyright protection. But you can find the files here:
https://pubs.acs.org/doi/10.1021/la803646e
https://www.sciencedirect.com/science/article/pii/S2352940716300038?via%3Dihub
If you don't have access through your institution, I'll try to find some open-access examples.

Besides that, I noticed difficulties with some characters such as µ and °, with sub-/superscripts, and with expressions like 4,23 x 10^2. But that is just an annoyance.

@smihael

smihael commented Mar 28, 2020

I can reproduce this with the following test.pdf file in a test.bib library.

I am using:
JabRef 5.0--2020-03-06--2e6f433
Linux 5.3.0-42-generic amd64
Java 13.0.2

PDF was generated using LibreOffice and annotations were added in Okular.

@github-actions
Contributor

This issue will be closed in 7 days due to inactivity 💤 Please provide the requested information if the problem persists.

github-actions bot added the status: stale label Apr 28, 2020
@funnym0nk3y
Author

@smihael provided an example PDF; please update the status.

github-actions bot closed this as completed May 6, 2020