Error using parse_pubmed_xml #95

Open
sublimotion opened this issue Oct 22, 2020 · 5 comments

@sublimotion

sublimotion commented Oct 22, 2020

It looks like the pubmed parser doesn't support the pubmed baseline files?

I get the error below. It also doesn't look like the test file is using a similar file format.

pubmed_dict = pp.parse_pubmed_xml('./data/pubmed20n1015.xml') # dictionary output
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-2b4cea8c6fb9> in <module>
----> 1 pubmed_dict = pp.parse_pubmed_xml('./data/pubmed20n1015.xml') # dictionary output

~/anaconda3/envs/python3/lib/python3.6/site-packages/pubmed_parser/pubmed_oa_parser.py in parse_pubmed_xml(path, include_path, nxml)
    155         journal = ""
    156 
--> 157     dict_article_meta = parse_article_meta(tree)
    158     pub_year_node = tree.find(".//pub-date/year")
    159     pub_year = pub_year_node.text if pub_year_node is not None else ""

~/anaconda3/envs/python3/lib/python3.6/site-packages/pubmed_parser/pubmed_oa_parser.py in parse_article_meta(tree)
     67     """
     68     article_meta = tree.find(".//article-meta")
---> 69     pmid_node = article_meta.find('article-id[@pub-id-type="pmid"]')
     70     pmc_node = article_meta.find('article-id[@pub-id-type="pmc"]')
     71     pub_id_node = article_meta.find('article-id[@pub-id-type="publisher-id"]')

AttributeError: 'NoneType' object has no attribute 'find'
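The traceback shows parse_pubmed_xml calling tree.find(".//article-meta") and then calling .find() on the result. The <article-meta> element belongs to PMC Open Access (nxml) articles. MEDLINE/PubMed baseline files such as pubmed20n1015.xml have a <PubmedArticleSet> root and no such element, so the lookup returns None.

Below is a minimal sketch for checking which kind of file you have before picking a parser. It assumes the root tag alone is enough to tell the two formats apart. detect_xml_format is a hypothetical helper and is not part of pubmed_parser.

```python
import gzip
import xml.etree.ElementTree as ET


def detect_xml_format(path):
    """Guess whether an XML file is a MEDLINE/PubMed baseline file or a PMC OA article.

    Hypothetical helper: it only inspects the root element's tag, assuming
    baseline files start with <PubmedArticleSet> and PMC OA files with <article>.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        # The first "start" event is the root element, so we can stop there
        # without reading the (potentially very large) rest of the file.
        for _, elem in ET.iterparse(handle, events=("start",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop a namespace prefix, if any
            if tag == "PubmedArticleSet":
                return "medline"
            if tag == "article":
                return "pmc_oa"
            return "unknown"
    return "unknown"
```

A result of "medline" would point to pp.parse_medline_xml, which is reported later in this thread to work on baseline files. A result of "pmc_oa" would point to pp.parse_pubmed_xml.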
@GanjinZero

Same issue here.

@kmnis

kmnis commented Nov 10, 2020

I think the format of the XML files has changed since these scripts were written. The XML structure and the attributes in the current PubMed data are completely different from the way they are processed here. I had to write an XML parser from scratch for my project.

@titipata
Owner

@kmnis @GanjinZero @sublimotion thanks so much for the report! I haven't checked the script in a while, but it would be great to check whether the current script is in sync with the current MEDLINE baseline structure.

@raypereda-gr
Contributor

I ran into the same issue as @sublimotion and @GanjinZero when running parse_pubmed_xml on a baseline file downloaded last week, pubmed21n1298.xml.

dict_out = pp.parse_pubmed_xml('data/pubmed21n1298.xml') # errors

I can confirm that parse_medline_xml() parses the same file without errors and returns useful output, so I believe parse_medline_xml is in sync with the current MEDLINE baseline structure. I hope this helps, @titipata.

import pubmed_parser as pp
from pprint import pprint

dict_out = pp.parse_medline_xml('data/pubmed21n1298.xml')  # list of dicts, one per article
pprint(dict_out[0])

OUTPUT:
{'abstract': 'BACKGROUND\n'
             'Drugs of abuse have a common property in mammals, which is their '
             'ability to facilitate the release of the neurotransmitter and '
             'neuromodulator dopamine in specific brain regions involved in '
             'reward and motivation. This increase in synaptic dopamine levels '
             'is believed to act as a positive reinforcer and to mediate some '
             'of the acute responses to drugs. The mechanisms by which '
             'dopamine regulates acute drug responses and addiction remain '
             'unknown.\n'
             '\n'
             '\n'
             'RESULTS\n'
             'We present evidence that dopamine plays a role in the responses '
             'of Drosophila to cocaine, nicotine or ethanol. We used a '
             'startle-induced negative geotaxis assay and a locomotor tracking '
             'system to measure the effect of psychostimulants on fly '
             'behavior. Using these assays, we show that acute responses to '
             'cocaine and nicotine are blunted by pharmacologically induced '
             'reductions in dopamine levels. Cocaine and nicotine showed a '
             'high degree of synergy in their effects, which is consistent '
             'with an action through convergent pathways. In addition, we '
             'found that dopamine is involved in the acute '
             'locomotor-activating effect, but not the sedating effect, of '
             'ethanol.\n'
             '\n'
             '\n'
             'CONCLUSIONS\n'
             'We show that in Drosophila, as in mammals, dopaminergic pathways '
             'play a role in modulating specific behavioral responses to '
             'cocaine, nicotine or ethanol. We therefore suggest that '
             'Drosophila can be used as a genetically tractable model system '
             'in which to study the mechanisms underlying behavioral responses '
             'to multiple drugs of abuse.',
 'affiliations': 'Department of Anesthesia, University of California San '
                 'Francisco, California 94143-0452, USA.',
 'authors': 'RJ Bainton;LT Tsai;CM Singh;MS Moore;WS Neckameyer;U Heberlein',
 'chemical_list': 'D000431:Ethanol; D009538:Nicotine; D003042:Cocaine; '
                  'D004298:Dopamine',
 'country': 'England',
 'delete': False,
 'doi': '10.1016/s0960-9822(00)00336-5',
 'issn_linking': '0960-9822',
 'journal': 'Current biology : CB',
 'keywords': '',
 'medline_ta': 'Curr Biol',
 'mesh_terms': 'D000818:Animals; D001522:Behavior, Animal; D003042:Cocaine; '
               'D004298:Dopamine; D004330:Drosophila; D000431:Ethanol; '
               'D008297:Male; D009538:Nicotine',
 'nlm_unique_id': '9107782',
 'other_id': '',
 'pmc': '',
 'pmid': '10704411',
 'pubdate': '2000',
 'publication_types': 'D016428:Journal Article; D013486:Research Support, U.S. '
                      "Gov't, Non-P.H.S.; D013487:Research Support, U.S. "
                      "Gov't, P.H.S.",
 'references': '',
 'title': 'Dopamine modulates acute responses to cocaine, nicotine and ethanol '
          'in Drosophila.'}
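In the output above, multi-valued fields come back as single strings. The authors field is separated by ";". The mesh_terms, chemical_list and publication_types fields use "; " between "ID:name" pairs.

Here is a small sketch for turning these into Python structures. It assumes that delimiter convention holds for every record; the convention is inferred from this one example, not from pubmed_parser's documentation. split_field and split_id_pairs are hypothetical helpers.

```python
def split_field(value):
    """Split a ';'-delimited string into a list of non-empty, stripped items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def split_id_pairs(value):
    """Turn 'D000431:Ethanol; D009538:Nicotine' into {'D000431': 'Ethanol', ...}.

    Only the first ':' in each item is treated as the separator.
    """
    pairs = {}
    for item in split_field(value):
        key, _, name = item.partition(":")
        pairs[key] = name
    return pairs


# Values copied from the record printed above
record = {
    "authors": "RJ Bainton;LT Tsai;CM Singh;MS Moore;WS Neckameyer;U Heberlein",
    "chemical_list": "D000431:Ethanol; D009538:Nicotine; D003042:Cocaine; D004298:Dopamine",
    "publication_types": "D016428:Journal Article; D013486:Research Support, U.S. "
                         "Gov't, Non-P.H.S.; D013487:Research Support, U.S. Gov't, P.H.S.",
}

authors = split_field(record["authors"])
chemicals = split_id_pairs(record["chemical_list"])
pub_types = split_id_pairs(record["publication_types"])
```

Splitting on ";" rather than "," matters for publication_types, because names such as "Research Support, U.S. Gov't, Non-P.H.S." contain commas. Empty fields such as keywords become an empty list.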

@titipata
Owner

Thanks @raypereda-gr! Would it be possible to open a PR with a test file that follows the new structure of the MEDLINE database? I can also take a look into it further and change the test file accordingly.
