Bug report - XML citations from website #127

Open
chungimungi opened this issue Oct 11, 2023 · 1 comment

chungimungi commented Oct 11, 2023

Error in: Parse Outgoing XML citations from website

For many of the PMIDs, this error is shown:

[Screenshot: error message]

```python
import csv
import multiprocessing
import pubmed_parser as pp

def write_to_file(f, pmid, result):
    try:
        if isinstance(result, dict) and "pmid_cited" in result:
            f.write(f'## PMID : {pmid}\n')
            f.write(f'PMID CITED : {result["pmid_cited"]}\n')
            # You can add more information from `result` here if needed
        elif isinstance(result, str):
            # process_pmid() returns the error message as a string
            f.write(f'Error processing PMID {pmid}: {result}\n')
        else:
            # Anything else (e.g. None, or a dict without "pmid_cited")
            f.write(f'Error processing PMID {pmid}: Invalid result format ({result!r})\n')
    except Exception as e:
        # Never let an exception escape the callback: it would break the pool
        f.write(f'Error processing PMID {pmid}: {str(e)}\n')

def process_pmid(pmid):
    # Return the PMID together with its result, so the callback does not
    # depend on the loop variable `pmid` in the main process.
    try:
        return pmid, pp.parse_outgoing_citation_web(pmid, id_type='PMID')
    except Exception as e:
        return pmid, f'{type(e).__name__}: {e}'

if __name__ == '__main__':
    # Output Markdown file
    output_file = 'out1.md'

    # Open the output file for writing
    with open(output_file, 'w') as f:
        # Write Markdown headers or other content here if needed
        f.write("# Outgoing Citations\n")

        # Open and read the CSV file with PMID values
        with open('pmidfinal.csv', 'r', newline='') as csvfile:
            csvreader = csv.reader(csvfile)

            # Skip the first 16021 rows (already processed in an earlier run)
            for _ in range(16021):
                next(csvreader, None)

            # Create a multiprocessing pool
            pool = multiprocessing.Pool()

            for row in csvreader:
                if row:
                    pmid = str(row[0])  # Assuming the 'PMID' column is the first (index 0) column
                    # A lambda that captures `pmid` is late-bound: it would write
                    # whatever PMID the loop is on when the callback fires.
                    # Unpacking the (pmid, result) tuple avoids that.
                    pool.apply_async(process_pmid, args=(pmid,),
                                     callback=lambda res: write_to_file(f, *res))

            pool.close()
            pool.join()  # also waits for all pending callbacks

    print("Process Complete")
```

This is my code for the parser. (I skipped the first 16021 rows because I had already retrieved the data for the PMIDs before them.)
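Hard-coding the number of rows to skip works, but the count has to be updated by hand after every run. A minimal sketch of an alternative, assuming the earlier run's Markdown file uses the same `## PMID : <id>` headers that `write_to_file` writes: collect the PMIDs that already have a section, then yield only the remaining PMIDs from the CSV. The helper names `already_done` and `pending_pmids` are hypothetical, not part of pubmed_parser.

```python
import csv
import re
from pathlib import Path

# Matches the "## PMID : <id>" headers written for successful results.
HEADER_RE = re.compile(r"^## PMID : (\S+)\s*$")

def already_done(markdown_path):
    """Return the set of PMIDs that already have a section in an earlier output file."""
    path = Path(markdown_path)
    if not path.exists():
        return set()
    done = set()
    with path.open() as f:
        for line in f:
            m = HEADER_RE.match(line)
            if m:
                done.add(m.group(1))
    return done

def pending_pmids(csv_path, done):
    """Yield PMIDs from the first CSV column that are not in `done`."""
    with open(csv_path, newline="") as csvfile:
        for row in csv.reader(csvfile):
            if not row:
                continue  # blank line
            pmid = row[0].strip()
            # isdigit() also drops a header row such as "PMID"
            if pmid.isdigit() and pmid not in done:
                yield pmid
```

Because only successful results get a `## PMID` header, PMIDs that failed in an earlier run are retried automatically. Write the new results to a different file (or open it in `'a'` mode) so the earlier output is not overwritten.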

I have a CSV file containing only PMIDs:

[Screenshot: contents of the CSV file]

This is how it looks. All PMIDs were taken from PubMed's OA subset.
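One possible cause, not confirmed in this thread: NCBI's E-utilities accept roughly 3 requests per second without an API key (10 with one), and `multiprocessing.Pool()` starts `os.cpu_count()` workers by default, so the script above may send requests faster than that and get rejected responses that the parser cannot read. A minimal sketch, assuming `parse_outgoing_citation_web` queries E-utilities, of a sequential loop that stays under a given rate; `throttled_map` is a hypothetical helper and the default of 3 calls per second reflects the keyless limit.

```python
import time

def throttled_map(fetch, pmids, max_per_second=3.0):
    """Call fetch(pmid) one at a time, at most `max_per_second` calls per second.

    Yields (pmid, result) pairs. If fetch raises, the exception object is
    yielded as the result so one bad PMID does not stop the whole run.
    """
    min_interval = 1.0 / max_per_second
    last_call = None
    for pmid in pmids:
        if last_call is not None:
            # Sleep only for whatever part of the interval has not yet passed
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        try:
            result = fetch(pmid)
        except Exception as exc:
            result = exc
        yield pmid, result
```

With pubmed_parser this might be called as `throttled_map(lambda p: pp.parse_outgoing_citation_web(p, id_type='PMID'), pmids)`, passing each `(pmid, result)` pair to `write_to_file`. It is slower than the pool, but a throttled run that succeeds can finish sooner than a fast run that has to be repeated.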

@chungimungi changed the title from "Bug report" to "Bug report - XML citations from website" on Oct 11, 2023
@titipata (Owner)

Thanks @chungimungi! I do not have time to look at the code. However, it seems we need to check `parse_outgoing_citation_web` to see what goes wrong. The XML format may have changed quite a bit since I last wrote this code.
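To see what `parse_outgoing_citation_web` is actually receiving, one option is to fetch the raw response for a failing PMID yourself (for example with `urllib.request.urlopen` on an eLink URL such as `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&linkname=pubmed_pubmed_refs&id=<PMID>`; this URL is an assumption and may not match what pubmed_parser requests) and classify the body. A minimal sketch using only the standard library; the element names follow NCBI's eLink XML format, and `summarize_elink_response` is a hypothetical helper. A rate-limited request may come back as a short JSON message rather than XML, which would also break an XML parser.

```python
import json
import xml.etree.ElementTree as ET

def summarize_elink_response(body):
    """Classify a raw eLink response body (str or bytes).

    Returns a dict with a 'status' key:
      'ok'           -> 'linked_ids' lists every <LinkSetDb>/<Link>/<Id> value
      'eutils_error' -> 'message' joins the text of any <ERROR> elements
      'not_xml'      -> 'body' holds the decoded JSON, or the first 200 characters
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        try:
            payload = json.loads(body)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            payload = body[:200]
        return {"status": "not_xml", "body": payload}

    errors = [el.text for el in root.iter("ERROR") if el.text]
    if errors:
        return {"status": "eutils_error", "message": "; ".join(errors)}

    linked = [el.text for el in root.iterfind(".//LinkSetDb/Link/Id")]
    return {"status": "ok", "linked_ids": linked}
```

If many failing PMIDs come back as `not_xml`, request rate is the more likely culprit; if they parse as `ok` but pubmed_parser still fails, a change in the XML layout (as suggested above) becomes more plausible.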
