Loading MeSH datasets
The Jena assembler config file <MTW_HOME_DIR>/instance/conf/mesh.ttl MUST be set up properly! Templates are available for Jena 4 and Jena 5:
https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mesh_Jena4.ttl
https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mesh_Jena5.ttl
Copy the file matching your Jena version to <MTW_HOME_DIR>/instance/conf/ and rename it to mesh.ttl
Adjust the paths in mesh.ttl to point to your <FUSEKI_DATA_DIR>
Use forward slashes, even on Windows:
tdb2:location "c:/<FUSEKI_DATA_DIR>/databases/mesh" ;
text:directory "c:/<FUSEKI_DATA_DIR>/indexes/mesh" ;
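If your data directory is known only as a native Windows path, the two values above can be derived with Python's standard library. This is a minimal sketch, not part of MTW: the function names and the example directory c:\fuseki-data are hypothetical, and only the databases/mesh and indexes/mesh layout is taken from this guide.

```python
from pathlib import PureWindowsPath


def jena_paths(fuseki_data_dir):
    """Return forward-slash paths for the two mesh.ttl settings."""
    base = PureWindowsPath(fuseki_data_dir)
    return {
        "tdb2:location": (base / "databases" / "mesh").as_posix(),
        "text:directory": (base / "indexes" / "mesh").as_posix(),
    }


def render_paths(fuseki_data_dir):
    """Format the settings the way they appear in mesh.ttl."""
    paths = jena_paths(fuseki_data_dir)
    return "\n".join(f'{key} "{value}" ;' for key, value in paths.items())


# Hypothetical data directory, used for illustration only.
print(render_paths(r"c:\fuseki-data"))
```

The printed lines can then be pasted into mesh.ttl in place of the template values.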
- Validate mesh.ttl
No output = file is OK
riot --validate mesh.ttl
- Copy the mesh.ttl file to:
<FUSEKI_DATA_DIR>/configuration/
Download the official MeSH RDF dataset mesh.nt.gz from https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/
You can use the curl tool for downloading:
curl https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/mesh.nt.gz --ssl-no-revoke -O
The mesh.nt.gz currently available is still the MeSH 2024 version (hash c9ef004de88b9201b84f90aad2966bfd067af799).
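To see whether a downloaded mesh.nt.gz is that same file, its digest can be compared with the hash quoted above. The guide does not name the hash algorithm; the value is 40 hex digits long, which matches a SHA-1 digest, so SHA-1 is assumed here. A minimal sketch with a hypothetical function name:

```python
import hashlib

# Hash quoted in this guide for the MeSH 2024 mesh.nt.gz.
MESH_2024_HASH = "c9ef004de88b9201b84f90aad2966bfd067af799"


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory.

    The default algorithm is an assumption based on the digest length.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, file_digest("mesh.nt.gz") == MESH_2024_HASH would suggest that the download is still the 2024 file. A mismatch means either a newer file or a different hash algorithm.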
Despite several efforts (https://github.com/HHS/meshrdf/issues/212#issuecomment-2539919254) to find out when, if at all, the full RDF dataset for the MeSH 2025 version will be made available, NLM stays silent. The release notes are also outdated: https://hhs.github.io/meshrdf/release-notes
The only official MeSH 2025 RDF datasets are available at https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2025/ - but:
- these are not the complete datasets: obsolete/inactive items are missing and no meshv:active triples are present
- this is the "name-spaced" version, with the prefix http://id.nlm.nih.gov/mesh/2025/
The information about MeSH item status is vital, both for the translation process and for functional MTW outputs/exports. Existing data workflows, such as those for updating obsolete MeSH descriptors, rely on the active/inactive status.
So what can be done in this situation? Let's try to create the most complete MeSH 2025 RDF version.
You can follow this guide or skip it and just download the final files.
Download all the official MeSH 2025 XML files and produce the RDF dataset mesh.nt.gz with the https://github.com/HHS/meshrdf scripts, with no year in the namespace (!)
OR
Download https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2025/mesh2025.nt.gz and update the namespace using the MTW script tools/update-ns.py:
py update-ns.py mesh2025.nt.gz http://id.nlm.nih.gov/mesh/2025/ http://id.nlm.nih.gov/mesh/ mesh.nt.gz
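Conceptually, the namespace update is a streaming search-and-replace over the N-Triples lines. The sketch below illustrates that idea only; it is not the code of tools/update-ns.py, and the function name is hypothetical. It assumes a UTF-8 gzipped N-Triples input and rewrites every IRI that begins with the old prefix.

```python
import gzip


def rewrite_namespace(src, dst, old_ns, new_ns):
    """Copy a gzipped N-Triples file, replacing old_ns with new_ns in IRIs.

    Matching on "<" + old_ns limits the change to the start of IRIs.
    A literal that happens to contain the same text would also be
    rewritten, which this sketch does not guard against.
    Returns the number of lines that were changed.
    """
    old, new = "<" + old_ns, "<" + new_ns
    changed = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            updated = line.replace(old, new)
            if updated != line:
                changed += 1
            fout.write(updated)
    return changed
```

With the arguments from the command above, this would correspond to rewrite_namespace("mesh2025.nt.gz", "mesh.nt.gz", "http://id.nlm.nih.gov/mesh/2025/", "http://id.nlm.nih.gov/mesh/"). Validate the result with riot before loading it.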
Download the complete MeSH 2024 dataset mesh.nt.gz, save it as mesh2024_full.nt.gz, and extract the inactive items using the Jena arq tool with the mesh-inactive.sparql query:
arq --data=mesh2024_full.nt.gz --query=mesh-inactive.sparql > mesh2024_inactive.nt
If you have not translated MeSH before, you can proceed to Import.
Use the trans_only_YYYY_extended.txt and convert it with the mesh-trx2nt tool.
The file MUST have the following columns/items:
DescriptorUI | ConceptUI | Language | TermType | String | TermUI | ScopeNote | Tree | Created | Relation | ParentCUI
- the header row is optional
- the TermUI column is always empty
- Relation and ParentCUI need to be present only in rows with Custom Concepts (ConceptUI starts with F...) and TermType PEP
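Before running the conversion, the layout rules above can be checked with a short script. This is a rough sketch, not part of MTW: the function name is hypothetical, the pipe delimiter is an assumption based on how the columns are written above, and the last rule is read as "rows whose ConceptUI starts with F and whose TermType is PEP must carry Relation and ParentCUI".

```python
COLUMNS = [
    "DescriptorUI", "ConceptUI", "Language", "TermType", "String", "TermUI",
    "ScopeNote", "Tree", "Created", "Relation", "ParentCUI",
]


def check_rows(lines, delimiter="|"):
    """Return (line number, message) pairs for rows that break the rules.

    The delimiter is an assumption; pass a different one if your file
    uses another separator.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(delimiter)]
        if lineno == 1 and fields[0] == COLUMNS[0]:
            continue  # the header row is optional
        if len(fields) != len(COLUMNS):
            problems.append(
                (lineno, f"expected {len(COLUMNS)} columns, found {len(fields)}"))
            continue
        row = dict(zip(COLUMNS, fields))
        if row["TermUI"]:
            problems.append((lineno, "TermUI should be empty"))
        custom_pep = row["ConceptUI"].startswith("F") and row["TermType"] == "PEP"
        if custom_pep and not (row["Relation"] and row["ParentCUI"]):
            problems.append((lineno, "custom PEP row lacks Relation/ParentCUI"))
    return problems
```

Passing an open text file, for instance check_rows(open(path, encoding="utf-8")), returns an empty list when no rule is violated. A String or ScopeNote value that contains the delimiter would be reported as a column-count problem.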
Display help - open CMD and run:
mesh-trx2nt -h
usage: mesh-trx2nt inputFile langcode meshxPrefix [options]
Extracting translation dataset from NLM UMLS text file [trans_only_2023_expanded.txt]
positional arguments:
inputFile NLM UMLS text file name (plain or gzipped)
langcode Language code
meshxPrefix MeSH Translation namespace prefix ie. http://my.mesh.com/id/
options:
-h, --help show this help message and exit
--out OUT Output file name prefix
IMPORTANT
The langcode parameter MUST be the same as the TARGET_LANG value in your mtw.ini config file !
The meshxPrefix parameter MUST be the same as the TARGET_NS value in your mtw.ini config file !
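A mismatch between these parameters and mtw.ini is easy to overlook, so it can be worth checking before the conversion. A minimal sketch, assuming that mtw.ini stores the two settings as plain KEY = value lines; the function names are hypothetical and the parser deliberately ignores sections and comments:

```python
def read_ini_values(text, keys=("TARGET_LANG", "TARGET_NS")):
    """Collect the given keys from KEY = value lines (an assumed layout)."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;[" or "=" not in line:
            continue  # skip blanks, comments and section headers
        key, _, value = line.partition("=")
        key = key.strip()
        if key in keys:
            found[key] = value.strip().strip("'\"")
    return found


def check_params(ini_text, langcode, meshx_prefix):
    """Return a list of mismatches between the CLI parameters and mtw.ini."""
    config = read_ini_values(ini_text)
    errors = []
    if config.get("TARGET_LANG") != langcode:
        errors.append(
            f"langcode {langcode!r} != TARGET_LANG {config.get('TARGET_LANG')!r}")
    if config.get("TARGET_NS") != meshx_prefix:
        errors.append(
            f"meshxPrefix {meshx_prefix!r} != TARGET_NS {config.get('TARGET_NS')!r}")
    return errors
```

An empty list from check_params(open("mtw.ini", encoding="utf-8").read(), langcode, meshx_prefix) means both values agree with the config file.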
Run the conversion - open CMD and run, e.g.:
mesh-trx2nt trans_only_2023_extended.txt fr http://id.mesh.fr/
Download your *.xml translation file from
https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/.mtms/
Extract the translation data from the MeSH XML as an N-Triples dataset using the mesh-xml2trx tool
- Run the extraction script:
mesh-xml2trx *.xml <TARGET_NS>
IMPORTANT: TARGET_NS is the target namespace parameter, i.e. the custom URI prefix for your translation. It MUST be the same as the TARGET_NS used in your mtw.ini config file!
https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mtw.ini
e.g.
mesh-xml2trx czedesc2018.xml.gz http://mesh.medvik.cz/link/
- ALWAYS validate ALL the input files
Run the validation:
No output = dataset is OK
riot --validate *.gz
- Move the input files into a versioned <IMPORT> directory, e.g. .../MeSH-data/2023/import/
- Load the MeSH dataset(s) into Apache Jena
Stop Fuseki server instance (if running)
Go to your <IMPORT> directory
Run the import:
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh mesh.nt.gz mesh-trx_ ...
or if you do not have a translation then just:
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh mesh.nt.gz
- Create the Fuseki search index
Go to your <FUSEKI_DATA_DIR>:
cd %FUSEKI_BASE%
Run the indexation - Jena v4:
java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
Run the indexation - Jena v5+:
java --add-modules jdk.incubator.vector -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
- Start the Fuseki server instance
- Stop MTW services
- Stop your Fuseki instance
- Go to your <FUSEKI_DATA_DIR> and make sure the <mesh> directories under the databases and indexes directories are empty!
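The emptiness check in the step above can be scripted. A minimal sketch, assuming the databases/mesh and indexes/mesh layout used elsewhere in this guide; the function name is hypothetical and nothing is deleted, the leftovers are only listed:

```python
from pathlib import Path


def non_empty_dirs(fuseki_data_dir, parents=("databases", "indexes"), dataset="mesh"):
    """Map each non-empty <parent>/<dataset> directory to its entries."""
    leftovers = {}
    for parent in parents:
        directory = Path(fuseki_data_dir) / parent / dataset
        if directory.is_dir():
            entries = sorted(entry.name for entry in directory.iterdir())
            if entries:
                leftovers[str(directory)] = entries
    return leftovers
```

An empty result means both directories are empty or absent, so the import can proceed. Otherwise remove the listed entries manually first.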
Run the import:
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh %FUSEKI_BASE%/backups/mesh_YYYY-MM-DD_....nq.gz
Create the search index - Jena v4 - run:
java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
Create the search index - Jena v5+ - run:
java --add-modules jdk.incubator.vector -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl
- Start your Fuseki instance
- Start MTW services
Continue to MeSH Annual Updates