GitHub - jhajagos/PreparedSource2OHDSI: Spark based mapper for converting EHR data to OHDSI

Scaling EHR (Electronic Health Record) data mapping to the OHDSI CDM

The goal of this project is to scale the mapping of clinical data to the OHDSI CDM (Common Data Model). The OHDSI CDM is a data standard for analytics on EHR data. Scalable compute is provided by an Apache Spark (>3.0) environment. Data is written in the Apache Parquet format and can either be queried directly or staged into a relational SQL database.

The mapping from source to OHDSI consists of the following steps:

1. Stage CSV files extracted from the EHR in a location accessible to the Spark cluster
2. Map the staged CSV data to the PSF (Prepared Source Format) (see the Synthea example)
3. Stage the OHDSI Vocabulary/Concept (TSV) files as Parquet files
4. Map the PSF to the OHDSI CDM in Parquet format (versions 5.4 and 5.3.1 are currently supported)
5. Register the generated Parquet files in a database catalog (Delta tables) or insert them into a relational database
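
As an illustration of the vocabulary staging step, here is a minimal sketch of how tab-delimited vocabulary files could be rewritten as Parquet with PySpark. It is not the project's own script. The function names `plan_staging` and `stage_vocabulary`, the directory layout, and the file names in `VOCABULARY_FILES` are assumptions made for this example. The planning helper is plain Python; only `stage_vocabulary` needs a Spark session.

```python
# Assumed vocabulary file names; adjust to match the files you actually downloaded.
VOCABULARY_FILES = ["CONCEPT.csv", "CONCEPT_RELATIONSHIP.csv", "VOCABULARY.csv"]


def plan_staging(source_dir, target_dir, file_names=VOCABULARY_FILES):
    """Return {table_name: (source_path, parquet_path)} without touching Spark."""
    plan = {}
    for file_name in file_names:
        # "CONCEPT_RELATIONSHIP.csv" -> "concept_relationship"
        table_name = file_name.rsplit(".", 1)[0].lower()
        source_path = f"{source_dir.rstrip('/')}/{file_name}"
        parquet_path = f"{target_dir.rstrip('/')}/{table_name}.parquet"
        plan[table_name] = (source_path, parquet_path)
    return plan


def stage_vocabulary(spark, source_dir, target_dir):
    """Read each tab-delimited file with Spark and write it back as Parquet.

    `spark` is an existing pyspark.sql.SparkSession supplied by the caller.
    """
    for source_path, parquet_path in plan_staging(source_dir, target_dir).values():
        df = (
            spark.read
            .option("sep", "\t")        # the vocabulary files are tab-delimited
            .option("header", "true")   # the first row holds the column names
            .option("quote", "")        # turn off quote handling for free-text fields
            .csv(source_path)
        )
        df.write.mode("overwrite").parquet(parquet_path)
```

With a session created by `SparkSession.builder.getOrCreate()`, a call such as `stage_vocabulary(spark, "/data/vocabulary", "/data/parquet/vocabulary")` would write one Parquet directory per vocabulary file. All columns are read as strings in this sketch; a real pipeline would likely apply an explicit schema.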

The mapping scripts write Parquet files in an OHDSI "compatible" format. The generated Parquet files include additional fields that are not part of the OHDSI CDM. These additional columns allow the initial data provenance to be tracked.
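
Because the generated files carry columns beyond the CDM, a consumer that needs a strict CDM table may want to separate the two groups before loading. The sketch below shows one way to do that in plain Python. The helper `split_columns` and the provenance column names `source_file_name` and `source_row_id` are hypothetical and are not taken from the project. `CDM_PERSON_COLUMNS` lists only a subset of the CDM person table for illustration.

```python
# A subset of the OHDSI CDM person table columns, for illustration only.
CDM_PERSON_COLUMNS = [
    "person_id",
    "gender_concept_id",
    "year_of_birth",
    "race_concept_id",
    "ethnicity_concept_id",
]


def split_columns(file_columns, cdm_columns):
    """Split a file's columns into CDM columns and additional columns.

    The order of `file_columns` is preserved in both returned lists.
    """
    cdm_set = set(cdm_columns)
    standard = [name for name in file_columns if name in cdm_set]
    additional = [name for name in file_columns if name not in cdm_set]
    return standard, additional


# Hypothetical column list of a generated person Parquet file.
file_columns = CDM_PERSON_COLUMNS + ["source_file_name", "source_row_id"]
standard, additional = split_columns(file_columns, CDM_PERSON_COLUMNS)
```

With a Spark DataFrame `df`, the `standard` list could then be passed to `df.select(*standard)` to obtain a CDM-only view, while the additional columns remain available in the Parquet files for tracing a record back to its source.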

A Docker build file is included to map Synthea data to PSF and then to OHDSI; see README.md.

