RD3_database

As part of EJP-RD, and as implemented in Solve-RD, we are developing a metadata database to track and find samples that have been processed by CNAG and submitted to the EGA.

The core model for this database is designed to store sample, subject and file metadata using existing standards.

Datamodel

The entities are:

Study - Container for all activities. Contains datasets
Organisation - Organisation involved in the study
Subject - Human subjects, typically patients or family members
SubjectInfo - Extra information about subject
Sample - Samples used as input for the analysis
File - Individual files on file systems, recorded so that they can be located again, linked to the datasets describing them.
Filetype - Type of file (e.g. BAM, gVCF, phenopacket, BED, etc.)
Person - Researcher or other person involved in the study
Job - Jobs used to process sample data
Run - Container of jobs
Dataset - Collection of files, collected in the context of a study. This could also be called a 'fileset'.
Publication - Publication linked to subject and/or variant
LabInfo - Information about the process in the lab (barcodes, sequencer, etc.)
SequencingTechniqueType - Sequencing technique types (in CNAG batchfile = ExpType)
Variant - Identifier of an allele/genotype (HGVS)
VariantTypes - Sequence variant types
ClinicalClassification - Clinical Classification (1,2,3,4,5)
GenomeBuild - Human reference sequence used in UCSC
Library - Information for library used in experiment
Library Source - Library source, e.g. Genomic/Transcriptomic
European Reference Networks - European Reference Networks, source: https://ec.europa.eu/health/ern/networks_en
Tissue Types - Tissue types, source: GTEx, https://www.gtexportal.org/home/tissueSummaryPage
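
To illustrate how the container entities relate, here is a minimal Python sketch of the Study, Dataset and File nesting described above. It is not the RD3 implementation: the field names (`name`, `path`, `filetype`), the helper `find_files` and the example paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    # Hypothetical fields; the actual RD3 File attributes may differ.
    path: str
    filetype: str  # e.g. "BAM", "gVCF", "BED"


@dataclass
class Dataset:
    # A dataset is a collection of files gathered in the context of a study.
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class Study:
    # A study is the container for all activities and holds datasets.
    name: str
    datasets: List[Dataset] = field(default_factory=list)


def find_files(study: Study, filetype: str) -> List[str]:
    """Return the paths of all files of one type across a study's datasets."""
    return [
        f.path
        for dataset in study.datasets
        for f in dataset.files
        if f.filetype == filetype
    ]


# Made-up example data.
study = Study(
    name="example_study",
    datasets=[
        Dataset("batch_1", [File("/data/a.bam", "BAM"), File("/data/a.g.vcf.gz", "gVCF")]),
        Dataset("batch_2", [File("/data/b.bam", "BAM")]),
    ],
)
bam_paths = find_files(study, "BAM")
```

Keeping files inside datasets, rather than attaching them directly to a study, mirrors the description of Dataset as the collection that describes its files.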

CodeList (Ontologies)

anatomicalLocation - Code list for anatomicalLocation used for sampling. E.g. Blood
dataUseConditions - Code list describing different types of conditions to access the data
disease - Code list for disease: ICD-10 codes (the example data covers C00 to C06.2)
materialType - Code list for materialType, e.g. DNA
phenotype - Code list for phenotype, e.g. HPO term
Sex - Code list for sex, e.g. 'M'
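
A code list constrains a field to a fixed set of allowed values. The following Python sketch shows one way such a check could work. It is an assumption-laden illustration: the record layout and the function `invalid_fields` are hypothetical, and each list holds only the single example value named above, whereas the real code lists contain many more codes.

```python
from typing import Dict, List, Set

# Only the example values mentioned in this README; the real lists are larger.
CODE_LISTS: Dict[str, Set[str]] = {
    "anatomicalLocation": {"Blood"},
    "materialType": {"DNA"},
    "sex": {"M"},
}


def invalid_fields(record: Dict[str, str]) -> List[str]:
    """Return the names of coded fields whose value is not in its code list."""
    return sorted(
        name
        for name, value in record.items()
        if name in CODE_LISTS and value not in CODE_LISTS[name]
    )


# Hypothetical records for illustration.
good = {"anatomicalLocation": "Blood", "materialType": "DNA", "sex": "M"}
bad = {"anatomicalLocation": "Blood", "materialType": "dna", "sex": "male"}

good_errors = invalid_fields(good)
bad_errors = invalid_fields(bad)
```

Fields without a code list are ignored by this check, so free-text attributes pass through unchanged.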

EMX

The default import format for MOLGENIS is 'EMX'. This is a flexible spreadsheet format (Excel, CSV) that allows you to annotate your data with a data model. This works because you can tell MOLGENIS the 'model' of your data via a special sheet named 'attributes'.
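
As an illustration of the 'attributes' sheet, the Python sketch below writes a small attributes table as CSV. The column names (`entity`, `name`, `dataType`, `refEntity`, `idAttribute`, `nillable`) follow the EMX format as we understand it; the attribute rows themselves (an `id` on Subject and Sample, and a `subject` reference on Sample) are hypothetical and are not copied from the RD3 model in this repository.

```python
import csv
import io
from typing import Dict, List

ATTRIBUTE_COLUMNS = ["entity", "name", "dataType", "refEntity", "idAttribute", "nillable"]

# Hypothetical attribute definitions, for illustration only.
attributes: List[Dict[str, str]] = [
    {"entity": "Subject", "name": "id", "dataType": "string",
     "refEntity": "", "idAttribute": "TRUE", "nillable": "FALSE"},
    {"entity": "Sample", "name": "id", "dataType": "string",
     "refEntity": "", "idAttribute": "TRUE", "nillable": "FALSE"},
    # An xref column points at a row in another entity (here: Sample -> Subject).
    {"entity": "Sample", "name": "subject", "dataType": "xref",
     "refEntity": "Subject", "idAttribute": "FALSE", "nillable": "FALSE"},
]


def write_attributes_sheet(rows: List[Dict[str, str]]) -> str:
    """Serialise attribute definitions to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ATTRIBUTE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheet = write_attributes_sheet(attributes)
```

Saving `sheet` as `attributes.csv` alongside one CSV file per entity would give a set of files in the shape EMX expects; check the files in the EMX folder of this repository for the actual model.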

Entities not in use:

Relation - Family entity relationship
Disease inheritance - Description of known inheritance linked to disease and possibly mutation
VariantInfo - Extra information about variant

