data mapping service using linkml value sets #31

Open
balhoff opened this issue Apr 6, 2021 · 2 comments
Labels
in progress Somebody is working on this right now

Comments

@balhoff

balhoff commented Apr 6, 2021

This will build on the validation mechanism defined in #29. Given some value set definitions within a linkml model, we should be able to map input data to likely data elements and values. This service could later be connected with a tool like Ptolemy to provide CRDC-H support.
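
As a rough illustration of mapping input data onto a value set, here is a minimal Python sketch. It assumes the enumeration's permissible values are already available as a list of strings (`BODY_SITE_VALUES` is hypothetical, not taken from the CCDH model) and uses case/punctuation normalization plus `difflib.get_close_matches` as a stand-in matching heuristic. It is independent of the validation mechanism in #29.

```python
import difflib
import re
from typing import List, Optional

# Hypothetical permissible values; in practice these would be read from the
# `permissible_values` of an enum in the LinkML schema.
BODY_SITE_VALUES = ["Head and Neck", "Larynx", "Oral Cavity", "Tongue, NOS"]


def normalize(text: str) -> str:
    """Lower-case and collapse runs of punctuation/whitespace to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def map_to_value_set(raw: str, permissible: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the most likely permissible value for `raw`, or None if nothing is close."""
    by_norm = {normalize(v): v for v in permissible}
    key = normalize(raw)
    if key in by_norm:  # exact match once case and punctuation are ignored
        return by_norm[key]
    # Fall back to difflib's similarity ratio as a simple heuristic.
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
    return by_norm[close[0]] if close else None


for value in ["HEAD AND NECK", "head & neck", "tongue nos", "xyzzy"]:
    print(value, "->", map_to_value_set(value, BODY_SITE_VALUES))
```

A real service would probably also report a confidence score and consider synonyms from the value set's ontology mappings rather than relying on string similarity alone.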

@balhoff
Author

balhoff commented May 28, 2021

Given the late-breaking addition of enumerations to the model, implementation of this will need to be deferred to Phase 3.

@balhoff balhoff removed this from the Phase 2 - ENDS (2021) milestone May 28, 2021
@gaurav

gaurav commented Jul 6, 2021

I think this breaks down into two tasks:

  1. Demonstrating that transformations can be set up using Python data classes automatically generated from the LinkML model: example-data#8 (Convert "head and neck" Jupyter Notebooks into an automated Python test suite)
  2. Figuring out whether we can automate the process of generating these transformations, i.e. whether the model could tell you how to transform a GDC:Sample.biospecimen_anatomic_site into a CCDH:BodySite.site, or how to transform data from v1.0.1 of the CCDH model to v2.0. We're planning to approach this in two ways:
    1. The Data Model Harmonization team is looking into coming up with a format for recording this transformation information in the model itself.
    2. Currently, enumerated values in the CCDH model are taken directly from the node data dictionaries, so CCDH:BodySite.site uses the union of the values used by GDC:Sample.biospecimen_anatomic_site, PDC:Sample.biospecimen_anatomic_site, and other node mappings. When we start mapping these values to concepts and removing duplicates from these lists, I think the Terminology team plans to produce SSSOM files of those mappings, and the Tools team would then build tools that map values using those SSSOM files.
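
A minimal sketch of how tasks 1 and 2.2 might fit together, in Python. `BodySite` is a hand-written stand-in for a class that LinkML's Python generator would produce from the model, and the SSSOM content, its `ex:` identifiers, and the choice to key the lookup on `subject_label` are all hypothetical. A real tool would read the Terminology team's files and might use a dedicated SSSOM library instead of the standard `csv` module.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class BodySite:
    """Hypothetical stand-in for a LinkML-generated CCDH BodySite class."""
    site: Optional[str] = None


def load_sssom(lines: Iterable[str], predicate: str = "skos:exactMatch") -> Dict[str, str]:
    """Return {subject_label: object_id} for rows that use `predicate`.

    Lines starting with '#' carry SSSOM's embedded YAML metadata and are skipped.
    """
    body = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(body, delimiter="\t")
    return {
        row["subject_label"]: row["object_id"]
        for row in reader
        if row["predicate_id"] == predicate
    }


def transform_sample(sample: dict, mapping: Dict[str, str]) -> BodySite:
    """Build a BodySite from a GDC-style sample dict; unmapped values become None."""
    raw = sample.get("biospecimen_anatomic_site")
    return BodySite(site=mapping.get(raw))


# Hypothetical SSSOM content; every "ex:" identifier is a placeholder.
EXAMPLE_SSSOM = (
    "# mapping_set_id: https://example.org/hypothetical.sssom.tsv\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tmapping_justification\n"
    "ex:gdc_larynx\tLarynx\tskos:exactMatch\tex:site_larynx\tsemapv:ManualMappingCuration\n"
    "ex:gdc_tongue\tTongue, NOS\tskos:exactMatch\tex:site_tongue\tsemapv:ManualMappingCuration\n"
    "ex:gdc_head_neck\tHead and Neck\tskos:broadMatch\tex:site_head\tsemapv:ManualMappingCuration\n"
)

mapping = load_sssom(EXAMPLE_SSSOM.splitlines())
print(transform_sample({"biospecimen_anatomic_site": "Larynx"}, mapping))
# prints BodySite(site='ex:site_larynx')
```

Only `skos:exactMatch` rows are used by default, so the `skos:broadMatch` row for "Head and Neck" is ignored and that value stays unmapped; whether broader matches should ever be applied is a policy question for the mapping tools.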

Does that sound right? Or am I missing something?

@gaurav gaurav added the in progress Somebody is working on this right now label Aug 16, 2021
@gaurav gaurav added this to the Phase 3 - Quarter 4 (2021) milestone Aug 16, 2021
Projects
None yet
Development

No branches or pull requests

2 participants