import eCRF data (.xls, .xlsx, SQLite, ...) #104
Comments
Overview
The eCRF data importer can be launched from the command line (phoenixctms/install-debian#12) or from the trial's "Jobs" tab. Currently it supports loading the intuitive "horizontal" .csv/.xls/.xlsx formats ("single row per subject"). It works for partial data (e.g. when you want to populate only certain eCRF fields such as lab data) and is safe for repeated execution: it looks up the subject by alias or subject list attributes, and creates and registers the subject if it does not exist yet. It is implemented as a BulkProcessor project (phoenixctms/bulk-processor#9), so it is a multithreaded Perl program connecting to the PhoenixCTMS webapp via the REST API:
Disabled eCRFs/fields are skipped, as are locked/signed eCRFs. By default, errors (such as values exceeding configured range limits) are logged to the job output but do not abort the import. The importer mimics data entry by users and is protected from mutual interference in the same way (optimistic locking). It has config options similar to those of the eCRF exporter, which can be specified in settings.yml files (phoenixctms/config-default#6). For example, they allow tuning how column names are derived and abbreviated (unless the fields have explicit "External ID"s), so eCRF data export and import work with the same dictionary of column names by default.
File Formats
Both eCRF exporter and importer agree on the .csv file specs below:
Support for Excel date values and explicit cell formats is still pending for the Excel formats in both importer and exporter. So be aware that with Excel file types, strings tend to be coerced to numbers, which effectively cuts off leading zeroes!
Validation
Importing an exported .csv will create a perfect clone of the dataset, so it is suggested to use .csv over Excel formats for exchanging data. This makes it possible to give a proof of correctness:
Now database views can be prepared to obtain the datasets in SQL directly:
The datasets can be compared deeply using SQL set operations. If exactly equal, both "trial X dataset EXCEPT trial Y dataset" (see below) and "trial Y dataset EXCEPT trial X dataset" have to give empty results.
For the given example eCRF setup this will, however, report values of SKETCH input types that are not present in trial Y, which is expected since sketch data is currently not exported/imported.
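The two-way EXCEPT comparison can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory database purely for demonstration. The table names `trial_x_dataset` and `trial_y_dataset`, their columns, and the sample rows are hypothetical stand-ins for the actual database views, which are not shown here.

```python
import sqlite3


def dataset_diff(conn, left, right):
    """Return rows present in `left` but missing from `right` (SQL EXCEPT)."""
    # Identifiers are interpolated directly, so pass trusted names only.
    query = f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two trial dataset views.
conn.executescript("""
CREATE TABLE trial_x_dataset (subject TEXT, field TEXT, value TEXT);
CREATE TABLE trial_y_dataset (subject TEXT, field TEXT, value TEXT);
INSERT INTO trial_x_dataset VALUES
    ('S001', 'weight', '72.5'),
    ('S001', 'sketch', '<ink>');
INSERT INTO trial_y_dataset VALUES
    ('S001', 'weight', '72.5');
""")

# Both directions must be empty for the datasets to be exactly equal.
x_minus_y = dataset_diff(conn, "trial_x_dataset", "trial_y_dataset")
y_minus_x = dataset_diff(conn, "trial_y_dataset", "trial_x_dataset")

print("only in X:", x_minus_y)
print("only in Y:", y_minus_x)
```

In this made-up data, the only difference is the sketch value that exists in trial X alone, mirroring the expected SKETCH discrepancy. Running both directions matters because a single EXCEPT only detects rows missing on one side.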
The bulk-processor supports exporting eCRF data in .xlsx and .csv format. It also provides exporting to a SQLite ".db" file, containing eCRF data including metadata in a verbose, denormalized schema ("vertical" format). eCRF data can also be exported in "horizontal" formats (single row per subject).
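To make the difference between the two layouts concrete, here is a small sketch that pivots "vertical" rows (one row per subject and field) into "horizontal" rows (one row per subject). The tuple layout, the `subject` key, and the field names are assumptions made for illustration and do not reflect the actual export schema.

```python
from collections import defaultdict


def to_horizontal(vertical_rows):
    """Pivot (subject, column, value) tuples into one dict per subject."""
    subjects = defaultdict(dict)
    for subject, column, value in vertical_rows:
        subjects[subject][column] = value
    # One output row per subject, in order of first appearance.
    return [{"subject": s, **cols} for s, cols in subjects.items()]


# Hypothetical vertical data: one row per subject and eCRF field.
vertical = [
    ("S001", "weight", "72.5"),
    ("S001", "height", "180"),
    ("S002", "weight", "64.0"),
]

horizontal = to_horizontal(vertical)
for row in horizontal:
    print(row)
```

Values are kept as strings on purpose so that nothing is coerced during the pivot. A subject without a value for some field simply lacks that key, which corresponds to an empty cell in a horizontal file.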
The request is to implement a multi-purpose importer which processes these various file formats to populate eCRFs.
This way the clinical data can be easily transferred from instance to instance. It should also be possible to import files generated externally (e.g. lab data).
It is planned to trigger imports from either the command line or in the UI (trial's "Jobs" tab).