Importing Agilent file formats directly #23

Open
daniel-darnell1 opened this issue Jan 18, 2022 · 8 comments

@daniel-darnell1

Hello! I am trying to batch export ~1,400 .D1000 files that my organization has amassed from 2018 to the present. Has anyone ever tried this before? It is infeasible to open each file in the TapeStation Analysis software and export it to XML by hand.

I contacted Agilent tech support and they basically said it was impossible. I discovered that a .D1000 file can be opened as a .zip archive, but most of the files inside require a password to extract (understandably, they would not give me the password).
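For anyone who wants to check which entries in such an archive are actually password-protected, here is a minimal Python sketch. It assumes only that the file is a standard ZIP container, as described above; the function name `list_encrypted_members` and the file name in the usage note are hypothetical. It does not decrypt anything; it only reads the per-entry encryption flag from the archive's directory.

```python
import zipfile


def list_encrypted_members(path_or_file):
    """Map each archive member name to True if it is flagged as encrypted."""
    with zipfile.ZipFile(path_or_file) as archive:
        # Bit 0 of the ZIP general-purpose flag field marks an encrypted entry.
        return {
            info.filename: bool(info.flag_bits & 0x1)
            for info in archive.infolist()
        }
```

Calling `list_encrypted_members("example.D1000")` (a placeholder path) would return a dictionary such as `{"some_entry": True, ...}`, which shows at a glance whether any members can be extracted without the password.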

I may be attempting an impossible task, but figured I had nothing to lose by asking.

Thanks all!

@jwfoley
Owner

jwfoley commented Jan 18, 2022

Yes, it sure would be great to import data directly from the Agilent software's file formats. Unfortunately, I've corresponded with Agilent support too and they're unwilling to give me any of the specifications. It's probably possible to reverse-engineer the formats from the Bioanalyzer and ProSize software if some hacker wants to spend a while on that (leaving this Issue open in case anyone wants to volunteer, or just beg for it to be done by someone else), but because of the encryption that's not possible with the TapeStation format.

If you get into a nice groove with a fast computer, a reliable sequence of actions, some keyboard shortcuts, and maybe a nice podcast to keep your sanity, it might not actually take that long to export lots of files the slow way - maybe around ten seconds per file? In a recent release I added a (possibly not foolproof) check in read.electrophoresis to make sure you exported the electropherogram CSV in the correct format (uncheck "Aligned") so at least you can find user errors a lot faster than you make them.

Sorry I can't be of more help on this.

@jwfoley changed the title from "Non-issue question" to "Importing Agilent file formats directly" on Jan 18, 2022
@daniel-darnell1
Author

Thank you @jwfoley!

I am currently attempting to recover the password from the hash, but I have little faith it will yield anything given the time involved. It has been running for 7 days non-stop with no hits yet. I'll keep it running just for "fun", but it seems you're right: the brute-force manual approach will likely be quicker!

Thanks for your help, and you're welcome to mark this as closed if you'd like.

@HibaShaban

HibaShaban commented May 9, 2023

I've found this R package that worked for me to convert the XAD to XML files:
https://github.com/grimbough/bioanalyzeR

I had to play with the hard-coded numbers, and changed "Oy9" to "Ox9" on line 43 in the readXAD.R file.

Not sure this is the same as .D1000 files but posting in case anyone wants to convert XAD files

@jwfoley
Owner

jwfoley commented May 10, 2023

> I've found this R package that worked for me to convert the XAD to XML files: https://github.com/grimbough/bioanalyzeR
>
> I had to play with the hard-coded numbers, and changed "Oy9" to "Ox9" on line 43 in the readXAD.R file.
>
> Not sure this is the same as .D1000 files but posting in case anyone wants to convert XAD files

Thanks for finding that! With a little bit of work, that approach should make it possible to read the XAD files directly. Unfortunately I'm quite busy this month and might not have time to do that work soon, but I will try to get to it when I can - or feel free to code it yourself (start in bioanalyzer.R) and submit a pull request.

@jwfoley self-assigned this on May 10, 2023
@jwfoley removed the wontfix label on May 10, 2023
@jwfoley
Owner

jwfoley commented Jul 21, 2023

> I've found this R package that worked for me to convert the XAD to XML files: https://github.com/grimbough/bioanalyzeR
>
> I had to play with the hard-coded numbers, and changed "Oy9" to "Ox9" on line 43 in the readXAD.R file.
>
> Not sure this is the same as .D1000 files but posting in case anyone wants to convert XAD files

Well, using these tricks I did figure out how to decode the XAD files. The compressed data field contains base64-encoded, DEFLATE-compressed data in an unknown format with a 76-byte header and a 9-byte footer. If you cut those off and prepend a minimal gzip header, you can then decompress it with gzip or even R's built-in transparent gzip reading (though the text is encoded in UTF-16LE, so you have to parse it accordingly or possibly even transcode it to UTF-8 to avoid errors).
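The decoding steps described above could be sketched in Python roughly as follows. This is not the code saved in the package's branch; it is an illustration under stated assumptions. In particular, it assumes that what remains after trimming the 76-byte header and 9-byte footer is a raw DEFLATE stream (which is what prepending a gzip header would imply), and the function name `decode_xad_field` is hypothetical.

```python
import base64
import zlib

HEADER_BYTES = 76  # header length reported in this thread
FOOTER_BYTES = 9   # footer length reported in this thread


def decode_xad_field(encoded: str) -> str:
    """Decode one base64 'compressed data' field into text."""
    blob = base64.b64decode(encoded)
    # Drop the undocumented header and footer around the compressed payload.
    payload = blob[HEADER_BYTES:len(blob) - FOOTER_BYTES]
    # Assumption: the payload is raw DEFLATE; wbits=-15 means no zlib/gzip wrapper.
    inflater = zlib.decompressobj(wbits=-15)
    raw = inflater.decompress(payload) + inflater.flush()
    # The decompressed text is UTF-16LE according to the description above.
    return raw.decode("utf-16-le")
```

Using raw DEFLATE mode avoids having to build a gzip header at all. If the real data begins with a byte-order mark, the returned string would start with `\ufeff`, which a caller may want to strip.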

However, after all that trouble I discovered the XAD file doesn't actually contain the peak tables! The Bioanalyzer software must be calling peaks on the fly every time you open a file. So that means the XAD file can't be parsed into an electrophoresis object because it doesn't contain all the required data in the first place.

I'll save the code in a branch in case there's ever a workaround. Maybe if this package ever starts doing its own peak calling, it can handle XAD files, but for now peak calling isn't planned and no one has requested it.

@jwfoley removed their assignment on Jul 21, 2023
@daniel-darnell1
Author

You sir, @jwfoley, are a genius!! Thank you so much for your continued work on this. This is absolutely amazing! Thanks again.

@HibaShaban

Thanks @jwfoley for going through all the trouble to get this working! I may update at some point with an equivalent to find_peaks from SciPy in Python if I can find an appropriate way to do it. I have successfully computed the peaks that way in the past.
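To give a rough idea of what peak calling on a decoded trace involves, here is a dependency-free Python sketch. It is a deliberately simplified stand-in, not SciPy's `find_peaks` and not anything the package does: it only reports strict local maxima above a height cutoff, so flat-topped peaks are missed. The function name `find_local_peaks` and the sample trace are made up for illustration.

```python
def find_local_peaks(signal, min_height=0.0):
    """Return indices of strict local maxima that reach at least min_height."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] > signal[i + 1]
        if is_local_max and signal[i] >= min_height:
            peaks.append(i)
    return peaks


# Made-up fluorescence trace with two tall peaks and one small bump.
trace = [0, 1, 5, 1, 0, 2, 9, 3, 0, 1, 0]
print(find_local_peaks(trace, min_height=2))  # prints [2, 6]
```

A real electropherogram would likely also need baseline correction and a prominence or width criterion, which is where a library routine such as SciPy's would be the better choice.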

@jwfoley
Owner

jwfoley commented Jul 24, 2023

Thanks for the kind words but I'm going to keep this open as we might someday find a workaround for the Bioanalyzer, get the TapeStation encryption key from Agilent, or tinker with the ProSize file formats.

@jwfoley reopened this on Jul 24, 2023