Maximum number of samples analyzed with dmr multi #229

Closed
antoniognmk opened this issue Jul 10, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@antoniognmk

Hello, thank you for developing modkit.

Is there a maximum number of samples that can be analyzed with the modkit dmr multi command? I tried to analyze about 130 samples with dmr multi, using both version 0.2.5 and version 0.3.1, but the task failed with the following error:

thread 'main' panicked at src/dmr/util.rs:497:12: attempt to divide by zero

I then tried with five samples and it ran successfully. Is there an upper limit on the number of samples? If there is no such limit, could one of my samples be corrupted?

Any guidance is appreciated, thanks.

@ArtRand
Contributor

ArtRand commented Jul 11, 2024

Hello @antoniognmk,

No, that's a bug. I've pushed a fix to this branch. I don't have a build for you because CentOS 7 (the build environment I use) has reached EOL. If you can compile the project with cargo, you should be able to run all of your samples together.

If you can't get a custom build on your system, you'll have to wait for me to finish refactoring the builds/CI.

ArtRand added the enhancement (New feature or request) and bug (Something isn't working) labels, then removed the enhancement label, on Jul 11, 2024
@antoniognmk
Author

Dear @ArtRand, thank you for your response. I'll give it a try.

@antoniognmk
Author

Hello @ArtRand, I built the binaries from your branch, and it seems to be working, thank you. By the way, do you have any recommendations for how many CPU threads and how much RAM dmr multi requires? (As I mentioned, I am handling a few hundred samples.)

@ArtRand
Contributor

ArtRand commented Jul 16, 2024

@antoniognmk

Great, I should have the build system back up and running soon too (I'll need it for the next release of course).

Good question regarding resource requirements. tl;dr in my benchmarks ~1GB RAM/thread. More threads should speed up each pairwise comparison roughly linearly.
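
As a rough illustration of that rule of thumb, here is a minimal sketch of the arithmetic. The function name and its parameters are hypothetical (they are not part of modkit), and the 1 GB default simply restates the benchmark estimate above, so treat the result as a starting point rather than a guarantee.

```python
def estimate_ram_gb(threads: int, gb_per_thread: float = 1.0) -> float:
    """Rough RAM budget from the ~1 GB/thread rule of thumb (an assumption, not a modkit API)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * gb_per_thread


# Example: a 16-thread run would be budgeted at roughly 16 GB.
print(estimate_ram_gb(16))
```

Passing a larger `gb_per_thread` is one way to leave headroom when the regions are large.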

The dmr multi command isn't overly sophisticated, I'm afraid, in that it is a simple loop over the combinations of samples. So the settings you may have used in dmr pair experiments would be a good starting point for dmr multi. The parallelism in dmr multi comes from processing the regions in parallel. The number of regions that are processed in parallel is determined by the number of threads you pass to --threads. You may notice that the CPU utilization isn't very high; that's because most of the work is I/O, fetching the bedmethyl records. If your regions are really large, my estimate above may be low.
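
To make the shape of that loop concrete, here is a minimal Python sketch, not modkit's actual Rust implementation. It assumes the combinations are unordered pairs of samples, and `compare_region` and `run_multi` are hypothetical placeholders for the per-region work and the outer loop.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb


def compare_region(pair, region):
    # Hypothetical stand-in for fetching bedmethyl records and scoring one region.
    return (pair, region)


def run_multi(samples, regions, threads):
    """Serial loop over sample pairs; regions within each pair run in parallel."""
    results = []
    for pair in combinations(samples, 2):
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # pool.map preserves region order and is fully consumed before the next pair.
            results.extend(pool.map(lambda region: compare_region(pair, region), regions))
    return results


# Under the unordered-pairs assumption, 130 samples give 130 * 129 / 2 comparisons.
print(comb(130, 2))  # 8385
print(len(run_multi(["a", "b", "c"], ["r1", "r2"], threads=2)))  # 6
```

Under this assumption the number of pairwise comparisons grows quadratically with the number of samples, so for a few hundred samples the total runtime is driven mostly by the pair count, while --threads only speeds up each individual comparison.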

@antoniognmk
Author

Great @ArtRand, thank you for your support. I am closing the issue now.
