Parse2: created #96
agalitsyna
commented
Mar 9, 2021
- docs
- parse 'all' policy removed
- parse2 command
- parse2 coordinate-system option
- tests
Parse2
-------------------------

If your Hi-C has long reads, you may want to report all the alignments in the reads with ``pairtools parse2``.
Maybe specify the read length at which this would be beneficial? E.g.:
If your Hi-C has long reads (>50bp).
Would this work on nanopore reads or not? It may be good to mention that somewhere.
150 bp allows you to save 10% of the simple Hi-C library on DpnII. On Nanopore this won't work for now. Good points, will mention!
Resolved by #99
pairtools/pairtools_trackpath.py
Outdated
'first column lists scaffold names. Any scaffolds not listed will be '
'ordered lexicographically following the names provided.')
@click.option(
    "-o", "--output",
Would it be clearer if this variable were renamed to output-file?
Okay, good suggestion
Resolved in PR update: #99
This PR is moved to drafts, the branch is renamed to "parse2"
Updated version of this PR: #99
* Parse2: created.

Improved version of parse2 with resolved comments from the previous PR: #96

Major changes:
* Single-end mode of parse2 added, --single-end option. Tested on minimap2 output for MC-3C.
* parse2 now has three possible coordinate systems for reporting: read, walk and pair (described in the docstring). Default coord system "read" tested.
* demo notebook with MC-3C and Arima datasets
* simplified code of parse2, e.g. push_pair function added instead of repetitive code
* improved docstrings
* Max molecule size replaced with max fragment size.
* parse2(docs): Documentation improved, #96 (comment) resolved.
* Option to report 5' or 3' ends added.
Improved version of parse2 with resolved comments from the previous PR: #96
- Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports the pair orientation and the outer positions of 5'-ends, as if each pair was read in paired-end mode independently. Parse2 is specifically designed for long walks and has options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.
- Parse2 has an option to parse single-end reads, --single-end, tested on minimap2 output for MC-3C.
- Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.
- Recent update simplifies the code:
  - single _parse library used by both parse and parse2,
  - a number of functions that reduce repetitive code, e.g. push_pair function,
  - docstrings and documented structure of _parse library.
- Both parse and parse2 have the options to report 5' or 3' ends; to flip alignments according to chromosome coordinate.
- Both parse and parse2 have the pysam backend
- Improvements of the tests for parse and parse2
- Documentation includes description of various --report-orientation and --report-position cases.
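To illustrate what reporting the 5' or 3' end of an alignment means, here is a minimal sketch. It is not the pairtools implementation: the `Alignment` type and the `reported_end` function are hypothetical names introduced only for this example, and coordinates are assumed to be 1-based and inclusive. The only fact it relies on is that the 5' end of a forward-strand alignment is its leftmost reference position, while for a reverse-strand alignment it is the rightmost one.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical alignment record, not a pairtools data structure."""
    chrom: str
    start: int   # leftmost aligned reference position (assumed 1-based, inclusive)
    end: int     # rightmost aligned reference position (assumed 1-based, inclusive)
    strand: str  # "+" or "-"


def reported_end(aln: Alignment, which: str = "5") -> int:
    """Return the reference position of the 5' or 3' end of an alignment."""
    if which not in ("5", "3"):
        raise ValueError("which must be '5' or '3'")
    if aln.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    on_forward = aln.strand == "+"
    if which == "5":
        # 5' end: leftmost position on the forward strand, rightmost on the reverse
        return aln.start if on_forward else aln.end
    # 3' end: the opposite side of the alignment
    return aln.end if on_forward else aln.start


fwd = Alignment("chr1", 100, 150, "+")
rev = Alignment("chr1", 100, 150, "-")
print(reported_end(fwd, "5"), reported_end(fwd, "3"))
print(reported_end(rev, "5"), reported_end(rev, "3"))
```

For the two example alignments, the forward-strand read reports 100 as its 5' end and 150 as its 3' end, and the reverse-strand read reports the opposite.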
* Separate cli and lib
* pairtools flip fix for unannotated chromosomes, resolving #91
* handle empty chromosomes, resolved #76
* fixed rfrags indexing and first rfrag omission, resolved #73
* resolved or deprecated suggestions in #16
* merge improvements, header merge fixed
  - resolved merge without arguments: #61
  - option to add only the first header in merge, resolved #18
* in merge, added option to concatenate instead of merge sorted inputs, resolving: #23
* merge now checks that columns of inputs are the same
* I/O improvements
  - auto_open defaults to stdin/stdout when path evaluates to False. resolved #48
  - auto_open defaults to stdin/stdout when the path is "-"
  - if the stream is optional, it's controlled by the module itself
* Parse2 update (#99) (#109)
  Improved version of parse2 with resolved comments from the previous PR: #96
  - Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports the pair orientation and the outer positions of 5'-ends, as if each pair was read in paired-end mode independently. Parse2 is specifically designed for long walks and has options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.
  - Parse2 has an option to parse single-end reads, --single-end, tested on minimap2 output for MC-3C.
  - Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.
  - Recent update simplifies the code:
    - single _parse library used by both parse and parse2,
    - a number of functions that reduce repetitive code, e.g. push_pair function,
    - docstrings and documented structure of _parse library.
  - Both parse and parse2 have the options to report 5' or 3' ends; to flip alignments according to chromosome coordinate.
  - Both parse and parse2 have the pysam backend
  - Improvements of the tests for parse and parse2
  - Documentation includes description of various --report-orientation and --report-position cases.
* Merge pairlib into pairtools.lib.
* CLI for scalings added.
* stats output in yaml format
* Header CLI (#121)
  - new module called by `pairtools header`
  - submodules:
    - generate : Generate the header
    - set-columns : Add the columns to the .pairs/pairsam file
    - transfer : Transfer the header from one pairs file to another
    - validate-columns : Validate the columns of the .pairs/pairsam file
  - resolves #119
  - option remove-columns for `pairtools select`: Remove the columns from .pairs/pairsam file
* pairtools phase critical update (#114)
* important fixes:
  - cython dedup with no-parent id forgotten counter reset;
  - sphinx doc update (added pysam);
  - header warning if empty and error if trying to add a field to an empty one
* Add summaries (#105)
* Add functions for duplication tile and complexity
* Make dedup stats!
* Benchmarks finalization
* [WIP] Stats split by filters (#132)
* Markasdup lib removed; markasdup CLI explanation improved
* dedup filter stats added and tested

Co-authored-by: Aleksandra Galitsyna <agalitzina@gmail.com>
Co-authored-by: Ilya Flyamer <flyamer@gmail.com>
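The stdin/stdout fallback described under "I/O improvements" can be sketched as follows. This is only an illustration of the stated behavior (fall back to a standard stream when the path is falsy or is "-"), not the actual `auto_open` code from pairtools: the function name `open_or_std` and its signature are assumptions made for this example, and the sketch ignores compressed files and any other features the real helper may have.

```python
import sys


def open_or_std(path, mode="r"):
    """Hypothetical helper: open a text file, or fall back to stdin/stdout.

    The standard stream is used when `path` evaluates to False
    (None, empty string) or when it is the conventional "-".
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if not path or path == "-":
        # Reading falls back to stdin, writing falls back to stdout
        return sys.stdin if mode == "r" else sys.stdout
    return open(path, mode)


# Usage: write to stdout when no output path was given
out = open_or_std(None, "w")
out.write("#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2\n")
```

A caller that receives one of the standard streams should not close it, which is one reason a module may prefer to control optional streams itself.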