Parse2: created #96

Closed
wants to merge 5 commits into from

Conversation

agalitsyna
Member

@agalitsyna agalitsyna commented Mar 9, 2021

  • docs
  • parse 'all' policy removed
  • parse2 command
  • parse2 coordinate-system option
  • tests

Parse2
-------------------------

If your Hi-C has long reads, you may want to report all the alignments in the reads with ``pairtools parse2``.
Member

maybe specify read length where it would benefit? e.g.
If your Hi-C has long reads (>50bp).

Would this work on nanopore reads or not? Maybe good to mention somewhere

Member Author

150 bp allows you to save 10% of the simple Hi-C library on DpnII. On Nanopore this won't work for now. Good points, will mention!

Member Author

Resolved by #99

'first column lists scaffold names. Any scaffolds not listed will be '
'ordered lexicographically following the names provided.')
@click.option(
"-o", "--output",
Member

Would it be clearer if this option were renamed to output-file?

Member Author

Okay, good suggestion

Member Author

Resolved in PR update: #99

@agalitsyna
Member Author

agalitsyna commented Mar 9, 2021

  • IPython notebook on MC-3C reads with parse2 (TestWalks.ipynb update)
  • Parsing of the sam header (breaks when the options are not as expected)
  • Arima or Hi-C reads on 150-300 bp as proof of concept
  • [-] Pore-C as additional test (impossible for now, as there are no public Pore-C fastqs)
  • minimap output?

@agalitsyna agalitsyna marked this pull request as draft March 21, 2021 18:11
@agalitsyna agalitsyna closed this Mar 21, 2021
@agalitsyna agalitsyna deleted the origin/parse_all branch March 21, 2021 18:48
@agalitsyna
Member Author

This PR is moved to drafts, the branch is renamed to "parse2"

agalitsyna added a commit to agalitsyna/pairtools that referenced this pull request Apr 9, 2021
@agalitsyna agalitsyna mentioned this pull request Apr 10, 2021
4 tasks
@agalitsyna
Member Author

Updated version of this PR: #99

agalitsyna added a commit that referenced this pull request Dec 8, 2021
* Parse2: created. Improved version of parse2 with resolved comments from the previous PR: #96

Major changes:

* Single-end mode of parse2 added, --single-end option. Tested on minimap2 output for MC-3C.

* parse2 now has three possible coordinate systems for reporting: read, walk and pair (described in the docstring). Default coord system "read" tested.

* demo notebook with MC-3C and Arima datasets

* simplified code of parse2, e.g. push_pair function added instead of repetitive code
improved docstrings

* Max molecule size replaced with max fragment size.  

* parse2(docs): Documentation improved, #96 (comment) resolved.

* Option to report 5' or 3' ends added.
@agalitsyna agalitsyna mentioned this pull request Dec 8, 2021
agalitsyna added a commit that referenced this pull request Apr 11, 2022
Improved version of parse2 with resolved comments from the previous PR: #96

- Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports pair orientation and outer positions of 5'-ends, as if each pair was read in paired-end mode independently. Parse2 is specifically designed for long walks, and has options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.

- Parse2 has an option to parse single-end reads, --single-end option, tested on minimap2 output for MC-3C.

- Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.

- Recent update simplifies the code: single _parse library used by both parse and parse2,

- a number of functions that reduce repetitive code, e.g. push_pair function,

- docstrings and documented structure of _parse library.

- Both parse and parse2 have the options to report 5' or 3' ends; to flip alignments according to chromosome coordinate.

- Both parse and parse2 have the pysam backend

- Improvements of the tests for parse and parse2

- Documentation includes description of various --report-orientation and --report-position cases.
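
The 5'/3' end reporting and the flipping by chromosome coordinate mentioned in this commit message can be illustrated with a minimal sketch. The `Alignment` record, `end_position` and `flip` below are hypothetical names made up for illustration and are not the pairtools `_parse` API; the only assumptions are that a forward-strand alignment has its 5' end at the leftmost coordinate, a reverse-strand alignment has it at the rightmost one, and that chromosomes missing from the order are sorted lexicographically after the listed ones.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not a pairtools type)."""
    chrom: str
    start: int   # leftmost aligned position
    end: int     # rightmost aligned position
    strand: str  # "+" or "-"


def end_position(aln, which="5"):
    """Return the 5' or 3' end of an alignment in reference coordinates."""
    if which not in ("5", "3"):
        raise ValueError("which must be '5' or '3'")
    five_prime_is_left = aln.strand == "+"
    # On "+" the 5' end is the leftmost base; on "-" it is the rightmost.
    if (which == "5") == five_prime_is_left:
        return aln.start
    return aln.end


def flip(side1, side2, chrom_order):
    """Order two (chrom, pos, strand) sides so the lower one comes first."""
    def key(side):
        chrom, pos, _ = side
        # Chromosomes absent from chrom_order sort after the listed ones.
        return (chrom_order.get(chrom, len(chrom_order)), chrom, pos)
    return (side1, side2) if key(side1) <= key(side2) else (side2, side1)


fwd = Alignment("chr2", 100, 150, "+")
rev = Alignment("chr1", 500, 560, "-")
order = {"chr1": 0, "chr2": 1}
sides = [(a.chrom, end_position(a, "5"), a.strand) for a in (fwd, rev)]
flipped = flip(sides[0], sides[1], order)
print(flipped)
```

Here the reverse-strand alignment contributes position 560 as its 5' end, and the pair is flipped so that the chr1 side is reported first.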
agalitsyna added a commit that referenced this pull request Jun 1, 2022
* Separate cli and lib

* pairtools flip fix for unannotated chromosomes, resolving #91

* handle empty chromosomes, resolved #76

* fixed rfrags indexing and first rfrag omission, resolved #73

* resolved or deprecated suggestions in #16

* merge improvements, header merge fixed

- resolved merge without arguments: #61

- option to add only the first header in merge, resolved #18

* in merge, added option to concatenate instead of merge sorted inputs, resolving: #23

* merge now checks that columns of inputs are the same
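
The difference between merging sorted inputs and concatenating them, together with the column check, can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the pairtools merge implementation: `columns_of`, `sort_key` and `combine` are hypothetical helpers, the inputs are assumed to be already sorted, and the sort key is assumed to be chrom1, chrom2, pos1, pos2 with the standard column layout (readID chrom1 pos1 chrom2 pos2 ...).

```python
import heapq
import itertools


def columns_of(header_lines):
    """Return the column names from a '#columns:' header line, or None."""
    for line in header_lines:
        if line.startswith("#columns:"):
            return line[len("#columns:"):].split()
    return None


def sort_key(line):
    """Assumed sort order: chrom1, chrom2, pos1, pos2 (fields 2 to 5)."""
    f = line.rstrip("\n").split("\t")
    return (f[1], f[3], int(f[2]), int(f[4]))


def combine(bodies, headers, concatenate=False):
    """Merge already-sorted inputs, or simply concatenate them."""
    cols = [columns_of(h) for h in headers]
    if any(c != cols[0] for c in cols[1:]):
        raise ValueError(f"inputs have different columns: {cols}")
    if concatenate:
        return itertools.chain(*bodies)       # keep input order as is
    return heapq.merge(*bodies, key=sort_key)  # interleave by sort key


header = ["## pairs format v1.0",
          "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2"]
a = ["r1\tchr1\t10\tchr1\t50\t+\t-", "r2\tchr1\t30\tchr2\t5\t+\t+"]
b = ["r3\tchr1\t20\tchr1\t40\t-\t-"]
merged_ids = [l.split("\t")[0] for l in combine([a, b], [header, header])]
concat_ids = [l.split("\t")[0]
              for l in combine([a, b], [header, header], concatenate=True)]
print(merged_ids, concat_ids)
```

Concatenation is cheaper because nothing is compared, but the result is only sorted if the inputs happen to be in order relative to each other.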

* I/O improvements

- auto_open defaults to stdin/stdout when path evaluates to False, resolved #48

- auto_open defaults to stdin/stdout when the path is "-"

- if the stream is optional, it's controlled by the module itself
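
The fallback behaviour described for the I/O layer can be sketched as follows. `open_or_std` is a hypothetical helper written for illustration; it is not the pairtools `auto_open` function, which may handle compression and other cases that are ignored here.

```python
import sys


def open_or_std(path, mode="r"):
    """Open `path`, or fall back to stdin/stdout when it is empty or '-'.

    Hypothetical helper illustrating the described behaviour;
    not the pairtools auto_open implementation.
    """
    if mode not in ("r", "w", "a"):
        raise ValueError("mode must be 'r', 'w' or 'a'")
    if not path or path == "-":
        # Reading falls back to stdin, writing/appending to stdout.
        return sys.stdin if mode == "r" else sys.stdout
    return open(path, mode)


out = open_or_std("", "w")
out.write("written to stdout because no path was given\n")
```

Treating both a falsy path and "-" as the standard stream lets a command sit in the middle of a shell pipeline without special-casing in each caller.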

* Parse2 update (#99) (#109)

Improved version of parse2 with resolved comments from the previous PR: #96

- Separation of parse and parse2 modules. Parse has an option --walks-policy all, which parses long walks but always reports pair orientation and outer positions of 5'-ends, as if each pair was read in paired-end mode independently. Parse2 is specifically designed for long walks, and has options --report-position and --report-orientation, which can be used to report junctions, reads, or walks.

- Parse2 has an option to parse single-end reads, --single-end option, tested on minimap2 output for MC-3C.

- Parse2 has max_fragment_size instead of parse's max_molecule_size, which helps to determine the overlapping ends of forward and reverse reads.

- Recent update simplifies the code: single _parse library used by both parse and parse2,

- a number of functions that reduce repetitive code, e.g. push_pair function,

- docstrings and documented structure of _parse library.

- Both parse and parse2 have the options to report 5' or 3' ends; to flip alignments according to chromosome coordinate.

- Both parse and parse2 have the pysam backend

- Improvements of the tests for parse and parse2

- Documentation includes description of various --report-orientation and --report-position cases.

* Merge pairlib into pairtools.lib.

* CLI for scalings added.

* stats output in yaml format

* Header CLI (#121)

- new module called by `pairtools header`
- submodules: 
  - generate : Generate the header
  - set-columns : Add the columns to the .pairs/pairsam file
  - transfer : Transfer the header from one pairs file to another
  - validate-columns : Validate the columns of the .pairs/pairsam file
- resolves #119 
- option remove-columns for `pairtools select`: Remove the columns from .pairs/pairsam file
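
One plausible reading of column validation is a check that every record has as many tab-separated fields as the `#columns:` header line declares. The sketch below works under that assumption; `validate_columns` is a hypothetical function and does not reproduce what `pairtools header validate-columns` actually checks.

```python
def validate_columns(lines):
    """Check every body row has as many fields as '#columns:' declares.

    Returns the list of column names; raises ValueError on a mismatch.
    """
    columns = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines in this sketch
        if line.startswith("#"):
            if line.startswith("#columns:"):
                columns = line[len("#columns:"):].split()
            continue
        if columns is None:
            raise ValueError("no '#columns:' line before the first record")
        n_fields = len(line.split("\t"))
        if n_fields != len(columns):
            raise ValueError(
                f"line {lineno}: {n_fields} fields, expected {len(columns)}"
            )
    if columns is None:
        raise ValueError("no '#columns:' line found")
    return columns


pairs = [
    "## pairs format v1.0\n",
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2\n",
    "r1\tchr1\t10\tchr1\t50\t+\t-\n",
]
print(validate_columns(pairs))
```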

* pairtools phase critical update (#114)

* important fixes: - cython dedup with no-parent id forgotten counter reset; - sphinx doc update (added pysam); - header warning if empty and error on an attempt to add a field to an empty one

* Add summaries (#105)

* Add functions for duplication tile and complexity

* Make dedup stats!

* Benchmarks finalization

* [WIP] Stats split by filters (#132)

* Markasdup lib removed; markasdup CLI explanation improved

* dedup filter stats added and tested

Co-authored-by: Aleksandra Galitsyna <agalitzina@gmail.com>
Co-authored-by: Ilya Flyamer <flyamer@gmail.com>