Version History
Jouni Siren edited this page Nov 16, 2020
- Direct `GBWT` to `DynamicGBWT` conversion.
- Temporary files are now thread-safe.
- An option to use persistent phasing files for haplotype generation. These files persist when the associated object is deleted, but they are still deleted when the program exits.
- The fast GBWT merging algorithm now works with overlapping node id ranges as long as the non-empty records do not overlap.
- `metadata_tool` now prints metadata or removes it completely.
- `FastLocate`: Optional fast `locate()` structure based on the r-index.
- Ignore metadata from empty GBWTs during merging.
- Construction from paths with many different starting nodes is faster.
- Option to force the phasing of homozygous variants (default on).
- `CachedGBWT`: A caching layer for workloads that repeatedly access the same subset of nodes.
- Direct `DynamicGBWT` to `GBWT` conversion.
- Install script.
- Extended metadata with path, sample, and contig names.
- Sample names and contig name in VCF parse.
- Create full metadata when building GBWT from a VCF parse using `build_gbwt`.
- Renamed `metadata` to `metadata_tool`.
- Remove sequences by sample / contig name in `remove_seq`.
- New functionality: `GBWT::firstNode()`, `GBWT::empty(node)`.
- An algorithm for removing sequences from `DynamicGBWT`.
- Multiple parallel merge jobs in BWT-merge.
- `build_gbwt` improvements: Accept file lists and write metadata when building from a VCF parse.
- Parallel merging algorithm for quickly merging multiple GBWTs over the same chromosome. It can reduce the index construction time for large datasets by a factor of 2 to 3.
- Optional metadata in the GBWT index.
- New functionality: `GBWT::extract(position)`, `GBWT::extract(position, max_length)`, `DynamicGBWT::fullLF()`.
- Option to change the path identifier sampling interval.
- Save the temporary structures from haplotype generation and use them as input for `build_gbwt`.
- Decompress the endmarker of the compressed GBWT for faster `extract()` queries in indexes with millions of paths.
- Bug fix: Initialize incoming edges correctly when loading a `DynamicGBWT` with a non-zero alphabet offset.
- Support for Clang.
- Support for bidirectional search.
- Bug fixes for empty indexes.
- Use `vector_type` (32-bit integers) instead of `std::vector<node_type>` (64-bit integers).
- Support structures for generating haplotypes from a phased VCF file.
- New functionality: `GBWT::hasEdge()`, `GBWT::edges()`, `GBWT::find(node)`.
- Read and write data in smaller blocks to avoid the issue with reads larger than 2 GB in GCC on macOS.
- Faster `GBWT::LF(from, i)`, `GBWT::prefix()`, `GBWT::locate()`, and `GBWT::extract()` queries.
- New construction option: `GBWTBuilder` collects inserted sequences and builds GBWT in a background thread.
- Support for node and path orientations.
- Fast merging when the node ids do not overlap.
- The second pre-release.
- High-level interface (`find()`, `extend()`, `locate()`, `extract()`) shared between `GBWT` and `DynamicGBWT`.
- Construction from `std::vector<node_type>`, which is also the type of extracted sequences.
- More versatile construction program supporting multiple inputs and inserting sequences into an existing index.
- Tools display version information.
- The first pre-release.
- Incremental index construction and GBWT merging.
- LF-mapping and `locate()` queries for determining path identifiers.
- Use two records for the endmarker in bidirectional indexes.
  - With one endmarker, the BWT contains an alternating sequence of initial and final nodes of a chromosome.
- Use binary search in `DynamicGBWT::tryLocate()`.
- Inverse suffix array functionality.
- Get offset for a path in a given node.
- Compressed to dynamic GBWT conversion.
- Encode the destination of the first outgoing edge relative to the current node.
- Incremental construction without buffering: Make the `Sequence` objects public and extend each sequence by one node at a time.
  - This only works with forward orientation.
- Memory-mapped compressed GBWT.
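Several entries above concern oriented nodes and paths (node and path orientations, bidirectional indexes, bidirectional search). The following is a minimal C++ sketch of one way to represent orientation, assuming an oriented node is stored as `2 * id + reverse`, so that flipping the orientation toggles the lowest bit. The names `encode`, `nodeId`, `isReverse`, `flip`, and `reversePath` are hypothetical illustrations, not the GBWT API, and `node_type` is simply defined here as a 64-bit integer for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not the GBWT API): an oriented node packs the node id
// and a reverse-orientation bit into one integer, assuming 2 * id + reverse.
using node_type = std::uint64_t;

node_type encode(node_type id, bool reverse) {
    // 2 * id must not overflow the 64-bit type.
    assert(id < (node_type(1) << 63));
    return 2 * id + (reverse ? 1 : 0);
}

node_type nodeId(node_type node) { return node / 2; }
bool isReverse(node_type node) { return (node & 1) != 0; }

// Flipping the orientation toggles the lowest bit.
node_type flip(node_type node) { return node ^ 1; }

// The reverse of a path visits the same nodes in reverse order,
// each in the opposite orientation.
std::vector<node_type> reversePath(const std::vector<node_type>& path) {
    std::vector<node_type> result(path.rbegin(), path.rend());
    for (node_type& node : result) {
        node = flip(node);
    }
    return result;
}
```

For example, the forward path 1, 2, 3 encodes as {2, 4, 6}, and `reversePath` turns it into {7, 5, 3}, that is, nodes 3, 2, 1 in reverse orientation. Applying `reversePath` twice returns the original path, which is what would let a bidirectional index store each path in both orientations without ambiguity.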