Version History
Jouni Siren edited this page Nov 1, 2024 · 95 revisions
- GBWT is now silent by default; adjust with `Verbosity::set()` if necessary.
- `GBWTBuilder` (and related tools) will automatically increase buffer size if a sequence is too large for the buffer.
- Metadata improvements:
  - `FullPathName`: A standalone version of `PathName` that stores sample/contig names/ids as strings without requiring `Metadata`.
  - `Metadata::findFragment()`: Returns the path identifier of the haplotype fragment possibly covering the (sample, contig, haplotype, offset) represented by a path name.
- New functionality:
  - `FastLocate::decompressSA()` and `FastLocate::decompressDA()` for decompressing the part of the suffix array / document array corresponding to a node.
- Empty paths are fully supported (but still discouraged).
- Text input format for `build_gbwt` (mostly for testing).
- The broken CMake support has been removed.
- Supports 64-bit ARM.
- File format version 5:
  - Optional serialization using simple-sds structures.
  - `Tags` structure storing arbitrary key-value pairs.
  - Compatible with versions 1-4.
  - Uses `Metadata` version 2 (compatible with versions 0-1).
- `inverseLF()`: Follow the sequence backward in a bidirectional index.
- Serialization and loading use exceptions to handle failures.
- Requires the vgteam fork of SDSL.
- Uses C++14 and the vgteam fork of SDSL.
- Direct `GBWT` to `DynamicGBWT` conversion.
- Temporary files are now thread-safe.
- An option to use persistent phasing files for haplotype generation. These files persist when the associated object is deleted, but they are still deleted when the program exits.
- The fast GBWT merging algorithm now works with overlapping node id ranges as long as the non-empty records do not overlap.
- `metadata_tool` now prints metadata or removes it completely.
- `FastLocate`: Optional fast `locate()` structure based on the r-index.
- Ignore metadata from empty GBWTs during merging.
- Construction from paths with many different starting nodes is faster.
- Option to force the phasing of homozygous variants (default on).
- `CachedGBWT`: A caching layer for workloads that repeatedly access the same subset of nodes.
- Direct `DynamicGBWT` to `GBWT` conversion.
- Install script.
- Extended metadata with path, sample, and contig names.
- Sample names and contig name in VCF parse.
- Create full metadata when building GBWT from a VCF parse using `build_gbwt`.
- Renamed `metadata` to `metadata_tool`.
- Remove sequences by sample / contig name in `remove_seq`.
- New functionality: `GBWT::firstNode()`, `GBWT::empty(node)`.
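The idea behind a caching layer such as `CachedGBWT` can be illustrated with a small standalone sketch: decode a node's record on first access and reuse it afterwards. Everything here (`Record`, `RecordCache`, the decoder callback) is a hypothetical stand-in chosen for illustration and is not part of the GBWT API; the real class may organize its cache quite differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decompressed node record; not a GBWT type.
using Record = std::vector<std::uint64_t>;

// Minimal memoizing cache keyed by node identifier (illustrative only).
class RecordCache {
public:
    // `decoder` is an assumed callback that produces the record for a node.
    explicit RecordCache(std::function<Record(std::uint64_t)> decoder)
        : decoder_(std::move(decoder)) {}

    // Return the record for `node`, decoding it only on the first access.
    // References to unordered_map elements stay valid across later insertions.
    const Record& get(std::uint64_t node) {
        auto iter = cache_.find(node);
        if (iter == cache_.end()) {
            iter = cache_.emplace(node, decoder_(node)).first;
            ++misses_;
        }
        return iter->second;
    }

    // Number of accesses that had to call the decoder.
    std::size_t misses() const { return misses_; }

    // Number of distinct nodes currently cached.
    std::size_t size() const { return cache_.size(); }

private:
    std::function<Record(std::uint64_t)> decoder_;
    std::unordered_map<std::uint64_t, Record> cache_;
    std::size_t misses_ = 0;
};
```

A cache like this only pays off when the same nodes are visited repeatedly, which matches the workload described in the `CachedGBWT` entry; for one-off accesses it only adds memory overhead.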
- An algorithm for removing sequences from `DynamicGBWT`.
- Multiple parallel merge jobs in BWT-merge.
- `build_gbwt` improvements: Accept file lists, write metadata when building from VCF parse.
- Parallel merging algorithm for quickly merging multiple GBWTs over the same chromosome. It can reduce the index construction time for large datasets by a factor of 2 to 3.
- Optional metadata in the GBWT index.
- New functionality: `GBWT::extract(position)`, `GBWT::extract(position, max_length)`, `DynamicGBWT::fullLF()`.
- Option to change the path identifier sampling interval.
- Save the temporary structures from haplotype generation and use them as input for `build_gbwt`.
- Decompress the endmarker of compressed GBWT for faster `extract()` queries in indexes with millions of paths.
- Bug fix: Initialize incoming edges correctly when loading `DynamicGBWT` if alphabet offset is non-zero.
- Support for Clang.
- Support for bidirectional search.
- Bug fixes for empty indexes.
- Use `vector_type` (32-bit integers) instead of `std::vector<node_type>` (64-bit integers).
- Support structures for generating haplotypes from a phased VCF file.
- New functionality: `GBWT::hasEdge()`, `GBWT::edges()`, `GBWT::find(node)`.
- Read and write data in smaller blocks to avoid the issue with >2 GB reads in GCC on macOS.
- Faster `GBWT::LF(from, i)`, `GBWT::prefix()`, `GBWT::locate()`, and `GBWT::extract()` queries.
- New construction option: `GBWTBuilder` collects inserted sequences and builds GBWT in a background thread.
- Support for node and path orientations.
- Fast merging when the node ids do not overlap.
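Support for node and path orientations can be pictured with a small standalone sketch. It assumes one common scheme in which the orientation is packed into the lowest bit of the node identifier; the helper names below are hypothetical, and the `node_type` alias is declared locally rather than taken from the library.

```cpp
#include <cstdint>
#include <vector>

// Local alias for illustration; assumed to be a 64-bit unsigned integer.
using node_type = std::uint64_t;

// Pack (id, orientation) into one integer: the low bit marks reverse orientation.
inline node_type encode_node(node_type id, bool is_reverse) {
    return 2 * id + (is_reverse ? 1 : 0);
}

// Recover the identifier from an encoded node.
inline node_type node_id(node_type node) { return node / 2; }

// Check whether an encoded node is in reverse orientation.
inline bool node_is_reverse(node_type node) { return (node & 1) != 0; }

// Flip the orientation of an encoded node.
inline node_type flip_node(node_type node) { return node ^ 1; }

// The reverse of a path visits the same nodes in the opposite order,
// each in the opposite orientation.
inline std::vector<node_type> reverse_path(const std::vector<node_type>& path) {
    std::vector<node_type> result(path.rbegin(), path.rend());
    for (node_type& node : result) { node = flip_node(node); }
    return result;
}
```

Under this assumed scheme, reversing a path twice returns the original path, which is the property a bidirectional index relies on when it stores both orientations of every sequence.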
- The second pre-release.
- High-level interface (`find()`, `extend()`, `locate()`, `extract()`) shared between `GBWT` and `DynamicGBWT`.
- Construction from `std::vector<node_type>`, which is also the type of extracted sequences.
- More versatile construction program supporting multiple inputs and inserting sequences into an existing index.
- Tools display version information.
- The first pre-release.
- Incremental index construction and GBWT merging.
- LF-mapping and `locate()` queries for determining path identifiers.
- Use binary search in `DynamicGBWT::tryLocate()`.
- Inverse suffix array functionality.
- Get offset for a path in a given node.
- Encode the destination of the first outgoing edge relative to the current node.
- Memory-mapped compressed GBWT.
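
For readers unfamiliar with the LF-mapping mentioned in the entries above, here is a minimal sketch on a classic single-string BWT, not on GBWT's node-based records. The function names are hypothetical, and the sketch assumes the text ends with a unique `$` terminator that sorts before every other character.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Compute LF(i) = C[c] + rank(c, i) for every position i, where c = bwt[i],
// C[c] is the number of characters smaller than c, and rank(c, i) is the
// number of occurrences of c in bwt[0..i).
std::vector<std::size_t> lf_mapping(const std::string& bwt) {
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : bwt) { counts[c]++; }

    std::array<std::size_t, 256> smaller{};  // The C array.
    std::size_t total = 0;
    for (std::size_t c = 0; c < 256; c++) {
        smaller[c] = total;
        total += counts[c];
    }

    std::array<std::size_t, 256> seen{};  // Running rank of each character.
    std::vector<std::size_t> lf(bwt.size());
    for (std::size_t i = 0; i < bwt.size(); i++) {
        unsigned char c = static_cast<unsigned char>(bwt[i]);
        lf[i] = smaller[c] + seen[c];
        seen[c]++;
    }
    return lf;
}

// Recover the text by following LF backward from the row of the suffix "$".
std::string invert_bwt(const std::string& bwt) {
    if (bwt.empty()) { return std::string(); }
    std::vector<std::size_t> lf = lf_mapping(bwt);
    std::string text(bwt.size(), '$');
    std::size_t pos = 0;  // Row 0 is the suffix consisting of the terminator.
    for (std::size_t k = bwt.size() - 1; k > 0; k--) {
        text[k - 1] = bwt[pos];  // The character preceding the current suffix.
        pos = lf[pos];           // Move to the row of the preceding suffix.
    }
    return text;
}
```

Each LF step moves one character backward in the text, which is why an index built on it extracts sequences from the end toward the start.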