Releases: bioinform/somaticseq
v3.10.0
v3.9.1
v3.9.0
Mostly for maintenance
- Add `pyproject.toml` to modernize build scripts for `pip install .`, as `./setup.py install` is being deprecated by Python.
  - Additional entry points for scripts are shown in `pyproject.toml`. Scripts in `setup.py` are kept to be backward-compatible with the previous command-line interface.
- Extra packages for development can be installed via `pip install '.[dev]'`.
- Add an initial test in `tests` and move `example` there.
- Refactored some functions
v3.8.0
Mostly maintenance
- Many coding style changes to make the code more modern and easier to maintain
- Enforce versioning for some dependencies in `setup.py`, including `python>=3.10`
- For the XGBoost model, additionally output a JSON file (i.e., the decision trees)
- Updated some docker files for 3rd party tools
- Remove the `-d dbsnp.vcf.gz` parameter from the tumor-only LoFreq command (that parameter is only meaningful for a tumor-normal pair)
Full Changelog: v3.7.4...v3.8.0
Restrict xgboost to >=1.4
`ntree_limit` is replaced with `iteration_range` in `xgboost.predict` in xgboost >=1.4. This release uses `iteration_range=(0, iterations)` instead of `ntree_limit=iterations`.
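For illustration only, the change amounts to swapping one keyword argument in the `predict` call. This is a minimal sketch, not SomaticSeq's own code: the helper names `predict_kwargs` and `predict_with_first_trees` are hypothetical, and `booster` and `dmatrix` stand for an `xgboost.Booster` and an `xgboost.DMatrix` supplied by the caller.

```python
def predict_kwargs(iterations: int) -> dict:
    """Keyword arguments that limit prediction to the first `iterations` rounds.

    xgboost >= 1.4 takes a half-open (begin, end) range instead of ntree_limit.
    """
    return {"iteration_range": (0, iterations)}


def predict_with_first_trees(booster, dmatrix, iterations: int):
    # Replaces the older call: booster.predict(dmatrix, ntree_limit=iterations)
    return booster.predict(dmatrix, **predict_kwargs(iterations))
```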
Allow custom hyperparameters to be passed into somaticseq_parallel.py
Allow xgboost hyperparameters to be passed into `somaticseq_parallel.py`, e.g., `somaticseq_parallel.py --somaticseq-train --extra-hyperparameters scale_pos_weight:0.1 seed:100`. Previously, they could only be passed into `somatic_xgboost.py`. Beware, however, that multi-argument options like `--extra-hyperparameters` and `--features-excluded` cannot be placed immediately before `paired` or `single`. Otherwise, the parser will treat `paired` or `single` as one more argument to that option instead of invoking paired or single mode.
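The ordering pitfall can be reproduced with a small `argparse` example. This is a sketch under assumptions: the parser below is a hypothetical, cut-down stand-in (one flag, one multi-argument option, two modes), not the actual `somaticseq_parallel.py` parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the pieces needed to show the pitfall.
    parser = argparse.ArgumentParser(prog="somaticseq_parallel.py")
    parser.add_argument("--somaticseq-train", action="store_true")
    parser.add_argument("--extra-hyperparameters", nargs="*", default=[])
    modes = parser.add_subparsers(dest="mode")
    modes.add_parser("paired")
    modes.add_parser("single")
    return parser


parser = build_parser()

# Multi-argument option immediately before the mode: "paired" is consumed
# as one more hyperparameter value, so no mode is selected.
bad = parser.parse_args(
    ["--extra-hyperparameters", "scale_pos_weight:0.1", "seed:100", "paired"]
)

# Placing another option between the list and the mode ends the list first.
good = parser.parse_args(
    ["--extra-hyperparameters", "scale_pos_weight:0.1", "seed:100",
     "--somaticseq-train", "paired"]
)
```

In this sketch, `bad.mode` is left unset while `good.mode` is `"paired"`, which is why a single-value option or a flag should sit between the multi-argument option and the mode.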
Check VCF file sorting order
- More robustly check sorting order when VCF files are being read. Raise an exception when they are not sorted according to the reference file.
- Change `-u $UID` to `-u $(id -u):$(id -g)` when invoking the docker command in `somaticseq.utilities.dockered_pipelines.container_option`.
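A sort-order check of the kind described in the first item could look roughly like the following. This is a minimal sketch, not the SomaticSeq implementation: the function name `check_sorted` is hypothetical, records are simplified to `(contig, position)` pairs, and `contig_order` is assumed to be the contig list taken from the reference (for example, from its `.fai` index).

```python
from typing import Iterable, Sequence, Tuple


def check_sorted(records: Iterable[Tuple[str, int]],
                 contig_order: Sequence[str]) -> None:
    """Raise ValueError if records are not in reference order."""
    # Rank each contig by its position in the reference.
    rank = {contig: i for i, contig in enumerate(contig_order)}
    previous = None
    for contig, position in records:
        if contig not in rank:
            raise ValueError(f"{contig} is not in the reference")
        current = (rank[contig], position)
        # Tuples compare by contig rank first, then by position.
        if previous is not None and current < previous:
            raise ValueError(f"{contig}:{position} is out of order")
        previous = current
```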
Minor bug fix in docker workflow
- Fixed three bugs where the dbSNP and COSMIC VCF files and the exclusion-region BED file were not passed properly in `makeSomaticScripts.py`.
- No change in SomaticSeq code otherwise.
SomaticSeq now supports input of *any* VCF file(s) from any caller(s)
Major feature upgrade: SomaticSeq now supports the input of any arbitrary VCF files in addition to the callers we have explicitly incorporated, e.g., via the `--arbitrary-snvs callerX_snv.vcf callerY_snv.vcf` and `--arbitrary-indels callerA_indel.vcf callerB_indel.vcf` options for the `somaticseq_parallel.py` command.
You must separate the SNVs and indels into separate VCF files before using them as input to SomaticSeq. If you have a VCF file that combines SNVs and indels, you may use the script included in our repo, i.e., `splitVcf.py -infile combined_variants.vcf -snv snvs.vcf -indel indels.vcf`. Input can be either `.vcf` or `.vcf.gz`. Output will be `.vcf`.
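To make the SNV/indel distinction concrete, here is a minimal sketch of how data lines could be partitioned. It is not the logic of `splitVcf.py`: the function names are hypothetical, each line is assumed to carry a single ALT allele, and anything that is not a single-base substitution is treated as an indel for simplicity.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Return "snv" for a single-base substitution, otherwise "indel"."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    return "indel"


def split_records(lines):
    """Partition VCF data lines into (snv_lines, indel_lines)."""
    snvs, indels = [], []
    for line in lines:
        if line.startswith("#"):
            continue  # header lines are skipped in this sketch
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]  # REF and ALT columns
        # Assumption: one ALT allele per line (no multi-allelic records).
        target = snvs if classify_variant(ref, alt) == "snv" else indels
        target.append(line)
    return snvs, indels
```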
For the "arbitrary input VCF files," calls labeled as REJECT in the FILTER field will not be counted and will be assigned a value of 0 in the `if_Caller_X` fields. Calls labeled as LowQual will be assigned a value of 0.5. Calls without any filter label will be counted as a bona fide call for that particular VCF file and assigned a value of 1, i.e., as though it is a PASS call. So modify your VCF files accordingly if needed.
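The scoring rule above can be summarized as a small function. This is a sketch only: the name `arbitrary_caller_value` is hypothetical, and treating any label other than REJECT or LowQual as a full call is an assumption, since only the REJECT, LowQual, and no-label cases are described here.

```python
def arbitrary_caller_value(filter_field: str) -> float:
    """Map a VCF FILTER string to the value for an if_Caller_X field."""
    labels = set(filter_field.split(";")) if filter_field else set()
    if "REJECT" in labels:
        return 0.0  # rejected calls are not counted
    if "LowQual" in labels:
        return 0.5  # low-quality calls get half weight
    # No label ("." or empty) counts as though it were PASS.
    # Assumption: any other label is also treated as a full call.
    return 1.0
```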
Special release for SEQC2 somatic mutation reference samples project
- This is a special release for the Somatic Mutation Working Group of the SEQC2 Consortium to establish v1.2 of the somatic reference call set, i.e., Fang, L.T., Zhu, B., Zhao, Y. et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol 39, 1151-1160 (2021) / PMID:34504347 / SharedIt Link / Youtube presentation.
- This release is based on the older SomaticSeq v2.8.1. It contains many custom scripts specifically designed to complete SEQC2's somatic reference samples project.
- This release is not intended for general use.
- No code change, but the README was updated with NCBI's new FTP address since the original commit. The FTP address for the SEQC2 Somatic Mutation Working Group can be found here, so navigate there if a file has moved from its original location.