taxonkit reformat taxid 0 not found #79

Closed
Krasnopeev opened this issue Apr 10, 2023 · 2 comments

@Krasnopeev

Hi there!

I am trying to classify my ASVs after 16S/18S gene sequencing with Kraken and get the lineage for each taxid.

Here is my pipeline:

cat kraken2_console_out.tsv \
 | csvtk cut -Ht -f 2,3 \
 | taxonkit reformat -I 2 -f '{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{t}' -r "Unclassified" \
 | csvtk add-header -t -n seq,taxid,kingdom,phylum,class,order,family,genus,species,strain > kraken2_console_out_taxonomy.tsv

kraken2_console_out.tsv looks like:

C	ASV0000001	1959104	199	0:5 338190:5 0:11 338190:2 0:15 1959104:2 338190:9 0:32 1959104:9 0:13 1959104:3 0:23 651137:7 0:10 651137:3 0:4 651137:5 0:7
C	ASV0000002	222543	226	0:2 131567:6 0:1 222543:3 0:6 222543:5 0:169
C	ASV0000003	222543	195	0:2 131567:6 0:1 222543:3 0:6 222543:5 0:138
C	ASV0000004	1959104	199	0:5 338190:5 0:11 338190:2 0:15 1959104:2 338190:9 0:32 1959104:9 0:13 1959104:3 0:23 651137:7 0:10 651137:3 0:4 651137:5 0:7
C	ASV0000005	651137	204	0:28 338190:1 0:27 651137:2 0:8 651137:1 0:15 651137:5 0:17 651137:5 0:12 651137:4 0:42 1959104:1 0:2
C	ASV0000006	651137	204	0:28 338190:1 0:27 651137:2 0:8 651137:1 0:15 651137:5 0:17 651137:5 0:12 651137:4 0:42 1959104:1 0:2
C	ASV0000007	1959104	221	0:5 338190:5 0:11 338190:2 0:15 1959104:2 338190:9 0:32 1959104:9 0:13 1959104:3 0:23 651137:7 0:10 651137:3 0:4 651137:5 0:11 651137:1 0:17
C	ASV0000008	222543	253	0:2 131567:6 0:1 222543:3 0:6 222543:5 0:196
C	ASV0000009	1959104	181	0:5 338190:5 0:11 338190:2 0:15 1959104:2 338190:9 0:32 1959104:9 0:13 1959104:3 0:23 651137:7 0:10 651137:1
U	ASV0000010	0	214	0:10 55601:2 0:162 131567:3 0:3

After this step of the pipeline,
| taxonkit reformat -I 2 -f '{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{t}' -r "Unclassified" \
I receive these warnings in the console:

...
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
14:11:29.082 [WARN] taxid 0 not found
...

OK, but when I then try to add the header with
| csvtk add-header -t -n seq,taxid,kingdom,phylum,class,order,family,genus,species,strain > kraken2_console_out_taxonomy.tsv

I receive

[ERRO] record on line 10: wrong number of fields
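The error means that at least one row has a different number of tab-separated fields than the rest. A minimal sketch for locating such rows before handing the table to csvtk is shown here; `report_ragged_rows` is a hypothetical helper name, and it assumes the first row has the intended field count.

```shell
# Hypothetical helper: list every row whose tab-separated field count
# differs from that of the first row.
report_ragged_rows() {
  awk -F'\t' '
    NR == 1    { want = NF }   # take the first row as the reference width
    NF != want { print NR ": " NF " fields, expected " want }
  '
}

# Toy input: the second row is one field short.
printf 'a\tb\tc\nd\te\n' | report_ragged_rows
```

Placed right after the `taxonkit reformat` step, this would point at the rows for taxid 0 if they are the ones that come out shorter.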

The easy way is to skip all lines with taxid 0, but I need to keep them for downstream analysis.

How can I keep these rows and still get a table with a consistent number of fields?

Thanks!

@shenwei356
Owner

Thanks for using TaxonKit. It now outputs the same format for TaxIds not found in the database, and the replacement values for missing ranks and missing TaxIds can be set with -r and -R.

  -r, --miss-rank-repl string          replacement string for missing rank
  -R, --miss-taxid-repl string         replacement string for missing taxid

Examples:

$ echo -ne  "562\n0"  \
    | taxonkit  reformat -I 1 -f '{p}\t{s}' \
    | csvtk pretty -Ht
15:47:49.478 [WARN] taxid 0 not found
562   Proteobacteria   Escherichia coli
0


$ echo -ne  "562\n0" \
    | taxonkit  reformat -I 1 -f '{p}\t{s}' -t -r / -R 0 \
    | csvtk pretty -Ht
15:48:39.860 [WARN] taxid 0 not found
562   Proteobacteria   Escherichia coli   1224   562
0     /                /                  0      0

taxonkit_linux_amd64.tar.gz
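For anyone who cannot update TaxonKit, a possible workaround is to pad short rows to a fixed width before adding the header. This is only a sketch: `pad_fields` is a hypothetical helper, and it assumes that rows for unknown TaxIds simply carry fewer tab-separated fields than the others.

```shell
# Hypothetical helper: pad every row to N tab-separated fields,
# filling the missing ones with a placeholder string.
# Usage: pad_fields N FILL
pad_fields() {
  awk -F'\t' -v OFS='\t' -v n="$1" -v fill="$2" '
    { for (i = NF + 1; i <= n; i++) $i = fill; print }
  '
}

# Toy input: the second row lacks its third field.
printf 'ASV1\t562\tBacteria\nASV10\t0\n' | pad_fields 3 Unclassified
```

In the pipeline from the question this would go between `taxonkit reformat` and `csvtk add-header` as `pad_fields 10 Unclassified`, since the header names ten columns.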

@Krasnopeev
Author

It works! Thanks a lot!
