incompatibility with tab character in line1 of FASTQ #241

Open
iorga opened this issue Sep 15, 2024 · 0 comments
iorga commented Sep 15, 2024

Hello,

KMC (I am using version 3.2.4) does not work correctly with FASTQ files generated by dorado demux --emit-fastq. It reports one read but counts zero k-mers:

~/temp_test > cat dorado.fastq 
@924f595d-fb8d-473e-bb99-b554dfdd5ce9	st:Z:2024-09-11T03:04:06.675+00:00	RG:Z:41b0457e40e383b3df64af4d8e649576ca9a4668_dna_r10.4.1_e8.2_400bps_fast@v5.0.0
GAAGCGACAGCGTATGCGCGTGTTTAAGTTCGACTGGTTCTCTGCCACCGGTACCGCCATTCTTTTTGCTGCCCTGCTCTCGATTGTCTGGCTGAAGATGAAACCATCTGACGCTATCAGCGCCTTCGGCAGCACGCTGAAGGACTGGCTCTGCCTATCTACTCCATCGGTATGGTGCTGGCGTTCGCCTTTATCTCGAACTATTCCGGACTATCATCAACGCTGGCGCTGGCGCTCGCACACACCGGCCATGCATTCCGCCTTTTCTCTCTCGCCGTTCCTCGGCTGGCTTGGTGTCTTCCTGACCGGATCGGATACCTCATCTAACGCCCTGTTCGCCGCCCTGCAAGCCGCTGCAGCACAACAAATTGGCGTTTCTGACCTGTTGTTGGTTGCCGCCAACACCGCCGGTGGTGTCGCCGGTTAAGATGATCTCTTCCGCAATCTATCGCTATCACCTATGCGGGGATAGGCGTGGTAGGCAAAGAGTCAGATCTCTCGCTTTACCATCAAACGCAGCTAAATCTCACCTGTATGGTCGGCGTGATCGCCACGCTCAGGCTTATGTCTTAACGTGGATAATTTGCTAATGATTGTTTTACCCAGACGCCTGTCAGACAAGGTCCGATCGTGTGCGGGCGCTGATGGTGATG
+
89D>AABD:9:2106.-&)'))**4495@50'&'7.99=54166AH@A>=D>86789A=>=ADSDMIHB@>??BG?=<1<...1>8;9>656M@9869711A?<:422<<GFBC@>E<77<(((EI4116213321882/>=)0'62669)'(01/%%%&%&)+68B?@ABGCAB==>KHDCGB71246:DB64331427/2(%$%%%+,/.>>?>>:99@888B58))*=<@++*)()(((348:8;7/3//:'&&*,*46<64@33)7.@9;<HAAA66B0..@S546>(((*<603878F8<<C+)**0/,'7-,.5//0/3)*2579<;<...6>?96?A811/<546,**42+,--.,03./43-+,-1-/,(+-6>AS?ACD::6771)((-/75.1777.-')-6312596**,1571-*,,,,-420.1*'<<:**-655<,,-299&%&&:''&..6071*+/0.*)*;<76:7+*79.-.%$#($.+%'$&'(;7*&%$()1(76,*+))(%%*0.-*56:HIE??=77''-/-,.345+*+/-(/<';:;?@946@732;<ABEBA::>-.3++/..-26,,11))+>C8==G:66:534H>B5/,,02020---))),()**(.%),;'''.++(*&$&&)

~/temp_test > kmc -sm -m8 -t20 -k21 -ci1 dorado.fastq  /home_local/tmp/MLeT7OaRUA/kmc /home_local/tmp/MLeT7OaRUA
**
Stage 1: 100%
Stage 2: 100%


1st stage: 0.622076s
2nd stage: 0.110107s
3rd stage: 0.0044s
Total    : 0.736583s
Tmp size : 0MB
Tmp size strict memory : 0MB
Tmp total: 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :            0
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :            0
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            0

I realized that this is caused by the tab characters in line 1 (the header line) of the FASTQ record. After converting the tabs to spaces, KMC behaves as expected:

~/temp_test > cat dorado.fastq | tr '\t' ' ' > dorado_notab.fastq

~/temp_test > kmc -sm -m8 -t20 -k21 -ci1 dorado_notab.fastq  /home_local/tmp/MLeT7OaRUA/kmc /home_local/tmp/MLeT7OaRUA
**
Stage 1: 100%
Stage 2: 100%


1st stage: 0.645912s
2nd stage: 0.102941s
3rd stage: 0.002236s
Total    : 0.751089s
Tmp size : 0MB
Tmp size strict memory : 0MB
Tmp total: 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :          633
   No. of unique counted k-mers       :          633
   Total no. of k-mers                :          633
   Total no. of reads                 :            1
   Total no. of super-k-mers          :           93
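
For anyone who wants a narrower workaround than running tr over the whole file, here is a minimal sketch that replaces tabs in the header lines only. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines), and the function name fastq_header_tabs_to_spaces is just a hypothetical helper for illustration, not part of KMC or dorado:

```shell
# Replace tabs with spaces in FASTQ header lines only (lines 1, 5, 9, ...).
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
fastq_header_tabs_to_spaces() {
    awk 'NR % 4 == 1 { gsub(/\t/, " ") } { print }'
}

# Usage with the file names from this report:
#   fastq_header_tabs_to_spaces < dorado.fastq > dorado_notab.fastq
```

The function reads from stdin and writes to stdout, so it should also be usable in a pipeline in front of KMC's input file. Sequence, separator, and quality lines are passed through unchanged.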

Could you please fix this in a future release? Many thanks!

Bogdan
