Wednesday, July 8, 2015

MAQ: "Inconsistent sequence name" in fastq2bfq

When using the fastq2bfq command to convert .fastq to .bfq, it sometimes prints this error/warning:

[seq_read_fastq] Inconsistent sequence name: @XXXXXXXXX. Continue anyway. 

This terminal printout slows down the file conversion. A possible solution is to remove the content after '+' on the third line of every record [1], so that each record looks like this:

@ the header info
ATCGATCG...
+
quality scores....


A simple Python script to remove the content after '+' on the 3rd line of each record and write everything to another file:

======================================
#!/usr/bin/env python

# Copy the FASTQ file, replacing each '+SRR...' separator line with a bare '+'.
with open("original_fastq_file.fastq") as f, \
        open("new_fastq_file.fastq", 'w') as writer:
    for line in f:
        # Change '+SRR' to the first 4~5 characters of the 3rd line of your FASTQ file.
        if line.startswith('+SRR'):
            line = '+\n'
        writer.write(line)
======================================
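
Matching on a prefix such as '+SRR' means editing the script for every data set, and a quality line could in principle begin with the same characters. A position-based variant avoids both problems. The following is only a minimal sketch, assuming every record is exactly four lines long (no wrapped sequence or quality lines). The names strip_plus_lines and convert are made up for this illustration and are not part of MAQ.

```python
def strip_plus_lines(lines):
    """Yield the input lines with the 3rd line of each 4-line record set to '+'.

    Assumption: strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    for index, line in enumerate(lines):
        if index % 4 == 2:
            # Sanity check: the separator line must start with '+'.
            if not line.startswith('+'):
                raise ValueError("line %d does not start with '+'" % (index + 1))
            yield '+\n'
        else:
            yield line


def convert(src_path, dst_path):
    """Write a copy of src_path to dst_path with bare '+' separator lines."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.writelines(strip_plus_lines(src))
```

To use it, call convert("original_fastq_file.fastq", "new_fastq_file.fastq"). If the file does not follow the four-line layout, the check on the separator line raises an error rather than writing a silently corrupted copy.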