【NGS Software Development】Using SeqAn3 to Read FASTQ Format

SeqAn3 is a modern C++ library that can be used to preprocess NGS data. In this article, we explain how to use SeqAn3 to read raw sequencing data (FASTQ) from disk in C++.

The Hello World of SeqAn3

After installation, we can use the seqan3::debug_stream object to check that everything works correctly. Here is a small test program:

C++
#include <seqan3/core/debug_stream.hpp> 
 
int main()
{
    seqan3::debug_stream << "Hello World!";
}

After compiling and running the program, we get:

Hello World!

Reading FASTQ with the SeqAn3 Library

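SeqAn3 reads data by constructing a sequence_file_input object (for example, seqan3::sequence_file_input fin{std::cin, seqan3::format_fasta{}};), after which each read can be obtained through an iterator. The example below opens the file R1.fastq with seqan3::sequence_file_input fin{"R1.fastq"}; and then iterates over every read. Each read exposes three main attributes: id, sequence, and base_qualities.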

C++
#include <iostream>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sequence_file/input.hpp>
#include <string>
int main() {
 seqan3::sequence_file_input fin{"R1.fastq"};
 for (auto& read : fin) {
   seqan3::debug_stream << "id:  " << read.id() << std::endl;
   seqan3::debug_stream << "sequence: " << read.sequence() << std::endl;
   seqan3::debug_stream << "base qualities: " << read.base_qualities() << std::endl;
 }
 return 0;
}

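Alternatively, instead of binding each record to auto& read, you can unpack its fields directly in a tuple-like way. Note the order used in the code: the sequence comes first, then the id, then the base qualities.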

C++
#include <iostream>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sequence_file/input.hpp>
#include <string>
int main() {
 seqan3::sequence_file_input fin{"R1.fastq"};
 for (auto& [seq , id, bqual] : fin) {
   seqan3::debug_stream << "id:  " << id << std::endl;
   seqan3::debug_stream << "sequence: " << seq<< std::endl;
   seqan3::debug_stream << "base qualities: " << bqual << std::endl;
 }
 return 0;
}
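
To make the three fields (id, sequence, base qualities) more concrete, here is a minimal standard-library sketch of how a four-line FASTQ record maps onto them. It is not how SeqAn3 implements parsing, and the names FastqRecord, read_fastq_record, and parse_fastq are hypothetical helpers made up for this illustration. It assumes each record occupies exactly four lines (no wrapped sequences) and Unix line endings.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring the three fields used above.
struct FastqRecord {
    std::string id;              // header line without the leading '@'
    std::string sequence;        // nucleotide letters
    std::string base_qualities;  // one quality character per base
};

// Reads one four-line record. Returns std::nullopt at end of input
// or when the record does not look like FASTQ.
std::optional<FastqRecord> read_fastq_record(std::istream& in) {
    std::string header, seq, plus, qual;
    if (!std::getline(in, header) || !std::getline(in, seq) ||
        !std::getline(in, plus) || !std::getline(in, qual)) {
        return std::nullopt;
    }
    // The header must start with '@', the separator with '+', and there
    // must be exactly one quality character per base.
    if (header.empty() || header[0] != '@' || plus.empty() ||
        plus[0] != '+' || seq.size() != qual.size()) {
        return std::nullopt;
    }
    return FastqRecord{header.substr(1), seq, qual};
}

// Parses every record from FASTQ text held in memory, stopping at the
// first malformed record or at end of input.
std::vector<FastqRecord> parse_fastq(const std::string& text) {
    std::istringstream in{text};
    std::vector<FastqRecord> records;
    while (auto rec = read_fastq_record(in)) {
        records.push_back(*rec);
    }
    return records;
}
```

Calling parse_fastq on the text of a small FASTQ file returns one FastqRecord per read. For real data, the SeqAn3 reader shown above is the better choice, since this sketch skips alphabet validation and any format variants.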
