Retirado de: https://playingwithgenomics.wordpress.com/2015/07/15/producing-a-bam-file-and-extracting-unique-reads-from-bowtie2-results/

I am always looking for ways to keep my disk usage down. Especially since I’ve been mapping close to 100 Chip-Seq files. I find that 1) piping Bowtie2 output into samtools to create a bam file and 2) keeping only the uniquely mapped reads help a lot. Here is how I do those:

I’m dealing with single-end data here. You can modify the command if you are dealing with paired-end data. Text written in red should be substituted according to your data.

Converting Bowtie2 output to bam file

bowtie2 -x BOWTIE2_INDEX -\
-p NUM_THREADS \
-U INPUT_FILE.fastq.gz | samtools view -bS -t \
YOUR_GENOME_INDEX.fa.fai - > output.bam

Counting unique reads
You basically can exploit the XS tag that are set by Bowtie2 for reads that can be mapped in multiple places. Therefore, uniquely mapped reads lack the XS tag. The following command exclude unmapped (-F 4) as well as multiple-mapped (grep -v “XS:”) reads

samtools view -F 4 input.bam | grep -v "XS:" | wc -l

Extracting unique reads
We can use the above concept to not only count uniquely mapped reads, but also extract them into a separate bam file. First, we need to provide a header file, which we can get from the original bam file. Then, we concatenate the header and the extracted uniquely mapped reads, based on the lack of XS tag.

samtools view -H input.bam > header.sam
samtools view -F 4 input.bam | grep -v "XS:" | cat header.sam - | \
samtools view -b - > unique.bam
rm header.sam

Comentários

Postagens mais visitadas deste blog

Instalar o VMPlayer no Linux Ubuntu

MitoBim results