Source: https://playingwithgenomics.wordpress.com/2015/07/15/producing-a-bam-file-and-extracting-unique-reads-from-bowtie2-results/
I am always looking for ways to keep my disk usage down, especially since I've been mapping close to 100 ChIP-seq files. I find that two things help a lot: 1) piping Bowtie2 output into samtools to create a BAM file directly, and 2) keeping only the uniquely mapped reads. Here is how I do those:
I'm dealing with single-end data here; you can modify the command if you are dealing with paired-end data. The UPPERCASE placeholders should be substituted according to your data.
Converting Bowtie2 output to a BAM file
bowtie2 -x BOWTIE2_INDEX -p NUM_THREADS -U INPUT_FILE.fastq.gz | \
    samtools view -bS -t YOUR_GENOME_INDEX.fa.fai - > output.bam
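For paired-end data, the single -U argument becomes a -1/-2 pair of mate files. A hypothetical sketch (file names are placeholders; the script is only written out and syntax-checked here, since actually running it requires Bowtie2, samtools, and real data):

```shell
# Paired-end variant of the mapping command above (sketch only).
# INPUT_R1/INPUT_R2 are assumed placeholder names for the mate files.
cat > map_pe.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
bowtie2 -x BOWTIE2_INDEX -p NUM_THREADS \
    -1 INPUT_R1.fastq.gz -2 INPUT_R2.fastq.gz | \
    samtools view -bS -t YOUR_GENOME_INDEX.fa.fai - > output.bam
EOF
bash -n map_pe.sh && echo "syntax OK"
```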
Counting unique reads
You can exploit the XS tag, which Bowtie2 sets for reads that map to multiple places; uniquely mapped reads therefore lack the XS tag. The following command excludes unmapped reads (-F 4) as well as multi-mapped reads (grep -v "XS:"):
samtools view -F 4 input.bam | grep -v "XS:" | wc -l
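To see why the grep works, here is a quick sanity check on two hand-written (mock) SAM records: one uniquely mapped, one carrying an XS: secondary-alignment score. The field values are illustrative, not from real data:

```shell
# Mock SAM records: a unique read (no XS tag) and a multi-mapped read (XS tag).
printf 'uniq1\t0\tchr1\t100\t42\t36M\t*\t0\t0\t*\t*\tAS:i:0\n' > mock.sam
printf 'multi1\t0\tchr1\t200\t1\t36M\t*\t0\t0\t*\t*\tAS:i:0\tXS:i:0\n' >> mock.sam
# grep -v "XS:" discards the multi-mapped record, leaving only the unique read:
grep -v "XS:" mock.sam | wc -l    # prints 1
```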
Extracting unique reads
We can use the same idea not only to count uniquely mapped reads but also to extract them into a separate BAM file. First, we need a header, which we can get from the original BAM file. Then we concatenate the header and the extracted uniquely mapped reads (those lacking the XS tag).
samtools view -H input.bam > header.sam
samtools view -F 4 input.bam | grep -v "XS:" | cat header.sam - | \
    samtools view -b - > unique.bam
rm header.sam
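The concatenation step can be demonstrated without samtools, using mock text in SAM layout: header lines start with @, body lines are alignment records. The records below are made up for illustration:

```shell
# Mock header (lines starting with @) and mock body records.
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n' > header.sam
printf 'uniq1\t0\tchr1\t100\t42\t36M\t*\t0\t0\t*\t*\tAS:i:0\n' > body.sam
printf 'multi1\t0\tchr1\t200\t1\t36M\t*\t0\t0\t*\t*\tAS:i:0\tXS:i:0\n' >> body.sam
# Filter out XS-tagged reads, then prepend the header, as in the real pipeline:
grep -v "XS:" body.sam | cat header.sam - > unique.sam
wc -l < unique.sam    # 2 header lines + 1 unique read = 3
rm header.sam body.sam
```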