Preparing read data

From GenomeView Manual

Revision as of 23:44, 17 October 2013

BAM is the best format for presenting short-read alignments to GenomeView. Your reads must be aligned to a reference and stored in this format.

Aligning reads

GenomeView is a visualization tool and does not perform the computationally intensive read alignment itself. Dozens of tools are already available for that job.

If you need help aligning your reads, have a look at the Recipe to align reads for some ideas.

Preparing for visualization

A BAM file straight from the aligner may need a few more steps before it is in the right format for visualization:

  1. Sort the reads by genomic coordinate.
  2. Index the sorted reads.
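
Before sorting, it can be worth checking whether the aligner output is already coordinate-sorted. The sketch below assumes SAMtools is installed and that the file carries a standard @HD header line with an SO tag; the function name is hypothetical. A file without that tag is treated as not sorted, even though its reads might happen to be in order.

```shell
# Hypothetical helper: reads SAM/BAM header text on stdin and succeeds
# only if the @HD line declares coordinate sort order (SO:coordinate).
is_coordinate_sorted() {
  grep -q '^@HD.*SO:coordinate'
}

# Intended use (requires samtools, so it is left commented out here):
#   samtools view -H alignment.bam | is_coordinate_sorted && echo "already sorted"
```

If the check succeeds, only the indexing step should still be needed.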

Sorting and indexing with Picard (OS independent)

You need to have a local copy of Picard installed and you need to run these commands on the command-line.

In this example we have a read alignment in SAM format called 'alignment.sam'. We use the program SortSam from Picard to sort the file by coordinates. This also works if your aligner gives you a BAM file as output, e.g. 'alignment.bam'.

java -Xmx1g -jar SortSam.jar I=alignment.sam O=sorted.bam SO=coordinate
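
The command above only sorts; the index still has to be built. Here is a minimal sketch of both steps, assuming a Picard release that ships one jar per tool (as the SortSam.jar call suggests) with BuildBamIndex.jar in the same directory. The function name and the DRY_RUN switch are hypothetical and only there for illustration.

```shell
# Hypothetical wrapper: sort an alignment by coordinate, then index it.
# Assumes SortSam.jar and BuildBamIndex.jar are in the current directory.
picard_sort_and_index() {
  local input="$1" output="${2:-sorted.bam}"

  # With DRY_RUN=1 the commands are printed instead of executed.
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"

  $run java -Xmx1g -jar SortSam.jar I="$input" O="$output" SO=coordinate &&
  $run java -Xmx1g -jar BuildBamIndex.jar I="$output"
}
```

Calling `picard_sort_and_index alignment.sam` runs both commands; with DRY_RUN=1 set it only prints them. Picard also accepts CREATE_INDEX=true as an option to SortSam, which may let you write the index during sorting instead of in a second step.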

Sorting and indexing with SAMtools (Mac OS and Linux)

Steps to get from the various aligner formats to the SAM format are available on the SAMtools website.

Steps to go from SAM to indexed BAM:

samtools faidx reference.fasta    # creates reference.fasta.fai for the next step

samtools view -bS -t reference.fasta.fai alignment.sam -o alignment.bam

samtools sort alignment.bam sorted    # creates sorted.bam

samtools index sorted.bam    # creates sorted.bam.bai, which GenomeView reads together with the BAM file
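
The four commands above can be chained so that a failure in one step stops the rest. This is a minimal sketch using the same file names and the same SAMtools syntax as above; the function name and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical wrapper around the four steps: SAM -> sorted, indexed BAM.
# Arguments: reference FASTA, SAM file, optional output prefix (default: sorted).
sam_to_indexed_bam() {
  local ref="$1" sam="$2" prefix="${3:-sorted}"
  local bam="${sam%.sam}.bam"   # alignment.sam -> alignment.bam

  # With DRY_RUN=1 the commands are printed instead of executed.
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"

  # '&&' stops the chain at the first command that fails.
  $run samtools faidx "$ref" &&
  $run samtools view -bS -t "$ref.fai" "$sam" -o "$bam" &&
  $run samtools sort "$bam" "$prefix" &&
  $run samtools index "$prefix.bam"
}
```

For example, `sam_to_indexed_bam reference.fasta alignment.sam` should leave sorted.bam and sorted.bam.bai in the working directory. Newer SAMtools releases changed the sort syntax to `samtools sort -o sorted.bam alignment.bam`, so check the usage message of your installed version before relying on the prefix form.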


Summary visualizations

Coverage plots

If you are primarily interested in the read coverage and not in individual reads, you may want to create coverage plots.

Variants

If you are investigating SNPs or other genetic variants, you may also want to create variant calls.