https://manual.genomeview.org/api.php?action=feedcontributions&user=Thomas&feedformat=atomGenomeView Manual - User contributions [en]2024-03-28T21:38:12ZUser contributionsMediaWiki 1.35.6https://manual.genomeview.org/index.php?title=Preparing_and_loading_data&diff=9995Preparing and loading data2013-10-18T15:07:06Z<p>Thomas: </p>
<hr />
<div><br />
{{TOC|align=right}}<br />
<br />
<br />
<br />
There are several ways to load data into GenomeView. Before loading your data, make sure it is in one of the supported formats listed below. GenomeView will generally notify you if it does not recognize your data.<br />
<br />
==Loading data ==<br />
You can load your data files ...<br />
* ... by selecting "work with my data" in the [[Genome Explorer]]<br />
* ... by dragging them onto GenomeView<br />
* ... by selecting the 'File' menu and then 'Load data...' ([http://genomeview.org/content/load-data tutorial])<br />
* ... by pressing CTRL+O<br />
* ... by specifying them as arguments on the [[command-line use|command-line]]<br />
* ... by loading a [[session file]]<br />
<br />
<br />
You can load preloaded data ...<br />
* ... by selecting a genome from the [[Genome Explorer]]<br />
* ... by following a link from a GenomeView-enabled [[platform|website]]<br />
<br />
==Data preparation recipe==<br />
<br />
# Match identifiers: GenomeView uses identifiers to link data from different sources, so make sure the identifiers match exactly (matching is case-sensitive).<br />
# Create indices for the data files that need them (see the tables below)<br />
# Convert file formats where needed to get the desired visualization (see the tables below)<br />
# Load data (see above)<br />
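Step 1 can be checked from the command line before loading. The sketch below is one way to do it for a reference FASTA and a GFF annotation; the function name <code>missing_ids</code> is hypothetical, and it assumes that a FASTA identifier is the header text after '>' up to the first white space. It lists the identifiers used in column 1 of the GFF file that do not occur in the FASTA headers, using only POSIX tools and a case-sensitive comparison.<br />

```shell
#!/bin/sh
# Sketch (hypothetical helper): print sequence identifiers that appear in column 1
# of a GFF file but not in the headers of a FASTA file.
# Assumption: a FASTA identifier is the header text after '>' up to the first white space.
export LC_ALL=C   # byte-wise, case-sensitive ordering for sort and comm

missing_ids() {
  fasta="$1"
  gff="$2"
  tmp=$(mktemp -d) || return 1
  # Identifiers from the FASTA headers.
  awk '/^>/ { sub(/^>/, ""); print $1 }' "$fasta" | sort -u > "$tmp/fasta_ids"
  # Identifiers from GFF feature lines (skip comments/directives and short lines).
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$gff" | sort -u > "$tmp/gff_ids"
  # comm -13 keeps only the lines unique to the second file.
  comm -13 "$tmp/fasta_ids" "$tmp/gff_ids"
  rm -r "$tmp"
}
```

For example, <code>missing_ids reference.fasta annotation.gff3</code> prints nothing when every identifier in the annotation also occurs in the reference; an identifier such as <code>Chr2</code> would be reported if the FASTA only contains <code>chr2</code>.<br />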
<br />
'''Why index files?''' Indexing creates a look-up table that lets GenomeView load data on the fly. This speeds up loading and browsing and significantly reduces the amount of memory needed. Indices are recommended for some file formats but not for others; see the tables below for details and links to instructions.<br />
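To illustrate what such a look-up table contains, the sketch below builds a table with the same five columns as the .fai index that <code>samtools faidx</code> writes for a FASTA file (name, length, byte offset of the first base, bases per line, bytes per line) and uses it to compute where a given base sits in the file. It shows the general idea only and is not GenomeView's own index code; the function names are hypothetical, and it assumes Unix line endings and that every sequence line of a record except the last has the same length.<br />

```shell
#!/bin/sh
# Sketch: build a .fai-style table (name, length, offset, bases per line, bytes per line)
# and use it to compute the byte offset of one base. Function names are hypothetical.
# Assumptions: Unix line endings; within a record, every sequence line but the last
# has the same length.

fai_sketch() {
  LC_ALL=C awk '
    BEGIN { OFS = "\t" }
    function flush() { if (name != "") print name, len, off, lb, lw }
    /^>/ {
      flush()
      name = substr($1, 2); len = 0; lb = 0; lw = 0
      pos += length($0) + 1        # skip the header line and its newline
      off = pos                    # first base of this record starts here
      next
    }
    {
      if (lb == 0) { lb = length($0); lw = lb + 1 }   # bases and bytes per full line
      len += length($0)
      pos += length($0) + 1
    }
    END { flush() }
  ' "$1"
}

# Print the 0-based byte offset of 1-based position $3 of sequence $2, using table $1.
fasta_base_offset() {
  awk -F'\t' -v name="$2" -v pos="$3" '
    $1 == name { p = pos - 1; print $3 + int(p / $4) * $5 + p % $4; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$1"
}
```

With the table, the position of a base is computed directly instead of reading the file from the start, e.g. <code>dd if=ref.fasta bs=1 skip="$(fasta_base_offset ref.table chr1 10)" count=1</code> reads a single byte.<br />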
<br />
<br />
==Recommended file formats ==<br />
The table below lists the recommended file format for each data type. All supported formats are listed in the next section.<br />
<br />
<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>Recommended file format</th><th>Instructions</th></tr><br />
<tr><td>Reference sequence</td><td>fasta</td><td>[[Preparing reference sequence]]</td></tr><br />
<tr><td>Annotation</td><td>GFF3</td><td>[[Preparing annotation]]</td></tr><br />
<tr><td>Read alignments</td><td>BAM</td><td>[[Preparing read data]]</td></tr><br />
<tr><td>Variation</td><td>VCF</td><td>[[Preparing VCF data]]</td></tr><br />
<tr><td>Coverage summary - continuous values</td><td>TDF</td><td>[[Preparing value data]]</td></tr><br />
<tr><td>Whole genome alignments</td><td>MAF</td><td>[[Preparing whole genome alignments]]</td></tr><br />
<br />
</table><br />
<br />
== Supported data formats ==<br />
<br />
=== Reference sequence ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<br />
<tr><th></th><th></th><th></th><th>unindexed***</th><th>indexed</th><th></th></tr><br />
<br />
<tr><td valign="top" rowspan=2>Reference sequence</td><td><b>fasta</b> <sup>&#164;</sup></td><td>Recommended<br/>[[Index FASTA]]</td><td>50 Mb</td><td>unlimited</td><td>If a very large FASTA file has no index, GenomeView will offer to create one for you.</td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
</table><br />
=== Annotation ===<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Annotation</td><td><b>gff</b> <sup>&#164;</sup></td><td>Not recommended<br />
[[Index GFF]]</td><td>50 Mb</td><td>unlimited</td><td></td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
<tr><td>bed</td><td>Not recommended [[Index BED]]</td><td>50 Mb or less</td><td>unlimited</td><td>By default, data from a BED file is added to the CDS track. To put it in a different track, add the line 'track name=Track_name' at the top of the file. White space is not allowed in the track name.</td></tr><br />
<tr><td>ptt, tbl </td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>Other standard annotation formats GenomeView understands</td></tr><br />
<tr><td></td><td>various formats</td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>GenomeView can directly parse the output of the following programs: Blast, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan</td></tr><br />
</table><br />
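The BED comment above can be applied with a small shell function. This is a sketch with a hypothetical function name: it refuses track names that contain white space, drops any existing 'track' line, and writes the result to a new file. Note that [[Preparing annotation]] states that indexing does not work on BED files with such a header line, so this only suits BED files loaded without an index.<br />

```shell
#!/bin/sh
# Sketch (hypothetical helper): prepend a 'track name=...' line to a BED file so that
# GenomeView shows its features in a track with that name instead of the CDS track.
add_track_name() {
  name="$1"; in="$2"; out="$3"
  case "$name" in
    ""|*[[:space:]]*)
      echo "track name must be non-empty and contain no white space" >&2
      return 1 ;;
  esac
  {
    printf 'track name=%s\n' "$name"
    # Drop an existing track line, if any, so the file has exactly one.
    grep -v '^track[[:space:]]' "$in"
  } > "$out"
}
```

For example, <code>add_track_name Repeats repeats.bed repeats_named.bed</code> writes a copy whose first line is <code>track name=Repeats</code>.<br />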
<br />
=== Whole genome alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=3>Multiple genome alignment</td><td><b>maf</b> <sup>&#164;</sup></td><td>Recommended</td><td>100 Mb</td><td>unlimited</td><td>If you load an unindexed MAF file, GenomeView will offer to compress and index it for you.<br/>MAF is the recommended file format for whole genome alignments of large or complex genomes.</td></tr><br />
<br />
<tr><td><b>multi-fasta</b> <sup>&#164;</sup></td><td>Not possible</td><td>100 Mb</td><td>--</td><td>Recommended for small/simple genomes with a near 1:1 relationship.</td></tr><br />
<br />
<tr><td>aln, ClustalW</td><td>Not possible</td><td>100 Mb</td><td>--</td><td></td></tr><br />
</table><br />
<br />
=== Read alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=2>Sequence read alignment</td><td><b>bam</b> <sup>&#164;</sup><br>[[Preparing read data]]</td><td>Required</td><td>--</td><td>unlimited</td><td>If there is no index, GenomeView will offer to create one for you. GenomeView cannot sort BAM files automatically.</td></tr><br />
<br />
<tr><td>MAQ, MapView, BroadSolexa</td><td>Not possible</td><td>100 Mb</td><td>--</td><td></td></tr><br />
</table><br />
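Because GenomeView cannot sort BAM files, they have to be coordinate-sorted before loading. A common way is samtools; the sketch below assumes samtools 1.x is installed (<code>samtools sort -o</code> and <code>samtools index</code> are standard samtools commands) and wraps them in a hypothetical function <code>sort_and_index_bam</code>. Creating the index yourself is optional, since GenomeView offers to create one.<br />

```shell
#!/bin/sh
# Sketch (hypothetical wrapper): coordinate-sort a BAM file and index the result.
# Assumes samtools (version 1.x) is on the PATH.
sort_and_index_bam() {
  in="$1"; out="$2"
  command -v samtools >/dev/null 2>&1 || { echo "samtools not found" >&2; return 1; }
  [ -f "$in" ] || { echo "no such file: $in" >&2; return 1; }
  # Sort by reference and position, then write the index next to it (e.g. out.bam.bai).
  samtools sort -o "$out" "$in" && samtools index "$out"
}
```

For example, <code>sort_and_index_bam reads.bam reads.sorted.bam</code>, after which <code>reads.sorted.bam</code> is the file to load.<br />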
<br />
=== Read coverage summary - continuous value data ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Read coverage summary</td><td><b> [[tdf]]</b> <sup>&#164;</sup></td><td>Native</td><td>unlimited</td><td>unlimited</td><td>[[TDF]] files can be created with the [[bam2tdf]] tool that is available for [https://sourceforge.net/projects/genomeview/files/TDformat/ download.]</td></tr><br />
<br />
<tr><td>bigwig</td><td>Native</td><td>unlimited</td><td>unlimited</td><td>This format can be used for any wig file, not just read coverage</td></tr><br />
<br />
<tr><td>[[pileup]]</td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depths (more than 5000x coverage).</td></tr><br />
<br />
<tr><td>wig</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>We strongly recommend [[wig2tdf|converting your wig files to TDF]]. <br />
GenomeView can automatically convert wig files to TDF. Caveats: all 'track' information should be on a single line, and 'browser' lines are ignored because they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome, and by genomic coordinate within each chromosome. Both BedGraph and Wiggle_0 formats are supported; for Wiggle_0, both variableStep and fixedStep should work.</td></tr><br />
</table><br />
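The sorting requirement above can be checked before loading. The sketch below does this for BedGraph files only (fixedStep and variableStep files use a different layout and are not handled); the function name is hypothetical. It skips 'track', 'browser' and comment lines and reports the first line where a chromosome starts a second block or a start coordinate decreases. It does not check the order of the chromosomes themselves, on the assumption that grouping by chromosome is what matters.<br />

```shell
#!/bin/sh
# Sketch (hypothetical helper): check that a BedGraph file is grouped by chromosome and
# sorted by start coordinate within each chromosome. Exit status 0 means "looks sorted".
check_bedgraph_sorted() {
  awk '
    /^(track|browser|#)/ { next }          # header and comment lines
    NF == 0 { next }                       # blank lines
    {
      if ($1 != prev) {                    # a new chromosome block starts here
        if ($1 in seen) {
          printf "line %d: %s already appeared earlier\n", NR, $1; bad = 1; exit
        }
        seen[$1] = 1; prev = $1; last = -1
      }
      if ($2 + 0 < last) {
        printf "line %d: start %s is smaller than the previous start %d\n", NR, $2, last
        bad = 1; exit
      }
      last = $2 + 0
    }
    END { exit bad }
  ' "$1"
}
```

For example, <code>check_bedgraph_sorted coverage.bedgraph && echo sorted</code>; if it reports a problem, <code>sort -k1,1 -k2,2n</code> on the data lines is one way to fix it.<br />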
<br />
=== Genome variation and diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Genome variation</td><td><b> [[vcf]]</b> <sup>&#164;</sup></td><td>Not recommended</td><td>--</td><td>unlimited</td><td>We recommend running [[reducevcf]] on VCF files before loading them; this significantly reduces loading time.</td></tr><br />
<br />
</table><br />
<br />
<br />
=== Allele diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Allele diversity summary</td><td><b> [[pileup]]</b> <sup>&#164;</sup></td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depths (more than 5000x coverage).</td></tr><br />
<br />
</table><br />
* Indicates whether this file format can or should be indexed. <br/><br />
** Recommended maximum file size. The first value applies to unindexed files, the second to indexed files. These values are only guidelines; when loading multiple data sets, add up their sizes.<br/><br />
*** Unindexed data files can be gzip-compressed.<br />
<br />
<sup>&#164;</sup> Recommended file format for this data type.<br />
<br />
<br />
<br />
<br />
== Output formats ==<br />
(Modified) annotations can be saved as either GFF or EMBL.<br />
<br />
All loaded data can be exported in its original format; such exports do not include modifications.<br />
<br />
== Converting formats ==<br />
[http://genomeview.org/loki/ We offer a few tools to convert files between formats.]<br />
<br />
== Previous documentation pages ==<br />
<br />
<br />
<br />
[http://genomeview.org/content/data-formats Supported data formats]<br />
[http://genomeview.org/content/preparing-fasta-files Fasta files]<br />
<br />
[http://genomeview.org/content/preparing-feature-files Feature files]<br />
<br />
[http://genomeview.org/content/preparing-short-read-alignments Read data]<br />
<br />
[http://genomeview.org/content/preparing-pileup Coverage plots]</div>Thomashttps://manual.genomeview.org/index.php?title=Index_BED&diff=9994Index BED2013-10-18T15:05:26Z<p>Thomas: Redirected page to Preparing annotation</p>
<hr />
<div>#REDIRECT [[Preparing annotation]]</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_annotation&diff=9993Preparing annotation2013-10-18T15:04:54Z<p>Thomas: Created page with "Large feature files need to be indexed before you can use them properly in GenomeView. The definition of large is not strict in the sense that it depends on both the real siz..."</p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends both on the size of the file and on the number of features it contains. <br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 Mb, or even 10 Mb, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. Eukaryotic genes are the prime example. </li><br />
<li>A genome annotation that contains only genes, mRNAs, CDSs and exons typically does not need to be indexed.</li><br />
<li>Do not put multiple annotation types (mRNA, CDS, ...) in a single file: all features from one file are loaded into a single track labelled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
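The size criterion in the first recommendation can be checked with gzip. This is a minimal sketch with a hypothetical function name; it assumes '5 Mb' means 5,000,000 bytes and accepts a different threshold (for example 10000000) as a second argument. It does not check for composite features, which the other recommendations cover.<br />

```shell
#!/bin/sh
# Sketch (hypothetical helper): suggest whether a feature file is large enough to index,
# based only on its gzip-compressed size. Default threshold: 5,000,000 bytes (assumption).
index_by_size() {
  file="$1"
  threshold="${2:-5000000}"
  size=$(gzip -c "$file" | wc -c | tr -d ' ')
  if [ "$size" -gt "$threshold" ]; then
    echo "index ($size bytes compressed)"
  else
    echo "no index needed ($size bytes compressed)"
  fi
}
```

For example, <code>index_by_size features.gff 10000000</code> applies the 10 Mb variant of the rule.<br />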
<br />
<strong>Instructions:</strong><br />
To index a file, pre-process it with bgzip and tabix, as is done for pileup files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
== BED formatted files ==<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.bgz<br />
tabix -p bed compressed.bed.bgz<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah")</em><br />
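Because of this, UCSC header lines have to be removed before indexing. One way, sketched below with a hypothetical function name, is to drop 'track' and 'browser' lines and then sort by chromosome and start position as in the command above (tab-separated fields are assumed). The output can then be compressed with <code>bgzip -c</code> and indexed with <code>tabix -p bed</code> as before.<br />

```shell
#!/bin/sh
# Sketch (hypothetical helper): remove UCSC 'track'/'browser' header lines from a BED file
# and sort the remaining lines by chromosome, then by numeric start position.
prepare_bed_for_tabix() {
  tab=$(printf '\t')
  grep -vE '^(track|browser)([[:space:]]|$)' "$1" |
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}

# Example:
#   prepare_bed_for_tabix input.bed | bgzip -c > compressed.bed.bgz
#   tabix -p bed compressed.bed.bgz
```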
<br />
== GFF formatted file ==<br />
<br />
<b>Warning: compound features are broken up when GFF files are indexed.</b><br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.bgz<br />
tabix -p gff compressed.gff.bgz<br />
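<br />
The <code>-T /group/tmp</code> option only sets the directory in which <code>sort</code> stores temporary files; replace <code>/group/tmp</code> with a writable directory on your system, or omit the option.<br />
<br />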
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
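To see in advance which features the warning above applies to, you can list the IDs that occur on more than one line. In GFF3, a feature that spans several locations (for example a CDS split over exons) is written as several lines sharing the same ID. Treating these as the compound features of the warning is an assumption, the function name is hypothetical, and parent/child relations (the Parent attribute) are not examined.<br />

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the IDs that occur on more than one feature line
# of a GFF3 file, i.e. features made of several locations.
compound_feature_ids() {
  awk -F'\t' '
    /^##FASTA/ { exit }                    # stop at an embedded sequence section
    /^#/ || NF < 9 { next }                # skip comments, directives and short lines
    {
      n = split($9, attrs, ";")            # column 9 holds the attributes
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) { id = substr(attrs[i], 4); count[id]++ }
    }
    END { for (id in count) if (count[id] > 1) print id }
  ' "$1" | LC_ALL=C sort
}
```

For example, <code>compound_feature_ids input.gff</code> prints nothing if no ID is repeated.<br />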
<br />
<em>Note that the structure of genes and the type of annotation features will be lost when indexing gff files.</em></div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=9992Index GFF2013-10-18T15:04:41Z<p>Thomas: Redirected page to Preparing annotation</p>
<hr />
<div>#REDIRECT[[Preparing annotation]]</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_VCF_data&diff=9991Preparing VCF data2013-10-18T15:03:43Z<p>Thomas: Created page with "VCF (Variant Call Format) is a text file format that contains meta-information lines, a header line, and then data lines each each containing information about genetic variati..."</p>
<hr />
<div>VCF (Variant Call Format) is a text file format that contains meta-information lines, a header line, and then data lines each each containing information about genetic variation at a position in the genome. <br />
<br />
[http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41/ Formal specification]<br />
<br />
<br />
== Data preparation ==<br />
VCF files do not need to be indexed, but we recommend reducing their size with the [[reducevcf]] tool.<br />
<br />
== Visualization ==<br />
--insert picture with insertion, deletion, ...--</div>Thomashttps://manual.genomeview.org/index.php?title=Vcf&diff=9990Vcf2013-10-18T15:03:32Z<p>Thomas: Redirected page to Preparing VCF data</p>
<hr />
<div>#REDIRECT[[Preparing VCF data]]</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_value_data&diff=9989Preparing value data2013-10-18T15:02:47Z<p>Thomas: Created page with "TDF is a binary file format that contains preprocessed summary information for display in a genome browser. This format is an alternative to wig and the bigwig formats and is..."</p>
<hr />
<div>TDF is a binary file format that contains preprocessed summary information for display in a genome browser.<br />
<br />
This format is an alternative to the wig and bigWig formats and is typically used for data that has one value per chromosomal position, such as coverage data.<br />
<br />
You can create TDF files directly from BAM files or from wig files.<br />
<br />
== Creating TDF files ==<br />
* [[bam2tdf]]: convert read alignment to coverage plot<br />
* [[wig2tdf]]: convert wig formatted data to tdf formatted data</div>Thomashttps://manual.genomeview.org/index.php?title=Tdf&diff=9988Tdf2013-10-18T15:02:42Z<p>Thomas: Redirected page to Preparing value data</p>
<hr />
<div>#REDIRECT[[Preparing value data]]</div>Thomashttps://manual.genomeview.org/index.php?title=Tdf&diff=9987Tdf2013-10-18T15:02:24Z<p>Thomas: Blanked the page</p>
<hr />
<div></div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_and_loading_data&diff=9985Preparing and loading data2013-10-18T14:59:21Z<p>Thomas: /* Read coverage summary */</p>
<hr />
<div><br />
{{TOC|align=right}}<br />
<br />
<br />
<br />
There are several ways to load data into GenomeView. Before loading your data, make sure it is in one of the supported formats listed below. GenomeView will generally notify you if it does not recognize your data.<br />
<br />
==Loading data ==<br />
You can load your data files ...<br />
* ... by selecting "work with my data" in the [[Genome Explorer]]<br />
* ... by dragging them onto GenomeView<br />
* ... by selecting the 'File' menu and then 'Load data...' ([http://genomeview.org/content/load-data tutorial])<br />
* ... by pressing CTRL+O<br />
* ... by specifying them as arguments on the [[command-line use|command-line]]<br />
* ... by loading a [[session file]]<br />
<br />
<br />
You can load preloaded data ...<br />
* ... by selecting a genome from the [[Genome Explorer]]<br />
* ... by following a link from a GenomeView-enabled [[platform|website]]<br />
<br />
==Data preparation recipe==<br />
<br />
# Match identifiers: GenomeView uses identifiers to link data from different sources, so make sure the identifiers match exactly (matching is case-sensitive).<br />
# Create indices for the data files that need them (see the tables below)<br />
# Convert file formats where needed to get the desired visualization (see the tables below)<br />
# Load data (see above)<br />
<br />
== Why index files? ==<br />
Indexing creates a look-up table that lets GenomeView load data on the fly. This speeds up loading and browsing and significantly reduces the amount of memory needed. Indices are recommended for some file formats but not for others; see the tables below for details and links to instructions.<br />
<br />
<br />
==Recommended file formats ==<br />
The table below lists the recommended file format for each data type. All supported formats are listed in the next section.<br />
<br />
<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>Recommended file format</th><th>Instructions</th></tr><br />
<tr><td>Reference sequence</td><td>fasta</td><td>[[Preparing reference sequence]]</td></tr><br />
<tr><td>Annotation</td><td>gff3</td><td>[[Preparing annotation]]</td></tr><br />
<tr><td>Read alignments</td><td>BAM</td><td>[[Preparing read data]]</td></tr><br />
<tr><td>Variation</td><td>VCF</td><td>[[Preparing VCF data]]</td></tr><br />
</table><br />
<br />
== Supported data formats ==<br />
<br />
=== Reference sequence ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<br />
<tr><th></th><th></th><th></th><th>unindexed***</th><th>indexed</th><th></th></tr><br />
<br />
<tr><td valign="top" rowspan=2>Reference sequence</td><td><b>fasta</b> <sup>&#164;</sup></td><td>Recommended<br/>[[Index FASTA]]</td><td>50 Mb</td><td>unlimited</td><td>If a very large FASTA file has no index, GenomeView will offer to create one for you.</td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
</table><br />
=== Annotation ===<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Annotation</td><td><b>gff</b> <sup>&#164;</sup></td><td>Not recommended<br />
[[Index GFF]]</td><td>50 Mb</td><td>unlimited</td><td></td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
<tr><td>bed</td><td>Not recommended [[Index BED]]</td><td>50 Mb or less</td><td>unlimited</td><td>By default, data from a BED file is added to the CDS track. To put it in a different track, add the line 'track name=Track_name' at the top of the file. White space is not allowed in the track name.</td></tr><br />
<tr><td>ptt, tbl </td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>Other standard annotation formats GenomeView understands</td></tr><br />
<tr><td></td><td>various formats</td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>GenomeView can directly parse the output of the following programs: Blast, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan</td></tr><br />
</table><br />
<br />
=== Whole genome alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=3>Multiple genome alignment</td><td><b>maf</b> <sup>&#164;</sup></td><td>Recommended</td><td>100 Mb</td><td>unlimited</td><td>If you load an unindexed MAF file, GenomeView will offer to compress and index it for you.<br/>MAF is the recommended file format for whole genome alignments of large or complex genomes.</td></tr><br />
<br />
<tr><td><b>multi-fasta</b> <sup>&#164;</sup></td><td>Not possible</td><td>100 Mb</td><td>--</td><td>Recommended for small/simple genomes with a near 1:1 relationship.</td></tr><br />
<br />
<tr><td>aln, ClustalW</td><td>Not possible</td><td>100 Mb</td><td>--</td><td></td></tr><br />
</table><br />
<br />
=== Read alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=2>Sequence read alignment</td><td><b>bam</b> <sup>&#164;</sup><br>[[Preparing read data]]</td><td>Required</td><td>--</td><td>unlimited</td><td>If there is no index, GenomeView will offer to create one for you. GenomeView cannot sort BAM files automatically.</td></tr><br />
<br />
<tr><td>MAQ, MapView, BroadSolexa</td><td>Not possible</td><td>100 Mb</td><td>--</td><td></td></tr><br />
</table><br />
<br />
=== Read coverage summary - continuous value data ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Read coverage summary</td><td><b> [[tdf]]</b> <sup>&#164;</sup></td><td>Native</td><td>unlimited</td><td>unlimited</td><td>[[TDF]] files can be created with the [[bam2tdf]] tool that is available for [https://sourceforge.net/projects/genomeview/files/TDformat/ download.]</td></tr><br />
<br />
<tr><td>bigwig</td><td>Native</td><td>unlimited</td><td>unlimited</td><td>This format can be used for any wig file, not just read coverage</td></tr><br />
<br />
<tr><td>[[pileup]]</td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depth (>5000x coverage).</td></tr><br />
<br />
<tr><td>wig</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>We strongly recommend that you [[wig2tdf|convert your wig files to TDF]]. <br />
GenomeView can automatically convert wig files to TDF. Caveats: all 'track' information should be on a single line, and 'browser' lines are ignored because they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome and, within each chromosome, by genomic coordinate. Both the bedGraph and the wiggle_0 formats are supported; for wiggle_0, both variableStep and fixedStep should work.</td></tr><br />
</table><br />
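The sorting requirement for wig data can be met with standard Unix tools. The sketch below handles bedGraph files only and assumes lines of the form chromosome, start, end, value, plus a bash shell with `awk` and `sort`; the function name `sort_bedgraph` is hypothetical and not part of GenomeView. It does not handle variableStep/fixedStep files, whose declaration lines must stay attached to their data.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sort a bedGraph file by chromosome, then by start
# coordinate. The first 'track' line (if any) is kept at the top; 'browser'
# and comment lines are dropped, since GenomeView ignores 'browser' lines.
sort_bedgraph() {
  local in="$1" out="$2"
  {
    # keep only the first track line
    awk '/^track/ { print; exit }' "$in"
    # data lines: column 1 = chromosome (text), column 2 = start (numeric)
    awk '!/^(track|browser|#)/' "$in" | LC_ALL=C sort -k1,1 -k2,2n
  } > "$out"
}
```

Usage: `sort_bedgraph coverage.bedGraph coverage.sorted.bedGraph`. `LC_ALL=C` makes the chromosome order byte-wise and therefore identical on every system.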
<br />
=== Genome variation and diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Genome variation</td><td><b> [[vcf]]</b> <sup>&#164;</sup></td><td>Not recommended</td><td>--</td><td>unlimited</td><td>We recommend running [[reducevcf]] on VCF files before loading them; this significantly speeds up loading.</td></tr><br />
<br />
</table><br />
<br />
<br />
=== Allele diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Allele diversity summary</td><td><b> [[pileup]]</b> <sup>&#164;</sup></td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depth (>5000x coverage).</td></tr><br />
<br />
</table><br />
<nowiki>*</nowiki> Indicates whether this file format can or should be indexed. <br/><br />
<nowiki>**</nowiki> Recommended maximum file size. The first value applies without an index, the second with an index. These values are only guidelines; when loading multiple data sets, add up their sizes.<br/><br />
<nowiki>***</nowiki> Unindexed data files can be gzip-compressed.<br />
<br />
<sup>&#164;</sup> Recommended file format for this data type.<br />
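Adding up the sizes of several data sets before loading them can be done from the command line. A minimal bash sketch; `total_mb` is a hypothetical helper, and it assumes that "Mb" in the size columns means megabytes (here 1 MB = 1024 × 1024 bytes, rounded down).

```bash
#!/usr/bin/env bash
# Hypothetical helper: add up the sizes of several data files so the total
# can be compared with the guideline values in the tables above.
# Assumes "Mb" means megabytes (1 MB = 1024 * 1024 bytes); rounds down.
total_mb() {
  local bytes=0 f
  for f in "$@"; do
    # wc -c < file prints only the byte count (possibly padded with spaces)
    bytes=$(( bytes + $(wc -c < "$f") ))
  done
  echo $(( bytes / 1048576 ))
}

# Example: warn when unindexed files together exceed a 50 Mb guideline.
#   [ "$(total_mb genome.fasta genes.gff)" -gt 50 ] && echo "consider indexing"
```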
<br />
<br />
<br />
<br />
== Output formats ==<br />
(Modified) annotations can be saved as either GFF or EMBL.<br />
<br />
All loaded data can be exported in its original format; modifications are not included in this export.<br />
<br />
== Converting formats ==<br />
[http://genomeview.org/loki/ We offer a few tools to convert files between formats.]<br />
<br />
== Previous documentation pages ==<br />
<br />
<br />
<br />
[http://genomeview.org/content/data-formats Supported data formats]<br />
<br />
[http://genomeview.org/content/preparing-fasta-files Fasta files]<br />
<br />
[http://genomeview.org/content/preparing-feature-files Feature files]<br />
<br />
[http://genomeview.org/content/preparing-short-read-alignments Read data]<br />
<br />
[http://genomeview.org/content/preparing-pileup Coverage plots]</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_and_loading_data&diff=9984Preparing and loading data2013-10-18T14:58:41Z<p>Thomas: /* Recommended file formats */</p>
<hr />
<div><br />
{{TOC|align=right}}<br />
<br />
<br />
<br />
There are several easy ways to load up data into GenomeView. Before you load your data, you may want to make sure you're using a supported format from the list below. Generally, GenomeView will notify you if it doesn't understand your data.<br />
<br />
==Loading data ==<br />
You can load your data files ...<br />
* ... by selecting "work with my data" in the [[Genome Explorer]]<br />
* ... by dragging them onto GenomeView<br />
* ... by selecting the 'File' menu and then 'Load data...' ([http://genomeview.org/content/load-data tutorial])<br />
* ... by pressing CTRL+O<br />
* ... by specifying them as argument on the [[command-line use|command-line]]<br />
* ... by loading a [[session file]]<br />
<br />
<br />
You can load preloaded data ...<br />
* ... by selecting a genome from the [[Genome Explorer]]<br />
* ... by following a link from a GenomeView-enabled [[platform|website]]<br />
<br />
==Data preparation recipe==<br />
<br />
# Match identifiers: GenomeView uses the identifiers to link different sources, so make sure that the identifiers match (case-sensitive).<br />
# Create indices for data files that need them (check table below)<br />
# Convert file formats to get desired visuals (check table below)<br />
# Load data (see above)<br />
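Step 1 of the recipe (matching identifiers) can be checked before loading anything. A minimal bash sketch, assuming a reference in FASTA format and annotation in GFF (sequence identifier in column 1); the function name `missing_ids` is hypothetical and not part of GenomeView. It lists every identifier used in the GFF file that has no matching FASTA header, comparing case-sensitively. The FASTA identifier is taken to be the header text up to the first white-space, which is an assumption; adjust the `sed` expression if your headers differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list sequence identifiers that are used in column 1 of
# a GFF file but do not occur as FASTA headers in the reference sequence.
# The comparison is case-sensitive ("Chr1" and "chr1" are different).
# Assumption: the FASTA identifier is the header text up to the first space.
missing_ids() {
  local fasta="$1" gff="$2"
  # comm -13 prints only lines unique to the second (GFF) list
  LC_ALL=C comm -13 \
    <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u) \
    <(grep -v '^#' "$gff" | awk -F'\t' 'NF >= 9 { print $1 }' | LC_ALL=C sort -u)
}
```

Usage: `missing_ids reference.fasta annotation.gff`; no output means every GFF sequence identifier was found in the FASTA file.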
<br />
== Why index files? ==<br />
Indexing creates a look-up table that lets GenomeView load data on the fly. This speeds up browsing and loading, and significantly reduces the amount of memory you need. We recommend creating indices for some file formats but not for others; see the table below for details and links to instructions.<br />
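To illustrate what such a look-up table contains, here is a minimal sketch that builds a FASTA index with the same five columns as a SAMtools `.fai` file: name, sequence length, byte offset of the first base, bases per line and bytes per line. It assumes Unix line endings, ASCII sequence and a constant line length within each record; for real data, use `samtools faidx` or the [[Index FASTA]] instructions. The function names `make_fai` and `fai_offset` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a FASTA index (.fai-style columns):
#   NAME  LENGTH  OFFSET  LINEBASES  LINEWIDTH
# Assumes '\n' line endings, ASCII text and equal-length sequence lines.
make_fai() {
  LC_ALL=C awk '
    /^>/ {
      if (name != "") print name "\t" len "\t" off "\t" lb "\t" lw
      name = substr($1, 2); len = 0; lb = 0; lw = 0
      pos += length($0) + 1        # skip the header line and its newline
      off = pos                    # first base starts right after the header
      next
    }
    {
      if (lb == 0) { lb = length($0); lw = lb + 1 }
      len += length($0)
      pos += length($0) + 1
    }
    END { if (name != "") print name "\t" len "\t" off "\t" lb "\t" lw }
  ' "$1"
}

# Byte offset of 0-based position POS, using the values from one index row.
# This is why an index lets a viewer jump straight to a region.
fai_offset() {
  local offset="$1" linebases="$2" linewidth="$3" pos="$4"
  echo $(( offset + (pos / linebases) * linewidth + pos % linebases ))
}
```

With the offset in hand, a viewer can seek directly to the bases it needs instead of reading the whole file, which is where the speed and memory savings come from.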
<br />
<br />
==Recommended file formats ==<br />
This is a list of file formats that are recommended for different data types. See the full list of data types in the section below.<br />
<br />
<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>Recommended file format</th><th>Instructions</th></tr><br />
<tr><td>Reference sequence</td><td>fasta</td><td>[[Preparing reference sequence]]</td></tr><br />
<tr><td>Annotation</td><td>gff3</td><td>[[Preparing annotation]]</td></tr><br />
<tr><td>Read alignments</td><td>BAM</td><td>[[Preparing read data]]</td></tr><br />
<tr><td>Variation</td><td>VCF</td><td>[[Preparing VCF data]]</td></tr><br />
</table><br />
<br />
== Supported data formats ==<br />
<br />
=== Reference sequence ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<br />
<tr><th></th><th></th><th></th><th>unindexed***</th><th>indexed</th><th></th></tr><br />
<br />
<tr><td valign="top" rowspan=2>Reference sequence</td><td><b>fasta</b> <sup>¤</sup></td><td>Recommended<br/>[[Index FASTA]]</td><td>50 Mb</td><td>unlimited</td><td>If the file is very large and has no index, GenomeView will offer to create one for you.</td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
</table><br />
=== Annotation ===<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=5>Annotation</td><td><b>gff</b> <sup>&#164;</sup></td><td>Not recommended<br />
[[Index GFF]]</td><td>50 Mb</td><td>unlimited</td><td></td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
<tr><td>bed</td><td>Not recommended<br/>[[Index BED]]</td><td>50 Mb or less</td><td>unlimited</td><td>By default, data from a BED file is added to the CDS track. To put it in a different track, add the line 'track name=Track_name' at the top of the file. White-space is not allowed in the track name.</td></tr><br />
<tr><td>ptt, tbl </td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>Other standard annotation formats GenomeView understands</td></tr><br />
<tr><td>various formats</td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>GenomeView can directly parse the output of the following programs: Blast, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan</td></tr><br />
</table><br />
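The track line described for BED files can be added without editing the file by hand. A minimal bash sketch; `add_track_line` is a hypothetical helper, and it assumes that any existing line starting with 'track' should be replaced by the new one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a copy of a BED file that starts with a
# 'track name=...' line, so the features go to that track instead of the
# default CDS track. Existing track lines are dropped.
add_track_line() {
  local bed="$1" name="$2" out="$3"
  # white-space is not allowed in the track name
  case "$name" in
    ""|*[[:space:]]*)
      echo "invalid track name: '$name' (empty or contains white-space)" >&2
      return 1 ;;
  esac
  {
    printf 'track name=%s\n' "$name"
    awk '!/^track/' "$bed"
  } > "$out"
}
```

Usage: `add_track_line peaks.bed My_peaks peaks.track.bed`. Writing to a new file leaves the original BED file untouched.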
<br />
=== Whole genome alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=3>Multiple genome alignment</td><td><b>maf</b> <sup>&#164;</sup></td><td>Recommended</td><td>100 Mb</td><td>unlimited</td><td>If you try to load an unindexed MAF file, GenomeView will offer to compress and index it for you.<br/>MAF is the recommended file format for whole genome alignments of large or complex genomes.</td></tr><br />
<br />
<tr><td><b>multi-fasta</b> <sup>&#164;</sup></td><td>Not possible</td><td>100 Mb</td><td>--</td><td>Recommended for small/simple genomes with a near 1:1 relationship.</td></tr><br />
<br />
<tr><td>aln, ClustalW</td><td>Not possible</td><td>100 Mb</td><td>--</td></tr><br />
</table><br />
<br />
=== Read alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=2>Sequence read alignment</td><td><b>bam</b> <sup>&#164;</sup><br>[[Preparing read data]]</td><td>Required</td><td>--</td><td>unlimited</td><td>If there is no index, GenomeView will prompt you and create one for you. GenomeView cannot sort BAM files automatically, so sort them before loading.</td></tr><br />
<br />
<tr><td>MAQ, MapView, BroadSolexa</td><td>Not possible</td><td>100 Mb</td><td>--</td></tr><br />
</table><br />
<br />
=== Read coverage summary ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Read coverage summary</td><td><b> [[tdf]]</b> <sup>&#164;</sup></td><td>Native</td><td>unlimited</td><td>unlimited</td><td>[[TDF]] files can be created with the [[bam2tdf]] tool that is available for [https://sourceforge.net/projects/genomeview/files/TDformat/ download.]</td></tr><br />
<br />
<tr><td>bigwig</td><td>Native</td><td>unlimited</td><td>unlimited</td><td>This format can be used for any wig file, not just read coverage</td></tr><br />
<br />
<tr><td>[[pileup]]</td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depth (>5000x coverage).</td></tr><br />
<br />
<tr><td>wig</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>We strongly recommend that you [[wig2tdf|convert your wig files to TDF]]. <br />
GenomeView can automatically convert wig files to TDF. Caveats: all 'track' information should be on a single line, and 'browser' lines are ignored because they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome and, within each chromosome, by genomic coordinate. Both the bedGraph and the wiggle_0 formats are supported; for wiggle_0, both variableStep and fixedStep should work.</td></tr><br />
</table><br />
<br />
=== Genome variation and diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Genome variation</td><td><b> [[vcf]]</b> <sup>&#164;</sup></td><td>Not recommended</td><td>--</td><td>unlimited</td><td>We recommend running [[reducevcf]] on VCF files before loading them; this significantly speeds up loading.</td></tr><br />
<br />
</table><br />
<br />
<br />
=== Allele diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Allele diversity summary</td><td><b> [[pileup]]</b> <sup>&#164;</sup></td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depth (>5000x coverage).</td></tr><br />
<br />
</table><br />
<nowiki>*</nowiki> Indicates whether this file format can or should be indexed. <br/><br />
<nowiki>**</nowiki> Recommended maximum file size. The first value applies without an index, the second with an index. These values are only guidelines; when loading multiple data sets, add up their sizes.<br/><br />
<nowiki>***</nowiki> Unindexed data files can be gzip-compressed.<br />
<br />
<sup>&#164;</sup> Recommended file format for this data type.<br />
<br />
<br />
<br />
<br />
== Output formats ==<br />
(Modified) annotations can be saved as either GFF or EMBL.<br />
<br />
All loaded data can be exported in its original format; modifications are not included in this export.<br />
<br />
== Converting formats ==<br />
[http://genomeview.org/loki/ We offer a few tools to convert files between formats.]<br />
<br />
== Previous documentation pages ==<br />
<br />
<br />
<br />
[http://genomeview.org/content/data-formats Supported data formats]<br />
<br />
[http://genomeview.org/content/preparing-fasta-files Fasta files]<br />
<br />
[http://genomeview.org/content/preparing-feature-files Feature files]<br />
<br />
[http://genomeview.org/content/preparing-short-read-alignments Read data]<br />
<br />
[http://genomeview.org/content/preparing-pileup Coverage plots]</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_read_data&diff=9983Preparing read data2013-10-18T14:57:13Z<p>Thomas: /* With Picard (OS independent, recommended) */</p>
<hr />
<div>{{TOC|align=right}}<br />
The best format for presenting short read alignments to GenomeView is BAM, the compressed binary form of the SAM format. Your reads must be aligned to a reference genome and stored in this format.<br />
<br />
== Aligning reads ==<br />
GenomeView is a visualization tool and does not perform the computationally intensive read alignment itself. There are, however, dozens of tools available for this job.<br />
<br />
If you need help aligning your reads, you may want to have a look at the [[Recipe to align reads]] to get some ideas.<br />
<br />
== Sorting and indexing ==<br />
Before you can visualize read data, you need to prepare it. Alignment files are generally huge, and GenomeView should not have to read the complete file when you are only looking at a small portion of it.<br />
<br />
Preparing data fresh from the aligner takes two steps:<br />
<br />
# Sort reads based on genomic coordinates<br />
# Index sorted reads.<br />
<br />
<br />
=== With Picard (OS independent, recommended) ===<br />
You need a recent version of Java installed, and you need to run these commands on the [[command-line]]. Each program below has a download link for the exact version these manual pages were tested with. Alternatively, you may want to install a more recent version of [http://picard.sourceforge.net/ Picard].<br />
<br />
In this example we have a read alignment in SAM format called 'aligned.sam'. We use the program [http://picard.sourceforge.net/command-line-overview.shtml#SortSam SortSam] from Picard to sort the file by coordinate. This also works if your aligner gives you a BAM file as output, e.g. 'aligned.bam'.<br />
<br />
'''Sorting'''<br />
The instruction below will sort the SAM file by coordinate.<br />
java -Xmx500m -jar SortSam.jar I=aligned.sam O=sorted.bam SO=coordinate<br />
[http://genomeview.org/loki/picard-tools-recent/SortSam.jar Download SortSam.jar]<br />
<br />
Important:<br />
# You may need to give the full path to SortSam.jar, i.e. the location where you installed the Picard programs. <br />
# You may need to give the full path to the directory containing 'aligned.sam' and to where you want 'sorted.bam' to end up.<br />
# Make sure to replace 'aligned.sam' with the actual name of your aligned SAM/BAM file.<br />
# Make sure to replace 'sorted.bam' with the name you want your sorted file to have.<br />
<br />
Sanity checks:<br />
# Make sure there are no errors reported on the console when running this instruction.<br />
# Make sure that you now have a file 'sorted.bam' and that it's not completely empty, i.e. it has a size > 0<br />
<br />
'''Indexing'''<br />
After you have sorted the file, you can index the resulting BAM file with the instruction below:<br />
<br />
java -Xmx500m -jar BuildBamIndex.jar I=sorted.bam<br />
[http://genomeview.org/loki/picard-tools-recent/BuildBamIndex.jar Download BuildBamIndex.jar]<br />
<br />
Important:<br />
# You may need to give the full path to BuildBamIndex.jar, i.e. the location where you installed the Picard programs. <br />
# You may need to give the full path to the 'sorted.bam' file.<br />
<br />
<br />
Sanity checks:<br />
# Make sure there are no errors reported on the console when running this instruction.<br />
# Make sure that an index file now exists in the same directory as 'sorted.bam' and that it is not empty, i.e. it has a size > 0. By default Picard names the index 'sorted.bai'; to get the 'sorted.bam.bai' name that SAMtools uses, add O=sorted.bam.bai to the command.<br />
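The "size > 0" sanity checks can be scripted so they run right after each command. A minimal bash sketch; `check_outputs` is a hypothetical helper that only tests whether each file exists and is non-empty, so it does not replace reading the error messages on the console.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the sanity checks above: succeed only if every
# given file exists and is not empty (size > 0).
check_outputs() {
  local f status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "OK: $f"
    else
      echo "ERROR: $f is missing or empty" >&2
      status=1
    fi
  done
  return "$status"
}

# Example with the commands from this page ('&&' skips the check if the
# command itself failed):
#   java -Xmx500m -jar SortSam.jar I=aligned.sam O=sorted.bam SO=coordinate && check_outputs sorted.bam
#   java -Xmx500m -jar BuildBamIndex.jar I=sorted.bam && check_outputs sorted.bai
#   (index name: sorted.bai by default with Picard, sorted.bam.bai with SAMtools)
```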
<br />
=== With SAMtools (Mac OS and Linux) ===<br />
Steps to get from the various aligner formats to the SAM format are available on the [http://samtools.sourceforge.net/ SAMtools website].<br />
<br />
You need to have SAMtools installed.<br />
<br />
Steps to go from SAM to indexed BAM.<br />
<br />
samtools faidx reference.fasta   # creates reference.fasta.fai for the next step<br />
<br />
samtools view -bS -t reference.fasta.fai -o alignment.bam alignment.sam<br />
<br />
samtools sort -o sorted.bam alignment.bam   # creates sorted.bam<br />
<br />
samtools index sorted.bam   # creates sorted.bam.bai, which GenomeView reads together with the BAM file<br />
<br />
With older SAMtools versions (0.1.x), the sort step is written as 'samtools sort alignment.bam sorted', which also creates sorted.bam.<br />
<br />
== Summary visualizations ==<br />
=== Coverage plots ===<br />
If you are primarily interested in the read coverage and not in individual reads, you may want to [[Preparing coverage plots|create coverage plots]].<br />
<br />
=== Variants ===<br />
If you are investigating SNPs or other genetic variants, you also may want to [[Recipe variant calls|create variant calls]].<br />
<br />
<br />
[[Category:User]][[Category:Platform]]</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_read_data&diff=9959Preparing read data2013-10-17T18:27:55Z<p>Thomas: Created page with "The best format to present short read alignments to GenomeView is the BAM format. You need to have your read data in this format and it has to be aligned. == Aligning reads =..."</p>
<hr />
<div>The best format to present short read alignments to GenomeView is the BAM format. You need to have your read data in this format and it has to be aligned.<br />
<br />
== Aligning reads ==<br />
GenomeView is a visualization tool and does not perform the computationally intensive read alignment itself. There are, however, dozens of tools available for this job.<br />
<br />
If you need help aligning your reads, you may want to have a look at the [[Recipe to align reads]] to get some ideas.<br />
<br />
== Preparing for visualization ==<br />
To prepare a BAM file straight from the aligner there are a few more steps you may have to take to get your data in the right format.<br />
<br />
# Sort reads based on genomic coordinates<br />
# Index sorted reads.<br />
<br />
=== Sorting and indexing with Picard (OS independent) ===<br />
You need to have a local copy of [[Install Picard|Picard]] installed and you need to run these commands on the [[Starting command-line|command-line]].<br />
<br />
In this example we have a read alignment in BAM format called 'alignment.bam'.<br />
<br />
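<code lang='bash'><br />
java -Xmx500m -jar SortSam.jar I=alignment.bam O=sorted.bam SO=coordinate<br />
java -Xmx500m -jar BuildBamIndex.jar I=sorted.bam<br />
</code><br />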
<br />
<br />
=== Sorting and indexing with SAMtools (Mac OS and Linux) ===<br />
Steps to get from the various aligner formats to the SAM format are available on the SAMtools website.<br />
<br />
Steps to go from SAM to indexed BAM.<br />
<br />
samtools faidx reference.fasta   # creates reference.fasta.fai for the next step<br />
<br />
samtools view -bS -t reference.fasta.fai -o alignment.bam alignment.sam<br />
<br />
samtools sort -o sorted.bam alignment.bam   # creates sorted.bam<br />
<br />
samtools index sorted.bam   # creates sorted.bam.bai, which GenomeView reads together with the BAM file<br />
<br />
With older SAMtools versions (0.1.x), the sort step is written as 'samtools sort alignment.bam sorted', which also creates sorted.bam.<br />
<br />
<br />
<br />
== Summary visualizations ==<br />
=== Coverage plots ===<br />
If you are primarily interested in the read coverage and not in individual reads, you may want to [[Preparing coverage plots|create coverage plots]].<br />
<br />
=== Variants ===<br />
If you are investigating SNPs or other genetic variants, you also may want to [[Recipe variant calls|create variant calls]].</div>Thomashttps://manual.genomeview.org/index.php?title=Preloaded_data&diff=212Preloaded data2013-09-06T19:18:30Z<p>Thomas: </p>
<hr />
<div>GenomeView comes with a large set of preloaded data that is hosted on our servers. This data is available through the Genome Explorer.<br />
<br />
[[File:Genomeexplorer.png|thumb|500px|Screenshot of the Genome Explorer window.]]<br />
<br />
<br />
The Genome Explorer has several tabs, each with a number of genomes. You can load the reference and the annotations (and supplemental data) by clicking on the image.<br />
<br />
The NCBI tab lists all microbial genomes available from NCBI; to load one of these, double-click its name.<br />
<br />
The buttons in the left-panel allow you to load your own data files or to restore the data you were working with previously.</div>Thomashttps://manual.genomeview.org/index.php?title=Preloaded_data&diff=211Preloaded data2013-09-06T19:16:59Z<p>Thomas: </p>
<hr />
<div>GenomeView comes with a large set of preloaded data that is hosted on our servers. This data is available through the Genome Explorer.<br />
<br />
[[File:Genomeexplorer.png|thumb|500px|Screenshot of the Genome Explorer window.]]<br />
<br />
<br />
The Genome Explorer has several tabs, each with a number of genomes. You can load the reference and the annotations (and supplemental data) by clicking on the image.</div>Thomashttps://manual.genomeview.org/index.php?title=File:Genomeexplorer.png&diff=210File:Genomeexplorer.png2013-09-06T19:15:24Z<p>Thomas: </p>
<hr />
<div></div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_and_loading_data&diff=209Preparing and loading data2013-09-06T19:10:24Z<p>Thomas: /* Loading data */</p>
<hr />
<div><br />
{{TOC|align=right}}<br />
<br />
<br />
<br />
There are several easy ways to load up data into GenomeView. Before you load your data, you may want to make sure you're using a supported format from the list below. Generally, GenomeView will notify you if it doesn't understand your data.<br />
<br />
==Loading data ==<br />
You can load your data files ...<br />
* ... by dragging them onto GenomeView<br />
* ... by selecting the 'File' menu and then 'Load data...' ([http://genomeview.org/content/load-data tutorial])<br />
* ... by pressing CTRL+O<br />
* ... by specifying them on the [[command-line]]<br />
* ... by selecting "work with my data" in the [[Genome Explorer]]<br />
* ... by loading a [[session file]]<br />
<br />
<br />
You can load preloaded data ...<br />
* ... by selecting a genome from the [[Genome Explorer]]<br />
* ... by following a link from a GenomeView-enabled [[platform|website]]<br />
<br />
==Data preparation recipe==<br />
<br />
# Match identifiers: GenomeView uses the identifiers to link different sources, so make sure that the identifiers match (case-sensitive).<br />
# Create indices for data files that need them (check table below)<br />
# Convert file formats to get desired visuals (check table below)<br />
# Load data<br />
<br />
== Why index files? ==<br />
Indexing creates a look-up table that lets GenomeView load data on the fly. This speeds up browsing and loading, and significantly reduces the amount of memory you need. We recommend creating indices for some file formats but not for others; see the table below for details and links to instructions.<br />
<br />
<br />
== Supported data formats ==<br />
<br />
=== Reference sequence ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<br />
<tr><th></th><th></th><th></th><th>unindexed***</th><th>indexed</th><th></th></tr><br />
<br />
<tr><td valign="top" rowspan=2>Reference sequence</td><td><b>fasta</b> <sup>¤</sup></td><td>Recommended<br/>[[Index FASTA]]</td><td>50 Mb</td><td>unlimited</td><td>If the file is very large and has no index, GenomeView will offer to create one for you.</td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
</table><br />
=== Annotation ===<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=5>Annotation</td><td><b>gff</b> <sup>&#164;</sup></td><td>Not recommended<br />
[[Index GFF]]</td><td>50 Mb</td><td>unlimited</td><td></td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and genbank are mixed file formats that can contain both annotation and reference sequence at the same time.</td></tr><br />
<br />
<tr><td>bed</td><td>Not recommended<br/>[[Index BED]]</td><td>50 Mb or less</td><td>unlimited</td><td>By default, data from a BED file is added to the CDS track. To put it in a different track, add the line 'track name=Track_name' at the top of the file. White-space is not allowed in the track name.</td></tr><br />
<tr><td>ptt, tbl </td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>Other standard annotation formats GenomeView understands</td></tr><br />
<tr><td>various formats</td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>GenomeView can directly parse the output of the following programs: Blast, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan</td></tr><br />
</table><br />
<br />
=== Whole genome alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=3>Multiple genome alignment</td><td><b>maf</b> <sup>&#164;</sup></td><td>Recommended</td><td>100 Mb</td><td>unlimited</td><td>If you try to load an unindexed MAF file, GenomeView will offer to compress and index it for you.<br/>MAF is the recommended file format for whole genome alignments of large or complex genomes.</td></tr><br />
<br />
<tr><td><b>multi-fasta</b> <sup>&#164;</sup></td><td>Not possible</td><td>100 Mb</td><td>--</td><td>Recommended for small/simple genomes with a near 1:1 relationship.</td></tr><br />
<br />
<tr><td>aln, ClustalW</td><td>Not possible</td><td>100 Mb</td><td>--</td></tr><br />
</table><br />
<br />
=== Read alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=2>Sequence read alignment</td><td><b>bam</b> <sup>&#164;</sup></td><td>Required [[Index bam]]</td><td>--</td><td>unlimited</td><td>GenomeView will prompt you if there is no index and will create one for you.</td></tr><br />
<br />
<tr><td>MAQ, MapView, BroadSolexa</td><td>Not possible</td><td>100 Mb</td><td>--</td></tr><br />
</table><br />
<br />
=== Read coverage summary ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Read coverage summary</td><td><b> [[tdf]]</b> <sup>&#164;</sup></td><td>Native</td><td>unlimited</td><td>unlimited</td><td>[[TDF]] files can be created with the [[bam2tdf]] tool that is available for [https://sourceforge.net/projects/genomeview/files/TDformat/ download.]</td></tr><br />
<br />
<tr><td>bigwig</td><td>Native</td><td>unlimited</td><td>unlimited</td><td>This format can be used for any wig file, not just read coverage</td></tr><br />
<br />
<tr><td>[[pileup]]</td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depth (>5000x coverage).</td></tr><br />
<br />
<tr><td>wig</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>We strongly recommend that you [[wig2tdf|convert your wig files to TDF]]. <br />
GenomeView can automatically convert wig files to TDF. Caveats: all 'track' information should be on a single line, and 'browser' lines are ignored because they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome and, within each chromosome, by genomic coordinate. Both the bedGraph and the wiggle_0 formats are supported; for wiggle_0, both variableStep and fixedStep should work.</td></tr><br />
</table><br />
<br />
=== Genome variation and diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Genome variation</td><td><b> [[vcf]]</b> <sup>&#164;</sup></td><td>Not recommended</td><td>--</td><td>unlimited</td><td>We recommend running [[reducevcf]] on VCF files before loading them; this significantly speeds up loading.</td></tr><br />
<br />
</table><br />
<br />
<br />
=== Allele diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Allele diversity summary</td><td><b> [[pileup]]</b> <sup>&#164;</sup></td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow at extreme read depth (>5000x coverage).</td></tr><br />
<br />
</table><br />
<nowiki>*</nowiki> Indicates whether this file format can or should be indexed. <br/><br />
<nowiki>**</nowiki> Recommended maximum file size. The first value applies without an index, the second with an index. These values are only guidelines; when loading multiple data sets, add up their sizes.<br/><br />
<nowiki>***</nowiki> Unindexed data files can be gzip-compressed.<br />
<br />
<sup>&#164;</sup> Recommended file format for this data type.<br />
<br />
<br />
<br />
<br />
== Output formats ==<br />
(Modified) annotations can be saved as either GFF or EMBL.<br />
<br />
All loaded data can be exported in its original format; modifications are not included in this export.<br />
<br />
== Converting formats ==<br />
[http://genomeview.org/loki/ We offer a few tools to convert files between formats.]<br />
<br />
== Previous documentation pages ==<br />
<br />
<br />
<br />
[http://genomeview.org/content/data-formats Supported data formats]<br />
<br />
[http://genomeview.org/content/preparing-fasta-files Fasta files]<br />
<br />
[http://genomeview.org/content/preparing-feature-files Feature files]<br />
<br />
[http://genomeview.org/content/preparing-short-read-alignments Read data]<br />
<br />
[http://genomeview.org/content/preparing-pileup Coverage plots]</div>Thomashttps://manual.genomeview.org/index.php?title=File:Step4.png&diff=195File:Step4.png2013-09-06T18:52:59Z<p>Thomas: </p>
<hr />
<div></div>Thomashttps://manual.genomeview.org/index.php?title=File:Step3.png&diff=194File:Step3.png2013-09-06T18:52:41Z<p>Thomas: </p>
<hr />
<div></div>Thomashttps://manual.genomeview.org/index.php?title=File:Step2.png&diff=193File:Step2.png2013-09-06T18:52:16Z<p>Thomas: </p>
<hr />
<div></div>Thomashttps://manual.genomeview.org/index.php?title=Webstart_step_by_step_instructions&diff=192Webstart step by step instructions2013-09-06T16:01:27Z<p>Thomas: /* Step 2: Select Open With Java Web Start */</p>
<hr />
<div><br />
Step by step instructions on what to expect when starting GenomeView for the first time.<br />
<br />
Prerequisite: Java 6<br />
Make sure you have Java 6 or more recent installed. You can test which version you have installed at http://javatester.org<br />
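You can also check the Java version from a terminal. A minimal bash sketch; `java_major` is a hypothetical helper that turns the version string printed by `java -version` into a major version number, assuming the usual formats ("1.6.0_45" for Java 6, "11.0.2" for Java 11 and later).

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a Java version string into its major version.
# Old-style strings start with "1." ("1.6.0_45" is Java 6); newer ones
# start with the major number directly ("11.0.2" is Java 11).
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  # keep everything before the first '.', '_' or '-'
  echo "${v%%[._-]*}"
}

# Example: read the version reported by the installed Java (java -version
# writes to stderr) and compare it with the Java 6 prerequisite.
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
#   [ "$(java_major "$ver")" -ge 6 ] && echo "Java is recent enough"
```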
<br />
==Step 1: Click the Web Start button==<br />
<br />
Click the orange 'launch' button<br />
[[File:Webstart.gif|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
==Step 2: Select Open With Java Web Start==<br />
Once you click the orange launch button, your browser will likely show a pop-up asking which program you want to use to open the link.<br />
<br />
Accept what the pop-up window asks. If you're unsure, [http://genomeview.org/content/browser-pop-windows detailed instructions for various browsers are available].<br />
<br />
==Step 3: Waiting for GenomeView to start==<br />
Once you have approved our digital signature, GenomeView will start loading and you will see the following screens without any further input.<br />
<br />
GenomeView starting<br />
<br />
GenomeView started<br />
<br />
==Step 4: Success!==<br />
You have now successfully started GenomeView!</div>Thomashttps://manual.genomeview.org/index.php?title=Webstart_step_by_step_instructions&diff=191Webstart step by step instructions2013-09-06T16:01:06Z<p>Thomas: Created page with " Step by step instructions on what to expect when starting GenomeView for the first time. Prerequisite: Java 6 Make sure you have Java 6 or more recent installed. You can tes..."</p>
<hr />
<div><br />
Step-by-step instructions on what to expect when you start GenomeView for the first time.<br />
<br />
Prerequisite: Java 6<br />
Make sure you have Java 6 or newer installed. You can check which version you have at http://javatester.org.<br />
<br />
==Step 1: Click the Web Start button==<br />
<br />
Click the orange 'launch' button<br />
[[File:Webstart.gif|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
==Step 2: Select Open With Java Web Start==<br />
When you click the orange launch button, a pop-up will likely ask which program you want to use to open the link.<br />
<br />
Accept what the pop-up window asks. If you're unsure, detailed instructions for various browsers are available.<br />
<br />
==Step 3: Waiting for GenomeView to start==<br />
Once you have approved our digital signature, GenomeView will start loading and you will see the following screens without any further input.<br />
<br />
GenomeView starting<br />
<br />
GenomeView started<br />
<br />
==Step 4: Success!==<br />
You have now successfully started GenomeView!</div>Thomashttps://manual.genomeview.org/index.php?title=Quick_start&diff=190Quick start2013-09-06T16:00:48Z<p>Thomas: /* 1 step guide */</p>
<hr />
<div>A four-step guide to getting started with GenomeView and your own data<br />
<br />
== 1-step guide ==<br />
Click the button:<br />
[[File:Webstart.gif|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
== 4-step guide ==<br />
# [[Starting GenomeView]]<br />
# Check out the [[Preparing and loading data]] page to learn how to load data and the list of [[Preparing_and_loading_data#Supported_data_formats|supported data formats]].<br />
# Check out the [[Visualizations]] documentation to learn about the different windows and panels in GenomeView.<br />
# Check out the [[navigation]] documentation to get around your genome.<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Quick_start&diff=189Quick start2013-09-06T16:00:42Z<p>Thomas: </p>
<hr />
<div>A four-step guide to getting started with GenomeView and your own data<br />
<br />
== 1-step guide ==<br />
Click the button below:<br />
[[File:Webstart.gif|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
== 4-step guide ==<br />
# [[Starting GenomeView]]<br />
# Check out the [[Preparing and loading data]] page to learn how to load data and the list of [[Preparing_and_loading_data#Supported_data_formats|supported data formats]].<br />
# Check out the [[Visualizations]] documentation to learn about the different windows and panels in GenomeView.<br />
# Check out the [[navigation]] documentation to get around your genome.<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=188Starting GenomeView2013-09-06T15:56:12Z<p>Thomas: </p>
<hr />
<div><br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
[[File:Webstart.gif|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
Clicking the above button will immediately launch the application.<br />
<br />
[[Webstart step by step instructions]]<br />
<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
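The same command can be assembled from a script, for example to give Java a larger heap or to open data files on start-up. This is a minimal sketch, assuming `java` is on the PATH and assuming data files can be passed as plain arguments after the jar ([[Using GenomeView from the command-line]] documents the actual options); the helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def genomeview_command(jar: str = "genomeview.jar", max_heap: str = "1g",
                       data_files: Sequence[str] = ()) -> List[str]:
    """Build the argument list for starting a local GenomeView installation.

    max_heap is passed to the JVM as -Xmx (e.g. "1g", "2g"). Appending
    data_files after the jar is an assumption, not documented behaviour.
    """
    return ["java", f"-Xmx{max_heap}", "-jar", jar, *data_files]


def launch(**kwargs) -> int:
    """Start GenomeView, wait for it to exit and return Java's exit code."""
    return subprocess.run(genomeview_command(**kwargs), check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the demo works without Java.
    print(" ".join(genomeview_command(max_heap="2g", data_files=["genome.fasta"])))
    # java -Xmx2g -jar genomeview.jar genome.fasta
```

Building a list of arguments rather than a single shell string avoids quoting problems with file names that contain spaces.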
<br />
<br />
== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
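To check the Java requirement from a script, the sketch below reads the version reported by `java -version` and compares it with Java 1.6u10. It assumes the usual `version "1.6.0_10"` style of output (and the `11.0.2` style used since Java 9); the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_java_version(version: str) -> Tuple[int, int]:
    """Return (major, update) for strings such as '1.6.0_10', '1.7.0_25' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:_(\d+))?", version)
    if not match:
        raise ValueError(f"unrecognised Java version: {version!r}")
    first, second, third, update = match.groups()
    if first == "1":
        # Legacy scheme 1.<major>.0_<update>, e.g. 1.6.0_10 is Java 6 update 10.
        return int(second or 0), int(update or 0)
    # Java 9+ scheme <major>.<minor>.<security>; use the security level as "update".
    return int(first), int(third or 0)


def extract_version(java_version_output: str) -> Optional[str]:
    """Pull the quoted version out of output like 'java version "1.6.0_10"'."""
    match = re.search(r'version "([^"]+)"', java_version_output)
    return match.group(1) if match else None


def meets_minimum(version: str, minimum: Tuple[int, int] = (6, 10)) -> bool:
    """True if version is at least minimum; the default is Java 1.6u10."""
    return parse_java_version(version) >= minimum


def installed_java_version() -> Optional[str]:
    """Version of the java on the PATH, or None if it cannot be run."""
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None
    return extract_version(result.stderr)


if __name__ == "__main__":
    found = installed_java_version()
    if found is None:
        print("java not found")
    else:
        print(found, "OK" if meets_minimum(found) else "older than Java 1.6u10")
```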
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=187Starting GenomeView2013-09-06T15:54:13Z<p>Thomas: </p>
<hr />
<div><br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
[[File:Webstart.gif|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
<br />
== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=186Starting GenomeView2013-09-06T15:54:04Z<p>Thomas: </p>
<hr />
<div><br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
[[File:Webstart.png|link=http://genomeview.org/start/launch.jnlp]]<br />
<br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
<br />
== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=185Starting GenomeView2013-09-06T15:53:54Z<p>Thomas: </p>
<hr />
<div><br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
[[File:Webstart.png|link="http://genomeview.org/start/launch.jnlp"]]<br />
<br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
<br />
== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=File:Webstart.gif&diff=184File:Webstart.gif2013-09-06T15:52:30Z<p>Thomas: </p>
<hr />
<div></div>Thomashttps://manual.genomeview.org/index.php?title=Quick_start&diff=183Quick start2013-09-06T15:51:32Z<p>Thomas: </p>
<hr />
<div>A four-step guide to getting started with GenomeView and your own data<br />
<br />
# [[Starting GenomeView]]<br />
# Check out the [[Preparing and loading data]] page to learn how to load data and the list of [[Preparing_and_loading_data#Supported_data_formats|supported data formats]].<br />
# Check out the [[Visualizations]] documentation to learn about the different windows and panels in GenomeView.<br />
# Check out the [[navigation]] documentation to get around your genome.<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Quick_start&diff=182Quick start2013-09-06T15:51:15Z<p>Thomas: </p>
<hr />
<div>A four-step guide to getting started with GenomeView and your own data<br />
<br />
# [[Starting GenomeView]]<br />
# Check out the [[Preparing and loading data]] page to learn how to load data and the list of [[Preparing_and_loading_data#Supported_data_formats|supported data formats]].<br />
# Check out the [[Visualizations]] documentation to learn about the different windows and panels in GenomeView.<br />
# Check out the [[navigation]] documentation to get around your genome.</div>Thomashttps://manual.genomeview.org/index.php?title=Preparing_and_loading_data&diff=181Preparing and loading data2013-09-06T15:50:26Z<p>Thomas: </p>
<hr />
<div><br />
{{TOC|align=right}}<br />
<br />
==Loading data ==<br />
[http://genomeview.org/content/load-data Load data tutorial]<br />
<br />
==Data preparation recipe==<br />
<br />
# Match identifiers: GenomeView uses identifiers to link data from different sources, so make sure they match exactly; matching is case-sensitive.<br />
# Create indices for data files that need them (see the tables below)<br />
# Convert file formats to get the desired visualizations (see the tables below)<br />
# Load data<br />
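Step 1 can be checked before loading anything. This is a minimal sketch, assuming a FASTA reference and a GFF annotation file: it lists the GFF sequence identifiers that have no exact, case-sensitive match among the FASTA headers. The function names are hypothetical and not part of GenomeView.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Identifiers from FASTA header lines (the first token after '>')."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Sequence identifiers (column 1) used by GFF feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop reading features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_ids(fasta_lines: Iterable[str], gff_lines: Iterable[str]) -> Set[str]:
    """GFF sequence identifiers with no exact, case-sensitive match in the FASTA."""
    return gff_seqids(gff_lines) - fasta_ids(fasta_lines)


if __name__ == "__main__":
    fasta = [">chr1 assembled chromosome", "ACGT", ">chr2", "GGCC"]
    gff = [
        "##gff-version 3",
        "Chr1\tmaker\tgene\t1\t4\t.\t+\t.\tID=g1",
        "chr2\tmaker\tgene\t1\t4\t.\t+\t.\tID=g2",
    ]
    print(sorted(unmatched_ids(fasta, gff)))  # ['Chr1']: differs from chr1 only in case
```

With real files, pass open file handles instead of lists, e.g. `unmatched_ids(open("genome.fasta"), open("genes.gff"))`.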
<br />
== Why index files? ==<br />
Indexing creates a look-up table that lets GenomeView load data on the fly. This speeds up browsing and loading, and significantly reduces the amount of memory you need. For some file formats we recommend creating indices, for others we do not. See the tables below for details and links to instructions.<br />
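As an illustration of what such a look-up table contains, the sketch below computes a samtools-style `.fai` FASTA index: for each sequence it records the name, the length, the byte offset of the first base, and the bases and bytes per line, which is enough to seek directly to any region. It assumes GenomeView accepts the samtools `.fai` layout; see [[Index FASTA]] for the supported procedure. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# name, sequence length, byte offset of first base, bases per line, bytes per line
FaiEntry = Tuple[str, int, int, int, int]


def fai_entries(lines: Iterable[bytes]) -> List[FaiEntry]:
    """Compute .fai rows from the raw (binary) lines of a FASTA file.

    Assumes all sequence lines of a record, except the last, have the same
    length; unlike samtools, this sketch does not verify that.
    """
    entries: List[FaiEntry] = []
    name = None
    length = offset = line_bases = line_bytes = 0
    pos = 0  # byte offset of the current line in the file
    for raw in lines:
        if raw.startswith(b">"):
            if name is not None:
                entries.append((name, length, offset, line_bases, line_bytes))
            name = raw[1:].split()[0].decode()
            length = line_bases = line_bytes = 0
            offset = pos + len(raw)  # the sequence starts right after the header
        else:
            seq = raw.rstrip(b"\r\n")
            if seq and line_bases == 0:
                # The first sequence line of a record defines the line layout.
                line_bases, line_bytes = len(seq), len(raw)
            length += len(seq)
        pos += len(raw)
    if name is not None:
        entries.append((name, length, offset, line_bases, line_bytes))
    return entries


def write_fai(fasta_path: str) -> str:
    """Write <fasta_path>.fai next to the FASTA file and return its path."""
    with open(fasta_path, "rb") as handle:
        entries = fai_entries(handle)
    fai_path = fasta_path + ".fai"
    with open(fai_path, "w") as out:
        for entry in entries:
            out.write("\t".join(map(str, entry)) + "\n")
    return fai_path
```

For example, `write_fai("genome.fasta")` would produce `genome.fasta.fai`. In practice, `samtools faidx genome.fasta` produces the same kind of file and also validates the line lengths.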
<br />
<br />
== Supported data formats ==<br />
<br />
=== Reference sequence ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<br />
<tr><th></th><th></th><th></th><th>unindexed***</th><th>indexed</th><th></th></tr><br />
<br />
<tr><td valign="top" rowspan=2>Reference sequence</td><td><b>fasta</b> <sup>¤</sup></td><td>Recommended<br/>[[Index FASTA]]</td><td>50 Mb</td><td>unlimited</td><td>If the file is very large and has no index, GenomeView will offer to create one for you.</td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and GenBank are mixed file formats that can contain both annotation and reference sequence in the same file.</td></tr><br />
<br />
</table><br />
=== Annotation ===<br />
<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=5>Annotation</td><td><b>gff</b> <sup>&#164;</sup></td><td>Not recommended<br />
[[Index GFF]]</td><td>50 Mb</td><td>unlimited</td><td></td></tr><br />
<br />
<tr><td>embl, genbank</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>EMBL and GenBank are mixed file formats that can contain both annotation and reference sequence in the same file.</td></tr><br />
<br />
<tr><td>bed</td><td>Not recommended [[Index BED]]</td><td>50 Mb or less</td><td>unlimited</td><td>By default, data from a BED file is added to the CDS track. To put it in a different track, add the line 'track name=Track_name' at the top of the file. The track name must not contain whitespace.</td></tr><br />
<tr><td>ptt, tbl</td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>Other standard annotation formats that GenomeView understands</td></tr><br />
<tr><td>various formats</td><td>Not possible</td><td>50 Mb or less</td><td>--</td><td>GenomeView can directly parse the output of the following programs: BLAST, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan</td></tr><br />
</table><br />
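The BED track line described in the table can be added with a short script. This is a minimal sketch: it prepends `track name=...`, replaces any existing track line (our assumption about what you want), and rejects names containing whitespace, as the table requires. The function name is hypothetical.

```python
from typing import List


def add_track_line(bed_lines: List[str], track_name: str) -> List[str]:
    """Return BED lines headed by 'track name=...', replacing any existing track line."""
    if not track_name or any(ch.isspace() for ch in track_name):
        # The manual states that whitespace is not allowed in the track name.
        raise ValueError(f"invalid track name: {track_name!r}")
    records = [line for line in bed_lines if not line.startswith("track ")]
    return [f"track name={track_name}"] + records


if __name__ == "__main__":
    bed = ["chr1\t100\t200\tpeak1", "chr1\t300\t450\tpeak2"]
    print("\n".join(add_track_line(bed, "ChIP_peaks")))
```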
<br />
=== Whole genome alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=3>Multiple genome alignment</td><td><b>maf</b> <sup>&#164;</sup></td><td>Recommended</td><td>100 Mb</td><td>unlimited</td><td>If you try to load an unindexed MAF file, GenomeView will offer to compress and index it for you.<br/>MAF is the recommended file format for whole-genome alignments of large or complex genomes.</td></tr><br />
<br />
<tr><td><b>multi-fasta</b> <sup>&#164;</sup></td><td>Not possible</td><td>100 Mb</td><td>--</td><td>Recommended for small/simple genomes with a near 1:1 relationship.</td></tr><br />
<br />
<tr><td>aln, ClustalW</td><td>Not possible</td><td>100 Mb</td><td>--</td></tr><br />
</table><br />
<br />
=== Read alignments ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=2>Sequence read alignment</td><td><b>bam</b> <sup>&#164;</sup></td><td>Required [[Index bam]]</td><td>--</td><td>unlimited</td><td>GenomeView will prompt you if there is no index and will create one for you.</td></tr><br />
<br />
<tr><td>MAQ, MapView, BroadSolexa</td><td>Not possible</td><td>100 Mb</td><td>--</td></tr><br />
</table><br />
<br />
=== Read coverage summary ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td valign="top" rowspan=4>Read coverage summary</td><td><b> [[tdf]]</b> <sup>&#164;</sup></td><td>Native</td><td>unlimited</td><td>unlimited</td><td>[[TDF]] files can be created with the [[bam2tdf]] tool, which is available for [https://sourceforge.net/projects/genomeview/files/TDformat/ download].</td></tr><br />
<br />
<tr><td>bigwig</td><td>Native</td><td>unlimited</td><td>unlimited</td><td>This format can be used for any wig data, not just read coverage.</td></tr><br />
<br />
<tr><td>[[pileup]]</td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow when you have extreme read depth (>5000 x coverage)</td></tr><br />
<br />
<tr><td>wig</td><td>Not possible</td><td>50 Mb</td><td>--</td><td>We strongly recommend that you [[wig2tdf|convert your wig files to TDF]]. <br />
GenomeView can convert wig files to TDF automatically. Caveats: all 'track' information should be on a single line, and 'browser' lines are ignored because they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome, and by genomic coordinate within each chromosome. Both the BedGraph and Wiggle_0 formats are supported; for Wiggle_0, both variableStep and fixedStep should work.</td></tr><br />
</table><br />
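Before loading or converting a WIG file, you can check the ordering requirement from the table. This sketch reports lines that break it, interpreting "sorted by chromosome" as "each chromosome forms one contiguous block" (an assumption; GenomeView may also expect a particular chromosome order). It handles variableStep, fixedStep and bedGraph lines and skips 'track' and 'browser' lines. The function name is hypothetical.

```python
from typing import Iterable, List


def check_wig_order(lines: Iterable[str]) -> List[str]:
    """Return ordering problems in wiggle/bedGraph lines; an empty list means none found."""
    problems: List[str] = []
    seen = set()
    chrom = None
    last_pos = None
    mode = None  # "variableStep", "fixedStep" or None (bedGraph)
    fixed_pos = fixed_step = 0

    def enter(new_chrom: str, lineno: int) -> None:
        nonlocal chrom, last_pos
        if new_chrom == chrom:
            return  # same chromosome continues; keep checking positions
        if new_chrom in seen:
            problems.append(f"line {lineno}: {new_chrom} appears in more than one block")
        seen.add(new_chrom)
        chrom, last_pos = new_chrom, None

    def check(pos: int, lineno: int) -> None:
        nonlocal last_pos
        if last_pos is not None and pos < last_pos:
            problems.append(f"line {lineno}: position {pos} on {chrom} is out of order")
        last_pos = pos

    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            enter(opts["chrom"], lineno)
            if mode == "fixedStep":
                fixed_pos, fixed_step = int(opts["start"]), int(opts.get("step", "1"))
        elif mode == "fixedStep":
            check(fixed_pos, lineno)  # data line holds only a value
            fixed_pos += fixed_step
        elif mode == "variableStep":
            check(int(fields[0]), lineno)  # data line: position value
        else:
            enter(fields[0], lineno)  # bedGraph: chrom start end value
            check(int(fields[1]), lineno)
    return problems
```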
<br />
=== Genome variation and diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Genome variation</td><td><b> [[vcf]]</b> <sup>&#164;</sup></td><td>Not recommended</td><td>--</td><td>unlimited</td><td>We recommend running [[reducevcf]] on VCF files before loading them; this significantly speeds up loading.</td></tr><br />
<br />
</table><br />
<br />
<br />
=== Allele diversity ===<br />
<table border=1><br />
<tr><th>Data type</th><th>File format</th><th>Index*</th><th colspan=2>Max size**</th><th>Comments</th></tr><br />
<tr><td>Allele diversity summary</td><td><b> [[pileup]]</b> <sup>&#164;</sup></td><td>Required</td><td>--</td><td>unlimited</td><td>The pileup format becomes slow when you have extreme read depth (>5000 x coverage)</td></tr><br />
<br />
</table><br />
* Indicates whether this file format can/should be indexed. <br/><br />
** Recommended maximum file size. The first value applies without an index, the second with an index. These values are only guidelines; when loading multiple data sets, add up their sizes.<br/><br />
*** Unindexed data files can be gzip compressed.<br />
<br />
<sup>&#164;</sup> Recommended file format for this data type.<br />
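Unindexed files can be compressed with the standard `gzip` tool, or from a script as sketched below. We assume GenomeView recognizes compressed files by their `.gz` suffix; the function name is hypothetical.

```python
import gzip
import os
import shutil
import tempfile
from typing import Optional


def gzip_file(src: str, dst: Optional[str] = None) -> str:
    """Compress src with gzip (default output: src + '.gz') and return the output path."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # Copy in chunks so large files are never read into memory at once.
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    # Demo on a throw-away file in a temporary directory.
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "annotation.gff")
    with open(path, "w") as handle:
        handle.write("##gff-version 3\n")
    print(gzip_file(path))
```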
<br />
<br />
<br />
<br />
<h2>Output formats</h2><br />
(Modified) annotations can be saved as either GFF or EMBL.<br />
<br />
All loaded data can be exported in its original format. Such exports do not include your modifications.<br />
<br />
<h2>Converting formats</h2><br />
<a href="http://genomeview.org/loki/">We offer a few tools to convert files between formats.</a><br />
<br />
== Previous documentation pages ==<br />
<br />
<br />
<br />
[http://genomeview.org/content/data-formats Supported data formats]<br />
<br />
[http://genomeview.org/content/preparing-fasta-files FASTA files]<br />
<br />
[http://genomeview.org/content/preparing-feature-files Feature files]<br />
<br />
[http://genomeview.org/content/preparing-short-read-alignments Read data]<br />
<br />
[http://genomeview.org/content/preparing-pileup Coverage plots]</div>Thomashttps://manual.genomeview.org/index.php?title=Quick_start&diff=180Quick start2013-09-06T15:48:12Z<p>Thomas: </p>
<hr />
<div>A four-step guide to getting started with GenomeView and your own data<br />
<br />
# [[Starting GenomeView]]<br />
# Check out the [[load data]] page to learn how to load data and the list of [[supported data formats]].<br />
# Check out the [[user interface]] documentation to learn about the different windows and panels in GenomeView.<br />
# Check out the [[navigation]] documentation to get around your genome.</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=179Starting GenomeView2013-09-06T15:47:28Z<p>Thomas: </p>
<hr />
<div><br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
<nowiki><br />
<a href="http://genomeview.org/start/launch.jnlp"><img style="border:none;" src="/webstart.gif" alt="Launch" /></a><br />
</nowiki><br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
<br />
== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Quick_start&diff=178Quick start2013-09-06T15:47:01Z<p>Thomas: Created page with "A four step guide to get going with GenomeView with your data # Start GenomeView # Check out the load data page to learn how to load data and the list of support..."</p>
<hr />
<div>A four-step guide to getting started with GenomeView and your own data<br />
<br />
# [[Start GenomeView]]<br />
# Check out the [[load data]] page to learn how to load data and the list of [[supported data formats]].<br />
# Check out the [[user interface]] documentation to learn about the different windows and panels in GenomeView.<br />
# Check out the [[navigation]] documentation to get around your genome.</div>Thomashttps://manual.genomeview.org/index.php?title=Main_Page&diff=177Main Page2013-09-06T15:46:10Z<p>Thomas: </p>
<hr />
<div>{{TOC|align=right}}<br />
<br />
Welcome to the GenomeView manual. These pages aim to answer any questions you may have as an end user, a platform user or a contributing developer.<br />
<br />
This documentation is open for anyone to contribute to: just click the edit button near the top to help make this a better resource for everyone.<br />
<br />
'''We are currently moving the documentation from genomeview.org/manual to this wiki, so not all information may be here yet. [[User:Thomas|Thomas]] ([[User talk:Thomas|talk]]) 19:18, 30 August 2013 (CEST)'''<br />
<br />
== [[Quick start|Getting started with GenomeView in 5 minutes]] ==<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=User manual}}<br />
These manual pages describe how to use GenomeView and how to prepare your data to be usable in GenomeView.<br />
<br />
[[Starting GenomeView]]<br />
<br />
[[Preparing and loading data]] -- [[Preloaded data]]<br />
<br />
[[Navigation]] -- [[Keyboard short-cuts]]<br />
<br />
[[Visualizations]]<br />
<br />
[[Search for...]]<br />
<br />
[[Manipulating and configuring the views]]<br />
<br />
[[Editing annotations]]<br />
<br />
[[Exporting data and saving changes]]<br />
<br />
[[More functionality with plugins]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Getting help}}<br />
<br />
[[Frequently asked questions]]<br />
<br />
[[I have a problem, help me, please]]<br />
<br />
[[Report a bug or request a feature]]<br />
<br />
</div><br />
</div><br />
<br />
<br />
<br />
<div style="clear:both;"></div><br />
<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=Platform-user manual}}<br />
These manual pages describe how to integrate GenomeView with your web-based platform or how to communicate with GenomeView from third-party software.<br />
<br />
[[Using GenomeView from the command-line]]<br />
<br />
[[Integration]]<br />
<br />
[[Session files]] -- [[Configuration options]]<br />
<br />
[[Communicating with GenomeView]]<br />
<br />
[[Programming with GenomeView]]<br />
<br />
[[Setting up authentication and encryption]]<br />
<br />
[[Making a plugin]]<br />
<br />
[[Integrating GenomeView as an editor]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Developer manual}}<br />
These pages are aimed towards contributors to GenomeView and developers who work directly on the GenomeView code.<br />
<br />
Currently this documentation lives on [https://sourceforge.net/p/genomeview/wiki/ Sourceforge].<br />
<br />
</div><br />
</div><br />
<div style="clear:both;"></div><br />
<br />
<br />
<br />
== Workshops, presentations and training ==</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=176Starting GenomeView2013-09-06T15:39:38Z<p>Thomas: /* Webstart */</p>
<hr />
<div>== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
<nowiki><br />
<a href="http://genomeview.org/start/launch.jnlp"><img style="border:none;" src="/webstart.gif" alt="Launch" /></a><br />
</nowiki><br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=175Starting GenomeView2013-09-06T15:39:14Z<p>Thomas: </p>
<hr />
<div>== System requirements ==<br />
Recommended system specifications:<br />
* 2 GB of memory (minimum 1 GB)<br />
* A dual-core or better processor, although GenomeView will also run on less<br />
* A high-speed, low-latency connection if you browse online data<br />
* A recent version of Windows, *nix or Mac OS<br />
* A recent version of Java 6 or 7 (minimum Java 1.6u10). You can get a recent version from http://www.java.com.<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
<a href="http://genomeview.org/start/launch.jnlp"><img style="border:none;" src="/webstart.gif" alt="Launch" /></a><br />
<br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Starting_GenomeView&diff=174Starting GenomeView2013-09-06T15:36:21Z<p>Thomas: </p>
<hr />
<div>== System requirements ==<br />
<strong>Java 1.6u10+</strong> is required to run the application. You can get a recent version from http://www.java.com. We recommend <strong>1 GB of memory</strong>, but GenomeView will work with less. Similarly, a <strong>dual-core</strong> or better processor is recommended, but GenomeView will work with less.<br />
<br />
== Webstart ==<br />
<br />
The most straightforward way to start GenomeView is with Java Web Start.<br />
<br />
<a href="http://genomeview.org/start/launch.jnlp"><img style="border:none;" src="/webstart.gif" alt="Launch" /></a><br />
<br />
Clicking the above button will immediately launch the application.<br />
<br />
== Local installation == <br />
<br />
Download the latest version from http://sourceforge.net/projects/genomeview/<br />
<br />
Unpack the zip file into a directory and start genomeview.jar, either by double-clicking the jar file or by running the following command:<br />
java -Xmx1g -jar genomeview.jar<br />
<br />
[http://genomeview.org/content/supported-platforms Supported platforms]<br />
<br />
<br />
[[Category:User]]</div>Thomashttps://manual.genomeview.org/index.php?title=Main_Page&diff=173Main Page2013-09-06T15:23:30Z<p>Thomas: </p>
<hr />
<div>{{TOC|align=right}}<br />
<br />
Welcome to the GenomeView manual. These pages aim to answer any questions you may have as an end user, a platform user or a contributing developer.<br />
<br />
This documentation is open for anyone to contribute to: just click the edit button near the top to help make this a better resource for everyone.<br />
<br />
'''We are currently moving the documentation from genomeview.org/manual to this wiki, so not all information may be here yet. [[User:Thomas|Thomas]] ([[User talk:Thomas|talk]]) 19:18, 30 August 2013 (CEST)'''<br />
<br />
== [http://genomeview.org/content/quick-start-guide Getting started with GenomeView in 5 minutes] ==<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=User manual}}<br />
These manual pages describe how to use GenomeView and how to prepare your data to be usable in GenomeView.<br />
<br />
[[Starting GenomeView]]<br />
<br />
[[Preparing and loading data]] -- [[Preloaded data]]<br />
<br />
[[Navigation]] -- [[Keyboard short-cuts]]<br />
<br />
[[Visualizations]]<br />
<br />
[[Search for...]]<br />
<br />
[[Manipulating and configuring the views]]<br />
<br />
[[Editing annotations]]<br />
<br />
[[Exporting data and saving changes]]<br />
<br />
[[More functionality with plugins]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Getting help}}<br />
<br />
[[Frequently asked questions]]<br />
<br />
[[I have a problem, help me, please]]<br />
<br />
[[Report a bug or request a feature]]<br />
<br />
</div><br />
</div><br />
<br />
<br />
<br />
<div style="clear:both;"></div><br />
<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=Platform-user manual}}<br />
These manual pages describe how to integrate GenomeView with your web-based platform or how to communicate with GenomeView from third-party software.<br />
<br />
[[Using GenomeView from the command-line]]<br />
<br />
[[Integration]]<br />
<br />
[[Session files]] -- [[Configuration options]]<br />
<br />
[[Communicating with GenomeView]]<br />
<br />
[[Programming with GenomeView]]<br />
<br />
[[Setting up authentication and encryption]]<br />
<br />
[[Making a plugin]]<br />
<br />
[[Integrating GenomeView as an editor]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Developer manual}}<br />
These pages are aimed towards contributors to GenomeView and developers who work directly on the GenomeView code.<br />
<br />
Currently this documentation lives on [https://sourceforge.net/p/genomeview/wiki/ Sourceforge].<br />
<br />
</div><br />
</div><br />
<div style="clear:both;"></div><br />
<br />
<br />
<br />
== Workshops, presentations and training ==</div>Thomashttps://manual.genomeview.org/index.php?title=Main_Page&diff=172Main Page2013-09-06T15:23:10Z<p>Thomas: </p>
<hr />
<div>{{TOC|align=right}}<br />
<br />
Welcome to the GenomeView manual. These pages aim to answer any questions you may have as an end-user, a platform-user or as a contributing developer.<br />
<br />
This documentation is completely open for anyone to contribute to, just click the edit-button near the top and you can help make this a better resource for everyone.<br />
<br />
'''We are currently moving the documentation from genomeview.org/manual to this wiki, so some information may not be here yet. [[User:Thomas|Thomas]] ([[User talk:Thomas|talk]]) 19:18, 30 August 2013 (CEST)'''<br />
<br />
== [http://genomeview.org/content/quick-start-guide Getting started in 5 minutes with GenomeView] ==<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=User manual}}<br />
These manual pages describe how to use GenomeView and how to prepare your data for use in GenomeView.<br />
<br />
[[Starting GenomeView]]<br />
<br />
[[Preparing and loading data]] -- [[Preloaded data]]<br />
<br />
[[Navigation]] -- [[Keyboard short-cuts]]<br />
<br />
[[Visualizations]]<br />
<br />
[[Search for...]]<br />
<br />
[[Manipulating and configuring the views]]<br />
<br />
[[Editing annotations]]<br />
<br />
[[Exporting data and saving changes]]<br />
<br />
[[More functionality with plugins]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Getting help}}<br />
<br />
[[Frequently asked questions]]<br />
<br />
[[I have a problem, help me, please]]<br />
<br />
[[Report a bug or request a feature]]<br />
<br />
</div><br />
</div><br />
<br />
<br />
<br />
<div style="clear:both;"></div><br />
<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=Platform-user manual}}<br />
These manual pages describe how to integrate GenomeView with your web-based platform and how to communicate with GenomeView from third-party software.<br />
<br />
[[Using GenomeView from the command-line]]<br />
<br />
[[Integration]]<br />
<br />
[[Session files]] -- [[Configuration options]]<br />
<br />
[[Communicating with GenomeView]]<br />
<br />
[[Programming with GenomeView]]<br />
<br />
[[Setting up authentication and encryption]]<br />
<br />
[[Making a plugin]]<br />
<br />
[[Integrating GenomeView as an editor]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Developer manual}}<br />
These pages are aimed towards contributors to GenomeView and developers who work directly on the GenomeView code.<br />
<br />
Currently this documentation lives on [https://sourceforge.net/p/genomeview/wiki/ SourceForge].<br />
<br />
</div><br />
</div><br />
<div style="clear:both;"></div><br />
<br />
<br />
<br />
== Workshops, presentations and training ==</div>Thomashttps://manual.genomeview.org/index.php?title=Main_Page&diff=171Main Page2013-09-06T15:23:01Z<p>Thomas: </p>
<hr />
<div>{{TOC|align=right}}<br />
<br />
Welcome to the GenomeView manual. These pages aim to answer any questions you may have as an end-user, a platform-user or a contributing developer.<br />
<br />
This documentation is open for anyone to contribute to: just click the edit button near the top of the page to help make it a better resource for everyone.<br />
<br />
'''We are currently moving the documentation from genomeview.org/manual to this wiki, so some information may not be here yet. [[User:Thomas|Thomas]] ([[User talk:Thomas|talk]]) 19:18, 30 August 2013 (CEST)'''<br />
<br />
== [http://genomeview.org/content/quick-start-guide Getting started in 5 minutes with GenomeView] ==<br />
<br />
== Manuals and support ==<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=User manual}}<br />
These manual pages describe how to use GenomeView and how to prepare your data for use in GenomeView.<br />
<br />
[[Starting GenomeView]]<br />
<br />
[[Preparing and loading data]] -- [[Preloaded data]]<br />
<br />
[[Navigation]] -- [[Keyboard short-cuts]]<br />
<br />
[[Visualizations]]<br />
<br />
[[Search for...]]<br />
<br />
[[Manipulating and configuring the views]]<br />
<br />
[[Editing annotations]]<br />
<br />
[[Exporting data and saving changes]]<br />
<br />
[[More functionality with plugins]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Getting help}}<br />
<br />
[[Frequently asked questions]]<br />
<br />
[[I have a problem, help me, please]]<br />
<br />
[[Report a bug or request a feature]]<br />
<br />
</div><br />
</div><br />
<br />
<br />
<br />
<div style="clear:both;"></div><br />
<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=Platform-user manual}}<br />
These manual pages describe how to integrate GenomeView with your web-based platform and how to communicate with GenomeView from third-party software.<br />
<br />
[[Using GenomeView from the command-line]]<br />
<br />
[[Integration]]<br />
<br />
[[Session files]] -- [[Configuration options]]<br />
<br />
[[Communicating with GenomeView]]<br />
<br />
[[Programming with GenomeView]]<br />
<br />
[[Setting up authentication and encryption]]<br />
<br />
[[Making a plugin]]<br />
<br />
[[Integrating GenomeView as an editor]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Developer manual}}<br />
These pages are aimed towards contributors to GenomeView and developers who work directly on the GenomeView code.<br />
<br />
Currently this documentation lives on [https://sourceforge.net/p/genomeview/wiki/ SourceForge].<br />
<br />
</div><br />
</div><br />
<div style="clear:both;"></div><br />
<br />
<br />
<br />
== Workshops, presentations and training ==</div>Thomashttps://manual.genomeview.org/index.php?title=Main_Page&diff=170Main Page2013-09-06T15:22:33Z<p>Thomas: </p>
<hr />
<div>{{TOC|align=right}}<br />
<br />
Welcome to the GenomeView manual. These pages aim to answer any questions you may have as an end-user, a platform-user or a contributing developer.<br />
<br />
This documentation is open for anyone to contribute to: just click the edit button near the top of the page to help make it a better resource for everyone.<br />
<br />
'''We are currently moving the documentation from genomeview.org/manual to this wiki, so some information may not be here yet. [[User:Thomas|Thomas]] ([[User talk:Thomas|talk]]) 19:18, 30 August 2013 (CEST)'''<br />
<br />
== The 5-minute quick-start guide ==<br />
[http://genomeview.org/content/quick-start-guide Getting started in 5 minutes with GenomeView]<br />
<br />
== Manuals and support ==<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=User manual}}<br />
These manual pages describe how to use GenomeView and how to prepare your data for use in GenomeView.<br />
<br />
[[Starting GenomeView]]<br />
<br />
[[Preparing and loading data]] -- [[Preloaded data]]<br />
<br />
[[Navigation]] -- [[Keyboard short-cuts]]<br />
<br />
[[Visualizations]]<br />
<br />
[[Search for...]]<br />
<br />
[[Manipulating and configuring the views]]<br />
<br />
[[Editing annotations]]<br />
<br />
[[Exporting data and saving changes]]<br />
<br />
[[More functionality with plugins]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Getting help}}<br />
<br />
[[Frequently asked questions]]<br />
<br />
[[I have a problem, help me, please]]<br />
<br />
[[Report a bug or request a feature]]<br />
<br />
</div><br />
</div><br />
<br />
<br />
<br />
<div style="clear:both;"></div><br />
<br />
<br />
<div style="float:left;width:49%"><br />
{{Box-header|title=Platform-user manual}}<br />
These manual pages describe how to integrate GenomeView with your web-based platform and how to communicate with GenomeView from third-party software.<br />
<br />
[[Using GenomeView from the command-line]]<br />
<br />
[[Integration]]<br />
<br />
[[Session files]] -- [[Configuration options]]<br />
<br />
[[Communicating with GenomeView]]<br />
<br />
[[Programming with GenomeView]]<br />
<br />
[[Setting up authentication and encryption]]<br />
<br />
[[Making a plugin]]<br />
<br />
[[Integrating GenomeView as an editor]]<br />
<br />
</div><br />
</div><br />
<br />
<div style="float:right;width:49%"><br />
{{Box-header|title=Developer manual}}<br />
These pages are aimed towards contributors to GenomeView and developers who work directly on the GenomeView code.<br />
<br />
Currently this documentation lives on [https://sourceforge.net/p/genomeview/wiki/ SourceForge].<br />
<br />
</div><br />
</div><br />
<div style="clear:both;"></div><br />
<br />
<br />
<br />
== Workshops, presentations and training ==</div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=169Index GFF2013-09-06T14:10:04Z<p>Thomas: </p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends on both the actual size of the file and the number of features it contains.<br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 MB, or even 10 MB, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. The prime example is eukaryotic genes.</li><br />
<li>Typically, a genome annotation does not need to be indexed if it only contains genes, mRNAs, CDSs and exons.</li><br />
<li>Do not include multiple annotation types (mRNA, CDS, ...) in a single file: all features will be loaded into a single track labeled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
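<br />
The size and file-splitting recommendations above can be checked with standard Unix tools. The following is a minimal sketch, not part of GenomeView: it assumes a tab-delimited GFF file, treats 1 MB as 1024 × 1024 bytes of gzip output, and the helper names <code>gz_size</code>, <code>should_index</code> and <code>split_by_type</code> are hypothetical.<br />
```shell
#!/bin/sh
# Hypothetical helpers illustrating the recommendations above; they are not
# part of GenomeView or tabix.

# Print the size in bytes of a file after gzip compression.
gz_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

# Succeed (exit status 0) when the gzip-compressed size exceeds the threshold
# in MB: default 5, per the first recommendation.
# Assumption: 1 MB = 1024 * 1024 bytes.
should_index() {
    threshold_mb=${2:-5}
    [ "$(gz_size "$1")" -gt $((threshold_mb * 1024 * 1024)) ]
}

# Write each feature type (GFF column 3) to its own file, e.g. input.gff.mRNA.gff,
# so that each type can be loaded as a separate track. Lines starting with "#"
# are skipped.
split_by_type() {
    awk -v base="$1" 'BEGIN { FS = "\t" }
        !/^#/ && NF >= 9 { out = base "." $3 ".gff"; print > out }' "$1"
}
```
For example, <code>should_index input.gff && echo "consider indexing"</code> reports whether a file passes the 5 MB threshold, and <code>split_by_type input.gff</code> writes one file per feature type next to the input.<br />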
<br />
<strong>Instructions:</strong><br />
To index a file, you need to pre-process it with bgzip and tabix, as is done for pile-up files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
== BED formatted files ==<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.bgz<br />
tabix -p bed compressed.bed.bgz<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah").</em><br />
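<br />
Since such header lines prevent indexing, one option is to drop them before sorting. The following is a minimal sketch, assuming that <code>track</code>, <code>browser</code> and <code>#</code> lines only occur as headers and that bgzip and tabix are installed; the function name <code>clean_bed</code> is hypothetical.<br />
```shell
#!/bin/sh
# Hypothetical helper, not part of GenomeView or tabix: drop UCSC header lines
# ("track ...", "browser ...") and "#" comment lines from a BED file, then sort
# by chromosome (column 1) and numerically by start position (column 2).
clean_bed() {
    grep -v -E '^(track|browser)([[:space:]]|$)|^#' "$1" |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Usage (bgzip and tabix must be installed):
#   clean_bed input.bed | bgzip -c > compressed.bed.bgz
#   tabix -p bed compressed.bed.bgz
```
Setting <code>LC_ALL=C</code> only makes the sort order reproducible across machines; any order that keeps each chromosome's lines together and sorted by start position should be acceptable to tabix.<br />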
<br />
== GFF formatted files ==<br />
<br />
<b>Warning: compound features will be broken up when GFF files are indexed.</b><br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.bgz<br />
tabix -p gff compressed.gff.bgz<br />
<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
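<br />
The two GFF commands above can be wrapped in a small script. The following is a minimal sketch, assuming a POSIX shell with bgzip and tabix on the <code>PATH</code>; the function names <code>sort_gff</code> and <code>index_gff</code> are hypothetical. It keeps <code>#</code> header lines at the top (tabix skips lines starting with <code>#</code> by default) and omits <code>-T /group/tmp</code>, which appears to be a site-specific path; with GNU sort you can add <code>-T DIR</code> back to point sort at a temporary directory with enough free space.<br />
```shell
#!/bin/sh
# Hypothetical helpers: only grep, sort, bgzip and tabix are real tools here.

# Print the "#" header lines first, then the features sorted by sequence
# name (column 1) and numerically by start position (column 4).
sort_gff() {
    grep '^#' "$1"
    grep -v '^#' "$1" | LC_ALL=C sort -k1,1 -k4,4n
}

# Check that bgzip and tabix are available, then compress the sorted file
# and build the .tbi index next to it.
index_gff() {
    in=$1
    out=${2:-$1.bgz}
    for tool in bgzip tabix; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "index_gff: $tool not found on PATH" >&2
            return 1
        fi
    done
    sort_gff "$in" | bgzip -c > "$out" && tabix -p gff "$out"
}

# Usage: index_gff input.gff compressed.gff.bgz
```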
<br />
<em>Note that the structure of genes and the type of annotation features will be lost when indexing gff files.</em></div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=168Index GFF2013-09-06T14:06:09Z<p>Thomas: </p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends on both the actual size of the file and the number of features it contains.<br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 MB, or even 10 MB, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. The prime example is eukaryotic genes.</li><br />
<li>Typically, a genome annotation does not need to be indexed if it only contains genes, mRNAs, CDSs and exons.</li><br />
<li>Do not include multiple annotation types (mRNA, CDS, ...) in a single file: all features will be loaded into a single track labeled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
<br />
<strong>Instructions:</strong><br />
To index a file, you need to pre-process it with bgzip and tabix, as is done for pile-up files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
== BED formatted files ==<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.bgz<br />
tabix -p bed compressed.bed.bgz<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah").</em><br />
<br />
== GFF formatted files ==<br />
<br />
<b>Warning: compound features will be broken up when GFF files are indexed.</b><br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.bgz<br />
tabix -p gff compressed.gff.bgz<br />
<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
<br />
<em>Note that the structure of genes and the type of annotation features will be lost when indexing gff files.</em></div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=167Index GFF2013-09-06T13:54:41Z<p>Thomas: </p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends on both the actual size of the file and the number of features it contains.<br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 MB, or even 10 MB, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. The prime example is eukaryotic genes. <b>Compound features will be broken up during indexing.</b></li><br />
<li>Typically, a genome annotation does not need to be indexed if it only contains genes, mRNAs, CDSs and exons.</li><br />
<li>Do not include multiple annotation types (mRNA, CDS, ...) in a single file: all features will be loaded into a single track labeled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
<br />
<strong>Instructions:</strong><br />
To index a file, you need to pre-process it with bgzip and tabix, as is done for pile-up files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
== BED formatted files ==<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.bgz<br />
tabix -p bed compressed.bed.bgz<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah").</em><br />
<br />
== GFF formatted files ==<br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.bgz<br />
tabix -p gff compressed.gff.bgz<br />
<br />
<br />
You will get two new files: (1) a bgz file and (2) a tbi file. Load the bgz file in GenomeView.<br />
<br />
<em>Note that the structure of genes and the type of annotation features will be lost when indexing gff files.</em></div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=166Index GFF2013-09-06T13:53:45Z<p>Thomas: </p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends on both the actual size of the file and the number of features it contains.<br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 MB, or even 10 MB, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. The prime example is eukaryotic genes. <b>Compound features will be broken up during indexing.</b></li><br />
<li>Typically, a genome annotation does not need to be indexed if it only contains genes, mRNAs, CDSs and exons.</li><br />
<li>Do not include multiple annotation types (mRNA, CDS, ...) in a single file: all features will be loaded into a single track labeled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
<br />
<strong>Instructions:</strong><br />
To index a file, you need to pre-process it with bgzip and tabix, as is done for pile-up files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
== BED formatted files ==<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.gz<br />
tabix -p bed compressed.bed.gz<br />
<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah").</em><br />
<br />
== GFF formatted files ==<br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.gz<br />
tabix -p gff compressed.gff.gz<br />
<br />
<br />
In both cases, you will get two new files: (1) a gz file and (2) a tbi file.<br />
Load the gz file in GenomeView.<br />
<br />
<strong>Caveat: The structure of genes and the type of annotation features will be lost when indexing gff files.</strong></div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=165Index GFF2013-09-06T13:53:12Z<p>Thomas: </p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends on both the actual size of the file and the number of features it contains.<br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 MB, or even 10 MB, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. The prime example is eukaryotic genes. <b>Compound features will be broken up during indexing.</b></li><br />
<li>Typically, a genome annotation does not need to be indexed if it only contains genes, mRNAs, CDSs and exons.</li><br />
<li>Do not include multiple annotation types (mRNA, CDS, ...) in a single file: all features will be loaded into a single track labeled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
<br />
<strong>Instructions:</strong><br />
To index a file, you need to pre-process it with bgzip and tabix, as is done for pile-up files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
For BED formatted files:<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.gz<br />
tabix -p bed compressed.bed.gz<br />
<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah").</em><br />
<br />
For GFF formatted files:<br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.gz<br />
tabix -p gff compressed.gff.gz<br />
<br />
<br />
In both cases, you will get two new files: (1) a gz file and (2) a tbi file.<br />
Load the gz file in GenomeView.<br />
<br />
<strong>Caveat:</strong><br />
The structure of genes and the type of annotation features will be lost when indexing gff files.</div>Thomashttps://manual.genomeview.org/index.php?title=Index_GFF&diff=164Index GFF2013-09-06T13:53:05Z<p>Thomas: </p>
<hr />
<div>Large feature files need to be indexed before you can use them properly in GenomeView.<br />
<br />
What counts as large is not strictly defined: it depends on both the actual size of the file and the number of features it contains.<br />
<br />
<strong>Recommendations:</strong><br />
<ul><br />
<li>Only index feature files that are larger than 5 MB, or even 10 MB, when compressed with gzip.</li><br />
<li>Only index feature files that do NOT contain composite features, i.e. features consisting of multiple locations. The prime example is eukaryotic genes. <b>Compound features will be broken up during indexing.</b></li><br />
<li>Typically, a genome annotation does not need to be indexed if it only contains genes, mRNAs, CDSs and exons.</li><br />
<li>Do not include multiple annotation types (mRNA, CDS, ...) in a single file: all features will be loaded into a single track labeled with the file name. We suggest putting each type in its own file.</li><br />
</ul><br />
<br />
<strong>Instructions:</strong><br />
To index a file, you need to pre-process it with bgzip and tabix, as is done for pile-up files.<br />
<br />
Tabix can be downloaded from the [https://sourceforge.net/projects/samtools/files/tabix/ tabix download page].<br />
<br />
For BED formatted files:<br />
<br />
sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.gz<br />
tabix -p bed compressed.bed.gz<br />
<br />
<br />
<em>Note that indexing will not work with BED files that have a UCSC header ("track name=blah").</em><br />
<br />
For GFF formatted files:<br />
<br />
sort -T /group/tmp -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.gz<br />
tabix -p gff compressed.gff.gz<br />
<br />
<br />
In both cases, you will get two new files: (1) a gz file and (2) a tbi file.<br />
Load the gz file in GenomeView.<br />
<br />
<strong>Caveat:</strong><br />
The structure of genes and the type of annotation features will be lost when indexing gff files.</div>Thomas