Wrangling Genomics: Setup


Amazon Cloud

Most of the Data Carpentry genomics lessons currently use the Amazon cloud (Amazon Web Services, AWS).

We do not yet know whether we will continue to use the Amazon cloud.

Required software

FastQC

FastQC provides a simple way to run quality control checks on raw sequence data from high-throughput sequencing pipelines. It offers a modular set of analyses that give a quick impression of whether your data has any problems you should be aware of before doing further analysis.

FastQC is available for Linux, MacOS and Windows.
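To illustrate the kind of check FastQC performs, the following Python sketch computes the mean base quality at each read position, roughly what FastQC's per-base sequence quality report summarizes. It is not FastQC's implementation: it assumes four-line FASTQ records with Phred+33 quality encoding, and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, qual


def phred33(qual: str) -> List[int]:
    """Decode Phred+33 quality characters into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def mean_quality_per_position(lines: List[str]) -> List[float]:
    """Mean quality at each read position across all records."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in read_fastq(lines):
        for pos, q in enumerate(phred33(qual)):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += q
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]
```

For example, two 4 bp reads with quality strings `IIII` and `II##` give `[40.0, 40.0, 21.0, 21.0]`; a drop in quality toward the 3' end is the sort of problem FastQC flags.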

Trimmomatic

Trimmomatic is a Java-based program that can remove sequencer-specific sequences, such as adapters, and trim nucleotides whose quality falls below a chosen threshold. Trimmomatic supports multithreading, so it can process large files quickly.

Trimmomatic is available for Linux, MacOS and Windows.
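Quality trimming can be pictured as a sliding window. The Python sketch below scans a read from the 5' end and cuts it at the start of the first window whose mean Phred score falls below a threshold. It is a simplified model in the spirit of Trimmomatic's SLIDINGWINDOW step, not its exact algorithm; the window size of 4 and threshold of 20 are illustrative defaults, the function names are hypothetical, and reads shorter than the window are kept whole.

```python
from typing import List, Tuple


def sliding_window_trim(quals: List[int], window: int = 4, min_mean: float = 20.0) -> int:
    """Return how many leading bases to keep.

    Cuts at the start of the first window (scanning 5' to 3') whose mean
    quality is below min_mean. Simplified: Trimmomatic may place the cut
    differently.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)  # no failing window (or read shorter than window): keep all


def trim_read(seq: str, quals: List[int], window: int = 4,
              min_mean: float = 20.0) -> Tuple[str, List[int]]:
    """Trim a read and its qualities to the kept prefix."""
    keep = sliding_window_trim(quals, window, min_mean)
    return seq[:keep], quals[:keep]
```

For qualities `[40, 40, 40, 40, 30, 10, 5, 2]`, the window covering positions 4 to 7 averages 11.75, so the read is trimmed to its first four bases.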

BWA

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.

BWA is available for Linux and MacOS.
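BWA stands for Burrows-Wheeler Aligner: it indexes the reference genome with the Burrows-Wheeler transform (an FM-index) so that reads can be located without scanning the whole genome. The toy Python sketch below builds a small FM-index and uses backward search to find exact matches. Real BWA also tolerates mismatches and gaps and builds its index far more efficiently; the function names here are hypothetical.

```python
from typing import Dict, List, Tuple


def build_fm_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Build a tiny FM-index: suffix array, C table, and occurrence counts.

    Naive construction, fine for toy strings but not for genomes.
    """
    text += "$"  # sentinel; '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    # C[c] = number of characters in text strictly smaller than c
    counts: Dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table: Dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][k] = occurrences of c in bwt[:k]
    occ: Dict[str, List[int]] = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][k + 1] = occ[sym][k] + (1 if sym == ch else 0)
    return sa, c_table, occ


def backward_search(pattern: str, sa: List[int], c_table: Dict[str, int],
                    occ: Dict[str, List[int]]) -> List[int]:
    """Return sorted 0-based positions where pattern occurs exactly."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character leftward
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

With the reference `ACGTACGA`, searching for `ACG` returns the 0-based positions `[0, 4]`.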

SAMtools

SAMtools is a suite of programs for interacting with high-throughput sequencing data. It can read, write, edit, index, and view files in SAM, BAM, and CRAM formats.

SAMtools is available for Linux and MacOS.
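SAM is a tab-separated text format whose first 11 columns are mandatory (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The Python sketch below parses one alignment line and decodes two FLAG bits: 0x4 (read unmapped) and 0x10 (read mapped to the reverse strand). The class and function names are hypothetical; for BAM or CRAM input you would first produce SAM text with `samtools view`.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int   # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    seq: str

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory columns of one SAM alignment line.

    Header lines (starting with '@') should be skipped before calling this.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 tab-separated fields, got {len(cols)}")
    return SamRecord(qname=cols[0], flag=int(cols[1]), rname=cols[2],
                     pos=int(cols[3]), mapq=int(cols[4]), cigar=cols[5],
                     seq=cols[9])
```

For example, a line with FLAG 16 parses as a mapped read on the reverse strand.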

BCFtools

BCFtools is a program for variant calling and manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF.

BCFtools is available for Linux and MacOS.
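A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and ALT may list several comma-separated alleles. The Python sketch below counts SNPs and indels by comparing REF and ALT lengths. The classification is deliberately rough (multi-base substitutions, symbolic alleles such as `<DEL>`, and a missing ALT `.` all count as "other"), and the function names are hypothetical; for real data, `bcftools stats` reports summary counts of this kind.

```python
from typing import Dict, List


def classify_variant(ref: str, alts: List[str]) -> str:
    """Rough classification into 'snp', 'indel', or 'other'."""
    if any(a.startswith("<") or a in ("*", ".") for a in alts):
        return "other"  # symbolic, spanning-deletion, or missing allele
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitution


def count_variant_types(vcf_lines: List[str]) -> Dict[str, int]:
    """Count variant classes in plain-text VCF lines."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information (##) and header (#CHROM) lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        ref, alt = cols[3], cols[4]
        counts[classify_variant(ref, alt.split(","))] += 1
    return counts
```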

IGV

IGV is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

IGV is available for Linux, MacOS and Windows.
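Before starting, it can help to confirm that the command-line tools are on your PATH. The Python sketch below uses `shutil.which` to look each one up. The command names are assumptions that depend on how each tool was installed: Trimmomatic, for instance, is often launched through `java -jar` rather than a `trimmomatic` command, and IGV may be started from a desktop launcher instead.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed command names; adjust to match your installation.
DEFAULT_TOOLS = ("fastqc", "trimmomatic", "bwa", "samtools", "bcftools", "igv")


def find_tools(names: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, Optional[str]]:
    """Map each command name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
```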

Required Data

You will also need to download a data tarball containing a reference genome and FASTQ files for E. coli: