West Virginia University Genomics Core Facility

Bioinformatics
You've got data. We turn it into information


This is how I do some of the standard analyses. I am constantly evaluating and changing my procedures, so this may be out of date. You will want to change file names, directory structure, and such. If running on the HPC, you'll need to wrap the commands in a script to submit. You may (will) have to load modules, and some of the commands may be (are) slightly different on different machines.

16S Metagenomics

Merge forward and reverse reads

for f in *R1*; do echo $f; f2=${f/R1/R2}; n=`echo $f | cut -d '-' -f 2 | cut -d '_' -f 1`; echo $n; echo ''; flash $f $f2 -d ../qiime/paired -o ../qiime/$n -M 120;  done

Split Libraries

Do this step even if your files are already split. It puts all the sequence in a single file, with correct names. This makes everything easier later. The map_not_multiplexed.txt file must exist, but it doesn't need to have anything in it. This produces a file called seqs.fa.
split_libraries_fastq.py -i pre701.fastq,pre702.fastq,post701.fastq,post702.fastq -o otu --sample_id pre701,pre702,post701,post702 -m map_not_multiplexed.txt --barcode_type 'not-barcoded'

Pick OTUs

pick_open_reference_otus.py -i seqs.fna -o otus -a -O 48

Add metadata to the biom file

Metadata should be in a tab delimited file like:
#SampleID       Num     PrePost Post    Number  phinchID
post701 	s701    post    1       701     post701
post702 	s702    post    1       702     post702
biom add-metadata -i otu_table_mc2_w_tax_no_pynast_failures.biom -o wMeta.biom --sample-metadata-fp metadata

Convert to old biom format

Do this if you want to visualize in phinch or just want a text based file.
biom convert -i wMeta.biom -o Cuff_July2015.biom --to-json

Do Diversity

This command does several analyses, and creates a webpage to visualize them, including Principal Component Analysis and rarefaction curves.
core_diversity_analyses.py -o cdout -i ../qiime/otu_table_mc2_w_tax_no_pynast_failures.biom -m ../qiime/metadata -t ../qiime/rep_set.tre -e 219000
The -e is the minimum number of reads a sample must have to be included in the analysis. You can get an idea of what is good by running biom summarize-table



For questions, help, or to offer a beer, get in touch with the bioinformatician, Niel Infante