West Virginia University Genomics Core Facility

You've got data. We turn it into information

This is how I do some of the standard analyses. I am constantly evaluating and changing my procedures, so this may be out of date. You will want to change file names, directory structure, and such. If running on the HPC, you'll need to wrap the commands in a script to submit. You may (will) have to load modules, and some of the commands may be (are) slightly different on different machines.

Get sequence from fasta

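Get sequences that fit a criterion

Here the criterion is a length over 150000, read from the second field of each header. The flag is reset at every header, so a header with no length field is skipped instead of inheriting the previous decision.

perl -ne 'if(/^>/){$c = (/^>\S+\s+(\d+)\s/ && $1 > 150000) ? 1 : 0} print if $c' A1-scaffolds.fa > A1_big.fa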
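The one-liner above trusts the length written in the header. If your headers don't carry it, a sketch like the following counts the bases itself. It is a hypothetical helper (filter_by_length is not part of any package), written in awk rather than perl, and it buffers one record at a time, which is fine for scaffolds but worth keeping in mind for very large sequences.

```shell
# filter_by_length MIN FASTA  (hypothetical helper, not from any package)
# Print every FASTA record whose sequence is longer than MIN bases.
# Length is counted from the sequence lines, not taken from the header.
filter_by_length() {
    awk -v min="$1" '
        # Print the buffered record if its sequence passed the cutoff.
        function flush() { if (hdr != "" && len > min) printf "%s%s", hdr, seq }
        # New header: emit the previous record, then start buffering this one.
        /^>/ { flush(); hdr = $0 "\n"; seq = ""; len = 0; next }
        # Sequence line: keep it and add its length to the running total.
        { seq = seq $0 "\n"; len += length($0) }
        # Do not forget the last record in the file.
        END { flush() }
    ' "$2"
}
```

Usage, with the same files as above:
filter_by_length 150000 A1-scaffolds.fa > A1_big.fa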

Get one sequence by name

perl -ne 'if(/^>(\S+)/){$p = ($1 eq "comp18584_c0_seq2") ? 1 : 0} print if $p' Trinity.fasta > c18584

Get many named sequences

while read -r c; do perl -sne 'if(/^>(\w+)\s+/){$p=($1 eq $con) ? 1 : 0}print if $p' -- -con="$c" Trinity.fasta >> out.fa; done < contig.list
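The loop above rereads Trinity.fasta once for every name, which gets slow for long lists. A single-pass alternative, sketched here in awk with a hypothetical get_named helper, loads the names into a lookup table and then streams the FASTA once. It matches on the first whitespace-delimited word of each header, writes records in FASTA order rather than list order, and prints a record only once even if its name is listed twice.

```shell
# get_named NAMES FASTA  (hypothetical helper, not from any package)
# Print every FASTA record whose ID, the first word of its header without
# the ">", is listed in NAMES (one ID per line). Reads the FASTA only once.
get_named() {
    awk '
        # While reading the names file, remember each wanted ID.
        FILENAME == ARGV[1] { want[$1] = 1; next }
        # On each header, decide whether the record that follows is wanted.
        /^>/ { p = (substr($1, 2) in want) }
        # Print header and sequence lines while inside a wanted record.
        p
    ' "$1" "$2"
}
```

Usage:
get_named contig.list Trinity.fasta > out.fa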

Different way to do the same thing

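Build an index first (samtools writes it to in.fa.fai)
samtools faidx in.fa
Then extract the sequences listed in the file names, one per line; each name must match the first word of a header
xargs samtools faidx in.fa < names > out.fa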
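The index also makes a length cutoff easy, because the .fai file that samtools faidx writes is tab-separated with the sequence name in column 1 and its length in column 2. A minimal sketch, using a hypothetical helper name, that builds the names file from real lengths rather than from header text:

```shell
# names_longer_than MIN FAI  (hypothetical helper, not from any package)
# Print the name of every sequence longer than MIN bases, reading the .fai
# index written by samtools faidx (tab-separated: name, length, offset, ...).
names_longer_than() {
    awk -F '\t' -v min="$1" '$2 > min { print $1 }' "$2"
}
```

Usage, combined with the commands above:
samtools faidx in.fa
names_longer_than 150000 in.fa.fai > names
xargs samtools faidx in.fa < names > out.fa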

For questions, help, or to offer a beer, get in touch with the bioinformatician, Niel Infante.