West Virginia University Genomics Core Facility

Bioinformatics
You've got data. We turn it into information.


This is how I do some of the standard analyses. I am constantly evaluating and changing my procedures, so this may be out of date. You will want to change file names, directory structure, and such. If running on the HPC, you'll need to wrap the commands in a script to submit. You may (will) have to load modules, and some of the commands may be (are) slightly different on different machines.

Random Tasks

Filter reads from a sam file


# -h keeps the header lines so downstream tools can read out.sam
samtools view -h -f 2 -F 2828 RNA2.sam > out.sam

-f  keep only reads that have all of these flag bits set
-F  skip reads that have any of these flag bits set
Decimal	Description
1	template having multiple segments in sequencing
2	each segment properly aligned according to the aligner
4	segment unmapped
8	next segment in the template unmapped
16	SEQ being reverse complemented
32	SEQ of the next segment in the template being reversed
64	the first segment in the template
128	the last segment in the template
256	secondary alignment
512	not passing quality controls
1024	PCR or optical duplicate
2048	supplementary alignment
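Flag values add up: 2828 = 4 + 8 + 256 + 512 + 2048, so the command above skips unmapped reads, reads whose mate is unmapped, secondary alignments, reads failing QC, and supplementary alignments, while -f 2 keeps only properly paired reads. Online tool here. Before using, make sure your mapper sets the flags you are interested in.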
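To see which bits a flag value contains without the online tool, each bit can be tested with a bitwise AND. This is a minimal bash sketch; the function name `decode_flag` is hypothetical and the descriptions are shortened from the table above. If your samtools build includes the `flags` subcommand, `samtools flags 2828` gives similar information.

```shell
#!/usr/bin/env bash
# decode_flag (hypothetical helper): print each SAM flag bit set in a
# decimal FLAG value as "bit<TAB>description". Pure bash, no samtools needed.
decode_flag() {
    local flag=$1
    local -a bits=(1 2 4 8 16 32 64 128 256 512 1024 2048)
    local -a desc=(
        "multiple segments (paired)"
        "properly aligned"
        "unmapped"
        "next segment unmapped"
        "reverse complemented"
        "next segment reversed"
        "first segment"
        "last segment"
        "secondary alignment"
        "not passing QC"
        "PCR or optical duplicate"
        "supplementary alignment"
    )
    local i
    for i in "${!bits[@]}"; do
        # Bitwise AND is non-zero only when this bit is set in the flag.
        if (( flag & bits[i] )); then
            printf '%s\t%s\n' "${bits[i]}" "${desc[i]}"
        fi
    done
}

decode_flag 2828
```

For 2828 this lists bits 4, 8, 256, 512 and 2048, matching the sum given above.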

Prepare UCSC gtf file for use

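The UCSC gtf files identify genes by UCSC IDs rather than gene names, so the names need to be changed. To do this, also download the kgXref table and cut it down to two comma-separated columns, UCSC,Name:
cut -f1,5 kgXref.table | sed '1d' > conversion.tmp   # UCSC ID and gene name columns, header line dropped
tr "\t" "," < conversion.tmp > conversion.csv        # tabs to commas
sed -i 's/ /_/g' conversion.csv                      # every space to an underscore, not just the first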
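Then run the following. It splits each gtf line on double quotes and swaps the first quoted value (the gene_id) for its gene name; IDs with no entry in conversion.csv are left as they are.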
awk 'NR==FNR{A[$1]=$2;next} $2 in A{$2=A[$2]}1' FS=, conversion.csv FS=\" OFS=\" mm10.gtf > mm10_geneid.gtf
Got this from here
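A toy run of the same renaming is a quick way to check the behavior before touching a real annotation. This is a sketch assuming bash plus standard cut, sed, tr and awk; every ID, gene name and coordinate in it is made up, and the three conversion steps are chained into one pipeline. Files are left in a temporary directory.

```shell
#!/usr/bin/env bash
# Toy run of the UCSC gtf renaming steps on made-up data.
# All IDs (ucA.1, ucB.1, ucC.1), gene names and coordinates are hypothetical.
set -euo pipefail
workdir=$(mktemp -d)
cd "$workdir"

# Minimal kgXref-like table: header line, kgID in column 1, gene name in column 5.
printf '#kgID\tmRNA\tspID\tspDisplayID\tgeneSymbol\n' > kgXref.table
printf 'ucA.1\tmA\tsA\tdA\tGeneA\n' >> kgXref.table
printf 'ucB.1\tmB\tsB\tdB\tGene B\n' >> kgXref.table

# Minimal gtf: gene_id is the first double-quoted value on each line.
printf 'chr1\tknownGene\texon\t100\t200\t0.0\t+\t.\tgene_id "ucA.1"; transcript_id "ucA.1";\n' > mm10.gtf
printf 'chr1\tknownGene\texon\t300\t400\t0.0\t-\t.\tgene_id "ucB.1"; transcript_id "ucB.1";\n' >> mm10.gtf
printf 'chr1\tknownGene\texon\t500\t600\t0.0\t+\t.\tgene_id "ucC.1"; transcript_id "ucC.1";\n' >> mm10.gtf

# UCSC,Name pairs: drop header, tabs to commas, spaces to underscores.
cut -f1,5 kgXref.table | sed '1d' | tr '\t' ',' | sed 's/ /_/g' > conversion.csv

# Load the pairs from the first file, then replace $2 (the gene_id when
# splitting on double quotes) wherever a match exists; print every line.
awk 'NR==FNR{A[$1]=$2;next} $2 in A{$2=A[$2]}1' FS=, conversion.csv FS=\" OFS=\" mm10.gtf > mm10_geneid.gtf

cat mm10_geneid.gtf
```

The ucC.1 line comes through unchanged because it has no entry in conversion.csv. Only gene_id is renamed; transcript_id, the fourth quote-delimited field, keeps its UCSC ID.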



For questions, help, or to offer a beer, get in touch with the bioinformatician, Niel Infante