Unix Basics Quick Review and Problem Set

useful commands

command	description
`ls`	list directory contents
`cd`	change directory
`mkdir`	make a directory
`rm`	remove (delete) files and directories. Use caution: it is easy to delete more than you want.
`head`	prints the top few lines to the terminal window
`tail`	prints the last few lines to the terminal window
`sort`	sorts the lines
`uniq`	prints the unique lines
`grep`	finds the lines that contain a pattern
`wc`	counts the number of lines, characters and words
`mv`	move files
`cp`	copy files
`date`	returns the current date and time
`pwd`	return working directory name
`ssh`	remote login
`scp`	remote secure copy
`~`	shortcut for your home directory
`man <command>`	manual page for the command e.g. `man ls` to get the man page for `ls`

Try these examples

The files you need later in this review are in our GitHub repository. Directions for retrieving them are given where they are needed.

Let's go to a directory with a lot of files in it and list those files

cd /bin/
ls

What's the difference between these two commands?

Try them both!!

ls -l
ls -lt

Pipes

You can string more than one command together with a pipe | , such that the standard output of the first command is 'piped' into the standard input of the second command.

Try it!!

ls -lt | head
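
Pipes can be chained more than once. As a small sketch on made-up data (the chromosome names below are invented for illustration and are not read from any course file), the following counts how many distinct lines a list contains. Note that `uniq` only collapses adjacent duplicate lines, which is why `sort` comes first.

```shell
# Made-up input: five lines, three distinct values.
# sort groups identical lines together, uniq collapses each group to one line,
# and wc -l counts the lines that remain.
distinct=$(printf 'Chr7\nChr9\nChr7\nChr1\nChr9\n' | sort | uniq | wc -l)
echo "distinct lines: $distinct"
```

Try removing `sort` from the pipeline and see how the count changes.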

Semicolons

You can string more than one command together by putting a semicolon ; after each command. Here, the commands are run sequentially, but no output is passed from one command to the next.

Try it!!

date ; sleep 2 ; date

If you want to know more about sleep, type man sleep.
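
To make the contrast with pipes concrete, here is a minimal sketch. The variable names `piped` and `sequential` are only for illustration. In the first line `wc -w` counts the words that `echo` wrote; in the second, `wc -w` runs after `echo` has finished and reads its own (empty) input instead.

```shell
# Pipe: wc -w reads the words that echo wrote to standard output.
piped=$(echo "one two three" | wc -w)

# Semicolon: echo runs and prints its text, then wc -w runs on its own
# (empty) input and knows nothing about what echo printed.
sequential=$(echo "one two three" ; wc -w < /dev/null)

echo "pipe:      $piped"
echo "semicolon: $sequential"
```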

Download a file.

Change directory to your home directory. You likely have permission to write to your home directory. Now use wget or curl to download files. On some systems only one of these may be available.

cd ~
curl -O https://raw.githubusercontent.com/prog4biol/pfb2019/master/files/cuffdiff.txt
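
If you are not sure which of the two tools your system has, one possible approach is to let the shell check. The `fetch` function below is a hypothetical helper written for this review, not part of the course materials. It relies only on `command -v`, plain `wget URL`, and `curl -O URL`, both of which save the file under its remote name in the current directory.

```shell
# Hypothetical helper: download a URL into the current directory with
# whichever of wget or curl is installed.
fetch() {
    if command -v wget >/dev/null 2>&1; then
        wget "$1"
    elif command -v curl >/dev/null 2>&1; then
        curl -O "$1"
    else
        echo "neither wget nor curl found" >&2
        return 1
    fi
}

# Usage (commented out so that pasting this block does not download anything):
# fetch https://raw.githubusercontent.com/prog4biol/pfb2019/master/files/cuffdiff.txt
```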

Redirect STDOUT

You can redirect the output of a command into a file.

cd ~
grep Chr7 cuffdiff.txt > fav_chr_cuffdiff.txt

Append STDOUT to the end of a file that already exists

You can append the output of a command to a file with >> instead of overwriting the file.

grep Chr9 cuffdiff.txt >> fav_chr_cuffdiff.txt
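
The difference between `>` and `>>` is easiest to see on a throwaway file. The sketch below works in a scratch directory made with `mktemp -d`, and the file name `demo.txt` is arbitrary. It assumes the default shell behavior, in which `>` replaces the contents of an existing file.

```shell
workdir=$(mktemp -d)        # scratch directory so no real files are touched
cd "$workdir" || exit 1

echo "first"  > demo.txt    # creates demo.txt (or empties it if it exists)
echo "second" > demo.txt    # overwrites: demo.txt now holds only "second"
echo "third" >> demo.txt    # appends: demo.txt now holds "second" and "third"

cat demo.txt
```

This is why the Chr9 example uses `>>`: a second `>` would have replaced the Chr7 lines already saved in fav_chr_cuffdiff.txt.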

Redirect STDERR

You can redirect STDERR to a file.

Let's review what STDERR actually is.

cat blablabla.txt

The file blablabla.txt does not exist, so we get cat: blablabla.txt: No such file or directory printed to the terminal. This message is labeled by the operating system as an error message, or STDERR.

STDERR is a labeled type of output we can redirect

cat blablabla.txt 2> errors.txt

We can redirect the error messages, a.k.a. STDERR, to a new file with any name we want.

What happens when you try to redirect STDOUT?

cat blablabla.txt > errors.txt

cat: blablabla.txt: No such file or directory still gets printed to the screen because we only redirect STDOUT to our file. There is no STDOUT in this case and our file will be empty. How would you verify this?

Redirect STDOUT and STDERR

You can redirect both STDOUT and STDERR to two separate files in one command.

# just print it to the terminal first
cat fav_chr_cuffdiff.txt blablabla.file

# redirect to two files, STDOUT to out.txt, STDERR to err.txt 
cat fav_chr_cuffdiff.txt blablabla.file 1> out.txt 2> err.txt

# this does the same, do you see the difference?
cat fav_chr_cuffdiff.txt blablabla.file > out.txt 2> err.txt

Examine the contents of out.txt and err.txt

You can also redirect both STDOUT and STDERR to the same file.

cat fav_chr_cuffdiff.txt blablabla.file &> all_out_err.txt

Check out what is in all_out_err.txt.
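
The `&>` shorthand is understood by bash, but not by every shell. A spelling that should work in any POSIX shell is `> file 2>&1`, which first sends STDOUT to the file and then points STDERR at the same place (the order matters). The sketch below uses a hypothetical function `noisy` that writes one line to each stream, so it runs without any of the course files.

```shell
workdir=$(mktemp -d)           # scratch directory
cd "$workdir" || exit 1

# Hypothetical command that writes one line to each stream.
noisy() {
    echo "this goes to STDOUT"
    echo "this goes to STDERR" >&2
}

noisy > out.txt 2> err.txt     # two separate files
noisy > all.txt 2>&1           # one file; portable spelling of &> all.txt

cat all.txt
```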

Problem Set

Log into your machine.
What is the full path to your home directory?
Go up one directory.
- How many files does it contain?
- How many directories?
Make a directory called problemsets in your home directory.
Navigate into this new directory called problemsets. Verify that you are in the correct directory by using pwd.
Use wget to copy https://raw.githubusercontent.com/prog4biol/pfb2019/master/files/sequences.nt.fa from the web into your problemsets directory. If wget is not available on your system, use curl -O as an alternative.
Without using a text editor, use unix commands to answer the following questions about the file sequences.nt.fa. This file can be found here: https://raw.githubusercontent.com/prog4biol/pfb2019/master/files/sequences.nt.fa
- How many lines does this file contain?
- How many characters? (Hint: check out the options of wc)
- What is the first line of this file? (Hint: read the man page of head)
- What are the last 3 lines? (Hint: read the man page of tail)
- How many sequences are in the file? (Hint: use grep) (Note: The start of a sequence is indicated by a > character.)
Rename sequences.nt.fa to cancer_genes.fasta. (Hint: read the man page for mv)
Copy the remote file cuffdiff.txt into your problemsets directory using wget. If wget is not available on your system, use curl -O as an alternative. Here is the URL: https://raw.githubusercontent.com/prog4biol/pfb2019/master/files/cuffdiff.txt

Do the following to cuffdiff.txt. The descriptions of each column in the file are in the table below.
- Look at the first few lines of the file
- Sort the file by log fold change 'log2(fold_change)', from highest to lowest, and save in a new file in your directory called sorted.cuffdiff.out
- Sort the file (log fold change highest to lowest) then print out only the first 100 lines. Save in a file called top100.sorted.cuffdiff.out.
- Sort the file by log fold change, print out the top 100, print only first column. This will be a list of the top 100 genes with the largest change in expression. Make sure your list is sorted by gene name and is unique. Save this curated list in a file called differentially.expressed.genes.txt.
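
As a hint for the sorting steps, here is a sketch of the same kind of pipeline on a made-up two-column table. The gene names, scores, and file names are invented and are not taken from cuffdiff.txt. It uses `cut`, which is not in the command table above: `cut -f 1` prints the first tab-separated column. Which column to sort on in cuffdiff.txt, and how many lines to keep, is left to you.

```shell
workdir=$(mktemp -d)           # scratch directory
cd "$workdir" || exit 1

# Made-up table: name <TAB> score.
printf 'geneC\t1.5\ngeneA\t7.0\ngeneB\t9.5\ngeneA\t4.25\ngeneD\t-1.0\n' > toy.txt

# -k2,2 sorts on column 2 only; n compares numerically; r puts the largest first.
sort -k2,2nr toy.txt > toy.sorted.txt

# Keep the top 3 rows, cut out column 1, then sort by name and drop duplicates.
head -n 3 toy.sorted.txt | cut -f 1 | sort | uniq > toy.names.txt

cat toy.names.txt
```

See `man sort` for what the `-k`, `-n`, and `-r` options do in more detail.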

Cuffdiff file format

Column number	Column name	Example	Description
1	Tested id	XLOC_000001	A unique identifier describing the transcript, gene, primary transcript, or CDS being tested
2	Gene id	XLOC_000001	A unique identifier for the gene being tested
3	gene	Lypla1	The gene_name(s) or gene_id(s) being tested
4	locus	chr1:4797771-4835363	Genomic coordinates for easy browsing to the genes or transcripts being tested.
5	sample 1	Liver	Label (or number if no labels provided) of the first sample being tested
6	sample 2	Brain	Label (or number if no labels provided) of the second sample being tested
7	Test status	NOTEST	Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing.
8	FPKMx	8.01089	FPKM of the gene in sample x
9	FPKMy	8.551545	FPKM of the gene in sample y
10	log2(FPKMy/FPKMx)	0.06531	The (base 2) log of the fold change y/x
11	test stat	0.860902	The value of the test statistic used to compute significance of the observed change in FPKM
12	p value	0.389292	The uncorrected p-value of the test statistic
13	q value	0.985216	The FDR-adjusted p-value of the test statistic
14	significant	no	Can be either "yes" or "no", depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple testing
