Terminally Chill: Getting Comfortable with the Command Line
The command line can feel intimidating when you’re new. But trust me, it doesn’t have to be. In fact, getting comfortable with the terminal is one of the best things you can do as a bioinformatician. Let’s make it less scary and a bit more fun.
Why Bother with the Terminal?
- Most bioinformatics tools are command-line (CLI) only
- You’ll save time compared to using graphical user interfaces (GUIs)
- It’s more reproducible (easier to paste a command than describe a series of clicks)
- Scripts and pipelines (like Snakemake or Nextflow) are terminal-native
Tips for Beginners
Know Where You Are
pwd # Print working directory
ls # List files
cd folder/ # Change directories
Navigation Shortcuts
cd .. # Go up one directory
cd - # Go to previous directory
cd ~ # Go to your home folder
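If you want to try these without touching your real files, here is a minimal sketch that plays with them inside a throwaway folder. The folder names (project, data) are made up for illustration, and cd ~ is left out so the example does not depend on your home folder.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"

cd "$scratch/project/data"   # jump into a nested folder
cd ..                        # up one level: now in project/
here=$(basename "$PWD")

cd "$scratch"                # move somewhere else
cd - > /dev/null             # back to the previous directory (cd - prints the path, silenced here)
back=$(basename "$PWD")

echo "after 'cd ..': $here"
echo "after 'cd -':  $back"
```

Both lines should report the same folder, since cd - simply returns you to wherever you were before the last cd.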
Tab Completion
Just hit the Tab key to autocomplete file/folder names. Double Tab shows options. This can save you so much time when filling out long file names or paths.
Command History
- Press the Up key to scroll through previous commands
- You can also search through the command history with Ctrl + R
Reuse Previous Commands
!! # Repeat last command
!sam # Repeat last command starting with 'sam'
Common Commands Every Bioinformatician Should Know
cp file1.txt file2.txt # Create a copy of a file in the same directory
mv file1.txt file2.txt # Rename a file
mv path1/file1.txt path2/ # Move the file and keep the same name
nano file.txt # Edit a file using the terminal editor nano
head file.txt # View the first 10 lines of a file
tail file.txt # View the last 10 lines
cat file.txt # Print the entire file
less file.txt # Scroll through large files
wc -l file.txt # Count the number of lines
cut -f1,3 file.txt # Cut specific fields from a tab-delimited file
grep "pattern" file # Find matching lines
sort file.txt # Sort the lines in a file
uniq file.txt # Remove adjacent duplicate lines (sort first to catch all duplicates)
find . -name "*.fq" # Find all FASTQ files recursively
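These commands get much more useful once you chain them with pipes. Here is a small sketch that counts how often each value appears in a column. The file samples.tsv and its contents are invented for the example, so swap in your own table.

```shell
# Build a tiny tab-delimited table: sample, tissue, read count (made-up values).
tmp=$(mktemp -d)
printf 's1\tliver\t120\ns2\tbrain\t95\ns3\tliver\t210\ns4\tliver\t80\n' > "$tmp/samples.tsv"

# Take column 2, sort so duplicates sit next to each other,
# count each distinct value, then put the most frequent first.
counts=$(cut -f2 "$tmp/samples.tsv" | sort | uniq -c | sort -k1,1nr)
echo "$counts"

# Count the lines that contain "liver".
n_liver=$(grep -c "liver" "$tmp/samples.tsv")
echo "liver rows: $n_liver"
```

The sort before uniq matters: uniq only compares neighbouring lines, so unsorted input would leave repeated values uncollapsed.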
Useful Aliases for Productivity
Here are some of my favorite aliases (shortcuts). Add these to your ~/.bashrc or ~/.zshrc, depending on which shell you are using:
alias ll='ls -lah'
alias la='ls -A'
alias gs='git status'
alias mkdircd='f() { mkdir -p "$1"; cd "$1"; }; f'
alias grep='grep --color=auto'
alias lsg='ls | grep'
Apply changes with:
source ~/.bashrc # or source ~/.zshrc
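One note on mkdircd: aliases cannot take arguments on their own, which is why that alias wraps a small function. A plain shell function is another way to get the same behavior. This is just a sketch of that alternative (the function body is my assumption of what you would want, not the only way to write it), and it would go in the same ~/.bashrc or ~/.zshrc file.

```shell
# Function form of the mkdircd shortcut.
mkdircd() {
    # Only change directory if mkdir succeeded.
    mkdir -p "$1" && cd "$1"
}

# Try it in a throwaway location (results/qc is a made-up path).
base=$(mktemp -d)
mkdircd "$base/results/qc"
pwd
```

Using && instead of ; means you do not end up in an unexpected directory if the folder could not be created.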
Favorite Tools & Utilities
- htop – Great task manager to view resource use
- tree – Shows folder structures
- tmux – Split terminal into panes and keep sessions alive
- bat – Prettier cat with syntax highlighting
Learning Resources
- The Art of Command Line - Great resource for various commands and use cases
- Explainshell – Explains complex bash commands
- man command – Manual pages for any command (replace "command" with the actual command name)
Bioinformatics-Specific Tricks
- Use zcat to preview gzipped FASTQ files without having to unzip the file:
zcat sample.fastq.gz | head -n 8
- Count reads in a FASTQ file:
echo $(( $(zcat sample.fastq.gz | wc -l) / 4 ))
- Loop through samples:
for file in *.fastq.gz; do echo "Processing $file"; done
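To make the loop do something more useful than echo, here is a sketch that combines the tricks above into a per-sample read count. The file names and reads are made up so the example runs on its own, and it assumes standard four-line FASTQ records (no wrapped sequence lines). It uses gzip -dc, which behaves like zcat.

```shell
# Make two tiny gzipped FASTQ files (made-up reads) in a scratch directory.
work=$(mktemp -d)
cd "$work"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > sampleA.fastq.gz
printf '@r1\nGGCC\n+\nIIII\n' | gzip > sampleB.fastq.gz

for file in *.fastq.gz; do
    sample=${file%.fastq.gz}             # strip the extension to get the sample name
    lines=$(gzip -dc "$file" | wc -l)    # decompress to stdout and count lines
    echo "$sample $((lines / 4))"        # four lines per read
done > read_counts.txt

cat read_counts.txt
```

The ${file%.fastq.gz} bit is shell parameter expansion: it removes the suffix, which is handy for naming output files after each sample.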
Final Thoughts
You don’t need to be a Linux wizard to get stuff done in bioinformatics. Just learning a few commands and habits can seriously boost your productivity and confidence. Start with the basics and you’ll become more comfortable as you go!
Let me know in the comments what some of your favorite shortcuts or commands are!
Next up: we kick off a series of posts covering reference-based vs de novo assembly strategies. We will start by explaining the difference between short reads and long reads, focusing on when to use either (or both!).