
The command line can feel intimidating when you’re new. But trust me, it doesn’t have to be. In fact, getting comfortable with the terminal is one of the best things you can do as a bioinformatician. Let’s make it less scary and a bit more fun.


Why Bother with the Terminal?

  • Most bioinformatics tools are command-line (CLI) only
  • You’ll save time compared to using graphical user interfaces (GUIs)
  • It’s more reproducible (easier to paste a command than describe a series of clicks)
  • Scripts and pipelines (like Snakemake or Nextflow) are terminal-native

Tips for Beginners

Know Where You Are

pwd           # Print working directory
ls            # List files
cd folder/    # Change directories
cd ..         # Go up one directory
cd -          # Go to previous directory
cd ~          # Go to your home folder
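
To see how these fit together, here is a short sketch that runs in a throwaway directory. The folder names (project, raw_data) are made up for the demo, and mktemp -d is only there so nothing real gets touched:

```shell
# Throwaway directory tree so nothing real is touched (names are made up)
workdir=$(mktemp -d)
mkdir -p "$workdir/project/raw_data"

cd "$workdir/project/raw_data"   # jump straight into a nested folder
cd ..                            # up one level, now in project/
here=$(basename "$PWD")          # name of the current folder
cd - > /dev/null                 # back to raw_data (cd - prints the path, so silence it)
back=$(basename "$PWD")

echo "after 'cd ..': $here"
echo "after 'cd -':  $back"
```

cd - is handy when you are bouncing between two folders, such as a data directory and a results directory.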

Tab Completion

Just hit the Tab key to autocomplete file/folder names. If more than one name matches, press Tab twice to list the options. This can save you so much time when typing long file names or paths.

Command History

  • Press the Up arrow key to scroll through previous commands
  • You can also search backward through the command history with:
Ctrl + R

Reuse Previous Commands

!!       # Repeat last command
!sam     # Repeat last command starting with 'sam'

Common Commands Every Bioinformatician Should Know

cp file1.txt file2.txt       # Create a copy of a file in the same directory
mv file1.txt file2.txt       # Rename a file
mv path1/file1.txt path2/    # Move the file and keep the same name
nano file.txt                # Edit a file using the terminal editor nano
head file.txt                # View the first 10 lines of a file
tail file.txt                # View the last 10 lines
cat file.txt                 # Print the entire file
less file.txt                # Scroll through large files
wc -l file.txt               # Count the number of lines
cut -f1,3 file.txt           # Extract columns 1 and 3 from a tab-delimited file
grep "pattern" file.txt      # Find matching lines
sort file.txt                # Sort the lines in a file
uniq file.txt                # Collapse adjacent duplicate lines (sort first to catch all duplicates)
find . -name "*.fq"          # Find all files ending in .fq recursively
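
These commands get far more useful once you chain them with pipes. Here is a minimal sketch, assuming a hypothetical tab-delimited file called hits.tsv with gene names in the first column (the file and its contents are invented for the example):

```shell
# Build a tiny tab-delimited example file (contents are made up)
tmp=$(mktemp -d)
printf 'geneA\t10\ngeneB\t5\ngeneA\t7\ngeneC\t1\ngeneA\t3\ngeneB\t2\n' > "$tmp/hits.tsv"

# uniq only collapses adjacent duplicates, so sort before calling it
n_unique=$(cut -f1 "$tmp/hits.tsv" | sort | uniq | wc -l)

# Count how often each gene appears, most frequent first
top=$(cut -f1 "$tmp/hits.tsv" | sort | uniq -c | sort -k1,1nr | head -n 1)

echo "unique genes: $n_unique"
echo "most frequent (count, name): $top"
```

The sort | uniq -c | sort -k1,1nr pattern is worth memorizing. It works on any column you can pull out with cut.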

Useful Aliases for Productivity

Here are some of my favorite aliases (shortcuts). Add these to your ~/.bashrc or ~/.zshrc, depending on which shell you use:

alias ll='ls -lah'
alias la='ls -A'
alias gs='git status'
alias mkdircd='f() { mkdir -p "$1"; cd "$1"; }; f'
alias grep='grep --color=auto'
alias lsg='ls | grep'

Apply changes with:

source ~/.bashrc    # or source ~/.zshrc
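
Aliases can't take arguments on their own, which is why the mkdircd alias wraps a small function. If you prefer, a plain shell function does the same job more directly. This is just a sketch of an alternative; define either the alias or the function in your config, not both:

```shell
# Function version of mkdircd: make the directory (and any parents), then enter it.
# '&&' means cd only runs if mkdir succeeded.
mkdircd() {
    mkdir -p "$1" && cd "$1"
}

# Try it in a throwaway location (the path is made up for the demo)
base=$(mktemp -d)
mkdircd "$base/results/qc"
landed=$(basename "$PWD")
echo "now in: $landed"
```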

Favorite Tools & Utilities

  • htop – Great task manager to view resource use
  • tree – Shows folder structures
  • tmux – Split terminal into panes and keep sessions alive
  • bat – Prettier cat with syntax highlighting

Learning Resources

  • The Art of Command Line – Great resource for various commands and use cases
  • Explainshell – Explains complex bash commands
  • man command – Opens the manual page for a command (replace command with the actual name, e.g. man grep)

Bioinformatics-Specific Tricks

  • Use zcat to preview gzipped FASTQ files without having to unzip the file:
    zcat sample.fastq.gz | head -n 8
    
  • Count reads in a FASTQ file (assumes four lines per read):
    echo $(( $(zcat sample.fastq.gz | wc -l) / 4 ))
    
  • Loop through samples:
    for file in *.fastq.gz; do
        echo "Processing $file"
    done
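
The loop above only prints file names, so here is a slightly fuller sketch that pulls out the sample name and counts reads for each file. The sample files are generated on the spot with made-up reads, and it assumes the usual four lines per FASTQ record. It uses gzip -dc, which does the same thing as zcat and is available wherever gzip is installed:

```shell
# Create two tiny gzipped FASTQ files so the loop has something to work on
# (sample names and reads are made up)
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/sampleA.fastq.gz"
printf '@r1\nGGCC\n+\nIIII\n' | gzip > "$tmp/sampleB.fastq.gz"

summary=""
for file in "$tmp"/*.fastq.gz; do
    sample=$(basename "$file" .fastq.gz)   # strip the folder and the extension
    lines=$(gzip -dc "$file" | wc -l)      # decompress to stdout and count lines
    reads=$((lines / 4))                   # 4 lines per FASTQ record
    echo "$sample: $reads reads"
    summary="$summary$sample=$reads "
done
```

Quoting "$file" and "$tmp" keeps the loop working even if a path contains spaces.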
    

Final Thoughts

You don’t need to be a Linux wizard to get stuff done in bioinformatics. Just learning a few commands and habits can seriously boost your productivity and confidence. Start with the basics and you’ll become more comfortable as you go!

Let me know in the comments what your favorite shortcuts or commands are!


Next up: we kick off a series of posts on reference-based vs de novo assembly strategies. We will start by explaining the difference between short reads and long reads, focusing on when to use either (or both!).
