My Early Journey with TACC, Navigating Linux, and More
Brief notes on my workflow organization and Linux terminal commands
The Workflow Layout
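All of my computational work is done on the TACC (Texas Advanced Computing Center) supercomputers at UT. I was given an allocation on the Frontera machine, so my first task was to ssh into it and set up my files and directories. The -X flag enables X11 forwarding, which lets graphical programs running on Frontera open windows on my local machine: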
ssh -X ekun@frontera.tacc.utexas.edu
The main directories I use are:
- /home1/08068/ekun (home)
- /work2/08068/ekun/frontera (work)
- /scratch1/08068/ekun (scratch)
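These three paths appear to match the standard TACC locations that are exposed as the $HOME, $WORK and $SCRATCH environment variables after login (an assumption worth confirming with echo "$WORK" on Frontera). A minimal sketch of referring to the directories through those variables, falling back to the hard-coded paths above if a variable is unset:

```shell
#!/bin/bash
# Sketch: resolve the three main directories, preferring the environment
# variables assumed to be set by TACC at login ($HOME, $WORK, $SCRATCH) and
# falling back to the paths from these notes when a variable is unset.
home_dir="${HOME:-/home1/08068/ekun}"
work_dir="${WORK:-/work2/08068/ekun/frontera}"
scratch_dir="${SCRATCH:-/scratch1/08068/ekun}"

printf 'home:    %s\n' "$home_dir"
printf 'work:    %s\n' "$work_dir"
printf 'scratch: %s\n' "$scratch_dir"
```

If the variables are set, cd "$WORK" or ls "$SCRATCH" then works from any directory without typing the full paths.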
The main website I use to create Jupyter Notebooks for FastAI and to build my own classifiers is vis.tacc.utexas.edu. All Jupyter Notebooks created on vis.tacc.utexas.edu are stored in the home directory.
All UK Biobank directory metadata and images are located at:
- /corral-repl/utexas/UKB-Imaging-Genetics/
This directory contains much more that I still need to explore, but all the DXA images are located under:
- /corral-repl/utexas/UKB-Imaging-Genetics/Imaging_Data/DXA/DXA_Images
This folder contains the zip files, named by patient EID, that hold each patient's images.
- /corral-repl/utexas/UKB-Imaging-Genetics/unzipped_DXA_Images
This folder contains the unzipped patient EID folders with their images, as well as the DXA images sorted by body part.
Other Directories of Interest:
Generated CSV files are stored in:
- /work2/08068/ekun/frontera/output_files
A copy of all the unzipped DXA images, including the images separated by body part, is currently located in:
- /scratch1/08068/ekun/unzipped_DXA_Images
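These notes do not record how the scratch copy was made. One plausible approach, sketched here, is a recursive cp followed by a comparison of file counts so that an incomplete copy gets noticed. The function name copy_and_check is hypothetical. Scratch on TACC systems is meant as temporary storage that may be purged, so the corral-repl copy is the one to treat as the original.

```shell
#!/bin/bash
# Sketch (hypothetical helper): copy a directory tree into a destination
# directory and check that the number of regular files matches on both sides.
copy_and_check() {
    local src="$1" dst="$2"
    mkdir -p "$dst" || return 1
    # -r copies recursively; -p preserves timestamps and permissions.
    # Because $dst exists, the tree lands in $dst/<basename of src>.
    cp -rp "$src" "$dst"/ || return 1
    local name n_src n_dst
    name="$(basename "$src")"
    n_src=$(find "$src" -type f | wc -l)
    n_dst=$(find "$dst/$name" -type f | wc -l)
    echo "source files: $n_src, copied files: $n_dst"
    [ "$n_src" -eq "$n_dst" ]    # exit status 0 only if the counts agree
}

# Example with the paths from these notes:
# copy_and_check /corral-repl/utexas/UKB-Imaging-Genetics/unzipped_DXA_Images /scratch1/08068/ekun
```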
Navigating The Linux Terminal
My first hurdles came after ssh-ing into TACC, since the interface is a Linux command-line terminal. The most useful commands I learned were:
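- cd /directory - Enters the specified directory
- cd .. - Moves up one level, to the parent directory
- pwd - Prints the path of the current directory
- ls - Lists the files in the current directory
- ls -la - Lists all files, including hidden ones, with details such as permissions, owner, size, and modification date
- wc -l /file - Gives the line count of a file
- ls | wc -l - Pipes the output of ls into wc -l, which counts the lines it receives; since ls prints one entry per line when piped, this gives the number of entries (files and subdirectories) in the current directory
- head filename.txt or .csv - Displays the first 10 lines of a file
- less -S filename.txt or .csv - Opens a file in a scrollable viewer (press q to quit); -S chops long lines instead of wrapping them, which keeps wide CSV rows readable
- vim filename.txt or .sh - Opens a file in the vim editor, whether it is a text file or a bash script; a file that does not exist yet is created when you save
- i - Enters insert mode so you can start typing inside vim
- Esc - Leaves insert mode and returns to normal mode, where commands such as :wq work
- :wq - Saves the file and quits vim
- scp -r /directory /destination - Recursively copies a directory (or copies a file) to a destination; either path can also be on a remote machine, written as user@host:/path
- rm /file - Permanently deletes a file (there is no recycle bin)
- rm -r /directory - Recursively deletes a folder and everything inside it
- grep "term" /file - Prints every line of a file that contains the given term
- ls | grep "term" - Lists the entries in the current directory whose names contain the given term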
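A small self-contained sketch that exercises several of these commands on throwaway files in a temporary directory (the file names and contents are made up), which is a safe way to see what each one prints before running it on real data. It also shows one caveat of ls | wc -l: it counts every entry, including subdirectories, whereas find . -maxdepth 1 -type f | wc -l counts only regular files.

```shell
#!/bin/bash
# Sketch: try some of the commands above on hypothetical files.
# Nothing outside the temporary directory is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
pwd                                    # path of the temp directory

# A small CSV (header + 3 rows) and an empty subdirectory
printf 'eid,part\n1003186,knee\n1003187,hip\n1003188,knee\n' > sample.csv
mkdir images

wc -l sample.csv                       # prints "4 sample.csv"
head -n 2 sample.csv                   # first 2 lines (head shows 10 by default)
grep "knee" sample.csv                 # every line containing "knee"
ls | grep "sample"                     # entries whose names contain "sample"

# Capture counts in variables
n_lines=$(wc -l < sample.csv)                   # 4 (stdin input, so no file name)
n_entries=$(ls | wc -l)                         # 2: sample.csv AND images/
n_files=$(find . -maxdepth 1 -type f | wc -l)   # 1: regular files only
n_knee=$(grep -c "knee" sample.csv)             # 2 matching lines
echo "lines=$n_lines entries=$n_entries files=$n_files knee_rows=$n_knee"

cd - > /dev/null || exit 1
rm -r "$demo_dir"                      # recursively delete the demo directory
```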
Writing My First Bash Scripts
The DXA images were all stored in .zip files named with the pattern PatientEID_DataField_2_0.zip (e.g. 1003186_20158_2_0.zip).
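Since every part of these names is separated by an underscore, the patient EID and data field can be pulled out of a file name with plain bash. A minimal sketch; the variable names are my own, and the trailing 2_0 is treated simply as two more underscore-separated parts:

```shell
#!/bin/bash
# Sketch: split a DXA zip name of the form PatientEID_DataField_2_0.zip
f="1003186_20158_2_0.zip"

base="${f%.zip}"                                   # strip extension -> 1003186_20158_2_0
IFS=_ read -r eid field part3 part4 <<< "$base"    # split on "_"

echo "EID: $eid, data field: $field, suffix: ${part3}_${part4}"
# prints: EID: 1003186, data field: 20158, suffix: 2_0
echo "unzip folder: ${base}_unzip"
# prints: unzip folder: 1003186_20158_2_0_unzip
```

Inside a for f in *.zip loop, the same lines work with f set by the loop.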
I first had to write a bash script that unzips each file into its own folder named after the zip file, so that each folder of images keeps the patient EID in its name (e.g. 1003186_20158_2_0_unzip).
Filename: unzip.sh
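#!/bin/bash
# unzip.sh: extract every .zip in the current directory into its own folder named
# after the zip file, e.g. 1003186_20158_2_0.zip -> 1003186_20158_2_0_unzip
# Usage (run from the directory holding the .zip files):
#   bash unzip.sh              extract next to the zip files
#   bash unzip.sh /corral-repl/utexas/UKB-Imaging-Genetics/unzipped_DXA_Images
#                              extract into a directory outside the current one
echo "This unzips all the files in the directory and puts the unzipped files into directories named after the original zip file"
dest="${1:-.}"       # output location; defaults to the current directory
shopt -s nullglob    # with no .zip files, skip the loop instead of trying to unzip "*.zip"
for f in *.zip; do
    # ${f%.zip} strips the extension, so the folder name keeps the patient EID
    # -q: quiet output; -n: never overwrite files extracted by an earlier run
    unzip -q -n -d "$dest/${f%.zip}_unzip" "$f"
done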
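After a run, it is worth confirming that every zip file produced a folder. A minimal sketch, assuming it is run from the directory holding the .zip files; the optional first argument gives the location of the _unzip folders if they were extracted elsewhere. The function name check_unzipped is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical helper): report zip files with no matching _unzip folder.
# Usage: check_unzipped [dest_dir]   (run from the directory with the .zip files)
check_unzipped() {
    local dest="${1:-.}" f missing=0 restore
    restore="$(shopt -p nullglob)"   # remember the caller's nullglob setting
    shopt -s nullglob                # no .zip files -> loop body never runs
    for f in *.zip; do
        if [ ! -d "$dest/${f%.zip}_unzip" ]; then
            echo "missing: ${f%.zip}_unzip"
            missing=$((missing + 1))
        fi
    done
    eval "$restore"                  # restore the original nullglob setting
    echo "$missing zip file(s) without a folder"
    [ "$missing" -eq 0 ]             # exit status 0 only if nothing is missing
}
```

For example, check_unzipped /corral-repl/utexas/UKB-Imaging-Genetics/unzipped_DXA_Images would check folders extracted to that location.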