The Workflow Layout

All computational work is done on the TACC supercomputers at UT. I was given an allocation on the Frontera machine, and my first task was to ssh into it so that I could set up my files and directories.

ssh -X ekun@frontera.tacc.utexas.edu

The main directories I use are

/home1/08068/ekun

and

/work2/08068/ekun/frontera

as well as

/scratch1/08068/ekun

The main website I use to create Jupyter Notebooks for FastAI and for building my own classifiers is vis.tacc.utexas.edu. All Jupyter Notebooks created on vis.tacc.utexas.edu are stored in the home directory.

All UKBiobank directory metadata and images are located at

/corral-repl/utexas/UKB-Imaging-Genetics/

This repo has a lot which I still need to explore, but all the DXA Images are located under

/corral-repl/utexas/UKB-Imaging-Genetics/Imaging_Data/DXA/DXA_Images

This folder holds a zip file for each patient EID, containing that patient's images. The unzipped versions are located at

/corral-repl/utexas/UKB-Imaging-Genetics/unzipped_DXA_Images

This folder holds the unzipped files for every patient EID, along with the DXA images sorted by body part.

Other Directories of Interest:

Generated CSV files are stored in

/work2/08068/ekun/frontera/output_files

Currently, a copy of all unzipped DXA images, including the images separated by body part, is located in

/scratch1/08068/ekun/unzipped_DXA_Images
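To avoid retyping these long paths, they can be collected into shell variables, for example at the top of a script. This is only a sketch: the variable names are my own invention (hypothetical), while the paths are exactly the ones listed above. The loop at the end simply reports which directories are reachable from the machine it runs on.

```shell
#!/bin/bash
# Hypothetical variable names; the paths themselves are the ones listed in this section.
MY_HOME="/home1/08068/ekun"
MY_WORK="/work2/08068/ekun/frontera"
MY_SCRATCH="/scratch1/08068/ekun"

UKB_ROOT="/corral-repl/utexas/UKB-Imaging-Genetics"
DXA_ZIPS="$UKB_ROOT/Imaging_Data/DXA/DXA_Images"     # zipped images, one zip per patient EID
DXA_UNZIPPED="$UKB_ROOT/unzipped_DXA_Images"         # unzipped images
OUTPUT_DIR="$MY_WORK/output_files"                   # generated CSV files
SCRATCH_UNZIPPED="$MY_SCRATCH/unzipped_DXA_Images"   # scratch copy of the unzipped images

# Report which directories exist on the current machine.
for d in "$MY_HOME" "$MY_WORK" "$MY_SCRATCH" "$DXA_ZIPS" "$DXA_UNZIPPED" "$OUTPUT_DIR" "$SCRATCH_UNZIPPED"; do
    if [ -d "$d" ]; then
        echo "ok       $d"
    else
        echo "missing  $d"
    fi
done
```

After this, a command such as cd "$DXA_ZIPS" can replace typing the full path.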

Navigating The Linux Terminal

My first issues arose right after ssh-ing into TACC, because the TACC interface is a Linux command-line terminal. The most useful commands I learned were as follows:

cd /directory - Enters the directory specified
cd .. - Moves up one directory (to the parent directory)
pwd - Displays path of current directory
ls - Displays files of current directory
ls -la - Displays all files of current directory, including hidden ones, with additional information (permissions, owner, size, date)
wc -l /file - Gives the line count of a file
- ls | wc -l - Pipes the output of ls into wc -l, giving the number of entries in the current directory
head filename.txt or .csv - Displays the first 10 lines of a file
less -S filename.txt or .csv - Lets you scroll through a file without wrapping long lines (press q to quit)
vim filename.txt or .sh - Opens a file for editing, whether it be a text file or bash script
- i - Enters insert mode so you can start typing inside vim
- Esc, then :wq - Saves and closes the file
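scp -r /directory /destination - Recursively copies a directory or file to a destination, which can be on another machine (cp -r does the same for purely local copies)
rm /file - Deletes a file
- rm -r /directory - Recursively deletes a folder and everything inside (this cannot be undone)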
grep "term" /file - Searches a file for a given term and prints the matching lines
- ls | grep "term" - Lists the entries of the current directory whose names contain the given term
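The piped commands in this list are easiest to understand by trying them on a throwaway directory. The sketch below builds one with mktemp so nothing real is touched. The second zip filename and the contents of sample.csv are made up for the demo, and the variable names are hypothetical.

```shell
#!/bin/bash
# Build a temporary demo directory (made-up files, nothing from the real dataset).
demo_dir=$(mktemp -d)
touch "$demo_dir/1003186_20158_2_0.zip" "$demo_dir/1003187_20158_2_0.zip" "$demo_dir/notes.txt"
printf 'eid,field\n1003186,20158\n1003187,20158\n' > "$demo_dir/sample.csv"

# ls | wc -l: when piped, ls prints one name per line and wc -l counts those lines.
file_count=$(ls "$demo_dir" | wc -l)

# ls | grep: keep only the names ending in .zip; -c counts them instead of printing them.
zip_count=$(ls "$demo_dir" | grep -c '\.zip$')

# wc -l on a file counts its lines (reading from < avoids printing the filename).
line_count=$(wc -l < "$demo_dir/sample.csv")

# grep on a file prints every line that contains the term.
match=$(grep "1003187" "$demo_dir/sample.csv")

echo "entries in directory: $file_count"
echo "zip files:            $zip_count"
echo "lines in sample.csv:  $line_count"
echo "matching line:        $match"

# Clean up the demo directory.
rm -r "$demo_dir"
```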

Writing My First Bash Scripts

The DXA Images were all stored in .zip files named using the following pattern: PatientEID_DataField_2_0.zip (ex: 1003186_20158_2_0.zip).
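As a minimal sketch of how this naming pattern can be taken apart in bash, the snippet below uses parameter expansion to pull the patient EID and data field out of the example filename. The variable names are hypothetical, and the snippet assumes the EID and data field never contain an underscore themselves.

```shell
#!/bin/bash
f="1003186_20158_2_0.zip"   # example filename from the text

base="${f%.zip}"      # strip the .zip suffix              -> 1003186_20158_2_0
eid="${base%%_*}"     # drop everything from the first _   -> 1003186
rest="${base#*_}"     # drop everything up to the first _  -> 20158_2_0
field="${rest%%_*}"   # drop everything from the first _   -> 20158

echo "EID: $eid, data field: $field"
```

The same ${f%.zip} expansion is a convenient way to build a folder name from a zip filename.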

I first had to write a bash script to unzip the files and rename the folders containing the images to retain the patient EID. (ex: 1003186_20158_2_0_unzip)

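filename: unzip.sh
#!/bin/bash

echo "This unzips all the files in the directory and puts the unzipped files into directories named after the original zip file"

# Unzip each archive into a folder named after the zip file plus _unzip.
# ${f%.zip} strips the .zip suffix, so 1003186_20158_2_0.zip goes to 1003186_20158_2_0_unzip.
for f in *.zip; do
    unzip -d "${f%.zip}_unzip" "$f"
done

# Alternative: to put the unzipped folders into a directory outside of the current one,
# use the loop below instead of the loop above (running both would unzip everything twice).

# for f in *.zip; do
#     unzip -d "/corral-repl/utexas/UKB-Imaging-Genetics/unzipped_DXA_Images/${f%.zip}_unzip" "$f"
# done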
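
After a long unzip run it is worth checking that every zip file actually produced a folder. The following is a sketch of one way to do that, not part of my original unzip.sh: check_unzipped is a hypothetical helper that assumes the _unzip naming used above. The demo at the bottom runs it on throwaway directories with empty placeholder files, one of which uses a made-up EID.

```shell
#!/bin/bash
# check_unzipped SRC DEST (hypothetical helper):
# print every zip in SRC that has no matching <name>_unzip folder in DEST,
# and return non-zero if any are missing.
check_unzipped() {
    local src="$1" dest="$2" missing=0 f name
    for f in "$src"/*.zip; do
        [ -e "$f" ] || continue              # the glob matched nothing
        name=$(basename "$f" .zip)
        if [ ! -d "$dest/${name}_unzip" ]; then
            echo "missing: ${name}_unzip"
            missing=$((missing + 1))
        fi
    done
    echo "$missing zip file(s) without an unzipped folder"
    [ "$missing" -eq 0 ]
}

# Demo on temporary directories: two placeholder zips, only one unzipped folder.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/1003186_20158_2_0.zip" "$src/1003187_20158_2_0.zip"
mkdir "$dest/1003186_20158_2_0_unzip"

status=0
report=$(check_unzipped "$src" "$dest") || status=$?
echo "$report"

rm -r "$src" "$dest"
```

On the real data the call would look like check_unzipped followed by the zip folder and the unzipped folder, run from any directory.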