Problem 1 - What File Name Corresponds to What Image?

All DXA images were obtained from UK Biobank bulk field ID 20158 as zip archives named Patient-EID_Bulk-field-ID_Visit-Number.zip (for example, 6024685_20158_2_0.zip). These folders typically had 8 images, labeled in this format:

Problem 2 - Full Body Xrays are Poorly Cropped

After the initial QC process and sorting the images into one folder per body part, I began looking at the full body transparent xrays in preparation for our Human Pose Estimation (HPE) project. Many of the full body xrays had parts of the arm cut off from the final image, which led to poor pose estimation from the deep learning architectures (HRNet/ResNet). To remove the images that were cut off at the arm, I created a binary classifier to distinguish between the two types of images and applied it to the entire dataset of ~40,000 images.
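As a rough illustration of the filtering step, the sketch below splits a set of images into kept and rejected groups based on a per-image score. It is not the classifier used in this project: `border_touch_score` is a hypothetical stand-in heuristic that assumes the body appears brighter than the background and that a cropped arm shows up as bright pixels in the left or right edge column. The function names, the `threshold` of 0.5, and the `cutoff` of 0.1 are all assumptions chosen for the example; in practice `score_fn` would be the trained model's predicted probability that an image is cropped.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Grayscale image as rows of pixel intensities (assumed scaled to [0, 1]).
Image = Sequence[Sequence[float]]


def border_touch_score(image: Image, threshold: float = 0.5) -> float:
    """Hypothetical stand-in for a trained classifier.

    Returns the fraction of pixels in the left and right edge columns
    that are brighter than `threshold`. A high value suggests the body
    runs off the side of the frame, so an arm may be cut off.
    """
    edge_pixels = [row[0] for row in image] + [row[-1] for row in image]
    bright = sum(1 for p in edge_pixels if p > threshold)
    return bright / len(edge_pixels)


def split_by_classifier(
    images: Iterable[Tuple[str, Image]],
    score_fn: Callable[[Image], float],
    cutoff: float = 0.1,
) -> Tuple[List[str], List[str]]:
    """Partition (name, image) pairs into (kept, rejected) file names.

    An image is rejected when its score exceeds `cutoff`.
    """
    kept: List[str] = []
    rejected: List[str] = []
    for name, image in images:
        if score_fn(image) > cutoff:
            rejected.append(name)
        else:
            kept.append(name)
    return kept, rejected


# Tiny synthetic example: 1 = body, 0 = background.
complete = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
cropped = [[0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
kept, rejected = split_by_classifier(
    [("complete.png", complete), ("cropped.png", cropped)],
    border_touch_score,
)
```

Keeping the scoring function separate from the split makes it easy to swap the heuristic for a real model's output without touching the filtering logic.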

Problem 3 - Xray Sizes Vary a Lot