We can visualize our training and validation processes with Nsight Systems. The comparison of the two approaches, the basic PyTorch DataLoader and NVIDIA DALI, is shown below. With the PyTorch DataLoader, gaps appear between epochs; with the DALI data loader, there are no gaps between epochs. Zooming into the timeline in Nsight Systems helps determine whether a run suffers from GPU starvation (the GPU sitting idle while it waits for data to be loaded), insufficient CPU parallelization, or other bottlenecks.
More info: https://developer.nvidia.com/nsight-systems
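As a rough complement to the timeline view, the idea of GPU starvation can be sketched in plain Python by timing how long each iteration blocks on the data loader compared with how long the training step takes. This is a minimal sketch, not part of the original setup: `measure_loader_wait` and `slow_loader` are hypothetical names, and the sleeps stand in for real loading and compute.

```python
import time


def measure_loader_wait(loader, step_fn):
    """Split wall-clock time into waiting for batches and running step_fn."""
    wait_s = 0.0
    step_s = 0.0
    n_batches = 0
    t0 = time.perf_counter()
    for batch in loader:
        t1 = time.perf_counter()
        wait_s += t1 - t0  # time blocked waiting for the next batch
        # Note: on a real GPU, CUDA kernels launch asynchronously, so step_fn
        # would need to synchronize for this timing to be meaningful.
        step_fn(batch)
        t0 = time.perf_counter()
        step_s += t0 - t1  # time spent in the (simulated) training step
        n_batches += 1
    return {"batches": n_batches, "wait_s": wait_s, "step_s": step_s}


def slow_loader(n, delay):
    """Hypothetical stand-in for a loader that needs `delay` seconds per batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


if __name__ == "__main__":
    stats = measure_loader_wait(slow_loader(5, 0.02), lambda b: time.sleep(0.005))
    frac = stats["wait_s"] / (stats["wait_s"] + stats["step_s"])
    print(f"{stats['batches']} batches, {frac:.0%} of the time spent waiting for data")
```

A large waiting fraction suggests the loader, not the model, is the bottleneck. Nsight Systems shows the same effect as idle stretches on the CUDA rows of the timeline.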
The profiling procedure is given at the end of this page. The difference in performance and processing time becomes visible when working with large amounts of data.
The demonstration below uses 128 files run through ResNet34 for 4 epochs.
1. Nsight Systems using the PyTorch DataLoader. CUDA ran from 1.58 s to 63.19 s.
2. Nsight Systems using the DALI data loader. CUDA ran from 5.68 s to 65.61 s.
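The two CUDA windows quoted above can be compared by subtracting start from end. The short sketch below does only that arithmetic. The span is the distance between the first and last reported CUDA activity, so it says nothing about how busy the GPU was inside the window. The names `cuda_windows` and `active_span` are illustrative.

```python
# (start, end) of CUDA activity in seconds, as quoted in the list above.
cuda_windows = {
    "PyTorch DataLoader": (1.58, 63.19),
    "DALI data loader": (5.68, 65.61),
}


def active_span(start, end):
    """Length of the CUDA activity window in seconds, rounded to 2 decimals."""
    return round(end - start, 2)


spans = {name: active_span(*window) for name, window in cuda_windows.items()}
for name, span in spans.items():
    print(f"{name}: {span:.2f} s")
# PyTorch DataLoader: 61.61 s
# DALI data loader: 59.93 s
```

On these numbers the DALI run starts its CUDA work later but covers a window about 1.7 s shorter.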
The process for profiling: create a script file named nvidia_bash.sh with the following contents.
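#!/bin/bash
# Move to the directory that contains this script.
cd {path_of_directory_that_contains_your_bash_file}
# Activate the Python virtual environment.
source {path_to_your_environment}/bin/activate
# Run the training script that will be profiled.
python {path_to_your_python_script_file}/main.py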
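The script above only runs main.py; it does not call the profiler itself. One possible way to capture a timeline is to wrap the script in `nsys profile`. This is a minimal sketch that assumes the `nsys` CLI from Nsight Systems is installed and on PATH, and it may not match the exact procedure used for the runs above. The report name `dataloader_profile`, the trace selection, and the helper names are assumptions.

```python
import shutil
import subprocess


def build_nsys_command(script="nvidia_bash.sh", report="dataloader_profile"):
    """Return an `nsys profile` command line that wraps the given shell script."""
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx,osrt",    # CUDA activity, NVTX ranges, OS runtime calls
        f"--output={report}",        # report file name (assumed, pick your own)
        "--force-overwrite=true",    # replace an existing report with the same name
        "bash", script,
    ]


def run_profile(cmd):
    """Run the profiling command, but only if nsys is actually available."""
    if shutil.which("nsys") is None:
        raise RuntimeError("nsys not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command only; call run_profile(build_nsys_command()) to execute it.
    print(" ".join(build_nsys_command()))
```

The resulting report can then be opened in the Nsight Systems GUI to inspect the gaps between epochs.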