Intel® VTune Profiler 2022.2.0

Recommendations:

EU Array Stalled/Idle: 71.0% of Elapsed time with GPU busy
GPU metrics detect some kernel issues. Use GPU Compute/Media Hotspots (preview) to understand how well your application runs on the specified hardware.
Execution % of Total Time: 46.7%
Execution time on the device is less than memory transfer time. Make sure your offload schema is optimal. Use Intel Advisor tool to get an insight into possible causes for inefficient offload.
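
The arithmetic behind the last recommendation can be sketched as follows. This is a minimal illustration, not VTune's own computation. It assumes that "Total Time" is simply device execution time plus memory transfer time, which may not match the exact definition VTune uses. The helper name execution_share and the timings are hypothetical, chosen only to reproduce the 46.7% figure reported above.

```python
def execution_share(execution_s: float, transfer_s: float) -> float:
    """Return device execution time as a percentage of execution + transfer time.

    Assumption: total time = execution + transfer (a simplification,
    not necessarily VTune's exact definition).
    """
    total = execution_s + transfer_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * execution_s / total


# Hypothetical timings (seconds), picked to match the reported 46.7%.
execution_s = 0.467
transfer_s = 0.533

share = execution_share(execution_s, transfer_s)

# Under the assumption above, a share below 50% means transfers take
# longer than the kernels themselves.
transfer_bound = share < 50.0

print(f"Execution share: {share:.1f}%  transfer-bound: {transfer_bound}")
```

Under this assumption, any share below 50% means the application spends more time moving data than computing on the device, which is the condition the recommendation flags.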