Recommendations:

GPU Time, % of Elapsed time: 10.3%
GPU utilization is low. Switch to the for in-depth analysis of host activity. Poor GPU utilization can prevent the application from offloading effectively.

EU Array Stalled/Idle: 67.7% of Elapsed time with GPU busy
GPU metrics detect some kernel issues. Use GPU Compute/Media Hotspots (preview) to understand how well your application runs on the specified hardware.

Execution % of Total Time: 45.3%
Execution time on the device is less than memory transfer time. Make sure your offload schema is optimal. Use Intel Advisor tool to get an insight into possible causes for inefficient offload.

GPU utilization is low. Consider offloading more work to the GPU to increase overall application performance.

| Function | Module | CPU Time |
|---|---|---|
| [Outside any known module] | [Unknown] | 4.349s |
| [Skipped stack frame(s)] | [Unknown] | 1.454s |
| Intel::OpenCL::Utils::AtomicCounter::operator long | libcpu_device_emu.so.2022.13.3.0 | 1.184s |
| std::__malloc_alloc::allocate | libstlport-dynamic.so | 0.697s |
| memcmp | libc-dynamic.so | 0.620s |
| [Others] | N/A | 7.581s |

| Host Task | Task Time | % of Elapsed Time | Task Count |
|---|---|---|---|
| tbb_parallel_for | 4.222s | 25.4% | 42 |
| clCreateContext | 3.194s | 19.3% | 3 |
| clWaitForEvents | 1.155s | 7.0% | 1,690 |
| clEnqueueMemcpyINTEL | 0.457s | 2.8% | 1,448 |
| tbb_custom | 0.316s | 1.9% | 5 |
| [Others] | 0.415s | 2.5% | 1,227 |

| Computing Task | Total Time | Execution Time | % of Total Time | SIMD Width | Peak EU Threads Occupancy(%) | EU Threads Occupancy(%) | SIMD Utilization(%) |
|---|---|---|---|---|---|---|---|
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_calCentroidsSum2_24_4_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29_ | 0.672s | 0.418s | 62.2% | 8 | 100.0% | 76.9% | 100.0% |
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_groupByCluster_24_2_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_int64_2E_int64 | 0.511s | 0.294s | 57.6% | 8 | 100.0% | 90.2% | 100.0% |
| [Outside any task] | 0.193s | 0s | 0.0% | | | | |
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_updateCentroids_24_5_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29__2E_int64 | 0.112s | 0.001s | 0.9% | 8 | 1.2% | 0.0% | 100.0% |
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_calCentroidsSum1_24_3_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29_ | 0.078s | 0.001s | 0.6% | 8 | 1.2% | 0.0% | 100.0% |
| [Others] | 0.010s | 0.000s | 0.2% | N/A | N/A | N/A | N/A |
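
The kernel names in the Computing Task table are hard to read, and the 45.3% figure in the recommendations is easier to trust once it is recomputed from the rows. The Python sketch below does both, under two assumptions that are inferred from this report rather than taken from any dppy or VTune documentation: (1) each `_XX_` run of two uppercase hex digits in a kernel name encodes the character with that code, so `_5F__5F_main_5F__5F_` reads as `__main__`; (2) "Execution % of Total Time" is the summed Execution Time divided by the summed Total Time. The helper names (`decode_kernel_name`, `mangled`, `execution_share`) are hypothetical, and the times are copied from the table.

```python
import re

# Assumption: "_XX_" (two uppercase hex digits) encodes chr(0xXX),
# e.g. "_2E_" -> "." and "_28_" -> "(".
_ESCAPE = re.compile(r"_([0-9A-F]{2})_")


def decode_kernel_name(name: str) -> str:
    """Replace every _XX_ escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def mangled(*parts: str) -> str:
    """Rebuild a table name from its pieces, joined by the escaped dot."""
    return "_2E_".join(("dppyPy_dppy_py_devfn__5F__5F_main_5F__5F_",) + parts)


F32_2D = "array_28_float32_2C__20_2d_2C__20_C_29_"  # array(float32, 2d, C)
I32_1D = "array_28_int32_2C__20_1d_2C__20_C_29_"    # array(int32, 1d, C)

# (Computing Task, Total Time in s, Execution Time in s), from the table.
ROWS = [
    (mangled("calCentroidsSum2_24_4", F32_2D, I32_1D, F32_2D, I32_1D), 0.672, 0.418),
    (mangled("groupByCluster_24_2", F32_2D, I32_1D, F32_2D, "int64", "int64"), 0.511, 0.294),
    ("[Outside any task]", 0.193, 0.0),
    (mangled("updateCentroids_24_5", F32_2D, F32_2D, I32_1D, "int64"), 0.112, 0.001),
    (mangled("calCentroidsSum1_24_3", F32_2D, I32_1D), 0.078, 0.001),
    ("[Others]", 0.010, 0.000),
]


def execution_share(rows) -> float:
    """Summed execution time as a percentage of summed total time."""
    total = sum(t for _, t, _ in rows)
    executing = sum(e for _, _, e in rows)
    return 100.0 * executing / total


for name, total, executing in ROWS:
    print(f"{decode_kernel_name(name)}: {100.0 * executing / total:.1f}% of {total:.3f}s")

print(f"Overall execution share: {execution_share(ROWS):.1f}%")  # 45.3%
```

Under these assumptions the overall share matches the reported 45.3%. Per-row values can differ from the table in the last digit, because the table's times are already rounded to milliseconds.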