Recommendations:
GPU Time, % of Elapsed time: 35.0%
GPU utilization is low. Switch to a host-focused analysis for an in-depth look at host activity. Poor GPU utilization can prevent the application from offloading effectively.
GPU utilization is low. Consider offloading more work to the GPU to increase overall application performance.

| Function | Module | CPU Time |
|---|---|---|
| [Outside any known module] | [Unknown] | 2.159s |
| Intel::OpenCL::Utils::AtomicCounter::operator long | libcpu_device_emu.so.2022.13.3.0 | 1.089s |
| [Skipped stack frame(s)] | [Unknown] | 0.963s |
| Intel::OpenCL::CPUDevice::AffinitizeThreads::ExecuteIteration | libcpu_device_emu.so.2022.13.3.0 | 0.631s |
| memcmp | libc-dynamic.so | 0.596s |
| [Others] | N/A | 6.051s |

| Host Task | Task Time | % of Elapsed Time(%) | Task Count |
|---|---|---|---|
| clWaitForEvents | 6.066s | 34.2% | 42 |
| tbb_parallel_for | 4.220s | 23.8% | 38 |
| clCreateContext | 3.136s | 17.7% | 3 |
| tbb_custom | 0.307s | 1.7% | 5 |
| clBuildProgram | 0.094s | 0.5% | 1 |
| [Others] | 0.018s | 0.1% | 74 |

| Computing Task | Total Time | Execution Time | % of Total Time(%) | SIMD Width | Peak EU Threads Occupancy(%) | EU Threads Occupancy(%) | SIMD Utilization(%) |
|---|---|---|---|---|---|---|---|
| dppyPy_dppy_py_devfn_gaussian_5F_weighted_5F_pair_5F_counts_2E_count_5F_weighted_5F_pairs_5F_3d_5F_intel_5F_no_5F_slm_5F_ker_24_1_2E_int64_2E_int64_2E_int64_2E_int64_2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_1d_2C__20_C_29_ | 6.014s | 5.998s | 99.7% | 8 | 100.0% | 94.8% | 100.0% |
| [Outside any task] | 0.000s | 0s | 0.0% | | | | |
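
The figures above can be roughly cross-checked against each other. The sketch below is illustrative only and rests on one assumption: that "% of Elapsed Time" for a host task is its task time divided by the total elapsed time, which is not shown in this report. The variable and function names are made up for the example; the numbers are copied from the tables. It back-computes the elapsed time from the three largest host tasks (the smaller rows are too coarsely rounded to be useful) and compares the kernel's execution time against the reported 35.0% GPU time.

```python
# Rough consistency check of the report figures (assumed relationship:
# percent_of_elapsed = task_time / elapsed_time * 100).

# (task time in seconds, % of elapsed time), copied from the host task table.
host_tasks = {
    "clWaitForEvents": (6.066, 34.2),
    "tbb_parallel_for": (4.220, 23.8),
    "clCreateContext": (3.136, 17.7),
}

kernel_execution_s = 5.998      # Execution Time of the single computing task
reported_gpu_time_pct = 35.0    # "GPU Time, % of Elapsed time" from the report


def implied_elapsed(task_time_s: float, pct_of_elapsed: float) -> float:
    """Elapsed time implied by one table row, under the stated assumption."""
    return task_time_s / (pct_of_elapsed / 100.0)


# Average the per-row estimates to smooth out rounding in the percentages.
estimates = [implied_elapsed(t, p) for t, p in host_tasks.values()]
elapsed_s = sum(estimates) / len(estimates)

# Share of the estimated elapsed time spent executing the kernel.
kernel_share_pct = kernel_execution_s / elapsed_s * 100.0

print(f"estimated elapsed time: {elapsed_s:.2f}s")
print(f"kernel execution share: {kernel_share_pct:.1f}% "
      f"(report states GPU time of {reported_gpu_time_pct:.1f}%)")
```

Under that assumption the elapsed time comes out a little under 18 seconds, and the kernel's share lands close to, though not exactly at, the reported GPU time. The clWaitForEvents total (6.066s) is also nearly the same as the kernel's total time (6.014s), which would be consistent with the host mostly blocking while the kernel runs.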