Recommendations:
GPU Time, % of Elapsed time: 4.3%
GPU utilization is low. Switch to the for in-depth analysis of host activity. Poor GPU utilization can prevent the application from offloading effectively.

EU Array Stalled/Idle: 26.7% of Elapsed time with GPU busy
GPU metrics detect some kernel issues. Use GPU Compute/Media Hotspots (preview) to understand how well your application runs on the specified hardware.

GPU utilization is low. Consider offloading more work to the GPU to increase overall application performance.

| Function | Module | CPU Time |
|---|---|---|
| [Outside any known module] | [Unknown] | 7.589s |
| [Skipped stack frame(s)] | [Unknown] | 1.504s |
| std::__malloc_alloc::allocate | libstlport-dynamic.so | 0.679s |
| memcmp | libc-dynamic.so | 0.564s |
| memmove | libc-dynamic.so | 0.450s |
| [Others] | N/A | 5.235s |

| Host Task | Task Time | % of Elapsed Time | Task Count |
|---|---|---|---|
| clCreateContext | 5.330s | 23.2% | 2 |
| tbb_parallel_for | 2.355s | 10.3% | 35 |
| clWaitForEvents | 0.955s | 4.2% | 12 |
| tbb_custom | 0.436s | 1.9% | 3 |
| clBuildProgram | 0.105s | 0.5% | 1 |
| [Others] | 0.016s | 0.1% | 27 |
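
The percentages in the host task table can be cross-checked against the task times. The following is a minimal Python sketch, assuming each percentage is the task time divided by the total elapsed time. The elapsed time is not shown in this summary, so the value derived here is only an estimate, limited by the rounding of the reported percentages. The helper `estimate_elapsed` and its `min_percent` parameter are hypothetical names used for illustration and are not part of the profiler.

```python
# Host task rows copied from the table above:
# (task name, task time in seconds, % of elapsed time).
HOST_TASKS = [
    ("clCreateContext", 5.330, 23.2),
    ("tbb_parallel_for", 2.355, 10.3),
    ("clWaitForEvents", 0.955, 4.2),
    ("tbb_custom", 0.436, 1.9),
    ("clBuildProgram", 0.105, 0.5),
    ("[Others]", 0.016, 0.1),
]


def estimate_elapsed(rows, min_percent=1.0):
    """Estimate elapsed time as summed task time over summed percentage.

    Rows below min_percent are skipped because rounding to one decimal
    place dominates their ratio. Illustrative helper, not a profiler API.
    """
    usable = [(t, p) for _, t, p in rows if p >= min_percent]
    total_time = sum(t for t, _ in usable)
    total_percent = sum(p for _, p in usable)
    return total_time / (total_percent / 100.0)


elapsed = estimate_elapsed(HOST_TASKS)
# 4.3% is the "GPU Time, % of Elapsed time" value from the recommendations.
gpu_time = elapsed * 4.3 / 100.0

print(f"estimated elapsed time: {elapsed:.1f}s")
print(f"estimated GPU time: {gpu_time:.2f}s")
```

Under that assumption the estimate comes out at roughly 23 s of elapsed time and about 1 s of GPU time, which is in line with the 0.966s total time reported for the single computing task. It also shows that `clCreateContext` alone accounts for several times more elapsed time than the GPU kernel does.
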

| Computing Task | Total Time | Execution Time | % of Total Time | SIMD Width | Peak EU Threads Occupancy (%) | EU Threads Occupancy (%) | SIMD Utilization (%) |
|---|---|---|---|---|---|---|---|
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_run_5F_knn_5F_kernel_24_1_2E_array_28_float64_2C__20_2d_2C__20_C_29__2E_array_28_int64_2C__20_1d_2C__20_C_29__2E_array_28_float64_2C__20_2d_2C__20_C_29__2E_int64_2E_int64_2E_int64_2E_array_28_float64_2C__20_1d_2C__20_C_29__2E_array_28_float64_2C__20_2d_2C__20_C_29__2E_int64 | 0.966s | 0.949s | 98.2% | 8 | 100.0% | 93.4% | 100.0% |
| [Outside any task] | 0.000s | 0s | 0.0% | | | | |
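
As a sanity check on the computing task row, here is a short Python sketch. It assumes that "% of Total Time" is the task's execution time divided by its total time. That interpretation is inferred from the numbers and is not stated in the report. The function name `percent` is illustrative only.

```python
# Values copied from the computing task row above.
total_time = 0.966      # seconds, "Total Time"
execution_time = 0.949  # seconds, "Execution Time"


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


exec_share = percent(execution_time, total_time)
# Time attributed to the task that is not kernel execution.
overhead = total_time - execution_time

print(f"execution share: {exec_share:.1f}%")
print(f"non-execution overhead: {overhead * 1000:.0f} ms")
```

The computed share matches the reported 98.2%, so under this reading the kernel spends about 17 ms outside execution. Together with the high occupancy and SIMD utilization figures, this suggests the kernel itself runs efficiently and that the low overall GPU utilization comes from host-side time.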