Recommendations:
EU Array Stalled/Idle: 71.0% of Elapsed time with GPU busy. GPU metrics detect some kernel issues. Use GPU Compute/Media Hotspots (preview) to understand how well your application runs on the specified hardware.

Execution % of Total Time: 46.7%. Execution time on the device is less than memory transfer time. Make sure your offload schema is optimal. Use Intel Advisor tool to get an insight into possible causes for inefficient offload.

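
The 46.7% figure can be cross-checked against the per-task numbers in the computing-task table of this summary. The sketch below is only a sanity check: the short task names are abbreviations of the mangled kernel names, and treating `Total Time - Execution Time` as transfer and other non-execution overhead is an assumption about how the profiler splits the time, not something the report states.

```python
# (total_time_s, execution_time_s) per computing task, copied from the summary.
# Short names are abbreviations of the mangled kernel names (for readability only).
tasks = {
    "calCentroidsSum2": (35.690, 22.697),
    "groupByCluster": (26.339, 15.073),
    "[Outside any task]": (9.684, 0.0),
    "updateCentroids": (5.421, 0.052),
    "calCentroidsSum1": (3.746, 0.025),
    "[Others]": (0.217, 0.001),
}

total = sum(t for t, _ in tasks.values())
execution = sum(e for _, e in tasks.values())
execution_share = 100.0 * execution / total

print(f"total {total:.3f}s, execution {execution:.3f}s")
print(f"execution share of total time: {execution_share:.1f}%")

# Assumption: whatever is not execution is transfer or other offload overhead.
for name, (t, e) in tasks.items():
    print(f"{name:20s} non-execution time {t - e:7.3f}s")
```

Under that assumption, slightly more than half of the total task time is spent outside kernel execution, which is consistent with the recommendation above.
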
| Function | Module | CPU Time |
|---|---|---|
| [Outside any known module] | [Unknown] | 4.567s |
| [Skipped stack frame(s)] | [Unknown] | 1.496s |
| Intel::OpenCL::Utils::AtomicCounter::operator long | libcpu_device_emu.so.2022.13.3.0 | 1.264s |
| std::__malloc_alloc::allocate | libstlport-dynamic.so | 0.648s |
| _PyEval_EvalFrameDefault | python3.9 | 0.643s |
| [Others] | N/A | 9.999s |

| Host Task | Task Time | % of Elapsed Time | Task Count |
|---|---|---|---|
| clWaitForEvents | 59.706s | 46.6% | 85,345 |
| clEnqueueMemcpyINTEL | 23.575s | 18.4% | 73,124 |
| clEnqueueNDRangeKernel | 4.470s | 3.5% | 12,221 |
| tbb_parallel_for | 3.832s | 3.0% | 42 |
| clCreateContext | 3.221s | 2.5% | 3 |
| [Others] | 3.243s | 2.5% | 48,905 |
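
Dividing each task time by its call count gives a rough per-call cost, which may help judge whether the transfers are many small copies rather than a few large ones. This is a back-of-the-envelope sketch using only the numbers in the host-task table; the interpretation in the comments is a guess, not a profiler finding.

```python
# (task_time_s, call_count) copied from the host-task table.
host_tasks = {
    "clWaitForEvents": (59.706, 85_345),
    "clEnqueueMemcpyINTEL": (23.575, 73_124),
    "clEnqueueNDRangeKernel": (4.470, 12_221),
    "tbb_parallel_for": (3.832, 42),
    "clCreateContext": (3.221, 3),
}

# Average time per call in milliseconds.
avg_ms = {name: 1000.0 * secs / count for name, (secs, count) in host_tasks.items()}
for name, ms in avg_ms.items():
    print(f"{name:24s} {ms:10.3f} ms/call")

# Ratio of copy calls to kernel launches; a high ratio hints (but does not prove)
# that data is moved to and from the device around every launch.
copies_per_launch = (
    host_tasks["clEnqueueMemcpyINTEL"][1] / host_tasks["clEnqueueNDRangeKernel"][1]
)
print(f"memcpy calls per kernel launch: {copies_per_launch:.1f}")
```
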

| Computing Task | Total Time | Execution Time | % of Total Time | SIMD Width | Peak EU Threads Occupancy (%) | EU Threads Occupancy (%) | SIMD Utilization (%) |
|---|---|---|---|---|---|---|---|
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_calCentroidsSum2_24_4_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29_ | 35.690s | 22.697s | 63.6% | 8 | 100.0% | 72.1% | 100.0% |
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_groupByCluster_24_2_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29__2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_int64_2E_int64 | 26.339s | 15.073s | 57.2% | 8 | 100.0% | 90.2% | 100.0% |
| [Outside any task] | 9.684s | 0s | 0.0% | | | | |
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_updateCentroids_24_5_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29__2E_int64 | 5.421s | 0.052s | 1.0% | 8 | 1.2% | 0.1% | 100.0% |
| dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_calCentroidsSum1_24_3_2E_array_28_float32_2C__20_2d_2C__20_C_29__2E_array_28_int32_2C__20_1d_2C__20_C_29_ | 3.746s | 0.025s | 0.7% | 8 | 1.2% | 0.0% | 100.0% |
| [Others] | 0.217s | 0.001s | 0.5% | N/A | N/A | N/A | N/A |
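
The computing-task names above are hard to read because punctuation appears to be escaped as `_XX_`, where `XX` is an uppercase hexadecimal character code (for example `_2E_` for `.` and `_28_` for `(`). The helper below is a minimal sketch that reverses this escaping; the scheme is inferred from the names in the table rather than taken from any documentation, and `decode_kernel_name` is a hypothetical helper, not part of any library.

```python
import re

# Assumption: each "_XX_" with two uppercase hex digits encodes one ASCII character.
_ESCAPE = re.compile(r"_([0-9A-F]{2})_")


def decode_kernel_name(mangled: str) -> str:
    """Replace every _XX_ escape with the character whose code is 0xXX."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), mangled)


# The calCentroidsSum1 kernel name from the table, split only for line length.
name = (
    "dppyPy_dppy_py_devfn__5F__5F_main_5F__5F__2E_calCentroidsSum1_24_3"
    "_2E_array_28_float32_2C__20_2d_2C__20_C_29_"
    "_2E_array_28_int32_2C__20_1d_2C__20_C_29_"
)
decoded = decode_kernel_name(name)
print(decoded)
```

Decoded this way, the name reads as a Python function in `__main__` followed by its argument types, which makes it easier to map each row back to the source kernel.
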