Your code efficiency on this platform is too low. Possible causes: memory stalls, instruction starvation, branch misprediction or long latency instructions. Next steps: Run Microarchitecture Exploration analysis to identify the cause of the low microarchitecture usage efficiency.
The CPI may be too high. This could be caused by issues such as memory stalls, instruction starvation, branch misprediction or long latency instructions. Explore the other hardware-related metrics to identify what is causing high CPI.

| Function | Module | CPU Time | % of CPU Time |
|---|---|---|---|
| mm_kernel(cl::sycl::queue&, std::vector<float, std::allocator<float>>&, std::vector<float, std::allocator<float>>&, std::vector<float, std::allocator<float>>&, unsigned long, unsigned long)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::nd_item<(int)2>)#1} | 20b3523f30a8bcbf | 46534.222s | 98.9% |
| [Outside any known module] | [Unknown] | 507.644s | 1.1% |
| Intel::OpenCL::Utils::AtomicCounter::operator long | libcpu_device.so.2021.13.11.0 | 2.220s | 0.0% |
| Intel::OpenCL::CPUDevice::AffinitizeThreads::ExecuteIteration | libcpu_device.so.2021.13.11.0 | 1.047s | 0.0% |
| __intel_avx_rep_memset | mm_dpcpp_ndrange_var | 0.752s | 0.0% |
| [Others] | N/A | 2.636s | 0.0% |

| Task Type | Task Time | Task Count | Average Task Time |
|---|---|---|---|
| tbb_parallel_for | 47778.395s | 130 | 367.526s |
| tbb_custom | 746.885s | 11 | 67.899s |
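
The Average Task Time column appears to be Task Time divided by Task Count. The following is a minimal sketch that checks this arithmetic against the two rows above. The helper name `average_task_time` is hypothetical and is not part of any profiler API.

```python
def average_task_time(total_seconds: float, count: int) -> float:
    """Return the mean duration of one task, in seconds."""
    if count <= 0:
        raise ValueError("task count must be positive")
    return total_seconds / count


# (Task Time in seconds, Task Count), copied from the task table above.
tasks = {
    "tbb_parallel_for": (47778.395, 130),
    "tbb_custom": (746.885, 11),
}

for name, (total, count) in tasks.items():
    # Three decimals, to match the precision used in the report.
    print(f"{name}: {average_task_time(total, count):.3f}s")
```

Both results agree with the reported values (367.526s and 67.899s) to three decimal places, which supports reading the column as a plain mean over all task instances.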