Your code efficiency on this platform is too low. Possible causes: memory stalls, instruction starvation, branch misprediction, or long-latency instructions. Next steps: run the Microarchitecture Exploration analysis to identify the cause of the low microarchitecture usage efficiency.
The CPI may be too high. This could be caused by issues such as memory stalls, instruction starvation, branch misprediction or long latency instructions. Explore the other hardware-related metrics to identify what is causing high CPI.
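
For orientation, CPI (cycles per instruction retired) is the ratio of unhalted CPU cycles to retired instructions. The sketch below only illustrates that ratio; the function name and the counter values are hypothetical and are not taken from this report.

```python
def cycles_per_instruction(unhalted_cycles: int, instructions_retired: int) -> float:
    """Return CPI: unhalted CPU cycles divided by retired instructions."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return unhalted_cycles / instructions_retired


# Hypothetical counter values, chosen only to illustrate the ratio.
example_cpi = cycles_per_instruction(3_000_000_000, 1_500_000_000)
print(f"CPI = {example_cpi:.2f}")
```

A higher ratio means each retired instruction cost more cycles on average, which is why the report points to stalls, starvation, and misprediction as candidate causes.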

| Function | Module | CPU Time | % of CPU Time |
|---|---|---|---|
| [Outside any known module] | [Unknown] | 0.922s | 50.5% |
| Intel::OpenCL::Utils::AtomicCounter::operator long | libcpu_device.so.2021.13.11.0 | 0.216s | 11.8% |
| Intel::OpenCL::Utils::OclNonReentrantSpinMutex::Lock | libcpu_device.so.2021.13.11.0 | 0.145s | 8.0% |
| Intel::OpenCL::CPUDevice::AffinitizeThreads::ExecuteIteration | libcpu_device.so.2021.13.11.0 | 0.085s | 4.7% |
| hw_pause | libcpu_device.so.2021.13.11.0 | 0.055s | 3.0% |
| [Others] | N/A | 0.401s | 22.0% |

| Task Type | Task Time | Task Count | Average Task Time |
|---|---|---|---|
| tbb_parallel_for | 0.356s | 12 | 0.030s |
| tbb_custom | 0.299s | 10 | 0.030s |
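
The derived columns in the two tables above can be cross-checked from the raw ones. The following is a minimal sketch that assumes total CPU time equals the sum of the listed rows (including `[Others]`); the helper names are illustrative, and the shortened function labels are abbreviations of the rows in the report.

```python
# CPU time per row in seconds, copied from the hotspots table above.
cpu_time_s = {
    "[Outside any known module]": 0.922,
    "AtomicCounter::operator long": 0.216,
    "OclNonReentrantSpinMutex::Lock": 0.145,
    "AffinitizeThreads::ExecuteIteration": 0.085,
    "hw_pause": 0.055,
    "[Others]": 0.401,
}

# (task time in seconds, task count), copied from the task table above.
tasks = {
    "tbb_parallel_for": (0.356, 12),
    "tbb_custom": (0.299, 10),
}


def percent_shares(times):
    """Share of each row in the total, in percent, rounded to one decimal.

    Assumption: the total is the sum of the listed rows.
    """
    total = sum(times.values())
    return {name: round(100.0 * t / total, 1) for name, t in times.items()}


def average_task_times(task_rows):
    """Average time per task in seconds, rounded to milliseconds."""
    return {name: round(t / n, 3) for name, (t, n) in task_rows.items()}


shares = percent_shares(cpu_time_s)
averages = average_task_times(tasks)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
for name, avg in averages.items():
    print(f"{name}: {avg:.3f}s per task")
```

Because the report shows times rounded to milliseconds, a recomputed share can differ from the reported one by about 0.1 percentage point (for example, the spin-mutex row).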

The metric value is low, which may signal poor physical CPU core utilization caused by:
- load imbalance
- threading runtime overhead
- contended synchronization
- thread/process underutilization
- incorrect affinity that utilizes logical cores instead of physical cores
Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism, or run the Locks and Waits analysis to identify parallel bottlenecks for other parallel runtimes.

The metric value is low, which may signal poor logical CPU core utilization. Consider improving physical core utilization as the first step, and then look at opportunities to utilize logical cores, which in some cases can improve processor throughput and the overall performance of multi-threaded applications.