| Function | Module | CPU Time | % of CPU Time |
|---|---|---|---|
| [MKL BLAS]@avx512_sgemm_kernel_0 | libmkl_avx512.so.2 | 11.231s | 80.1% |
| [Outside any known module] | [Unknown] | 1.869s | 13.3% |
| main | mm_dpcpp_mkl | 0.155s | 1.1% |
| Intel::OpenCL::Utils::OclNonReentrantSpinMutex::Lock | libcpu_device.so.2021.13.11.0 | 0.130s | 0.9% |
| [MKL BLAS]@avx512_sgemm_scopy_right8_ea | libmkl_avx512.so.2 | 0.115s | 0.8% |
| [Others] | N/A | 0.516s | 3.7% |
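
The percentage column is each function's CPU Time divided by the total of the CPU Time column. The sketch below recomputes it, assuming the six rows together cover all sampled CPU time (the `[Others]` row suggests they do). It confirms that `avx512_sgemm_kernel_0` dominates the profile. The values are copied from the table, and the variable names are illustrative.

```python
# CPU Time values (seconds) copied from the hotspots table above.
cpu_time_s = {
    "[MKL BLAS]@avx512_sgemm_kernel_0": 11.231,
    "[Outside any known module]": 1.869,
    "main": 0.155,
    "Intel::OpenCL::Utils::OclNonReentrantSpinMutex::Lock": 0.130,
    "[MKL BLAS]@avx512_sgemm_scopy_right8_ea": 0.115,
    "[Others]": 0.516,
}

# Total CPU time across all rows (assumes the rows are exhaustive).
total_s = sum(cpu_time_s.values())

# Share of total CPU time per function, in percent.
share_pct = {name: 100.0 * t / total_s for name, t in cpu_time_s.items()}

# Print rows from largest to smallest share, matching the table's ordering logic.
for name, pct in sorted(share_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pct:5.1f}%  {name}")
```

Rounded to one decimal place, the recomputed shares match the "% of CPU Time" column.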

| Task Type | Task Time | Task Count | Average Task Time |
|---|---|---|---|
| tbb_custom | 24.961s | 10 | 2.496s |
| tbb_parallel_for | 0.155s | 12 | 0.013s |
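
In the task table, Average Task Time is Task Time divided by Task Count. The minimal sketch below reproduces that column from the reported values. The dictionary names are illustrative only.

```python
# Task Time (seconds) and Task Count copied from the task table above.
tasks = {
    "tbb_custom": (24.961, 10),
    "tbb_parallel_for": (0.155, 12),
}

# Average Task Time = Task Time / Task Count, per task type.
avg_task_time_s = {
    task_type: task_time_s / task_count
    for task_type, (task_time_s, task_count) in tasks.items()
}

for task_type, avg_s in avg_task_time_s.items():
    print(f"{task_type}: {avg_s:.3f}s per task")
```

Rounded to milliseconds, this gives 2.496 s for `tbb_custom` and 0.013 s for `tbb_parallel_for`, as in the table.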

The Effective Physical Core Utilization metric value is low, which may indicate poor utilization of physical CPU cores caused by:
- load imbalance
- threading runtime overhead
- contended synchronization
- thread/process underutilization
- incorrect affinity that utilizes logical cores instead of physical cores

Explore the sub-metrics to estimate the efficiency of MPI and OpenMP parallelism, or run the Locks and Waits analysis to identify parallel bottlenecks in other parallel runtimes.

The Effective Logical Core Utilization metric value is low, which may indicate poor utilization of logical CPU cores. Consider improving physical core utilization first, and then look for opportunities to use logical cores. In some cases this can improve processor throughput and the overall performance of multi-threaded applications.
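
The sketch below shows how load imbalance, the first cause listed above, drags utilization down even when every thread does run. It uses two simplified measures, and both are assumptions for illustration rather than VTune's exact metric definitions. Utilization is approximated as total busy time divided by elapsed time × core count. Imbalance is approximated as `1 - mean/max` of per-thread busy time. The per-thread times and the `utilization`/`imbalance` helpers are hypothetical and do not come from this report.

```python
from statistics import mean


def utilization(busy_s: list[float], elapsed_s: float, n_cores: int) -> float:
    """Approximate fraction of available core time spent busy.

    Simplified estimate: sum(busy) / (elapsed * cores). Not VTune's formula.
    """
    return sum(busy_s) / (elapsed_s * n_cores)


def imbalance(busy_s: list[float]) -> float:
    """Approximate load-imbalance ratio: 0.0 means perfectly balanced threads."""
    return 1.0 - mean(busy_s) / max(busy_s)


# Hypothetical example: 4 worker threads on 4 physical cores, 3.0 s elapsed.
# One thread gets 3x the work of the others, so three cores idle for 2 s each.
busy = [3.0, 1.0, 1.0, 1.0]
print(f"utilization = {utilization(busy, 3.0, 4):.2f}")
print(f"imbalance   = {imbalance(busy):.2f}")
```

With this skewed split, both values come out at 0.50. Giving each thread 1.5 s of work would bring imbalance to 0 and, with elapsed time shrinking to 1.5 s, utilization to 1.0.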