Intel® VTune™ Profiler 2022.0.0
Elapsed Time: 49.559s
GPU Time: 42.561s
EU Array Stalled/Idle: 3.9% of Elapsed time with GPU busy
GPU L3 Bandwidth Bound: 37.0% of peak value
Hottest GPU Computing Tasks Bound by GPU L3 Bandwidth:
Computing Task        Total Time
Sampler Busy: 0.0% of peak value
Hottest GPU Computing Tasks with High Sampler Usage:
Computing Task        Total Time
FPU Utilization: 89.0% of Elapsed time with GPU busy
High utilization of FPUs is limiting performance. Consider reducing computations.
Hottest GPU Computing Tasks with High FPU Utilization:
Computing Task        Total Time
sgemm_kernel_16_32    33.644s
sgemm_kernel_16_32    8.420s
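The timing figures above can be cross-checked with a little arithmetic. The sketch below is not part of the VTune output. It copies the reported values by hand into hypothetical variable names and derives the fraction of elapsed time the GPU was busy and the share of GPU time spent in the two listed sgemm_kernel_16_32 entries.

```python
# Values copied by hand from the summary above; the variable names are illustrative.
elapsed_s = 49.559                  # Elapsed Time
gpu_time_s = 42.561                 # GPU Time
kernel_times_s = [33.644, 8.420]    # the two sgemm_kernel_16_32 rows

# Fraction of wall-clock time during which the GPU was executing work.
gpu_busy_fraction = gpu_time_s / elapsed_s

# Combined time of the listed kernels and their share of total GPU time.
kernel_total_s = sum(kernel_times_s)
kernel_share_of_gpu = kernel_total_s / gpu_time_s

print(f"GPU busy: {gpu_busy_fraction:.1%} of elapsed time")
print(f"sgemm kernels: {kernel_total_s:.3f}s ({kernel_share_of_gpu:.1%} of GPU time)")
```

Under these numbers the two kernel entries account for nearly all of the GPU time, which is consistent with the report flagging FPU utilization rather than L3 bandwidth or sampler usage.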
Collection and Platform Info:
Application Command Line: /home/u49991/DPCPP_Performance_Portability/lab/mm_dpcpp_mkl "-n" "20480" "-m" "16"
User Name: u49991
Operating System: 5.4.0-80-generic DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
Computer Name: s001-n157
Result Size: 836.6 MB
Collection start time: 02:33:21 13/01/2022 UTC
Collection stop time: 02:34:11 13/01/2022 UTC
Collector Type: Event-based sampling driver, User-mode sampling and tracing
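As a sanity check, the collection start and stop timestamps can be compared with the reported Elapsed Time. This is a minimal sketch, assuming the timestamps use a day/month/year order (the value 13 cannot be a month). The format string and variable names are illustrative, not something VTune emits.

```python
from datetime import datetime

# Assumed layout: HH:MM:SS DD/MM/YYYY, both timestamps in UTC.
FMT = "%H:%M:%S %d/%m/%Y"

start = datetime.strptime("02:33:21 13/01/2022", FMT)
stop = datetime.strptime("02:34:11 13/01/2022", FMT)

# Wall-clock span of the collection at one-second timestamp resolution.
wall_s = (stop - start).total_seconds()

elapsed_s = 49.559  # Elapsed Time from the summary
print(f"collection window: {wall_s:.0f}s, reported elapsed: {elapsed_s}s")
```

The two figures agree to within the one-second resolution of the timestamps.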
CPU:
  Name: Intel(R) microarchitecture code named Coffeelake
  Frequency: 3.696 GHz
  Logical CPU Count: 12
GPU:
  Name: HD Graphics P630
  Vendor: Intel Corporation
  EU Count: 24
  Max EU Thread Count: 7
  Max Core Frequency: 1.200 GHz
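The GPU figures above allow a rough estimate of the device's capacity. The sketch below is not VTune output. The hardware thread count follows directly from the reported EU count and threads per EU. The peak FP32 rate additionally assumes a Gen9-class EU issuing 16 single-precision FLOP per clock (two SIMD-4 FPUs, each counting a fused multiply-add as two operations). That per-EU figure is an assumption taken from Intel's Gen9 architecture description and does not appear in this report.

```python
# Values copied from the GPU section of the summary.
eu_count = 24
threads_per_eu = 7
max_freq_ghz = 1.200

# Total hardware threads the EU array can keep resident.
hw_threads = eu_count * threads_per_eu

# ASSUMPTION (not in the report): 16 FP32 FLOP per clock per EU,
# i.e. 2 FPUs x SIMD-4 x 2 operations per FMA, as on Gen9-class graphics.
flop_per_clock_per_eu = 16

# Theoretical peak single-precision throughput at maximum core frequency.
peak_gflops = eu_count * flop_per_clock_per_eu * max_freq_ghz

print(f"hardware threads: {hw_threads}")
print(f"estimated peak FP32: {peak_gflops:.1f} GFLOP/s")
```

An estimate like this gives a ceiling against which the 89.0% FPU utilization can be read, since a kernel that already keeps the FPUs this busy has little headroom left on this device.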
GPU OpenCL Info:
  Version:
  Max Compute Units: 24
  Max Work Group Size: 256
  Local Memory: 65.5 KB
  SVM Capabilities: