First Profiling Run:

(base) C:\Users\Spencer\PycharmProjects\GPUAccelerating>nvprof python Spirals.py
[N]ew or [I]mport: N
Image Configuration:  2021-02-26-12-01-05
Center:  (1319, 1383)
Symmetry:  5
Curve:  59.35
Curve multiplier:  0
Max radius:  1550
==13248== NVPROF is profiling process 13248, command: python Spirals.py
==13248== Profiling application: python Spirals.py
==13248== Warning: 4 API trace records have same start and end timestamps.
This can happen because of short execution duration of CUDA APIs and low timer resolution on the underlying operating system.
==13248== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   46.23%  10.222ms         1  10.222ms  10.222ms  10.222ms  cudapy::__main__::create_image_cuda$241(Array<unsigned char, int=3, C, mutable, aligned>, __int64, __int64, __int64, double, __int64, __int64)
                   36.09%  7.9806ms         1  7.9806ms  7.9806ms  7.9806ms  [CUDA memcpy HtoD]
                   17.68%  3.9098ms         1  3.9098ms  3.9098ms  3.9098ms  [CUDA memcpy DtoH]
      API calls:   87.07%  180.09ms         1  180.09ms  180.09ms  180.09ms  cuDevicePrimaryCtxRetain
                    7.00%  14.473ms         1  14.473ms  14.473ms  14.473ms  cuMemcpyDtoH
                    3.83%  7.9267ms         1  7.9267ms  7.9267ms  7.9267ms  cuMemcpyHtoD
                    0.67%  1.3954ms         1  1.3954ms  1.3954ms  1.3954ms  cuModuleGetFunction
                    0.55%  1.1288ms         1  1.1288ms  1.1288ms  1.1288ms  cuMemAlloc
                    0.47%  976.40us         1  976.40us  976.40us  976.40us  cuModuleLoadDataEx
                    0.19%  391.50us         1  391.50us  391.50us  391.50us  cuLinkAddData
                    0.10%  212.10us         1  212.10us  212.10us  212.10us  cuLaunchKernel
                    0.05%  107.10us         1  107.10us  107.10us  107.10us  cuLinkComplete
                    0.02%  46.800us         1  46.800us  46.800us  46.800us  cuLinkCreate
                    0.01%  18.700us         1  18.700us  18.700us  18.700us  cuMemGetInfo
                    0.01%  17.600us         1  17.600us  17.600us  17.600us  cuDeviceTotalMem
                    0.01%  15.400us       101     152ns       0ns  1.0000us  cuDeviceGetAttribute
                    0.00%  7.9000us         1  7.9000us  7.9000us  7.9000us  cuDeviceGetPCIBusId
                    0.00%  5.0000us         7     714ns     300ns  1.8000us  cuCtxGetCurrent
                    0.00%  3.4000us         1  3.4000us  3.4000us  3.4000us  cuInit
                    0.00%  2.0000us         3     666ns     200ns  1.5000us  cuDeviceGet
                    0.00%  1.7000us         1  1.7000us  1.7000us  1.7000us  cuLinkDestroy
                    0.00%  1.5000us         5     300ns     100ns     500ns  cuFuncGetAttribute
                    0.00%  1.4000us         6     233ns     100ns     400ns  cuCtxGetDevice
                    0.00%  1.3000us         2     650ns     600ns     700ns  cuDeviceGetName
                    0.00%  1.2000us         1  1.2000us  1.2000us  1.2000us  cuDeviceComputeCapability
                    0.00%  1.1000us         1  1.1000us  1.1000us  1.1000us  cudaRuntimeGetVersion
                    0.00%  1.0000us         4     250ns     100ns     400ns  cuDeviceGetCount
                    0.00%  1.0000us         1  1.0000us  1.0000us  1.0000us  cuCtxPushCurrent
                    0.00%     700ns         1     700ns     700ns     700ns  cuDriverGetVersion
                    0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetLuid
                    0.00%     200ns         1     200ns     200ns     200ns  cuDeviceGetUuid


Initialized pixel array on device (no copy to device necessary):

(base) C:\Users\Spencer\PycharmProjects\GPUAccelerating>nvprof python Spirals.py
==21116== NVPROF is profiling process 21116, command: python Spirals.py
[N]ew or [I]mport: N
Center:  (1918, 879)
Symmetry:  7
Curve:  56.86
Curve multiplier:  0
Max radius:  1722
Save file as: image.png
==21116== Profiling application: python Spirals.py
==21116== Warning: 2 API trace records have same start and end timestamps.
This can happen because of short execution duration of CUDA APIs and low timer resolution on the underlying operating system.
==21116== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   88.50%  14.122ms         1  14.122ms  14.122ms  14.122ms  cudapy::__main__::create_image_cuda$241(Array<unsigned char, int=2, C, mutable, aligned>, __int64, __int64, __int64, double, __int64, __int64)
                   11.50%  1.8343ms         1  1.8343ms  1.8343ms  1.8343ms  [CUDA memcpy DtoH]
      API calls:   88.43%  142.15ms         1  142.15ms  142.15ms  142.15ms  cuDevicePrimaryCtxRetain
                   10.21%  16.407ms         1  16.407ms  16.407ms  16.407ms  cuMemcpyDtoH
                    0.59%  947.20us         1  947.20us  947.20us  947.20us  cuModuleLoadDataEx
                    0.34%  549.70us         1  549.70us  549.70us  549.70us  cuMemAlloc
                    0.26%  424.40us         1  424.40us  424.40us  424.40us  cuLinkAddData
                    0.07%  113.10us         1  113.10us  113.10us  113.10us  cuLinkComplete
                    0.03%  50.100us         1  50.100us  50.100us  50.100us  cuLinkCreate
                    0.02%  35.600us         1  35.600us  35.600us  35.600us  cuLaunchKernel
                    0.01%  19.100us         1  19.100us  19.100us  19.100us  cuMemGetInfo
                    0.01%  14.700us         1  14.700us  14.700us  14.700us  cuDeviceTotalMem
                    0.01%  14.000us       101     138ns       0ns     900ns  cuDeviceGetAttribute
                    0.00%  6.4000us         1  6.4000us  6.4000us  6.4000us  cuDeviceGetPCIBusId
                    0.00%  5.4000us         7     771ns     100ns  1.6000us  cuCtxGetCurrent
                    0.00%  4.0000us         1  4.0000us  4.0000us  4.0000us  cuInit
                    0.00%  3.5000us         1  3.5000us  3.5000us  3.5000us  cuModuleGetFunction
                    0.00%  1.7000us         2     850ns     700ns  1.0000us  cuDeviceGetName
                    0.00%  1.6000us         1  1.6000us  1.6000us  1.6000us  cuLinkDestroy
                    0.00%  1.5000us         3     500ns     100ns  1.2000us  cuDeviceGet
                    0.00%  1.5000us         5     300ns     200ns     500ns  cuCtxGetDevice
                    0.00%  1.5000us         5     300ns     100ns     700ns  cuFuncGetAttribute
                    0.00%  1.3000us         1  1.3000us  1.3000us  1.3000us  cuDeviceComputeCapability
                    0.00%  1.1000us         1  1.1000us  1.1000us  1.1000us  cudaRuntimeGetVersion
                    0.00%  1.0000us         4     250ns     200ns     300ns  cuDeviceGetCount
                    0.00%  1.0000us         1  1.0000us  1.0000us  1.0000us  cuCtxPushCurrent
                    0.00%     700ns         1     700ns     700ns     700ns  cuDriverGetVersion
                    0.00%     400ns         1     400ns     400ns     400ns  cuDeviceGetLuid
                    0.00%     200ns         1     200ns     200ns     200ns  cuDeviceGetUuid


Using shared memory for pixel color computations:

(base) C:\Users\Spencer\PycharmProjects\GPUAccelerating>nvprof python Spirals.py
==16624== NVPROF is profiling process 16624, command: python Spirals.py
[N]ew or [I]mport: N
Center:  (1297, 1438)
Symmetry:  8
Curve:  -27.0
Curve multiplier:  1
Max radius:  1983
Save file as: test.png
==16624== Profiling application: python Spirals.py
==16624== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   85.44%  12.767ms         1  12.767ms  12.767ms  12.767ms  cudapy::__main__::create_image_cuda$241(Array<unsigned char, int=2, C, mutable, aligned>, __int64, __int64, __int64, double, __int64, __int64)
                   14.56%  2.1755ms         1  2.1755ms  2.1755ms  2.1755ms  [CUDA memcpy DtoH]
      API calls:   83.40%  131.11ms         1  131.11ms  131.11ms  131.11ms  cuDevicePrimaryCtxRetain
                    9.85%  15.480ms         1  15.480ms  15.480ms  15.480ms  cuMemcpyDtoH
                    5.56%  8.7485ms         1  8.7485ms  8.7485ms  8.7485ms  cuLinkAddData
                    0.65%  1.0236ms         1  1.0236ms  1.0236ms  1.0236ms  cuModuleLoadDataEx
                    0.35%  555.90us         1  555.90us  555.90us  555.90us  cuMemAlloc
                    0.07%  105.80us         1  105.80us  105.80us  105.80us  cuLinkComplete
                    0.04%  66.900us         1  66.900us  66.900us  66.900us  cuLaunchKernel
                    0.03%  43.800us         1  43.800us  43.800us  43.800us  cuLinkCreate
                    0.01%  15.600us         1  15.600us  15.600us  15.600us  cuDeviceTotalMem
                    0.01%  15.100us         1  15.100us  15.100us  15.100us  cuMemGetInfo
                    0.01%  14.700us       101     145ns     100ns     800ns  cuDeviceGetAttribute
                    0.00%  6.0000us         7     857ns     200ns  1.8000us  cuCtxGetCurrent
                    0.00%  5.9000us         1  5.9000us  5.9000us  5.9000us  cuDeviceGetPCIBusId
                    0.00%  3.1000us         1  3.1000us  3.1000us  3.1000us  cuModuleGetFunction
                    0.00%  2.7000us         1  2.7000us  2.7000us  2.7000us  cuInit
                    0.00%  1.8000us         1  1.8000us  1.8000us  1.8000us  cuLinkDestroy
                    0.00%  1.5000us         5     300ns     200ns     500ns  cuCtxGetDevice
                    0.00%  1.4000us         4     350ns     200ns     500ns  cuDeviceGetCount
                    0.00%  1.4000us         2     700ns     600ns     800ns  cuDeviceGetName
                    0.00%  1.2000us         3     400ns     100ns     900ns  cuDeviceGet
                    0.00%  1.2000us         1  1.2000us  1.2000us  1.2000us  cuDeviceComputeCapability
                    0.00%  1.2000us         5     240ns     100ns     500ns  cuFuncGetAttribute
                    0.00%  1.1000us         1  1.1000us  1.1000us  1.1000us  cudaRuntimeGetVersion
                    0.00%     900ns         1     900ns     900ns     900ns  cuCtxPushCurrent
                    0.00%     500ns         1     500ns     500ns     500ns  cuDriverGetVersion
                    0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetLuid
                    0.00%     100ns         1     100ns     100ns     100ns  cuDeviceGetUuid