The dimension of the input matrix must be a multiple of the block size. The
block size can be found in lud_kernel.cu
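
As an illustration of the constraint above, the following is a minimal sketch
in C of how a dimension could be checked against the block size before the
kernel is launched. The helper names dim_is_valid and round_up_dim are
hypothetical and are not taken from lud_kernel.cu; the block size is passed in
as a parameter rather than read from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when a matrix_dim x matrix_dim matrix can be
 * tiled exactly by block_size x block_size blocks. */
static bool dim_is_valid(size_t matrix_dim, size_t block_size)
{
    return block_size != 0 && matrix_dim != 0 &&
           matrix_dim % block_size == 0;
}

/* Hypothetical helper: smallest multiple of block_size that is greater
 * than or equal to matrix_dim. */
static size_t round_up_dim(size_t matrix_dim, size_t block_size)
{
    assert(block_size != 0);
    return ((matrix_dim + block_size - 1) / block_size) * block_size;
}
```

For example, with a block size of 16, a dimension of 256 passes the check,
while 250 does not and would have to be rounded up to 256.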

******Adjustable work group size*****
The kernel work group is square.
RD_WG_SIZE_0 or RD_WG_SIZE_0_0 describes one dimension.
The actual work group size = RD_WG_SIZE_0 * RD_WG_SIZE_0
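
The relationship between the macro and the work group size can be sketched in
C as follows. This is an assumption about how such a macro is typically
consumed, not a copy of lud_kernel.cu: the precedence between RD_WG_SIZE_0_0
and RD_WG_SIZE_0, the fallback value of 16, and the names BLOCK_SIZE and
work_items_per_group are all illustrative.

```c
#include <assert.h>

/* Assumed precedence: the more specific macro wins if both are given. */
#if defined(RD_WG_SIZE_0_0)
#define BLOCK_SIZE (RD_WG_SIZE_0_0)
#elif defined(RD_WG_SIZE_0)
#define BLOCK_SIZE (RD_WG_SIZE_0)
#else
#define BLOCK_SIZE 16 /* assumed default when no -D flag is passed */
#endif

/* Hypothetical helper: the work group is square, so the number of work
 * items is one dimension multiplied by itself. */
static int work_items_per_group(void)
{
    assert(BLOCK_SIZE > 0);
    return BLOCK_SIZE * BLOCK_SIZE;
}
```

Under these assumptions, building with -DRD_WG_SIZE_0=16 gives a 16 x 16 work
group, that is, 256 work items.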

USAGE:
make clean
make KERNEL_DIM="-DRD_WG_SIZE_0=16"
