Setup DimGrid and DimBlock in CUDA

I am doing matrix multiplication in CUDA. The following setup works:

    int tile = 8;
    dim3 dimgrid((numccolumns - 1)/tile + 1, (numcrows - 1)/tile + 1, 1);
    dim3 dimblock(tile, tile, 1);

But if I use one block for the whole image, it returns zeros. What is the reason for that? I assumed one block could contain the whole image (the input is 64x64).

    dim3 dimgrid(1, 1, 1);
    dim3 dimblock(numccolumns, numcrows, 1);

This is how I call the kernel in the main function:

    matrixmultiply<<<dimgrid, dimblock>>>(devicea, deviceb, devicec,
                                          numarows, numacolumns,
                                          numbrows, numbcolumns,
                                          numcrows, numccolumns);

And this is the kernel:

    __global__ void matrixmultiply(float * a, float * b, float * c,
                                   int numarows, int numacolumns,
                                   int numbrows, int numbcolumns,
                                   int numcrows, int numccolumns) {
        //@@ Insert code to implement matrix multiplication here
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;

        // Skip threads that fall outside the output matrix
        if ((row < numcrows) && (col < numccolumns))
        {
            float value = 0.0f;
            for (int i = 0; i < numacolumns; i++)
                value += a[row * numacolumns + i] * b[i * numbcolumns + col];
            c[row * numccolumns + col] = value;
        }
    }

"But if I use one block for the whole image, it returns zeros. What is the reason for that?"

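A CUDA threadblock is limited to a maximum of 1024 threads (refer to "Maximum number of threads per block" in the compute capability table of the CUDA C Programming Guide). For a multidimensional threadblock, this means the product of the dimensions must be less than or equal to 1024 (for cc2.x and newer GPUs).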

So for a 64x64 image, this will not work:

    dim3 dimblock(numccolumns, numcrows, 1);

since numccolumns * numcrows = 64 * 64 = 4096, which is greater than 1024.
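To make the limit concrete, here is a minimal host-side sketch of a launch-configuration check. The helper names `blockfits` and `makegrid` are hypothetical (they are not from the question), and the default of 1024 assumes a cc2.x or newer GPU; the actual value can be read from `cudaDeviceProp::maxThreadsPerBlock`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original code): returns true if the
// total number of threads in the block stays within the per-block limit.
// The default of 1024 is an assumption that holds for cc2.x and newer GPUs.
inline bool blockfits(dim3 block, unsigned int maxthreads = 1024) {
    return block.x * block.y * block.z <= maxthreads;
}

// Hypothetical helper: builds a grid of tile x tile blocks that covers an
// output matrix of numcrows x numccolumns, using the same rounding-up
// division as the working setup in the question.
inline dim3 makegrid(int numccolumns, int numcrows, int tile) {
    return dim3((numccolumns - 1) / tile + 1, (numcrows - 1) / tile + 1, 1);
}
```

With these helpers, `blockfits(dim3(64, 64, 1))` is false, while `blockfits(dim3(8, 8, 1))` is true. For the 64x64 case, `tile = 32` gives the largest square block that still fits (32 * 32 = 1024 threads), and `makegrid(64, 64, 32)` then yields a 2x2 grid.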

If you do proper CUDA error checking in your code, you'll get an indication of this (that the kernel launch is failing due to an invalid kernel configuration parameter).
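One common way to do that checking is sketched below. It is an illustration only: the `CUDA_CHECK` macro and `emptykernel` are hypothetical names, and the kernel is a stand-in for `matrixmultiply`. The runtime calls `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: aborts with a message if a runtime call failed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Stand-in kernel; the launch configuration is what matters here.
__global__ void emptykernel() {}

int main() {
    dim3 dimgrid(1, 1, 1);
    dim3 dimblock(64, 64, 1);  // 4096 threads, above the 1024-thread limit

    emptykernel<<<dimgrid, dimblock>>>();

    // A kernel launch returns no status itself, so query it explicitly.
    CUDA_CHECK(cudaGetLastError());       // catches launch-time errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    printf("kernel completed\n");
    return 0;
}
```

With the oversized block, the first check should fail and report an invalid configuration error instead of letting the program continue silently with an output buffer that was never written. That would explain the zeros seen in the question.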
