opencl - Fast algorithm for compacting a buffer -


i performing image compression.

the image i broken k code blocks {bi}.

each block has fixed size mxn pixels.

each block independently compressed.

all compressed blocks {ci}, compressed sizes {pi}, stored in linear buffer b, of size k * m, m fixed size greater sizes pi.

now, pack buffer b buffer c, , rid of empty space @ end of each compressed code block ci.

so, need kernel will:

  • for each block ci, find sum of pk k < i, (call offset_i)
  • copy data each ci, b c, @ offset_i, of size pi

any ideas on how appreciated!!

here code snippet, (i suppose) stream compaction. contains tons of arithmetic, can parallelized desired measure.

#include <time.h> #include <stdio.h> #include <stdlib.h>  typedef struct block {     int size;     int buf[8]; } block;  typedef struct blockpos {     int t_size; //temporary size compaction     int f_size; //actual size     int pos;    //position } blockpos;  int main() {     const int num_blocks = 16;     block blocks[num_blocks];     blockpos pos[num_blocks];      srand(time(null));     (int = 0; < num_blocks; i++) {         //every block has non-zero length, that's easier         blocks[i].size = rand() % 7 + 1;          printf("block %d len %d:\t", i, blocks[i].size);         for(int j=0; j<blocks[i].size; j++){             //just make print easier             blocks[i].buf[j] = rand() % 33;             printf("%d, ", blocks[i].buf[j]);         }         printf("\n");     }      for(int i=0; i<num_blocks; i++){         pos[i].f_size = blocks[i].size;         pos[i].t_size = pos[i].f_size;         pos[i].pos = 0;     }      int step = 2;     /* @ every step reduce number of blocks, being processed, 2 times.      * loop can't done in parallel. */     (int count = 1; count < num_blocks; count *= 2) {          /* odd-numbered blocks compacting nearest left-side neighbour.          * loop can done in parallel. */         (int = count; < num_blocks; += step) {             int dif = pos[i].pos;             pos[i].pos = pos[i - count].pos + pos[i - count].t_size;             pos[i - count].t_size += pos[i].t_size;             dif -= pos[i].pos;              // "replace" compacted blocks             (int j = i+1; count > 1 && j < i+count; j++) {                 pos[j].pos = pos[j-1].pos + pos[j-1].f_size;             }         }         step *= 2;     }      printf("\npos,\tlen:\n");     for(int i=0; i<num_blocks; i++){         printf("%d,\t%d\n", pos[i].pos, pos[i].f_size);     }      printf("\n");     return 0; } 

inner loop (line 54) may implemented opencl kernel until number of processed elements big enough. after have array of structures, each element show place compacted block. can done in parallel.


Popular posts from this blog