opencl - Fast algorithm for compacting a buffer -
i performing image compression.
the image i broken k code blocks {bi}.
each block has fixed size mxn pixels.
each block independently compressed.
all compressed blocks {ci}, compressed sizes {pi}, stored in linear buffer b, of size k * m, m fixed size greater sizes pi.
now, pack buffer b buffer c, , rid of empty space @ end of each compressed code block ci.
so, need kernel will:
- for each block ci, find sum of pk k < i, (call offset_i)
- copy data each ci, b c, @ offset_i, of size pi
any ideas on how appreciated!!
here code snippet, (i suppose) stream compaction. contains tons of arithmetic, can parallelized desired measure.
#include <time.h> #include <stdio.h> #include <stdlib.h> typedef struct block { int size; int buf[8]; } block; typedef struct blockpos { int t_size; //temporary size compaction int f_size; //actual size int pos; //position } blockpos; int main() { const int num_blocks = 16; block blocks[num_blocks]; blockpos pos[num_blocks]; srand(time(null)); (int = 0; < num_blocks; i++) { //every block has non-zero length, that's easier blocks[i].size = rand() % 7 + 1; printf("block %d len %d:\t", i, blocks[i].size); for(int j=0; j<blocks[i].size; j++){ //just make print easier blocks[i].buf[j] = rand() % 33; printf("%d, ", blocks[i].buf[j]); } printf("\n"); } for(int i=0; i<num_blocks; i++){ pos[i].f_size = blocks[i].size; pos[i].t_size = pos[i].f_size; pos[i].pos = 0; } int step = 2; /* @ every step reduce number of blocks, being processed, 2 times. * loop can't done in parallel. */ (int count = 1; count < num_blocks; count *= 2) { /* odd-numbered blocks compacting nearest left-side neighbour. * loop can done in parallel. */ (int = count; < num_blocks; += step) { int dif = pos[i].pos; pos[i].pos = pos[i - count].pos + pos[i - count].t_size; pos[i - count].t_size += pos[i].t_size; dif -= pos[i].pos; // "replace" compacted blocks (int j = i+1; count > 1 && j < i+count; j++) { pos[j].pos = pos[j-1].pos + pos[j-1].f_size; } } step *= 2; } printf("\npos,\tlen:\n"); for(int i=0; i<num_blocks; i++){ printf("%d,\t%d\n", pos[i].pos, pos[i].f_size); } printf("\n"); return 0; }
inner loop (line 54) may implemented opencl kernel until number of processed elements big enough. after have array of structures, each element show place compacted block. can done in parallel.