Wednesday, March 6, 2019

OpenCL Notes

If you're interested in OpenCL, please watch the Houdini 16.5 Masterclass | OpenCL by Jeff Lait.
After that, I'm sure you'll find most of the stuff I write in this post redundant. I'm writing it mostly as a note to myself with code snippets (cause my memory sucks), and I thought you guys might find it useful as well.

On top of that, I want your opinion. Please correct me if I say or write something wrong, or if you know a better way. I'm really looking forward to learning.

I'll keep updating this same post with new code snippets periodically, so stay tuned.


OK, let's start with an example:
I want to move all the points in my geometry up, in the positive y direction, by 1 unit.

This is how you do it in VEX ...
  • drop a Point Wrangle node
  • connect your geometry to input 1
  • write the following code in the parameter VEXpression:

    @P.y += 1;

... now in OpenCL:

  • drop an OpenCL node
  • go to the Bindings tab and set it up as follows. The purpose of this tab is to make geometry attributes available to the GPU. Unlike in Wrangle nodes, to access an attribute you have to bind whatever attribute you want to read / write to a variable. In this case we bound the attribute P to the OpenCL variable P (I could have chosen a different name, like for instance 'ciccio').
    It's important to note that the OpenCL node by default will loop over every element of the first parameter appearing in the Bindings tab (as specified in the 'Options' tab). So, in this case, it will loop over P (just like a Point Wrangle)!
  • go to the Kernel tab and check the parameter 'Use Code Snippet'. This will allow you to write code directly on this node, as opposed to pointing the node to a file containing the code.
  • now press the button "Generate Kernel". A window will pop up with some code. Copy the content of this window, close it and paste it in the parameter 'Kernel Code'.
  • now modify the code as follows:

    kernel void KernelName ( int P_length, global float * P )
    {
        int idx = get_global_id(0);
        if (idx >= P_length)
            return;

        float3 pos = vload3(idx, P);
        pos.y += 1;
        vstore3 (pos, idx, P);
    }

You should see your geometry moving up by 1 unit.

Let's translate the OpenCL code above into English:

kernel void KernelName ( int P_length, global float * P )
Dear Houdini, please define a kernel function that I'll submit to the GPU if I press CTRL+Enter. Along with this function I'll deliver two pieces of data: one is a long list of numbers (float *) that lives in the GPU's global memory (global), gets filled from the input connection, and is called P (between me and you, this is a list of 3 numbers at a time, cause it's a list of vectors, but you don't have to worry about that for now). The other one is just an integer number (int), named P_length, and it will contain the length of the list P, so we know when we run out of elements.

{
OK, let's start looping over P using idx as index.

int idx = get_global_id(0);
if (idx >= P_length)
    return;
To start, kindly put the index of the data you're currently looping over in the variable idx. I know you'll be increasing idx at every iteration, but please, if you run out of elements to loop over (if (idx >= P_length) ...), just stop (return) so we don't get one of those annoying crashes.

float3 pos = vload3(idx, P);
... fetch the current vector P in a variable called pos ...

pos.y += 1;
... now take whatever number was in the component y of the vector pos, and add 1 to it. Then store it back into pos.

vstore3 (pos, idx, P);
... and finally replace the current P with the content of pos.

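}
... let's go to the next element in P and start a new loop iteration. (Strictly speaking the GPU doesn't loop: it runs many of these "iterations" at the same time, one work-item per element, but thinking of it as a loop is fine for now.)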
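
To make the walkthrough above a bit more concrete, here's a minimal CPU-side sketch in plain C of what the kernel does to the flat list P. This is not OpenCL and not what Houdini actually runs: load3, store3, move_up_work_item and launch_move_up are hypothetical stand-ins I made up for vload3, vstore3, the kernel body and the GPU scheduler. I'm also assuming the number of launched work-items gets rounded up to a multiple of the work-group size, which is the usual reason the idx >= P_length check is there.

```c
/* Hypothetical CPU stand-in for OpenCL's float3. */
typedef struct { float x, y, z; } vec3;

/* Stand-in for vload3(idx, P): read the 3 floats starting at slot 3 * idx. */
vec3 load3(int idx, const float *P)
{
    vec3 v = { P[3 * idx], P[3 * idx + 1], P[3 * idx + 2] };
    return v;
}

/* Stand-in for vstore3(v, idx, P): write the 3 floats back to slot 3 * idx. */
void store3(vec3 v, int idx, float *P)
{
    P[3 * idx]     = v.x;
    P[3 * idx + 1] = v.y;
    P[3 * idx + 2] = v.z;
}

/* Kernel body for ONE work-item; idx plays the role of get_global_id(0). */
void move_up_work_item(int idx, int P_length, float *P)
{
    if (idx >= P_length)
        return;                 /* extra work-items past the end do nothing */

    vec3 pos = load3(idx, P);
    pos.y += 1.0f;
    store3(pos, idx, P);
}

/* Stand-in for the GPU launch. It runs the work-items one after another,
   while a real GPU runs them in parallel. The rounding of global_size up to
   a multiple of group_size is an assumption about how the launch is sized. */
int launch_move_up(int P_length, float *P, int group_size)
{
    int global_size = ((P_length + group_size - 1) / group_size) * group_size;
    for (int idx = 0; idx < global_size; ++idx)
        move_up_work_item(idx, P_length, P);
    return global_size;
}
```

With 3 points and a group size of 8, this sketch launches 8 work-items; the last 5 hit the early return and never touch P.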


I want to put the number -100 in the 3rd element of the array attribute A attached to the point with @ptnum 14.


In OpenCL:

    A[ A_index[14] + 2 ] = -100;

Why?
Well, let's pretend we have this array point attribute named A:

in VEX the syntax would be
i[]@A

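and this is how we're used to seeing this data in the Geometry Spreadsheet:
ptnum 13 : [ 395, 302, 296, 396, 631, 614 ]
ptnum 14 : [ 397, 326, 396, 634, 618 ]
ptnum 15 : [ 649, 326, 396, ... ]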

in OpenCL the syntax would be:
global int * A
and the data structure looks like this:

mem loc 45 : 395  
mem loc 46 : 302
mem loc 47 : 296
mem loc 48 : 396
mem loc 49 : 631
mem loc 50 : 614
mem loc 51 : 397
mem loc 52 : 326
mem loc 53 : 396
mem loc 54 : 634
mem loc 55 : 618
mem loc 56 : 649
mem loc 57 : 326
mem loc 58 : 396

In other words, all the arrays are concatenated into one long array somewhere in GPU memory. For this reason we need an additional array that contains the index of the first element of each array, so we can rebuild the original data structure.

global int * A_index
idx 13 : mem loc 45
idx 14 : mem loc 51
idx 15 : mem loc 56

So, for instance, if we want to access the 3rd element of the array A[14] (which is the number 396), first off we need to find the memory location where that array starts:

int mem_loc = A_index[14];

and then step to its 3rd element. Indices start at 0, so the 3rd element sits 2 slots after mem_loc (51 + 2 = 53):

int element = A[mem_loc + 2];
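
Here's the same lookup as a small plain C sketch, using the numbers listed above. The helper names (fill_example, array_element, array_length) are made up for illustration. array_length also leans on an assumption: that A_index[pt + 1] is valid for the point you ask about, so the length is just the distance between two consecutive start indices.

```c
/* Fill A and A_index with the sample values from the text.
   A must hold at least 59 ints and A_index at least 16 ints. */
void fill_example(int *A, int *A_index)
{
    static const int values[14] = {
        395, 302, 296, 396, 631, 614,   /* array of point 13, starts at 45 */
        397, 326, 396, 634, 618,        /* array of point 14, starts at 51 */
        649, 326, 396                   /* start of point 15's array, at 56 */
    };
    for (int i = 0; i < 14; ++i)
        A[45 + i] = values[i];

    A_index[13] = 45;
    A_index[14] = 51;
    A_index[15] = 56;
}

/* Pointer to element k (0-based) of the array stored for point pt. */
int *array_element(int *A, const int *A_index, int pt, int k)
{
    return &A[A_index[pt] + k];
}

/* Length of point pt's array, ASSUMING A_index[pt + 1] exists. */
int array_length(const int *A_index, int pt)
{
    return A_index[pt + 1] - A_index[pt];
}
```

With this data, array_element(A, A_index, 14, 2) points at memory location 53, so reading through it gives 396 and writing -100 through it does what the example at the top of this section asks for.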


Let's say we have a vector grid called "vel".
First off, make sure this is a Volume Grid (not a VDB). (If you find a way to pass VDBs into OpenCL, I'd love to know how.)
In other words, we're going to need 3 float grids (vel.x, vel.y, vel.z).
In the example below, I chose to add the Voxel Resolution because I wanted to know whether or not I was outside the grid boundaries, but it's not strictly necessary.

When you import a volume grid, in this case one called 'vel.x', this will result in this code in the Kernel:

int velx_stride_x,
int velx_stride_y,
int velx_stride_z,
int velx_stride_offset,
global float * velx,
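
The way I read these parameters: the voxels live in one flat list of floats, and the strides say how many slots to jump for a step of one voxel along x, y or z, starting from the offset. Here's a tiny plain C sketch of that arithmetic. The struct grid_layout and the function voxel_slot are hypothetical names of mine, not something Houdini generates.

```c
/* Hypothetical bundle of the four layout values bound for one volume. */
typedef struct {
    int stride_x;
    int stride_y;
    int stride_z;
    int stride_offset;
} grid_layout;

/* Slot in the flat float buffer that holds voxel (x, y, z). */
int voxel_slot(grid_layout g, int x, int y, int z)
{
    return g.stride_offset
         + g.stride_x * x
         + g.stride_y * y
         + g.stride_z * z;
}
```

For a made-up dense 4 x 3 x 2 grid with no padding, the layout would be { 1, 4, 12, 0 } and voxel (3, 2, 1) would land on slot 23. The real strides and offset may well include padding, so always use the values Houdini passes in rather than computing your own.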

If you select "Voxel Resolution" for the grid 'vel.x', this will result in this code in the Kernel:

int velx_res_x,
int velx_res_y,
int velx_res_z,
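
This is the kind of check I use the resolution for, sketched in plain C. The function name voxel_in_bounds is hypothetical, and the three res arguments stand for velx_res_x, velx_res_y and velx_res_z.

```c
#include <stdbool.h>

/* True if voxel (x, y, z) exists in a grid of res_x * res_y * res_z voxels. */
bool voxel_in_bounds(int x, int y, int z, int res_x, int res_y, int res_z)
{
    return x >= 0 && x < res_x
        && y >= 0 && y < res_y
        && z >= 0 && z < res_z;
}
```

In a kernel you'd run a test like this before reading or writing a voxel, and skip (or clamp) when it fails.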

If you select "Volume Transform to Voxel" for the grid 'vel.x', this will result in this code in the Kernel:

float16 velx_xformtovoxel,
(note this is a 4x4 matrix)

Access voxels from a world position
If you want to find the 3 voxel indices starting from a world position P, here's how (this was kindly described in the Masterclass mentioned above).

// first off we need to fetch pos from the P attribute ....
float3 pos = vload3(idx,P);

// now this suuuuper weird formula to find the position in voxel space (it's just pos, as a row vector with w = 1, multiplied by the 4x4 matrix)
float4 voxelpos = pos.x * velx_xformtovoxel.lo.lo + pos.y * velx_xformtovoxel.lo.hi + pos.z * velx_xformtovoxel.hi.lo + 1 * velx_xformtovoxel.hi.hi;

// and of course the indices must be integers so ...
// NOTE : if you are not 100% sure P is within the grid boundaries, you'd better modify this code to make sure you're not trying to fetch a non-existing voxel (Houdini might very easily crash if you do). In this case let's pretend I'm sure P is within the volume boundaries.
int x = (int)(floor(voxelpos.x));
int y = (int)(floor(voxelpos.y));
int z = (int)(floor(voxelpos.z));

// and finally we can access the content of the voxel with indices (x,y,z)
float vx = velx[ velx_stride_offset + velx_stride_x * x + velx_stride_y * y + velx_stride_z * z ];
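
To demystify the weird formula a little, here's a plain C sketch of the same math. In OpenCL a float16 splits into four float4 chunks: .lo.lo is components 0-3, .lo.hi is 4-7, .hi.lo is 8-11 and .hi.hi is 12-15. The formula treats those chunks as the four rows of the matrix and multiplies the row vector (pos.x, pos.y, pos.z, 1) by it. The names vec4, world_to_voxel and floor_to_int are hypothetical, and the matrix in the usage note is made up.

```c
/* Hypothetical CPU stand-in for OpenCL's float4. */
typedef struct { float x, y, z, w; } vec4;

/* m holds the 16 floats of the float16 in order:
   m[0..3] = .lo.lo, m[4..7] = .lo.hi, m[8..11] = .hi.lo, m[12..15] = .hi.hi */
vec4 world_to_voxel(const float m[16], float px, float py, float pz)
{
    vec4 v;
    v.x = px * m[0] + py * m[4] + pz * m[8]  + m[12];
    v.y = px * m[1] + py * m[5] + pz * m[9]  + m[13];
    v.z = px * m[2] + py * m[6] + pz * m[10] + m[14];
    v.w = px * m[3] + py * m[7] + pz * m[11] + m[15];
    return v;
}

/* Same job as (int)floor(v) for values that fit in an int:
   rounds towards minus infinity, so -1.5 becomes -2 and not -1. */
int floor_to_int(float v)
{
    int i = (int)v;             /* the cast truncates towards zero */
    if ((float)i > v)
        --i;                    /* fix up negative non-integers */
    return i;
}
```

With a made-up matrix that scales by 2 and then translates by (1, 2, 3), the world position (0.75, 1, -2.25) ends up at (2.5, 4, -1.5) in voxel space, which gives the indices (2, 4, -2). That negative z index is exactly the kind of case the boundary check is for.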
