Tutorial: Compute Shaders & Procedural Meshes In Unity URP (Part 1)

Introduction

Hello! This is the first part of a multi-part tutorial series walking through the process of procedurally generating a mesh along a bezier curve. This is a super awesome and powerful design tool with a multitude of different uses. In particular, games with user building and design mechanics, like Planet Coaster, Astroneer, Satisfactory, and Cities Skylines, all use some form of bezier-based mesh generation for everything from roller coasters, cables, and roads to pipes and conveyor belts, and there are plenty of other applications across gaming.

Our goal here is to create a single custom renderer component and accompanying shader that can be used in as many use cases as possible, and along the way, learn some really powerful techniques in Unity. This tutorial is best for those with intermediate and above experience with Unity and a working knowledge of C#, HLSL, and vector/matrix math.

In this tutorial, we're going to start by quickly going over the equation we'll be using for constructing a Bezier curve, then we're going to use Unity Gizmos to help us see our curve and the coordinate system around it. Finally, we'll cover the basics of building and running a compute shader to actually do the work for us on the GPU.

The Theory

(Hey, if the next area already looks scary to you, don't worry, you can skip directly to the next part; I won't tell if you won't...)

Obviously, the first step in building our Bezier Curve Mesh Renderer is, well, making a bezier curve! If you'd like a comprehensive primer on Bezier curves, I cannot recommend enough Freya Holmer's The Beauty of Bezier Curves video. I'm not going to be deriving the equation for the curve, but this video is a more elegant exploration of every concept we're going to be using in this tutorial than I could come up with anyways.

On the most basic level, bezier curves can be used to define a path in 3D space. They can be chained together in order to create a longer and more complicated curve that can cross itself and even do loop de loops, and most importantly, the math driving them is also suited for high computational efficiency. While there are a couple of kinds of bezier curves out there, a cubic bezier curve is most suited to our needs. Cubic bezier curves are defined using 4 control points; two points for each end point of the curve.

This is one of many different equations used to calculate a bezier curve, where P₀ - P₃ are the control points and t is a value between 0 and 1 denoting where we are on the curve.
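For reference, the cubic bezier curve is most commonly written in its Bernstein polynomial form. This is the standard textbook statement; treating it as the form meant here is an assumption:

```latex
\[
B(t) = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) \, t^2 P_2 + t^3 P_3,
\qquad 0 \le t \le 1
\]
```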

Okay cool... let's all agree that rote equation is ugly though. How do we get from that to our use case? What's going to really help us is to break up this equation into a more elegant representation of its moving parts. Remember, our goal here is to identify the mathematical elements that we'll use to develop a compute shader. Let's think about what our variables are for a minute:

The value t is continuous here, but we're generating a mesh; there are only so many points we're going to actually get on this curve, and any given "vertex" on our generated mesh is only going to be associated with one t-value. That means the t values only need to be calculated when we change the granularity and, in turn, change the vertex count.
The P_n variables are control points. We could directly pipe those in from the editor, so they only need to be recalculated when we change the points.

So let's take this equation and turn it into a matrix operation. Why use a matrix? In HLSL and any other standard language you write your shaders in, you intrinsically have access to data types and operations for matrices, just like C# intrinsically has data types for numbers and strings. This is because GPU hardware specializes in performing exactly that kind of math efficiently.
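Expanding the polynomial and collecting powers of t gives one common matrix form. The ordering of the t vector (ascending powers) is an assumption here; reversing it simply reverses the rows of the constant matrix:

```latex
\[
B(t) =
\begin{bmatrix} 1 & t & t^2 & t^3 \end{bmatrix}
\begin{bmatrix}
 1 &  0 &  0 & 0 \\
-3 &  3 &  0 & 0 \\
 3 & -6 &  3 & 0 \\
-1 &  3 & -3 & 1
\end{bmatrix}
\begin{bmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{bmatrix}
\]
```

Multiplying the t vector by each column of the constant matrix recovers the four weights of the polynomial form: (1-t)³, 3(1-t)²t, 3(1-t)t², and t³.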

Even if you don't really know what this equation means, it still just looks a lot nicer from an aesthetic perspective, don't you think? But more importantly, this is exactly what we need! The first figure is a vector representing our t values. We'll need to calculate one t vector for each "segment" of our mesh (more on that in a bit). The last vector with our P_n control points might look like another vector, but remember, the P variables are actually points, so what you're actually looking at is another matrix where each point is really a row of 3 values, the (x,y,z) values of each point. If you multiply this all out, what you get is a single Vector3 representing our position on the curve at t.

Unity.Mathematics

Before we get started, we're going to import the Unity.Mathematics package from the Unity package manager to help us out.

But wait, doesn't Unity already have a Math Library?

Two, actually, three if you count Unity's Vector classes. Those classes are perfectly fine for most uses. However, as the documentation explains:

The main goal of this library is to provide a friendly Math API familiar to SIMD and graphic/shaders developers, using the well known float4, float3 types...etc. with all intrinsic functions provided by a static class math that can be imported easily into your C# program with using static Unity.Mathematics.math...

...After years of feedback and experience with the previous API, we believe that providing an API that is closer to the way graphics developers have been using math libraries should better help its adoption and the ease of its usage. HLSL / GLSL math library is a very well designed, well understood math library leading to greater consistency.

In other words, while Unity's built-in math classes introduce structs such as Vector3 and Vector4 along with their own math functions, shader languages use types like float3, float4, and matrices, and have nuanced differences in their intrinsic math functions. This package helps us by allowing us to use that naming scheme and reference shader-style intrinsic functions and types directly in C# code, which will make it a lot easier for us to prototype our code in C# before we put it into a shader, where it is more difficult to debug.

Bezier Mesh Renderer

Now, we can finally start building our script. Create a script where you want it, and name it whatever you like. I'm going to name my script BezierMeshRenderer, and don't forget to slap a "using static Unity.Mathematics.math;" at the top of the file so we can use our shader-style functions intrinsically. Let's start making our properties:
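A minimal sketch of what these fields might look like. The matrix values assume the t vector is ordered (1, t, t², t³); the field names follow the ones discussed in this section, and everything else is an assumption:

```csharp
using Unity.Mathematics;
using UnityEngine;
using static Unity.Mathematics.math;

public class BezierMeshRenderer : MonoBehaviour
{
    // Constant center matrix of the cubic bezier equation, assuming the
    // t vector is ordered (1, t, t^2, t^3). Scalars are listed row by row.
    private static readonly float4x4 BEZIER_MATRIX = new float4x4(
         1f,  0f,  0f, 0f,
        -3f,  3f,  0f, 0f,
         3f, -6f,  3f, 0f,
        -1f,  3f, -3f, 1f);

    // One (1, t, t^2, t^3) vector per sample along the curve.
    private float4[] _tValueSegmentArray;

    // Four control points stacked as rows: 4 rows x 3 columns (x, y, z).
    private float4x3 _controlPoints;
}
```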

Pretty straightforward. We have 3 fields that represent the 3 parts of the equation:

BEZIER_MATRIX is the center matrix of our curve equation. Since these values are constant, we can go ahead and define this as static readonly.
_tValueSegmentArray is an array of t value vectors for the first part of an equation. Why an array? Remember, we're generating a mesh in segments along this curve, so each vector in this array is going to represent a single value of t on our curve.
_controlPoints is the matrix on the right side of our equation. Notice we define it as a 4x3 matrix. The 3 columns are there because we're using points in 3D space, but if you have an eye for Linear Algebra, you might have noticed that any other number of columns would work here as well, and you'd be correct! You can use a 4x2 matrix if you wanted to work in 2D, and even a 4x4 matrix if you're a lunatic and wanted to work in 4D. For our needs, 3D is what we need here.

Additionally, we need to expose a couple more fields to the inspector. The variable curveSegmentCount will be how many segments we want to define our mesh with. A value of "1" would essentially be a straight line between the first and fourth control point, while any other value would evenly subdivide the t values along the curve. This value will also be used to define the length of our _tValueSegmentArray.

There's a couple different ways we could acquire our control points. The most straightforward and obvious way would be just to expose 4 points and just slide them into our _controlPoints matrix.

The way I choose to do it is to use two transform fields that we will use to acquire the first and fourth row of the points, and use each transform's forward direction and a length to derive the second and third control points, which we'll do in a minute, but first, let's create a new object in the scene and add our component, so we can see our results when we're ready.
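The inspector fields described above could be declared roughly like this. Only curveSegmentCount, startPoint, and endPoint are names used in this tutorial; the two spline length fields and the class name are hypothetical placeholders:

```csharp
using UnityEngine;

// Hypothetical fragment: only the inspector-facing fields are shown.
public class BezierMeshRendererFields : MonoBehaviour
{
    // Number of segments the curve is divided into.
    [Min(1)] public int curveSegmentCount = 8;

    // Supply the first and fourth control points.
    public Transform startPoint;
    public Transform endPoint;

    // Distance along each transform's forward axis to the inner control points
    // (hypothetical names).
    public float startSplineLength = 1f;
    public float endSplineLength = 1f;
}
```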

For testing purposes, the transforms I chose here are just two empty transforms called "A" and "B" that I set up underneath our bezier mesh renderer, but that doesn't have to be the case. I also decided to start with 8 segments and a spline length of 1 for each end of the curve.

If you take a peek at your scene view, you'll see... well, nothing. So let's change that.

Unity Gizmos

You've no doubt noticed that in Unity, the scene view can render special lines, geometry, and icons to help you visualize certain properties of the selected object. For example, selected cameras will show the camera's view frustum and clip planes, and selected colliders will use wireframes to show their enclosed volume. These are called Gizmos.

This gizmo uses a sun **icon** to show that it's a **directional light**, and its **color** represents the color of the light source. When selected, **line rays** are rendered to show what **direction the light is shining**.

You can actually make your own Gizmos using two methods available on all classes that inherit from MonoBehaviour: OnDrawGizmos() and OnDrawGizmosSelected(). As you might have guessed, OnDrawGizmos() renders on anything you can see in the scene view, and OnDrawGizmosSelected() only renders on objects when they're selected or are children of the selected object.

OnDrawGizmos() is particularly useful for objects that are not normally "clickable" in the scene view. Light sources and cameras, for example, don't actually render anything themselves in-game like MeshRenderers do, but you can still select them in the scene by clicking on their icons. OnDrawGizmosSelected() should be used to give more detailed information about the object when selected. After all, you don't want to overcrowd your scene view with a bunch of superfluous wireframes.

The Gizmos static class is used to actually draw things using these methods. You can draw textures, icons, lines, shapes, and more using this class. Check out the docs to see a more exhaustive list of methods.

OnDrawGizmos()

In our OnDrawGizmos(), we use a conditional to check if we actually have the two endpoints assigned. If we do, we draw an icon on each point to show where our curve will go. Otherwise, we draw a different icon to show that they need to be assigned. Gizmos.DrawIcon() also takes a string for the filename of the icon you want to draw. The method will look in the "Gizmos" folder in your assets directory for that file, and you will need to create that folder yourself if it doesn't already exist.
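As a rough sketch of that logic: the icon file names and the class name below are hypothetical, and drawing the "missing" icon at the component's own position is an assumption.

```csharp
using UnityEngine;

public class EndpointGizmoSketch : MonoBehaviour
{
    public Transform startPoint;
    public Transform endPoint;

    // Called by the editor for objects visible in the scene view.
    private void OnDrawGizmos()
    {
        if (startPoint != null && endPoint != null)
        {
            // Icon files are looked up in Assets/Gizmos (hypothetical file names).
            Gizmos.DrawIcon(startPoint.position, "BezierEndpoint.png", true);
            Gizmos.DrawIcon(endPoint.position, "BezierEndpoint.png", true);
        }
        else
        {
            // Flag the component itself so the missing reference is easy to spot.
            Gizmos.DrawIcon(transform.position, "BezierMissing.png", true);
        }
    }
}
```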

Let's take a look in the scene to see our work in action.

And if we unassign one of our end points:

OnDrawGizmosSelected()

Now let's move on to our OnDrawGizmosSelected():

We start by acquiring all 4 of our control points using the methodology we discussed and drawing simple icons for all of them. You'll notice the code uses an extra argument here. This is for coloring the icon when it is rendered. By setting this tint, you can mix color into the icon to further customize its use. The actual icon file is a white square, but we're going to color it black to represent the actual control points.

Next, let's connect all 4 points with lines, and then stop and see how it looks in the editor:

With these Gizmos, we can get a much better visual understanding of the relationship between our properties and our control points directly in the editor!

Now, let's actually render our curve segments. First, let's make a couple helper methods.
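A sketch of what the two helpers could look like. The parameter lists, the choice of segmentCount + 1 samples, and the sign used for the third control point are all assumptions; only the method names come from this tutorial:

```csharp
using Unity.Mathematics;
using static Unity.Mathematics.math;

public static class BezierHelpersSketch
{
    // Builds segmentCount + 1 evenly spaced samples, each stored as (1, t, t^2, t^3).
    public static float4[] BuildTValues(int segmentCount)
    {
        segmentCount = max(segmentCount, 1);
        var tValues = new float4[segmentCount + 1];
        for (int i = 0; i <= segmentCount; i++)
        {
            float t = i / (float)segmentCount;
            tValues[i] = float4(1f, t, t * t, t * t * t);
        }
        return tValues;
    }

    // Packs the four control points into a 4x3 matrix, one point per row.
    public static float4x3 BuildControlPoints(
        float3 start, float3 startForward, float startLength,
        float3 end, float3 endForward, float endLength)
    {
        float3 p0 = start;
        float3 p1 = start + startForward * startLength;
        float3 p2 = end - endForward * endLength; // assumed sign: the curve arrives along endForward
        float3 p3 = end;

        // float3x4 takes the points as columns; transposing turns them into rows.
        return transpose(float3x4(p0, p1, p2, p3));
    }
}
```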

Pretty straightforward. BuildTValues() gives us our tValue array, like we went over earlier, and BuildControlPoints() packs our set of control points into a matrix for us. Now we can finally draw the lines for our curve segments:
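A self-contained sketch of the drawing step, assuming a (1, t, t², t³) vector ordering, a spline length of 1, and a hypothetical class name. It inlines the control point and t value construction so it can be dropped into a scene on its own:

```csharp
using Unity.Mathematics;
using UnityEngine;
using static Unity.Mathematics.math;

public class CurveGizmoSketch : MonoBehaviour
{
    public Transform startPoint;
    public Transform endPoint;
    [Min(1)] public int curveSegmentCount = 8;

    private static readonly float4x4 BEZIER_MATRIX = new float4x4(
         1f,  0f,  0f, 0f,
        -3f,  3f,  0f, 0f,
         3f, -6f,  3f, 0f,
        -1f,  3f, -3f, 1f);

    private void OnDrawGizmosSelected()
    {
        if (startPoint == null || endPoint == null) return;

        // Local copies only: gizmo code should not write to the component's fields.
        int count = max(curveSegmentCount, 1);
        float3 p0 = startPoint.position;
        float3 p3 = endPoint.position;
        float3 p1 = p0 + (float3)startPoint.forward; // spline length of 1 assumed
        float3 p2 = p3 - (float3)endPoint.forward;   // assumed sign convention
        float4x3 controlPoints = transpose(float3x4(p0, p1, p2, p3));

        // First loop: evaluate the curve at each t value.
        float4x3 weighted = mul(BEZIER_MATRIX, controlPoints);
        var points = new float3[count + 1];
        for (int i = 0; i <= count; i++)
        {
            float t = i / (float)count;
            points[i] = mul(float4(1f, t, t * t, t * t * t), weighted);
        }

        // Second loop: connect consecutive points with lines.
        Gizmos.color = Color.white;
        for (int i = 0; i < count; i++)
            Gizmos.DrawLine(points[i], points[i + 1]);
    }
}
```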

This code is a bit heavier, so let's follow it step by step.

First, we use our helper methods from above to build our tVector array and our control points matrix.

But why did we create brand new variables instead of using the _tValueSegmentArray and _controlPoints fields we already have in the class? Remember that we're running this code in OnDrawGizmosSelected(). This method is only called by the editor, and as we've already seen, it doesn't need the scene to be running for this method to be executed, so we can't be sure that those fields are actually assigned. We could assign them directly in OnDrawGizmosSelected(), but that's heavily discouraged. OnDrawGizmosSelected() could run in play mode too, and editing values directly could interfere with running game code. For this reason, it's generally best to write DrawGizmo code that treats all class properties as if they are read only and possibly null. Don't worry, performance in DrawGizmo code is not super critical, because this code will never be executed in a shipped project.

Then, we run the matrix calculations we have been building up to all this time. It might look a little bit anticlimactic, but this single line of code is where all of the magic happens, and where most of this tutorial will iterate on. The method mul() here comes from our Unity.Mathematics package we imported earlier, and corresponds directly to the HLSL intrinsic function of the same name. You can think of it as a sort of one-size-fits-all multiplier method, and can multiply any combination of vectors, matrices, and (in C#) even quaternions. Out of this loop, we're creating an array of our points on the curve.

Finally, we run through another loop, but this time, we use the points we generated to draw more lines and icons.

You might have noticed that using two for() loops is probably not the most efficient way to do this. Indeed, we could do the exact same thing faster by combining the loops, but hold that thought, because as you'll see in the next part, that first loop is going to be entirely replaced with a compute shader. For now, let's pop back into the editor and see the fruits of our labor:

Awesome! Now you can really start playing with the properties and see how they affect our curve. We can even see how, as we adjust our segment count, the granularity of our curve changes:

Building a Coordinate System

The next question is: how do we build a mesh around this curve? After all, meshes are sets of points that need coordinates in 3D, so a single curve isn't enough. Let's build a coordinate system on our curve.

A Coordinate System allows us to navigate along the curve in 3D space. In other words, we need to know, at every point on the curve, which way is forward, up, and right from that point.

To do this, we're going to create a struct that will hold our curve points we're already calculating, and also unit tangent (forward), up, and right vectors that we will use as a basis for our coordinate system.
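A minimal sketch of such a struct. The name CurveFrame and the use of float3 are assumptions; the indexer simply walks along the three axes from the frame's position:

```csharp
using Unity.Mathematics;

// Hypothetical name: one sample of the curve plus its local axes.
public readonly struct CurveFrame
{
    public readonly float3 position;
    public readonly float3 tangent; // forward
    public readonly float3 up;
    public readonly float3 right;

    public CurveFrame(float3 position, float3 tangent, float3 up, float3 right)
    {
        this.position = position;
        this.tangent = tangent;
        this.up = up;
        this.right = right;
    }

    // frame[x, y, z] converts frame-local coordinates into a 3D point.
    public float3 this[float x, float y, float z]
        => position + right * x + up * y + tangent * z;
}
```

With this layout, frame[1f, 0f, 0f] is the point one unit to the right of the curve, and any point with z equal to 0 lies on the plane perpendicular to the tangent.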

This struct has 4 fields representing the variables we just talked about, and a [] style indexer that takes 3 float values and gives us back a 3D point in that frame's coordinate system. Let's also create a method for drawing our frame as a gizmo.

In this method you can see our indexer at work. The points are all at z == 0, meaning they're all lying on the normal plane to the curve at that point.

A single frame. The lines jut out 2m, with the halfway point marked with a shorter line at 1m. Also notice that the green line is aligned with the "up" direction and the red is aligned with the "right" direction, just like Unity does.


Now that we have our struct we're going to need to generate the frames.

Getting the Tangent

We can get the tangent just like we do in 2D; by taking the derivative of the curve, and normalizing it. Luckily for us, we can do that by simply using the same equation but slotting in a different constant matrix!
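As a sketch of that idea: differentiating the four bezier weights with respect to t gives a second constant matrix. The values below assume the same (1, t, t², t³) ordering, and the class and method names are hypothetical:

```csharp
using Unity.Mathematics;
using static Unity.Mathematics.math;

public static class BezierTangentSketch
{
    // Derivative of the cubic bezier basis for a (1, t, t^2, t^3) vector.
    // The last row is zero because the derivative is only quadratic in t.
    private static readonly float4x4 BEZIER_DERIVATIVE_MATRIX = new float4x4(
        -3f,   3f,  0f, 0f,
         6f, -12f,  6f, 0f,
        -3f,   9f, -9f, 3f,
         0f,   0f,  0f, 0f);

    // controlPoints holds P0..P3 as rows, as in the position equation.
    public static float3 Tangent(float t, float4x3 controlPoints)
    {
        float4 tVector = float4(1f, t, t * t, t * t * t);
        float3 velocity = mul(tVector, mul(BEZIER_DERIVATIVE_MATRIX, controlPoints));

        // normalizesafe falls back to a default direction if the velocity is zero,
        // for example when neighbouring control points coincide.
        return normalizesafe(velocity, float3(0f, 0f, 1f));
    }
}
```

At t = 0 the unnormalized result is 3(P₁ - P₀), and at t = 1 it is 3(P₃ - P₂), which matches the usual endpoint tangents of a cubic bezier curve.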

Back in our curve generation code:

Getting the Normals

The problem with 3D normals on a line curve is that there's an infinite number of them at any point on the curve. Luckily, once we have one, we can use the cross product to find another. But which one do we start with?

In my research on the web, generally these are the most common options that are well documented:

Take the tangent and cross it with the universal "right" direction (Vector3.right) to get an "up" direction that is perpendicular to the tangent, then cross the new up with the tangent to get the corresponding "right" vector. This is sufficient for all of the use cases where the mesh is mostly facing up, like roads and conveyor belts, but obviously this method breaks if you try to go upside down.
The same process as above, except instead of using a fixed universal direction, take the second derivative, acceleration, and cross that with the tangent to get the other vector in the frame. This is more flexible, but unfortunately, this causes problems with some curves because if the curve changes the direction of curvature, like in the case of an "s" curve, the acceleration vector also inverts, causing the frame to suddenly "flip" around the curve. What we want are frames that smoothly rotate orientations. This can be solved using advanced techniques like Rotation Minimizing Frames, but that algorithm computes each frame from the previous one and so cannot be parallelized, making it a poor fit for something we want to power through on the GPU.


I also don't like these methods because they don't take into account the rotation at the ends of the curve, which is something I don't think I've ever seen documented in my research, so let's pull it off!

I'm going to introduce you to a third option, and it's going to sound alarming; we're going to calculate a related curve in quaternion space. Okay, hear me out: that might sound like the literal definition of hell if you're a person with any experience using quaternions, but I promise, it's actually not as painful as you think it is.

The naive approach to pull this off would be to treat the quaternions as 4D points and just use them like control points. That works with many setups, but it's very unstable, and can often create rotations that don't fit on our curve at all. The problem with that approach is that quaternion space is non-euclidean; the entire set of unit quaternions (the ones that represent true rotation) actually lie on a 4D hypersphere, so we'll have to do this one a little different.

Let's make another quick helper method under BuildControlPoints()
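A sketch of such a helper, using Unity's built-in Quaternion.Slerp. The class and method names are hypothetical:

```csharp
using UnityEngine;

public static class QuaternionBezierSketch
{
    // De Casteljau's algorithm with Slerp in place of Lerp.
    public static Quaternion Evaluate(
        Quaternion q0, Quaternion q1, Quaternion q2, Quaternion q3, float t)
    {
        // First pass: 4 control rotations become 3.
        Quaternion a = Quaternion.Slerp(q0, q1, t);
        Quaternion b = Quaternion.Slerp(q1, q2, t);
        Quaternion c = Quaternion.Slerp(q2, q3, t);

        // Second pass: 3 become 2.
        Quaternion d = Quaternion.Slerp(a, b, t);
        Quaternion e = Quaternion.Slerp(b, c, t);

        // Final pass: 2 become the interpolated rotation.
        return Quaternion.Slerp(d, e, t);
    }
}
```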

This method might look familiar to you: this is similar to De Casteljau's Algorithm for a bezier curve, except instead of using lerp, we're using a similar interpolation function called slerp, which stands for "spherical linear interpolation." That might sound like an oxymoron, but it's not. A slerp is just like a lerp, it's a straight line. The difference is that lerp can draw a straight line on a flat, euclidean surface, while a slerp can draw a "straight" line on the curved surface of a sphere.

It should be noted that the bezier curve function is not designed for use in non-euclidean (non-affine) space, so in reality, this curve does not meet some of the extra requirements for this to be a real bezier curve, but it's a close enough approximation that works for our needs in this application.

Unity provides an implementation of slerp on the standard Quaternion class as a way to interpolate between Quaternions, and it works just fine with De Casteljau's Bezier Curve Algorithm. The one downside is that, unlike lerp, slerp doesn't have a cool, closed-form matrix style representation that we could use to boil down the entire curve generation to one or two matrix multiplications like we did with the points, but this still works.

Now that we've finally got that out of the way, let's take a look at how we can use this:
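One way to turn an interpolated rotation and a unit tangent into frame axes, sketched under the assumption of Unity's left-handed convention (right = up × forward). The names are hypothetical:

```csharp
using UnityEngine;

public static class FrameAxesSketch
{
    // Derives orthonormal up/right axes from a unit tangent and an interpolated rotation.
    public static void BuildAxes(Vector3 tangent, Quaternion rotation,
                                 out Vector3 up, out Vector3 right)
    {
        // The rotation's up axis is only approximately perpendicular to the tangent.
        Vector3 roughUp = rotation * Vector3.up;

        // Crossing with the tangent gives a right vector that is exactly perpendicular to it.
        right = Vector3.Cross(roughUp, tangent).normalized;

        // A second cross product re-orthogonalizes up against the tangent.
        up = Vector3.Cross(tangent, right);
    }
}
```

If roughUp ever lines up with the tangent, the first cross product collapses to zero, so the control rotations need to keep the two reasonably far apart.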

Perfect! We can now calculate the up vector using our bezier interpolated quaternion, and use that to find the right vector by crossing it with the tangent.

But we're still missing something; to generate a bezier curve, we needed 4 control points to help guide the line from start to end. Likewise, we need 4 control rotations to guide the rotation of each frame from the start rotation to the end rotation. Additionally, this "rotation" curve needs to pick rotations that are oriented correctly to the tangent of the curve line at each point (or, at the very least, are close enough that we can correct the frame with cross products).

The Control Rotations

In this tutorial, we started with 2 points, and extruded those points using forward vectors to find the other 2 points. As I said before, I want this to be a system that takes into account the endpoint rotations as well, so we can just take the startPoint and endPoint rotations for our first and fourth control rotation just like we did with the points, but that leaves us to figure out the second and third control rotations. This is another place where you could go your own way; while some setups are better than others, I've never found a one-size fits all solution.

For example, maybe you're making a rollercoaster sim and you want to provide the ability to apply banking angles to the curve. Your best bet would be something like this:
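A minimal sketch of one way to do that, assuming the banking is applied as a roll about each end rotation's own forward axis and that bankingAngle is in degrees. The class and method names are hypothetical:

```csharp
using UnityEngine;

public static class BankingSketch
{
    // Returns the second and third control rotations, rolled by bankingAngle degrees.
    public static (Quaternion second, Quaternion third) BankedControlRotations(
        Quaternion start, Quaternion end, float bankingAngle)
    {
        Quaternion roll = Quaternion.AngleAxis(bankingAngle, Vector3.forward);

        // Right-multiplying applies the roll in each rotation's local space,
        // so the forward direction of start and end is preserved.
        return (start * roll, end * roll);
    }
}
```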

You can then expose bankingAngle in the inspector, allowing you to bank the angle of the mesh without disturbing the start and end rotations!

The tall green and red lines are the "up" and "right" directions for each control rotation.

Another option I like is using Quaternion.LookRotation() to orient each rotation in the direction of the next spline, and the up direction to a vector lerp between the start- and end- up directions at 1/3 and 2/3 the way respectively. I find that this setup yields the most stable results for our curve.

To help manage these options, let's make an enum that we can expose to the inspector.

Finally, let's create the BuildControlRotations() method. Let's have the method take the control points instead, and then, using the rotationMode currently selected in the inspector, generate the control rotations using varying methodologies and output them via a 4-tuple.
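A sketch of how the enum and method might fit together. The enum member names, the parameter list, and the exact formulas for each mode are assumptions based on the two options described above:

```csharp
using UnityEngine;

// Hypothetical mode names.
public enum RotationMode { Banking, LookAtNext }

public static class ControlRotationSketch
{
    public static (Quaternion, Quaternion, Quaternion, Quaternion) BuildControlRotations(
        Vector3 p1, Vector3 p2, Vector3 p3,
        Quaternion start, Quaternion end,
        RotationMode mode, float bankingAngle)
    {
        switch (mode)
        {
            case RotationMode.LookAtNext:
            {
                // Face along the next spline; blend the up hint at 1/3 and 2/3 of the way.
                Vector3 startUp = start * Vector3.up;
                Vector3 endUp = end * Vector3.up;
                Quaternion q1 = Quaternion.LookRotation(
                    p2 - p1, Vector3.Lerp(startUp, endUp, 1f / 3f));
                Quaternion q2 = Quaternion.LookRotation(
                    p3 - p2, Vector3.Lerp(startUp, endUp, 2f / 3f));
                return (start, q1, q2, end);
            }
            default:
            {
                // Banking: roll the inner rotations about their local forward axis.
                Quaternion roll = Quaternion.AngleAxis(bankingAngle, Vector3.forward);
                return (start, start * roll, end * roll, end);
            }
        }
    }
}
```

Note that Quaternion.LookRotation needs a non-zero direction, so the LookAtNext mode assumes the second, third, and fourth control points do not coincide.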

Feel free to experiment with these modes in the scene, and add your own too!

Getting Started With Compute Shaders


For most of the history of GPUs, their only purpose was rendering graphics on a display, usually using vertex and fragment shaders. They do this using mass parallelization, allowing shaders to execute operations on thousands or millions of vertices, pixels, strips, triangles, etc. all at once. Compute shaders allow us to take advantage of this mass parallelization for arbitrary uses, like calculating curves.

Compute shaders are particularly useful in game design when we want to do extra work on the GPU that normal shaders can't accommodate. Every application with shaders will involve uploading data from the CPU to the GPU, and so most architectures are designed to do that efficiently. However, reading back that data takes more time, somewhere in the neighborhood of 2-3ms. It's for this reason that you almost never want to give work to the GPU where you'll need to read it back on a frame by frame basis outside of testing and development purposes (e.g. OnDrawGizmos()).

In Unity, there are 6 steps to setting up and running a compute shader:

Find the kernel(s). In the most simple terms, a kernel is like a "method" in the compute shader code that you can dispatch execution from C#. Since C# and shader code are not linked, C# needs to know where the method is on the GPU so it knows what to command it to do. Unity makes this pretty easy: you can simply call FindKernel() on the compute shader object and pass it the string name of the kernel.
Create your compute buffers. Buffer is just a fancy word for array, usually used when referring to lower level hardware code. A ComputeBuffer object represents a reference to a compute buffer on your GPU. Unlike an array in native C#, you can't access its elements without using storage that's on the CPU first, which is why compute buffers have GetData() and SetData() methods that both take an array reference.
Bind your compute buffers. Creating a compute buffer object in C# doesn't magically tell your compute kernel where it is. Like kernels, you first need to find the reference to the compute shader variable from C# using its string name, this time using Shader.PropertyToID() to actually get the ID, then use that ID to bind the buffer. Buffers are bound per kernel, meaning that if you use more than one kernel, you'll need to bind your buffers to each one individually.
Upload everything else that isn't a buffer. This is a lot like binding your buffers. You'll need Shader.PropertyToID() again, then use the ID to set the variable using one of the other Set methods on the compute shader. This time you don't need the kernel though. Buffers and textures are the only things that need to be set on a per kernel basis.
Dispatch your shader. Executing the Dispatch() method tells your GPU to execute the shader. It takes 4 arguments: the kernel ID we got in step 1, and three values representing the number of thread groups in each of the 3 dimensions. We'll talk more about thread groups in the next section.
Dispose of our buffers when we're done with them. This needs to be done at some point before they go out of scope. Unity's console will give us a warning if we forget to do this.
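The steps above can be sketched in C# as follows. The asset name, kernel name, and variable names are hypothetical placeholders, and the dispatch size assumes the kernel is declared with numthreads(32, 1, 1):

```csharp
using UnityEngine;

public class ComputeDispatchSketch : MonoBehaviour
{
    // Hypothetical shader variable names.
    private static readonly int PointsId = Shader.PropertyToID("_Points");
    private static readonly int CountId = Shader.PropertyToID("_Count");

    private void Start()
    {
        const int count = 64;

        // Hypothetical asset at Resources/ExampleCompute.compute.
        var shader = Resources.Load<ComputeShader>("ExampleCompute");
        if (shader == null) return;

        // 1. Find the kernel by name.
        int kernel = shader.FindKernel("CSMain");

        // 2. Create the buffer: element count, then stride in bytes (3 floats per point).
        var buffer = new ComputeBuffer(count, sizeof(float) * 3);

        // 3. Bind the buffer to this specific kernel.
        shader.SetBuffer(kernel, PointsId, buffer);

        // 4. Upload everything else; no kernel index is needed here.
        shader.SetInt(CountId, count);

        // 5. Dispatch. The arguments are thread GROUP counts: 64 / 32 = 2 groups.
        shader.Dispatch(kernel, count / 32, 1, 1);

        // Reading back stalls until the GPU is done; fine for testing, slow per frame.
        var results = new Vector3[count];
        buffer.GetData(results);

        // 6. Release the GPU memory when we're done.
        buffer.Release();
    }
}
```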


Let's start by making our compute shader. First, find or create a folder called Resources in your project. Similar to how we can use the Gizmos folder to load icons easily, Unity Resources folders allow us to reference certain assets so we can find them easily using Resources.Load<>(). In this case, it's where we want to put our compute shader so we can load it in our code.

One more thing: for the next chunk of the tutorial, we're going to be switching back and forth between C# and HLSL. To help us keep this straight, I'm going to use a dark background for code snippets in C#, and a light background for the code in HLSL.

After you create a compute shader in a Resources folder, open it up and paste this code to start.

First, we declare the kernel using the #pragma preprocessor directive.

Preprocessor directives are special commands to the compiler denoted using the hash symbol (#), and we'll dive into more of these. #pragma is a general purpose directive used to communicate special parameters to certain compilers. Generally, pragma directives are look-don't-touch kind of code unless you're writing your own shader compiler like, for example, Unity did. In this case, Unity's compute shader compiler uses the kernel pragma directive to look for the kernel with a name that matches the token, like ComputeCurveFrames. This is used by Unity in the background when we call "FindKernel()" in C#.

If you've never played with compute shaders before, the kernel declaration can be a bit disorienting, and it doesn't help that shader syntax is not documented well. First, we have to talk about numthreads, because Microsoft sure doesn't know how. Consider this horrendous diagram:

**"See? Simple"** said the Microsoft engineer as he hit send on this aggressively perplexing image to Microsoft's Technical Writing Department.

The above image can be found in Microsoft's HLSL docs on numthreads, and is used in a bunch of other places in the documentation on the subject. No, it's not just you; the image is incomprehensible to an almost comical degree, and in my opinion, constitutes some of the worst technical writing I've ever seen, and only serves to complicate the subject, so let's break this whole thing down in a way that is hopefully a lot simpler.

A Simpler Hypothetical

Consider this hypothetical kernel declaration below as a starting point.

This hypothetical compute shader, which won't actually compile, has 2 kernels and 1 method, called by both of the kernels. The kernel MakePoint() is a compute kernel that makes a point, and the kernel MakeTriangle() is a compute kernel that, you guessed it, makes a triangle. Don't bother yourself with how MakePointFromIndex() works, just know that it does.

You might have noticed though that the kernels actually execute the exact same code. So what's the difference?

The numthreads attribute above each kernel tells us how many times this kernel executes per thread group. The numthreads above MakeTriangle() says that when it's called, it executes 3 different iterations on 3 separate threads simultaneously, each receiving an incremented value of i: 0, 1, and 2. In this way, compute kernels can be understood as a multithreaded micro-encapsulation of a single for() loop.

So now, what if I want to make an entire mesh out of triangles using this kernel? Let's talk about thread groups. Next, consider this hypothetical call to MakeTriangle() from C# via the Dispatch() method.
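A sketch of what that hypothetical call could look like, assuming a ComputeShader asset that actually contained a MakeTriangle kernel. The wrapper class and method are placeholders:

```csharp
using UnityEngine;

public static class MakeTriangleDispatchSketch
{
    public static void Run(ComputeShader shader, int triangleCount)
    {
        int kernel = shader.FindKernel("MakeTriangle");

        // One thread group per triangle; each group runs the 3 threads
        // declared by numthreads on the kernel.
        shader.Dispatch(kernel, triangleCount, 1, 1);
    }
}
```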

As mentioned before, Dispatch() is called on the compute shader object, and requires the kernel ID and the number of thread groups. Since MakeTriangle() is only capable of making points in groups of 3, the thread group count here should be the number of triangles we want to make.

Hopefully, it's starting to become clear how this all fits together. The first mistake that people make is they try to set the numthreads to as many times as they think they'll need it to execute, then call Dispatch() with only 1's as their thread group count(s). It's really important to understand that numthreads should be seen as a fundamental part of the kernel definition, NOT simply a declaration of how many times you want it to execute. A single thread of MakeTriangle() does the exact same thing as a single thread of MakePoint(), but because we declared numthreads(3), MakeTriangle() is only capable of making points in groups of 3. So it makes triangles.

But WAIT A MINUTE:

What does the variable i do in the kernel during this execution then? As mentioned before, it iterates up for each thread in the thread group, of which we already established there are 3. But it's the only argument here, so this call would just make the exact same triangle a "triangleCount" number of times.

How would we use this kernel to make different triangles in the dispatch call? Let's make a small modification to the kernel declaration:

This new declaration showcases another poorly understood piece of shader syntax: system value semantics.

A semantic is a short little descriptor that comes after a variable or kernel/method declaration and conveys extra information about it. Semantics are denoted using the colon (:) symbol. These can be really difficult to wrap your head around coming from higher level languages because there's no equivalent for this in languages like C#. Just understand that they're context specific pieces of shader syntax that let the program know what to do with certain things, and they're all over the place. They're usually only usable on certain variables, declarations, shader stages, etc., and like #pragma, you generally don't make your own unless you're in deep. You can read about all the various semantics here. Trust me, we'll see a lot more of these in the next part.

System value semantics, all of which are denoted by the prefix SV_, are semantics that indicate variables and methods that are filled and called by the system itself. As you might have guessed, this is actually how i gets its value, because kernels aren't called in a traditional call stack the way normal methods are, so there's actually no other way for the application to fill these arguments.

Here, we use 2 system value semantics: SV_GroupID and SV_GroupThreadID.

SV_GroupID gives us the index of the thread group we are in. In our simplified example, this id gives us the index of the triangle. In other words, this value will be anywhere from 0 to triangleCount - 1, one less than whatever size we specified when we called Dispatch().
SV_GroupThreadID gives us the index of the thread within the thread group. In our simplified example, this id gives us the index of the point of the triangle. In other words, this value will be anywhere from 0 to 2, one less than whatever size we specified when we declared numthreads for this kernel.

There's also SV_DispatchThreadID.

SV_DispatchThreadID denotes the total accumulated group thread index over all groups. In our simplified example, this id gives us the index of the specific point within the entire dispatch call. In other words, this value will be anywhere from 0 to (3 * triangleCount) - 1, where 3 is the size we specified when we declared numthreads for this kernel and triangleCount is the size we specified when we called Dispatch().

All of these are optional in the kernel declaration, and can be put in any order you like, but obviously you need at least one of them to do anything useful in the shader. In this way, you can see how compute shaders can be used not only like for() loops, but also like nested for() loops.
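To make the nested for() loop comparison concrete, here's a rough CPU-side sketch in plain C# that lists the three ids a 1D dispatch would hand out. The class and member names are made up for illustration, and the loops only show which values exist; the GPU runs the threads in parallel, not in this order.

```csharp
using System.Collections.Generic;

// CPU-side stand-in for a 1D dispatch. All names here are illustrative only.
public struct ThreadIds
{
    public int GroupId;          // what SV_GroupID.x would hold
    public int GroupThreadId;    // what SV_GroupThreadID.x would hold
    public int DispatchThreadId; // what SV_DispatchThreadID.x would hold
}

public static class ThreadIdSketch
{
    public static List<ThreadIds> Enumerate(int groupCount, int threadsPerGroup)
    {
        var ids = new List<ThreadIds>(groupCount * threadsPerGroup);

        // Outer loop: one pass per thread group (the size passed to Dispatch()).
        for (int group = 0; group < groupCount; group++)
        {
            // Inner loop: one pass per thread in the group (the numthreads size).
            for (int thread = 0; thread < threadsPerGroup; thread++)
            {
                ids.Add(new ThreadIds
                {
                    GroupId = group,
                    GroupThreadId = thread,
                    DispatchThreadId = group * threadsPerGroup + thread
                });
            }
        }
        return ids;
    }
}
```

With 2 groups of 3 threads, the fifth entry (index 4) is the second point of the second triangle: group 1, group thread 1, dispatch thread 4.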

One more thing: internally, each thread group is executed in chunks of 32 threads, and you cannot have more than 1024 threads in a group. For this reason, you should try to make your numthreads a multiple of 32. In this example we didn't, so hypothetically, every thread group of 3 still occupies a full chunk of 32, which means a single call to MakeTriangle() wastes 29 iterations in practice.

The Real Compute Shader

So far, we've dealt with this in 1 dimension, but as we went over before, this is actually all 3 dimensional. Let's take a look at our real declaration again:

In our kernel, we're using SV_DispatchThreadID. So that means our id value denotes the total thread index over a single dispatch call.

Don't let the 3D nature of numthreads fool you. It's no different than when it was one dimensional; the extra dimensions are just there if you want to iterate through things in more than one dimension. Also notice that our iterator variable is a uint3 vector now, so the x, y, and z values of that vector correspond to the x, y, and z dimensions of that iterator. Maybe you want to perform an operation on a 2D texture, so you'd use 2 of the dimensions here, and if you were generating terrain or voxels in 3D, you would actually use all 3.

In our case, we only need 1 dimension here, so from here on out we can completely ignore the y and z values of our id index.

By looking at our numthreads, we can tell we only have 32 threads in this group. We could get away with just doing 1, but as mentioned before, that would waste 31 iterations for each thread group we call on this kernel.

BezierFrame Struct

Now let's start porting over our code from C#. First, we'll need to bring over our BezierFrame struct.

The syntax should look mostly familiar.

In order for us to work with these bezier frames between the CPU and GPU, we'll need to make sure the data crosses over correctly. Just to be sure it does, let's add this attribute to our BezierFrame struct in C#.

This says that our fields will map directly over based on the order they're actually defined in. Notice that the [] indexer and DrawGizmo() method can be disregarded.
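As a rough sketch of what that looks like, here's a plain C# struct marked with sequential layout, plus a quick way to check its size in bytes. The struct and field names below are placeholders I made up (the only thing taken from this tutorial is that a frame holds four float3 values), and in Unity you would use Vector3 instead of the little Float3Sketch stand-in.

```csharp
using System.Runtime.InteropServices;

// Stand-in for a float3 / Vector3 so this sketch compiles outside Unity.
[StructLayout(LayoutKind.Sequential)]
public struct Float3Sketch
{
    public float x, y, z;
}

// Sequential layout keeps the fields in memory in the order they are declared,
// which is the order the matching HLSL struct will read them in.
[StructLayout(LayoutKind.Sequential)]
public struct BezierFrameSketch
{
    public Float3Sketch position; // placeholder field names
    public Float3Sketch tangent;
    public Float3Sketch normal;
    public Float3Sketch binormal;
}

public static class FrameLayoutSketch
{
    // Size of one frame in bytes, as the marshaller sees it.
    public static int SizeInBytes()
    {
        return Marshal.SizeOf<BezierFrameSketch>();
    }
}
```

Four float3 values are 12 floats, so SizeInBytes() comes out to 48, which is also the stride a buffer of these frames needs.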

Fields

Notice our control points are stored in a float4x4 matrix instead of a float4x3. The reason is that, frustratingly, the SetMatrix() method we will be using on the C# end only takes 4x4 sized matrices as arguments. As a workaround, we can just modify our BuildControlPoints() method to set the last column to 0s in C# and just ignore it.

Then, let's create a #define preprocessor directive in our compute shader that we can use in place of our 4x4 control points matrix. A #define directive, sometimes referred to as a macro, tells the compiler to literally cut and paste the code that follows wherever the macro is used, prior to compilation. This is known as macro expansion, which you can read more about here.

What we're doing here is making a shortcut for us by taking the float4x4 and turning it into a float4x3, ignoring the last column of 0s we don't need, so it will fit into our mul() equation when we use it.

The "\" character at the end of a line indicates that the next line is also a part of the macro. They can be strung together to make a multi-line macro.

Finally, we'll also want to make a modification to our BuildControlRotations() method in C# to build a matrix instead of a 4-tuple, as the former is much easier to send over.

Buffers

Now we can define our buffers. These will be the fields we will bind the ComputeBuffer objects to in C#:

In HLSL, these are the two kinds of buffers we will almost always be working with. StructuredBuffer<> is a read only buffer in the compute shader. Obviously we can write to it in C#, but we can only read from it in HLSL. RWStructuredBuffer<> allows the compute shader to also write to the buffer.

Here, we define _TVectors as a StructuredBuffer<> because we only write to it in C#, while _CurveFrames is a RWStructuredBuffer<> because that's the buffer we're writing to in our shader.

Constants

Next are our constants, BEZIER_MATRIX and DERIVATIVE_MATRIX. For simplicity's sake, let's use macros here also.

Quaternion Helper Methods

Unfortunately, quaternion is not a native type in HLSL, so we'll need to make a couple of helper methods so we can work with them as float4 values. Luckily, there's already an excellent open source library for this you can find here. For simplicity's sake, we're going to take only the helper methods we need.

All this just allows us to use slerp() and mul() (called QMul() here) just like we did in C#.
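If you're curious what those two helpers boil down to, the math is the same on either side of the CPU/GPU boundary. Here's a minimal sketch in plain C# that treats a quaternion as nothing more than four floats, the same way a float4 would. This is not the library's code, and the Quat and QuatSketch names are made up for illustration.

```csharp
using System;

// A quaternion stored as four plain floats, like a float4 (x, y, z, w).
public struct Quat
{
    public float x, y, z, w;
    public Quat(float x, float y, float z, float w)
    {
        this.x = x; this.y = y; this.z = z; this.w = w;
    }
}

public static class QuatSketch
{
    // Hamilton product: the combined rotation applies b first, then a.
    public static Quat QMul(Quat a, Quat b)
    {
        return new Quat(
            a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
            a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z);
    }

    static float Dot(Quat a, Quat b)
    {
        return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    }

    static Quat Normalize(Quat q)
    {
        float inv = 1f / (float)Math.Sqrt(Dot(q, q));
        return new Quat(q.x * inv, q.y * inv, q.z * inv, q.w * inv);
    }

    // Spherical interpolation between two unit quaternions, t in [0, 1].
    public static Quat Slerp(Quat a, Quat b, float t)
    {
        float cosTheta = Dot(a, b);

        // q and -q are the same rotation; flip b so we take the shorter arc.
        if (cosTheta < 0f)
        {
            b = new Quat(-b.x, -b.y, -b.z, -b.w);
            cosTheta = -cosTheta;
        }

        // Nearly identical rotations: fall back to a normalized lerp
        // to avoid dividing by a sine that is close to zero.
        if (cosTheta > 0.9995f)
        {
            return Normalize(new Quat(
                a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
                a.z + t * (b.z - a.z), a.w + t * (b.w - a.w)));
        }

        float theta = (float)Math.Acos(cosTheta);
        float sinTheta = (float)Math.Sin(theta);
        float wa = (float)Math.Sin((1f - t) * theta) / sinTheta;
        float wb = (float)Math.Sin(t * theta) / sinTheta;
        return new Quat(
            wa * a.x + wb * b.x, wa * a.y + wb * b.y,
            wa * a.z + wb * b.z, wa * a.w + wb * b.w);
    }
}
```

A quick sanity check: slerping halfway between the identity and a 90 degree turn around z should give a 45 degree turn around z, and multiplying that 90 degree turn by itself should give a 180 degree turn.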

Now, we can port over QuaternionBezier() to our Shader. Remember that our quaternions are in matrix form, and that we can swizzle our matrix just like we can with vectors by stringing together _m{i}{j}, where {i} and {j} refer to the coordinate in the matrix.

Simplifying Our File

By now we have everything we should need to make our kernel, but first, let's make our file a lot simpler.

In the same Resources folder our compute shader is in, create a new file and name it BezierInclude.hlsl. We're going to take all of the code out of our compute shader except for our kernel and put it in this include file. Above our kernel in the compute shader, add in this line.

The #include preprocessor directive is what we use to reference this file in our code so we can use it. It's followed by the file name. Note that if the file were in a different folder, we would need to have the full directory from "Assets/..." in order for it to work. The naive way to interpret this command is to say it's analogous to the "using" in C#, but it's important to know that what this is actually doing is pasting the entire file into your code before compiling it. There's no such thing as symbolic linking in lower level languages like HLSL.

It really can't be stressed enough that this is recursive, because this can get you into trouble without you realizing it. Let's say you have two hlsl files, A.hlsl and B.hlsl, but A.hlsl has an include for B.hlsl and vice versa; what do you think happens if you included one of these files? For this reason, it's best practice to surround all of your code in every include file with these lines, replacing BEZIER_INCLUDED with a unique identifier for your library:

The #ifndef preprocessor directive means "if not defined." Everything between that line and the matching #endif preprocessor directive at the bottom of the file is only compiled with the code if BEZIER_INCLUDED isn't already defined, and within the #ifndef block, we define it. This guarantees that, at compile time, this file is only actually included the first time. All subsequent includes of the file are ignored because BEZIER_INCLUDED is already defined by then.

ComputeCurveFrames Kernel

Finally, we can make our kernel. All of the work leading up to this makes this most likely the easiest part, because we can finally port over the code we had in OnDrawGizmosSelected():

Notice again that we only use id.x. The y and z dimensions of our thread group don't matter for our use case.

Using Our Compute Shader

We're finally in the home stretch here for part 1. At this point, we don't need the curve frames loop we made in OnDrawGizmosSelected(), because we replaced it with a compute shader. That also means we don't need the constants, QuaternionBezier(), etc. in C#.

Like I said before, there are 5 major steps to actually running this shader, plus cleaning up after ourselves at the end. Before we do any of them, of course, we'll need to load our shader in OnDrawGizmosSelected(), and while we're at it, let's set up our variables here that we'll use to pass off to our shader:

Also, let's add our property IDs statically to our script for readability. Somewhere in our class:

1) Find the Kernel

This one should be pretty straightforward:

2) Create Compute Buffers

Next we need to create our buffers.

The ComputeBuffer constructor takes 2 arguments: the count and the stride. The count is the number of elements we expect to hold. Remember, our compute kernel calculates our curve in chunks of 32, so the count here is rounded up to the nearest multiple of 32. The stride is the size of each element in the buffer in bytes. You can use sizeof() here for readability, but note that sizeof() on a custom struct only works in an unsafe context, so for simplicity we're just going to use the total number of floats in our BezierFrame struct times the size of a float. Since it holds 4 float3's, that makes 12 floats, or 48 bytes.
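Here's that arithmetic as a small sketch in plain C#. The class and member names are my own placeholders; the only numbers taken from this tutorial are the 32 threads per group and the 12 floats per frame.

```csharp
// Helper math for sizing a buffer of curve frames. Names are illustrative only.
public static class BufferSizingSketch
{
    public const int ThreadsPerGroup = 32; // must match numthreads in the kernel
    public const int FloatsPerFrame = 12;  // four float3 values per frame

    // Stride: the size of one element in bytes (sizeof(float) is 4).
    public const int FrameStride = sizeof(float) * FloatsPerFrame;

    // Count: the number of elements, rounded up to the next multiple of 32
    // so every thread in the last group has a slot to write to.
    public static int PaddedCount(int elementCount)
    {
        int groups = (elementCount + ThreadsPerGroup - 1) / ThreadsPerGroup;
        return groups * ThreadsPerGroup;
    }
}
```

So a curve with 100 frames would get a buffer with a count of 128 and a stride of 48, while a curve with exactly 32 frames needs no padding at all.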

We do need to use SetData to actually populate our compute buffer with our backing array we have in C#. This is where we are actually passing data from the CPU to the GPU.

3) Bind Compute Buffers

These lines should also be pretty straightforward.

4) Upload Everything Else that isn't a Buffer

Pretty straightforward again. Notice this time that we don't need to specify a kernel to set these values.

5) Dispatch the Shader

Finally we can dispatch the shader. This is where we command our graphics hardware to execute our kernel.

Remember, our kernel only works in 1 dimension, so we can just set the y and z thread group counts to 1. Our x thread group count is how many chunks of 32 we want to execute using our compute shader.
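Working out that x value is just a ceiling division. A minimal sketch in plain C#, with a made-up helper name, assuming the group size you pass in matches the numthreads declared on the kernel:

```csharp
using System;

// Illustrative helper; not part of any Unity API.
public static class DispatchSizingSketch
{
    // Number of thread groups needed so that groups * threadsPerGroup
    // covers every element at least once.
    public static int ThreadGroupsX(int elementCount, int threadsPerGroup)
    {
        if (elementCount < 0)
            throw new ArgumentOutOfRangeException(nameof(elementCount));
        if (threadsPerGroup <= 0)
            throw new ArgumentOutOfRangeException(nameof(threadsPerGroup));

        // Integer ceiling of elementCount / threadsPerGroup.
        return (elementCount + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

For 100 frames at 32 threads per group this gives 4 groups, or 128 threads, with the last 28 doing work we throw away. If it ever returns 0 there's nothing to compute, so it's safest to skip the dispatch entirely in that case.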

6) Dispose of our buffers when we're done with them

We need to dispose of our buffers, but before we do that, we should read out our curve frames.

This is where we actually get data back from the GPU to the CPU. Remember that this is slower than sending data to the GPU, so it should be avoided in game loop code at all costs. Notice that the curveFrames array we're using to hold the values that come out of the buffer may actually be smaller than the buffer, because the count of the compute buffer is rounded up to the nearest multiple of 32. That's okay; all of the values that are out of this range will just be ignored, which is fine because we don't want them anyway.

Now we can dispose of our buffers.
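To see how all of the steps fit together in one place, here's a condensed sketch of the whole round trip. It's not the exact script from this tutorial: the helper class, the Frame stand-in struct, the "_ControlPoints" property name, and the assumption that each _TVectors element is a single float4 are all placeholders of mine. Only the kernel name and the two buffer names come from the sections above. It needs to live in a Unity project, since it leans on UnityEngine.

```csharp
using UnityEngine;

// Hypothetical helper that runs the curve frames kernel once and reads the result back.
public static class CurveFrameDispatchSketch
{
    const int ThreadsPerGroup = 32;              // must match numthreads in the kernel
    const int FrameStride = sizeof(float) * 12;  // four float3 values per frame
    const int TVectorStride = sizeof(float) * 4; // assumption: one float4 per sample

    static readonly int TVectorsId = Shader.PropertyToID("_TVectors");
    static readonly int CurveFramesId = Shader.PropertyToID("_CurveFrames");
    // "_ControlPoints" is a made-up property name for this sketch.
    static readonly int ControlPointsId = Shader.PropertyToID("_ControlPoints");

    // Stand-in for BezierFrame; the field names are placeholders.
    public struct Frame { public Vector3 a, b, c, d; }

    public static Frame[] Run(ComputeShader shader, Vector4[] tVectors, Matrix4x4 controlPoints)
    {
        var frames = new Frame[tVectors.Length];
        if (tVectors.Length == 0) return frames; // nothing to dispatch

        int groups = (tVectors.Length + ThreadsPerGroup - 1) / ThreadsPerGroup;
        int paddedCount = groups * ThreadsPerGroup;

        // 1) Find the kernel.
        int kernel = shader.FindKernel("ComputeCurveFrames");

        // 2) Create the buffers and upload the input samples.
        var tBuffer = new ComputeBuffer(paddedCount, TVectorStride);
        var frameBuffer = new ComputeBuffer(paddedCount, FrameStride);
        try
        {
            tBuffer.SetData(tVectors, 0, 0, tVectors.Length);

            // 3) Bind the buffers to the kernel.
            shader.SetBuffer(kernel, TVectorsId, tBuffer);
            shader.SetBuffer(kernel, CurveFramesId, frameBuffer);

            // 4) Upload everything that isn't a buffer (no kernel index needed).
            shader.SetMatrix(ControlPointsId, controlPoints);

            // 5) Dispatch: x is the number of 32-thread groups, y and z are 1.
            shader.Dispatch(kernel, groups, 1, 1);

            // 6) Read back only the frames we asked for, skipping the padding.
            frameBuffer.GetData(frames, 0, 0, frames.Length);
        }
        finally
        {
            // Always release GPU memory, even if something above throws.
            tBuffer.Release();
            frameBuffer.Release();
        }
        return frames;
    }
}
```

Wrapping the work in try/finally is a design choice rather than a requirement; it just makes sure the buffers are released even if one of the calls in the middle fails.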

That should do it! You should finally be able to go back to the editor and see your work. If you did everything right you shouldn't see any difference, except for the fact that now, the curve is running on the GPU!

Conclusion

In this tutorial, we went over a simple matrix equation for constructing a Bezier curve, developed some gizmos to help us see our curve using the math we discussed, set up a system to build a coordinate system around the curve by imitating the bezier curve using quaternion slerp(), and then ported over all of that functionality to a compute shader.

In the next tutorial, we'll transition to allowing our generator to work in play mode and actually render a mesh. We'll deconstruct a source mesh, then use OnDrawProcedural() and a custom surface shader that will read from our curveFrames buffer. This way, during runtime, we'll never actually need to read anything back from the GPU.

Author's note: part 2 of this tutorial isn't out yet. But the good news is every one of the tutorials I'll make on this website will come with source code if possible, and the source code for this project is farther ahead than this part of the tutorial! So feel free to check it out at the link below if you want to dive deeper and get a preview of where this is going.

Testing Our Work With a Wireframe Mesh

One last exercise before we finish here. Comment out the last loop we made in OnDrawGizmosSelected() that draws the curve and the frames and replace it with this code:

Back in your scene view, you should see something like this:

Feel free to mess around with this all you like, and get an idea of how exactly we navigate through our coordinate system for the next tutorial when we actually build and render a mesh.

CrockettScience Labs