Shader
Tags: graphics
Overview
Shaders are small executable programs executed by the GPU.
A shader program is written using a shading language specific to a graphics API.
Shader programs in a graphics pipeline are responsible for all transform, lighting, and shading effects.
Compute programs are responsible for high-performance general purpose programs using the GPU (GPGPU).
There are different kinds of shader programs, executed at different stages of the graphics pipeline.
A GPU has a unified shader architecture, which means that shader programs share the same instruction set architecture (ISA).
Thanks to this architecture, a GPU can balance its workload by allocating its shader cores to different shader programs.
Shader Languages
Languages
- HLSL (High-Level Shading Language for Direct3D)
- GLSL (OpenGL Shading Language for OpenGL and OpenGL ES)
- Metal (macOS, iOS)
- SPIR-V (Standard, Portable Intermediate Representation, used by Vulkan)
Data Types
GPUs natively support 32-bit integers and 32-bit and 64-bit floating-point scalars and vectors.
A vertex program has three types of input data:
- uniform data remains constant through a draw call.
- varying data comes from the vertex data or is interpolated by the rasterization stage.
- resources are bound to a shader stage and can be read and written (textures can be sampled).
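The distinction between uniform and varying inputs can be sketched on the CPU. This is a hypothetical model for illustration only, not a real shader API:

```python
# Hypothetical CPU-side model of a vertex program's inputs: the uniform
# stays constant across the whole draw call, while the varying data
# differs per vertex. Names and values are illustrative.

def vertex_program(varying_position, uniform_scale):
    """Scale each incoming vertex position by a per-draw-call constant."""
    x, y, z = varying_position
    return (x * uniform_scale, y * uniform_scale, z * uniform_scale)

# One draw call: the uniform (2.0) is fixed, the varying input changes.
vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
outputs = [vertex_program(v, uniform_scale=2.0) for v in vertices]
```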
Functions
Shader programs can perform common operations on their data types, such as additions and multiplications.
For more complex operations, such as square roots or trigonometric functions, they use intrinsic functions.
Flow control is supported using two methods:
- Static branching, which is constant during a draw call (based on the value of uniform inputs).
- Dynamic branching, which can create thread divergence (based on the value of varying inputs).
Dynamic branching is more costly than static branching, as the threads of a group can execute different code paths.
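The cost of divergence can be sketched with a toy model. The 4-lane SIMD group and the per-path instruction costs below are illustrative assumptions, not real hardware figures:

```python
# Toy model of why dynamic branching is costly: when the lanes of a SIMD
# group (warp/wavefront) diverge, the hardware executes BOTH branch paths
# and masks out inactive lanes. This sketch counts executed instructions.

def simd_branch_cost(conditions, then_cost, else_cost):
    """Return instructions executed by a SIMD group for one branch."""
    takes_then = any(conditions)       # at least one lane enters 'then'
    takes_else = not all(conditions)   # at least one lane enters 'else'
    return then_cost * takes_then + else_cost * takes_else

# Static branch: every lane agrees (uniform input), only one path runs.
uniform_cost = simd_branch_cost([True, True, True, True], 10, 10)      # 10

# Dynamic branch: lanes disagree (varying input), both paths run.
divergent_cost = simd_branch_cost([True, False, True, False], 10, 10)  # 20
```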
Vertex Attributes
A vertex shader can process vertex data in an arbitrary layout. To indicate how the data is laid out, vertex attributes are defined.
Each vertex attribute is defined by:
- A semantic to indicate the nature of the data (position, normal, tangent...)
- An offset
- An index
Vertex Data (Varying)
Vertex data can be stored in multiple arrays.
Each array is described by vertex attributes.
Example
One array can contain vertex positions and another vertex colors.
Several objects can be rendered using the same vertex positions but distinct vertex colors.
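The example above can be sketched with attribute descriptors. The field names mirror the description above but do not correspond to a specific graphics API:

```python
# Illustrative sketch of vertex attributes describing two separate vertex
# arrays: one holding positions, another holding colors.

from dataclasses import dataclass

@dataclass
class VertexAttribute:
    semantic: str      # nature of the data (position, normal, color...)
    offset: int        # byte offset within an array element
    buffer_index: int  # which vertex array the attribute reads from

# Positions come from array 0, colors from array 1.
attributes = [
    VertexAttribute(semantic="position", offset=0, buffer_index=0),
    VertexAttribute(semantic="color",    offset=0, buffer_index=1),
]

# Several objects can share the array at index 0 (positions) while each
# binding a different array at index 1 to get distinct colors.
```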
Constant Data (Uniform)
In addition to vertex input data, constant data can be provided to the shader programs using uniform buffers. Typically, information such as transform matrices, time values, or other effect parameters (such as fog settings) is passed to shader programs as constant data.
This data is constant during a single frame. It is typically separated into scene data that is shared by every rendered object, such as the camera view transform matrix, and object-specific data that is updated before rendering each object, such as a model transform matrix.
When possible, data should be defined as SIMD types to match the memory layout and alignment of the GPU.
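The alignment requirement can be sketched with byte packing. The 16-byte vec3 padding below follows a common std140-like uniform-buffer rule and is an assumption for illustration; the target API's layout rules should be consulted:

```python
# Sketch of GPU-style alignment: in common uniform-buffer layouts, a
# 3-component vector is padded out to 16 bytes, so the CPU side should
# use matching SIMD-sized types when filling a constant buffer.

import struct

def pack_vec3_padded(x, y, z):
    """Pack a vec3 as 4 floats (16 bytes), padding the last slot."""
    return struct.pack("4f", x, y, z, 0.0)

# Two padded vec3s: the second one starts at byte offset 16, matching
# the GPU's expectation, instead of offset 12 for tightly packed floats.
buffer = pack_vec3_padded(1.0, 2.0, 3.0) + pack_vec3_padded(4.0, 5.0, 6.0)
```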
Shader Programs
Shader programs are written as functions.
Most shading languages follow C-style syntax rules.
Shader Stages
Depending on the graphics pipeline, different kinds of shaders are supported.
- Vertex shaders - Operate on vertex data.
- Tessellation control shaders - Operate on patch data.
- Tessellation evaluation shaders - Operate on primitive data.
- Geometry shaders - Operate on primitive data.
- Fragment shaders (or pixel shaders) - Operate on fragment data.
- Compute shaders - Operate on any kind of data.
Vertex Shader
This is a fully programmable shader stage.
The main task of a vertex shader is to process incoming vertex data and map each vertex to a position in the viewport (clip-space coordinates).
A vertex shader must at least output a vertex position.
A vertex shader cannot add or remove vertices.
The output of a vertex shader can be sent to different stages.
- Rasterizer – The vertices are interpolated and sent to the fragment shader stage.
- Tessellation – The vertices are sent to the hull shader stage.
- Geometry – The vertices are sent to the geometry shader stage.
Examples
- Vertex transformation (using a world-view-projection transform)
- Vertex animation (skinning and morphing)
- Vertex deformation
- Particle creation (by outputting degenerate meshes to a geometry shader)
- Screen distortion (by deforming the vertices of a screen-aligned quad)
- Terrain generation (based on a height-map)
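The first example, vertex transformation, can be sketched in plain Python standing in for GPU matrix types. The matrix here is a simple translation used as a stand-in for a full world-view-projection transform:

```python
# Minimal sketch of the core vertex-shader task: transforming a vertex
# position into clip space with a world-view-projection (WVP) matrix.

def mat4_mul_vec4(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

# A translation by (1, 2, 3), standing in for a full WVP transform.
wvp = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]

# The vertex shader must at least output this clip-space position.
clip_position = mat4_mul_vec4(wvp, (1.0, 1.0, 1.0, 1.0))  # (2.0, 3.0, 4.0, 1.0)
```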
Tessellation Shader
The tessellation stage can be used to render curved surfaces.
This is an optional stage.
The level of detail can be controlled based on the distance of the object from the camera.
The tessellation stage consists itself of three stages:
- Control
- Tessellation control in OpenGL and Vulkan
- Hull shader in DirectX
- Compute kernel function in Metal
- Tessellator
- Primitive generator in OpenGL and Vulkan
- Tessellator in DirectX and Metal
- Evaluation
- Tessellation evaluation in OpenGL and Vulkan
- Domain shader in DirectX
- Post-tessellation function in Metal
The control stage and the evaluation stage are programmable stages.
The tessellator is a fixed-function stage.
Control
The input of the control stage is a patch primitive.
A patch primitive consists of several control points defining a subdivision surface or Bézier curve.
The control stage has two functions:
- Specify how many triangles should be generated, and the type of the tessellation surface.
- Optionally modify the incoming patch by adding or removing control points.
The different types of tessellation surface are:
- Triangle
- Quadrilateral (or quads)
- Isoline – Sets of line strips (usually used for rendering hair).
The tessellation factors (known as tessellation levels in OpenGL and Vulkan) have two types:
- Inner edge – Indicate how much tessellation occurs inside the surface.
- Outer edge – Determine how the edges are split.
The tessellation factors and the type of the tessellation surface are sent to the tessellator and the evaluation stage.
The control points of the transformed patch are sent to the evaluation stage.
The control shader can discard a patch.
Tessellator
The tessellator generates a set of vertices with their barycentric coordinates (relative locations on the surface).
The points are sent to the evaluation stage.
Evaluation
The evaluation stage processes the vertices from the tessellator using the control points to generate the output values for the vertices.
The generated triangles are sent to the rasterizer stage or the geometry shader stage.
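The evaluation step can be sketched for a triangle patch. A flat patch with three control points is assumed for illustration; real patches usually carry more control points and curved basis functions:

```python
# Sketch of the evaluation stage: the tessellator supplies barycentric
# coordinates (u, v, w) for each generated vertex, and the evaluation
# shader blends the patch control points into an output position.

def evaluate_patch(control_points, u, v, w):
    """Blend three control points with barycentric weights u + v + w = 1."""
    p0, p1, p2 = control_points
    return tuple(u * a + v * b + w * c for a, b, c in zip(p0, p1, p2))

patch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# The centroid of the patch, generated at barycentric (1/3, 1/3, 1/3).
center = evaluate_patch(patch, 1/3, 1/3, 1/3)
```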
Geometry Shader
The geometry shader can transform primitives into other types of primitives.
Geometry shaders modify input data and can duplicate it.
This is an optional and fully programmable shader stage.
The geometry shader processes points, lines, or triangles, and can process extended primitives that contain adjacent vertices on a polyline.
Geometry shaders can also process patches, but the tessellator is more efficient.
Geometry shaders support instancing.
The processed vertices of a geometry shader are sent to the rasterizer stage.
Optionally, the vertices can be written to an output stream (transform feedback in OpenGL) to be sent back through the pipeline.
This functionality can be used to simulate particles, but it is costly, as the data is written out as floating-point values.
Examples
- Wireframe view by transforming triangles into edges.
- Render six faces of a cubemap.
- Create cascaded shadow maps.
- Variable-sized particles.
- Fur rendering by extruding fins along silhouettes.
- Metaball isosurface tessellation.
- Fractal subdivision of line segments.
- Cloth simulation.
Fragment Shader
The fragment shader is known as pixel shader in DirectX.
This is a fully programmable shader stage.
The rasterizer stage interpolates the vertex data and sends it to the fragment shader stage.
The default interpolation is a perspective-correct interpolation but the type of interpolation can be changed (for example, screen-space interpolation in which perspective projection is ignored).
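The difference between the two interpolation modes can be sketched along one edge. The endpoint attribute values and w values below are illustrative:

```python
# Sketch contrasting screen-space (linear) interpolation with the default
# perspective-correct interpolation, which weights each endpoint by 1/w.

def lerp(a, b, t):
    return a + (b - a) * t

def perspective_correct(a0, w0, a1, w1, t):
    """Interpolate attribute a at screen-space fraction t using 1/w."""
    inv_w = lerp(1.0 / w0, 1.0 / w1, t)
    a_over_w = lerp(a0 / w0, a1 / w1, t)
    return a_over_w / inv_w

# Halfway across the edge in screen space, the perspective-correct result
# is biased toward the nearer endpoint (smaller w), unlike the linear one.
linear = lerp(0.0, 1.0, 0.5)                             # 0.5
correct = perspective_correct(0.0, 1.0, 1.0, 4.0, 0.5)   # 0.2
```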
A fragment is the portion of a primitive that covers a pixel, produced by the rasterizer.
The main task of a fragment shader is to process incoming fragment data and calculate a color value for the final pixels.
The fragment shader can also output an opacity value and a depth value.
The color value and depth value are then written to the color buffer and depth buffer.
The default depth value comes from the rasterizer, but can be overridden by a fragment shader.
In the merge stage, the output from the fragment shader can be used to produce different effects, by testing the current values in the depth buffer and stencil buffer.
Compute Shader
The GPU can be used for any kind of processing task and isn’t limited to graphics.
This is called general-purpose GPU (GPGPU) programming.
The compute processing pipeline is made up of a programmable kernel function that executes in a compute pass and reads from and writes to resources directly.
Thread groups
In order to execute in parallel, each workload must be broken apart into thread groups.
A compute pass must specify the number of times to execute a kernel function. Threads are organized into a 3D grid, and this number corresponds to the grid size.
Each thread group has a small amount of memory that is shared among its threads.
For example, for processing a 2D image, each thread corresponds to a unique texel, and the grid size must be at least the size of the image.
The thread execution width is the number of threads that can be scheduled to run concurrently on the GPU (usually a power of two). Selecting an efficient thread group size depends on both the size of the data and the capabilities of a specific device. To make the most efficient use of the GPU, the total number of items in a thread group should be a multiple of the thread execution width.
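The sizing rule above can be sketched with a ceiling division. The image and group sizes are illustrative; 16x16 gives 256 threads per group, a multiple of typical execution widths such as 32 or 64:

```python
# Sketch of choosing a dispatch size: enough thread groups to cover the
# whole grid, rounding up so no texel is missed at the edges.

import math

def threadgroup_count(grid_size, group_size):
    """Number of thread groups per dimension (ceiling division)."""
    return tuple(math.ceil(g / t) for g, t in zip(grid_size, group_size))

# A 1920x1080 image processed with 16x16 thread groups.
groups = threadgroup_count((1920, 1080), (16, 16))  # (120, 68)
```

The last row of groups is only partially covered (68 * 16 = 1088 > 1080), so the kernel typically also bounds-checks its thread position against the image size.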
Examples
- Compute the luminance of an image.
- Particle simulations.
- Facial animation.
- Mesh culling.
- Image filtering.
Lighting
Gouraud, Phong, and Flat Shading
Lighting can be computed once per vertex, or once per fragment.
Per vertex lighting is known as Gouraud shading.
Per fragment lighting is known as Phong shading.
With models of low vertex density, Gouraud shading produces visible artifacts such as angularly shaped highlights.
Per primitive shading produces a faceted appearance known as Flat shading.
Flat shading can be performed in a vertex shader. It is achieved by disabling the interpolation of the vertex outputs, so that the value of the first vertex is passed down to each fragment of the primitive.
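The difference between Gouraud and Phong shading can be sketched at the midpoint of an edge, using simple Lambert (N dot L) lighting. The normals and light direction are illustrative:

```python
# Sketch contrasting Gouraud (per-vertex) and Phong (per-fragment)
# shading at the midpoint between two vertices.

def normalize(v):
    n = sum(c * c for c in v) ** 0.5
    return tuple(c / n for c in v)

def lambert(normal, light_dir):
    """Diffuse Lambert term: clamped dot product of N and L."""
    return max(0.0, sum(a * b for a, b in zip(normal, light_dir)))

light = (0.0, 0.0, 1.0)
n0 = normalize((1.0, 0.0, 1.0))   # vertex normals tilted away
n1 = normalize((-1.0, 0.0, 1.0))  # from the light direction

# Gouraud: light each vertex, then interpolate the resulting colors.
gouraud_mid = 0.5 * lambert(n0, light) + 0.5 * lambert(n1, light)

# Phong: interpolate the normals, renormalize, then light per fragment.
mid_normal = normalize(tuple(0.5 * a + 0.5 * b for a, b in zip(n0, n1)))
phong_mid = lambert(mid_normal, light)

# phong_mid (1.0) is brighter than gouraud_mid (~0.707): Gouraud misses
# the highlight that peaks between the two vertices.
```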
Calculations
Lighting is generally calculated in fragment shaders.
To calculate lighting in fragment shaders, the vertex shader passes the transformed vertex normals to the fragment shader as varying data.
If the vertex normals are modified in the vertex shader, they need to be renormalized before they are sent to the rasterization stage.
After the interpolation in the rasterization stage, the normals need to be renormalized.
The lighting calculations can be performed either in world space, or in camera space.
If the light positions are expressed in world space, it is generally preferable to perform the lighting calculations in world space too.
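The renormalization step above can be sketched numerically. The two normals are illustrative:

```python
# Sketch of why interpolated normals must be renormalized: averaging two
# unit normals produces a vector shorter than unit length, which would
# darken the lighting if it were used directly.

def length(v):
    return sum(c * c for c in v) ** 0.5

n0 = (1.0, 0.0, 0.0)
n1 = (0.0, 1.0, 0.0)

# Linear interpolation at the midpoint of the two unit normals.
mid = tuple(0.5 * a + 0.5 * b for a, b in zip(n0, n1))
mid_length = length(mid)  # ~0.707, no longer unit length

# Renormalize before using the normal in lighting calculations.
renormalized = tuple(c / mid_length for c in mid)
```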