Metal

Category	Graphics

Frameworks

Overview

Metal is a low-level graphics API with programmable vertex, fragment and compute shaders.

Metal runs on iOS, macOS and tvOS.

Before Xcode 11, Metal was not supported by the Xcode iOS simulator.

Metal 1

Metal was announced at WWDC 2014 for iOS and at WWDC 2015 for macOS and tvOS.

On an iOS device, Metal requires an A7 processor or later running at least iOS 8. On macOS, Metal requires a Mac from 2012 or later running at least OS X El Capitan.

Metal 2

Metal 2 was announced at WWDC 2017 for iOS 11, tvOS 11 and macOS High Sierra.

Metal 2 supports new features powered by the A11 processor such as imageblocks, tile shading and threadgroup sharing.

MetalKit

MetalKit was announced at WWDC 2015.

It's an additional framework that helps set up the Metal view for rendering.

View

Texture loader

Model I/O integration

View

Encapsulates a CAMetalLayer as the backing layer for the view. https://developer.apple.com/documentation/quartzcore/cametallayer

Manages the color attachments associated with the frame buffer.

Handles the main loop with a fixed frame rate.

Texture loader

The texture loader supports JPG, TIFF, and PNG formats. These formats are not recommended at runtime as they require a conversion to a hardware format. Additionally, JPG is a lossy format, and should be avoided.

MetalKit also supports PVR and KTX which can be directly copied into hardware memory.

The texture loader can generate mipmaps.
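As a minimal sketch of the texture loader, assuming the app supplies a file URL for the texture (the `loadTexture` helper name and the chosen options are illustrative, not part of the original notes):

```swift
import MetalKit

/// Loads a texture from a file URL and asks MetalKit to build its mipmap chain.
/// `loadTexture` is a hypothetical helper name.
func loadTexture(from url: URL, device: MTLDevice) throws -> MTLTexture {
    let loader = MTKTextureLoader(device: device)
    let options: [MTKTextureLoader.Option: Any] = [
        // Ask the loader to generate the mipmap levels after loading.
        .generateMipmaps: true,
        // Assumption: the texture is only read by the GPU, so private storage is used.
        .textureStorageMode: MTLStorageMode.private.rawValue
    ]
    return try loader.newTexture(URL: url, options: options)
}
```

The call throws if the file cannot be read or decoded, so the caller decides how to fall back.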

Model I/O integration

The model I/O integration supports ABC (Alembic), DAE (COLLADA), and OBJ (Wavefront) formats. MetalKit generates the appropriate vertex layout and vertex buffers. The mesh is divided into submeshes with optimized index buffers.

MetalKit can allocate hardware memory during loading but it requires conversion to hardware data.
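The Model I/O integration can be sketched as follows, assuming a model file URL supplied by the app. The `loadMeshes` and `draw` helper names are hypothetical, and the buffer indices used when binding are an assumption that must match the vertex descriptor of your pipeline:

```swift
import MetalKit
import ModelIO

/// Loads every mesh of a model file into Metal buffers.
func loadMeshes(from url: URL, device: MTLDevice) throws -> [MTKMesh] {
    // The MetalKit allocator lets Model I/O write vertex and index data
    // directly into Metal buffers while the file is being loaded.
    let allocator = MTKMeshBufferAllocator(device: device)
    let asset = MDLAsset(url: url, vertexDescriptor: nil, bufferAllocator: allocator)
    let meshes = try MTKMesh.newMeshes(asset: asset, device: device)
    return meshes.metalKitMeshes
}

/// Encodes one indexed draw call per submesh.
func draw(_ mesh: MTKMesh, with encoder: MTLRenderCommandEncoder) {
    for (index, vertexBuffer) in mesh.vertexBuffers.enumerated() {
        encoder.setVertexBuffer(vertexBuffer.buffer, offset: vertexBuffer.offset, index: index)
    }
    for submesh in mesh.submeshes {
        encoder.drawIndexedPrimitives(type: submesh.primitiveType,
                                      indexCount: submesh.indexCount,
                                      indexType: submesh.indexType,
                                      indexBuffer: submesh.indexBuffer.buffer,
                                      indexBufferOffset: submesh.indexBuffer.offset)
    }
}
```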

Fundamentals

Metal is a modern graphics API

Thin API

Less expensive tasks

Explicit command submission

Low CPU overhead

The Metal API translates the API calls to the GPU hardware directly.

During a draw call, the GPU starts rendering immediately as all the data has already been encoded on the hardware.

Metal is designed for:

Tile-based deferred-mode rendering

Discrete (macOS) and unified (iOS) memory systems

Objects

Device
- Represents the GPU

Command Queue
- Contains a sequence of command buffers

Command Buffer
- Contains GPU hardware commands

Command Encoder
- Translates API commands into GPU hardware commands

States
- Buffer configurations, blending, depth, samplers...

Shaders
- Vertex, fragment and compute programs

Resources
- Textures (formatted memory)
- Data buffers (vertices, indices, constants...)

Command Queue

The command queue is created at startup. Typically there is only a single queue.

Command Encoder

There are three types of command encoders, and they can be interleaved.

Render
- Graphics rendering operations.

Compute
- Data parallel computations.

Blit
- GPU-accelerated resource copy operations.

States, shaders and resources are attached to command encoders.

There can be multiple command encoders at the same time, one for each pass.

Command encoders generate commands immediately.

Command encoders can run on different threads.

The order of their submission is still controlled by the application.

Render Command Encoder

Encodes commands for a single rendering pass on a single render target.

Compute Command Encoder

Can be interleaved with render and blit commands.

There are only two kinds of states:

Compute state
- Compute function, workload configuration.

Sampler
- Filter states, addressing modes.
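A compute pass can be sketched end to end as below. This is an illustration under stated assumptions rather than a reference implementation: the `doubleValues` kernel and the `doubleOnGPU` helper are hypothetical names, and the kernel is compiled from a source string at runtime only to keep the example self-contained (the notes recommend build-time compilation).

```swift
import Metal

// Hypothetical kernel: doubles every element of a float buffer.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void doubleValues(device float *data   [[buffer(0)]],
                         constant uint &count [[buffer(1)]],
                         uint id              [[thread_position_in_grid]])
{
    if (id < count) { data[id] *= 2.0f; }
}
"""

func doubleOnGPU(_ input: [Float]) -> [Float]? {
    guard !input.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: kernelSource, options: nil),
          let function = library.makeFunction(name: "doubleValues"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let buffer = device.makeBuffer(bytes: input,
                                         length: input.count * MemoryLayout<Float>.stride,
                                         options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder()
    else { return nil }

    // Compute state: the function plus the resources it works on.
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    var count = UInt32(input.count)
    encoder.setBytes(&count, length: MemoryLayout<UInt32>.size, index: 1)

    // Workload configuration: enough threadgroups to cover every element.
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, input.count)
    let groups = (input.count + width - 1) / width
    encoder.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // The buffer uses shared storage, so the CPU can read the result directly.
    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
    return Array(UnsafeBufferPointer(start: pointer, count: input.count))
}
```

Blocking with `waitUntilCompleted()` keeps the sketch short; a real app would more likely use a completion handler.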

Blit Command Encoder

Asynchronous data copies of textures and data buffers.

The blit command encoder can be used to generate mipmap levels with the MTLBlitCommandEncoder.generateMipmaps(for:) method.

https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400748-generatemipmaps
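A short sketch of mipmap generation with a blit command encoder, assuming the texture was created with more than one mipmap level (the `encodeMipmapGeneration` helper name is hypothetical):

```swift
import Metal

/// Encodes and commits a blit pass that fills the mipmap chain of `texture`.
/// Returns the committed command buffer, or nil if nothing was encoded.
@discardableResult
func encodeMipmapGeneration(for texture: MTLTexture, on queue: MTLCommandQueue) -> MTLCommandBuffer? {
    // generateMipmaps(for:) needs a texture that actually has extra levels.
    guard texture.mipmapLevelCount > 1,
          let commandBuffer = queue.makeCommandBuffer(),
          let blitEncoder = commandBuffer.makeBlitCommandEncoder()
    else { return nil }

    blitEncoder.generateMipmaps(for: texture)
    blitEncoder.endEncoding()
    commandBuffer.commit()
    return commandBuffer
}
```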

States

Expensive states are created using descriptors and cannot be changed.

Render target configurations

Depth and stencil configurations

Shaders

Blending

Inexpensive states can be changed in the command encoder.

Textures and data buffers specifications

Samplers

Cull mode, facing orientation, polygon mode, viewport...
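The split between the two kinds of states can be illustrated as follows. This is a sketch with hypothetical helper names: the depth and stencil state is baked once from a descriptor, while the inexpensive states are set on the encoder for each pass.

```swift
import Metal

/// Expensive state: created once from a descriptor, immutable afterwards.
func makeDepthState(device: MTLDevice) -> MTLDepthStencilState? {
    let descriptor = MTLDepthStencilDescriptor()
    descriptor.depthCompareFunction = .less
    descriptor.isDepthWriteEnabled = true
    return device.makeDepthStencilState(descriptor: descriptor)
}

/// Inexpensive states: changed freely while encoding.
func applyStates(to encoder: MTLRenderCommandEncoder,
                 pipeline: MTLRenderPipelineState,
                 depthState: MTLDepthStencilState,
                 viewport: MTLViewport) {
    // Prebuilt state objects are only bound here, not rebuilt.
    encoder.setRenderPipelineState(pipeline)
    encoder.setDepthStencilState(depthState)

    // Cull mode, facing orientation, polygon mode and viewport.
    encoder.setCullMode(.back)
    encoder.setFrontFacing(.counterClockwise)
    encoder.setTriangleFillMode(.fill)
    encoder.setViewport(viewport)
}
```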

Resources

The size and format of resources are fixed.

The render target textures are fixed.

The content of the data buffers and textures can be updated.

The resource update model is designed for a unified memory system such as iOS.

On discrete memory systems such as macOS, the managed resource system handles the synchronization.

Storage mode

On unified memory systems, the resources should be created with a shared storage mode.

By default, on macOS they are created with a managed storage mode.

GPU-only resources such as render targets should use the private storage mode for better performance.

Usage

When creating textures, the proper usage should be set.

By default, the usage is not optimized.
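For example, an offscreen render target might be described like this. The pixel format and the usage flags are assumptions chosen for illustration, and `makeRenderTarget` is a hypothetical helper name:

```swift
import Metal

/// Creates a GPU-only texture that is rendered to and later read by a shader.
func makeRenderTarget(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                              width: width,
                                                              height: height,
                                                              mipmapped: false)
    // The CPU never touches this texture, so private storage is appropriate.
    descriptor.storageMode = .private
    // Declare exactly how the texture is used so the driver can optimize for it.
    descriptor.usage = [.renderTarget, .shaderRead]
    return device.makeTexture(descriptor: descriptor)
}
```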

Texture compression

PVRTC, ETC2, EAC (macOS, iOS).

ASTC (iOS only).

BC (macOS only).

Shader Language

The Metal shading language is a unified language for vertex, fragment and compute shaders.

It's based on C++11 and built from LLVM and clang.

Data types

C++ data types: bool, char, int, float...

Defined as SIMD types in simd/simd.h.

half is a 16-bit floating-point value (more efficient than float).

Vectors: charN, intN, floatN, halfN...

Matrices: floatNxM, halfNxM.

Atomic: atomic_int and atomic_uint (race-free operations).

Custom structs.

Alignment

Structs are aligned to the alignment of their most strictly aligned member, which for these types equals the size of the largest element in the struct.

There are packed vector types aligned at scalar type length.

However, the packed types are not efficient for CPU operations.

struct Vertex
{
  float4 a;
  float2 b;
  float  c;
};

Size of Vertex is 32 bytes (a is aligned at 16 bytes)

struct Vertex
{
  packed_float4 a;
  packed_float2 b;
  float         c;
};

Size of Vertex is 28 bytes (a is aligned at 4 bytes)
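The same layouts can be checked on the CPU side in Swift, which matters when filling vertex buffers. In this sketch, `SIMD4<Float>` and `SIMD2<Float>` stand in for `float4` and `float2`, and plain tuples of `Float` are used as an illustrative stand-in for the packed types (an assumption for this example, not an official mapping). Note that Swift's `stride` includes trailing padding, so it is the value to compare with the sizes quoted above.

```swift
// Counterpart of the aligned struct: a is aligned at 16 bytes.
struct AlignedVertex {
    var a: SIMD4<Float>
    var b: SIMD2<Float>
    var c: Float
}

// Counterpart of the packed struct: every member is aligned at 4 bytes.
struct PackedVertex {
    var a: (Float, Float, Float, Float)
    var b: (Float, Float)
    var c: Float
}

print(MemoryLayout<AlignedVertex>.stride, MemoryLayout<AlignedVertex>.alignment) // prints "32 16"
print(MemoryLayout<PackedVertex>.stride, MemoryLayout<PackedVertex>.alignment)   // prints "28 4"
```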

Textures

Textures are templated types.

There are two template arguments:

Data type (float, half)

Access (sample, read, write). Default is sample (optional argument).

The origin is at the top-left corner of the texture.

Samplers

Samplers are independent from textures.

They can be declared in the fragment function or set as render states.

Buffers

Buffers are declared in an address space:

device
- Indexed dynamically using vertex_id, instance_id or thread_position_in_grid.

constant
- Multiple instances index the same location (light and material data, skinning matrices...).

Math

There are two modes for math operations:

fast
- Operations on NaN are undefined.

precise
- Higher range.
- May impact performance.

Fast math is the default but can be disabled with the compiler option -fno-fast-math.

It's recommended to use fast math by default and call precise math functions manually when needed using the metal::precise namespace.

Compilation

The Metal shading language code is compiled in two stages:

Front-end compilation happens at build time in Xcode or on the command-line. Metal files are compiled from high-level source code into intermediate representation (IR) files.

Back-end compilation happens on the target platform at runtime. IR files are compiled into low-level machine code.

The Metal shaders that are compiled at build time by Xcode are stored in a default.metallib file that is included in the application bundle.

The MTLDevice.makeDefaultLibrary method looks for shaders inside this file.

Otherwise, the MTLDevice.newLibraryWithFile:error: method loads shaders that were built from the command line.

// makeDefaultLibrary() returns nil when the bundle has no default.metallib.
let library = device.makeDefaultLibrary()
let vertexFunction = library?.makeFunction(name: "vertexShader")
let fragmentFunction = library?.makeFunction(name: "fragmentShader")
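Both loading paths can be combined in one helper. This is a sketch, assuming a `.metallib` file produced on the command line (for instance with `xcrun metal` followed by `xcrun metallib`); the `loadLibrary` name and the optional URL parameter are hypothetical. It uses `makeLibrary(URL:)`, the Swift URL-based counterpart of the file-path method.

```swift
import Metal

/// Loads a precompiled library from `metallibURL` when one is given,
/// otherwise falls back to the default library of the app bundle.
func loadLibrary(device: MTLDevice, metallibURL: URL? = nil) throws -> MTLLibrary? {
    if let url = metallibURL {
        // Throws if the file is missing or is not a valid Metal library.
        return try device.makeLibrary(URL: url)
    }
    // Returns nil when the bundle contains no default.metallib.
    return device.makeDefaultLibrary()
}
```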

Blending

The render pipeline state and the color attachments can be configured to achieve different alpha blending effects.
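As one possible configuration, classic "source over" alpha blending can be set up on the first color attachment of the pipeline descriptor. The helper name and its parameters are hypothetical, and other blend factors give other effects:

```swift
import Metal

/// Builds a render pipeline state whose first color attachment blends
/// the fragment color over the existing contents using source alpha.
func makeAlphaBlendedPipeline(device: MTLDevice,
                              vertexFunction: MTLFunction,
                              fragmentFunction: MTLFunction,
                              pixelFormat: MTLPixelFormat) throws -> MTLRenderPipelineState {
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = vertexFunction
    descriptor.fragmentFunction = fragmentFunction

    descriptor.colorAttachments[0].pixelFormat = pixelFormat
    descriptor.colorAttachments[0].isBlendingEnabled = true

    // result = source * sourceAlpha + destination * (1 - sourceAlpha)
    descriptor.colorAttachments[0].rgbBlendOperation = .add
    descriptor.colorAttachments[0].alphaBlendOperation = .add
    descriptor.colorAttachments[0].sourceRGBBlendFactor = .sourceAlpha
    descriptor.colorAttachments[0].sourceAlphaBlendFactor = .sourceAlpha
    descriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
    descriptor.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha

    return try device.makeRenderPipelineState(descriptor: descriptor)
}
```

Because blending is part of the pipeline state, each distinct blend configuration needs its own pipeline state object, created at load time.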

Best Practices

Build time

Compile the shaders at build time using Xcode or the command-line.

Convert the textures into PVR or KTX formats.

Convert the models into binary data using the appropriate interleaved vertex layout.

Initialization

Create a single device and command queue.

Loading

Create the render pipeline states and compute pipeline states.

Create the depth stencil states and sampler states.

Create the shader libraries.

Create the vertex and index buffers.

Create the texture data.

Create the shader constants buffers.

Dynamic Resources

There are three types of dynamic resources that can be updated by the CPU.

These resources should be created in shared memory to optimize their use by the CPU and GPU.

Shader constants.

Vertex and index buffers for dynamic geometry.

Dynamic textures.

Triple Buffering

Use triple buffering for shader constants with at most three command buffers in flight.

Tutorial

Overview

In the Metal API, there are different descriptor types that are used to set up the creation of Metal objects. These descriptors can be used to create multiple objects but they are not persistent.

When using Metal, your app follows a client-server pattern.

The app is the client and sends commands to the GPU.

The GPU is the server: it processes commands and notifies the app when it can process more commands.

Commands are encoded into command buffers, which are sent to an ordered command queue.

For single-threaded apps, you create a single command buffer.

A Metal app needs a command queue and a pipeline object.

The app sends commands to the GPU through a command queue.

The pipeline object tells Metal how to process the commands.

Steps

Initialization

Set up a view that supports Metal.

Create a command queue.

Set up a render pipeline descriptor with custom vertex and fragment shaders.

Prepare custom vertex data in a vertex buffer.

Prepare index data in an index buffer.

Prepare texture data in a texture object.
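Preparing the vertex and index data can look like the sketch below. The `ColoredVertex` layout, the quad geometry and the `makeQuadBuffers` helper are made up for illustration; the layout must match whatever the vertex shader expects.

```swift
import Metal

// Hypothetical CPU-side vertex layout (position and color as float4).
struct ColoredVertex {
    var position: SIMD4<Float>
    var color: SIMD4<Float>
}

/// Creates a vertex buffer and an index buffer describing a quad made of two triangles.
func makeQuadBuffers(device: MTLDevice) -> (vertices: MTLBuffer, indices: MTLBuffer)? {
    let white = SIMD4<Float>(1, 1, 1, 1)
    let vertices = [
        ColoredVertex(position: SIMD4<Float>(-1, -1, 0, 1), color: white),
        ColoredVertex(position: SIMD4<Float>( 1, -1, 0, 1), color: white),
        ColoredVertex(position: SIMD4<Float>(-1,  1, 0, 1), color: white),
        ColoredVertex(position: SIMD4<Float>( 1,  1, 0, 1), color: white)
    ]
    let indices: [UInt16] = [0, 1, 2, 2, 1, 3]

    // makeBuffer(bytes:length:options:) copies the data into a new buffer.
    guard let vertexBuffer = device.makeBuffer(bytes: vertices,
                                               length: vertices.count * MemoryLayout<ColoredVertex>.stride,
                                               options: .storageModeShared),
          let indexBuffer = device.makeBuffer(bytes: indices,
                                              length: indices.count * MemoryLayout<UInt16>.stride,
                                              options: .storageModeShared)
    else { return nil }

    return (vertexBuffer, indexBuffer)
}
```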

Draw Loop

Create a command buffer to send commands to the GPU.

Set up a render pass descriptor.

Clear the color attachment.

Create a command encoder.

Generate commands
1. Change render states.
2. Associate buffers.
3. Draw primitives.

Complete the command generation.

Obtain a drawable from the view to present.

Commit the command buffer.

View

In macOS and iOS, everything is represented inside a view.

With MetalKit, the MTKView class can be used as the Metal view which simplifies the initialization and management of render targets. The MTKView class is a subclass of:

NSView on macOS

UIView on iOS

https://developer.apple.com/documentation/metalkit/mtkview

In the app storyboard, we set the custom class of the view controller's view to MTKView.

The viewDidLoad() function of the view controller is invoked when the view is loaded. We override it to initialize Metal.

First, we use the view property to get an instance of the MTKView.

guard let mtkView = self.view as? MTKView else
{
  print("View attached to GameViewController is not an MTKView")
  return
}

Then, we need a Metal device that we can use to create Metal objects.

In Metal, a GPU is represented by MTLDevice.

https://developer.apple.com/documentation/metal/mtldevice

Select the default GPU

iOS and tvOS have only one GPU. MTLCreateSystemDefaultDevice() returns a device that supports Metal.

https://developer.apple.com/documentation/metal/1433401-mtlcreatesystemdefaultdevice

guard let defaultDevice = MTLCreateSystemDefaultDevice() else
{
  print("Metal is not supported on this device")
  return
}

Select the GPU device

On macOS, multiple GPUs can be present and can be enumerated to select the GPU to use with Metal.

https://developer.apple.com/documentation/metal/choosing_gpus_on_mac

A list of all the Metal devices in the system is obtained by calling MTLCopyAllDevices().

However, to handle GPU change notifications, it is recommended to call MTLCopyAllDevicesWithObserver(handler:), which lets you specify an observer to receive device notifications during the lifetime of the app.

https://developer.apple.com/documentation/metal/2928189-mtlcopyalldeviceswithobserver
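A simple selection policy can be sketched as follows. The policy itself (prefer a GPU that is not flagged as low power) is an assumption chosen for illustration, `preferredDevice` is a hypothetical helper name, and the observer-based variant is left out to keep the example short.

```swift
import Metal

#if os(macOS)
/// Picks a GPU from the list of all Metal devices on a Mac.
func preferredDevice() -> MTLDevice? {
    let devices = MTLCopyAllDevices()
    // Prefer a GPU that is not the low-power (integrated) one,
    // then any listed device, then the system default.
    return devices.first(where: { !$0.isLowPower })
        ?? devices.first
        ?? MTLCreateSystemDefaultDevice()
}
#endif
```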

Once we have a valid MTLDevice, we assign it to the device property of the view.

Finally, we set the delegate property to a custom class that implements the MTKViewDelegate protocol.

We will receive two notifications:

mtkView:drawableSizeWillChange
- Called when the size of the view will change.

drawInMTKView
- Called when the view needs to render a new frame.

The view setup is completed, and we can now start working with Metal.
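Put together, the view setup described above might look like this sketch. The `Renderer` class name is hypothetical, and the two delegate methods use their Swift spellings.

```swift
import MetalKit

/// Owns the device and command queue, and receives the view callbacks.
final class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue

    init?(view: MTKView) {
        guard let device = MTLCreateSystemDefaultDevice(),
              let commandQueue = device.makeCommandQueue()
        else { return nil }

        self.device = device
        self.commandQueue = commandQueue
        super.init()

        // Assign the device and register for the view notifications.
        view.device = device
        view.delegate = self
    }

    // Swift spelling of mtkView:drawableSizeWillChange:
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        // Update projection matrices or size-dependent render targets here.
    }

    // Swift spelling of drawInMTKView:
    func draw(in view: MTKView) {
        // Encode and commit the commands for one frame here.
    }
}
```

The view does not retain its delegate, so the view controller should keep the `Renderer` instance in a stored property.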

Initialization

The command queue is represented by MTLCommandQueue.

https://developer.apple.com/documentation/metal/mtlcommandqueue

To create the command queue, call MTLDevice.makeCommandQueue().

https://developer.apple.com/documentation/metal/mtldevice/1433388-makecommandqueue

The command buffer is represented by MTLCommandBuffer.

https://developer.apple.com/documentation/metal/mtlcommandbuffer

To create the command buffer, call MTLCommandQueue.makeCommandBuffer().

https://developer.apple.com/documentation/metal/mtlcommandqueue/1508686-makecommandbuffer

There is a maximum number of command buffers waiting to be executed. The method blocks until a buffer becomes available.

After creating a command buffer, you create an encoder object to fill the buffer with commands.

An encoder object that can encode graphics rendering commands is represented by MTLRenderCommandEncoder.

https://developer.apple.com/documentation/metal/mtlrendercommandencoder

This protocol inherits from MTLCommandEncoder.

https://developer.apple.com/documentation/metal/mtlcommandencoder

To create the encoder object, call makeRenderCommandEncoder(descriptor:).

https://developer.apple.com/documentation/metal/mtlcommandbuffer/1442999-makerendercommandencoder

A place can be reserved for a command buffer on its associated command queue by calling MTLCommandBuffer.enqueue().

https://developer.apple.com/documentation/metal/mtlcommandbuffer/1443019-enqueue

A command buffer can be enqueued only once.

The command buffers are guaranteed to execute in the order in which they were enqueued.

When you are ready to execute the set of encoded commands, you call the MTLCommandBuffer.commit() method to schedule the buffer for execution.

https://developer.apple.com/documentation/metal/mtlcommandbuffer/1443003-commit

The method enqueues the command buffer implicitly if needed.
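The life cycle of a command buffer can be sketched as below. No commands are encoded, and the helper blocks until the GPU is done, which is only reasonable in a demonstration; `submitEmptyCommandBuffer` is a hypothetical name.

```swift
import Metal

/// Creates, enqueues, commits and waits for an empty command buffer.
func submitEmptyCommandBuffer(on queue: MTLCommandQueue) -> MTLCommandBufferStatus? {
    guard let commandBuffer = queue.makeCommandBuffer() else { return nil }

    // Optional: reserve the execution slot before encoding starts.
    commandBuffer.enqueue()

    // Encoders would be created and ended here.

    // Schedule the buffer for execution.
    commandBuffer.commit()

    // Blocking wait, for demonstration only.
    commandBuffer.waitUntilCompleted()
    return commandBuffer.status
}
```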

Render Loop

https://developer.apple.com/documentation/metal/mtlrendercommandencoder

Create a render command encoder object

Specify the state of the graphics rendering pipeline

Specify resources for input to and output from the vertex and fragment functions

Specify additional fixed-function states

Draw graphics primitives

Terminate the render command encoder
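The steps above map onto encoder calls roughly as follows. This is a sketch: the `encodeTriangles` helper, its parameters and the buffer and texture indices are assumptions that must match your own shaders.

```swift
import Metal

/// Encodes one render pass that draws non-indexed triangles.
func encodeTriangles(into commandBuffer: MTLCommandBuffer,
                     passDescriptor: MTLRenderPassDescriptor,
                     pipeline: MTLRenderPipelineState,
                     vertexBuffer: MTLBuffer,
                     texture: MTLTexture?,
                     vertexCount: Int) {
    // Create a render command encoder object.
    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor) else { return }

    // Specify the state of the graphics rendering pipeline.
    encoder.setRenderPipelineState(pipeline)

    // Specify resources for the vertex and fragment functions.
    encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
    encoder.setFragmentTexture(texture, index: 0)

    // Specify additional fixed-function states.
    encoder.setCullMode(.none)

    // Draw graphics primitives.
    encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount)

    // Terminate the render command encoder.
    encoder.endEncoding()
}
```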

Render Pass

In Metal, all the rendering is done inside a render pass represented by MTLRenderPassDescriptor.

Metal can render objects in a single pass or using multiple passes, depending on the effects that we want to obtain. In a single pass scenario, we only use a single render pass object.

MetalKit can generate a render pass for the current drawable’s texture with MTKView.currentRenderPassDescriptor.

Drawable

There are a limited number of drawables as they take a considerable amount of space.

The generation of a render pass and the release of the associated drawable should happen as close together as possible.

The current drawable is obtained with MTKView.currentDrawable, and the drawable is released when calling the MTLCommandBuffer.present(_:) method.

Color attachment

The colorAttachments property needs to be set up. In Metal there can be at most four color attachments (frame buffers).

In a single pass scenario, we only use the first one, and we need to clear it to a default value by setting the clearColor property to an MTLClearColor value created by calling MTLClearColorMake.

if let commandBuffer = commandQueue.makeCommandBuffer()
{
  if let renderPassDescriptor = view.currentRenderPassDescriptor
  {
    renderPassDescriptor.colorAttachments[0].clearColor = MTLClearColor(...)
     
    if let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
    {
      ...
      renderEncoder.endEncoding()                
      if let drawable = view.currentDrawable
      {
        commandBuffer.present(drawable)
      }
    }
  }
            
  commandBuffer.commit()
}

Synchronization

The CPU and GPU work asynchronously.

A semaphore can be used as the synchronization object.

The CPU waits for a semaphore to update uniform data.

Metal notifies the CPU when a command buffer has been processed.

The semaphore is reset and the uniform data is updated.

By allocating several uniform buffers and alternating between them, the CPU would not wait for the GPU to complete a frame. A common scenario is to allocate three buffers (triple buffering).
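The semaphore scheme can be sketched as a small ring of uniform buffers. The `UniformRing` class and its method names are hypothetical, and the sketch assumes that every command buffer passed to `nextBuffer(for:)` is committed afterwards; otherwise the semaphore is never signaled.

```swift
import Metal
import Dispatch

/// Hands out one of three shared uniform buffers per frame and blocks
/// the CPU only when all three are still in use by the GPU.
final class UniformRing {
    static let framesInFlight = 3

    private let semaphore = DispatchSemaphore(value: UniformRing.framesInFlight)
    private let buffers: [MTLBuffer]
    private var index = 0

    init?(device: MTLDevice, length: Int) {
        var created: [MTLBuffer] = []
        for _ in 0..<UniformRing.framesInFlight {
            guard let buffer = device.makeBuffer(length: length, options: .storageModeShared) else { return nil }
            created.append(buffer)
        }
        buffers = created
    }

    /// Call once per frame, before committing `commandBuffer`.
    func nextBuffer(for commandBuffer: MTLCommandBuffer) -> MTLBuffer {
        // The CPU waits here only if three frames are already in flight.
        semaphore.wait()
        index = (index + 1) % UniformRing.framesInFlight

        // Metal notifies the CPU when the command buffer has been processed.
        let semaphore = self.semaphore
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
        return buffers[index]
    }
}
```

In a draw callback, the returned buffer is filled with the uniform data of the frame and bound to the encoder before the command buffer is committed.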