Metal
Category: Graphics
Frameworks
Overview
Metal is a low-level graphics API with programmable vertex, fragment and compute shaders.
Metal runs on iOS, macOS and tvOS.
Metal was not supported by the Xcode iOS simulator until Xcode 11.
Metal 1
Metal was announced at WWDC 2014 for iOS and at WWDC 2015 for macOS and tvOS.
On an iOS device, Metal requires an A7 processor or later running at least iOS 8. On macOS, Metal requires a 2012 Mac or later running OS X El Capitan or later.
Metal 2
Metal 2 was announced at WWDC 2017 for iOS 11, tvOS 11 and macOS High Sierra.
Metal 2 supports new features powered by the A11 processor such as imageblocks, tile shading and threadgroup sharing.
MetalKit
MetalKit was announced at WWDC 2015.
It's an additional framework that helps set up the Metal view for rendering. It provides:
- View
- Texture loader
- Model I/O integration
View
- Encapsulates a CAMetalLayer as the backing layer for the view. https://developer.apple.com/documentation/quartzcore/cametallayer
- Manages the color attachments associated with the frame buffer.
- Handles the main loop with a fixed frame rate.
Texture loader
The texture loader supports JPG, TIFF, and PNG formats. These formats are not recommended at runtime as they require a conversion to a hardware format. Additionally, JPG is a lossy format, and should be avoided.
MetalKit also supports PVR and KTX which can be directly copied into hardware memory.
The texture loader can generate mipmaps.
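As an illustration of the texture loader, here is a minimal sketch that loads an image file and asks MetalKit to generate the mipmap chain. The function name loadTexture(from:) is hypothetical; MTKTextureLoader and its generateMipmaps option are MetalKit API.

```swift
import Foundation
import Metal
import MetalKit

/// Loads a texture from `url` and asks MetalKit to generate its mipmaps.
/// Returns nil when no Metal device is available or the file cannot be loaded.
/// (Hypothetical helper, not part of MetalKit.)
func loadTexture(from url: URL) -> MTLTexture? {
    guard let device = MTLCreateSystemDefaultDevice() else { return nil }
    let loader = MTKTextureLoader(device: device)
    let options: [MTKTextureLoader.Option: Any] = [
        .generateMipmaps: true
    ]
    return try? loader.newTexture(URL: url, options: options)
}
```

In a real app the device would normally be created once and passed in rather than looked up on every call.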
Model I/O integration
The model I/O integration supports ABC (Alembic), DAE (COLLADA), and OBJ (Wavefront) formats. MetalKit generates the appropriate vertex layout and vertex buffers. The mesh is divided into submeshes with optimized index buffers.
MetalKit can allocate hardware memory during loading but it requires conversion to hardware data.
Fundamentals
Metal is a modern graphics API
- Thin API
- Less expensive tasks
- Explicit command submission
- Low CPU overhead
The Metal API translates the API calls to the GPU hardware directly.
During a draw call, the GPU starts rendering immediately as all the data has already been encoded on the hardware.
Metal is designed for:
- Tile-based deferred-mode rendering
- Discrete (macOS) and unified (iOS) memory systems
Objects
- Device
- Represents the GPU
- Command Queue
- Contains a sequence of command buffers
- Command Buffer
- Contains GPU hardware commands
- Command Encoder
- Translates API commands into GPU hardware commands
- States
- Buffer configurations, blending, depth, samplers...
- Shaders
- Vertex, fragment and compute programs
- Resources
- Textures (formatted memory)
- Data buffers (vertices, indices, constants...)
Command Queue
The command queue is created at startup. Typically there is only a single queue.
Command Encoder
There are three types of command encoders, and they can be interleaved.
- Render
- Graphics rendering operations.
- Compute
- Data parallel computations.
- Blit
- GPU-accelerated resource copy operations.
States, shaders and resources are attached to command encoders.
There can be multiple command encoders at the same time, one for each pass.
Command encoders generate commands immediately.
Command encoders can run on different threads.
The order of their submission is still controlled by the application.
Render Command Encoder
Encodes commands for a single rendering pass on a single render target.
Compute Command Encoder
Can be interleaved with render and blit commands.
There are only two kinds of states:
- Compute state
- Compute function, workload configuration.
- Sampler
- Filter states, addressing modes.
Blit Command Encoder
Asynchronous data copies of textures and data buffers.
The blit command encoder can be used to generate mipmap levels with the MTLBlitCommandEncoder.generateMipmaps(for:) method.
https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400748-generatemipmaps
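A minimal sketch of a blit pass that generates mipmaps, assuming the texture was created with more than one mipmap level. The helper name encodeMipmapGeneration(for:on:) is hypothetical.

```swift
import Metal

/// Encodes and commits a blit pass that fills the lower mipmap levels of
/// `texture` from its base level. Does nothing if the texture has one level.
/// (Hypothetical helper for illustration.)
func encodeMipmapGeneration(for texture: MTLTexture, on queue: MTLCommandQueue) {
    guard texture.mipmapLevelCount > 1,
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.generateMipmaps(for: texture)
    blit.endEncoding()
    commandBuffer.commit()
}
```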
States
Expensive states are created using descriptors and cannot be changed.
- Render target configurations
- Depth and stencil configurations
- Shaders
- Blending
Inexpensive states can be changed in the command encoder.
- Texture and data buffer specifications
- Samplers
- Cull mode, facing orientation, polygon mode, viewport...
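The split between the two kinds of states can be sketched as follows: a depth state is built once from a descriptor, while cull mode, facing orientation and viewport are set on the encoder for each pass. The function names and the chosen values (less-than depth test, back-face culling) are assumptions for illustration.

```swift
import Metal

/// Expensive state: described once by a descriptor, then baked into an
/// immutable state object. (Hypothetical helper.)
func makeDepthState(device: MTLDevice) -> MTLDepthStencilState? {
    let descriptor = MTLDepthStencilDescriptor()
    descriptor.depthCompareFunction = .less
    descriptor.isDepthWriteEnabled = true
    return device.makeDepthStencilState(descriptor: descriptor)
}

/// Inexpensive states: set directly on the encoder and changeable between draws.
func applyDynamicState(to encoder: MTLRenderCommandEncoder,
                       depthState: MTLDepthStencilState,
                       width: Double, height: Double) {
    encoder.setDepthStencilState(depthState)
    encoder.setCullMode(.back)
    encoder.setFrontFacing(.counterClockwise)
    encoder.setViewport(MTLViewport(originX: 0, originY: 0,
                                    width: width, height: height,
                                    znear: 0, zfar: 1))
}
```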
Resources
The size and format of resources are fixed.
- The render target textures are fixed.
- The content of the data buffers and textures can be updated.
The resource update model is designed for a unified memory system such as iOS.
On discrete memory systems such as macOS, the managed resource system handles the synchronization.
Storage mode
On unified memory systems, the resources should be created with a shared storage mode.
By default, on macOS they are created with a managed storage mode.
GPU-only resources such as render targets should use the private storage mode for better performance.
Usage
When creating textures, the proper usage should be set.
By default, the usage is not optimized.
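A minimal sketch of setting the storage mode and usage explicitly. The helper names and the bgra8Unorm pixel format are assumptions; the descriptor properties are Metal API.

```swift
import Metal

/// Describes a GPU-only render target: private storage and explicit usage flags.
/// (Hypothetical helper; the pixel format is an assumption.)
func makeRenderTargetDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
    descriptor.storageMode = .private
    descriptor.usage = [.renderTarget, .shaderRead]
    return descriptor
}

/// Allocates a buffer in shared storage, visible to both the CPU and the GPU.
func makeSharedBuffer(device: MTLDevice, length: Int) -> MTLBuffer? {
    return device.makeBuffer(length: length, options: .storageModeShared)
}
```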
Texture compression
- PVRTC, ETC2, EAC (macOS, iOS).
- ASTC (iOS only).
- BC (macOS only).
Shader Language
The Metal shading language is a unified language for vertex, fragment and compute shaders.
It's based on C++11 and built on LLVM and Clang.
Data types
- C++ data types: bool, char, int, float...
- Defined as SIMD types in simd/simd.h.
- half is a 16-bit floating-point value (more efficient than float).
- Vectors: charN, intN, floatN, halfN...
- Matrices: floatNxM, halfNxM...
- Atomic: atomic_int and atomic_uint (race-free operations).
- Custom structs.
Alignment
A struct is aligned to the alignment of its largest member, and its size is rounded up to a multiple of that alignment.
There are packed vector types aligned at scalar type length.
However, the packed types are not efficient for CPU operations.
struct Vertex
{
    float4 a;
    float2 b;
    float c;
};
Size of Vertex is 32 bytes (a is aligned at 16 bytes)
struct Vertex
{
    packed_float4 a;
    packed_float2 b;
    float c;
};
Size of Vertex is 28 bytes (a is aligned at 4 bytes)
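The same sizes can be checked on the CPU side. This is a sketch in Swift, assuming SIMD4<Float> and SIMD2<Float> stand in for float4 and float2, and tuples of Float stand in for the packed types (PackedVertex is a hypothetical name). Swift does not formally guarantee the layout of native structs, so types shared with shaders are usually declared in a C header instead.

```swift
// Mirrors the first struct: SIMD4<Float> is aligned at 16 bytes.
struct Vertex {
    var a: SIMD4<Float>   // offset 0, 16 bytes
    var b: SIMD2<Float>   // offset 16, 8 bytes
    var c: Float          // offset 24, 4 bytes
}

// Hypothetical stand-in for the packed layout: a tuple of Float is aligned
// like a single Float (4 bytes).
struct PackedVertex {
    var a: (Float, Float, Float, Float)
    var b: (Float, Float)
    var c: Float
}

print(MemoryLayout<Vertex>.alignment)        // 16
print(MemoryLayout<Vertex>.stride)           // 32
print(MemoryLayout<PackedVertex>.alignment)  // 4
print(MemoryLayout<PackedVertex>.stride)     // 28
```

The stride, not the size, is the value to use when computing offsets into a vertex buffer.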
Textures
Textures are templated types.
There are two template arguments:
- Data type (float, half)
- Access (sample, read, write). Default is sample (optional argument).
The origin is at the top-left corner of the texture.
Samplers
Samplers are independent from textures.
They can be declared in the fragment function or set as render states.
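When a sampler is set as a render state rather than declared in the shader, it is created from a descriptor. A minimal sketch, with a hypothetical helper name and filter/addressing values chosen only for illustration:

```swift
import Metal

/// Creates a sampler with linear filtering and repeat addressing.
/// (Hypothetical helper; the chosen modes are assumptions.)
func makeLinearRepeatSampler(device: MTLDevice) -> MTLSamplerState? {
    let descriptor = MTLSamplerDescriptor()
    descriptor.minFilter = .linear
    descriptor.magFilter = .linear
    descriptor.mipFilter = .linear
    descriptor.sAddressMode = .repeat
    descriptor.tAddressMode = .repeat
    return device.makeSamplerState(descriptor: descriptor)
}
```

The resulting state would then be bound on a render command encoder with setFragmentSamplerState(_:index:).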
Buffers
Buffers are declared in an address space:
- device
- Indexed dynamically using vertex_id, instance_id or thread_position_in_grid.
- constant
- Multiple instances index the same location (light and material data, skinning matrices...).
Math
There are two modes for math operations:
- fast
- Operations on NaN are undefined.
- precise
- Higher range.
- May impact performance.
Fast math is the default but can be disabled with the compiler option -fno-fast-math.
It's recommended to use fast math by default and call precise math functions manually when needed using the metal::precise namespace.
Compilation
The Metal shading language code is compiled in two stages:
- Front-end compilation happens at build time in Xcode or on the command-line. Metal files are compiled from high-level source code into intermediate representation (IR) files.
- Back-end compilation happens on the target platform at runtime. IR files are compiled into low-level machine code.
The Metal shaders that are compiled at build time by Xcode are stored in a default.metallib file that is included in the application bundle.
The MTLDevice.makeDefaultLibrary method looks for shaders inside this file.
Otherwise, the MTLDevice.newLibraryWithFile:error: method loads shaders that were built from the command line.
// makeDefaultLibrary() returns an optional, so the lookups use optional chaining.
let library = device.makeDefaultLibrary()
let vertexFunction = library?.makeFunction(name: "vertexShader")
let fragmentFunction = library?.makeFunction(name: "fragmentShader")
Blending
The render pipeline state and the color attachments can be configured to achieve different alpha blending effects.
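As one example of such a configuration, here is a sketch of classic "source over" alpha blending on the first color attachment. The helper name is hypothetical and the blend factors are one common choice, not the only one.

```swift
import Metal

/// Configures color attachment 0 for "source over" alpha blending:
/// result = source * sourceAlpha + destination * (1 - sourceAlpha)
/// (Hypothetical helper for illustration.)
func enableAlphaBlending(on descriptor: MTLRenderPipelineDescriptor) {
    guard let attachment = descriptor.colorAttachments[0] else { return }
    attachment.isBlendingEnabled = true
    attachment.rgbBlendOperation = .add
    attachment.alphaBlendOperation = .add
    attachment.sourceRGBBlendFactor = .sourceAlpha
    attachment.sourceAlphaBlendFactor = .sourceAlpha
    attachment.destinationRGBBlendFactor = .oneMinusSourceAlpha
    attachment.destinationAlphaBlendFactor = .oneMinusSourceAlpha
}
```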
Best Practices
Build time
- Compile the shaders at build time using Xcode or the command-line.
- Convert the textures into PVR or KTX formats.
- Convert the models into binary data using the appropriate interleaved vertex layout.
Initialization
- Create a single device and command queue.
Loading
- Create the render pipeline states and compute pipeline states.
- Create the depth stencil states and sampler states.
- Create the shader libraries.
- Create the vertex and index buffers.
- Create the texture data.
- Create the shader constants buffers.
Dynamic Resources
There are three types of dynamic resources that can be updated by the CPU.
These resources should be created in shared memory to optimize their use by the CPU and GPU.
- Shader constants.
- Vertex and index buffers for dynamic geometry.
- Dynamic textures.
Triple Buffering
- Use triple buffering for shader constants with at most three command buffers in flight.
Tutorial
Overview
In the Metal API, there are different descriptor types that are used to set up the creation of Metal objects. These descriptors can be used to create multiple objects but they are not persistent.
When using Metal, your app follows a client-server pattern.
The app is the client and sends commands to the GPU.
The GPU is the server: it processes commands and notifies the app when it can process more commands.
Commands are encoded into command buffers, which are submitted to an ordered command queue.
For single-threaded apps, you create a single command buffer.
- A Metal app needs a command queue and a pipeline object.
- The app sends commands to the GPU through a command queue.
- The pipeline object tells Metal how to process the commands.
Steps
Initialization
- Set up a view that supports Metal.
- Create a command queue.
- Set up a render pipeline descriptor with custom vertex and fragment shaders.
- Prepare custom vertex data in a vertex buffer.
- Prepare index data in an index buffer.
- Prepare texture data in a texture object.
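Two of the initialization steps above can be sketched as follows. The shader names "vertexShader" and "fragmentShader" are the ones used earlier on this page; the helper names, the SIMD4<Float> vertex type and the default bgra8Unorm pixel format are assumptions, and the pixel format must match the view's colorPixelFormat.

```swift
import Metal

/// Builds a render pipeline state from the app's default shader library.
/// Returns nil if the library or either function is missing.
func makePipelineState(device: MTLDevice,
                       pixelFormat: MTLPixelFormat = .bgra8Unorm) -> MTLRenderPipelineState? {
    guard let library = device.makeDefaultLibrary(),
          let vertexFunction = library.makeFunction(name: "vertexShader"),
          let fragmentFunction = library.makeFunction(name: "fragmentShader") else {
        return nil
    }
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = vertexFunction
    descriptor.fragmentFunction = fragmentFunction
    descriptor.colorAttachments[0].pixelFormat = pixelFormat
    return try? device.makeRenderPipelineState(descriptor: descriptor)
}

/// Copies vertex positions into a buffer in shared storage.
func makeVertexBuffer(device: MTLDevice, vertices: [SIMD4<Float>]) -> MTLBuffer? {
    guard !vertices.isEmpty else { return nil }
    return device.makeBuffer(bytes: vertices,
                             length: vertices.count * MemoryLayout<SIMD4<Float>>.stride,
                             options: .storageModeShared)
}
```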
Draw Loop
- Create a command buffer to send commands to the GPU.
- Set up a render pass descriptor.
- Clear the color attachment.
- Create a command encoder.
- Generate commands
- Change render states.
- Associate buffers.
- Draw primitives.
- Complete the command generation.
- Obtain a drawable from the view to present.
- Commit the command buffer.
View
In macOS and iOS, everything is represented inside a view.
With MetalKit, the MTKView class can be used as the Metal view, which simplifies the initialization and management of render targets. The MTKView class is a subclass of:
- NSView on macOS
- UIView on iOS
https://developer.apple.com/documentation/metalkit/mtkview
In the app storyboard, we set the custom class of the view controller's view to MTKView.
The viewDidLoad() function of the view controller is invoked when the view is loaded. We override it to initialize Metal.
First, we use the view property to get an instance of the MTKView.
guard let mtkView = self.view as? MTKView else
{
    print("View attached to GameViewController is not an MTKView")
    return
}
Then, we need a Metal device that we can use to create Metal objects.
In Metal, a GPU is represented by MTLDevice.
https://developer.apple.com/documentation/metal/mtldevice
Select the default GPU
iOS and tvOS have only one GPU. MTLCreateSystemDefaultDevice() returns a device that supports Metal.
https://developer.apple.com/documentation/metal/1433401-mtlcreatesystemdefaultdevice
guard let defaultDevice = MTLCreateSystemDefaultDevice() else
{
    print("Metal is not supported on this device")
    return
}
Select the GPU device
On macOS, multiple GPUs can be present and can be enumerated to select the GPU to use with Metal.
https://developer.apple.com/documentation/metal/choosing_gpus_on_mac
A list of all the Metal devices in the system is obtained by calling MTLCopyAllDevices().
However, to handle GPU change notifications, it is recommended to call MTLCopyAllDevicesWithObserver(handler:), which lets you specify an observer to receive device notifications during the lifetime of the app.
https://developer.apple.com/documentation/metal/2928189-mtlcopyalldeviceswithobserver
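A minimal sketch of the simpler enumeration path, without the observer. The selection policy (prefer a device that is neither low-power nor headless) and the helper name selectDevice() are assumptions for illustration.

```swift
import Metal

/// Picks a GPU: on macOS, prefer a device that is neither low-power nor
/// headless; otherwise fall back to the system default device.
/// (Hypothetical helper; the selection policy is an assumption.)
func selectDevice() -> MTLDevice? {
    #if os(macOS)
    let devices = MTLCopyAllDevices()
    let preferred = devices.first(where: { !$0.isLowPower && !$0.isHeadless })
    return preferred ?? MTLCreateSystemDefaultDevice()
    #else
    return MTLCreateSystemDefaultDevice()
    #endif
}
```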
Once we have a valid MTLDevice, we assign it to the device property of the view.
Finally, we set the delegate property to an instance of a custom class that implements the MTKViewDelegate protocol.
We will receive two notifications:
- mtkView:drawableSizeWillChange:
- Called when the size of the view will change.
- drawInMTKView:
- Called when the view needs to be redrawn.
The view setup is completed, and we can now start working with Metal.
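The view setup described above can be sketched as a small delegate class. Renderer is a hypothetical name, and the bodies of the two callbacks are left as comments.

```swift
import MetalKit

/// Hypothetical renderer that owns the device and command queue and acts as
/// the view's delegate.
final class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue

    /// Fails if Metal is not available on this system.
    init?(view: MTKView) {
        guard let device = MTLCreateSystemDefaultDevice(),
              let queue = device.makeCommandQueue() else { return nil }
        self.device = device
        self.commandQueue = queue
        super.init()
        view.device = device
        view.delegate = self
    }

    /// Swift spelling of mtkView:drawableSizeWillChange:
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        // Update size-dependent data here (for example a projection matrix).
    }

    /// Swift spelling of drawInMTKView:
    func draw(in view: MTKView) {
        // Encode and commit the commands for one frame here.
    }
}
```

MTKView holds its delegate weakly, so the view controller needs to keep a strong reference to the renderer.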
Initialization
The command queue is represented by MTLCommandQueue.
https://developer.apple.com/documentation/metal/mtlcommandqueue
To create the command queue, call MTLDevice.makeCommandQueue().
https://developer.apple.com/documentation/metal/mtldevice/1433388-makecommandqueue
The command buffer is represented by MTLCommandBuffer.
https://developer.apple.com/documentation/metal/mtlcommandbuffer
To create the command buffer, call MTLCommandQueue.makeCommandBuffer().
https://developer.apple.com/documentation/metal/mtlcommandqueue/1508686-makecommandbuffer
There is a maximum number of command buffers waiting to be executed. The method blocks until a buffer becomes available.
After creating a command buffer, you create an encoder object to fill the buffer with commands.
An encoder object that can encode graphics rendering commands is represented by MTLRenderCommandEncoder.
https://developer.apple.com/documentation/metal/mtlrendercommandencoder
This protocol inherits from MTLCommandEncoder.
https://developer.apple.com/documentation/metal/mtlcommandencoder
To create the encoder object, call makeRenderCommandEncoder(descriptor:).
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1442999-makerendercommandencoder
A place can be reserved for a command buffer on its associated command queue by calling MTLCommandBuffer.enqueue().
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1443019-enqueue
A command buffer can be enqueued only once.
The command buffers are guaranteed to execute in the order in which they were enqueued.
When you are ready to execute the set of encoded commands, you call the MTLCommandBuffer.commit() method to schedule the buffer for execution.
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1443003-commit
The method enqueues the command buffer implicitly if needed.
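The life cycle described in this section can be sketched with an empty command buffer. The function name is hypothetical, and waiting for completion is only done here to observe the final status; a real render loop would not block like this.

```swift
import Metal

/// Creates a queue and a command buffer, enqueues and commits the buffer
/// without encoding anything, and returns its final status.
/// (Hypothetical helper for illustration.)
func submitEmptyCommandBuffer() -> MTLCommandBufferStatus? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let commandBuffer = queue.makeCommandBuffer() else { return nil }
    commandBuffer.enqueue()            // optional: reserves a place on the queue
    // Encoders would be created, used and ended here.
    commandBuffer.commit()             // schedules the buffer for execution
    commandBuffer.waitUntilCompleted() // blocks the CPU; for demonstration only
    return commandBuffer.status
}
```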
Render Loop
https://developer.apple.com/documentation/metal/mtlrendercommandencoder
- Create a render command encoder object
- Specify the state of the graphics rendering pipeline
- Specify resources for input to and output from the vertex and fragment functions
- Specify additional fixed-function states
- Draw graphics primitives
- Terminate the render command encoder
Render Pass
In Metal, all the rendering is done inside a render pass described by MTLRenderPassDescriptor.
Metal can render objects in a single pass or in multiple passes, depending on the effects that we want to obtain. In a single pass scenario, we only use a single render pass object.
MetalKit can generate a render pass descriptor for the current drawable’s texture with MTKView.currentRenderPassDescriptor.
Drawable
There are a limited number of drawables, as they take a considerable amount of memory.
The generation of a render pass and the release of the associated drawable should happen as close together as possible.
The current drawable is obtained with MTKView.currentDrawable, and the drawable is scheduled for presentation (and later released) by calling the MTLCommandBuffer.present(_:) method.
Color attachment
The colorAttachments property needs to be set up. In Metal there can be at most four color attachments (frame buffers).
In a single pass scenario, we only use the first one, and we need to clear it to a default value by setting the clearColor property to an MTLClearColor value created with MTLClearColorMake.
if let commandBuffer = commandQueue.makeCommandBuffer()
{
    if let renderPassDescriptor = view.currentRenderPassDescriptor
    {
        renderPassDescriptor.colorAttachments[0].clearColor = MTLClearColor(...)
        if let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
        {
            ...
            renderEncoder.endEncoding()
            if let drawable = view.currentDrawable
            {
                commandBuffer.present(drawable)
            }
        }
    }
    commandBuffer.commit()
}
Synchronization
The CPU and GPU work asynchronously.
A semaphore can be used as the synchronization object.
- The CPU waits for a semaphore to update uniform data.
- Metal notifies the CPU when a command buffer has been processed.
- The semaphore is signaled and the uniform data is updated.
By allocating several uniform buffers and alternating between them, the CPU would not wait for the GPU to complete a frame. A common scenario is to allocate three buffers (triple buffering).
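The semaphore scheme with triple buffering can be sketched as a small helper that hands out buffer slots. FrameSlots is a hypothetical class, not a Metal type; it only tracks which of the uniform buffers the CPU may write next.

```swift
import Dispatch

/// Rotates through a fixed number of uniform-buffer slots and blocks the
/// caller when all of them are still in use by the GPU.
/// (Hypothetical helper for illustration.)
final class FrameSlots {
    let count: Int
    private let semaphore: DispatchSemaphore
    private var next = 0

    init(count: Int = 3) {
        self.count = count
        self.semaphore = DispatchSemaphore(value: count)
    }

    /// Waits for a free slot and returns its index.
    func acquire() -> Int {
        semaphore.wait()
        let index = next
        next = (next + 1) % count
        return index
    }

    /// Marks one slot as free again.
    func release() {
        semaphore.signal()
    }
}
```

In a draw call, the app would call acquire() before writing uniforms into the buffer for that slot, and call release() from a handler registered with MTLCommandBuffer.addCompletedHandler(_:) before committing the command buffer.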