The low-level layer of a game runtime relies on many platform-specific features.

Some of these features are implemented in the Standard C++ Library (STL).

On some platforms, some functionality is provided by the POSIX library or the GNU C Library.

Sometimes, none of these implementations fit the requirements, and it is necessary to access the platform's API directly.

Standard C++ Library (STL)

The standard C++ library was initially created before the standardization of the C++ language.



The STL containers are designed for general-purpose use, and most game engines implement custom containers instead.


The std::string class is useful for performing string operations, but in practice these operations are very limited at runtime.

The std::string_view class is more useful, as it provides a read-only reference to a string.

It can be used when a string is passed as an input parameter.
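As a minimal sketch of the input-parameter case, the two helpers below take std::string_view instead of const std::string&. The function names countSeparators and getExtension are hypothetical and only serve the illustration. The code assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts the path separators in a file path.
// Taking std::string_view lets callers pass a string literal, a std::string,
// or a slice of a larger buffer without allocating a copy.
inline std::size_t countSeparators(std::string_view path)
{
    std::size_t count = 0;
    for (char c : path)
    {
        if (c == '/' || c == '\\')
            ++count;
    }
    return count;
}

// Hypothetical helper: returns the extension without the dot, or an empty
// view when the last path component has none. The result refers to the
// caller's storage, so it must not outlive the string that was passed in.
inline std::string_view getExtension(std::string_view path)
{
    const std::size_t pos = path.find_last_of("./\\");
    if (pos == std::string_view::npos || path[pos] != '.')
        return {};
    return path.substr(pos + 1);
}
```

Because a view does not own its characters, it is best kept to parameters and short-lived locals. Storing one in a long-lived object requires that the underlying string stays alive.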


The std::allocator class is intended to be used as a template argument for containers.

Because std::allocator cannot store state data, the only practical use of a custom allocator based on it is to redirect allocations to another global allocator.
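The redirection described above might look like the following sketch. It does not derive from std::allocator; it implements the minimal allocator interface that the standard containers require, which is an assumption about how such an allocator would be written today. The names engineAlloc, engineFree, g_liveAllocations, and EngineAllocator are hypothetical, and the hooks simply forward to malloc and free.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical global allocation hooks. A real engine would route these to
// its own heap; here they forward to malloc/free and count live blocks.
inline std::size_t g_liveAllocations = 0;

inline void* engineAlloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p)
        ++g_liveAllocations;
    return p;
}

inline void engineFree(void* p)
{
    if (p)
    {
        --g_liveAllocations;
        std::free(p);
    }
}

// Minimal stateless allocator: every instance is interchangeable, so
// containers can copy, rebind, and compare it freely.
template <typename T>
struct EngineAllocator
{
    using value_type = T;

    EngineAllocator() noexcept = default;

    template <typename U>
    EngineAllocator(const EngineAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        void* p = engineAlloc(n * sizeof(T));
        if (!p)
            throw std::bad_alloc(); // required by the allocator contract
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { engineFree(p); }
};

template <typename T, typename U>
bool operator==(const EngineAllocator<T>&, const EngineAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const EngineAllocator<T>&, const EngineAllocator<U>&) noexcept { return false; }
```

A container opts in through its template argument, for example std::vector<int, EngineAllocator<int>>. This sketch does not handle types that need more alignment than malloc guarantees.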


The <algorithm> header provides many functions that are useful for iterating, sorting, searching, and so on.

If you implement custom containers, be sure to provide the begin and end methods to return iterators that can be used with the STL algorithms (and the range-based for loop).
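As an illustration, here is a sketch of a hypothetical fixed-capacity container (FixedArray is not a real engine type) whose begin and end methods return raw pointers. Pointers into a contiguous array already satisfy the random-access iterator requirements, so no iterator class is needed for this simple case. The element type is assumed to be default-constructible.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical fixed-capacity array standing in for an engine container.
template <typename T, std::size_t Capacity>
class FixedArray
{
public:
    // Appends a value; returns false when the array is full.
    bool push(const T& value)
    {
        if (m_size == Capacity)
            return false;
        m_data[m_size++] = value;
        return true;
    }

    std::size_t size() const { return m_size; }

    // Raw pointers are valid random-access iterators, so these four methods
    // are enough for the STL algorithms and the range-based for loop.
    T* begin() { return m_data; }
    T* end() { return m_data + m_size; }
    const T* begin() const { return m_data; }
    const T* end() const { return m_data + m_size; }

private:
    T m_data[Capacity] = {};
    std::size_t m_size = 0;
};

// Usage: sorts the container in place with std::sort, then sums the
// elements with a range-based for loop.
inline int sumOfSorted(FixedArray<int, 8>& values)
{
    std::sort(values.begin(), values.end());
    int sum = 0;
    for (int v : values)
        sum += v;
    return sum;
}
```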


Exception handling is not recommended in the runtime.

However, it can still be useful in a limited scope, such as when running scripts, or in the editor build of the runtime.

A custom type of exception can be defined by deriving from std::exception.
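A minimal sketch of such a type follows, assuming the script-running scenario mentioned above. ScriptError, runScriptLine, and tryRunScriptLine are hypothetical names, and the "script" is reduced to a flag so that the example stays self-contained.

```cpp
#include <exception>
#include <string>
#include <utility>

// Hypothetical exception type that carries the failing script line.
class ScriptError : public std::exception
{
public:
    ScriptError(std::string message, int line)
        : m_message(std::move(message)), m_line(line) {}

    // what() is the virtual function that std::exception exposes to handlers.
    const char* what() const noexcept override { return m_message.c_str(); }

    int line() const noexcept { return m_line; }

private:
    std::string m_message;
    int m_line;
};

// Hypothetical script entry point: throws when the line is not valid.
inline void runScriptLine(int line, bool valid)
{
    if (!valid)
        throw ScriptError("syntax error", line);
}

// Keeps the exception inside a limited scope: callers only see a result
// code. Returns -1 on success, or the number of the line that failed.
inline int tryRunScriptLine(int line, bool valid)
{
    try
    {
        runScriptLine(line, valid);
        return -1;
    }
    catch (const ScriptError& error)
    {
        return error.line();
    }
}
```

Catching at the boundary, as tryRunScriptLine does, keeps the rest of the runtime free of exception handling.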


For debugging a new type of exception:

CPU Features Detection

It's useful to determine the CPU on the target system to run the most optimized code path, and provide specific functionality (such as SIMD).

On x86, the CPUID instruction can be used to get information about the supported features and the CPU type.


The __cpuid intrinsic can be used to generate the instruction.

This intrinsic stores the supported features and CPU information in the cpuInfo array.

void __cpuid(
   int cpuInfo[4],
   int function_id
);


First, call __cpuid with function_id set to 0 to get the highest supported function ID (referred to here as the number of IDs).

int cpuInfo[4];

__cpuid(cpuInfo, 0);

// Number of IDs (highest supported function ID)
int idCount = cpuInfo[0];

if (idCount >= 1)
{
    // Detect the features
}

To obtain the vendor's name, you concatenate the three 4-byte parts of the vendor string.

The vendor string is stored in EBX, EDX, and ECX in that order.

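// Requires <cstring> and <string>.
// std::memcpy avoids the aliasing and alignment issues of casting to int*.
char vendorData[32] = {};
std::memcpy(vendorData, &cpuInfo[1], sizeof(int));      // EBX
std::memcpy(vendorData + 4, &cpuInfo[3], sizeof(int));  // EDX
std::memcpy(vendorData + 8, &cpuInfo[2], sizeof(int));  // ECX

std::string vendor = vendorData;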

You can then compare the vendor with predefined strings.

if (vendor == "GenuineIntel")
    IsIntel = true;
else if (vendor == "AuthenticAMD")
    IsAMD = true;

To detect SIMD support, you test the bit flags in the registers that contain the feature sets.

Here are the most important features to detect:

__cpuid(cpuInfo, 1);
int cpuInfo2 = cpuInfo[2];
int cpuInfo3 = cpuInfo[3];

// SSE
IsSSE = (cpuInfo3 & (1 << 25)) != 0;

// SSE2
IsSSE2 = (cpuInfo3 & (1 << 26)) != 0;

// SSE3 and SSSE3 (Supplemental SSE3)
IsSSE3 = ((cpuInfo2 & (1 << 0)) != 0);
IsSSE3x = ((cpuInfo2 & (1 << 9)) != 0);

// SSE 4.x
IsSSE41 = ((cpuInfo2 & (1 << 19)) != 0);
IsSSE42 = ((cpuInfo2 & (1 << 20)) != 0);

// AVX
IsAVX = ((cpuInfo2 & (1 << 28)) != 0);

// F16C (half-precision)
IsFP16C = ((cpuInfo2 & (1 << 29)) != 0);

// FMA3
IsFMA = ((cpuInfo2 & (1 << 12)) != 0);
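The feature tests above all follow the same pattern: build a one-bit mask with a shift and AND it with the register value. A sketch of that pattern as a reusable, testable helper is given below. The names hasFeatureBit, CpuFeatures, and decodeFunction1 are hypothetical. The function only decodes register values, so it can be unit-tested on any platform without executing CPUID.

```cpp
#include <cstdint>

// Returns true when bit `index` (0 to 31) is set in a CPUID output register.
// The mask is the bit position shifted into place, not the index itself.
constexpr bool hasFeatureBit(std::uint32_t reg, unsigned index)
{
    return ((reg >> index) & 1u) != 0;
}

// Hypothetical aggregate of the flags discussed in this section.
struct CpuFeatures
{
    bool sse = false;
    bool sse2 = false;
    bool sse3 = false;
    bool ssse3 = false;
    bool sse41 = false;
    bool sse42 = false;
    bool fma = false;
    bool f16c = false;
    bool avx = false;
};

// Decodes the ECX and EDX values returned by CPUID function 1.
constexpr CpuFeatures decodeFunction1(std::uint32_t ecx, std::uint32_t edx)
{
    CpuFeatures f{};
    f.sse   = hasFeatureBit(edx, 25);
    f.sse2  = hasFeatureBit(edx, 26);
    f.sse3  = hasFeatureBit(ecx, 0);
    f.ssse3 = hasFeatureBit(ecx, 9);
    f.fma   = hasFeatureBit(ecx, 12);
    f.sse41 = hasFeatureBit(ecx, 19);
    f.sse42 = hasFeatureBit(ecx, 20);
    f.avx   = hasFeatureBit(ecx, 28);
    f.f16c  = hasFeatureBit(ecx, 29);
    return f;
}
```

With the array filled by __cpuid(cpuInfo, 1), a call would look like decodeFunction1(static_cast<std::uint32_t>(cpuInfo[2]), static_cast<std::uint32_t>(cpuInfo[3])).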

If the number of IDs is at least 7, you can check for extended features.

if (idCount >= 7)
{
    __cpuid(cpuInfo, 7);
    int cpuInfo1 = cpuInfo[1]; // EBX

    // AVX2
    IsAVX2 = ((cpuInfo1 & (1 << 5)) != 0);

    // AVX-512 Foundation
    IsAVX512 = ((cpuInfo1 & (1 << 16)) != 0);
}


In the XNU kernel (macOS), the cpuid_info function returns a pointer to an i386_cpu_info_t structure.

i386_cpu_info_t * cpuid_info(void);

Byte Swapping

Most platforms provide either intrinsic functions or macros to perform byte swaps on integers.

They can be more efficient than custom functions.

Microsoft (CRT)

#include <stdlib.h>

unsigned short _byteswap_ushort(unsigned short val);
unsigned long _byteswap_ulong(unsigned long val);
unsigned __int64 _byteswap_uint64(unsigned __int64 val);

Microsoft (RTL)

#include <rtl.h>

NTSYSAPI ULONG RtlUlongByteSwap(ULONG Source);


Apple (libkern)

#include <libkern/OSByteOrder.h>



GNU C Library

#include <byteswap.h>



GCC and Clang (built-ins)

uint16_t __builtin_bswap16(uint16_t x);
uint32_t __builtin_bswap32(uint32_t x);
uint64_t __builtin_bswap64(uint64_t x);
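The platform functions listed above are usually hidden behind one small wrapper so that the rest of the runtime does not depend on the compiler. The sketch below shows one way to do this. The names byteSwap16, byteSwap32, and byteSwap64 are hypothetical, and the shift-and-mask fallback is an assumption for compilers that offer neither the Microsoft CRT functions nor the GCC/Clang built-ins.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h> // _byteswap_ushort, _byteswap_ulong, _byteswap_uint64
#endif

inline std::uint16_t byteSwap16(std::uint16_t value)
{
#if defined(_MSC_VER)
    return _byteswap_ushort(value);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap16(value);
#else
    // Portable fallback: exchange the two bytes.
    return static_cast<std::uint16_t>((value << 8) | (value >> 8));
#endif
}

inline std::uint32_t byteSwap32(std::uint32_t value)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(value);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(value);
#else
    // Portable fallback: move each byte to its mirrored position.
    return ((value & 0x000000FFu) << 24) | ((value & 0x0000FF00u) << 8) |
           ((value & 0x00FF0000u) >> 8)  | ((value & 0xFF000000u) >> 24);
#endif
}

inline std::uint64_t byteSwap64(std::uint64_t value)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(value);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(value);
#else
    // Portable fallback: swap the two 32-bit halves, each byte-swapped.
    const std::uint64_t low = byteSwap32(static_cast<std::uint32_t>(value));
    const std::uint64_t high = byteSwap32(static_cast<std::uint32_t>(value >> 32));
    return (low << 32) | high;
#endif
}
```

Swapping twice returns the original value, which makes a convenient sanity check when porting the wrapper to a new compiler.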




On 64-bit Windows, memory allocations are 16-byte aligned by default.

On 32-bit Windows, memory allocations are 8-byte aligned by default.
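When data needs a stricter alignment than these defaults (for example, 32 bytes for AVX data), the alignment has to be requested explicitly. The sketch below uses the C++17 aligned forms of operator new and operator delete, which is an assumption about the toolchain. The helper names allocateAligned, freeAligned, and isAligned are hypothetical. On Windows, the CRT functions _aligned_malloc and _aligned_free are an alternative.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Allocates `bytes` bytes aligned to `alignment`, which must be a power of
// two. Uses the C++17 aligned operator new.
inline void* allocateAligned(std::size_t bytes, std::size_t alignment)
{
    return ::operator new(bytes, std::align_val_t(alignment));
}

// Releases a block obtained from allocateAligned. The same alignment must be
// passed so that the matching operator delete is selected.
inline void freeAligned(void* p, std::size_t alignment) noexcept
{
    ::operator delete(p, std::align_val_t(alignment));
}

// Returns true when the address is a multiple of `alignment`.
inline bool isAligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) % alignment) == 0;
}
```

The default guarantee of the current platform can be queried at compile time with alignof(std::max_align_t).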

Calling Convention

On 64-bit Windows, the default is a four-register __fastcall convention, which passes the first four arguments in registers and the remaining ones on the stack.

With SIMD types, the __vectorcall calling convention can pass up to six __m128 values as arguments to a function in SIMD registers.

Because of limitations with __vectorcall, it is recommended to only pass __m128 values by reference in constructors.


The Windows headers define min and max macros that can conflict with the standard functions in the std namespace, causing errors such as:

error C2589: '::' : illegal token on right side of '::'

To prevent the Windows headers from defining these macros, define the NOMINMAX symbol before including windows.h.

#define NOMINMAX
#include <windows.h>
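When NOMINMAX cannot be guaranteed, for example because a third-party header includes windows.h first, a complementary technique is to wrap the function name in parentheses. A function-like macro is only expanded when its name is directly followed by an opening parenthesis, so (std::min)(a, b) always calls the function. The helper clampToRange below is a hypothetical example of this.

```cpp
#include <algorithm>

// Hypothetical helper: clamps `value` to the inclusive range [low, high].
// The parentheses around std::min and std::max stop the preprocessor from
// expanding the min/max macros, whether or not they are defined.
inline int clampToRange(int value, int low, int high)
{
    return (std::max)(low, (std::min)(value, high));
}
```

This form compiles the same way with or without the macros, so it is safe to use in headers shared across platforms.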


Reduce the WinRT code to a minimum.

For example, isolate the references to Windows::UI::Core::CoreWindow and Platform::String.


Reduce the Objective-C and Cocoa-dependent code to a minimum.

For example, convert NSString to std::wstring.