Allocators
Category: Memory
Overview
When allocating a block of memory, there are three main aspects.
- The size of the allocated memory
- The lifetime of the memory
- The usage of the memory
When each aspect is known, a custom allocator can provide the most efficient strategy for memory allocation.
Lifetimes
It is useful to think of the lifetime of a block of memory as one of these three kinds of generations.
- Permanent allocation: the memory is persistent during the program lifetime (for example, a singleton).
- Transient allocation: the memory only persists during a defined cycle (for example, a frame in an update loop).
- Temporary allocation: the memory is short lived (for example, a formatted string in a logger).
Allocators
Allocators work at a low level and perform operations on blocks of memory instead of objects.
Building custom allocators and writing allocation-aware software has a cost.
In long-running programs, improved memory access is more important than faster allocation calls.
The scope of a memory allocator can be:
- Global: lets the system manage the memory
- Local: manages an arena for a duration of time
There are two kinds of allocators:
- General: general purpose, covering all cases
- Specialized: designed for specific use cases
Allocator Properties
No allocator fits all purposes and each allocator has advantages and pitfalls.
Allocators have different properties that make them more efficient in specific contexts.
- Memory Size (fixed/variable)
- Allocation Sizes
- Fragmentation
- Wasted space
- Performance
- Thread-safety
- Cache-locality
Default Allocators
The default allocators are general and global. There are several problems with their implementation.
❌As general purpose allocators, they must work in all cases (from small to large allocations), so they are not efficient for any particular case.
❌As global allocators, they let the operating system manage the virtual pages of memory which can be very slow.
STL Allocator
In some cases, such as when interfacing with an external library, it may be necessary to use one of the STL containers. It's possible to implement a simple STL allocator that forwards allocations and deallocations to a custom allocator.
The STL allocator is represented by the std::allocator class.
The STL allocator is used by the STL containers and passed as a template parameter.
There are several requirements when implementing an STL allocator.
For example, for a custom allocator called Allocator, the class is declared the following way:
template<typename T>
class Allocator
Declare the following aliases:
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
typedef T value_type;
Declare the following methods:
Allocator() throw();
Allocator(const Allocator& other) throw();
template<typename U>
Allocator(const Allocator<U>&) throw();
Allocator<T>& operator=(const Allocator& other);
template<typename U>
Allocator& operator=(const Allocator<U>& other);
pointer address(reference x) const;
const_pointer address(const_reference x) const;
pointer allocate(size_type n, const void* hint = 0);
void deallocate(pointer p, size_type n);
template<typename U, typename... Args>
void construct(U* p, Args&&... args);
template<typename U>
void destroy(U* p);
size_type max_size() const;
template<typename U>
struct rebind
{
typedef Allocator<U> other;
};
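ℹSince C++11, std::allocator_traits fills in defaults for most of these members, so a minimal allocator only needs value_type, allocate(), deallocate(), and the equality operators. A minimal sketch that forwards to malloc and free (a real implementation would forward to a custom allocator instance instead):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

template <typename T>
class Allocator
{
public:
    using value_type = T;

    Allocator() noexcept = default;
    template <typename U>
    Allocator(const Allocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        // Forwards to the global allocator for this sketch.
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        std::free(p);
    }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const Allocator<T>&, const Allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const Allocator<T>&, const Allocator<U>&) noexcept { return false; }
```

It can then be used directly with a container, for example `std::vector<int, Allocator<int>>`.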
Common General Allocators
- Doug Lea's malloc
- ❌not multi-thread friendly
- ptmalloc
- ❌LGPL license
- jemalloc
- ❌general-purpose
- tcmalloc
- ❌general-purpose
- rpmalloc
- ❌64 KB granularity
- ❌calls platform-specific API directly
Custom Allocators
Not every allocator needs to be a general purpose allocator, and not every allocator needs to be thread-safe. Implementing different variations of allocators for special cases will improve performance.
The main difficulty for a memory allocator is to keep the amount of metadata and the amount of processing time low.
To increase efficiency:
- Most allocators do not track every single byte of memory; for example, they use a resolution of 16 bytes or more.
- Instead of traversing memory on each allocation looking for an empty space of the appropriate size, the allocators can keep lists of free areas categorized by their approximate sizes.
Custom allocators resolve the pitfalls of the default allocators and provide additional benefits.
✔They are specialized allocators that are more efficient depending on the use case.
✔They reduce the number of allocations and handle virtual pages of memory more efficiently.
✔They reduce memory fragmentation by allocating within memory arenas.
Memory Area
Memory areas define where memory is being allocated from. The common areas are heap memory and stack memory.
On some platforms, additional areas can be accessible such as GPU memory.
Allocators are independent from the memory area.
With custom allocators, we can easily switch between areas to reduce fragmentation when needed.
Debugging and Profiling
With custom allocators, we can provide additional functionality for debugging and profiling.
- memory tracking
- memory marking
- bounds checking
- memory replay
Memory tracking
Keeps track of each allocation made during the lifetime of the application.
Memory tracking is used to find memory leaks, and inefficient allocations (large allocations, frequent allocations and deallocations).
Memory tracking provides several kinds of information:
- Number of allocations
- Source of allocations (file name and line number)
- Callstack of allocations (high overhead)
Memory tracking can be implemented using intrusive or external data.
- Intrusive ❌Requires more memory. ❌Changes the memory layout.
- External (using a container) ✔Does not change the memory layout.
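The external (container-based) approach can be sketched as follows; MemoryTracker and its hook names are illustrative, not part of any specific library:

```cpp
#include <cstddef>
#include <unordered_map>

// Information recorded for each live allocation.
struct AllocationInfo
{
    std::size_t size;
    const char* file;
    int line;
};

// Non-intrusive tracker: a map from address to allocation info,
// fed by hooks called from the allocator.
class MemoryTracker
{
public:
    void OnAllocate(void* ptr, std::size_t size, const char* file, int line)
    {
        _allocations[ptr] = { size, file, line };
    }

    void OnDeallocate(void* ptr)
    {
        _allocations.erase(ptr);
    }

    // Any entry remaining at shutdown is a leak.
    std::size_t LiveAllocationCount() const
    {
        return _allocations.size();
    }

private:
    std::unordered_map<void*, AllocationInfo> _allocations;
};
```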
Memory marking
Marks allocated memory and released memory with pattern values.
Memory marking is used to find dangling pointers (pointers that do not point to a valid object).
When specific values are used, dereferencing a marked pointer can trigger a segmentation fault.
ℹOn Windows, the Visual Studio debug heap marks memory with specific pattern values:
- 0xCCCCCCCC: uninitialized stack memory
- 0xCDCDCDCD: uninitialized heap memory
- 0xDDDDDDDD: freed heap memory
- 0xFEEEFEEE: freed heap memory (overrides 0xDDDDDDDD sometimes)
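A minimal marking sketch; the single-byte pattern values are assumptions chosen to mirror the debug-heap patterns above:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed marking values (mirroring the Visual Studio debug heap style).
constexpr std::uint8_t kAllocatedPattern = 0xCD;
constexpr std::uint8_t kFreedPattern = 0xDD;

// Fills a freshly allocated block so reads of uninitialized
// memory are recognizable in the debugger.
void MarkAllocated(void* ptr, std::size_t size)
{
    std::memset(ptr, kAllocatedPattern, size);
}

// Fills a released block so accesses through dangling pointers
// are recognizable.
void MarkFreed(void* ptr, std::size_t size)
{
    std::memset(ptr, kFreedPattern, size);
}
```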
Bounds checking
Bounds checking is used to detect whether a variable is within some bounds before it is used.
There are two possible implementations:
- Intrusive: add a pattern value at the start and end of each allocation. During deallocation, we check whether the pattern values were overwritten, which would indicate corrupted memory. ❌Requires more memory for the pattern values. ❌Changes the memory layout.
- Virtual memory: use page access protection. Move allocations to the start or end of a page and restrict access to the surrounding pages. ❌Requires a lot of memory to allocate the surrounding pages.
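The intrusive approach can be sketched as follows, assuming a 4-byte guard value (0xDEADBEEF is an arbitrary choice for this sketch):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Assumed guard value written before and after the user data.
constexpr std::uint32_t kGuard = 0xDEADBEEF;

// Allocates size bytes surrounded by two guard values.
void* AllocateGuarded(std::size_t size)
{
    std::uint8_t* block =
        static_cast<std::uint8_t*>(std::malloc(size + 2 * sizeof(kGuard)));
    std::memcpy(block, &kGuard, sizeof(kGuard));
    std::memcpy(block + sizeof(kGuard) + size, &kGuard, sizeof(kGuard));
    return block + sizeof(kGuard);
}

// Returns true if neither guard was overwritten.
bool CheckGuards(void* ptr, std::size_t size)
{
    std::uint8_t* data = static_cast<std::uint8_t*>(ptr);
    std::uint32_t front = 0;
    std::uint32_t back = 0;
    std::memcpy(&front, data - sizeof(kGuard), sizeof(kGuard));
    std::memcpy(&back, data + size, sizeof(kGuard));
    return (front == kGuard) && (back == kGuard);
}
```

During deallocation, a failed CheckGuards() indicates a buffer overrun or underrun on that block.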
Memory replay
We can record every allocation and deallocation and output the data to a storage or stream it over the network to allow an external tool to replay the allocations during the lifetime of the application.
New / Delete operators
The default implementation of the new and delete operators is to allocate and deallocate memory through the default global allocator (malloc and free).
To be able to allocate memory using custom allocators, we can overload the operators to pass the allocator and additional information such as the file name, line number...
⚠The delete operator cannot be called with additional parameters, so operator delete is called instead.
⚠operator delete does not call the destructor, so the destructor has to be called manually.
Allocator allocator;
T* ptr = new (allocator, __FILE__, __LINE__) T(0, 1, 2);
ptr->~T();
operator delete (ptr, allocator, __FILE__, __LINE__);
We can also overload the new[] and delete[] operators.
❌Before calling operator delete[], we need to call the destructor of each element of the array manually; however, we don't know how many instances were allocated, as this is compiler-specific.
Allocator allocator;
T* ptr = new (allocator, __FILE__, __LINE__) T[10];
// We cannot call the destructors!
operator delete[] (ptr, allocator, __FILE__, __LINE__);
Because of these limitations, it is preferable to use macros instead of operator overloads to use custom allocators.
Macros
We can define two macros to replace the new and delete operators.
Allocator allocator;
T* ptr = NEW(T, allocator)(0, 1, 2);
DELETE(ptr, allocator);
NEW macro
The NEW macro is implemented in a single line. It allocates memory with the allocator and calls the placement new operator. The type parameter is added at the end to invoke the constructor with the arguments specified after the macro.
#define NEW(type, allocator) new (allocator.Allocate(sizeof(type), __FILE__, __LINE__)) type
DELETE macro
As we need to manually call the destructor of the type, we implement a template Delete() function.
The DELETE macro calls the Delete() function with the deduced template argument.
The Delete() function calls the destructor and deallocates the memory with the allocator.
template <typename T>
void Delete(T* ptr, IAllocator& allocator, const char* file, int line)
{
ptr->~T();
allocator.Deallocate(ptr, file, line);
}
#define DELETE(ptr, allocator) Delete(ptr, allocator, __FILE__, __LINE__)
Global Allocators
A global allocator allocates blocks of memory from the main memory.
Multiple allocations and deallocations can create memory fragmentation.
A way to reduce fragmentation is to manage memory using local allocators.
✔Easy to use.
❌Cannot achieve great locality.
❌Cost of deallocations.
❌Creates memory fragmentation.
An allocator only needs to perform two operations: allocate and deallocate.
Allocate
Allocates a block of memory of a specified size and returns its address.
Deallocate
Deallocates a previously allocated block of memory, given its address.
Interface
A common interface is to allocate memory using a specified alignment, and deallocate without requiring the size of the memory block.
void* Allocate(std::size_t size, std::size_t alignment)
{
...
}
void Deallocate(void* address)
{
...
}
Local Allocators
A local allocator (or static allocator) reserves a region of memory (memory arena).
All allocations are performed inside this arena and the allocator cannot grow.
When a local allocator is released, all its allocations are released.
✔Free deallocations by releasing the whole memory arena.
✔Local allocators facilitate threading when distinct threads use separate memory arenas.
- Synchronization can be avoided
- Cache line contention is avoided
❌Requires an assumption about the initial size of the memory arena.
A local allocator performs allocations and deallocations and also handles its memory arena.
Create
Allocate a region of memory using a global allocator.
Destroy
Release the region of memory previously allocated.
Interface
A common interface is to create a local arena from an existing buffer, and destroy the arena by releasing all the memory at once.
void Create(void* data, std::size_t size)
{
...
}
void Destroy()
{
...
}
Move operations
A move assignment is not always more efficient than a copy.
Objects returned by value are constructed in place when they are moved.
A move operation has side-effects:
- Locality
- Prefetching
When using a local allocator, moving is done inside the memory arena.
General Allocator
A general allocator is typically implemented as a global allocator using variations of malloc and free.
❌Slow.
❌Fragmentation.
❌Wasted memory.
❌Requires synchronization primitives.
Usage
- Temporary variable-sized allocations.
Use cases
- File decompression.
- Allocations in third-party libraries.
Linear Allocator
A linear allocator (or monotonic allocator) allocates contiguous blocks of memory.
The allocator contains a start pointer to the start of the memory arena and an offset pointer to the end of the last allocation.
To allocate memory, the offset pointer is moved forward by the allocated size.
When the allocator is released, the offset pointer is reset to the start pointer.
✔Easy to implement.
✔Supports any size and alignment.
✔Deterministic runtime cost (pointer arithmetic).
✔No fragmentation.
✔No wasted space.
✔️Does not require any additional data.
✔Can be made thread-safe.
✔Cache-friendly.
❌Specific blocks of memory cannot be deallocated. The memory is released at once.
❌Can only be used with POD types (the destructors are not called).
ℹLinear allocators can be backed with static memory.
Usage
- Allocated once at initialization, and deallocated at shutdown.
- Fixed-size
- Single-threaded
- Allocated for a short duration of time with few data structures.
- Fixed-size
- Thread-safe
- Temporary allocations (allocated on the stack).
Use cases
- Frame lifetime (render queue)
- Event handling (input)
- Tasks processing (jobs over one or two frames)
Structure
A start pointer to the start of the memory arena.
An offset pointer to the start of the free memory.
Allocation
Complexity: O(1)
Moves the free pointer forward.
Deallocation
Not supported.
Implementation
class LinearAllocator
{
void* _data;
size_t _size;
size_t _offset;
void Create(void* data, size_t size)
{
_data = data;
_size = size;
_offset = 0;
}
void Destroy()
{
_offset = 0;
}
void* Allocate(size_t size)
{
if ((_offset + size) > _size)
{
return nullptr;
}
void* ptr = ((uint8_t*)_data + _offset);
_offset += size;
return ptr;
}
void Deallocate(void* ptr)
{
// Not supported: the memory is released all at once.
}
};
Alignment
To take the alignment of the allocation into account, we need to:
- Check if the memory is aligned at the current offset.
- If it is not aligned, calculate the required padding.
- Allocate the requested size plus the padding.
- Increase the offset by the allocated size plus the padding.
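The padding computation above can be sketched as follows, assuming the alignment is a power of two:

```cpp
#include <cstddef>
#include <cstdint>

// Returns the padding needed so that (address + padding) is a
// multiple of alignment (alignment must be a power of two).
std::size_t AlignPadding(std::uintptr_t address, std::size_t alignment)
{
    // Round the address up to the next aligned boundary.
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment - 1);
    std::uintptr_t aligned = (address + mask) & ~mask;
    return static_cast<std::size_t>(aligned - address);
}
```

The allocator then allocates size + padding bytes and returns the address offset by the padding.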
Stack Allocator (without LIFO)
The stack allocator is a linear allocator that keeps track of each allocation to allow blocks of allocated memory to be deallocated.
One approach is to store a header before each allocation to store information about that allocation.
ℹDo not confuse the stack allocator with stack memory.
✔Similar to linear allocator.
✔Allows deallocations.
❌Deallocating a block of memory also deallocates everything that was allocated after it.
❌Cannot deallocate a specific block of memory on its own, unless it is the last allocation.
⚠The LIFO principle is not enforced: when a block of memory is deallocated, everything that was allocated after it is also deallocated.
Usage
- Variable-size
Use cases
- Lifetime of a scene
- Scene loading with deallocations in reverse-order
Structure
A start pointer to the start of the memory arena.
An offset pointer to the start of the free memory.
Allocation
Complexity: O(1)
Moves the free pointer forward.
Deallocation
Complexity: O(1)
Moves the offset pointer backwards.
Implementation
When the allocator is initialized, the offset is set to 0.
When the allocator is released, the offset is reset to 0.
class StackAllocator
{
void* _data;
size_t _size;
uint32_t _offset;
void Create(void* data, size_t size)
{
_data = data;
_size = size;
_offset = 0;
}
void Destroy()
{
_offset = 0;
}
void* Allocate(size_t size)
{
if (_offset + size > _size)
{
return nullptr;
}
void* ptr = ((uint8_t*)_data + _offset);
_offset += size;
return ptr;
}
void Deallocate(void* ptr)
{
// Rewind to the start of the deallocated block; everything
// allocated after it is also deallocated.
_offset = (uint32_t)((uint8_t*)ptr - (uint8_t*)_data);
}
};
Alignment
Structure
To take into account the alignment, we need to add some padding.
We keep track of the padding by storing a header before each allocation.
The header can store different kind of data depending on the implementation.
- The size of the allocation.
- The padding from the previous offset.
- The previous offset.
- The previous offset pointer.
ℹStoring this data as a uint32_t value instead of size_t or void* minimizes the overhead on 64-bit systems, assuming allocations are under 4 GB of memory.
Allocation
Increases the allocation size by the size of the header.
Places a header before the memory block.
Moves the free pointer forward.
To store the header in-place:
- We check if the padding is large enough to store the header in-place.
- If the header does not fit, we extend the padding to the next alignment.
Deallocation
Reads the allocated size from the header stored before the allocated block of memory.
Moves the offset pointer backwards.
Implementation
Before each allocation, a Header structure is stored with the size of the allocation.
The size of each allocation takes into account the size of the structure.
class StackAllocator
{
struct Header
{
uint32_t size;
};
void* _data;
size_t _size;
uint32_t _offset;
void Create(uint8_t* data, size_t size)
{
_data = data;
_size = size;
_offset = 0;
}
void Destroy()
{
_offset = 0;
}
void* Allocate(size_t size)
{
size_t allocationSize = (size + sizeof(Header));
if (_offset + allocationSize > _size)
{
return nullptr;
}
uint8_t* address = ((uint8_t*)_data + _offset);
Header* header = (Header*)address;
header->size = (uint32_t)allocationSize;
void* ptr = (address + sizeof(Header));
_offset += allocationSize;
return ptr;
}
void Deallocate(void* ptr)
{
Header* header = (Header*)((uintptr_t)ptr - sizeof(Header));
_offset -= header->size;
}
};
Stack Allocator (with LIFO)
An improvement of the stack allocator is to allow the last block of allocated memory to be deallocated. This is the LIFO (last-in, first-out) principle.
The previous offset needs to be stored in the header.
Out-of-order deallocations can be prevented by storing the previous offset in the allocator as well.
During deallocation, we compare the previous offset from the header with the offset in the allocator, asserting that they are the same.
✔Allows deallocations in reverse order.
✔Prevents out-of-order deallocations.
❌Requires an additional field in the header structure.
❌Must deallocate in reverse order.
Structure
A start pointer to the start of the memory arena.
An offset pointer to the start of the free memory.
A previous offset pointer to the start of the free memory before the last allocation.
A header before each allocation that contains a previous offset.
Implementation
The previous offset is stored in the allocator.
The Header structure also stores the previous offset.
Out-of-order deallocation is prevented by checking the previous offset from the Header of the block to deallocate against the previous offset of the allocator.
struct StackAllocator
{
struct Header
{
std::uint32_t size;
std::uint32_t previousOffset;
};
void* _data;
size_t _size;
uint32_t _offset;
uint32_t _previousOffset;
void Create(void* data, size_t size)
{
_data = data;
_size = size;
_offset = 0;
_previousOffset = 0;
}
void Destroy()
{
_offset = 0;
_previousOffset = 0;
}
void* Allocate(std::size_t size)
{
size_t allocationSize = size + sizeof(Header);
if (_offset + allocationSize > _size)
{
return nullptr;
}
uint8_t* address = ((uint8_t*)_data + _offset);
Header* header = (Header*)address;
header->size = (uint32_t)allocationSize;
header->previousOffset = _previousOffset;
void* ptr = (address + sizeof(Header));
_previousOffset = _offset;
_offset += allocationSize;
return ptr;
}
void Deallocate(void* ptr)
{
Header* header = (Header*)((uintptr_t)ptr - sizeof(Header));
uint32_t previousOffset = (_offset - header->size);
// Only the last allocation can be deallocated (LIFO).
if (previousOffset != _previousOffset)
{
return;
}
_offset = _previousOffset;
_previousOffset = header->previousOffset;
}
};
Scoped Stack Allocator
The scoped stack allocator works with a backend linear allocator.
The allocator stores a finalizer block of data to be able to call the destructors of the allocated objects before rewinding the memory.
✔Supports POD and non-POD types.
Structure
A reference to a rewind point.
A singly-linked list of finalizers.
class ScopeStack
{
private:
LinearAllocator& _allocator;
void* _rewindPoint;
Finalizer* _finalizers;
};
The finalizer structure contains a pointer to the next node and a function pointer to call the destructors on the allocated objects.
struct Finalizer
{
Finalizer* next;
void (*dtor)(void* ptr);
};
The function pointer stores the address of a templated function.
template <typename T>
void Destructor(void* ptr)
{
static_cast<T*>(ptr)->~T();
}
Allocation
When a POD is allocated, we just forward the allocation to the backend allocator.
When a non-POD is allocated, we adjust the requested size to account for the finalizer structure, store a function pointer to the destructor, and set the head of the finalizer list to the new finalizer item.
Finalizer* AllocateWithFinalizer(size_t size)
{
return (Finalizer*)_allocator.Allocate(size + AlignedSize(sizeof(Finalizer)));
}
void* ObjectFromFinalizer(Finalizer* finalizer)
{
return ((uint8_t*)finalizer + AlignedSize(sizeof(Finalizer)));
}
template <typename T>
T* Allocate()
{
Finalizer* finalizer = AllocateWithFinalizer(sizeof(T));
T* ptr = new (ObjectFromFinalizer(finalizer)) T;
finalizer->dtor = &Destructor<T>;
finalizer->next = _finalizers;
_finalizers = finalizer;
return ptr;
}
Deallocation
When a scope is released, the finalizer chain is iterated to call every destructor in reverse-order.
The deallocation is then forwarded to the backend allocator.
~ScopeStack()
{
for (Finalizer* finalizer = _finalizers; finalizer != nullptr; finalizer = finalizer->next)
{
(*finalizer->dtor)(ObjectFromFinalizer(finalizer));
}
_allocator.Deallocate(_rewindPoint);
}
Double-Ended Stack Allocator
The double-ended stack allocator is a LIFO stack allocator that can allocate from both ends of the stack.
The allocator contains a start pointer to the top of the stack, an end pointer to the bottom of the stack, and corresponding offset pointers.
✔❌Same advantages and pitfalls as the LIFO stack allocator.
✔The top and bottom of the stack can be used independently.
❌Requires explicit allocations and deallocations from the top or bottom of the stack.
Use cases
- Level loading
- Permanent data allocated at the top.
- Temporary data allocated at the bottom.
Structure
A start pointer to the top of the memory arena and an offset pointer to the current top of the stack (allocating from the top).
An end pointer to the bottom of the memory arena and an offset pointer to the current bottom of the stack (allocating from the bottom).
A header before each allocation.
Allocation
Allocations are explicitly requested from the top or bottom of the stack.
Deallocation
Deallocations are also explicitly requested from the top or bottom of the stack.
Pool Allocator
The pool allocator divides its memory arena into chunks (or pools/slots) of identical size and keeps track of the free chunks.
When a block of memory is allocated, a pointer to a free chunk is returned.
When a block of memory is deallocated, the corresponding chunk is released and added to the list of free chunks.
✔Deterministic runtime cost (pointer exchange).
✔No fragmentation (can reuse free chunks).
✔️No wasted space when allocations match the chunk size.
✔️Does not require any additional data (free list).
✔Can be made thread-safe.
✔Allocations and deallocations can be performed in random order.
❌The size of allocations is fixed.
❌The size of an allocation must not exceed the size of a chunk.
❌Allocations are not contiguous in memory.
ℹThe linked list is not sorted. Its order is determined by the order of allocations and deallocations.
Usage
- Fixed-size.
- Same lifetime.
- Random order of allocations and deallocations.
Use cases
- Object pools such as particles or projectiles.
- Maximum number of instances and fixed size for each particle.
- For example, for a maximum number of 256 particles, each particle having a size of 32 bytes, the pool allocator allocates 256 * 32 = 8192 bytes once, divided into chunks of 32 bytes.
Structure
The allocator keeps track of the free chunks in a list.
The number of chunks is determined by: arena size / chunk size.
A head pointer to the first free chunk.
A header at the beginning of each chunk stores a pointer to the next free chunk.
The list of free chunks is stored as a free list (or intrusive list).
A free list is a data structure that internally stores a linked list inside the allocated memory.
The list of free chunks is stored in-place without the need for an additional data structure.
⚠The size of a chunk must be at least the size of a node in the linked list.
Allocation
Complexity: O(1)
Pops the first free chunk of memory from the linked list (the first node).
Deallocation
Complexity: O(1)
Pushes the deallocated chunk as the first node in the linked list.
Implementation
The Node structure contains a pointer to the next node.
When the pool is created, all the chunks are pushed to the list of free nodes. The head node is set to the first chunk.
When the pool is destroyed, the same operation is performed.
When allocating memory, the pointer to the head node is the address of the free chunk of memory that is returned. The next node of that chunk becomes the new head node.
If the head node is null, then the pool has no free memory.
When deallocating memory, the pointer to the block of memory becomes the new head node with its next node set to the previous head node.
struct Pool
{
struct Node
{
Node* next;
};
std::uint8_t* _data;
std::size_t _size;
std::size_t _chunkSize;
Node* _headNode;
void Create(std::uint8_t* data, std::size_t size, std::size_t chunkSize)
{
_data = data;
_size = size;
_chunkSize = chunkSize;
_headNode = nullptr;
Destroy();
}
void Destroy()
{
std::size_t chunkCount = (_size / _chunkSize);
for (std::size_t index = 0; index < chunkCount; index++)
{
void* ptr = &_data[index * _chunkSize];
Node* node = reinterpret_cast<Node*>(ptr);
node->next = _headNode;
_headNode = node;
}
}
void* Allocate(std::size_t size)
{
Node* node = _headNode;
if (node == nullptr)
{
return nullptr;
}
_headNode = _headNode->next;
return reinterpret_cast<void*>(node);
}
void Deallocate(void* ptr)
{
Node* node = reinterpret_cast<Node*>(ptr);
node->next = _headNode;
_headNode = node;
}
};
Chunk Allocator
The chunk allocator (or slab/multipool allocator) manages multiple pools, each divided into chunks of a different size.
When a block of memory is allocated, the allocator selects the appropriate pool with the smallest chunk size that matches the requested size.
✔Deterministic runtime cost (pointer exchange).
✔No fragmentation (can reuse free chunks).
✔The waste of memory can be controlled.
✔️Does not require any additional data (free list).
✔Can be made thread-safe.
✔Allocations and deallocations can be performed in random order.
❌Allocations are not contiguous in memory.
❌The efficiency of the allocator depends on the correct setup of pools and chunk sizes.
ℹA typical implementation is to use power-of-two chunk sizes.
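The pool-selection step can be sketched as follows, assuming a minimum chunk size of 16 bytes (the minimum and the power-of-two progression are assumptions for this sketch):

```cpp
#include <cstddef>

// Maps a requested size to the smallest power-of-two chunk size
// that can hold it, starting from an assumed 16-byte minimum.
std::size_t SelectChunkSize(std::size_t size)
{
    std::size_t chunkSize = 16;
    while (chunkSize < size)
    {
        chunkSize *= 2;
    }
    return chunkSize;
}
```

The allocator then forwards the allocation to the pool managing that chunk size.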
Usage
- Repeatedly allocate and deallocate blocks of memory of distinct sizes.
- Different sets of fixed-size allocations.
- Thread-safe.
Use cases
- Scene streaming.
Free List Allocator
The free list allocator allows allocations and deallocations in any order.
✔Can allocate and deallocate in any order.
❌Lower performance.
❌Poor memory locality.
❌Contention.
There are two common implementations: a linked list and a red-black tree.
Linked List Structure
The allocator stores the start address of each free contiguous block in memory and its size.
The free blocks of memory are sorted by address.
❌Allocations and deallocations iterate the linked list.
Allocation
Complexity: O(N)
Searches in the linked list for a block where the data can fit.
There are two strategies to find a block of memory:
- first-fit: finds the first block where the memory fits.
- best-fit: finds the smallest block where the memory fits.
Removes the block from the linked list.
Places an allocation header before the data.
Deallocation
Complexity: O(N)
Reads the allocated size from the header.
Inserts the block into the sorted linked list.
Merges the contiguous memory blocks together (coalescence).
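The first-fit search above can be sketched as follows; FreeNode and FindFirstFit are illustrative names, and a real allocator would also split oversized blocks and unlink the chosen node:

```cpp
#include <cstddef>

// A free block stores its size and a pointer to the next free
// block; the list is kept sorted by address.
struct FreeNode
{
    std::size_t size;
    FreeNode* next;
};

// First-fit search: returns the first free block large enough,
// and reports the previous node so the block can be unlinked.
FreeNode* FindFirstFit(FreeNode* head, std::size_t size, FreeNode** outPrevious)
{
    FreeNode* previous = nullptr;
    for (FreeNode* node = head; node != nullptr; node = node->next)
    {
        if (node->size >= size)
        {
            *outPrevious = previous;
            return node;
        }
        previous = node;
    }
    *outPrevious = nullptr;
    return nullptr;
}
```

A best-fit variant would keep scanning the whole list and remember the smallest block that fits.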
Red-Black Tree Structure
A red-black tree is a binary search tree where every node is either red or black, to keep the tree balanced during insertions and deletions.
Each node stores a bit to indicate whether it is red or black.
Each node stores its parent node and its left and right child nodes.
The tree data is stored inside the free memory blocks.
✔Allocations and deallocations do not iterate.
✔Best-fit algorithm for allocations.
✔Reduced fragmentation.
✔Increased performance.
❌A sorted doubly linked list is required for coalescence operations.
Buddy Allocator
The buddy allocator divides memory into partitions: each block is recursively split into two smaller blocks of equal size (buddies).
✔No external fragmentation.
✔Can compact memory.
❌Internal fragmentation because of the fixed-size memory blocks.
Structure
The buddy allocator is typically implemented using a binary tree to represent used or unused split memory blocks.
Ring Allocator
The ring allocator behaves like a queue if allocations are released in the same order as they were created. This is the FIFO (first-in, first-out) principle.
There are two possible behaviors when the queue is full:
- Overwrite old data.
- Fail allocating new data.
Structure
A start pointer to the beginning of the memory.
An end pointer to the end of the memory.
Allocation
If the allocator is not full, moves the start pointer forward.
If the allocator is full, moves both the start and end pointers forward (overwriting the oldest data).
Deallocation
Moves the end pointer forward.
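A minimal sketch of the fail-when-full behavior, assuming fixed-size slots to keep the wrap-around logic simple (RingAllocator and its members are illustrative names):

```cpp
#include <cstddef>
#include <cstdint>

class RingAllocator
{
public:
    RingAllocator(std::uint8_t* data, std::size_t slotCount, std::size_t slotSize)
        : _data(data), _slotCount(slotCount), _slotSize(slotSize) {}

    // Returns the next slot, or null when the ring is full.
    void* Allocate()
    {
        if (_used == _slotCount)
        {
            return nullptr; // full: fail instead of overwriting
        }
        void* ptr = _data + (_head * _slotSize);
        _head = (_head + 1) % _slotCount;
        ++_used;
        return ptr;
    }

    // Releases the oldest allocation (FIFO order).
    void Deallocate()
    {
        if (_used > 0)
        {
            _tail = (_tail + 1) % _slotCount;
            --_used;
        }
    }

private:
    std::uint8_t* _data;
    std::size_t _slotCount;
    std::size_t _slotSize;
    std::size_t _head = 0;
    std::size_t _tail = 0;
    std::size_t _used = 0;
};
```

The overwrite variant would advance the tail instead of failing when the ring is full.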
Growing Allocator
A growing allocator is used as a backend for other allocators.
It allocates memory from the system when needed.
A growing allocator using virtual memory allocates at least at the granularity of a page.
The allocator can be set up to grow at a higher granularity to reduce the number of switches to kernel mode.
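Rounding a requested size up to the grow granularity can be sketched as:

```cpp
#include <cstddef>

// Rounds size up to the next multiple of granularity
// (e.g. a page size or a larger grow step).
std::size_t RoundUpToGranularity(std::size_t size, std::size_t granularity)
{
    return ((size + granularity - 1) / granularity) * granularity;
}
```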
When memory is deallocated, there are two possible behaviors:
- Release the physical memory: better for large allocations because we want to recover the memory for other allocations.
- Keep the physical memory allocated: better for small allocations to avoid performance issues with allocations around page boundaries.
A possible implementation is to delegate the release of the physical memory to the user, who can request the pages of virtual memory to be released at the appropriate time.
Sequential Allocator
The sequential allocator is an allocator that uses other allocators as backend allocators and selects the allocators sequentially depending on their available memory and the requested allocation size.
For example, a typical implementation has two backend allocators: the first is a static allocator with a fixed memory arena, and the second is a growable allocator using the heap. The static allocator is used until it has no available memory, at which point the growable allocator is used.
Another alternative is to implement an allocator with multiple backends and determine which backend to use depending on the size of the allocation.
TLS Allocator
The Thread Local Storage (TLS) is a dedicated storage area that can only be accessed by one thread.
TLS access does not require any locks or atomics.
The TLS allocator is a lock-less allocator that is dedicated to a thread.
It's implemented with a TLS variable and a backend allocator. The backend allocator is typically a stack allocator.
✔Lock-less allocator.
✔Can be used implicitly instead of a regular stack allocator.
The TempAllocatorScope structure is defined to make use of the TLS allocator, and the memory arena is reset in the destructor.
template<typename ALLOCATOR>
thread_local ALLOCATOR* tlsLinearAllocator = nullptr;
template<typename ALLOCATOR>
struct TempAllocatorScope
{
private:
std::size_t _size;
public:
TempAllocatorScope() :
_size(tlsLinearAllocator->GetSize())
{}
~TempAllocatorScope()
{
tlsLinearAllocator->Reset(_size);
}
};
Small Allocator
The small allocator serves small allocations from an inline stack buffer and forwards larger allocations to a backend allocator.
template<std::size_t InlineCapacity = 0>
class SmallAllocator
{
private:
std::uint8_t _stack[InlineCapacity];
std::size_t _offset = 0;
IAllocator* _allocator;
public:
SmallAllocator(IAllocator* allocator) : _allocator(allocator) { ... }
void* Allocate(std::size_t size)
{
// Use the inline buffer if the allocation fits,
// otherwise fall back to the backend allocator.
if (_offset + size <= InlineCapacity)
{
void* ptr = &_stack[_offset];
_offset += size;
return ptr;
}
return _allocator->Allocate(size);
}
...
};
Interfaces
There are three common interfaces to define allocators.
- As a set of functions
- As a template parameter
- As an object reference
Functions
Free functions such as malloc() and free().
❌Does not support allocator objects.
Template
A template parameter such as the STL allocator.
✔The allocator type is known to the compiler.
❌A user of allocators must be a template class.
❌Different allocator usages generate different C++ types of client objects.
Object
A pure abstract class passed by reference.
✔A user of allocators does not need to be a template class.
✔Different allocator usages do not affect the C++ types of client objects.
❌A user must keep a reference to an allocator address.
❌Allocators must be accessed through virtual functions.
Implementations
Template
Allocator
Standard allocator used as a template parameter.
The allocator is a concrete type with no base class.
class Allocator
{
public:
void* Allocate(std::size_t size)
{
return std::malloc(size);
}
void Deallocate(void* address)
{
std::free(address);
}
};
Usage
template <typename Allocator>
class Client
{
private:
Allocator _allocator;
public:
Client(Allocator& allocator) : _allocator(allocator)
{
...
}
};
int main()
{
auto allocator = Allocator();
auto client = Client<Allocator>(allocator);
...
}
Object
The allocator is used as a reference.
Allocator
The allocator is defined as a pure abstract class.
class IAllocator
{
protected:
IAllocator() = default;
public:
virtual void* Allocate(std::size_t size) = 0;
virtual void Deallocate(void* address) = 0;
};
Implement an allocator as a derived class.
class Allocator : public IAllocator
{
public:
inline void* Allocate(std::size_t size) override
{
return std::malloc(size);
}
inline void Deallocate(void* address) override
{
std::free(address);
}
};
Usage
class Client
{
private:
IAllocator& _allocator;
public:
Client(IAllocator& allocator) : _allocator(allocator)
{
...
}
};
int main()
{
auto allocator = Allocator();
auto client = Client(allocator);
...
}
Local allocator with backend allocator
Allocator
The local allocator allocates an arena of memory at initialization from a backend allocator and releases the memory at destruction.
The Allocate()
and Deallocate()
methods can keep track of allocations using different implementations such as a list of allocated blocks or a memory header injected before the allocated data.
class LocalAllocator
{
private:
IAllocator& _allocator;
void* _data;
std::size_t _size;
public:
LocalAllocator(IAllocator& allocator, std::size_t size) :
_allocator(allocator),
_size(size)
{
_data = _allocator.Allocate(size);
}
~LocalAllocator()
{
_allocator.Deallocate(_data);
}
void* Allocate(std::size_t size)
{
...
}
void Deallocate(void* address)
{
...
}
};
Usage
ℹDeleting the arrays is not necessary as the local allocator will release the memory arena when going out of scope.
int main()
{
auto globalAllocator = Allocator();
auto localAllocator = LocalAllocator(globalAllocator, 2048);
int* intValues = static_cast<int*>(localAllocator.Allocate(sizeof(int) * 10));
float* floatValues = static_cast<float*>(localAllocator.Allocate(sizeof(float) * 10));
}
Local allocator with stack memory
Allocator
The local allocator creates an arena of memory at initialization from a backing memory buffer.
class LocalAllocator
{
private:
void* _data;
std::size_t _size;
public:
LocalAllocator(void* data, std::size_t size) :
_data(data),
_size(size)
{
}
~LocalAllocator()
{
}
void* Allocate(std::size_t size)
{
...
}
void Deallocate(void* address)
{
...
}
};
Usage
int main()
{
std::uint8_t buffer[256];
auto localAllocator = LocalAllocator(buffer, sizeof(buffer));
int* intValues = static_cast<int*>(localAllocator.Allocate(sizeof(int) * 10));
float* floatValues = static_cast<float*>(localAllocator.Allocate(sizeof(float) * 10));
}