<?xml version="1.0" encoding="UTF-8" standalone="no"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>ChapterΒ 33.Β Allocators</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2" /><meta name="keywords" content=" ISO C++ , library " /><link rel="home" href="../spine.html" title="The GNU C++ Library Documentation" /><link rel="up" href="extensions.html" title="PartΒ XII.Β Extensions" /><link rel="prev" href="bk01pt12ch32s07.html" title="Diagnostics" /><link rel="next" href="bitmap_allocator.html" title="bitmap_allocator" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">ChapterΒ 33.Β Allocators</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="bk01pt12ch32s07.html">Prev</a>Β </td><th width="60%" align="center">PartΒ XII.ΒExtensions</th><td width="20%" align="right">Β <a accesskey="n" href="bitmap_allocator.html">Next</a></td></tr></table><hr /></div><div class="chapter" title="ChapterΒ 33.Β Allocators"><div class="titlepage"><div><div><h2 class="title"><a id="manual.ext.allocator"></a>ChapterΒ 33.Β Allocators</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="ext_allocators.html#manual.ext.allocator.mt">mt_allocator</a></span></dt><dd><dl><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.intro">Intro</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.design_issues">Design Issues</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.impl">Implementation</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.example_single">Single Thread Example</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.example_multi">Multiple Thread Example</a></span></dt></dl></dd><dt><span class="sect1"><a href="bitmap_allocator.html">bitmap_allocator</a></span></dt><dd><dl><dt><span class="sect2"><a href="bitmap_allocator.html#allocator.bitmap.design">Design</a></span></dt><dt><span class="sect2"><a href="bitmap_allocator.html#allocator.bitmap.impl">Implementation</a></span></dt></dl></dd></dl></div><div class="sect1" title="mt_allocator"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.allocator.mt"></a>mt_allocator</h2></div></div></div><p></p><div class="sect2" title="Intro"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.intro"></a>Intro</h3></div></div></div><p>The mt allocator [hereinafter referred to simply as "the allocator"]is a fixed size (power of two) allocator that was initiallydeveloped specifically to suit the needs of multi threadedapplications [hereinafter referred to as an MT application]. Overtime the allocator has evolved and been improved in many ways, inparticular it now also does a good job in single threadedapplications [hereinafter referred to as a ST application]. (Note:In this document, when referring to single threaded applicationsthis also includes applications that are compiled with gcc withoutthread support enabled. This is accomplished using ifdef's on__GTHREADS). This allocator is tunable, very flexible, and capableof high-performance.</p><p>The aim of this document is to describe - from an application point ofview - the "inner workings" of the allocator.</p></div><div class="sect2" title="Design Issues"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.design_issues"></a>Design Issues</h3></div></div></div><div class="sect3" title="Overview"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.overview"></a>Overview</h4></div></div></div><p> There are three general components to the allocator: a datumdescribing the characteristics of the memory pool, a policy classcontaining this pool that links instantiation types to common orindividual pools, and a class inheriting from the policy class that isthe actual allocator.</p><p>The datum describing pools characteristics is</p><pre class="programlisting">template<bool _Thread>class __pool</pre><p> This class is parametrized on thread support, and is explicitlyspecialized for both multiple threads (with <code class="code">bool==true</code>)and single threads (via <code class="code">bool==false</code>.) It is possible touse a custom pool datum instead of the default class that is provided.</p><p> There are two distinct policy classes, each of which can be usedwith either type of underlying pool datum.</p><pre class="programlisting">template<bool _Thread>struct __common_pool_policytemplate<typename _Tp, bool _Thread>struct __per_type_pool_policy</pre><p> The first policy, <code class="code">__common_pool_policy</code>, implements acommon pool. This means that allocators that are instantiated withdifferent types, say <code class="code">char</code> and <code class="code">long</code> will bothuse the same pool. This is the default policy.</p><p> The second policy, <code class="code">__per_type_pool_policy</code>, implementsa separate pool for each instantiating type. Thus, <code class="code">char</code>and <code class="code">long</code> will use separate pools. This allows per-typetuning, for instance.</p><p> Putting this all together, the actual allocator class is</p><pre class="programlisting">template<typename _Tp, typename _Poolp = __default_policy>class __mt_alloc : public __mt_alloc_base<_Tp>, _Poolp</pre><p> This class has the interface required for standard library allocatorclasses, namely member functions <code class="code">allocate</code> and<code class="code">deallocate</code>, plus others.</p></div></div><div class="sect2" title="Implementation"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.impl"></a>Implementation</h3></div></div></div><div class="sect3" title="Tunable Parameters"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.tune"></a>Tunable Parameters</h4></div></div></div><p>Certain allocation parameters can be modified, or tuned. Thereexists a nested <code class="code">struct __pool_base::_Tune</code> that contains allthese parameters, which include settings for</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Alignment</p></li><li class="listitem"><p>Maximum bytes before calling <code class="code">::operator new</code> directly</p></li><li class="listitem"><p>Minimum bytes</p></li><li class="listitem"><p>Size of underlying global allocations</p></li><li class="listitem"><p>Maximum number of supported threads</p></li><li class="listitem"><p>Migration of deallocations to the global free list</p></li><li class="listitem"><p>Shunt for global <code class="code">new</code> and <code class="code">delete</code></p></li></ul></div><p>Adjusting parameters for a given instance of an allocator can onlyhappen before any allocations take place, when the allocator itself isinitialized. For instance:</p><pre class="programlisting">#include <ext/mt_allocator.h>struct pod{int i;int j;};int main(){typedef pod value_type;typedef __gnu_cxx::__mt_alloc<value_type> allocator_type;typedef __gnu_cxx::__pool_base::_Tune tune_type;tune_type t_default;tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);tune_type t_single(16, 5120, 32, 5120, 1, 10, false);tune_type t;t = allocator_type::_M_get_options();allocator_type::_M_set_options(t_opt);t = allocator_type::_M_get_options();allocator_type a;allocator_type::pointer p1 = a.allocate(128);allocator_type::pointer p2 = a.allocate(5128);a.deallocate(p1, 128);a.deallocate(p2, 5128);return 0;}</pre></div><div class="sect3" title="Initialization"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.init"></a>Initialization</h4></div></div></div><p>The static variables (pointers to freelists, tuning parameters etc)are initialized as above, or are set to the global defaults.</p><p>The very first allocate() call will always call the_S_initialize_once() function. In order to make sure that thisfunction is called exactly once we make use of a __gthread_once callin MT applications and check a static bool (_S_init) in STapplications.</p><p>The _S_initialize() function:- If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool_S_force_new to true and then returns. This will cause subsequent calls toallocate() to return memory directly from a new() call, and deallocate willonly do a delete() call.</p><p>- If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MTapplications will:- Calculate the number of bins needed. A bin is a specific power of two sizeof bytes. I.e., by default the allocator will deal with requests of up to128 bytes (or whatever the value of _S_max_bytes is when _S_init() iscalled). This means that there will be bins of the following sizes(in bytes): 1, 2, 4, 8, 16, 32, 64, 128.- Create the _S_binmap array. All requests are rounded up to the next"large enough" bin. I.e., a request for 29 bytes will cause a block fromthe "32 byte bin" to be returned to the application. The purpose of_S_binmap is to speed up the process of finding out which bin to use.I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).</p><p>- Create the _S_bin array. This array consists of bin_records. There will beas many bin_records in this array as the number of bins that we calculatedearlier. I.e., if _S_max_bytes = 128 there will be 8 entries.Each bin_record is then initialized:- bin_record->first = An array of pointers to block_records. There will beas many block_records pointers as there are maximum number of threads(in a ST application there is only 1 thread, in a MT application thereare _S_max_threads).This holds the pointer to the first free block for each thread in thisbin. I.e., if we would like to know where the first free block of size 32for thread number 3 is we would look this up by: _S_bin[ 5 ].first[ 3 ]The above created block_record pointers members are now initialized totheir initial values. I.e. _S_bin[ n ].first[ n ] = NULL;</p><p>- Additionally a MT application will:- Create a list of free thread id's. The pointer to the first entryis stored in _S_thread_freelist_first. The reason for this approach isthat the __gthread_self() call will not return a value that corresponds tothe maximum number of threads allowed but rather a process id number orsomething else. So what we do is that we create a list of thread_records.This list is _S_max_threads long and each entry holds a size_t thread_idwhich is initialized to 1, 2, 3, 4, 5 and so on up to _S_max_threads.Each time a thread calls allocate() or deallocate() we call_S_get_thread_id() which looks at the value of _S_thread_key which is athread local storage pointer. If this is NULL we know that this is a newlycreated thread and we pop the first entry from this list and saves thepointer to this record in the _S_thread_key variable. The next timewe will get the pointer to the thread_record back and we use thethread_record->thread_id as identification. I.e., the first thread thatcalls allocate will get the first record in this list and thus be threadnumber 1 and will then find the pointer to its first free 32 byte blockin _S_bin[ 5 ].first[ 1 ]When we create the _S_thread_key we also define a destructor(_S_thread_key_destr) which means that when the thread dies, thisthread_record is returned to the front of this list and the thread idcan then be reused if a new thread is created.This list is protected by a mutex (_S_thread_freelist_mutex) which is onlylocked when records are removed or added to the list.</p><p>- Initialize the free and used counters of each bin_record:- bin_record->free = An array of size_t. This keeps track of the numberof blocks on a specific thread's freelist in each bin. I.e., if a threadhas 12 32-byte blocks on it's freelists and allocates one of these, thiscounter would be decreased to 11.- bin_record->used = An array of size_t. This keeps track of the numberof blocks currently in use of this size by this thread. I.e., if a threadhas made 678 requests (and no deallocations...) of 32-byte blocks thiscounter will read 678.The above created arrays are now initialized with their initial values.I.e. _S_bin[ n ].free[ n ] = 0;</p><p>- Initialize the mutex of each bin_record: The bin_record->mutexis used to protect the global freelist. This concept of a globalfreelist is explained in more detail in the section "A multithreaded example", but basically this mutex is locked whenever ablock of memory is retrieved or returned to the global freelistfor this specific bin. This only occurs when a number of blocksare grabbed from the global list to a thread specific list or whena thread decides to return some blocks to the global freelist.</p></div><div class="sect3" title="Deallocation Notes"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.deallocation"></a>Deallocation Notes</h4></div></div></div><p> Notes about deallocation. This allocator does not explicitlyrelease memory. Because of this, memory debugging programs likevalgrind or purify may notice leaks: sorry about thisinconvenience. Operating systems will reclaim allocated memory atprogram termination anyway. If sidestepping this kind of noise isdesired, there are three options: use an allocator, like<code class="code">new_allocator</code> that releases memory while debugging, useGLIBCXX_FORCE_NEW to bypass the allocator's internal pools, or use acustom pool datum that releases resources on destruction.</p><p>On systems with the function <code class="code">__cxa_atexit</code>, theallocator can be forced to free all memory allocated before programtermination with the member function<code class="code">__pool_type::_M_destroy</code>. However, because this memberfunction relies on the precise and exactly-conforming ordering ofstatic destructors, including those of a static local<code class="code">__pool</code> object, it should not be used, ever, on systemsthat don't have the necessary underlying support. In addition, inpractice, forcing deallocation can be tricky, as it requires the<code class="code">__pool</code> object to be fully-constructed before the objectthat uses it is fully constructed. For most (but not all) STLcontainers, this works, as an instance of the allocator is constructedas part of a container's constructor. However, this assumption isimplementation-specific, and subject to change. For an example of apool that frees memory, see the following<a class="ulink" href="http://gcc.gnu.org/viewcvs/trunk/libstdc++-v3/testsuite/ext/mt_allocator/deallocate_local-6.cc?view=markup" target="_top">example.</a></p></div></div><div class="sect2" title="Single Thread Example"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.example_single"></a>Single Thread Example</h3></div></div></div><p>Let's start by describing how the data on a freelist is laid out in memory.This is the first two blocks in freelist for thread id 3 in bin 3 (8 bytes):</p><pre class="programlisting">+----------------+| next* ---------|--+ (_S_bin[ 3 ].first[ 3 ] points here)| | || | || | |+----------------+ || thread_id = 3 | || | || | || | |+----------------+ || DATA | | (A pointer to here is what is returned to the| | | the application when needed)| | || | || | || | || | || | |+----------------+ |+----------------+ || next* |<-+ (If next == NULL it's the last one on the list)| || || |+----------------+| thread_id = 3 || || || |+----------------+| DATA || || || || || || || |+----------------+</pre><p>With this in mind we simplify things a bit for a while and say that there isonly one thread (a ST application). In this case all operations are made towhat is referred to as the global pool - thread id 0 (No thread may beassigned this id since they span from 1 to _S_max_threads in a MT application).</p><p>When the application requests memory (calling allocate()) we first look at therequested size and if this is > _S_max_bytes we call new() directly and return.</p><p>If the requested size is within limits we start by finding out from whichbin we should serve this request by looking in _S_binmap.</p><p>A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks ofthis size on the freelist (0). If this is not NULL - fine, just remove theblock that _S_bin[ bin ].first[ 0 ] points to from the list,update _S_bin[ bin ].first[ 0 ] and return a pointer to that blocks data.</p><p>If the freelist is empty (the pointer is NULL) we must get memory from thesystem and build us a freelist within this memory. All requests for new memoryis made in chunks of _S_chunk_size. Knowing the size of a block_record andthe bytes that this bin stores we then calculate how many blocks we can createwithin this chunk, build the list, remove the first block, update the pointer(_S_bin[ bin ].first[ 0 ]) and return a pointer to that blocks data.</p><p>Deallocation is equally simple; the pointer is casted back to a block_recordpointer, lookup which bin to use based on the size, add the block to the frontof the global freelist and update the pointer as needed(_S_bin[ bin ].first[ 0 ]).</p><p>The decision to add deallocated blocks to the front of the freelist was madeafter a set of performance measurements that showed that this is roughly 10%faster than maintaining a set of "last pointers" as well.</p></div><div class="sect2" title="Multiple Thread Example"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.example_multi"></a>Multiple Thread Example</h3></div></div></div><p>In the ST example we never used the thread_id variable present in each block.Let's start by explaining the purpose of this in a MT application.</p><p>The concept of "ownership" was introduced since many MT applicationsallocate and deallocate memory to shared containers from differentthreads (such as a cache shared amongst all threads). This introducesa problem if the allocator only returns memory to the current threadsfreelist (I.e., there might be one thread doing all the allocation andthus obtaining ever more memory from the system and another threadthat is getting a longer and longer freelist - this will in the endconsume all available memory).</p><p>Each time a block is moved from the global list (where ownership isirrelevant), to a threads freelist (or when a new freelist is builtfrom a chunk directly onto a threads freelist or when a deallocationoccurs on a block which was not allocated by the same thread id as theone doing the deallocation) the thread id is set to the current one.</p><p>What's the use? Well, when a deallocation occurs we can now look atthe thread id and find out if it was allocated by another thread idand decrease the used counter of that thread instead, thus keeping thefree and used counters correct. And keeping the free and used counterscorrects is very important since the relationship between these twovariables decides if memory should be returned to the global pool ornot when a deallocation occurs.</p><p>When the application requests memory (calling allocate()) we firstlook at the requested size and if this is >_S_max_bytes we call new()directly and return.</p><p>If the requested size is within limits we start by finding out from whichbin we should serve this request by looking in _S_binmap.</p><p>A call to _S_get_thread_id() returns the thread id for the calling thread(and if no value has been set in _S_thread_key, a new id is assigned andreturned).</p><p>A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there areany blocks of this size on the current threads freelist. If this isnot NULL - fine, just remove the block that _S_bin[ bin ].first[thread_id ] points to from the list, update _S_bin[ bin ].first[thread_id ], update the free and used counters and return a pointer tothat blocks data.</p><p>If the freelist is empty (the pointer is NULL) we start by looking atthe global freelist (0). If there are blocks available on the globalfreelist we lock this bins mutex and move up to block_count (thenumber of blocks of this bins size that will fit into a _S_chunk_size)or until end of list - whatever comes first - to the current threadsfreelist and at the same time change the thread_id ownership andupdate the counters and pointers. When the bins mutex has beenunlocked, we remove the block that _S_bin[ bin ].first[ thread_id ]points to from the list, update _S_bin[ bin ].first[ thread_id ],update the free and used counters, and return a pointer to that blocksdata.</p><p>The reason that the number of blocks moved to the current threadsfreelist is limited to block_count is to minimize the chance that asubsequent deallocate() call will return the excess blocks to theglobal freelist (based on the _S_freelist_headroom calculation, seebelow).</p><p>However if there isn't any memory on the global pool we need to getmemory from the system - this is done in exactly the same way as in asingle threaded application with one major difference; the list builtin the newly allocated memory (of _S_chunk_size size) is added to thecurrent threads freelist instead of to the global.</p><p>The basic process of a deallocation call is simple: always add theblock to the front of the current threads freelist and update thecounters and pointers (as described earlier with the specific check ofownership that causes the used counter of the thread that originallyallocated the block to be decreased instead of the current threadscounter).</p><p>And here comes the free and used counters to service. Each time adeallocation() call is made, the length of the current threadsfreelist is compared to the amount memory in use by this thread.</p><p>Let's go back to the example of an application that has one threadthat does all the allocations and one that deallocates. Both thesethreads use say 516 32-byte blocks that was allocated during threadcreation for example. Their used counters will both say 516 at thispoint. The allocation thread now grabs 1000 32-byte blocks and putsthem in a shared container. The used counter for this thread is now1516.</p><p>The deallocation thread now deallocates 500 of these blocks. For eachdeallocation made the used counter of the allocating thread isdecreased and the freelist of the deallocation thread gets longer andlonger. But the calculation made in deallocate() will limit the lengthof the freelist in the deallocation thread to _S_freelist_headroom %of it's used counter. In this case, when the freelist (given that the_S_freelist_headroom is at it's default value of 10%) exceeds 52(516/10) blocks will be returned to the global pool where theallocating thread may pick them up and reuse them.</p><p>In order to reduce lock contention (since this requires this binsmutex to be locked) this operation is also made in chunks of blocks(just like when chunks of blocks are moved from the global freelist toa threads freelist mentioned above). The "formula" used can probablybe improved to further reduce the risk of blocks being "bounced backand forth" between freelists.</p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="bk01pt12ch32s07.html">Prev</a>Β </td><td width="20%" align="center"><a accesskey="u" href="extensions.html">Up</a></td><td width="40%" align="right">Β <a accesskey="n" href="bitmap_allocator.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">DiagnosticsΒ </td><td width="20%" align="center"><a accesskey="h" href="../spine.html">Home</a></td><td width="40%" align="right" valign="top">Β bitmap_allocator</td></tr></table></div></body></html>