<sect1 id="manual.ext.allocator.bitmap" xreflabel="bitmap allocator"><?dbhtml filename="bitmap_allocator.html"?><sect1info><keywordset><keyword>ISO C++</keyword><keyword>allocator</keyword></keywordset></sect1info><title>bitmap_allocator</title><para></para><sect2 id="allocator.bitmap.design" xreflabel="allocator.bitmap.design"><title>Design</title><para>As the name suggests, this allocator uses a bit-map to keep track of the used and unused memory locations for its book-keeping purposes.</para><para>This allocator uses a single bit to keep track of whether a block has been allocated or not. A bit value of 1 indicates free, while 0 indicates allocated. This encoding was chosen so that a whole collection of bits can easily be checked for a free block. This kind of bitmapped strategy works best for single object allocations, and with the STL-style type-parameterized allocators we do not need to choose any size for the block represented by a single bit: it is simply the size of the type around which the allocator has been parameterized. Thus, close to optimal performance results. Hence, this allocator should be used for node based containers, which call the allocate function with an argument of 1.</para><para>The bitmapped allocator's internal pool grows exponentially; that is, the blocks acquired from the Free List Store will double in size every time the bitmapped allocator runs out of memory.</para><para>The macro <literal>__GTHREADS</literal> decides whether to use mutex protection around every allocation/deallocation. 
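The 1-is-free / 0-is-allocated encoding described above can be sketched as follows. This is an illustrative sketch only, not the library's actual code; the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// One size_t word tracks 32 blocks: bit set (1) means free,
// bit clear (0) means allocated. A whole word of 32 blocks can
// therefore be tested for a free block with a single comparison.
bool any_free(std::size_t bits) { return bits != 0; }

// Find the lowest set (free) bit and mark it allocated (reset to 0).
// Returns the block index within the word, or -1 if none is free.
int allocate_one(std::size_t& bits)
{
    for (int i = 0; i < 32; ++i)
        if (bits & (static_cast<std::size_t>(1) << i))
        {
            bits &= ~(static_cast<std::size_t>(1) << i); // now allocated
            return i;
        }
    return -1; // no free block in this word
}

// Mark a block free again by setting its bit back to 1.
void deallocate_one(std::size_t& bits, int i)
{
    bits |= (static_cast<std::size_t>(1) << i);
}
```
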
The state of the macro is picked up automatically from the gthr abstraction layer.</para></sect2><sect2 id="allocator.bitmap.impl" xreflabel="allocator.bitmap.impl"><title>Implementation</title><sect3 id="bitmap.impl.free_list_store" xreflabel="Free List Store"><title>Free List Store</title><para>The Free List Store (referred to as FLS for the remainder of this document) is the global memory pool shared by all instances of the bitmapped allocator, whatever type they are instantiated for. It maintains a sorted order of all the free memory blocks given back to it by the bitmapped allocator, and is also responsible for giving memory to the bitmapped allocator when it asks for more.</para><para>Internally, there is a Free List threshold, which is the maximum number of free lists that the FLS can hold internally (cache). Currently, this value is set at 64. So, if more than 64 free lists come in, some of them will be given back to the OS using operator delete, so that at any given time the Free List's size does not exceed 64 entries. This is done because a binary search is used to locate an entry in a free list when a request for memory comes along. The run-time complexity of the search would grow with an increasing list size; with 64 entries, however, lg(64) == 6 comparisons are enough to locate the correct free list, if it exists.</para><para>When the free list size reaches its threshold, the largest block from among those in the list and the new incoming block is selected and given back to the OS. This is done because it reduces external fragmentation, and allows the OS to use the larger blocks later in an orderly fashion, possibly merging them later. Also, on some systems, large blocks are obtained via calls to mmap, so giving them back to free system resources becomes most important.</para><para>The function _S_should_i_give implements the policy that determines whether the current block of memory should be given to the allocator for the request that it has made. 
That's because we may not always have exact fits for the memory size that the allocator requests. We do this mainly to prevent external fragmentation, at the cost of a little internal fragmentation. The acceptable amount of internal fragmentation has to be decided by this function. I can see 3 possibilities right now. Please add more as and when you find better strategies.</para><orderedlist><listitem><para>Equal size check: return true only when the 2 blocks are of equal size.</para></listitem><listitem><para>Difference threshold: return true only when the _block_size is greater than or equal to the _required_size, and the amount by which _BS exceeds _RS is less than some THRESHOLD value; otherwise return false. </para></listitem><listitem><para>Percentage threshold: return true only when the _block_size is greater than or equal to the _required_size, and the percentage by which _BS exceeds _RS is less than some THRESHOLD value; otherwise return false.</para></listitem></orderedlist><para>Currently, (3) is being used, with a value of 36% maximum wastage per super block.</para></sect3><sect3 id="bitmap.impl.super_block" xreflabel="Super Block"><title>Super Block</title><para>A super block is the block of memory acquired from the FLS from which the bitmap allocator carves out memory for single objects and satisfies the user's requests. These super blocks come in sizes that are powers of 2 and multiples of 32 (_Bits_Per_Block). Yes, both at the same time! 
That's because the next super block acquired will be twice the size of the previous one, and all super blocks also have to be multiples of the _Bits_Per_Block value.</para><para>How does it interact with the free list store?</para><para>The super block is contained in the FLS, and the FLS is responsible for getting and returning super blocks to and from the OS using operator new, as defined by the C++ standard.</para></sect3><sect3 id="bitmap.impl.super_block_data" xreflabel="Super Block Data"><title>Super Block Data Layout</title><para>Each super block will be of some size that is a multiple of the number of bits per block. Typically, this value is chosen as Bits_Per_Byte x sizeof(size_t). On a 32-bit x86 system, this gives the figure 8 x 4 = 32. Thus, each super block will be of size 32 x Some_Value, where Some_Value is sizeof(value_type). For now, let it be called 'K'. Thus, finally, the super block size is 32 x K bytes.</para><para>This value of 32 has been chosen because each size_t has 32 bits, and maximum use can be made of them with such a figure.</para><para>Consider a block of size 64 ints. In memory, it would look like this (assume a 32-bit system, where size_t is a 32-bit entity):</para><table frame='all'><title>Bitmap Allocator Memory Map</title><tgroup cols='5' align='left' colsep='1' rowsep='1'><colspec colname='c1'></colspec><colspec colname='c2'></colspec><colspec colname='c3'></colspec><colspec colname='c4'></colspec><colspec colname='c5'></colspec><tbody><row><entry>268</entry><entry>0</entry><entry>4294967295</entry><entry>4294967295</entry><entry>Data -> Space for 64 ints</entry></row></tbody></tgroup></table><para>The first column (268) represents the size of the block in bytes as seen by the Bitmap Allocator. Internally, a global free list is used to keep track of the free blocks used and given back by the bitmap allocator. It is this Free List Store that is responsible for writing and managing this information. 
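The super-block sizing rule described above (sizes that are powers of 2 and multiples of 32 blocks at the same time, i.e. 32 x K x 2^n bytes) can be sketched as follows. This is an illustrative sketch; <literal>super_block_bytes</literal> is a hypothetical helper, not part of the allocator's code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only. The n-th super block
// holds 32 * 2^n single-object blocks (32 == _Bits_Per_Block on a
// 32-bit system), so its size in bytes is 32 * 2^n * sizeof(value_type).
// Each acquisition is therefore twice the previous one, and the block
// count is always a multiple of 32.
std::size_t super_block_bytes(std::size_t n, std::size_t sizeof_value_type)
{
    const std::size_t bits_per_block = 32; // Bits_Per_Byte x sizeof(size_t)
    return bits_per_block * (static_cast<std::size_t>(1) << n)
           * sizeof_value_type;
}
```

For the 64-int example in the table above this corresponds to n = 1: 32 x 2 = 64 blocks of sizeof(int) bytes each, giving the 64 x 4 = 256 bytes of data space.
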
Actually, the number of bytes allocated in this case would be 4 + 4 + (4x2) + (64x4) = 272 bytes, but the first 4 bytes are an addition by the Free List Store, so the Bitmap Allocator sees only 268 bytes. These first 4 bytes, of which the bitmapped allocator is not aware, hold the value 268.</para><para>What do the remaining values represent?</para><para>The second 4 in the expression is sizeof(size_t), because the Bitmapped Allocator maintains a used count for each super block, which is initially set to 0 (as indicated in the diagram). This is incremented every time a block is removed from this super block (allocated), and decremented whenever it is given back. So, when the used count falls to 0, the whole super block will be given back to the Free List Store.</para><para>The value 4294967295 represents the integer corresponding to the bit representation of all bits set: 11111111111111111111111111111111.</para><para>The third term, 4x2, is the size of the bitmap itself, which is 32 bits x 2, i.e. 8 bytes, or 2 x sizeof(size_t).</para></sect3><sect3 id="bitmap.impl.max_wasted" xreflabel="Max Wasted Percentage"><title>Maximum Wasted Percentage</title><para>This has nothing to do with the algorithm per se, only with some values that must be chosen correctly to ensure that the allocator performs well in a real-world scenario, and maintains a good balance between memory consumption and allocation/deallocation speed.</para><para>The formula for calculating the maximum wastage as a percentage is:</para><para>(32 x k + 1) / (2 x (32 x k + 1 + 32 x c)) x 100</para><para>where k is the constant overhead per node (e.g., for list, it is 8 bytes, and for map it is 12 bytes) and c is the size of the base type on which the map/list is instantiated. Thus, suppose type1 is int and type2 is double; they are related by the relation sizeof(double) == 2*sizeof(int). 
Thus, all types must have this double-size relation for this formula to work properly.</para><para>Plugging in for list, k = 8 and c = 4 (int and double), we get 33.376%.</para><para>For map/multimap, k = 12 and c = 4 (int and double), we get 37.524%.</para><para>Thus, knowing these values, and based on sizeof(value_type), we may create a function that returns the Max_Wastage_Percentage for us to use.</para></sect3><sect3 id="bitmap.impl.allocate" xreflabel="Allocate"><title><function>allocate</function></title><para>The allocate function is specialized for single object allocation ONLY. Thus, ONLY if n == 1 will the bitmap_allocator's specialized algorithm be used. Otherwise, the request is satisfied directly by calling operator new.</para><para>Suppose n == 1; then the allocator does the following:</para><orderedlist><listitem><para>Checks to see whether a free block exists somewhere in a region of memory close to the last satisfied request. If so, then that block is marked as allocated in the bit map and given to the user. If not, then (2) is executed.</para></listitem><listitem><para>Is there a free block anywhere after the current block, right up to the end of the memory that we have? If so, that block is found, the same procedure as above is applied, and the block is returned to the user. If not, then (3) is executed.</para></listitem><listitem><para>Is there any free block in whatever region of memory we own? This is done by checking</para><itemizedlist><listitem><para>The use count for each super block, and if that fails, then</para></listitem><listitem><para>The individual bit-maps for each super block.</para></listitem></itemizedlist><para>Note: Here we are never touching any of the memory that the user will be given, and we are confining all memory accesses to a small region of memory! This helps reduce cache misses. If this succeeds, then we apply the same procedure on that bit-map as in (1), and return that block of memory to the user. 
However, if this process fails, then we resort to (4).</para></listitem><listitem><para>This process involves refilling the internal exponentially growing memory pool. The effect is achieved by calling _S_refill_pool, which does the following:</para><itemizedlist><listitem><para>Gets more memory of the required size from the Global Free List.</para></listitem><listitem><para>Adjusts the size for the next call to itself.</para></listitem><listitem><para>Writes the appropriate headers in the bit-maps.</para></listitem><listitem><para>Sets the use count for the super-block just allocated to 0 (zero).</para></listitem><listitem><para>All of the above amounts to maintaining the basic invariant for the allocator. If the invariant is maintained, we are sure that all is well. Now, the same process is applied on the newly acquired free blocks, which are dispatched accordingly.</para></listitem></itemizedlist></listitem></orderedlist><para>Thus, you can clearly see that the allocate function is nothing but a combination of the next-fit and first-fit algorithms, optimized ONLY for single object allocations.</para></sect3><sect3 id="bitmap.impl.deallocate" xreflabel="Deallocate"><title><function>deallocate</function></title><para>The deallocate function again is specialized for single objects ONLY. For all n > 1, operator delete is called without further ado, and the deallocate function returns.</para><para>However, for n == 1, a series of steps is performed:</para><orderedlist><listitem><para>We first need to locate the super-block which holds the memory location given to us by the user. For that purpose, we maintain a static variable _S_last_dealloc_index, which holds the index into the vector of block pairs of the last super-block from which memory was freed. We use this strategy in the hope that the user will deallocate memory in a region close to what he/she deallocated the last time around. 
If the check for belongs_to succeeds, then we determine the bit-map for the given pointer, locate the index into that bit-map, and mark that bit as free by setting it.</para></listitem><listitem><para>If _S_last_dealloc_index does not point to the memory block that we're looking for, then we do a linear search on the blocks stored in the vector of block pairs. This vector in code is called _S_mem_blocks. When the corresponding super-block is found, we apply the same procedure as in (1) to mark the block as free in the bit-map.</para></listitem></orderedlist><para>Now, whenever a block is freed, the use count of that particular super block goes down by 1. When this use count hits 0, we remove that super block from the list of all valid super blocks stored in the vector. While doing this, we also make sure that the basic invariant is maintained by making sure that _S_last_request and _S_last_dealloc_index point to valid locations within the vector.</para></sect3><sect3 id="bitmap.impl.questions" xreflabel="Questions"><title>Questions</title><sect4 id="bitmap.impl.question.1" xreflabel="Question 1"><title>1</title><para>Q1) The "Data Layout" section is cryptic. I have no idea what you are trying to say. Layout of what? The free-list? Each bitmap? The Super Block?</para><para>The layout of a super block of a given size. In the example, a super block of size 32 x 1 is taken. The general formula for calculating the size of a super block is 32 x sizeof(value_type) x 2^n, where n ranges from 0 to 32 for 32-bit systems.</para></sect4><sect4 id="bitmap.impl.question.2" xreflabel="Question 2"><title>2</title><para>And since I just mentioned the term `each bitmap', what in the world is meant by it? What does each bitmap manage? How does it relate to the super block? Is the Super Block a bitmap as well?</para><para>Each bitmap is part of a Super Block, which is made up of 3 parts, as I have mentioned earlier. Re-iterating: 1. The use count. 2. The bit-map for that Super Block. 3. 
The actual memory that will eventually be given to the user. Each bitmap is a multiple of 32 in size. If there are 32 x (2^3) blocks of single objects to be given, there will be 32 x (2^3) bits present, each 32 bits managing the allocated / free status for 32 blocks. Since each size_t contains 32 bits, one size_t can manage up to 32 blocks' status. Each bit-map is thus made up of a number of size_t, whose exact count for a super-block of a given size I have just mentioned.</para></sect4><sect4 id="bitmap.impl.question.3" xreflabel="Question 3"><title>3</title><para>How do the allocate and deallocate functions work in regard to bitmaps?</para><para>The allocate and deallocate functions manipulate the bitmaps and have nothing to do with the memory that is given to the user. As I have mentioned earlier, a 1 in the bitmap's bit field indicates free, while a 0 indicates allocated. This lets us check 32 blocks at a time, by comparing 32 bits against 0: if the value equals 0, all 32 blocks are allocated; otherwise at least one of them is free. Now, given a memory block, the allocate function will find the corresponding bit in the bitmap and reset it (i.e., make it 0). And when the deallocate function is called, it will locate that bit and set it again, to indicate that the particular block corresponding to this bit in the bit-map is not being used by anyone, and may be used to satisfy future requests.</para><para>e.g.: Consider a bit-map of 64 bits, as represented below: 1111111111111111111111111111111111111111111111111111111111111111</para><para>Now, when the first request for allocation of a single object comes along, the first block in address order is returned. 
And since the bit-map is in the reverse order to that of the address order, the last bit (the LSB, if the bit-map is considered as a binary word of 64 bits) is reset to 0.</para><para>The bit-map now looks like this: 1111111111111111111111111111111111111111111111111111111111111110</para></sect4></sect3><sect3 id="bitmap.impl.locality" xreflabel="Locality"><title>Locality</title><para>Another issue is whether to keep all the bitmaps in a separate area in memory, or to keep them near the actual blocks that will be given out or allocated for the client. After some testing, I've decided to keep these bitmaps close to the actual blocks. This helps in 2 ways.</para><orderedlist><listitem><para>Constant time access to the bitmaps themselves, since no kind of look-up is needed to find the correct bitmap list or its equivalent.</para></listitem><listitem><para>It also preserves the cache as far as possible.</para></listitem></orderedlist><para>So in effect, this kind of an allocator might prove beneficial from a purely cache point of view. But this allocator has been made to try and iron out the defects of the node_allocator, wherein the nodes get scattered about in memory if they are not returned in the exact reverse order, or in the same order, in which they were allocated. Also, the new_allocator's book-keeping overhead is too much for small objects and single object allocations, though it preserves the locality of blocks very well when they are returned back to the allocator.</para></sect3><sect3 id="bitmap.impl.grow_policy" xreflabel="Grow Policy"><title>Overhead and Grow Policy</title><para>The expected overhead per block is 1 bit in memory. Also, once the address of the free list has been found, the cost of allocation/deallocation is negligible, and is supposed to be constant time. 
For these very reasons, it is very important to minimize the linear time costs, which include finding a free list with a free block while allocating, and finding the corresponding free list for a block while deallocating. Therefore, I have decided that the growth of the internal pool for this allocator will be exponential, as compared to linear for node_allocator. There, linear growth works well, because we are mainly concerned with the speed of allocation/deallocation and memory consumption, whereas here the allocation/deallocation part does have some linear/logarithmic complexity components in it. Thus, trying to minimize them is a good thing to do, at the cost of a little bit of memory.</para><para>Another thing to be noted is that the pool size will double every time the internal pool gets exhausted and all the free blocks have been given away. The initial size of the pool is sizeof(size_t) x 8, which is the number of bits in a size_t and fits exactly in a CPU register. Hence the term exponential growth of the internal pool.</para></sect3></sect2></sect1>