<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta content="text/html; charset=ISO-8859-1"http-equiv="content-type"><title>Bitmap Allocator</title><meta content="Dhruv Matani" name="author"><meta content="Bitmap Allocator" name="description"></head><body><h1 style="text-align: center;">Bitmap Allocator</h1><em><br><small><small>The latest version of this document is always availableat <ahref="http://gcc.gnu.org/onlinedocs/libstdc++/ext/ballocator_doc.html">http://gcc.gnu.org/onlinedocs/libstdc++/ext/ballocator_doc.html</a>.</small></small></em><br><br><em> To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3homepage</a>.</em><br><br><hr style="width: 100%; height: 2px;"><br>As this name suggests, this allocator uses a bit-map to keep track ofthe used and unused memory locations for it's book-keeping purposes.<br><br>This allocator will make use of 1 single bit to keep track of whetherit has been allocated or not. A bit 1 indicates free, while 0 indicatesallocated. This has been done so that you can easily check a collectionof bits for a free block. This kind of Bitmapped strategy works bestfor single object allocations, and with the STL type parameterizedallocators, we do not need to choose any size for the block which willbe represented by a single bit. This will be the size of the parameteraround which the allocator has been parameterized. Thus, close tooptimal performance will result. Hence, this should be used for nodebased containers which call the allocate function with an argument of 1.<br><br>The bitmapped allocator's internal pool is exponentially growing.Meaning that internally, the blocks acquired from the Free List Storewill double every time the bitmapped allocator runs out of memory.<br><br><hr style="width: 100%; height: 2px;"><br>The macro __GTHREADS decides whether to use Mutex Protection aroundevery allocation/deallocation. The state of the macro is picked upautomatically from the gthr abstration layer.<br><br><hr style="width: 100%; height: 2px;"><h3 style="text-align: center;">What is the Free List Store?</h3><br>The Free List Store (referred to as FLS for the remaining part of thisdocument) is the Global memory pool that is shared by all instances ofthe bitmapped allocator instantiated for any type. This maintains asorted order of all free memory blocks given back to it by thebitmapped allocator, and is also responsible for giving memory to thebitmapped allocator when it asks for more.<br><br>Internally, there is a Free List threshold which indicates the Maximumnumber of free lists that the FLS can hold internally (cache).Currently, this value is set at 64. So, if there are more than 64 freelists coming in, then some of them will be given back to the OS usingoperator delete so that at any given time the Free List's size does notexceed 64 entries. This is done because a Binary Search is used tolocate an entry in a free list when a request for memory comes along.Thus, the run-time complexity of the search would go up given anincreasing size, for 64 entries however, lg(64) == 6 comparisons areenough to locate the correct free list if it exists.<br><br>Suppose the free list size has reached it's threshold, then the largestblock from among those in the list and the new block will be selectedand given back to the OS. This is done because it reduces externalfragmentation, and allows the OS to use the larger blocks later in anorderly fashion, possibly merging them later. Also, on some systems,large blocks are obtained via calls to mmap, so giving them back tofree system resources becomes most important.<br><br>The function _S_should_i_give decides the policy that determineswhether the current block of memory should be given to the allocatorfor the request that it has made. That's because we may not always haveexact fits for the memory size that the allocator requests. We do thismainly to prevent external fragmentation at the cost of a littleinternal fragmentation. Now, the value of this internal fragmentationhas to be decided by this function. I can see 3 possibilities rightnow. Please add more as and when you find better strategies.<br><br><ol><li>Equal size check. Return true only when the 2 blocks are of equalsize.</li><li>Difference Threshold: Return true only when the _block_size isgreater than or equal to the _required_size, and if the _BS is > _RSby a difference of less than some THRESHOLD value, then return true,else return false. </li><li>Percentage Threshold. Return true only when the _block_size isgreater than or equal to the _required_size, and if the _BS is > _RSby a percentage of less than some THRESHOLD value, then return true,else return false.</li></ol><br>Currently, (3) is being used with a value of 36% Maximum wastage perSuper Block.<br><br><hr style="width: 100%; height: 2px;"><span style="font-weight: bold;">1)What is a super block? Why is it needed?</span><br><br>A super block is the block of memory acquired from the FLS from whichthe bitmap allocator carves out memory for single objects and satisfiesthe user's requests. These super blocks come in sizes that are powersof 2 and multiples of 32 (_Bits_Per_Block). Yes both at the same time!That's because the next super block acquired will be 2 times theprevious one, and also all super blocks have to be multiples of the_Bits_Per_Block value. <br><br><span style="font-weight: bold;">2) How does it interact with the freelist store?</span><br><br>The super block is contained in the FLS, and the FLS is responsible forgetting / returning Super Bocks to and from the OS using operator newas defined by the C++ standard.<br><br><hr style="width: 100%; height: 2px;"><h3 style="text-align: center;">How does the allocate function Work?</h3><br>The allocate function is specialized for single object allocation ONLY.Thus, ONLY if n == 1, will the bitmap_allocator's specialized algorithmbe used. Otherwise, the request is satisfied directly by callingoperator new.<br><br>Suppose n == 1, then the allocator does the following:<br><br><ol><li>Checks to see whether the a free block exists somewhere in aregion of memory close to the last satisfied request. If so, then thatblock is marked as allocated in the bit map and given to the user. Ifnot, then (2) is executed.</li><li>Is there a free block anywhere after the current block right uptothe end of the memory that we have? If so, that block is found, and thesame procedure is applied as above, and returned to the user. If not,then (3) is executed.</li><li>Is there any block in whatever region of memory that we own free?This is done by checking <br><div style="margin-left: 40px;"><ul><li>The use count for each super block, and if that fails then </li><li>The individual bit-maps for each super block. </li></ul></div>Note: Here we are never touching any of the memory that the user willbe given, and we are confining all memory accesses to a small region ofmemory! This helps reduce cache misses. If this succeeds then we applythe same procedure on that bit-map as (1), and return that block ofmemory to the user. However, if this process fails, then we resort to(4).</li><li>This process involves Refilling the internal exponentiallygrowing memory pool. The said effect is achieved by calling_S_refill_pool which does the following: <br><div style="margin-left: 40px;"><ul><li>Gets more memory from the Global Free List of the Requiredsize. </li><li>Adjusts the size for the next call to itself. </li><li>Writes the appropriate headers in the bit-maps.</li><li>Sets the use count for that super-block just allocated to 0(zero). </li><li>All of the above accounts to maintaining the basic invariantfor the allocator. If the invariant is maintained, we are sure that allis well. Now, the same process is applied on the newly acquired freeblocks, which are dispatched accordingly.</li></ul></div></li></ol><br>Thus, you can clearly see that the allocate function is nothing but acombination of the next-fit and first-fit algorithm optimized ONLY forsingle object allocations.<br><br><br><hr style="width: 100%; height: 2px;"><h3 style="text-align: center;">How does the deallocate function work?</h3><br>The deallocate function again is specialized for single objects ONLY.For all n belonging to > 1, the operator delete is called withoutfurther ado, and the deallocate function returns.<br><br>However for n == 1, a series of steps are performed:<br><br><ol><li>We first need to locate that super-block which holds the memorylocation given to us by the user. For that purpose, we maintain astatic variable _S_last_dealloc_index, which holds the index into thevector of block pairs which indicates the index of the last super-blockfrom which memory was freed. We use this strategy in the hope that theuser will deallocate memory in a region close to what he/shedeallocated the last time around. If the check for belongs_to succeeds,then we determine the bit-map for the given pointer, and locate theindex into that bit-map, and mark that bit as free by setting it.</li><li>If the _S_last_dealloc_index does not point to the memory blockthat we're looking for, then we do a linear search on the block storedin the vector of Block Pairs. This vector in code is called_S_mem_blocks. When the corresponding super-block is found, we applythe same procedure as we did for (1) to mark the block as free in thebit-map.</li></ol><br>Now, whenever a block is freed, the use count of that particular superblock goes down by 1. When this use count hits 0, we remove that superblock from the list of all valid super blocks stored in the vector.While doing this, we also make sure that the basic invariant ismaintained by making sure that _S_last_request and_S_last_dealloc_index point to valid locations within the vector.<br><br><hr style="width: 100%; height: 2px;"><br><h3 style="text-align: center;">Data Layout for a Super Block:</h3><br>Each Super Block will be of some size that is a multiple of the numberof Bits Per Block. Typically, this value is chosen as Bits_Per_Byte xsizeof(size_t). On an x86 system, this gives the figure 8 x4 = 32. Thus, each Super Block will be of size 32 x Some_Value. ThisSome_Value is sizeof(value_type). For now, let it be called 'K'. Thus,finally, Super Block size is 32 x K bytes.<br><br>This value of 32 has been chosen because each size_t has 32-bitsand Maximum use of these can be made with such a figure.<br><br>Consider a block of size 64 ints. In memory, it would look like this:(assume a 32-bit system where, size_t is a 32-bit entity).<br><br><table cellpadding="0" cellspacing="0" border="1"style="text-align: left; width: 763px; height: 21px;"><tbody><tr><td style="vertical-align: top; text-align: center;">268<br></td><td style="vertical-align: top; text-align: center;">0<br></td><td style="vertical-align: top; text-align: center;">4294967295<br></td><td style="vertical-align: top; text-align: center;">4294967295<br></td><td style="vertical-align: top; text-align: center;">Data ->Space for 64 ints<br></td></tr></tbody></table><br><br>The first Column(268) represents the size of the Block in bytes as seenbythe Bitmap Allocator. Internally, a global free list is used to keeptrack of the free blocks used and given back by the bitmap allocator.It is this Free List Store that is responsible for writing and managingthis information. Actually the number of bytes allocated in this casewould be: 4 + 4 + (4x2) + (64x4) = 272 bytes, but the first 4 bytes areanaddition by the Free List Store, so the Bitmap Allocator sees only 268bytes. These first 4 bytes about which the bitmapped allocator is notaware hold the value 268.<br><br><span style="font-weight: bold;">What do the remaining values represent?</span><br><br>The 2nd 4 in the expression is the sizeof(size_t) because theBitmapped Allocator maintains a used count for each Super Block, whichis initially set to 0 (as indicated in the diagram). This isincremented every time a block is removed from this super block(allocated), and decremented whenever it is given back. So, when theused count falls to 0, the whole super block will be given back to theFree List Store.<br><br>The value 4294967295 represents the integer corresponding to the bitrepresentation of all bits set: 11111111111111111111111111111111.<br><br>The 3rd 4x2 is size of the bitmap itself, which is the size of 32-bitsx 2,which is 8-bytes, or 2 x sizeof(size_t).<br><br><hr style="width: 100%; height: 2px;"><br>Another issue would be whether to keep the all bitmaps in a separatearea in memory, or to keep them near the actual blocks that will begiven out or allocated for the client. After some testing, I've decidedto keep these bitmaps close to the actual blocks. this will help in 2ways. <br><br><ol><li>Constant time access for the bitmap themselves, since no kind oflook up will be needed to find the correct bitmap list or it'sequivalent.</li><li>And also this would preserve the cache as far as possible.</li></ol><br>So in effect, this kind of an allocator might prove beneficial from apurely cache point of view. But this allocator has been made to try androll out the defects of the node_allocator, wherein the nodes getskewed about in memory, if they are not returned in the exact reverseorder or in the same order in which they were allocated. Also, thenew_allocator's book keeping overhead is too much for small objects andsingle object allocations, though it preserves the locality of blocksvery well when they are returned back to the allocator.<br><br><hr style="width: 100%; height: 2px;"><br>Expected overhead per block would be 1 bit in memory. Also, once theaddress of the free list has been found, the cost forallocation/deallocation would be negligible, and is supposed to beconstant time. For these very reasons, it is very important to minimizethe linear time costs, which include finding a free list with a freeblock while allocating, and finding the corresponding free list for ablock while deallocating. Therefore, I have decided that the growth ofthe internal pool for this allocator will be exponential as compared tolinear for node_allocator. There, linear time works well, because weare mainly concerned with speed of allocation/deallocation and memoryconsumption, whereas here, the allocation/deallocation part does havesome linear/logarithmic complexity components in it. Thus, to try andminimize them would be a good thing to do at the cost of a little bitof memory.<br><br>Another thing to be noted is the the pool size will double every timethe internal pool gets exhausted, and all the free blocks have beengiven away. The initial size of the pool would be sizeof(size_t) x 8which is the number of bits in an integer, which can fit exactlyin a CPU register. Hence, the term given is exponential growth of theinternal pool.<br><br><hr style="width: 100%; height: 2px;">After reading all this, you maystill have a few questions about the internal working of thisallocator, like my friend had!<br><br>Well here are the exact questions that he posed:<br><br><span style="font-weight: bold;">Q1) The "Data Layout" section iscryptic. I have no idea of what you are trying to say. Layout of what?The free-list? Each bitmap? The Super Block?</span><br><br><div style="margin-left: 40px;"> The layout of a Super Block of a givensize. In the example, a super block of size 32 x 1 is taken. Thegeneral formula for calculating the size of a super block is32 x sizeof(value_type) x 2^n, where n ranges from 0 to 32 for 32-bitsystems.<br></div><br><span style="font-weight: bold;">Q2) And since I just mentioned theterm `each bitmap', what in the world is meant by it? What does eachbitmap manage? How does it relate to the super block? Is the SuperBlock a bitmap as well?</span><br style="font-weight: bold;"><br><div style="margin-left: 40px;"> Good question! Each bitmap is part ofaSuper Block which is made up of 3 parts as I have mentioned earlier.Re-iterating, 1. The use count, 2. The bit-map for that Super Block. 3.The actual memory that will be eventually given to the user. Eachbitmap is a multiple of 32 in size. If there are 32 x (2^3) blocks ofsingle objects to be given, there will be '32 x (2^3)' bits present.Each32 bits managing the allocated / free status for 32 blocks. Since eachsize_t contains 32-bits, one size_t can manage upto 32blocks' status. Each bit-map is made up of a number of size_t,whose exact number for a super-block of a given size I have justmentioned.<br></div><br><span style="font-weight: bold;">Q3) How do the allocate and deallocatefunctions work in regard to bitmaps?</span><br><br><div style="margin-left: 40px;"> The allocate and deallocate functionsmanipulate the bitmaps and have nothing to do with the memory that isgiven to the user. As I have earlier mentioned, a 1 in the bitmap's bitfield indicates free, while a 0 indicates allocated. This lets us check32 bits at a time to check whether there is at lease one free block inthose 32 blocks by testing for equality with (0). Now, the allocatefunction will given a memory block find the corresponding bit in thebitmap, and will reset it (ie. make it re-set (0)). And when thedeallocate function is called, it will again set that bit afterlocating it to indicate that that particular block corresponding tothis bit in the bit-map is not being used by anyone, and may be used tosatisfy future requests.<br><br>eg: Consider a bit-map of 64-bits as represented below:<br>1111111111111111111111111111111111111111111111111111111111111111<br><br>Now, when the first request for allocation of a single object comesalong, the first block in address order is returned. And since thebit-maps in the reverse order to that of the address order, the lastbit(LSB if the bit-map is considered as a binary word of 64-bits) isre-set to 0.<br><br>The bit-map now looks like this:<br>1111111111111111111111111111111111111111111111111111111111111110<br></div><br><br><hr style="width: 100%; height: 2px;"><br>(Tech-Stuff, Please stay out if you are not interested in the selectionof certain constants. This has nothing to do with the algorithm per-se,only with some vales that must be chosen correctly to ensure that theallocator performs well in a real word scenario, and maintains a goodbalance between the memory consumption and the allocation/deallocationspeed).<br><br>The formula for calculating the maximum wastage as a percentage:<br><br>(32 x k + 1) / (2 x (32 x k + 1 + 32 x c)) x 100.<br><br>Where,<br>k => The constant overhead per node. eg. for list, it is 8 bytes,and for map it is 12 bytes.<br>c => The size of the base type on which the map/list isinstantiated. Thus, suppose the the type1 is int and type2 is double,they are related by the relation sizeof(double) == 2*sizeof(int). Thus,all types must have this double size relation for this formula to workproperly.<br><br>Plugging-in: For List: k = 8 and c = 4 (int and double), we get:<br>33.376%<br><br>For map/multimap: k = 12, and c = 4 (int and double), we get:<br>37.524%<br><br>Thus, knowing these values, and based on the sizeof(value_type), we maycreate a function that returns the Max_Wastage_Percentage for us to use.<br><br><hr style="width: 100%; height: 2px;"><small><small><em> See <ahref="file:///home/dhruv/projects/libstdc++-v3/gcc/libstdc++-v3/docs/html/17_intro/license.html">license.html</a>for copying conditions. Comments and suggestions are welcome, and maybesent to <a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailinglist</a>.</em><br></small></small><br><br></body></html>