MiniMe Internals: Memory Management

Goals
Pages as Heaps
Garbage Collection
- Mark Phase
- Sweep Phase

Goals

Persistent Objects

The starting goal is built-in support for persistent objects.

My theory here is that persistent objects tend to want to form into relatively small (<10Kbytes) clumps; where the the number of links between objects within each clump is high, and the number of links between objects in different clumps is low. Thus, while it would be difficult to read individual objects into memory and maintain all of the links between objects; it would make sense to read clumps of objects as a unit into memory together -- with the links between them already established.

The objects within each clump are going to be of many different kinds (i.e., different classes or implementations). Thus, each clump sounds like it wants to be its own little heap. Objects of any kind may be allocated and intermixed in the heap and linked together. Direct links are only supported to other objects within the same heap. Some kind of "symbolic link" would be used to link to objects in a different heap.

Function Activation Records in Heap; not Stacked

In the "good old days", programs ran in batch mode. This meant that they were started with all of their input already in place and when they had processed all of this input, the program exited. It was during these days that the idea of using a stack for function activation records (FAR) was developed. And it worked well in that environment.

Then programs changed from being able to process all of their input, start to finish, in one execution to having to deal with their input arriving as asynchronous events. This holds true both for GUI programs and for server programs in the newer client-server programming model.

Unfortunately, we are still stuck with programming languages that are based on a stack. This makes it difficult (and unnatural) to program event-based or interrupt-based programs and makes current programming very difficult and error prone.

The memory arrangement that seems to make sense for today's programming is to allocate FARs in a heap. This removes the rigid fifo execution order imposed by a stack.

Considering this further, we'd want to be able to allocate all of the temporary objects required by a function in its FAR. So each FAR becomes its own little heap too. When the function terminates, the entire FAR could be reclaimed as one unit. This stands as an alternative to the ephemeral garbage collection schemes that focus on reclaiming short-lived objects quickly. And I wonder if this scheme might be somewhat more efficient since temporary objects for functions that last a longer time would still collected en-masse when the function finally terminates.

Bringing it all Together

So what we want is:

to be able to read persistent clumps of objects (already in their own little heap) into memory, i.e., into some broader "big heap"
to be able to allocate and reclaim little FAR heaps within this "big heap"
to be able to still support a "global heap" for objects whose lifetimes are neither persistent nor tied to any particular FAR.

What we must provide, somehow, to the underlying system in order to get this is a specification, at each object creation, of the "little heap" (persistent, global, or FAR) that the new object should be placed in. How this might be done (how much can be determined automatically by the compiler and what extra declarations would be required) is the main focus of this research project.

Pages as Heaps

Physical Pages

So we've seen the need for "little heaps" and for a "big heap" to hold the little heaps. We'll do this by breaking the little heaps into an integral number of fixed-sized physical pages. These physical pages can then be easily and efficiently managed within the "big heap" since they are all the same size. Matching this physical page size to the page size for Intel x86 processors gives us a physical page size of 4K bytes.

Not only are fixed-sized physical pages nice for managing RAM memory, but they are also nice for managing disk memory. In fact, current file systems already use various schemes of allocating integral numbers of fixed sized disk pages to files.

So our "big heap" ends up just being a free list of physical pages. We'll need one page manager for all of RAM memory. And we could have additional page managers for disk memory. (I'd call these "partitions" rather than "file systems", since persistent hash tables would provide the disk directory and inode functions of file systems leaving just the free page management). (Also note that MiniMe doesn't do its own disk page management, this was done to simplify the language).

The various kinds of physical pages (e.g., persistent, overflow, far) are defined in mem_pages.h.

Logical Pages as Heaps

A "little heap", then, is a series of one or more physical pages that are linked together to form a "logical page". From here on, we'll use the term "logical page" rather than "little heap".

These logical pages can easily grow by allocating additional physical pages as needed. Note that the physical pages that make up a single logical page are not necessarily contiguous in memory (or on disk).

Object Lifetime and Pointer Assignment

We've seen that there are going to be many logical pages in memory, each acting as its own little heap. These pages (and the objects on these pages) have widely different lifetimes from persistent pages that live forever to FAR pages that only last microseconds and everything in between.

We've already said that persistent objects on one logical page can not directly point to persistent objects on another logical page (but must use symbolic links instead). It is now time to generalize and formalize the restrictions for pointer use between objects.

A direct pointer from object 'Source' to object 'Destination' is only allowed when:

the lifetime of the Destination object is greater than the lifetime of the Source object, or
both the Source and Destination objects are on the same logical page.

In order to track and enforce this, each logical page is assigned a number indicating its relative lifetime. These lifetime numbers are checked during pointer assignment to catch any violations of the above rule. The greater the number, the shorter the lifetime of the page:

0 - is used for 'nil' and for all symbols because they can be placed safely on any page.
1 - is used for all persistent pages.
2 - is used for the global heap (which is one huge logical page).
3+ - is used for FAR pages, with the number being incremented for each function call level.

Note that this is not a perfect scheme for catching violations. It is possible to create FAR pages on different threads whose lifetime numbers would not be directly comparable to each other. If these FAR pages somehow found each other and tried to store a pointer from one to the other, it is possible that the rules would be violated because the Destination page would have a lifetime number less than the Source page (indicating a longer lifetime), but in fact have a shorter lifetime. Also, in these situations, it is possible that a pointer assignment might be flagged as a violation that really isn't (if the reverse were true).

Garbage Collection

We no longer have a need for ephemeral collection, so I'm using a mark-sweep collector. My reasons are that a mark-sweep collector is (at least conceptually) easier and that it may be faster (or at least not slower?) given that it will be used to collect all of memory where the expectation is that most objects survive the collection.

Note that there is no need to collect objects on persistent pages. This has the benefit of reducing, probably dramatically, the number of objects that need to be examined during collection. (But persistent pages are marked as 'in_use' during collection and an 'age' maintained to aid selection for removal from memory when memory gets tight).

Mark Phase

All objects allocated in heaps must be derived from Mini_obj. The Mini_obj class provides two operations that are used in the mark phase:

test_and_set_mark: each object is responsible for maintaining its own mark bit
mark: marks the objects pointed to by this object by calling 'mark_obj' on each pointer

Sweep Phase

Each physical page maintains its own free space list for allocation. The sweep phase examines each physical page in turn. All physical pages (except persistent pages) that were not marked 'in_use' during the mark phase are simply placed back on the page_manager's free page list.

Non persistent physical pages that are marked 'in_use' are swept to rebuild their free space list. There are several operations provided by the Mini_obj class for this purpose:

test_and_reset_bit
len: to determine where the next memory location to examine is
reclaim: called on objects about to be reclaimed (only used by File and fs_hash).