GAP Internals - GASMAN

GAP

Published

February 11, 2024

The following post will only be of interest to people who:

Know a bit about GAP’s internals
Know a little bit of C
Want to know more about how GAP handles memory.

I’m also going to not explain much about GAP kernel programming in general – if you would like me to, then tell me and I might write something up! This post is divided into two sections. A brief overview of gasman, GAP’s memory manager, and then some practical examples of errors I have found.

Overview

The most important feature of GAP’s memory manager is that it moves memory on garbage collection. This causes problems, because in C it is natural to have pointers, and these pointers can become invalidated.

So, how does GAP represent memory, and do this garbage collection? How, when we move an object, can we ensure that every pointer to that object is updated to it’s new location?

The answer is (as with many things in computing), another layer of indirection. GAP’s main function for allocating memory is NewBag. Unlike C’s malloc, which returns a pointer, NewBag returns a Bag, which is a pointer to a pointer. These Bag pointers are known as master pointers. We can see that a Bag is two points from it’s definition:

typedef UInt * * Bag;

To get to the actual allocated memory, we must dereference our Bag twice. Why do this? To move a normal block of C memory you have to move every pointer which points to it. We can have many references to a Bag, and to move the memory it points to we only have to change one pointer – the first level of indirection.

Here is some pseudo-code showing the idea (don’t do this in practice, GAP has other bookkeeping to do and this would cause GAP to horribly explode!).

void moveBag(Bag b, void* newLocation, size_t len) {
    // First, move the Bag's contents
    memcpy(newLocation, *b, len);
    // Then, move the master pointer
    *b = newLocation;
}

So, how does this effect us? There are two main issues this causes, which unfortunately are in conflict:

It is slightly slower to access memory in a Bag bag than with a simple C pointer, as we have to keep dereferencing pointers twice. Therefore it is tempting to calculate *bag, and cache the value.
We must recalculate *bag, or any explicit C pointer inside a bag, whenever an GC occurs. This can occur whenever a GAP memory allocation function, in particular NewBag or ResizeBag, are called.

The second point is the most important, as the bugs this creates are very hard to find. To hit a bug caused by holding a pointer to the inside of a bag we need to:

Take a pointer to the inside of a Bag
Call NewBag, ResizeBag, or a function which calls one of these.
Have that allocation cause a garbage collection to occur.

Most allocations don’t cause a garbage collection, so don’t cause an issue. Also, the resulting errors are incredibly hard to fix – if a GC happens, then you try to write into your Bag, you end up writing over a random location in a random other Bag – and this change may not be noticable until much later, when you then access that now corrupted bag!

Examples

These are all genuine examples of this kind of bug, from the GAP 4.8 beta release. Many of them had been around for years, but I made a deep effort to dig them out (how? That’s for a later post!) I suspect these bugs were, very occasionally, causing unexpected crashes but they were extremely hard to reproduce. I will group them into categories.

Bad Strings

It is often tempting to take a raw C pointer to a string in GAP – these are required to call C standard library functions for example. However, we must not hold these raw C pointers over a call to NewBag!

The standard function to get a raw C pointer from a GAP string CSTR_STRING (this actually only works for certain GAP strings, but we won’t go into that here).

These first three set off immediate alarm bells. We are passing a C string (from CSTR_STRING) into functions which sound like they might allocate a new bag at some point (and they all do). Therefore we have to rewrite them to avoid that.

Note there is nothing wrong with passing the C string "object, value" into a function – it is just C strings which we stored inside GAP Bags we have to worry about.

// src/opers.c : FuncSETTER_FUNCTION
func = NewFunctionCT( T_FUNCTION, SIZE_FUNC, CSTR_STRING(fname), 2,
                      "object, value", DoSetterFunction );

// src/opers.c : FuncGETTER_FUNCTION
func = NewFunctionCT( T_FUNCTION, SIZE_FUNC, CSTR_STRING(fname), 1,
                      "object, value", DoGetterFunction );

// src/streams.c : FuncREAD_GAP_ROOT
return READ_GAP_ROOT(CSTR_STRING(filename)) ? True : False;

Sometimes, we just call CSTR_STRING too early. In this next function we just calculated init_key too early, and moved it’s construction later in the function.

// src/intfuncs.c : FuncInitRandomMT
init_key = CHARS_STRING(initstr);

We need to explain some functions for this next example. NameGVar gives us a raw char* to a string, and C_NEW_STRING_DYN makes a new GAP string by copying a C string. Therefore we are, in a round-about way, copying a GAP string. However, this copy must make a new GAP bag, which will invalidate name! The answer is to use a string copy function which directly copies a bag.

// src/gvars.c : AssGVar
Char* name;
name = NameGVar(gvar);
C_NEW_STRING_DYN(onam, name);

Holding pointers

This class of bugs is just holding a generic C pointer too long. In this code snippet (repeated similarly in many functions in the same file), we calculate cache before we start looping, then use it each time around the loop. The loop makes new bags, so we must recalculate cache before each use.

// src/opers.c : DoOperation0Args (and many others)
     /* try to find an applicable method in the cache                       */
    cache = 1+ADDR_OBJ( CacheOper( oper, 2 ) );
...
    do {
...
            if (  cache[i] != 0  && cache[i+1] == prec) {
              method = cache[i];
...
        } while (res == TRY_NEXT_METHOD );

Bad Assignments

This is the most complicated type of error, and most common. It is caused partly by the strangeness of the C programming language.

Let’s pick one, and pull it apart (I’ve unfolded some macros for you). In this example, NAME_FUNC takes a Bag which represents a function, and lets us name the function.

// src/opers.c : NewOperationArgs
#define NAME_FUNC(func) (*( *(func) + 8))
NAME_FUNC(func) = CopyObj( name, 0 );

At first this seems fine. We calculate CopyObj (which might do a garbage collection), then we do the derefencing within NAME_FUNC to find where we are assigning to, then do the assignment. Except, that’s not how C works!.

The compiler is allowed (and does) mix up the left and right hand sides of an =, and if it calculates the left first to get the memory location we are writing to, then calculates the right (causing a garbage collection), then assigning, bad things happen! Fixing this is easy (if boring), just put an explicit extra assignment in.

// src/opers.c : NewOperationArgs
#define NAME_FUNC(func) (*( *(func) + 8))
Bag temp = CopyObj( name, 0 );
NAME_FUNC(func) = temp;

The same problem occurs in these other examples.

// src/opers.c : SetterAndFilter
FLAG1_FILT(setter)  = SetterFilter( FLAG1_FILT(getter) );
FLAG2_FILT(setter)  = SetterFilter( FLAG2_FILT(getter) );

// src/opers.c : FuncGETTER_FUNCTION
ENVI_FUNC(func) = INTOBJ_INT( RNamObj(name) );

Conclusions

In general, the important thing is to be careful whenever taking raw pointers, and resisting the urge to hold them for a long time – don’t worry about efficiency unless you have profiled your code! Also, never call a function on both the left, and right, hand side of an assignment.