Implementing Garbage Collection In C

Implementing Garbage Collection In C

How to free all allocated memory and prevent memory leaks 🌬️

If you're coming from a programming language such as C or C++ then you have heard of memory allocation. But have you heard of dynamic memory allocation?

What Is Dynamic Memory Allocation?

In programming, dynamic memory allocation refers to the process of creating memory space during execution or runtime. It C/C++, this is done manually via a group of functions

  • malloc
  • calloc
  • realloc

What Problem Does This Create

Well, since memory allocation is performed manually, releasing of such memory should be done manually too, and failing to do that results in memory leaks.

Memory Leaks 🤔

Yeah, memory leaks. This refers to a type of resource leak in which memory that is no longer being used is not released, so it's lost and now unreachable. Memory leaks can slow down your system and lead to applications lagging.

Garbage Collection To The Rescue

Garbage Collection is a form of memory management. The garbage collector (or gc) attempts to reclaim memory that is no longer in use, hereby preventing memory leaks. Memory that is no longer being used is referred to as garbage.

Can garbage collection be implemented in C?

Yeah it definitely can, that would definitely require some badassery. Their different levels to this kind of badassery, and we'll start from the ground level.

An Implementation Of Garbage Collection

DataStructure

The first thing we need is a struct that holds different information about the allocated memory. Each struct will be a node in a doubly linked list.

/**
 * struct memory - a node within a linked list
 * Description: this node contains information
 * about a dynamically allocated memory.
 */
struct memory
{
    /*
     * @mem: a malloced memory
     */
    void *mem;
    /*
     * @next: a pointer to the next node
     */
    struct memory *next;
};

/* alias for convenience */
typedef struct memory mem_t;

An new way of mallocing

We would need a new function that allocates memory using malloc, then adds that allocated memory to the linked list. We'll call it gc_malloc.

/**
 * gc_malloc - allocate memory on the heap but also
 *    add the allocated memory to a linked list
 *
 * @size: the size of the memory to be allocated
 *
 * @head: the address of the head of the linked list
 *
 * NOTE: This function is incomplete as it does not implement
 * malloc checks.
 *
 * Return: Nothing.
 */
void *gc_malloc(size_t size, mem_t **head)
{
    mem_t *node;

    /* allocate space for the node */
    node = malloc(sizeof(mem_t));
    /* allocate the space the user wants */
    node->mem = malloc(size);

    /* add node to top linked list */
    node->next = *head;
    *head = node;

    return (node->mem);
}

An alternative to stdlib's free()

Since we allocate memory and add a new mem_t node to the list, when we free up memory, we should also remove that added node from the list. Let's keep it simple and call the function gc_free.

/**
 * gc_free - free an allocated memory
 *    and also remove it from the linked list
 * @mem: memory to be freed
 *
 * @head: the address of the head of the linked list
 *
 * Return: Nothing.
 */
void gc_free(void *mem, mem_t **head)
{
    mem_t *temp;

    if ((*head)->mem == mem)
    {
        temp = (*head)->next;
        free((*head)->mem);
        free(*head);
        *head = temp;
    }
    else
    {
        temp = *head;
        while (temp)
        {
            if (temp->mem == mem)
            {
                (*head)->next = temp->next;
                free(temp->mem);
                free(temp);
                break;
            }
            temp = temp->next;
        }
    }
}

Releasing all malloced memory at a go!

We definitely need a function for this. There are a couple of ways to implement this function, so I'll be taking a recursive approach because well, recursion.

Let's stick to our "gc" prefix and call this one gc_free_all.

/**
 * gc_free_all - free all allocated memory at one go.
 *     it also empties the linked list
 *
 * @head: the head of the linked list
 *
 * Return: Nothing.
 */
void gc_free_all(mem_t *head)
{
    if (head == NULL)
        return;
    gc_free_all(head->next);
    free(head->mem);
    free(head);
}

Putting It Together...

We now have a functionality that works like a garbage collector. I'll explain how

  • The linked list is important because it contains all allocated memory as separate nodes.

  • gc_malloc works similarly to malloc(), but it adds a mem_t node to the linked list. So we keep a reference to each memory allocated. Allocated memory turn to leaks when there's no reference to them.

  • gc_free works similarly to free(), but it removes the reference from the linked list, then frees the allocated memory.

  • gc_free_all. The champion! This function releases all memory in the linked list at one go. This ensures that all memory is cleaned up and there are no leaks.

You can have separate linked lists for different functions in your program.


I wrote a simple but problematic code where I did a lot of malloc without effectively keeping a reference to each malloced string.

int main(void)
{
    char *str;
    int i;

    for (i = 0; i < 100; i++)
    {
        str = malloc(12);
    }
    free(str);
    return (0);
}

Here is what I get when I run with valgrind.

==8958== Memcheck, a memory error detector
==8958== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==8958== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==8958== Command: ./gc
==8958== 
==8958== 
==8958== HEAP SUMMARY:
==8958==     in use at exit: 1,188 bytes in 99 blocks
==8958==   total heap usage: 100 allocs, 1 frees, 1,200 bytes allocated
==8958== 
==8958== LEAK SUMMARY:
==8958==    definitely lost: 1,188 bytes in 99 blocks
==8958==    indirectly lost: 0 bytes in 0 blocks
==8958==      possibly lost: 0 bytes in 0 blocks
==8958==    still reachable: 0 bytes in 0 blocks
==8958==         suppressed: 0 bytes in 0 blocks
==8958== Rerun with --leak-check=full to see details of leaked memory
==8958== 
==8958== For lists of detected and suppressed errors, rerun with: -s
==8958== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Here's how I refactored to code using our garbage collector functions.

int main(void)
{
    mem_t *head = NULL;
    char *str;
    int i;

    for (i = 0; i < 100; i++)
    {
        str = gc_malloc(12, &head);
    }
    gc_free_all(head);
    return (0);
}

Here's my new malloc report ✨

==9470== Memcheck, a memory error detector
==9470== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==9470== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==9470== Command: ./gc
==9470== 
==9470== 
==9470== HEAP SUMMARY:
==9470==     in use at exit: 0 bytes in 0 blocks
==9470==   total heap usage: 200 allocs, 200 frees, 2,800 bytes allocated
==9470== 
==9470== All heap blocks were freed -- no leaks are possible
==9470== 
==9470== For lists of detected and suppressed errors, rerun with: -s
==9470== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Nice learning. This was an idea I actually had while I was working on a C project. Lemme know what you think in the comments. If you have any suggestions, questions or improvements, let's talk in the comments 😎