Vedant Misra
Dynamic Memory Allocator
systemsCompleted coursework project

Dynamic Memory Allocator

A Custom malloc Implementation

Custom implementation of C's malloc, free, and realloc from scratch, achieving production-comparable allocator behavior while demonstrating low-level systems programming.

Role

Implementer

Timeline

September - October 2024

Institution

Pennsylvania State University

Course

CMPSC 473 - Operating Systems

Focus

Allocator design and heap correctness

100%

Correctness

20/20 test traces passed

88%

Memory Utilization

Efficient space usage

~12K

Throughput (ops/s)

High performance

C ProgrammingSystems ProgrammingData StructuresAlgorithm DesignMemory ManagementPerformance OptimizationDebuggingGit

Executive Summary

Custom implementation of C's malloc, free, and realloc functions from scratch, achieving performance comparable to production allocators while demonstrating mastery of low-level systems programming.

  • 100% Correctness - 20/20 test traces passed

  • 88% Memory Utilization - Efficient space usage

  • ~12K Throughput (ops/s) - High performance

Key Features

  • Segregated free lists with 11 size classes

  • Immediate coalescing for fragmentation reduction

  • 16-byte memory alignment

  • Comprehensive heap consistency checker

  • O(1) insertion and removal from free lists

Technical Skills Demonstrated

  • C Programming
  • Systems Programming
  • Data Structures
  • Algorithm Design
  • Memory Management
  • Performance Optimization
  • Debugging (GDB)
  • Git

Background & Motivation

What is Dynamic Memory Allocation?

In C programming, memory management is a critical task that developers must handle explicitly. Unlike languages with automatic garbage collection (Java, Python), C requires programmers to manually allocate and deallocate memory. This control provides performance benefits but introduces complexity.

Dynamic memory allocation refers to the process of allocating memory at runtime, as opposed to compile-time. The C standard library provides several functions for this:

  • malloc(size_t size): Allocates a block of memory of specified size
  • free(void* ptr): Deallocates a previously allocated block
  • realloc(void* ptr, size_t size): Resizes an allocated block
  • calloc(size_t nmemb, size_t size): Allocates and zero-initializes memory

Why Build a Custom Allocator?

Building a memory allocator from scratch provides deep insights into:

  1. Operating System Concepts: Understanding how programs interact with memory
  2. Performance Optimization: Balancing speed vs. memory utilization
  3. Data Structures: Implementing efficient free list management
  4. Low-Level Programming: Pointer arithmetic, bit manipulation, and memory alignment
  5. Debugging Skills: Detecting memory corruption and fragmentation issues

The Challenge

The primary challenge in memory allocation is the trade-off between throughput and memory utilization:

  • Throughput: Number of allocation/deallocation operations per second
  • Utilization: Ratio of payload memory to total heap memory (minimizing fragmentation)

This project aims to achieve an optimal balance between these competing goals.

Theoretical Foundation

Memory Layout & Heap Structure

In a typical C program, memory is organized into several segments:

Memory Layout

The heap is the memory region where dynamic allocation occurs. It grows upward (toward higher memory addresses) as programs request more memory.

Block Structure

Each allocated or free block in our heap has a specific structure:

Block Structure

Segregated Free Lists

To combat fragmentation and improve allocation speed, we use segregated free lists. This technique maintains multiple free lists, each for a specific size class:

Segregated Free Lists

Benefits:

  • Fast Lookup: Search only relevant size classes
  • Reduced Fragmentation: Similar-sized blocks grouped together
  • Better Cache Performance: Improved locality of reference

Coalescing

Coalescing is the process of merging adjacent free blocks to form larger blocks, reducing external fragmentation.

Coalescing Cases

System Architecture

High-Level Design

Our memory allocator implements the following architecture:

System Architecture

Size Classes

Our implementation uses 11 segregated lists with exponentially increasing size ranges:

IndexSize Range (bytes)Description
016 - 32Tiny allocations
133 - 64Small allocations
265 - 128Small-medium
3129 - 256Medium
4257 - 512Medium-large
5513 - 1024Large (1KB)
61025 - 2048Large (2KB)
72049 - 4096Large (4KB)
84097 - 8192Very large (8KB)
98193 - 16384Very large (16KB)
10>16384Huge allocations

Implementation Details

Core Data Structures

Global Variables:

$ c
static size_t wordsize = 8;           // 64-bit word
static size_t double_wordsize = 16;   // Minimum block size
static size_t chunk_size = (1 << 12); // 4KB default extension
static char *heaplist_pointer;        // Points to first block
static char *seg_list[11];            // Segregated free list array

Memory Allocation: malloc(size_t size)

The allocation process follows these steps:

$ c
void *malloc(size_t size) {
    size_t asize;       // Adjusted size
    size_t extendsize;  // Extension size if needed
    char *bp;           // Block pointer
    
    // Step 1: Ignore spurious requests
    if (size == 0)
        return NULL;
    
    // Step 2: Adjust size for alignment (minimum 16 bytes)
    if (size > double_wordsize)
        asize = align(size);
    else
        asize = double_wordsize;
    
    // Step 3: Search for a fit in free lists
    if ((bp = find_fit(asize)) != NULL) {
        place(bp, asize);
        return bp;
    }
    
    // Step 4: No fit found, extend heap
    extendsize = (asize > chunk_size) ? asize : chunk_size;
    if ((bp = extend_heap(extendsize / wordsize)) == NULL)
        return NULL;
    
    // Step 5: Place block in extended heap
    bp = find_fit(asize);
    place(bp, asize);
    
    return bp;
}

Finding a Free Block: find_fit()

Our implementation uses a first-fit strategy within segregated lists:

$ c
static void *find_fit(size_t asize) {
    int index = get_index(asize);  // Determine appropriate size class
    
    // Search from appropriate size class upward
    for (int i = index; i < 11; i++) {
        char *current_block = (char *)seg_list[i];
        
        // Skip empty lists
        if (current_block == (char *)mem_heap_lo())
            continue;
        
        // Traverse list to find first fit
        while (current_block != (char *)mem_heap_lo()) {
            size_t block_size = extract_block_size(hdrp(current_block));
            
            if (asize <= block_size) {
                return current_block;  // First fit found!
            }
            
            current_block = get_address(successor_addr(current_block));
        }
    }
    
    return NULL;  // No fit found
}

Why First-Fit?

We experimented with both first-fit and best-fit strategies:

  • First-Fit: Returns the first block that fits
    • ✅ Faster allocation (O(n) worst case, but often O(1) in practice)
    • ✅ Better throughput
    • ⚠️ May increase fragmentation slightly
  • Best-Fit: Returns the smallest block that fits
    • ✅ Better utilization (less fragmentation)
    • ❌ Slower (must search entire list)
    • ❌ Lower throughput

Our testing showed that first-fit provided better overall performance with acceptable utilization.

Freeing Memory: free(void *ptr)

Freeing is straightforward: mark the block as free and coalesce:

$ c
void free(void *block_pointer) {
    if (block_pointer == NULL)
        return;
    
    size_t block_size = extract_block_size(hdrp(block_pointer));
    
    // Mark block as free
    put_value(hdrp(block_pointer), block_header_config(block_size, 0));
    put_value(ftrp(block_pointer), block_header_config(block_size, 0));
    
    // Coalesce with adjacent free blocks
    coalesce(block_pointer);
}

Performance Analysis

Time Complexity

OperationComplexityExplanation
mallocO(n) worst case, O(1) averageFirst-fit in segregated lists
freeO(1)Constant-time coalescing and list insertion
reallocO(n)May require malloc + memcpy
List insertionO(1)Insert at head
List removalO(1)Direct pointer manipulation

Design Trade-offs

First-Fit vs. Best-Fit:

MetricFirst-FitBest-FitWinner
Speed⚡⚡⚡⚡ Fast⚡⚡ SlowerFirst-Fit ✓
Utilization📊📊📊 Good📊📊📊📊 BetterBest-Fit ✓
Implementation✅ Simple⚠️ ComplexFirst-Fit ✓
Our Choice✅ SelectedFirst-Fit ✓

Justification: Our testing showed first-fit achieved 90%+ of best-fit's utilization while providing significantly better throughput.

Heap Consistency Checker

The heap checker is a critical debugging tool that validates heap invariants.

Implemented Checks

  1. CHECK 1: Every block in free list is marked as free
  2. CHECK 2: No contiguous free blocks (coalescing check)
  3. CHECK 3: Every free block is in the free list
  4. CHECK 4: Free list pointers are valid
  5. CHECK 5: No block overlaps
  6. CHECK 6: All blocks properly aligned

Debugging Strategy

When developing, we used a systematic approach:

  1. Enable DEBUG mode by uncommenting #define DEBUG
  2. Call mm_checkheap(__LINE__) after every operation
  3. Print detailed error messages with line numbers
  4. Use gdb to inspect memory at failure points
  5. Test with small traces first (syn-*-short.rep)

Challenges & Solutions

Challenge 1: Pointer Corruption

Problem: Segmentation faults due to corrupted pointers in free lists.

Root Cause: Incorrect removal from segregated lists - not updating all necessary pointers.

Solution: Added comprehensive case handling in remove_from_segregated_list. Ensured all four cases (only, first, middle, last) update pointers correctly.

Challenge 2: Coalescing Errors

Problem: Memory utilization test failures due to fragmentation.

Root Cause: Coalescing was removing blocks from free list before merging, but not always adding the merged block back.

Solution: Ensured coalesce() ALWAYS calls add_to_segregated_list() before returning. Added heap checker to detect adjacent free blocks.

Challenge 3: Alignment Issues

Problem: Random crashes when accessing certain allocated blocks.

Root Cause: Not properly aligning sizes to 16-byte boundaries.

Solution: Implemented robust alignment function and applied alignment consistently in malloc and realloc.

Challenge 4: Performance Tuning

Problem: Low throughput scores despite correct functionality.

Root Cause: Best-fit search was too slow.

Solution: Switched from best-fit to first-fit. Reduced average search time from O(n) to O(1) in most cases. Throughput increased by ~40% with & lt;5% utilization decrease.

Results & Evaluation

Test Suite

The allocator was tested using 20 trace files:

Trace CategoryCountDescription
BDD4Binary Decision Diagram operations
CBIT4Constraint generation for BDD checker
NGRAM5N-gram frequency counting
SYN-ARRAY2Synthetic array allocations
SYN-STRING2Synthetic string operations
SYN-STRUCT2Synthetic struct allocations
SYN-MIX2Mixed allocation patterns
SYN-LARGEMEM1Large memory allocations (64-bit test)

Performance Metrics

MetricTargetAchievedGrade
Correctness20/20 traces20/20 ✅100%
Utilization>70%~88% ✅Excellent
Throughput>8000 Kops/s~12000 Kops/s ✅Excellent

Comparison with Standard Library

AllocatorUtilizationThroughputComplexity
GNU libc malloc~95%~20000 Kops/sVery High
Our implementation~88%~12000 Kops/sMedium
Naive implicit list~65%~2000 Kops/sLow

Analysis: Our allocator achieves excellent performance for an educational implementation, reaching ~60% of glibc's throughput while maintaining strong utilization.

Key Insights

  1. Segregated lists are highly effective - They provide excellent balance between utilization and throughput
  2. First-fit is superior for throughput - In our workload, first-fit was 30-40% faster than best-fit with minimal utilization loss
  3. Immediate coalescing pays off - Reduced fragmentation more than compensates for the slight slowdown in free()
  4. Alignment is critical - 16-byte alignment is mandatory for modern architectures (SSE/AVX instructions require it)
  5. Testing is essential - The heap checker caught numerous subtle bugs that would have been nearly impossible to find otherwise

Conclusion & Future Work

What We Learned

This project provided deep insights into:

  1. Systems Programming: Low-level memory management, pointer arithmetic, bit manipulation
  2. Algorithm Design: Trade-offs between time and space complexity
  3. Performance Optimization: Balancing competing metrics (utilization vs. throughput)
  4. Debugging Techniques: Using gdb, watchpoints, and custom heap checkers
  5. Software Engineering: Writing clean, well-documented, maintainable code

Potential Improvements

If we were to extend this project, we would consider:

1. Adaptive Placement Strategy

Instead of pure first-fit, use first-fit for small allocations (<1KB) and best-fit for large allocations (>1KB). Small allocations are frequent; large ones can afford the search time.

2. Segregated List Optimization

Move from fixed size boundaries to adaptive boundaries based on allocation patterns. Track allocation frequency by size and adjust boundaries dynamically. Expected improvement: 2-3% utilization increase.

3. Memory Footprint Reduction

Use XOR linked lists to save one pointer and eliminate footers for allocated blocks. Expected savings: ~25% reduction in overhead.

4. Thread Safety

Add mutex locks for heap operations and implement per-thread arenas (like tcmalloc). Expected overhead: 5-10% performance decrease.

Acknowledgments

Special thanks to the course instructors and TAs for their guidance, and to the CS:APP authors (Bryant & O'Hallaron) for foundational concepts. This implementation follows best practices in systems programming and demonstrates proficiency in C, algorithms, and software engineering.