Fundamentals of Computing with .NET

This Spring I had a programming mentorship through the F# Software Foundation with the talented Onur Gumus who showed me a number of functional programming concepts using F#. In addition, we also had a lesson on the fundamentals of computing in general, and I thought it might be helpful to share what I learned here.

CPU and Memory

In DOS, conventional memory is 640kb. Computer memory is like postboxes, and each address can take one byte (a number between 0-255). Combining addresses can give us more bits —> bigger numbers:

Values stored at different memory addresses

For modern OSes, 32-bit and 64-bit processes are stored in different areas. Each process is isolated from one another, but in DOS this wasn’t the case. Modern OSes translate physical memory addresses into virtual memory, and each process thinks it owns the physical memory:

32-bit: Thinks it has 4GB
64-bit: Thinks it has 128TB

If you run out of physical memory, then the OS uses a swap file on the hard disk, but swap is much slower than RAM.

The CPU can read/write things to memory:

Relationship between CPU and memory

The CPU also has boxes called registers (Rax, Rvsx, Rcx, etc. for Intel CPUs). We move things from memory to the CPU registers so we can change values, do math, etc:

Loading values into the CPU registers and storing the result in memory

Value and Reference Types

The next concept to be aware of is there are two types of program memory: stack and heap. The stack grows downwards in memory while the heap grows upwards (to avoid addressing conflicts):

Program memory

Consider the following snippet of code:

int x = 1;
int y = 2;
int z = x + y;

Inside our machine, the following happens:

CPU and stack pointer

The interesting thing happens when we call another function. Every function has its own scope, and the compiler reserves space by incrementing the stack pointer to always be just past all local variables. When a function returns, it needs to know where to return to, so we push the return address to the stack. Once the second function returns, we want to return (change the instruction pointers location) to the original caller. Here’s an example:

The CPU and stack

But what happens if we need the return value + 3? We can store the value in the heap by writing 3 and 5 into the heap and then point to their locations on the stack. Keep in mind that whereas the stack can push and pop items without needing to cleanup, heap cleanup is not free. In unmanaged languages such as C, you must manually free memory when it is no longer needed on the heap. In C#, the garbage collector checks if heap values are in use or not by walking a tree graph.

Consider the following C++ code:

// In C++ struct = class, except class can be private
class P {
    int x;
    int y;
}

P p; // This puts p into the stack
P *p = new P(); // We put P in the heap if we want to preserve it

This results in the following in memory:

Heap pointer in the stack

In C#, the decision is made when you declare the type:

// Not garbage collected
public struct P {
    int x;
    int y;
}

// Garbage collected
public class P 
{
    int x;
    int y;
}

Most business applications use classes, and 99% of the time the decision to use struct over class is a performance one. Since values are stored directly on the heap for structs, you get the following performance benefits:

No looking up pointer values in the heap.
No garbage collection, so faster.

Note that in .NET, a value type cannot be null because it is the absence of a value, eg:

var foo = new Foo();
int x = 0;
Foo anotherFoo = null;

Null pointer

Whereas int, char, double, and bool are value types, a string is a char array whose values are stored on the heap.

Data Structure Basics

Arrays are the fastest data structure because each array is a contiguous list in memory. C#‘s list implementation is an array under the hood. Inserting into a list is fast, but removing is costly because we need to shift the remaining values in memory so that they remain contiguous:

Deleting an element from an array

Consider the following C# code:

public class P {
    public int x {get; set;}
    public int y {get; set;}
}

public class Program
{
    public static void Main()
    {
        List<P> pointsList = new();
        pointsList.Add(new P {x = 5, y = 10});
	pointsList.Add(new P {x = 6, y = 8});
	pointsList.Add(new P {x = 1, y = 7});
    }
}

Internally, List<P> is an array of pointers pointing to the heap (where the actual values are stored):

Values in the heap

Note that while the values in the heap are non-contiguous, the stack pointer addresses are.

Linked lists are suboptimal because they are not contiguous in memory, so it can take longer to load them into the CPU registers and you pay a performance penalty.