· tutorials · 5 min read
Fundamentals of Computing with .NET
This Spring I had a programming mentorship through the F# Software Foundation with the talented Onur Gumus who showed me a number of functional programming concepts using F#. In addition, we also had a lesson on the fundamentals of computing in general, and I thought it might be helpful to share what I learned here.
CPU and Memory
In DOS, conventional memory is 640kb. Computer memory is like postboxes, and each address can take one byte (a number between 0-255). Combining addresses can give us more bits —> bigger numbers:
For modern OSes, 32-bit and 64-bit processes are stored in different areas. Each process is isolated from one another, but in DOS this wasn’t the case. Modern OSes translate physical memory addresses into virtual memory, and each process thinks it owns the physical memory:
- 32-bit: Thinks it has 4GB
- 64-bit: Thinks it has 128TB
If you run out of physical memory, then the OS uses a swap file on the hard disk, but swap is much slower than RAM.
The CPU can read/write things to memory:
The CPU also has boxes called registers (Rax, Rvsx, Rcx, etc. for Intel CPUs). We move things from memory to the CPU registers so we can change values, do math, etc:
Value and Reference Types
The next concept to be aware of is there are two types of program memory: stack and heap. The stack grows downwards in memory while the heap grows upwards (to avoid addressing conflicts):
Consider the following snippet of code:
int x = 1;
int y = 2;
int z = x + y;
Inside our machine, the following happens:
The interesting thing happens when we call another function. Every function has its own scope, and the compiler reserves space by incrementing the stack pointer to always be just past all local variables. When a function returns, it needs to know where to return to, so we push the return address to the stack. Once the second function returns, we want to return (change the instruction pointers location) to the original caller. Here’s an example:
But what happens if we need the return value + 3? We can store the value in the heap by writing 3 and 5 into the heap and then point to their locations on the stack. Keep in mind that whereas the stack can push and pop items without needing to cleanup, heap cleanup is not free. In unmanaged languages such as C, you must manually free memory when it is no longer needed on the heap. In C#, the garbage collector checks if heap values are in use or not by walking a tree graph.
Consider the following C++ code:
// In C++ struct = class, except class can be private
class P {
int x;
int y;
}
P p; // This puts p into the stack
P *p = new P(); // We put P in the heap if we want to preserve it
This results in the following in memory:
In C#, the decision is made when you declare the type:
// Not garbage collected
public struct P {
int x;
int y;
}
// Garbage collected
public class P
{
int x;
int y;
}
Most business applications use classes, and 99% of the time the decision to use struct over class is a performance one. Since values are stored directly on the heap for structs, you get the following performance benefits:
- No looking up pointer values in the heap.
- No garbage collection, so faster.
Note that in .NET, a value type cannot be null because it is the absence of a value, eg:
var foo = new Foo();
int x = 0;
Foo anotherFoo = null;
Whereas int, char, double, and bool are value types, a string is a char array whose values are stored on the heap.
Data Structure Basics
Arrays are the fastest data structure because each array is a contiguous list in memory. C#‘s list implementation is an array under the hood. Inserting into a list is fast, but removing is costly because we need to shift the remaining values in memory so that they remain contiguous:
Consider the following C# code:
public class P {
public int x {get; set;}
public int y {get; set;}
}
public class Program
{
public static void Main()
{
List<P> pointsList = new();
pointsList.Add(new P {x = 5, y = 10});
pointsList.Add(new P {x = 6, y = 8});
pointsList.Add(new P {x = 1, y = 7});
}
}
Internally, List<P>
is an array of pointers pointing to the heap (where the actual values are stored):
Note that while the values in the heap are non-contiguous, the stack pointer addresses are.
Linked lists are suboptimal because they are not contiguous in memory, so it can take longer to load them into the CPU registers and you pay a performance penalty.