Tuesday, January 5, 2010

Memory in .NET - what goes where

A lot of confusion has been wrought by people explaining the difference between value types and reference types as "value types go on the stack, reference types go on the heap". This is simply untrue (as stated) and this article attempts to clarify matters somewhat.

What's in a variable?

The key to understanding the way memory works in .NET is to understand what a variable is, and what its value is. At the most basic level, a variable is just an association between a name (used in the program's source code) and a slot of memory. A variable has a value, which is the contents of the memory slot it's associated with. The size of that slot, and the interpretation of the value, depends on the type of the variable - and this is where the difference between value types and reference types comes in.
The value of a reference type variable is always either a reference or null. If it's a reference, it must be a reference to an object which is compatible with the type of the variable. For instance, a variable declared as Stream s will always have a value which is either null or a reference to an instance of the Stream class. (Note that an instance of a subclass of Stream, eg FileStream, is also an instance of Stream.) The slot of memory associated with the variable is just the size of a reference, however big the actual object it refers to might be. (On the 32-bit version of .NET, for instance, a reference type variable's slot is always just 4 bytes; on the 64-bit version it is 8 bytes.)
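As a small illustration of the paragraph above, here is a sketch in which a Stream variable first holds null and then a reference to a MemoryStream. The class name ReferenceVariableDemo and the helper Describe are made up for this example. IntPtr.Size is used as a stand-in for "the size of a reference"; that matches current implementations, but it is an assumption rather than something the specification promises.

```csharp
using System;
using System.IO;

static class ReferenceVariableDemo
{
    // Hypothetical helper: describes what the variable's value currently is.
    public static string Describe(Stream s)
    {
        return s == null ? "null" : "reference to a " + s.GetType().Name;
    }

    static void Main()
    {
        Stream s = null;                 // the slot holds no reference at all
        Console.WriteLine(Describe(s));  // null

        s = new MemoryStream();          // the slot now holds a reference to a
                                         // MemoryStream, which is also a Stream
        Console.WriteLine(Describe(s));  // reference to a MemoryStream
        Console.WriteLine(s is Stream);  // True

        // Native pointer size: 4 in a 32-bit process, 8 in a 64-bit one.
        // Assumed here to match the size of a reference slot.
        Console.WriteLine(IntPtr.Size);

        s.Dispose();
    }
}
```

The variable s never changes size, whichever stream it refers to; only the reference stored in its slot changes.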
The value of a value type variable is always the data for an instance of the type itself. For instance, suppose we have a struct declared as:

    struct PairOfInts
    {
        public int a;
        public int b;
    }

The value of a variable declared as PairOfInts pair is the pair of integers itself, not a reference to a pair of integers. The slot of memory is large enough to contain both integers (so it must be 8 bytes). Note that a value type variable can never have a value of null - it wouldn't make any sense, as null is a reference type concept, meaning "the value of this reference type variable isn't a reference to any object at all". (Nullable value types such as int? don't change this: Nullable<T> is itself a struct, and its "null" is an ordinary value whose HasValue property is false, not a null reference.)
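The difference shows up most clearly on assignment. The following is a minimal sketch: PairOfIntsClass is a hypothetical class with the same fields as PairOfInts, added only for contrast, and CopyDemo and its methods are likewise invented for this example.

```csharp
using System;

struct PairOfInts
{
    public int a;
    public int b;
}

// Hypothetical reference type with the same fields, for contrast only.
class PairOfIntsClass
{
    public int a;
    public int b;
}

static class CopyDemo
{
    public static int CopyStructThenChange()
    {
        PairOfInts first = new PairOfInts();
        first.a = 1;
        PairOfInts second = first;  // copies both integers into second's own slot
        second.a = 99;              // changes only second's copy
        return first.a;             // still 1
    }

    public static int CopyReferenceThenChange()
    {
        PairOfIntsClass first = new PairOfIntsClass();
        first.a = 1;
        PairOfIntsClass second = first;  // copies only the reference
        second.a = 99;                   // changes the one shared object
        return first.a;                  // 99
    }

    static void Main()
    {
        Console.WriteLine(CopyStructThenChange());     // 1
        Console.WriteLine(CopyReferenceThenChange());  // 99
    }
}
```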

So where are things stored?

The memory slot for a variable is stored on either the stack or the heap. It depends on the context in which it is declared:
  • Each local variable (ie one declared in a method) is stored on the stack. That includes reference type variables - the variable itself is on the stack, but remember that the value of a reference type variable is only a reference (or null), not the object itself. Method parameters count as local variables too, but if they are declared with the ref modifier, they don't get their own slot, but share a slot with the variable used in the calling code. See my article on parameter passing for more details.
  • Instance variables for a reference type are always on the heap. That's where the object itself "lives".
  • Instance variables for a value type are stored in the same context as the variable that declares the value type. The memory slot for the instance effectively contains the slots for each field within the instance. That means (given the previous two points) that a struct variable declared within a method will always be on the stack, whereas a struct variable which is an instance field of a class will be on the heap.
  • Every static variable is stored on the heap, regardless of whether it's declared within a reference type or a value type. There is only one slot in total no matter how many instances are created. (There don't need to be any instances created for that one slot to exist though.) The details of exactly which heap the variables live on are complicated, but explained in detail in an MSDN article on the subject.
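The point about ref parameters sharing a slot with the caller's variable can be sketched as follows. All of the names here (ParameterSlotDemo, IncrementCopy, IncrementInPlace) are invented for the illustration.

```csharp
using System;

static class ParameterSlotDemo
{
    // Value parameter: x has its own slot, initialised with a copy of the argument.
    static void IncrementCopy(int x)
    {
        x++;
    }

    // ref parameter: x has no slot of its own; it is another name for the caller's slot.
    static void IncrementInPlace(ref int x)
    {
        x++;
    }

    public static int AfterByValue()
    {
        int local = 10;
        IncrementCopy(local);
        return local;  // 10: only the copy was incremented
    }

    public static int AfterByRef()
    {
        int local = 10;
        IncrementInPlace(ref local);
        return local;  // 11: the caller's own slot was incremented
    }

    static void Main()
    {
        Console.WriteLine(AfterByValue());  // 10
        Console.WriteLine(AfterByRef());    // 11
    }
}
```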
There are a couple of exceptions to the above rules - captured variables (used in anonymous methods and lambda expressions) are local in terms of the C# code, but end up being compiled into instance variables in a type associated with the delegate created by the anonymous method. The same goes for local variables in an iterator block.
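Here is a rough sketch of both exceptions. The names CaptureDemo, MakeCounter and CountTo are invented for this example. The comments describe what current C# compilers do with such variables, which is an implementation detail rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;

static class CaptureDemo
{
    // count looks like a local variable, but because the lambda captures it,
    // the compiler turns it into a field of a generated class, so it lives on the heap.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }

    // The local variable i in an iterator block is treated similarly,
    // which is how its value survives between calls to MoveNext.
    public static IEnumerable<int> CountTo(int limit)
    {
        for (int i = 1; i <= limit; i++)
        {
            yield return i;
        }
    }

    static void Main()
    {
        // MakeCounter has already returned here, yet count is still alive.
        Func<int> next = MakeCounter();
        Console.WriteLine(next());  // 1
        Console.WriteLine(next());  // 2

        foreach (int value in CountTo(3))
        {
            Console.WriteLine(value);  // 1, then 2, then 3
        }
    }
}
```

If count really were in MakeCounter's stack frame, it could not still be incremented after the method had returned.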

A worked example

The above may all sound a bit complicated, but a full example should make things a bit clearer. Here's a short program which does nothing useful, but should demonstrate the points raised above.
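    using System;

    struct PairOfInts
    {
        // Static: a single slot, shared by every PairOfInts value
        static int counter = 0;

        public int a;
        public int b;

        internal PairOfInts(int x, int y)
        {
            a = x;
            b = y;
            counter++;
        }
    }

    class Test
    {
        // Instance variables: part of each Test object
        PairOfInts pair;
        string name;

        Test(PairOfInts p, string s, int x)
        {
            pair = p;     // copies the data of p into this object's pair
            name = s;     // copies only the reference
            pair.a += x;  // changes this object's copy, not the caller's variable
        }

        static void Main()
        {
            // Local variables
            PairOfInts z = new PairOfInts(1, 2);
            Test t1 = new Test(z, "first", 1);
            Test t2 = new Test(z, "second", 2);
            Test t3 = null;
            Test t4 = t1;
            // XXX
        }
    }
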
Let's look at what's where in memory at the line marked with the comment "XXX". (Assume that nothing is being garbage collected.)
  • There's a PairOfInts instance on the stack, corresponding with variable z. Within that instance, a=1 and b=2. (The 8 byte slot needed for z itself might then be represented in memory as 01 00 00 00 02 00 00 00.)
  • There's a Test reference on the stack, corresponding with variable t1. This reference refers to an instance on the heap, which on 32-bit .NET occupies "something like" 20 bytes: 8 bytes of header information (which all heap objects have), 8 bytes for the PairOfInts instance, and 4 bytes for the string reference. (The "something like" is because the specification doesn't say how it has to be organised, or what size the header is, etc.) The value of the pair variable within that instance will have a=2 and b=2 (possibly represented in memory as 02 00 00 00 02 00 00 00), as the constructor added 1 to the initial value 1 of a. The value of the name variable within that instance will be a reference to a string object (which is also on the heap) and which represents the sequence of characters in the word "first". (In Microsoft's implementation the characters are stored within the string object itself rather than in a separate char array, but again the specification doesn't require this.)
  • There's a second Test reference on the stack, corresponding with variable t2. This reference refers to a second instance on the heap, which is very similar to the one described above, but with a reference to a string representing "second" instead of "first", and with a value of pair where a=3 (as 2 has been added to the initial value 1). If PairOfInts were a reference type instead of a value type, there would only be one instance of it throughout the whole program, and just several references to the single instance, but as it is, there are several instances, each with different values inside.
  • There's a third Test reference on the stack, corresponding with variable t3. This reference is null - it doesn't refer to any instance of Test. (There's some ambiguity about whether this counts as a Test reference or not - it doesn't make any difference though, really - I generally think of null as being a reference which doesn't refer to any object, rather than being an absence of a reference in the first place. The Java Language Specification gives quite nice terminology, saying that a reference is either null or a pointer to an object of the appropriate type.)
  • There's a fourth Test reference on the stack, corresponding with variable t4. This reference refers to the same instance as t1 - ie the values of t1 and t4 are the same. Changing the value of one of these variables would not change the value of the other, but changing a value within the object they both refer to using one reference would make that change visible via the other reference. (For instance, if you set t1.name="third"; then examined t4.name, you'd find it referred to "third" as well.)
  • Finally, there's the PairOfInts.counter variable, which is on the heap (as it's static). There's only a single "slot" for the variable, however many (or few) PairOfInts values there are.
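The observable parts of this description can be checked with a variation of the program. This is only a sketch: the fields, the constructor and the counter are made public here (unlike in the original) purely so that they can be printed, and the class Check is invented for the purpose. It can't show where anything is stored, only that the values behave as described.

```csharp
using System;

struct PairOfInts
{
    public static int counter = 0;  // public here only so it can be printed

    public int a;
    public int b;

    internal PairOfInts(int x, int y)
    {
        a = x;
        b = y;
        counter++;
    }
}

class Test
{
    public PairOfInts pair;  // public here only so the checks can read it
    public string name;

    public Test(PairOfInts p, string s, int x)
    {
        pair = p;     // copies the struct's data into this object's own slot
        name = s;
        pair.a += x;  // changes only this object's copy
    }
}

static class Check
{
    static void Main()
    {
        PairOfInts z = new PairOfInts(1, 2);
        Test t1 = new Test(z, "first", 1);
        Test t2 = new Test(z, "second", 2);
        Test t3 = null;
        Test t4 = t1;

        Console.WriteLine(z.a);                             // 1: z itself is untouched
        Console.WriteLine(t1.pair.a);                       // 2
        Console.WriteLine(t2.pair.a);                       // 3
        Console.WriteLine(t3 == null);                      // True
        Console.WriteLine(object.ReferenceEquals(t1, t4));  // True: same object

        t1.name = "third";
        Console.WriteLine(t4.name);                         // third

        // Copying a struct value doesn't run its constructor, so the
        // counter was incremented only by the single "new" expression.
        Console.WriteLine(PairOfInts.counter);              // 1
    }
}
```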
