4. Variables

4.1. What is a variable?

A variable in mathematics represents a possibility of a value. If a range is designated to a variable, the variable may take one of values in the range. The selection of the value must be consistent throughout a proof.

A variable in functional and logical programming languages is similar to that in mathematics. A variable name is bound to a value. Once bound, it never changes. Note that the same name may be bound to a different value in the different environment.

A variable in imperative (procedural) programming languages is a named box to store a value. A variable name is bound to a certain memory area. The value in the memory area may change during execution of the program.

_images/variable-binding.png

Difference of variable binding

In the remaining part of this section, we mainly deal with variables in imperative (procedural) languages.

4.2. Type

Many languages such as Fortran, Pascal, C and Java have a type as a property of a variable. In these languages, a type must be declared for a variable before its use, and the variable can only store values of that type.

On the other hand, languages such as Lisp, JavaScript, Python, PHP and Ruby have no types for variables. A variable can store any value.

We will learn about types in detail next week.

4.3. Left value

A memory area is the essential part of a variable. A value is stored there during program execution.

If a variable name occurs in a program, it is evaluated as follows:
  1. Get an address of a memory area which is bound to the name.

  2. Get a value from the memory area.

If a variable name occurs in the left side of an assignment statement, however, the result of evaluation should be an address to store a new value. Sometimes we call an address as a left value.

4.4. Memory management

Compilers (or interpreters) allocate some memory area for each of the variables. There are several memory management methods as described below.

4.4.1. Static data area

A static variable is a variable that exists across the entire execution of the program. It is usually allocated in the static data area at compile time.

One of the merits of type declaration is that the required amount of memory can be calculated at compile time based on the type declared. If we have no type declaration, the memory area for a variable usually contains a pointer that refers to a heap area (see below) where we can allocate required amount of memory at run time.

4.4.2. Stack

A stack is a first-in-last-out data structure. It is often used for the area to allocate local variables. When a fuction is called, parameters and local variables are allocated at the top of the stack. When that function returns, the allocated area is discarded.

  • Recursive call can be easily implemented.

  • When using dynamic scoping, we can find the valid bindings by searching downwards from the top of the stack. It is easier to implement than static scoping.

  • A variable which occurs in a closure should not be allocated in the stack because the variable will be used when the closure is called afterwards (see extent).

4.4.3. Heap

A heap is a data structure in which we can allocate some amount of memory when required and release it when not necessary any more. The released area will be reused for other purposes.

Some languages require the programmers to write a code for releasing unnecessary area. They often (very often) forget to write the code, resulting a situation where we have no new area to allocate even if the actually used area is small. We call this situation as a memory leak.

In order to avoid the memory leak, some languages provide garbage collection which automatically releases unnecessary area. We can determine a certain area is unnecessary if there is no refrence to the area.

Followings are the typical garbage collection methods:

Reference count

A counter is attached to each memory area. When someone starts to refer the area, increment the counter. When stopping to refer, decrement it. If the counter becomes 0, release that area.

Mark and sweep

First, we put a mark on every area reachable by following references from areas obviously necessary (e.g. an area bound to a variable name). Next, we sweep the entire heap from end to end, and release areas that are not marked.

Copying

We divide the heap into two, and use only half. When there is no available area to allocate, we copy the necessary area to the other half by following references starting from obviously necessary areas. This method achieves compaction as well as garbage collection.

Garbage collection usually requires to stop the execution of the program during it takes place. Since it is inconvenient for real-time systems, there are devised garbage collection methods that can be done in parallel with the execution of the program.

Exercise 7

Suppose we use reference count garbage collection. In some cases, the counter of unnecessary area will not become 0. Give an example.

[Answer]

Exercise 8

Explain why we should stop the execution of the program when using mark-and-sweep garbage collection.

[Answer]

4.5. Extent

An extent is a time period during which a variable holds its value. Note that a scope and an extent are different in general.

In most languages, variables have the following extent:

global variables

during the entire program execution.

local variables

during the function (or subprogram unit) execution.

instance variables

during the instance exists.

Static local variables

In languages such as C and PHP, a local variable declared as static has a scope of that function, but its extent is the entire execution of the program.

function test()
{
    static $a = 0;
    echo $a;
    $a++;
}

Exercise 9

Give an example where a static local variable is effectively used.

[Answer]

Note

When a function is excuted as a closure (we will study it later), the extent of local variables declared in it will be extended. In addition, they must be allocated on the heap instead of the stack.