More on Functions


Last time, we reviewed the basics of functions and their use as a decomposition block in programming. This time, we want to look further at how functions interact with their environment, and how we call functions into life.

Structure of a function

A basic function is a simple container that holds a block of code we want to activate from time to time. As we saw last time, this container can live within a single program file or be split off into a file of its own, often along with other supporting routines needed to make the primary function, well, function!

We define an interface to the function that lets us split the function off and have other teams of programmers build it, with a guarantee that we can hook everything up later. Let’s look more closely at that interface.

The prototype revisited

The prototype is everything the compiler needs to make sure your function is being used properly. The code is not necessary, but everything on the first line of a function definition is needed. If we strip off the code (everything inside the curly brackets, including the brackets) and put a semicolon in its place, we have the function prototype.

Normal (if there is such a thing) functions return a single value as a result of their work. This is the mathematician’s view of a function. We define this return type as part of the prototype. We also provide the function name and a parameter list we will use to pass information to the function. The function normally uses those parameters in its processing when we call it into life.

Two questions crop up right away: What if I need more than one value to be returned, and what if I have a bunch of data to pass to the function?

Parameters in detail

There are actually two kinds of parameters we can pass to a function: Value Parameters and Address Parameters. The simplest of these is the value parameter.

Value parameters

The simple definition of a value parameter is a parameter that receives a copy of some information from the caller’s world. On a call to our function, the caller places the name of a variable it can access in the right spot in the parameter list. The system looks up the current value of that variable and copies it into a place where the function can access it. Because only a value is needed, we can also place a literal value in the parameter list when the parameter is a value parameter. Here is an example:

#include <cmath>
...
float mySin(float angle) {
    // do stuff with angle
    return std::sin(angle);
}
...

float myAngle = 10.0;
float sinX = mySin(myAngle);   // pass the value stored in a variable
float sinY = mySin(10.0);      // pass a literal value directly

Which gives this:

sinX = -0.544021
sinY = -0.544021

In this example, the mySin function is defined with a single formal parameter named angle, which must be a float. In the first call to this function, we pass the current value of the myAngle variable through the interface; in the second, we pass a literal value.

What that means is that when the mySin function wakes up, it finds an initialized local variable that it knows by the name angle that it can play with. The initial value is whatever was in the caller’s myAngle variable at the time of the call.

If you think about it, using this scheme, there is no way for the mySin function to alter the value of the caller’s myAngle variable. To the mathematician, this is a good thing, since altering anything in the caller’s world is called a side-effect, which is something we do not really want - most of the time.

It would be nasty for the caller to discover that their variable had been messed with just because they asked us for the sine of that angle!

How do we set up value parameters? It is easy, because we do not need to do anything special: this is the default kind of parameter for ordinary variables. (Arrays are the exception, as we shall see in a moment!)

Address parameters

If we need to pass a large amount of data to a function, it is inefficient to send down a copy. Instead, we send down information the function can use to locate the caller’s data directly, and let it work with that data! This is not necessarily a good thing, since the function could damage something it only needs to peek at, but we may have no other option.

Formally, what we send to the function is a reference to the caller’s data, but under the hood it is usually implemented as a container holding the memory address where the data lives! Since we let the function know where the caller’s data is in memory, the function can use that information to modify the caller’s data, or just to look at the values.

Setting up this kind of interface is quite easy. Here is our example from above, modified for reference parameters.

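float mySin(float & angle) {
    // do stuff with angle
    return std::sin(angle);
}
...

float myAngle = 10.0;
float sinX = mySin(myAngle);   // OK: myAngle is a variable with an address
// float sinY = mySin(10.0);   // error: a literal cannot bind to float &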

Can you spot the change? It is a single character, &, placed between the data type and the parameter name. With this character in place, we can translate the parameter declaration by reading it backward and saying "is the address of" when we reach the & character.

float & angle

This becomes

angle is the address of a float

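Which float? Whatever the caller provides on an actual call. (Strictly speaking, C++ calls angle a reference: inside the function you use it exactly like an ordinary float, and the compiler works with the address behind the scenes.)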

If we try to compile the original two-call example with this change, we get an error. The first call works fine, since we called mySin with a variable name. The second call fails: we are promising to send down the address of a container the function can use, but that call passes a literal value, which has no address. So that last call will not compile.

Now, if I let my mySin function play badly, I can alter the caller’s world:

float mySin(float & angle) {
    angle = 20.0;             // side effect: overwrites the caller's variable
    return std::sin(angle);
}

Here I am doing two things: returning a value the normal way, and returning a value through the caller’s variable. If you do this kind of thing, it is vital that you document what you are up to; the users of your function will thank you for it!

Here is my output from this example:

myAngle before the call = 10
sinX = 0.912945
myAngle after the call = 20

Passing arrays to functions

We will go over arrays in detail later, but you should be familiar with array basics. Since arrays can involve a large number of data items, the designers of the array data type decided that arrays would always be passed to functions as address parameters. You do not need to specify the & character to set this up.

Remember, when we send down an array to a function, we are only providing the function with the location of the first element of the array, and the data type of each element. The function has no way to figure out how many items are in that array, so we always include another parameter to tell it where to stop:

#include <iostream>
using namespace std;

void displayArray(int data[], int size) {
    for (int i = 0; i < size; i++) {
        cout << i << " " << data[i] << endl;
    }
}

Here the notation on the parameter list simply tells the compiler that we will provide an array in this parameter spot, and that the elements are integers. The second parameter is our statement of how big the array is.

Please note that in telling the function where to stop, we could provide a bad piece of information. The function will happily continue printing out garbage if it is asked to display more data items than are really there.

It is much more serious to fill up an array with data past where the array really ends. This problem is called a buffer overflow, and writing a program that does not protect itself from this kind of silliness invites a hacker to use your program for no good! Many viruses are injected into systems through programs that are badly written this way!

Default arguments (parameters)

It is possible to set up functions where certain parameter values have some predefined default value. If you do not want to change that value, you can leave that parameter off. This feature can make your code difficult to read and understand so you should use this feature carefully. Here is what it looks like:

float mySin(float angle = 10.0);

...

float sinX = mySin();

Since I did not provide an argument here, the default value will be used when the function wakes up. If a function has more than one parameter, the parameters that have defaults must be listed after all of those that do not. When you call such a function, you cannot leave off arguments in the middle of the list. All of this gets confusing quickly, so it is not recommended that you write code this way!

How do functions work?

When we call a function, the system magically stops whatever it was doing (processing your code), puts its finger on the spot where it was, then branches off to the code for the function, taking along the parameters it needs to do the required work. The system stays inside the function until it finishes that work; then either we reach the end of the function’s code (which is only allowed for functions declared void, since they return nothing) or we process a return statement. In either case, we go back to the spot where we put our finger (so to speak) and continue processing as though nothing had happened. If we brought back a return value and we are in the middle of an expression, that value is used as part of the expression evaluation. If we brought back a value but the function was called as a stand-alone statement, we simply throw the value away.

How does all this magic happen?

Stacks

The program maintains a simple data structure called a stack that is much like the stack of trays in a cafeteria. When you place an item on the stack, it covers all the items placed there before; when you want an item off the stack, you must take the one on top. Hmmm, how does this help?

When we call a subroutine, the processor knows the address of the instruction it is currently processing. It also knows the address of the very next instruction, the one it would normally do next. When we call a function, the processor knows the address of the function as well. What happens is simple. It places the address of the next instruction on the top of the stack, then uses the function’s address to figure out what to do next. The result is that we wander off to the code for the function, but we leave the old address behind on the stack. When we get to the end of the function, we simply grab the address we need from the top of the stack, and back we go. This is so simple it is amazing.

We can even call another function from inside the first function, and find our way back from inside both functions properly. Think about it and make sure you can follow this idea.

Using stacks for other purposes

Now, once we are inside a function, what should we do if we need a little space for variables that only the function should know about? The answer is also simple: assign them space on the stack as well. Variables declared inside a function, known as local variables, get space on the stack only as long as the function has control. When the function is ready to quit, those local variable containers are released, and we return to the point where we were called.

It is important to realize that just because you call a function twice, there is no guarantee that a local variable still holds the value you left in it on the first call. Some other function may have used that stack space between your calls, so always give local variables an initial value.

Global Variables

There is another kind of variable that can be used to pass information between functions, but its use is discouraged: global variables.

Here is the complete code for the value parameter example we discussed earlier:

#include <cstdlib>
#include <iostream>
#include <cmath>

using namespace std;

float myAngle = 10.0;

float mySin(float angle) {
      return sin(angle);
}

float sinX = mySin(myAngle);
float sinY = mySin(10.0);

int main(int argc, char *argv[])
{
    cout << "sinX = " << sinX << endl;
    cout << "sinY = " << sinY << endl;

    return EXIT_SUCCESS;
}

Look this over and see where things are defined. Most of the time, we define variables inside functions that need them. These are called local variables which means they are local to the function where they are defined. Other functions cannot see those names, and cannot access them either.

If we define variables outside of any function, we call those variables global, and all functions can see those variables and use them. (Technically, only functions defined below the variable declaration can access them, but you can fix this by moving the global variables to the top of your code, above any function definitions.)

Why use global variables?

Well, the short answer is don’t! Many programmers discourage the use of globals, since code that relies on them is hard to reuse. Consider a function written to get the data it needs from a global variable. If you find this routine handy and want to use it in another program, you just hook your function into that new program and you are done, right? Nope!

You must then modify your new program and provide the required global variable. What happens if that name is in use in that new program already? Well, you can see how the problems keep multiplying. Ultimately, you must provide the code of your function to the programmers in the new project, because they might need to modify your names! Huge problem!

If you write functions that use no global variables, you do not need to provide their source code, just a linkable object file (plus a header containing the prototypes). This can be handy if you want to protect your source code from prying eyes!

Overloading functions

We have mentioned this concept earlier in our study. If you give a function a different signature, that is, a different parameter list, you can define two (or more) functions with the same name. The parameter lists must differ; a different return type alone is not enough for the compiler to tell the functions apart. This is handy when you want to provide the same functionality for different data types, like this:

float mySin(float angle);
double mySin(double angle);

As you might suspect, the code inside these two functions will be very similar. So similar, in fact, that C++ provides a way to generate functions where the only difference is the data type. We will see this feature, called templates, later in the course.