Multiple files, And Compiling Therewith

When working on larger projects, or even well structured smaller ones, it is often advisable to break up the source code into several different files, each of which contains a small section of strongly related code -- whether that is a single function or a few.

In Java, we were mostly forced to do this, because the name of the file had to be consistent with the name of the public class that it contained. In other words, we were forced to implement only one public class per file. But, in C, we aren't forced into any particular code organization -- instead, we need to think about it and be smart. The goals are readability and maintainability, more generally.

Regardless, there are some details we'll need to consider in putting together multiple file projects. The first of these details is, "How do we compile?" To build a system consisting of multiple source files, we simply compile as usual, listing all of the source files. Consider the example below:

  gcc -Wall -Wextra -ansi -pedantic main.c mathlib.c -o addsubt
  

Multiple Files and Header files

Let's consider breaking our example from above into two files: main.c and mathlib.c:

main.c:

  #include <stdio.h>

  int main() {
    float a, b, c;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    c = subt (c,b);
    printf ("Difference is %f, which, incidentally, is the same as \"a\"\n", c);

    return 0;
  }

  

mathlib.c

  float add (float x, float y) {
    return x + y;
  }

  float subt (float x, float y) {
    return x-y;
  }

  

It is no great surprise that the compiler complains, much as it did before:

  main.c: In function 'main':
  main.c:9: warning: implicit declaration of function 'add'
  main.c:12: warning: implicit declaration of function 'subt'
  

But, we are actually much worse off this time. The compiler never encountered the correct types before it finished compiling main.c, so it assumed that each function returns an int. Depending on the host system, the linker might still link the program, even if the two sets of types, "actual" and "implicit", are incompatible. This could be a big problem. Consider this broken output on an iMac. The output is wrong because the float returned by each function is interpreted as an integer.

  Sum is 1075183616.000000
  Difference is 1104151936.000000, which, incidentally, is the same as "a"
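
As a rough illustration of why the numbers look so wrong, the sketch below reads the bits of a float as if they were an int. It assumes a 32-bit int and IEEE 754 floats, and the helper name float_bits_as_int is made up for this example. The garbage produced by a real mismatched call depends on the platform's calling convention, so this does not reproduce the iMac output above. It only shows that the same bits mean something very different under the wrong type.

```c
#include <string.h>

/* Hypothetical helper: copy the raw bits of a float into an int.
   Assumes sizeof(int) == sizeof(float), as on most current systems. */
int float_bits_as_int (float f) {
  int bits = 0;

  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Under those assumptions, float_bits_as_int (13.0f) yields 1095761920, which is nothing like 13.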
  

We could fix this, as we did before, by putting the prototypes at the top of main.c. And, in fact we will. But, instead of putting them there ourselves, we're going to let the preprocessor do it for us. We are going to put them into a header, "mathlib.h". We'll then #include mathlib.h everywhere that uses any function from mathlib.c. This idiom will provide us a general solution to the problem. And, it saves us the risk of typing something wrong.

mathlib.h

  float add (float, float);
  float subt (float, float);
  

main.c

  #include <stdio.h>
  #include "mathlib.h"

  int main() {
    ...
  }
  

In the example above, notice that the system header file, stdio.h, was included using <>-brackets, but mathlib.h was included using ""-quotes. The brackets instruct the preprocessor to look for the file only within the pre-configured location for standard headers. The ""-quotes tell the preprocessor to look in the local directory before looking in the standard location. Since mathlib.h is our file, we have to use the ""-quotes.

We'll always use this idiom when we code. For every library file, we'll have a header file that we include where it is used. Also, one other note. Never, ever, ever #include a .c file, or any other file that contains actual code. This is another huge point penalty: Probably more than 10 points. Header files should only contain declarations -- not actual implementation. And implementation files should be linked together, not concatenated by the preprocessor. Concatenating them generates an unmanageable mess and leaves the system vulnerable to a whole host of build problems and bugs.
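
To see why the header solves the implicit-declaration problem, it can help to picture everything in the order the compiler and linker end up seeing it. The single-file sketch below is only an illustration under that assumption. The function sum_then_difference is a hypothetical stand-in for the body of main(), and in the real project the three pieces live in mathlib.h, main.c, and mathlib.c.

```c
/* What mathlib.h contributes: prototypes only. */
float add (float, float);
float subt (float, float);

/* What main.c contributes. Thanks to the prototypes above, the compiler
   already knows the argument and return types at each call. */
float sum_then_difference (float a, float b) {
  float c;

  c = add (a, b);
  c = subt (c, b);
  return c;
}

/* What mathlib.c contributes. In the real project these definitions are
   compiled separately and joined by the linker, never #included. */
float add (float x, float y) {
  return x + y;
}

float subt (float x, float y) {
  return x - y;
}
```

Calling sum_then_difference (5.5, 7.5) returns 5.5, the value the broken build failed to print.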

Guarding Against Multiple Includes

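In complex builds, it is possible that the same header file will be included more than once while compiling a single .c file, for example, when one header includes another. The problem with this is that the compiler will run into the same definitions more than once. Repeated function prototypes are harmless, but when it sees something like a struct definition a second time, it'll complain about the redefinition.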

To prevent this from happening, we ask the preprocessor to include the file only if it hasn't done so already. To achieve this, we make use of the "if not defined" (#ifndef) and "define" (#define) directives. We put the entire content of the header file within an #ifndef block that excludes the content in the event that a particular preprocessor macro is defined, and we define that macro as the first thing inside the block. In this way, the preprocessor only processes the content of each header file once per .c file being compiled. The first time through, it defines the macro, which is, in effect, an annotation that flags it not to include the content of the file again.

Notice the name of the macro: MATHLIB_H. This is the convention: Capitalize the name of the file and use an _underscore in place of the .dot.

mathlib.h

  #ifndef MATHLIB_H
  #define MATHLIB_H

  float add (float, float);
  float subt (float, float);

  #endif /* No code below here */
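
The sketch below simulates a double #include by pasting the same guarded header twice into one file. The header name point.h, the struct, and point_sum are all hypothetical and chosen only for illustration. A struct is used because defining the same struct twice is an error, so the example would not compile without the guard.

```c
/* First "inclusion" of the hypothetical point.h */
#ifndef POINT_H
#define POINT_H

struct point {
  float x;
  float y;
};

float point_sum (struct point p);

#endif

/* Second "inclusion": POINT_H is already defined, so the preprocessor
   skips everything up to the #endif and the struct is not redefined. */
#ifndef POINT_H
#define POINT_H

struct point {
  float x;
  float y;
};

float point_sum (struct point p);

#endif

float point_sum (struct point p) {
  return p.x + p.y;
}
```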
  

Multiple Files, and Global Variables

Okay, now let's consider using global variables across multiple files. Consider the example below:

mathlib.c

  float lastresult;  
  
  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }

  

mathlib.h

  #ifndef MATHLIB_H
  #define MATHLIB_H

  float add(float, float);
  float subt(float, float);
  float getlastresult();
  
  #endif 
  

main.c

  #include <stdio.h>
  #include "mathlib.h"

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", d);

    return 0;
  }

  

Now, if global variables are truly global, we should be able to use the one defined within mathlib.c from within main.c, right? Yep. We can. But, we've got a problem if we try. Consider the following main.c. Notice that it tries to use the "lastresult" defined within mathlib.c:

  #include <stdio.h>
  #include "mathlib.h"

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    printf ("The last result was: %f \n", lastresult);

    return 0;
  }

  

The compiler gets angry with us:

  main.c: In function 'main':
  main.c:14: error: 'lastresult' undeclared (first use in this function)
  main.c:14: error: (Each undeclared identifier is reported only once
  main.c:14: error: for each function it appears in.)
  

What's the problem? If it is global, why does the compiler complain that it is undeclared? The problem is that the compiler compiles one file at a time. And, in processing main.c, it can't see the global variables defined within mathlib.c. It doesn't even know that they are there. And, since the individual files are separate until the linking step, the order of compilation doesn't matter -- nothing is remembered from one file to the next.

The fix for this is to tell the compiler that the global variable exists and is in another file that the linker will find later. We do this with the "extern" keyword:

  #include <stdio.h>
  #include "mathlib.h"

  extern float lastresult;

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }
  

But, as with the other declarations associated with mathlib.c, we don't really want to have to do this each time we use mathlib. So, we include the extern in our header file, so it gets included everywhere that we are using mathlib, along with everything else:

  #ifndef MATHLIB_H
  #define MATHLIB_H
  extern float lastresult;

  float add(float, float);
  float subt(float, float);
  float getlastresult();

  #endif
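
The difference between the extern declaration and the real definition can be sketched in a single file. This is an illustration only: sum_and_peek is a hypothetical stand-in for code in main.c, and the definitions at the bottom stand in for mathlib.c. The extern line allocates nothing. It only promises that a float named lastresult is defined somewhere.

```c
/* What the header provides: declarations, no storage. */
extern float lastresult;
float add (float, float);

/* Stand-in for main.c: uses lastresult knowing only its declaration. */
float sum_and_peek (float a, float b) {
  add (a, b);
  return lastresult;
}

/* Stand-in for mathlib.c: the one real definition, which allocates storage. */
float lastresult;

float add (float x, float y) {
  lastresult = x + y;
  return lastresult;
}
```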

  

Now For Some Weirdness

What we are going to do below is wrong. Don't do this. Do what we did above. We just do this to learn about linking. We don't do this to learn how to code nicely.

What if, instead of extern'ing "lastresult", either in main.c or within mathlib.h, we just redefine (note: redefine, not extern) it within main.c:

  #include <stdio.h>
  #include "mathlib.h"

  float lastresult; /* this is a repeat of the definition within mathlib.c */

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }

  

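As it turns out, this works, at least with toolchains that treat uninitialized globals as "common" symbols, which was long the default behavior on Unix systems. Upon linking, the linker notices two identical definitions of the global variable and smashes them down to one. (GCC 10 and later default to -fno-common, which turns this into a multiple-definition error unless -fcommon is given.)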

But, just for a bit more fun, let's modify our mathlib.c so that it initializes "lastresult". Please note, we're leaving the definition within main.c, above, exactly as it was -- present, but without the initialization.

  float lastresult = -1; /* Notice this initialization */

  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }

  

Okay. So, what's the big deal? It still works -- did we really expect anything different? Well, maybe not, but let's make one more change. Let's leave that initialization in mathlib.c, just as we see it above, and also mimic it exactly in main.c as below:

  #include <stdio.h>
  #include "mathlib.h"

  float lastresult = -1; /* Notice, now initialized to -1 in both places */

  int main() {
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }
  

So, what does our friend, the compiler, think? Well, it looks like the linker has some issues.

  /usr/bin/ld: multiple definitions of symbol _lastresult
  /var/tmp//ccaJeNXt.o definition of _lastresult in section (__DATA,__data)
  /var/tmp//ccoBUcZ0.o definition of _lastresult in section (__DATA,__data)
  

Weak Symbols vs Strong Symbols

Symbols are the names associated with functions, variables, &c. The names are associated with the actual objects via a data structure maintained by the compiler known as the symbol table. The symbol table keeps the symbol, the location of the object in memory, and its size. It also notes whether the symbol is strong or weak.

Unless special non-portable compiler black magic is performed, it works like this:

When the linker encounters an apparent redefinition of multiple weak symbols, it can take any of them. In the context of uninitialized global variables, this makes sense -- they are all equivalent.

When the linker encounters a strong symbol and one or more weak definitions of the same symbol -- it takes the strong symbol. The reason for this is easily illustrated with global variables. If we have one that is uninitalized and one that is initialized, it makes sense to keep the initialization.

If the linker encounters two or more strong definitions, it fails. The reason for this is that the two different initializations might be conflicting. It can't pick either one without discarding the initialization written in the other .c file.

It is important to note that the linker can't see the actual initialization. It is looking only at the symbol table. So, it doesn't know if the actual initial values are conflicting or not. So, since it doesn't know that they are the same, it does the only safe thing -- it breaks. And, this makes the most sense, really. It would make no sense if changing the initialized value of a global variable could enable or break a compile. And that could happen if the linker did actually allow two strong definitions to be collapsed into one in the special case that they happened to have the same initial value.

This is why our multiple definition of "lastresult" failed when both were initialized, even though it was to the same value -- and didn't when exactly one of the two was initialized or neither was initialized. When they were both initialized, even though it was to the same value, they were both strong symbols and therefore in conflict at link time.
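
The C language itself has a close analogue of these rules inside a single file, which can make them easier to remember. The sketch below is an analogy only, not the linker mechanism itself. A file-scope declaration without an initializer is a "tentative definition", and the compiler accepts several of them alongside at most one initialized definition.

```c
float lastresult;        /* tentative definition ("weak"-like) */
float lastresult;        /* a second tentative definition is accepted */
float lastresult = -1;   /* the single initialized ("strong"-like) definition wins */

/* Adding a second initialized definition here, such as
   float lastresult = -1;
   would be rejected as a redefinition, even with the same value. */

float getlastresult (void) {
  return lastresult;
}
```

Here getlastresult () returns -1, the value from the one initialized definition.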

"static" Global Variables

The qualifier "static" is one of the most, if not the most, overloaded reserved words in C. Its apparent meaning changes depending on the context in which it is used. The first use of "static" that we'll examine is its use as a qualifier for a global variable.

When "static" is used as a qualifier for a global variable, it means that the "static" global variable can only be used within the file in which it is declared.

Let's go back to our lastresult/getlastresult() example from above. Let's say that we want to force users of our library to call getlastresult() instead of accessing "lastresult" directly. We might want to make "lastresult" off-limits, for example, to make sure that a caller doesn't accidentally assign it a value and, thereby, break our library's state.

We can achieve this by declaring "lastresult" as "static" within mathlib.c, as seen in the example below:

main.c

  #include <stdio.h>
  #include "mathlib.h"

  extern float lastresult;

  int main() {
 
    extern float lastresult;
  
    float a, b, c, d;

    a = 5.5;
    b = 7.5;

    c = add (a,b);
    printf ("Sum is %f\n", c);

    d = getlastresult();
    printf ("The last result was: %f \n", lastresult);

    return 0;
  }
  

mathlib.c

  static float lastresult;  
  
  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }
  

mathlib.h

  #ifndef MATHLIB_H
  #define MATHLIB_H

  float add(float, float);
  float subt(float, float);
  float getlastresult();
  
  #endif 
  

When we attempt to compile this, we see that main.c compiles as it did before -- the extern enables it to compile, even without the "lastresult" object. But, the linker does not try to use the "static" "lastresult" from mathlib.c to satisfy the need for a "lastresult" in main.c. And, since no other object is available, the link fails, reporting "lastresult" as an undefined symbol.

"static" Functions

"static" functions are very similar to "static" global variables. By placing the "static" qualifier before a function's definition, we can lock it down so that it can only be called by other functions within the same file. In some respects, this is analogous to a "private" method in Java, which can only be called by other methods within the same class.

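A minimal sketch of a "static" function follows. The names square and sumofsquares are hypothetical and not part of the mathlib example. Only sumofsquares would get a prototype in the header. Because square is "static", code in other files cannot call it, and another file is free to define its own unrelated square.

```c
/* Private helper: visible only within this file. */
static float square (float x) {
  return x * x;
}

/* Public function: callable from other files via a prototype in the header. */
float sumofsquares (float x, float y) {
  return square (x) + square (y);
}
```

For example, sumofsquares (3.0, 4.0) returns 25.0.
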
"static" Local Variables

As discussed earlier, local variables are normally "automatic" variables. Their normal lifetime is the duration of a function call. Each time a function is called, it gets brand-new local variables that are cleaned up upon the function's return.

But, if we use the "static" qualifier before a local variable, the local variable is not an "automatic" variable. It is not allocated on the runtime stack. Instead, it is allocated in the same space as the program's global variables. It isn't a global variable -- its scope is still limited to the function. But, since it isn't created and destroyed with each function call, it persists across function calls.

These are considered "static" variables, because like "static" global variables, they are not allocated on the stack and have a lifetime spanning the program's execution. And, like static global variables, they are limited in where they can be used. But, it is probably easiest to think of "static local variables" and "static global variables" as two different things.

Consider the example of static local variables below. Look specifically at count(), which simply returns a monotonically increasing number with each call. Notice that the initialization, which is no more required than it is for any other variable, happens only once -- when the program, itself, is being loaded.

Notice that, when run, the value returned by count() starts out at zero and increases by one each time the function is called: 0, 1, 2, 3, 4, 5, 6, ... Unlike an "automatic" local variable, the kind that you get without the "static" qualifier, the variable persists across calls and maintains its value.

mathlib.c

  static float lastresult;  

  int count() {
    static int currentcount=0;

    return currentcount++;
  }
  
  float add (float x, float y) {
    lastresult = x + y;
    return lastresult;
  }

  float subt (float x, float y) {
    lastresult = x-y;
    return lastresult;
  }

  float getlastresult() {
    return lastresult;
  }
  

main.c

  #include <stdio.h>
  #include "mathlib.h"


  int main() {

    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());
    printf ("%d\n", count());

    return 0;
  }

  

mathlib.h

  #ifndef MATHLIB_H
  #define MATHLIB_H

  int count();
  float add (float, float);
  float subt (float, float);
  float getlastresult();

  #endif /* No code below here */
  

Above Program's Output:

  0
  1
  2
  3
  4
  5
  6
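
For contrast, the sketch below puts a "static" counter next to an otherwise identical automatic one. The names count_static and count_auto are made up for this comparison. The automatic version is re-initialized on every call, so it never gets past zero.

```c
/* "static" local: initialized once, keeps its value between calls. */
int count_static (void) {
  static int currentcount = 0;

  return currentcount++;
}

/* Automatic local: re-created and re-initialized on every call. */
int count_auto (void) {
  int currentcount = 0;

  return currentcount++;
}
```

Three calls to count_static () return 0, 1, and 2, while three calls to count_auto () return 0 each time.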