November 8, 2007 (Lecture 13)

Memory Errors In C

By now, you guys have probably realized that the most insidious errors in C programs are very often memory-related problems. These problems are nasty because they stem from the language and environment -- not from the problem that one is trying to solve.
We see memory errors in a lot of different ways, a few of which are listed below:

Allocating too little space, as might happen if a string length is computed incorrectly, an input exceeds the expected and allocated size, or sizeof() is applied to the wrong type

Walking past the end of an array (really a type of the above)
Freeing memory, then continuing to use it. This can happen, for example, if an object is freed upon detecting an error, but a caller retries the operation, or if both the caller and the callee free the same object.

"Leaking memory" by reassigning a pointer, without first freeing the object, or using malloc for a local purpose within a function, but not freeing it within that function.
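To make a few of these concrete, here is a minimal sketch of the habits that avoid them. The function names (dup_string, make_int_array, release) are made up for illustration and are not from any library; treat this as one reasonable way to write it, not the only one.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a string onto the heap. strlen() does not count the terminating
 * '\0', so the allocation needs one extra byte. Returns NULL on failure.
 * The caller owns the result and must free it. */
char *dup_string(const char *src)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);       /* +1 for the '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);         /* copies the '\0' too */
    return copy;
}

/* Allocate a zeroed array of n ints. Writing sizeof *arr, rather than
 * naming a type by hand, keeps the element size tied to the pointer. */
int *make_int_array(size_t n)
{
    int *arr = calloc(n, sizeof *arr);
    return arr;                         /* may be NULL; caller checks */
}

/* Free an object and clear the caller's pointer, so a retry or a second
 * release sees NULL instead of a stale address. free(NULL) is a no-op. */
void release(char **p)
{
    free(*p);
    *p = NULL;
}
```

Calling release(&s) twice in a row is harmless here, because the second call just frees NULL. That does not help if some other copy of the pointer is still floating around, so it is a habit, not a guarantee.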

So, it is pretty clear that if we use memory that is not properly allocated, one of three things can happen:

Nothing, we wrote onto unused space. This might, for example, happen if our request was rounded up to some standard size by malloc, so there is some extra at the end of the array, anyway. This type of error might seem innocuous enough, but it can be trouble. After a different feature is exercised, a recompile with optimization, a port to a different system, a new version of tools, or anything else -- the error can suddenly morph into a different, more potent form.

We damage something. We scribble on top of something important. Later on our program relies on this damaged value and either crashes or generates incorrect results, or both. This situation is nasty, because the appearance of the errant behavior and the execution of the broken code are separated in time. This can make debugging challenging.

A segmentation fault or bus error. The pointer is bogus and doesn't point to allocated space. As a result, the hardware catches us and the OS strikes us dead.

But, what if we "leak" memory? Well, the textbook answer is that eventually the system will run out of memory and either malloc() will fail or the program will be killed by the OS for exceeding some resource limit. And, this can surely happen.
But, these days most VM systems are backed by not only a large amount of RAM -- but a truly huge amount of disk. Well before malloc() fails or the system kills off a process, things are likely to slow down, perhaps by orders of magnitude, due to paging.
You'll learn about paging in 15-213 and, in depth, in OS. But, to make a long story short, when a computer doesn't have enough memory, it plays a shell game and temporarily frees some memory by writing pages of memory off to disk. Then, should they be needed in the future, they can be read back in -- perhaps after writing out other pages to make room. This shell game dramatically hampers system performance because the disk, which is being used in place of RAM, is much, much, much, much slower.
Those of you who were in class got to hear a story about my master's project. For expediency one morning, I used a malloc in place of a static allocation. I knew I should remove it, but never got around to figuring out how big a buffer I needed. And, I never freed it, because, well, it was there only temporarily, anyway.
Well, I forgot about it and the software rolled out to our project's sponsor. And, with large enough inputs, my software became slow. I optimized the code. I added caching. I restructured large portions of the code. I tried to improve the algorithm.
Months after my graduation, my advisor took a look at it. Puzzled by the behavior, he started using some tools to analyze the situation. And, among those tools, he used strace -- which traces system calls. He found that in just a few seconds of execution, brk() was called some 20,000 times. You'll recall that brk() is the system call that malloc() uses when it runs out of memory to request more from the OS.
He replaced my sloppy malloc() call with a proper static allocation -- and the problem was gone. It would have been similarly fixed if he had simply freed the allocated space at the end of the work loop. But, in truth, malloc() should only be used when a static allocation won't do. Static allocations are "born" with the program. But malloc() is dynamic and costs time during execution. And, in my case, it didn't make sense to free something at the bottom of a loop only to reallocate it again moments later at the top.
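The original code from that project isn't reproduced here, so what follows is only a sketch of the shape of the fix. The names process_items and WORK_BUF_SIZE, and the 4096-byte bound, are assumptions made up for the example. The idea is that the scratch buffer is allocated once, statically, instead of being malloc'd on every trip through the work loop.

```c
#include <stddef.h>
#include <string.h>

#define WORK_BUF_SIZE 4096  /* assumed upper bound on the size of one item */

/* Run every item through a single statically allocated scratch buffer.
 * Returns the number of items processed, or -1 if an item won't fit. */
long process_items(const char *const items[], size_t count)
{
    static char scratch[WORK_BUF_SIZE]; /* "born" with the program */
    long done = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        if (len >= sizeof scratch)
            return -1;      /* refuse, rather than walk past the end */
        memcpy(scratch, items[i], len + 1);
        /* ... the real work on scratch would go here ... */
        done++;
    }
    return done;
}
```

Nothing in the loop calls malloc() or free(), so there is nothing to leak and nothing to send malloc() back to the OS for more memory. The price is that you have to know a bound on the buffer size up front, which is exactly the homework I skipped.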

Valgrind

Valgrind is a tremendous tool for finding memory problems in C programs. For those who might be familiar, it is similar to IBM's Rational Purify tool. Regardless, it can help you to find tons of different problems, and, of particular concern to us:

Memory leaks
The use of pointers to unallocated memory
Walking past the bounds of dynamically allocated arrays and other objects.
The use of uninitialized variables

It is a dynamic, or runtime, analysis tool. This means that it analyzes your code while it is actually running. Basically, when you run a program under valgrind, it injects its code into your program at runtime (or vice versa, really), so that it is able to trace your code.
But, like all runtime analysis tools, it checks only the code that actually runs -- not all paths. So, in any execution, it won't find problems, for example, in error handlers that don't happen to be exercised or in features that aren't invoked.
This is different from, for example, splint, which is a static tool. It analyzes the source code, rather than the execution. But, as it turns out, unless you program using a very formal and restricted style, runtime tools generally provide a better analysis.
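To see why a runtime tool only catches what actually runs, consider this small sketch. The function record_length is made up for illustration, and it contains a deliberate leak on its error path. That leak is the whole point of the example, not something to copy.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the length of a record, or -1 for an empty record.
 * DELIBERATE BUG, for illustration only: the empty-record path
 * returns without freeing the working copy. */
int record_length(const char *record)
{
    size_t len = strlen(record);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, record, len + 1);

    if (len == 0)
        return -1;          /* leak: copy is never freed on this path */

    int result = (int)len;
    free(copy);             /* the normal path cleans up properly */
    return result;
}
```

If a test driver only ever passes non-empty records, a run such as valgrind --leak-check=full ./prog (where prog is whatever you named the binary) should have nothing to complain about, because the leaky branch never executes. The leak only shows up in a run where some caller actually passes an empty record.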
In class we took a look at an excellent tutorial from the kind folks at cprogramming.com. I refer you there for a primer on valgrind:

Valgrind Tutorial