Smallpearl - Typesafe Logging

Problem

One of the common sources of runtime errors in C++ comes from using the good old vararg function types -- printf() or its variants. The problem stems from type mismatch between the type specifier in the format string and the actual arguments supplied to match this format string. Though most of the present-day software do not employ the printf function for user interaction, a variant of it is quite heavily used -- sprintf() for outputting log messages.

Often log messages are written in response to an error condition and formal testing does not always simulate all the possible error conditions which means some code goes untested. Also, log messages typically have an associated logging level which is usually set to a medium value by default. Testers often do not set the logging level to the lowest possible level before embarking on the testing process meaning that messages at the lowest level are also not exercised during testing. Another real-world situation is when the released code is maintained by support engineers who adds one or more arguments to the log statement for their own debugging purposes, but fails to test it thoroughly.

This post discusses a simple technique of adapting the superior C++ streams infrastructure as an extended logging framework. The mechanism is built such that it can sit on top of an existing legacy logging framework so that introducing it would not require rampant system-wide code changes.

As the reader might be aware, C++ puts lot of emphasis on writing type-safe code and C++ streams provides just this. (Check out Stroustrup's tips on how to avoid unsafe code here and one of his tips is to avoid using vararg functions).

Design Goals and Constraints

To reiterate, this is a solution which has to fit into an existing logging framework. The framework can be thought of the providing the following features, which is typical of many legacy logging infrastructure.

Multilevel logging with a global system wide logging level
Each message has an attached logging level and is only output if the global logging level is equal or lower than this level.
Provides printf() like function to concatenate multiple argument values into a log statement.

Given this, the design goals and various constraints can be summarized as below:

Goals

Provide a type-safe logging framework by leveraging on the C++ streams framework
Maintain backward compatibility by retaining the old logging infrastructure which means that the log output should be the same for both types of logging.
All log messages should get output to the destination as soon as the statement is executed before the next line is is executed.
Maintain logging levels as in the legacy framework and integrate it as an implicit feature.

Constraints

Can only utilize on the STL libraries which is available on most platforms and compilers
Don't use C++11 language features as certain compilers do not support it yet.

Solution

To anyone familiar with the STL library and its streams facility, the excellent stringstream class can be applied effectively to compose and output the log message. This allows values of multiple variables/objects to be captured as a string through a single statement. A typical logging code using this (one without any explicit framework) could look like:

std::stringstream ss;
ss << "Error creating connection to the given address, error code: " 
    << ::GetLastError() << endl;
g_logger->log(LOG_LEVEL_ERROR, ss.str());

Where g_logger is a global instance of a class that provides the legacy logging framework through its vararg methods. While this would definitely work, it has couple of shortcomings:

Every log statement requires three lines of code. First one to declare a local variable, second to compose the log message and third to write it to the log destination.
There's a necessity to declare a local variable which means that two log statements in the same block of code have to be using two separate locals. Alternatively, each log statement has to be in its own block.

An alternative implementation could be to declare a globally accessible instance of a specialized STL stream (which could implement its own stream buffer so as to redirect output to the legacy module-wide logger instance), say g_ls, to which data can be written as below:

using namesapce std;
g_ls << "Error creating connection to the given address, error code: "
    << ::GetLastError() << endl;
g_ls << flush;

This has the drawback that the programmer needs to call std::flush() after every logging statement so as to achieve the stated goal of immediate output to the logger destination before the execution of the next line of code. Not to mention the additional work of specializing the STL stream with its own stream buffer. And as in the previous example, we've to rely on the programmer being disciplined enough to make a call to flush() repetitiously after composing every log message.

Both the possible approaches presented above have the major drawback that it requires conscious and disciplined action from the user to make the logging work as per the goals set forth earlier. And this involves repetitious coding practice that is best avoided if robust and resilient software is the desired result.

What we need is a mechanism which upon entering a statement provides a blank slate for users to accumulate their messages in and upon exit dumps the content to the log destination. Which means we need an entry and exit trigger where we can plug in the necessary code to interface it with the legacy logging framework. Naturally, the object constructor and destructor comes to mind. Upon construction we get an object which acts as a blank slate where different messges can be stored and the destructor can then dump the slate contents to the logging medium. This can be implemented as:

class AutoLogger {
    int loglevel_;
    std::stringstream ss_;

    AutoLogger(); // hide default constructor!
    AutoLogger(const AutoLogger&); // a good practice
public:
    AutoLogger(explicit int level) : loglevel_(level) {}
    ~AutoLogger() {
        g_logger->log(level_, ss_.str());
    }
    std::stringstream& getStream() { return ss_; }
};

Thereafter we can employ it as below:

AutoLogger al1(LOG_LEVEL_ERROR);
al1.getStream() 
    << "Error creating connection to the given address, error code: "
    << ::GetLastError() << endl;

The above code meets most of our goals set forth at the beginning, except the third one that log messages should be written as soon as the statement that composed it is executed. In the above code, the buffered message would only be written when the variable goes out of scope, which is when its destructor would get called. That means code like this fails to meet the requirement refreshed above:

AutoLogger al1(LOG_LEVEL_ERROR);
al1.getStream() 
    << "Error creating connection to the given address, error code: "
    << ::GetLastError() << endl;
// more code...
al1.getStream() 
    << "Error retrieving the secondary server address"
    << ::GetLastError() << endl;

So we need a mechanism to prevent users from instantiating AutoLogger locally and this can be easily accomplished by hiding its constructor. The amended code would like like:

class AutoLogger {
    int loglevel_;
    std::stringstream ss_;

    AutoLogger(); // hide default constructor!
    AutoLogger(const AutoLogger&); // a good practice
    AutoLogger(explicit int level) : loglevel_(level) {}
public:
    ~AutoLogger() {
        g_logger->log(level_, ss_.str());
    }
    std::stringstream& getStream() { return ss_; }
};

This however now presents another problem, if constructor is hidden how is the user going to be able to instantiate an object and use it? Well, we provide a friend function to do just this. Being a friend, the function can instantiate that class and return it. So the code becomes:

class AutoLogger {
    int loglevel_;
    std::stringstream ss_;

    AutoLogger(); // hide default constructor!
    AutoLogger(const AutoLogger&); // a good practice
    AutoLogger(explicit int level) : loglevel_(level) {}
    friend AutoLogger getAutoLogger(int level);
public:
    ~AutoLogger() {
        g_logger->log(level_, ss_.str());
    }
    std::stringstream& getStream() { return ss_; }
};
AutoLogger getAutoLogger(int level) {
    return AutoLogger(level);
}

Now to use the framework, user will have to write code like this:

getAutoLogger(LOG_LEVEL_ERROR).getStream()
    << "Error creating connection to the given address, error code: "
    << ::GetLastError() << endl;

[This truly meets all the requirements that we set forth at the beginning of our discussion. You get increased typesafety and consequently no more runtime errors, log messages are written at the completion of every statement and logging levels have to be attached to every log message.]

Note that the key trick here is hiding the constructor of the AutoLogger class. This in no way restricts its usage, only its instantiation is constrained. In fact this is a common technique adopted by the class factory pattern to force users to use the factory to instantiate classes.

Additional Benefits

Besides the runtime safety benefits extolled earlier, STL streams based logging provides yet another capability that can significantly improve programming productivity and code quality. Since we're using STL streams as the interface for logging, any C++ class can be easily extended to support dumping its state to the log medium by overriding the output operator '\<\<'.

The significance of this becomes relevant when we observe how various C++ classes in a software are typically developed. Usually they are written individually, often using a console program to perform unit testing. Thereafter, when a minimum amount of class facade is written and tested, the component is integrated with the larger project where the necessary client code is written. It's beneficial for such component classes to include a method to dump their state to the logger. If the streams based logger is employed throughout, the overridden streams output ('\<\<') operator can be used to dump the object's state in both the unit-test console program (through cout or cerr) as well as the larger project (presumably through a class scuh as AutolLogger above).

As an example let's consider the following Employee class:

// employee.hpp
class Employee {
    // data
    std::string firstname_;
    std::string lastname_;
    // needs to be declared as friend so that the function
    // can directly access the class private data members
    friend std::basic_ostream<char>& operator<<(
        std::basic_ostream<char>&, const Employee&);
public:
    Employee(std::string const& f, std::string const& l)
        : firstname_(f), lastname_(l)
    {}
};
// implementation
std::basic_ostream<char>& operator<<(std::basic_ostream<char>& os,
        const Employee& emp)
{
    os << "Employee: " 
        << emp.lastname_ << ", " << emp.firstname_;
    return os;
}

With this in place, now we can unit test programs as:

#include <iostream>
#include "employee.hpp"
int main(void)
{
    Employee e("John", "Smith");
    std::cout << e;
}

And in the production code (with the above logging framework integrated):

Employee e = getAuthenticatedUser();
getAutoLogger(LOG_LEVEL_INFO).getStream() 
<< "Initializing document rights for user: "
    << e << std::endl;

Conclusion

The above example illustrates one very important aspect of the C++ language. With only the STL and clever usage of the auto object scope rules, one can develop pretty sophisticated mechanisms that can yield very robust and high quality software.

The other important aspect illustrated by this solution is the variable scope rules and how it can be effectively used to completely eliminate the possibility of memory leaks. Local scoped objects and the auto destruction semantics is one of the most powerful features of the language and effective employment of this can eliminate the need for explicit memory management in the code.