As a C++ developer, you’ve mastered a high-performance programming language used to create applications in the world’s most exciting fields—from data mining and big data to self-driving cars and robotics to gaming and video. By this point you may have tackled topics like multi-threading and parallel programming. But have you ever taken a look behind the scenes to find out what happens during compilation?
The topic is worth learning about, and this article contains some of the most important details that you’ll want to know. The inner workings of the compiler can provide deep insights and improve your programming skills by helping you avoid common errors.
Steps in the C++ compilation process
Let’s cast some light on the black box of compilation by explaining in simple terms what a C++ compiler does. Being a high-level programming language, C++ makes coding easier for programmers; the nuts-and-bolts nature of low-level machine language makes it hard to write useful programs of sufficient complexity for the modern era of computing. The compiler bridges the gap between high-level C++ and machine language by converting your C++ source code into a binary file that computers can execute. The compilation process is fairly complex and can be divided into three steps:
Preprocessing
Before the actual compilation, preprocessor directives instruct the compiler to prepare the source code by temporarily expanding it. In C++, preprocessor directives begin with a # (hash) symbol; examples are #include, #define, and #if. In the preprocessor stage, the compiler works with one C++ source file at a time. For #define directives, it replaces each use of a macro with the macro’s definition, while for #if, #ifdef, and #ifndef directives, it selects which parts of the text to keep. For #include directives, it replaces the directive with the contents of the corresponding file, which usually consists of declarations. Header files pulled in with #include can add a lot of lines to the code; the more header files you include, the longer the preprocessed output becomes. As a result, the preprocessed file is usually much bigger than your original C++ source code.
From the aforementioned replacements and extensions, a unified output is produced by the preprocessor. The preprocessor also inserts markers into the code to tell the compiler where each line comes from, in order to generate error messages that can be helpful for your C++ code development during the debugging process.
Compilation & assembly
In the next stage, which consists of two steps, the compiler creates an object file from the preprocessor’s output.
First, the compiler converts the pure C++ code, now stripped of preprocessor directives, into low-level assembly code. In this step, the compiler parses the source code and reports syntax errors, overload resolution errors, and any other compile-time errors. If a name is declared but not defined in the source file, the compiler can still produce an object file, since an object file may refer to symbols that are defined elsewhere.
Second, the assembler translates the assembly code from the previous step into machine code and stores it in an object file. The build can be stopped at this point, which is useful if you wish to compile each source file separately. Object files from this process can be placed in archives called static libraries for later use, and you don’t have to recompile all your source files if you change only one of them.
Linking
The linker creates the final output from the object files generated in the previous stage. While combining them, it replaces all references to undefined symbols with their correct addresses. Without linking, you would not have a working program: like an index to a book with no page numbers, it would be of little use. The linker’s final task is to create either a dynamic library or an executable file.
Linking may also generate errors, usually related to duplicate or missing definitions. This isn’t limited to definitions that you failed to write; a definition can also be missing when you forget to include a reference to a library or object file in which the linker can find that definition. Duplicate definition errors, in contrast, occur when two libraries or object files contain definitions for the same symbol.
Why understanding the compilation process is useful
With your new knowledge of the individual stages of compilation, you can better understand compiler or linker errors and avoid potential bugs in your code related to compilation. For example, if you understand preprocessing, you can make good use of header guards: code snippets used to protect the contents of the header file from multiple inclusions.
Knowing how C++ compilation works can help you look at the whole process differently and can give you more insight into processes you might otherwise take for granted in C++ development.
How to use a C++ compiler
The basic steps for building and running a C++ program are as follows:
- Create a syntactically correct C++ source file with the help of an editor or programming environment (IDE).
- Run the compiler to produce an executable file.
- Execute the resultant file.
Compilers’ features vary widely, even between versions of the same compiler, as do their options for code generation, debugging, floating-point behavior, library handling, and more.
Overview of C++ compilers
Now that you’re ready to compile your C++ program, which C++ compiler should you use?
In general, one can group compilers by their licensing (free vs. paid), by how they are used (locally installed vs. accessed online), or by operating system (Windows, macOS, Linux).
Here are a few suggestions:
- If you are running Linux, the GNU Compiler Collection (GCC) is a popular choice. It’s free, of course, and typically available in your Linux distribution’s package repositories.
- On macOS, Clang is the default choice, installed with the Xcode command-line tools. Using Clang is free.
- The Cygwin project provides a collection of Linux tools, including GCC, for the Windows operating system. You can use Cygwin to run GCC or Clang, but take note that programs built this way depend on the Cygwin runtime library (cygwin1.dll) to run.
- Another option for Windows is MinGW, which doesn’t require Cygwin and produces executables that run natively on Windows.
Some IDEs include a compiler along with a code editor, such as Xcode on macOS and Visual Studio on Windows. There are many specialized compilers like Intel’s C++ compiler that provide special features for niche uses. For example, Intel’s compiler makes better use of the multi-core architecture in Intel processors and produces code that runs faster on Intel hardware. Such specialized compilers, however, often require the user to purchase an expensive license in order to use them.
Bjarne Stroustrup, the creator of C++, offers an incomplete list of C++ compilers on his website.
If you find yourself considering a compiler that’s not very popular, take standards compliance seriously. Avoid compilers that do not comply with ISO standards or that do not provide a solid implementation of the standard library—an extensive library C++ comes with. A library file, in turn, is a collection of precompiled code that has been “packaged” for reuse in other programs.
Some compilers are embedded in the frameworks of software development tools (IDEs) along with libraries. These frameworks can be useful, but it can be difficult to switch away from them if you ever decide to replace your tooling.
Online C++ compilers
Online compilers are useful tools for quickly compiling code without having to install a full compiler on your computer. They make it easy for a developer to play with the latest language features, to share code snippets online, to do collaborative live editing, and to test out various compilers. Beyond compilation in the strict sense, most online compilers also execute the compiled program and display its output.
Just like offline compilers, the features and C++ standard version support offered by online compilers vary widely, from using flags to parameterize the compilation to handling standard inputs to passing in command-line and runtime parameters.
A few popular online C++ compilers:
Check out this list of other online C++ compilers organized by features.
In this article, we walked through the stages of C++ compilation to understand the process in more detail. In learning how to use a C++ compiler and through this article’s overview of various C++ compilers, you got a glimpse behind the compilation curtain and gained some hopefully helpful insights.
Want to learn more about the C++ compilation process? Check out the articles from Toptal and Freelancer that include examples of how the compiler works with different parts of the program.