
Our Guide to C++ String Length

Calculating the length of a string in C++ seems like a pretty straightforward task. But in the world of C++, appearances are often deceiving. In this article, we’ll show you why string length in C++ is anything but uncomplicated. 

Read on to learn about the various methods for calculating string length in C++, and what to do when you encounter a string in a foreign language.

Why Is String Length Important?

When we discuss the length of a string, we’re usually referring to the number of characters in that string. In C++, however, string length is really the number of bytes (char elements) used to store the string. For plain ASCII text, each character occupies exactly one byte, so the byte count and the character count coincide. Things get more complex with encodings that go beyond ASCII, but we’ll get to that later.

As C++ developers, there are many reasons why we might be interested in the length of a string. For example, we might want to check whether two strings have the same length, or if a string has an even or odd number of characters. Likewise, we may want to know the minimum and maximum string lengths.

Say that we’re designing a social network app. We stipulate that all usernames on our app should have at least four letters. Now, if someone tries to enter a username with only two letters, we may ask them to try a longer name. On the other hand, we might want to truncate names that are too long. For all of these operations, we need to know a string’s length.
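The username rules above could be sketched roughly as follows. This is only an illustration: the function names and the maximum of 20 are our own assumptions (the scenario only fixes the minimum of four), and the check counts bytes, which matches letters only for ASCII names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical limits for this sketch; only the minimum of four comes from the scenario.
constexpr std::size_t kMinUsernameLength = 4;
constexpr std::size_t kMaxUsernameLength = 20;

// Returns true if the username reaches the minimum length.
bool is_long_enough(const std::string& username)
{
    return username.length() >= kMinUsernameLength;
}

// Returns a copy of the username cut down to at most kMaxUsernameLength chars.
std::string truncate_username(const std::string& username)
{
    // substr clamps the count to the end of the string,
    // so names that are already short enough pass through unchanged.
    return username.substr(0, kMaxUsernameLength);
}
```

With these helpers, a two-letter name such as "Al" fails is_long_enough(), while an overly long name comes back from truncate_username() with exactly 20 chars.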

But first, let’s take a deeper look into the C++ string.

What Are C++ Strings?

C++ strings are sequences of characters stored at contiguous memory addresses. There are two ways to define strings in C++. The more common strategy uses the standard library’s string class. If we define a string as an std::string object, it will come with several methods, including two that return the object’s size. An std::string is typically initialized from a string literal written in double quotation marks, as in this example:

std::string name = "Duc";

The second strategy derives from the C language and is more low-level. A “C-style string,” as it’s commonly called, is defined as an array of characters. Here’s one way to declare it:

char name[4] = "Duc";

But wait a minute: Our “name” string is short, consisting of only three letters. So why did we declare it as an array of length four? 

The answer lies in how C-style strings are represented internally. They end with a final element, the null character '\0', which signals to any function reading the string that it has reached the string’s end. Accordingly, we can explicitly add this terminating character ourselves. This is the second way of defining a C-style string, where we spell it out as an array of single characters:

char name[4] = {'D','u','c','\0'};

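But spelling out the terminator is optional here. Because the array has four elements and only three are listed, the remaining one is zero-initialized, so the version below is equivalent: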

char name[4] = {'D','u','c'};

How To Get String Length in C++

Since there are two different formats for strings in C++, we also have different methods for determining string length. For a string of the std::string class, we would compute its length by using an appropriate member method, such as size() or length(). These are really just two names for the same function:

std::string name = "Duc";
std::cout << "Length: " << name.length() << std::endl;
std::cout << "Size: " << name.size() << std::endl;

The code outputs the following:

Length: 3
Size: 3

For a C-style string, the strlen() function is in charge of computing the string length. Here’s how it works (make sure to include the <cstring> header):

char name[4] = {'D','u','c','\0'};
std::cout << "Length: " << std::strlen(name) << std::endl;

Since the terminal zero is not part of the string, it’s not considered in the calculation:

Length: 3
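To make the difference between the array’s size and the string’s length concrete, here is a small sketch. The Lengths struct and the measure_name() function are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper type holding both measurements.
struct Lengths {
    std::size_t array_bytes;  // storage occupied by the whole array
    std::size_t text_length;  // characters before the terminator
};

Lengths measure_name()
{
    // Only three elements are listed; the fourth is zero-initialized
    // and therefore acts as the terminator.
    const char name[4] = {'D', 'u', 'c'};

    // sizeof counts every element of the array, terminator included.
    // strlen walks the array until it finds '\0' and does not count it.
    return Lengths{sizeof(name), std::strlen(name)};
}
```

For this array, array_bytes is 4 and text_length is 3. The extra byte is the terminator that strlen() relies on to know where to stop.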

But why is there even more than one way to represent strings in C++? Most programming languages make do with a single string data type. By his own account, when Bjarne Stroustrup released C++ commercially in 1985, it simply was not sufficiently developed and was still missing some basic data structures, including a string class. In his seminal paper on the history of C++, Stroustrup calls this shortcoming the “worst mistake” in the development of the C++ language.

The 1998 standard finally introduced the string class, which has become the de facto standard for representing strings in C++. You might still encounter C-style strings in older C++ code, in low-memory settings, or where backwards compatibility matters. However, using std::string is usually both safer and easier, as it takes care of allocating and freeing memory automatically. With C-style strings, on the other hand, memory-related issues have to be handled manually.
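As a rough illustration of that difference, the sketch below builds the same greeting both ways. The function names greet() and greet_c() are our own. Note how the C-style version has to compute the required length with strlen() and check it against the buffer size before copying anything.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// std::string version: the result grows to whatever size is needed.
std::string greet(const std::string& name)
{
    return "Welcome, " + name + "!";
}

// C-style version: the caller supplies the buffer and its size.
// Returns false (and writes nothing) if the greeting would not fit.
bool greet_c(const char* name, char* out, std::size_t out_size)
{
    const char prefix[] = "Welcome, ";

    // prefix + name + '!' + terminating '\0'
    const std::size_t needed = std::strlen(prefix) + std::strlen(name) + 2;
    if (needed > out_size) {
        return false;
    }

    std::strcpy(out, prefix);
    std::strcat(out, name);
    std::strcat(out, "!");
    return true;
}
```

Forgetting the length check in greet_c() would risk writing past the end of the buffer, which is exactly the kind of bookkeeping std::string does for us.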

A Practical Example Using C++ String Length 

Let’s say we want to welcome our new users to our app by printing out a nice personalized message with a frame. The size of our frame would depend on the number of letters in the username. This example is adapted from the excellent book “Accelerated C++: Practical Programming by Example,” by Andrew Koenig and Barbara E. Moo:

#include <iostream>
#include <string>

int main()
{
    std::string name;

    // Ask for the user's name
    std::cout << "Hi! What's your name? ";
    std::cin >> name;

    std::string welcome = "Welcome, " + name + "!";

    // Create a string of spaces the size of the "welcome" string
    std::string spaces(welcome.size(), ' ');

    // Create the second and fourth line of the frame
    std::string l2 = "* " + spaces + " *";

    // Create the first and fifth line of the frame
    std::string l1(l2.length(), '*');

    // Create the third line of the frame
    std::string l3 = "* " + welcome + " *";

    // Put it all together
    std::cout << std::endl;
    std::cout << l1 << std::endl;
    std::cout << l2 << std::endl;
    std::cout << l3 << std::endl;
    std::cout << l2 << std::endl;
    std::cout << l1 << std::endl;

    return 0;
}

See how we’ve used length() and size() interchangeably in our code? Let’s try it out:

Hi! What's your name? Duc

*****************
*               *
* Welcome, Duc! *
*               *
*****************

What happens if we enter a longer name?

Hi! What's your name? Cassiopeia-Iphigenia

**********************************
*                                *
* Welcome, Cassiopeia-Iphigenia! *
*                                *
**********************************

As we intended, the frame expands to make room for the longer name. 

String Length and Foreign Languages

We established earlier that length() does not actually count the number of characters, but the bytes used to encode a string. However, you’re likely well aware that the vast majority of the world’s languages do not, like English, use a 26-character Latin alphabet. So if we wanted to represent all characters used worldwide, we would need to go far beyond the 128 combinations provided by the ASCII standard (the encoding that C and C++ code has traditionally assumed).

The UTF-8 encoding standard is capable of encoding all the characters of the world’s alphabets, by representing them in variably sized byte sequences. This property causes a glitch in our code when our friend Duc decides to write his name in accordance with its original Vietnamese spelling:

Hi! What's your name? Đức

********************
*                  *
* Welcome, Đức! *
*                  *
********************

Suddenly the frame no longer fits! It seems that our program thinks the name is longer than it really is, so it draws lines that are too wide for the greeting. But why does it think that? Let’s look at what our length() method has to say:

std::string name = "Đức";
std::cout << "Length: " << name.length() << std::endl;

Output:

Length: 6

The answer is not wrong, considering that length() counts the number of bytes used to encode the string, and not its actual number of letters. Certain characters are encoded as two, three or even four bytes in the UTF-8 standard. In our name, “Đ” takes two bytes, “ứ” takes three and “c” takes one, for a total of six. But is this the answer we wanted? Most likely not.

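C++11 introduced the <codecvt> header, whose conversion facets translate between different string encodings, along with the std::wstring_convert class in the <locale> header. Be aware that both were deprecated in C++17, so this approach is best suited to older codebases. Using these tools, we can rewrite our code to include a custom utf8_len() function: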

#include <iostream>
#include <string>
#include <locale>
#include <codecvt>
#include <iomanip>

std::size_t utf8_len(const std::string& utf8_string)
{
    return std::wstring_convert< std::codecvt_utf8<char32_t>, char32_t >{}.from_bytes(utf8_string).size();
}

int main()
{
  std::string name = "Đức";
  std::cout << "Bytes length: " << name.length() << std::endl;
  std::cout << "Characters length: " << utf8_len(name) << std::endl;
}

Output:

Bytes length: 6
Characters length: 3

See how the number of characters used is just half that of the bytes used to store the string? 
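Since std::wstring_convert and std::codecvt_utf8 were deprecated in C++17, it may be worth knowing a dependency-free alternative. The sketch below is not the approach used above. It is a different technique that counts every byte that is not a UTF-8 continuation byte (continuation bytes have the bit pattern 10xxxxxx). The function name utf8_code_points() is our own, and the code assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string that is assumed to be valid UTF-8.
// Every code point starts with exactly one byte that is NOT of the form
// 10xxxxxx, so counting those lead bytes gives the number of code points.
std::size_t utf8_code_points(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0) != 0x80) {  // not a continuation byte
            ++count;
        }
    }
    return count;
}
```

Keep in mind that this counts code points, not necessarily what a reader perceives as one letter. If an accented letter is stored as a base letter followed by separate combining marks, it will count as more than one.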

As a little challenge, we’ll leave it to you to update our earlier frame code with the custom length function.

Become a C++ Developer

From calculating string length to juggling foreign-language characters, we hope you’ve learned a thing or two from this guide. Mastering string manipulation is an important milestone in your journey as a C++ developer.

Don’t stop here! Enroll in our expert-taught C++ Nanodegree program to take your coding skills to the next level.  
