
Want to Unlock Performance and Clarity? Use Strong Types!

Christian Eltzschig - 01/06/2024

C++ and Rust are strongly and statically typed languages, meaning that whenever you declare a variable, you must either explicitly specify its type

Rust:

```rust
let fuu: i32 = 123;
```

C++:

```cpp
int32_t fuu{123};
```

or you specify it implicitly by initializing the variable with a value of a specific type, from which the compiler deduces the variable's type.

Rust:

```rust
let fuu = "hello world";
```

C++:

```cpp
auto fuu = "hello world";
```
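In the C++ case, the deduced type can be surprising. The sketch below pins it down with `static_assert`; `bar` is a hypothetical extra variable added for illustration. In Rust, the equivalent `let fuu = "hello world";` gives `fuu` the type `&'static str`.

```cpp
#include <type_traits>

// With `auto`, the initializer alone determines the type. A string literal is
// an array of const char, and `auto` deduction decays it to a pointer, so fuu
// is a const char*, not a std::string.
auto fuu = "hello world";
static_assert(std::is_same_v<decltype(fuu), const char*>);

// Hypothetical second variable: an unsuffixed integer literal deduces to int.
auto bar = 123;
static_assert(std::is_same_v<decltype(bar), int>);
```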

The type of a variable also comes with a specific contract. An integer can hold only whole numbers, not floating-point values like 3.14 or strings. The size of the integer also defines the range of numbers it can store: an int8 can store numbers in the range [-128, 127], and an int16 in the range [-32768, 32767].
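These ranges are part of the type and can be checked at compile time with `std::numeric_limits`. A short sketch, using the fixed-width aliases `std::int8_t` and `std::int16_t` for the int8 and int16 mentioned above; `fits_in_int8` is a hypothetical helper, not a standard function.

```cpp
#include <cstdint>
#include <limits>

// The width of a fixed-size integer type determines its range at compile time.
static_assert(std::numeric_limits<std::int8_t>::min() == -128);
static_assert(std::numeric_limits<std::int8_t>::max() == 127);
static_assert(std::numeric_limits<std::int16_t>::min() == -32768);
static_assert(std::numeric_limits<std::int16_t>::max() == 32767);

// Hypothetical helper: does a value fit into an int8_t without overflow?
constexpr bool fits_in_int8(long long v) {
    return v >= std::numeric_limits<std::int8_t>::min() &&
           v <= std::numeric_limits<std::int8_t>::max();
}
```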

Strong Types Implementation

With strong types, we have a powerful tool in our hands. When we declare a function parameter as uint32_t, we never need to verify that the user accidentally gave us a negative number or a string, and we never need to test this case. All code that receives the value can rely on the fact that it is indeed an unsigned integer, and the function's API clearly communicates that it expects an integer and nothing else.
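The compiler enforces this contract at every call site. The sketch below checks it with `static_assert` and `std::is_invocable_v`; `set_port` is a hypothetical function. It also shows one C++ caveat: a negative `int` argument still compiles, because it is implicitly converted to a large unsigned value. Inside the function the value is indeed never negative, but the caller's mistake goes unnoticed. In Rust, calling a `u32` function with `-1` is a compile error.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical function whose signature promises an unsigned 32-bit integer.
void set_port(std::uint32_t port) { (void)port; }

// A string argument does not compile: there is no implicit conversion
// from const char* to std::uint32_t.
static_assert(!std::is_invocable_v<decltype(&set_port), const char*>);

// Caveat: a negative int still compiles, because it is implicitly
// converted to a large unsigned value (-1 becomes 4294967295).
static_assert(std::is_invocable_v<decltype(&set_port), int>);
static_assert(static_cast<std::uint32_t>(-1) == 4294967295u);
```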

But we can also add semantic contracts to a type. Take a POSIX user name as an example. According to the POSIX standard, it may consist of:

  • lowercase and uppercase ASCII letters (a-z, A-Z),
  • digits (0-9),
  • and period (.), underscore (_), and hyphen (-).

Furthermore, a user name must not start with a hyphen.
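The rules above translate directly into a small predicate. This is a minimal sketch; `is_valid_user_name` is a hypothetical name, and rejecting the empty string is an added assumption, since the rules above do not mention it.

```cpp
#include <string_view>

// Hypothetical helper: checks a candidate against the POSIX user name rules
// listed above. Treating "" as invalid is an assumption of this sketch.
constexpr bool is_valid_user_name(std::string_view name) {
    if (name.empty() || name.front() == '-') {
        return false;  // must not be empty and must not start with a hyphen
    }
    for (char c : name) {
        const bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool digit = c >= '0' && c <= '9';
        const bool special = c == '.' || c == '_' || c == '-';
        if (!letter && !digit && !special) {
            return false;  // character outside the allowed set
        }
    }
    return true;
}
```

Explicit character ranges are used instead of `std::isalnum`, whose result depends on the current locale, so the check accepts exactly the ASCII set listed above.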

All of those constraints can be baked into a type called UserName. The basic idea is that a UserName cannot be created directly with a constructor. Instead, it comes with a static factory method called create (the idiomatic Rust name for such a method is new), which takes a string as input and checks whether it meets the above requirements. This method returns an optional value that contains either a valid UserName object or nothing when the user name contract is violated.

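Rust:

```rust
struct UserName {
    value: String,
}

impl UserName {
    pub fn new(value: &str) -> Option<UserName> {
        // ... check value against the rules above, return None on violation
        Some(UserName {
            value: value.to_string(),
        })
    }
}
```

C++:

```cpp
#include <optional>
#include <string>
#include <utility>

class UserName {
  private:
    std::string value;

    // Private constructor: a UserName can only be obtained via create().
    explicit UserName(std::string value) : value(std::move(value)) {}

  public:
    static auto create(const std::string& value)
        -> std::optional<UserName>
    {
        // ... check value against the rules above, return std::nullopt on violation
        return UserName(value);
    }
};
```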

The UserName type now guarantees that it always contains a semantically correct user name, since a UserName that violates these rules cannot be created.
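Putting the pieces together, a complete C++ sketch of the factory approach might look as follows. The private helper `is_valid` and the accessor `as_string` are hypothetical names, and rejecting the empty string is again an added assumption. A Rust version would follow the same shape, with a private field and `new` returning `Option<UserName>`.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <utility>

class UserName {
  private:
    std::string value;

    // Private constructor: the only way in is create(), which validates first.
    explicit UserName(std::string value) : value(std::move(value)) {}

    // Hypothetical helper implementing the POSIX rules; rejecting "" is assumed.
    static bool is_valid(const std::string& name) {
        const auto allowed = [](char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                   (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        };
        return !name.empty() && name.front() != '-' &&
               std::all_of(name.begin(), name.end(), allowed);
    }

  public:
    static auto create(const std::string& value) -> std::optional<UserName> {
        if (!is_valid(value)) {
            return std::nullopt;  // contract violated: no object is created
        }
        return UserName(value);
    }

    // Read-only access; there is no setter that could break the invariant.
    const std::string& as_string() const { return value; }
};
```

Callers handle the empty optional once, at the point where untrusted input enters the program; every function that later receives a UserName can skip the check entirely.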

Fewer Bugs, More Expressive APIs

Let's assume we now have a collection of strong types like UserName, GroupName, and FileName that we can use directly in our API. We introduce two functions: the first, do_stuff, uses the new and shiny strong types, while the second, buggy_stuff, uses the underlying strings directly.

Rust:

```rust
fn do_stuff(
    reader: &UserName,
    writer: &GroupName,
    storage: &FileName,
) {
    // ...
}

fn buggy_stuff(
    reader: &str,
    writer: &str,
    storage: &str,
) {
    // ...
}
```

C++:

```cpp
void do_stuff(
    const UserName& reader,
    const GroupName& writer,
    const FileName& storage
);

void buggy_stuff(
    const std::string& reader,
    const std::string& writer,
    const std::string& storage
);
```

The first issue with buggy_stuff is that its API is not expressive. Should the reader be a group name, a user name, or maybe something completely different? Answering this requires detailed documentation for the user. And if you know the saying "The compiler doesn't read comments and neither do I", you understand that this is not a perfect solution.

Furthermore, the function is easy to misuse. When the variable names are not expressive enough, or when the function is called directly with literal values, the arguments can easily be mixed up. And what happens when you refactor and swap or replace some parameters? Maybe storage should no longer be a file name but a database name. How do you ensure that every call site is updated? With strong types, the compiler rejects every call that still passes the old type; with plain strings, such calls compile silently.
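The mix-up protection can be checked with the compiler itself. This sketch uses one common way to get distinct string-backed types, a class template with a phantom tag parameter; `StrongString` and the tag structs are hypothetical names, and validation is omitted to keep the focus on type distinctness.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical minimal strong type: the Tag parameter makes each instantiation
// a distinct type, even though all of them wrap a std::string.
template <typename Tag>
class StrongString {
  public:
    explicit StrongString(std::string value) : value(std::move(value)) {}
    const std::string& as_string() const { return value; }

  private:
    std::string value;
};

using UserName = StrongString<struct UserNameTag>;
using GroupName = StrongString<struct GroupNameTag>;
using FileName = StrongString<struct FileNameTag>;

void do_stuff(const UserName&, const GroupName&, const FileName&) {}
void buggy_stuff(const std::string&, const std::string&, const std::string&) {}

// With plain strings, any order of arguments compiles ...
static_assert(std::is_invocable_v<decltype(&buggy_stuff),
                                  std::string, std::string, std::string>);
// ... but swapping reader and writer is rejected for do_stuff.
static_assert(!std::is_invocable_v<decltype(&do_stuff),
                                   GroupName, UserName, FileName>);
static_assert(std::is_invocable_v<decltype(&do_stuff),
                                  UserName, GroupName, FileName>);
```

The `explicit` constructor matters here: without it, a plain string would implicitly convert into any of the three types and the protection would be lost.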

Additionally, the implementer of buggy_stuff is now responsible for verifying all arguments! Whenever this function is called, we must check that reader, writer, and storage are semantically correct, and if they are not, we must handle it and inform the user. Of course, we could move this check into a free function and call it wherever we expect a value with a semantic contract. However, such a call is easily forgotten, for example during refactoring.
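To make the extra burden concrete, here is a sketch of what a string-based buggy_stuff might have to do on every call. `looks_valid` is a hypothetical, deliberately simplified stand-in for the real semantic rules, and `StuffResult` is a hypothetical error type. A do_stuff taking strong types needs none of these checks and can simply return void.

```cpp
#include <string>

// Hypothetical, deliberately simplified stand-in for the real semantic rules
// of user names, group names and file names.
bool looks_valid(const std::string& s) {
    return !s.empty() && s.front() != '-';
}

// Hypothetical result type: every failure mode must be reported to the caller.
enum class StuffResult { Ok, InvalidReader, InvalidWriter, InvalidStorage };

// With plain strings, buggy_stuff cannot trust its inputs, so it re-checks all
// of them on every call and turns violations into errors the caller must handle.
StuffResult buggy_stuff(const std::string& reader,
                        const std::string& writer,
                        const std::string& storage) {
    if (!looks_valid(reader)) return StuffResult::InvalidReader;
    if (!looks_valid(writer)) return StuffResult::InvalidWriter;
    if (!looks_valid(storage)) return StuffResult::InvalidStorage;
    // ... actual work
    return StuffResult::Ok;
}
```

Each branch is a code path that needs its own test, and every caller needs logic for each StuffResult value.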

The error handling introduces further overhead! We have to write additional tests to check that the error handling works correctly, and the function's users need extra logic to handle potential errors. And this extra logic needs to be tested as well!

Finally, it costs performance. Why? Whenever these arguments are forwarded along a call chain to other functions, especially functions that are not under your control, the same semantic verification has to be performed over and over again. In addition, the functions used to implement buggy_stuff may also fail for semantically incorrect values, so these failures have to be handled and tested again. This costs even more performance!
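The repeated work can be made visible by counting how often validation runs along a three-level call chain. In this sketch, `validation_count`, `validate`, `Name`, and the `entry`/`forward`/`store` functions are all hypothetical, and the validation rule is deliberately simplified.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical counter so the sketch can show how often validation runs.
inline int validation_count = 0;

// Hypothetical, simplified validation rule.
bool validate(const std::string& s) {
    ++validation_count;
    return !s.empty() && s.front() != '-';
}

// String-based chain: each layer re-checks, because it cannot trust its input.
bool store_raw(const std::string& name) { return validate(name); }
bool forward_raw(const std::string& name) { return validate(name) && store_raw(name); }
bool entry_raw(const std::string& name) { return validate(name) && forward_raw(name); }

// Strong-type chain: validation happens exactly once, when the value is created.
class Name {
  public:
    static std::optional<Name> create(const std::string& s) {
        if (!validate(s)) return std::nullopt;
        return Name(s);
    }
    const std::string& as_string() const { return value; }

  private:
    explicit Name(std::string s) : value(std::move(s)) {}
    std::string value;
};

void store(const Name&) { /* ... actual work, no re-check needed */ }
void forward(const Name& n) { store(n); }
void entry(const Name& n) { forward(n); }
```

For a valid input, the string-based chain validates three times, while the strong-type chain validates once in `Name::create`, no matter how deep the chain grows.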

All of these problems (performance costs, additional tests on the user and implementer side, additional error handling, and an unexpressive API) can be avoided when we integrate the semantic check into the type itself, so that every instance is guaranteed to contain a valid value.

Summary

Using strong types like UserName or FileName comes with a bunch of benefits. Firstly, the API becomes more expressive, and we no longer need extensive documentation to convey all the semantic details. Strong types also prevent parameter mix-ups in functions with multiple arguments. Furthermore, they reduce the amount of code by ensuring validity through the type system, which decreases the need for error handling both in the implementation and in the user's code. Even performance may improve, since the semantic contract is verified once, centrally, instead of repeatedly in every function....