Want to Unlock Performance and Clarity? Use Strong Types!
Christian Eltzschig - 01/06/2024
C++ and Rust are strongly and statically typed languages: whenever you declare a variable, you either specify its type explicitly or let the compiler deduce it implicitly from the value you assign.
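In C++, the two forms could look like this. This is a minimal sketch with illustrative variable names. The Rust counterparts would be let counter: i32 = 42; and let counter = 42;, where Rust also falls back to i32 for an unannotated integer literal.

```cpp
#include <cstdint>
#include <type_traits>

// Explicit: the type is spelled out in the declaration.
std::int32_t explicit_counter = 42;

// Implicit: the compiler deduces the type from the assigned value.
auto implicit_counter = 42;  // deduced as int
auto implicit_ratio = 3.14;  // deduced as double

// Either way, the type is fixed at compile time and can be checked statically.
static_assert(std::is_same_v<decltype(implicit_counter), int>);
static_assert(std::is_same_v<decltype(implicit_ratio), double>);
```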
The type of a variable also comes with a specific contract. An integer can only hold whole numbers, not floating-point values like 3.14 or strings. The size of the integer also defines the range of numbers it can store: an 8-bit signed integer (int8_t in C++, i8 in Rust) can store numbers in the range [-128, 127], and a 16-bit one (int16_t, i16) offers the range [-32768, 32767].
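These ranges can be verified at compile time with std::numeric_limits (Rust exposes the same bounds as i8::MIN, i8::MAX, and so on). The sketch below also shows how C++ brace initialization turns a value that breaks the contract into a compile error. The commented-out lines are intentionally ill-formed; the variable names are illustrative.

```cpp
#include <cstdint>
#include <limits>

// The width of an integer type fixes the range of values it can represent.
static_assert(std::numeric_limits<std::int8_t>::min() == -128);
static_assert(std::numeric_limits<std::int8_t>::max() == 127);
static_assert(std::numeric_limits<std::int16_t>::min() == -32768);
static_assert(std::numeric_limits<std::int16_t>::max() == 32767);

// Brace initialization enforces the contract at compile time:
constexpr std::int8_t largest_int8{127};  // OK: fits into [-128, 127]
// constexpr std::int8_t too_big{128};    // error: narrowing conversion
// constexpr std::int8_t not_whole{3.14}; // error: narrowing conversion
```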
Strong Types Implementation
With strong types, we have a powerful tool in our hands. When a function's input argument is declared as uint32_t, we never need to check whether the caller accidentally passed us a string, and the value we receive can never be negative. We never need to test these cases either. All subsequent code can rely on the argument being an unsigned integer, and the function's API clearly communicates that it expects an integer and nothing else.
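A small sketch of such a function. The name twice is hypothetical. The type traits show that the compiler rejects a string before the function ever runs. The last line illustrates a C++-specific caveat: a negative int still compiles, because it is implicitly converted to a large unsigned value (Rust rejects this at compile time). So the guarantee is that value is never negative inside the function, not that the caller meant what they wrote.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical API: the uint32_t parameter is the entire input contract.
constexpr std::uint32_t twice(std::uint32_t value) {
    // No runtime check for "is this a number?" or "is it negative?" is needed:
    // value is always an unsigned 32-bit integer here.
    return value * 2U;
}

// Passing a string is rejected by the compiler, so there is nothing to test.
static_assert(!std::is_invocable_v<decltype(&twice), std::string>);
static_assert(!std::is_invocable_v<decltype(&twice), const char*>);

// Caveat (C++ only): -1 is implicitly converted to 4294967295, so the
// result wraps around modulo 2^32 instead of failing to compile.
static_assert(twice(-1) == 4294967294U);
```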
But we can also add semantic contracts to the type. Let's take a POSIX user name, for example. The POSIX standard states that it is allowed to consist of:
- lower and upper case ASCII letters (a-zA-Z),
- digits (0-9), and
- period (.), underscore (_), and hyphen (-).
Furthermore, it is not allowed to start with a hyphen.
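These rules translate directly into a predicate. The helper name is_portable_user_name is hypothetical, and rejecting the empty string is an assumption, since the rules above do not mention it. Explicit ASCII ranges are used instead of std::isalpha because the latter depends on the current locale and is not constexpr.

```cpp
#include <string_view>

// Hypothetical helper encoding the POSIX user name rules listed above.
// Assumption: the empty string is rejected as well.
constexpr bool is_portable_user_name(std::string_view name) {
    // Must not be empty (assumption) and must not start with a hyphen.
    if (name.empty() || name.front() == '-') {
        return false;
    }
    for (char c : name) {
        // Explicit ASCII ranges: a-z, A-Z, 0-9, plus '.', '_' and '-'.
        const bool is_letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool is_digit = c >= '0' && c <= '9';
        const bool is_allowed_symbol = c == '.' || c == '_' || c == '-';
        if (!is_letter && !is_digit && !is_allowed_symbol) {
            return false;
        }
    }
    return true;
}

static_assert(is_portable_user_name("alice"));
static_assert(is_portable_user_name("john.doe_42"));
static_assert(!is_portable_user_name("-alice"));      // leading hyphen
static_assert(!is_portable_user_name("alice smith")); // space is not allowed
```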
All of those constraints can be baked into a type called UserName. The basic idea is that a UserName cannot be created directly with a constructor. Instead, it comes with a static factory method called create (the idiomatic Rust name for such a method is new) that takes a string literal as its input argument and checks whether it meets the above requirements. This method returns an optional value that contains either a valid UserName object or nothing when the user name contract is violated.
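A minimal C++17 sketch of such a type follows. It assumes std::optional as the return type of create and std::string_view as its parameter, so a string literal converts to it implicitly. The as_string accessor and the inlined validation are our own choices, not necessarily the author's implementation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Minimal sketch of a UserName strong type (C++17).
class UserName {
public:
    // The only way to obtain a UserName: returns std::nullopt when the
    // POSIX user name contract is violated.
    static std::optional<UserName> create(std::string_view value) {
        if (!is_valid(value)) {
            return std::nullopt;
        }
        return UserName(std::string(value));
    }

    // Read-only access to the validated string.
    const std::string& as_string() const { return value_; }

private:
    // Private constructor: code outside the class cannot bypass create().
    explicit UserName(std::string value) : value_(std::move(value)) {}

    // POSIX rules; rejecting an empty name is an assumption.
    static bool is_valid(std::string_view value) {
        if (value.empty() || value.front() == '-') {
            return false;
        }
        for (char c : value) {
            const bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                                 (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    std::string value_;
};

// Direct construction from a raw string is impossible from outside the class.
static_assert(!std::is_constructible_v<UserName, std::string>);

// Usage sketch: valid names yield a value, invalid ones yield std::nullopt.
inline void user_name_examples() {
    assert(UserName::create("alice").has_value());
    assert(!UserName::create("-alice").has_value());      // leading hyphen
    assert(!UserName::create("alice smith").has_value()); // space is not allowed
}
```

Callers have to unwrap the optional exactly once, at the boundary where the raw string enters the program; every function that receives a UserName afterwards can skip the check.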
The UserName type now guarantees that it always contains a semantically correct user name, since creating a UserName with invalid characters is impossible.
Fewer Bugs, More Expressive APIs
Let's assume we now have a collection of strong types like UserName, GroupName, and FileName that we can use directly in our API. We introduce two functions: the first, do_stuff, uses the new and shiny strong types, while the second, buggy_stuff, uses the underlying strings of those types directly.
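A sketch of the two signatures follows. The parameter names reader, writer, and storage come from the discussion below, but mapping them to UserName, GroupName, and FileName is an assumption. To keep the block short, the strong types are built from a single phantom-tagged template StrongString (our choice, not from the original) without the validating create factory shown for UserName. Only the declarations matter here, so the function bodies are omitted.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Simplified strong string type; each Tag produces a distinct type.
// Validation via a create() factory is omitted for brevity.
template <typename Tag>
class StrongString {
public:
    explicit StrongString(std::string value) : value_(std::move(value)) {}
    const std::string& as_string() const { return value_; }

private:
    std::string value_;
};

using UserName = StrongString<struct UserNameTag>;
using GroupName = StrongString<struct GroupNameTag>;
using FileName = StrongString<struct FileNameTag>;

// Expressive: the signature states what each argument is.
void do_stuff(const UserName& reader, const GroupName& writer, const FileName& storage);

// Unexpressive: three interchangeable strings.
void buggy_stuff(const std::string& reader, const std::string& writer,
                 const std::string& storage);

// buggy_stuff accepts its strings in any order, so a mix-up compiles...
static_assert(std::is_invocable_v<decltype(&buggy_stuff), std::string, std::string, std::string>);
// ...while do_stuff rejects swapped arguments and raw strings at compile time.
static_assert(!std::is_invocable_v<decltype(&do_stuff), GroupName, UserName, FileName>);
static_assert(!std::is_invocable_v<decltype(&do_stuff), std::string, std::string, std::string>);
static_assert(std::is_invocable_v<decltype(&do_stuff), UserName, GroupName, FileName>);
```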
The first issue with the buggy_stuff function is that its API is not expressive. Should the reader be a group name, a user name, or maybe something completely different? Answering this requires detailed documentation for the user. And if you know the saying "The compiler doesn't read comments and neither do I.", you understand that this is not a perfect solution.
Furthermore, the function is easily misused. When the variable names are not expressive enough, or when the function is called directly with literal values, the arguments can easily be mixed up. And what happens when you refactor and swap or replace some arguments? Maybe the storage should no longer be a file name but a database name. How do you ensure that all call sites are ported?
Additionally, the implementer of buggy_stuff is now responsible for verifying all arguments! Whenever this function is called, we must check that the reader, writer, and storage are semantically correct. When they are not, we must handle it and inform the caller.
Of course, we could move this check into a free function and call it wherever we expect a value with a semantic contract. However, such a call is easily forgotten, for instance during a refactoring.
Error handling introduces further overhead! We have to write additional tests to check that the error handling works correctly, and the function's callers need extra logic to handle potential errors. This extra logic needs to be tested as well!
Finally, it costs performance. Why? Whenever those arguments are forwarded along a call chain to other functions, especially functions that are not under your control, the same semantic verification has to be performed over and over again. And the functions used to implement buggy_stuff may themselves fail for semantically incorrect values, which has to be handled and tested again. This costs even more performance!
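The following sketch makes the repeated verification countable. Everything in it is hypothetical: the three-layer call chain, the global counter g_check_count, and the simplified is_valid_user_name stand-in (non-empty, no leading hyphen) instead of the full POSIX rules. The string-based chain re-checks and reports errors at every layer, while the strong-type chain checks once, in create.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Counts how often the semantic check runs (illustration only, not thread-safe).
inline std::size_t g_check_count = 0;

// Hypothetical stand-in for a full user name check.
inline bool is_valid_user_name(std::string_view name) {
    ++g_check_count;
    return !name.empty() && name.front() != '-';
}

// String-based chain: every layer re-checks and reports errors upwards.
inline bool store_raw(const std::string& name) {
    if (!is_valid_user_name(name)) {
        return false;
    }
    // ... persist name ...
    return true;
}

inline bool forward_raw(const std::string& name) {
    if (!is_valid_user_name(name)) {
        return false;
    }
    return store_raw(name);
}

inline bool handle_raw(const std::string& name) {
    if (!is_valid_user_name(name)) {
        return false;
    }
    return forward_raw(name);
}

// Strong-type chain: validated once in create(), trusted everywhere after.
class UserName {
public:
    static std::optional<UserName> create(std::string_view value) {
        if (!is_valid_user_name(value)) {
            return std::nullopt;
        }
        return UserName(std::string(value));
    }
    const std::string& as_string() const { return value_; }

private:
    explicit UserName(std::string value) : value_(std::move(value)) {}
    std::string value_;
};

// No checks and no error paths: the type already guarantees validity.
inline void store_name(const UserName& name) { (void)name; /* ... persist name ... */ }
inline void forward_name(const UserName& name) { store_name(name); }
inline void handle_name(const UserName& name) { forward_name(name); }

// Usage sketch: count the checks each approach performs for one valid name.
inline void count_checks_example() {
    g_check_count = 0;
    assert(handle_raw("alice"));
    assert(g_check_count == 3);  // one check per layer

    g_check_count = 0;
    const auto name = UserName::create("alice");
    assert(name.has_value());
    handle_name(*name);
    assert(g_check_count == 1);  // a single check, at creation
}
```

The cost of one check is small here, but in the string-based chain both the number of checks and the number of error paths to test grow with every layer the value passes through.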
All of these problems (performance costs, additional tests on both the caller and the implementer side, additional error handling, and an unexpressive API) can be avoided when we integrate the semantic check into the type itself, so that the type is guaranteed to always contain a valid value.
Summary
Using strong types like UserName or FileName comes with a number of benefits.
Firstly, the API becomes more expressive, and we no longer need extensive documentation to convey all the semantic details.
Strong types can also prevent parameter mixups in functions with multiple arguments.
Furthermore, they reduce the amount of code, because validity is ensured by the type system. This decreases the need for error handling both in the implementation and in the caller's code.
Even performance may improve when the semantic content is verified once, centrally, instead of repeatedly in every function.