Disclaimer: The code snippets in this post are made up to highlight the problems of Hungarian notation. They are not actual snippets from production code.
I’m a huge proponent of having a consistent style in a codebase – ideally the reader should be able to “tune out” of the actual code and just understand what is happening. Now, with this in mind, I was recently set loose on a customer’s codebase that used Hungarian notation for all variable names. Given that it’s 2019, this choice struck me as odd, even counterintuitive. After I had worked with the code for a couple of weeks I actually grew furious with that design choice, because it seemed to somehow slow me down. To add insult to injury, the customer’s static code analysis would keep bugging me to adhere to the weird naming convention (all the while happily ignoring any forgotten NULL checks). That got me thinking: why does this style feel so atrocious? It’s just another code style, right? Well, not all coding styles are created equal, and there are quite a few things fundamentally wrong with Hungarian notation, as we’ll soon find out.
For readers who don’t know what Hungarian notation means: it is the practice of prefixing variable names with their type and additional information. Note: all examples use C++.
e.g.:
uint32_t u32Foo; bool bSomeVal; unsigned char auSomeBytes[256];
This doesn’t seem too bad. However, if we add another odd convention, namely prefixing class members with “m”, we get:
uint32_t mu32Foo; bool mbSomeVal; unsigned char mauSomeBytes[256];
This can still get worse: consider that we shall prefix classes with “C” (yes, that is actually quite common in the legacy C++ world) and structs with “st”:
struct stAppData { uint32_t mu32Foo; bool mbSomeVal; unsigned char mauSomeBytes[256]; };
Oh, and now we need an array of appDatas…:
stAppData mast_theAppData[10];
These conventions stem from times when our IDEs were not as sophisticated as today, back when something like code completion was still years away. In those circumstances the described style might actually have been beneficial (although some of the problems I shall now point out existed back then as well):
Mismatch of concepts
This is something that is only apparent in the codebase I had to work with, and from whose style guide the above examples were derived: Hungarian notation is used to annotate two quite different concepts, namely type names and variable names. This leads to gruesome variable names such as mast_ApplicationData. The mismatch becomes very visible when we look at the variable name, where the prefix does not actually encode the type (stAppData), but the “type of the type”, i.e. that stAppData is a struct, whereas the members of stAppData encode their actual type. If we were to consistently encode the type into the name, we would need a prefix for each struct (which is obviously not feasible for any reasonably sized codebase), but let’s go with it for the sake of argument. Let’s say we want a three-letter prefix for each struct (given that one-letter prefixes are mostly taken by primitive types, and two letters don’t really spell much), so we’d end up with something like:
stAppData maapd_theAppData[10];
If we were to add more structs like (arbitrarily chosen names):
struct stVersionInformation{}; struct stFilePath{}; struct stDirectoryAttributes{};
and a struct containing arrays of all 4:
struct stOtherStructContainingThing
{
    stVersionInformation mavei_myVersionInformation[4];
    stFilePath mafip_myFilePath[4];
    stDirectoryAttributes madia_myDirAttributes[4];
    stAppData maapd_theAppData[10];
};
I guess most people will agree that this boils down to unintelligible gibberish. Luckily the authors of the original codebase went the less consistent way and just encoded all struct variables with the prefix st, regardless of the actual struct name.
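For contrast, here is a hypothetical sketch of the same structs without any Hungarian prefixes (names and array sizes chosen by me, not taken from the original codebase) – the declared types already carry everything the prefixes tried to encode:

```cpp
#include <array>
#include <cstdint>

// Made-up rewrite of the structs above without type prefixes.
struct AppData {
    std::uint32_t foo;
    bool someVal;
    std::array<unsigned char, 256> someBytes;
};

struct VersionInformation {};
struct FilePath {};
struct DirectoryAttributes {};

struct OtherStructContainingThing {
    // Array sizes are arbitrary; the originals were left unspecified.
    std::array<VersionInformation, 4> versionInformation;
    std::array<FilePath, 4> filePaths;
    std::array<DirectoryAttributes, 4> dirAttributes;
    std::array<AppData, 10> appData;
};
```

Renaming or retyping any member here touches exactly one declaration, and no name can go stale.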
Hungarian Names tend to lie
Yes, they often do, after a while. Case in point: a simple refactoring. Consider the following interface:
void DoStuff(uint16_t u16_StuffParam);
If we were to use one of the common refactoring tools such as ReSharper or Visual Assist to change the type of the parameter to uint32_t, we’re prone to forget to also rename the parameter. This is especially true for actual interfaces (read: abstract classes), and will lead to subtle, hard-to-spot bugs in production. I’ve seen this kind of oversight countless times and have also had the pleasure of fixing bugs that occurred because programmers relied on the type information encoded in the variable name instead of the actual type of the variable. Example:
void DoStuff(uint16_t u16_timeout)
{
    if (u16_timeout == USHRT_MAX)
    {
        // No timeout, wait indefinitely
    }
    else
    {
        // Wait with timeout
    }
}
The documented intent here is that we either call the function with a timeout or, if no timeout is to be used, pass the maximum allowed value. (Note that this is not a wildly contrived example but something I’ve actually encountered.) If we change the type of the parameter to uint32_t this goes pear-shaped, as we now have a single value somewhere in the uint32_t range that behaves differently from its surrounding values. That is: we are allowed to wait 65534 time units and we are allowed to wait 65536 time units, but if – for some reason – we want to wait exactly 65535 time units (USHRT_MAX), the code will actually block forever (at least until the event we’re waiting for happens). The bad part: the offending line actually looks correct due to the encoded type info:
if(u16_timeout == USHRT_MAX)
Timeout is a uint16_t, right? That’s what the name says – this would probably even slip through a code review. Granted, if the type prefix weren’t present, the code would still exhibit the same wrong behavior, but the line, changed to:
if(timeout == USHRT_MAX)
is far more likely to spark the reader’s curiosity to actually check the type of “timeout”.
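One way to keep such a sentinel honest under refactorings is to derive it from the parameter’s actual type rather than hard-coding USHRT_MAX. A minimal sketch – the function name and signature are made up for illustration:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the "no timeout" sentinel is computed from the
// parameter's declared type via std::numeric_limits, so if the type is
// later widened from uint16_t to uint32_t the check widens with it.
bool WaitsIndefinitely(std::uint32_t timeout)
{
    return timeout == std::numeric_limits<decltype(timeout)>::max();
}
```

A named constant such as `constexpr auto kNoTimeout = std::numeric_limits<std::uint32_t>::max();` would make the intent even more explicit – and neither variant relies on a prefix that can go stale.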
Hungarian Names make code completion less useful
The productivity gains modern IDEs have enabled are huge, and a lot of these gains can be attributed to code completion. Using Hungarian notation effectively nerfs code completion, as the programmer now always has to type the type prefix before the completion can do anything meaningful. This is especially aggravating if class members are also prefixed with “m”, as the completion window will now be cluttered with names all starting with m. Even worse: in order to make decent use of code completion the programmer has to know the type of every variable they intend to use, as
"int32_t i32_foo"
will not show up in the same part of the list as
"uint32_t u32_bar"
indeed it might not even be visible if the class in question has many members. This last bit is actually quite a burden on the “working set” the programmer can keep in mind, and it is completely unnecessary for the most part. Yes, it is important to distinguish between signed and unsigned, but the places where we actually have to make this distinction are few and far between. In most cases we’ll want to do some arithmetic where it is not all that important whether the type is signed or unsigned.
Hungarian notation creates implicit namespaces
This is a nice one: when we use Hungarian notation we can effectively have variables that have sort of the same name but different types. Consider this:
int32_t i32TheValue; int16_t i16TheValue; uint8_t u8TheValue;
These implicit namespaces get especially bad once the programmer’s mind has adapted to the Hungarian notation and starts to filter out the type prefix, noticing only the actual name of the variable.
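To illustrate – a made-up example in the spirit of the snippets above: once the reader mentally drops the prefixes, all three variables read as “TheValue”, and picking the wrong one still compiles thanks to implicit integer conversions.

```cpp
#include <cstdint>

// Three variables that share a name once the prefix is filtered out.
std::int32_t i32TheValue = 100000;
std::int16_t i16TheValue = 42;
std::uint8_t u8TheValue  = 7;

std::int16_t GetTheValue()
{
    // The author meant i16TheValue, but i32TheValue compiles as well:
    // the implicit narrowing conversion silently truncates 100000 to a
    // 16-bit value, producing at most a compiler warning.
    return i32TheValue;
}
```

The type system cannot catch this mix-up, because every integer type converts implicitly to every other; only the (easily overlooked) prefix distinguishes the three names.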
TL;DR:
Avoid Hungarian notation: it adds code noise, imposes unnecessary burden on your programmers, and introduces new failure modes into your development process.
Image Credit: Joshua Aragon