How To Write Lean Code
After more than 20 years of writing software (see "Personal experience highlight" at the end) I've experienced enough ups and downs to be able to confidently say what matters when writing lean code. Lean code is code which is reliable, secure and delivered on time, not code which meets some ideals of the art of programming.
Programmers generally write code as a job.
Some people view this as a job which is done because of what is required, considering that writing code is devoid of mental investment and sparkle. This hit-and-run approach gets the job done fast and with a low immediate cost, but with a significant number of bugs and a cost which increases geometrically as the product grows, because the existing parts either need to be rewritten, or will be the cause of severe bugs when extended.
Some people view this as an art, considering that writing code is a mastery that any programmer must achieve and live by. This approach gets the job done slowly and with a very high immediate cost. Abstracting for the sake of the art of programming is a very expensive path to take; the more formally the code is written, the more expensive writing it gets. A company, for better or worse, has to deliver products, and has limited time, money and people with which to do it, so art is unlikely to be its goal.
This section describes how to deal with the logic of lean code.
A pattern is a specific type of solution for a specific type of problem.
Two types of patterns can be used to create reliable, secure and delivered on time products: cheap and expensive.
"Pattern" means something that's used most of the time, so a pattern that should not be used is a pattern that should not be used as a habit, but may be used from time to time, when it brings something of value.
A product's lean development makes use of the cheap patterns, but not of (all) the expensive ones. This does result in a reliable, secure, and delivered on time product, but may also result in an increase of the number of bugs found throughout the lifetime of the product.
Using the expensive patterns (except unit tests) increases the lean development time 2 to 3 fold. Comprehensive unit tests alone increase the lean development time 3 fold. The advantage of all the expensive patterns is a reduction of the number of bugs found throughout the lifetime of the product. The main disadvantage is a much longer development time.
Pessimistic versus optimistic pattern
Pessimistic and optimistic patterns can be used, even combined in various proportions, as a way to handle the potential bugs depending on their perceived importance at the start of the development of a product.
The decision to use mostly a pessimistic or an optimistic pattern is not only a matter of potentially critical errors lurking in code until they cause problems, but also a matter of development time. Ideally, neither the hardware nor the software should contain bugs, but in practice their complexity and the need to compete in practical rather than idealistic markets always lead to bugs.
The pessimistic pattern is a pattern where bugs are considered to be potentially critical, so most of the development of the product is done so as to ensure that the bugs are detected as soon as possible, and that the code flow stops as soon as a bug is encountered.
An example of this is that most programming platforms throw exceptions when a null reference's content is accessed. However, this example is not enough to indicate that a product is using a pessimistic pattern; it just means that the pessimistic pattern is used to some extent.
The biggest disadvantage of a pessimistic pattern is that the cost required to prevent critical bugs is far greater than the cost required to fix various bugs, because prevention requires a comprehensive handling, throughout the code, even though in many cases it's not necessary, while fixing requires a specific handling, that is, only where it's necessary.
Another disadvantage is that a pessimistic handling of even a single error normally leads to a complete interruption of a feature, because an exception is usually thrown. For example, if a list with database records contains an inconsistency, the entire dialog which shows the list is likely to crash upon loading, or at best not show anything.
So, while errors are detected as soon as it's possible, the side effects are not necessarily desirable.
The optimistic pattern is a pattern where bugs are considered to be mostly non-critical, so most of the development of the product is done so as to minimize the complexity of the product and its development time. This means that validations are made only where it's clearly critical.
However, this means that the bugs are not detected as soon as possible, and that the code flow will not stop as soon as a bug is encountered.
Some people, myself included (at least in some cases), don't like this pattern because it may hide bugs for a long time. But what is so bad about those bugs when compared to the negative impact of features not working in their entirety (as explained in the pessimistic pattern)? How would users feel about not being able to edit a list of employees because there is an inconsistency in a single record? Surely, they would want to be able to edit everyone whose records are consistent. This is why it's better to recover the code flow than to just say "it's a bug which must be detected immediately" and throw an exception.
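The employee list scenario can be sketched as below. This is only a minimal illustration: the "EmployeeRecord" class, the consistency rule (a positive ID and a non-empty name) and the "log" callback are assumptions made for the example, not part of any real application.

```csharp
using System;
using System.Collections.Generic;

public class EmployeeRecord
{
    public int Id;
    public string Name;
}

public static class EmployeeList
{
    // Hypothetical consistency rule: a positive ID and a non-empty name.
    public static bool IsConsistent(EmployeeRecord record)
    {
        return record != null && record.Id > 0 && !string.IsNullOrEmpty(record.Name);
    }

    // Optimistic handling: inconsistent records are skipped and reported,
    // so the user can still edit every consistent record.
    public static List<EmployeeRecord> KeepConsistent(IEnumerable<EmployeeRecord> records, Action<string> log)
    {
        var usable = new List<EmployeeRecord>();
        foreach (var record in records)
        {
            if (IsConsistent(record))
                usable.Add(record);
            else
                log("Skipped inconsistent employee record with ID " + (record == null ? "(null)" : record.Id.ToString()));
        }
        return usable;
    }
}

public static class Program
{
    public static void Main()
    {
        var records = new List<EmployeeRecord>
        {
            new EmployeeRecord { Id = 1, Name = "Ann" },
            new EmployeeRecord { Id = 2, Name = null }, // inconsistent record
            new EmployeeRecord { Id = 3, Name = "Bob" },
        };
        var usable = EmployeeList.KeepConsistent(records, Console.WriteLine);
        Console.WriteLine(usable.Count + " of " + records.Count + " records can be edited");
    }
}
```

A pessimistic version would throw at the first inconsistent record, and the dialog would show nothing at all.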
This pattern should really be used only together with code flow tracing which can be activated at runtime to show very extensive information. Code flow tracing should have an option to output the stack trace, even when there are no exceptions, because simple trace messages from various methods only show that those methods were called, not by whom they were called.
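A possible shape for such a tracing helper is sketched below. The "FlowTrace" class and its two switches are hypothetical names; a real application would load the switches from a configuration option. The sketch relies on "Environment.StackTrace", which returns the current call stack even when no exception exists, and assumes the TRACE symbol is defined so that the "Trace.WriteLine" call is compiled in.

```csharp
using System;
using System.Diagnostics;

public static class FlowTrace
{
    // Hypothetical switches, meant to be changed at runtime.
    public static bool Enabled = false;
    public static bool IncludeStackTrace = false;

    // Builds the trace text; the stack trace shows by whom the method was called.
    public static string Format(string message, bool includeStackTrace)
    {
        return includeStackTrace
            ? message + Environment.NewLine + Environment.StackTrace
            : message;
    }

    public static void Write(string message)
    {
        if (!Enabled) return;
        Trace.WriteLine(Format(message, IncludeStackTrace));
    }
}

public static class Program
{
    public static void Main()
    {
        FlowTrace.Enabled = true;
        FlowTrace.IncludeStackTrace = true;
        FlowTrace.Write("Loading the employee list");
    }
}
```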
In most applications, the majority of the scenarios will not be critical, so avoiding a pessimistic pattern will lead to great development time savings. It's always quicker to decide whether or not a scenario is critical than to write code to handle critical scenarios everywhere.
Handling of invalid states
What should the programmer do when a method receives a null parameter? Sometimes a null is used to represent a valid state, sometimes an invalid one, and sometimes an ambiguous one when it's not clear whether it represents a valid or invalid state.
Should the code throw exceptions when it finds invalid states, or should it recover the flow by returning some default value? Throwing an exception as soon as an invalid state is discovered leads to a more secure product, but recovering the flow allows the programmers to more easily extend the existing data structures in the future, because the recovery code is always simpler than the data validation code since it doesn't need to keep track of the flow context, and doesn't need messages to be thought of and typed.
For example, it's easier to write "if( x == null ) return null" than "if( x == null ) throw new CustomException( "x has an invalid state" )", where "CustomException" must be declared as a class in a file, and the exception message may have to be put in resources and be translated. This example may seem simple, but the required decisions and typing add up to a huge slice of the development time. Complexity is generally not an issue in programming, time is!
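Written out in full, the two variants look roughly like this. The "NameFormatter" class and its method names are made up for the illustration; "CustomException" is the class mentioned above, which the throwing variant has to declare.

```csharp
using System;

// Needed only by the throwing variant.
public class CustomException : Exception
{
    public CustomException(string message) : base(message) { }
}

public static class NameFormatter
{
    // Recovery: the invalid input is converted into a harmless default.
    public static string TrimOrNull(string x)
    {
        if (x == null) return null;
        return x.Trim();
    }

    // Validation: the invalid input stops the flow immediately.
    public static string TrimOrThrow(string x)
    {
        if (x == null) throw new CustomException("x has an invalid state");
        return x.Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(NameFormatter.TrimOrNull(null) == null); // the flow continues
        try
        {
            NameFormatter.TrimOrThrow(null);
        }
        catch (CustomException e)
        {
            Console.WriteLine(e.Message); // the flow was interrupted
        }
    }
}
```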
Data initialization plays a big part in the handling of invalid states.
One pattern is to always create the data in a valid state, either by using constructors with parameters, or by loading the data from a data store (including through deserialization). However, this requires heavy discipline (from the programmers) and is dependent on the capabilities of the programming platform.
For example, if the deserializer included in the programming platform requires parameterless constructors to be public, despite them not being able to create valid objects, programmers will end up using them directly in code and forgetting to properly initialize all the data in all the cases, especially when objects are extended, which leads to hard-to-detect causes of errors. If the deserializer did not require parameterless constructors to be public, this pattern would become far more reliable.
Another pattern is to not initialize the data, but to always attempt to detect and recover from invalid and ambiguous object states. In this case the cost is borne by the detection and recovery code, code which quickly becomes tedious.
This pattern would likely be the best to follow if the compilers were to provide support for it, in the sense that no exceptions would be thrown when evaluating and accessing null objects. For example, in modern languages, evaluating the "a.b.c" expression throws an exception if either "a" or "b" is null (but not if "c" is null). If this did not happen and the expression simply returned null instead (or a default value for scalars), the programmer would be able to avoid checking for nulls inside methods.
If a default value for scalars is deemed not acceptable, it could be possible (for languages where the types are determined during compilation), for example, to (always require to) write scalar expressions like "a.b.c ?? -1" which means that "-1" would be returned if either "a" or "b" is null.
This null evaluation feature is, in a way, available in C# as the "?." (null-conditional) operator. Unfortunately, this operator still leads to complex expressions because, when used with scalars, it returns a nullable scalar, and this requires special handling in code which expects a simple scalar.
A truly interesting pattern is that of non-nullable reference types, where the compiler provides support for passing around data which may not be null; the compiler would enforce this during compilation. Sadly, for .Net, for example, the people who work on it have said that it's too complex to introduce now. A possible current solution is to use NullGuard.
An interesting pattern is one which supports the opinion of Tony Hoare, who introduced null references in modern programming languages, that "[null reference] has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years."
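A small sketch of how this looks in C#, combining "?." with the "??" operator. The "Outer" and "Inner" classes are invented for the example, and the field names mirror the "a.b.c" expression used above.

```csharp
using System;

public class Inner
{
    public int c = 7;
}

public class Outer
{
    public Inner b;
}

public static class NullEvaluation
{
    public static int ReadOrDefault(Outer a)
    {
        // With "?." the scalar becomes a nullable scalar (int?),
        // which is null when either "a" or "b" is null.
        int? maybe = a?.b?.c;

        // "??" converts it back to a simple scalar, with -1 as the fallback.
        return maybe ?? -1;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(NullEvaluation.ReadOrDefault(null));
        Console.WriteLine(NullEvaluation.ReadOrDefault(new Outer()));
        Console.WriteLine(NullEvaluation.ReadOrDefault(new Outer { b = new Inner() }));
    }
}
```

The "int?" intermediate is the special handling mentioned above: code which expects a plain "int" can't use the result of "a?.b?.c" directly.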
The pattern is to never use nulls in code, but to always initialize the data with empty objects, by using the (public) parameterless constructor, and to always return empty objects instead of nulls from methods.
An empty object is an object which may be constructed with a parameterless constructor, and which initializes all its nullable fields with empty objects; such objects are likely to be in an invalid state after construction. Scalar objects don't require manual initialization, but keep in mind that numbers will be 0 and enumeration values will also be 0 (which usually means the first value of the enumeration type). Texts / strings should be initialized with empty texts. Arrays / lists should be initialized with arrays / lists with no items.
The empty object pattern is not the same as the null object pattern, where there are some derived classes and possibly singletons with a null / special meaning; the empty object pattern lacks the development overhead of the null object pattern.
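As a minimal sketch of the empty object pattern, consider the classes below. "Employee", "Address" and "EmployeeStore" are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;

public class Address
{
    public string City = ""; // texts are initialized with empty texts
}

public class Employee
{
    public int Id; // scalars need no manual initialization: 0 by default
    public string Name = "";
    public Address Address = new Address(); // nullable fields get empty objects
    public List<string> Phones = new List<string>(); // lists start with no items
}

public static class EmployeeStore
{
    // Returns an empty object instead of null when nothing is found.
    public static Employee FindById(IEnumerable<Employee> employees, int id)
    {
        foreach (var employee in employees)
        {
            if (employee.Id == id) return employee;
        }
        return new Employee();
    }
}

public static class Program
{
    public static void Main()
    {
        var found = EmployeeStore.FindById(new List<Employee>(), 42);
        // No null checks are needed anywhere along the chain.
        Console.WriteLine(found.Address.City.Length);
        Console.WriteLine(found.Phones.Count);
    }
}
```

Note that the object returned for a missing employee is in an invalid state (its ID is 0), yet nothing in the calling code signals this.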
The empty object pattern makes null checking code unnecessary, though the infrastructure is not necessarily built this way, so there will still have to be some null checking code where interfacing with the infrastructure.
Empty objects will not crash an application with automatic null reference exceptions, but they may still crash it due to exceptions thrown when invalid object states are manually detected.
While some memory is going to be allocated for the empty objects, it's not important enough to matter. Also, while this allocation is going to take a bit of processor time, this is going to be offset by the lack of all null checking "if" statements; the same applies for the execution of code which uses empty objects: it's just too rare to matter in the bigger picture.
The significant problem of this pattern is its habit of passing around invalid data as if it were valid, thus introducing hard-to-detect causes of errors. Logging may help to detect the causes of such errors, but it's important to note that the code and data stores should be forgiving of such invalid data. For example, an application should not crash if empty objects are loaded from the database and shown in a list, to the user.
This pattern detects coding errors with a significant delay. If null reference exceptions are thrown, the coding errors are detected as soon as the execution flow passes through that code, but if all class objects are always initialized (possibly with objects in invalid states) then the coding errors will not be detected when the execution flow passes through code which uses them (because there is no sign to indicate a coding error).
It's possible to introduce in each class a property which specifies whether the object is in a valid state or not, but the management of this property ends up becoming as costly as validating the object states all the time.
But the important question here is: there is a coding error, so what? That is, what is the importance of the error, what are the consequences? The thing is, most of the time, in most applications, errors have non-destructive consequences, like saving an empty object in the database, loading it later and showing the user an empty line in a list.
On the other hand, when a coding error is triggering an exception, the consequences are clear: the functionality can't be used. And if this exception is triggered due to inconsistent database records, the application may become unusable since the (home) user may be unable to ever correct the data inconsistency.
An example of a critical scenario is the case of a method which deletes files based on a filter. A programmer surely doesn't want to pass an empty object as a filter (due to a coding error upstream) which may allow the method to delete all the files from the user's hard drive.
In the case of a method which saves the content of a file, a programmer surely doesn't want to pass an empty object as content (due to a coding error upstream) which may allow the method to overwrite / delete the file's content, or to tell the user that it has saved the file's new content though it has not.
In the case of a method which encrypts some data with a key which is an empty byte array, the method should throw an exception or return an empty byte array. This avoids exposing unencrypted sensitive data. However, care should be taken because if the original data is erased and the saved encrypted data is an empty byte array, the original data may be lost; to prevent this, a (partial) decryption may be required in order to check that the data can be retrieved later.
Critical scenarios are rare and the programmer must only then take steps to guarantee that such bugs don't occur. In the cases above, the programmer should never allow files to be deleted or overwritten when receiving an empty text / string input parameter. Also, in general, in such cases, the programmer should explicitly check that a flag field in the input parameter object indicates the validity of the object's state; this flag must be explicitly set only after all the preparation steps have been completed. Considering that empty texts / strings and the integer 0 are never used as valid file names or database IDs, they can be safely used to initialize fields from empty objects, since they will not cause a program to do anything destructive with them.
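The file deletion scenario, protected with an explicit validity flag, could look like the sketch below. "FileFilter", its "IsPrepared" flag and "FileCleaner" are assumed names, and the filter is simplified to a file extension. To keep the example harmless, the method only selects the files which would be deleted; it doesn't delete anything.

```csharp
using System;
using System.Collections.Generic;

public class FileFilter
{
    public string Extension = "";

    // Must be set explicitly, only after all the preparation steps have been completed.
    public bool IsPrepared;
}

public static class FileCleaner
{
    // Selects nothing when the filter is empty or was never marked as prepared.
    public static List<string> SelectFilesToDelete(IEnumerable<string> fileNames, FileFilter filter)
    {
        var selected = new List<string>();
        if (filter == null || !filter.IsPrepared || string.IsNullOrEmpty(filter.Extension))
            return selected;

        foreach (var name in fileNames)
        {
            if (name.EndsWith(filter.Extension, StringComparison.OrdinalIgnoreCase))
                selected.Add(name);
        }
        return selected;
    }
}

public static class Program
{
    public static void Main()
    {
        var files = new[] { "a.tmp", "b.txt", "c.tmp" };

        // An empty filter, produced by a coding error upstream, selects no files.
        Console.WriteLine(FileCleaner.SelectFilesToDelete(files, new FileFilter()).Count);

        var prepared = new FileFilter { Extension = ".tmp", IsPrepared = true };
        Console.WriteLine(FileCleaner.SelectFilesToDelete(files, prepared).Count);
    }
}
```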
In the end, this pattern also requires a heavy discipline (for the programmers), as described below:
Conversion of an invalid state to a valid state
Calling "v = f( i )", where "i" is in an invalid state (like a null), leads to "v" being in a perfectly valid state in an optimistic pattern where "f" doesn't check the validity of "i".
So how can the programmer protect critical scenarios, like deleting all the files if the input file name parameter is empty, from such conversions? The first and simplest way is to not perform critical operations when the input parameters are empty. In this case, since the input file name is null or empty, the programmer must not delete anything.
In this case, there is no need to throw an exception, but if some data were to be saved in a file with an empty name, an exception should be thrown or else the user would likely not know that the data can't be retrieved. The two approaches are different because the potential consequences of these cases are fundamentally different: in the first case a file may remain stored on the hard drive, in the second case some data may be lost.
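The difference between the two cases can be sketched as follows. The "FileOperations" class and its method names are assumptions made for the example.

```csharp
using System;
using System.IO;

public static class FileOperations
{
    // Deleting: with an empty name nothing is done and no exception is thrown,
    // because the worst consequence is a file which remains on the drive.
    public static bool TryDelete(string fileName)
    {
        if (string.IsNullOrEmpty(fileName)) return false;
        File.Delete(fileName);
        return true;
    }

    // Saving: with an empty name an exception is thrown,
    // because otherwise the data would be silently lost.
    public static void Save(string fileName, string content)
    {
        if (string.IsNullOrEmpty(fileName))
            throw new ArgumentException("The file name is empty, so the data can't be saved.", nameof(fileName));
        File.WriteAllText(fileName, content);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(FileOperations.TryDelete("")); // nothing is deleted
        try
        {
            FileOperations.Save("", "important data");
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```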
This optimistic pattern means that the programmer must make decisions and write specific code only for critical scenarios, and since most scenarios are not critical, the savings in decision making and development time are huge for most applications.
Aim to write code using structure, isolation and inheritance patterns. If you can do this in object-oriented, typed languages, it's even better.
Concentrate on writing code which can be later refactored, on logging and on descriptive errors.
It would perhaps be better to use nulls (as a presence indicator, instead of a separate boolean value).
Avoid using any of the patterns below because they require the programmer to constantly add code to support the pattern, and this means that the main benefits of the optimistic pattern, simplicity and frugality, disappear, making it as time consuming as a pessimistic pattern:
Try to convert generic dictionary (and list) declarations to classes which derive from (or contain) the generic dictionary (and list) declarations. This allows you to declare (and therefore name) fields for both the key and the value which make the dictionary items, so that anyone can easily understand what they mean. This is especially useful for chained generic dictionaries. You can also add inside methods specific to the dictionary's content.
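For instance, a "Dictionary<int, List<string>>" says nothing about what its keys and values mean, while a derived class can name them. The class and method names below are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// The class name documents both the key (employee ID) and the value (phone numbers).
public class PhoneNumbersByEmployeeId : Dictionary<int, List<string>>
{
    // A method specific to the dictionary's content.
    public void AddPhoneNumber(int employeeId, string phoneNumber)
    {
        List<string> phoneNumbers;
        if (!TryGetValue(employeeId, out phoneNumbers))
        {
            phoneNumbers = new List<string>();
            this[employeeId] = phoneNumbers;
        }
        phoneNumbers.Add(phoneNumber);
    }
}

// A chained dictionary stays readable: department name -> employee ID -> phone numbers.
public class PhoneBooksByDepartmentName : Dictionary<string, PhoneNumbersByEmployeeId>
{
}

public static class Program
{
    public static void Main()
    {
        var phoneBook = new PhoneNumbersByEmployeeId();
        phoneBook.AddPhoneNumber(7, "555-0100");
        phoneBook.AddPhoneNumber(7, "555-0101");
        Console.WriteLine(phoneBook[7].Count);
    }
}
```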
When you add a new feature which may break existing features, like when you add a lot of extra validation and you know that testing is unlikely to cover the changes, add a configuration option to disable the new feature. This option would act as a quick workaround in deployment environments, to revert to the old behavior.
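A minimal sketch of such an option is shown below. The "Settings" class, the "StrictValidationEnabled" option and the quantity limit are hypothetical; a real application would load the option from its configuration file.

```csharp
using System;

public static class Settings
{
    // Hypothetical option; set it to false to revert to the old behavior.
    public static bool StrictValidationEnabled = true;
}

public static class OrderValidator
{
    public static bool IsAcceptable(int quantity)
    {
        // Old behavior, always active.
        if (quantity <= 0) return false;

        // New, extra validation, which can be disabled in a deployment environment.
        if (Settings.StrictValidationEnabled && quantity > 1000) return false;

        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OrderValidator.IsAcceptable(5000)); // rejected by the new validation
        Settings.StrictValidationEnabled = false;
        Console.WriteLine(OrderValidator.IsAcceptable(5000)); // accepted, as before the change
    }
}
```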
This section describes how to actually write lean code.
Minimize pressed keys
Every extra key that is pressed takes time. For example, the parentheses in a boolean expression may be unnecessary: "if( (a == b) && (c == d) )" can be written faster as "if( a == b && c == d )".
Group related code
Establishing a hierarchy of solutions, projects, folders, files and classes is a way of keeping related code grouped together.
A file's content should also be grouped together in several areas: constants, private fields and properties, public properties, constructors, private methods, public methods.
Each backing field of a property (C#) should be grouped together with the property, not together with the rest of the private fields. This allows the programmer to more easily refactor the code.
In programming languages where letter capitalization matters, avoid using prefixes like "a" for parameters and "m" for private fields. For the backing fields of properties, use the prefix "_".
As advantages, aside from having fewer keys to press, this will also isolate the backing fields from the rest of the identifiers in IntelliSense.
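A short sketch of this layout, with a hypothetical "Person" class:

```csharp
using System;

public class Person
{
    // An ordinary private field: no prefix.
    private int loadCount;

    // The backing field sits right next to its property and uses the "_" prefix.
    private string _name = "";
    public string Name
    {
        get { loadCount++; return _name; }
        set { _name = value ?? ""; }
    }

    public int LoadCount
    {
        get { return loadCount; }
    }
}

public static class Program
{
    public static void Main()
    {
        var person = new Person();
        person.Name = "Ann";
        Console.WriteLine(person.Name);
        Console.WriteLine(person.LoadCount);
    }
}
```

Moving the property to another class then means moving one contiguous piece of code.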
Regions, used as a pattern to separate file sections, force the programmers to constantly expand (and collapse) them in order to see within them. It's preferable to avoid them, to organize the code into files so that each file is kept relatively small, and to use the programming environment to navigate to the definition of the identifiers that need to be investigated.
Instead of comments, use long, descriptive identifiers for variables, methods, classes, files, folders, projects and solutions. IntelliSense will help to avoid wasting time typing long identifiers. Renaming identifiers will also be easier because the programming environment will automatically replace them in all the places where they are used.
Comments would then be mostly repetitions of the descriptive identifiers. If you still think that comments are necessary, it's likely that you actually feel the need for a glossary documentation which explains the specific terminology, and perhaps for an architecture documentation.
There certainly are some areas where the local architecture and choices could be explained in the code, but beyond such cases the comments become repetitions.
Data transfer objects
Avoid using DTOs because they require large amounts of time to be written, and quickly become a synchronization nightmare. The programmer should reference the output DLLs of all the data structure projects that are used, or use autogeneration features like Visual Studio's WCF-proxy or web-proxy generator, or the Entity Framework class or database generator.
Comprehensive unit tests increase the development time 3 fold. This increase exists because the code's architecture must be much more flexible in order to allow testing (especially for mocking data), but also because the tests have to be thought of and written.
For example, a company boasted of having 2 lines of testing code for each line of production code (that is, three times as much code in total). Considering the scope of the project and the manpower involved, the estimated 3 fold increase is correct.
When deciding whether to use or not use unit tests, it's important to understand where critical errors usually occur. Unit tests can detect if a feature no longer works after code refactoring.
However, critical errors are likely to occur in code which can't be tested with unit tests, in cases which are hard to reproduce and hard to debug, like multithreading, error handling, client-service operations, or database operations which are supposed to affect only a subset of data but may affect much more (yet nobody tests what data was NOT affected). These cases require tracing and integration tests which test real world scenarios.
Also consider how countries are defended: are there walls built around them, or are people sent where they are needed? Some countries have tried to build walls, and the main examples are the Great Wall of China, the German Hindenburg Line, and the French Maginot Line. All these were abandoned because their use was limited. They've worked for their original purpose, but quickly became obsolete because other factors have proved to be more important, other weaknesses have been exploited by attackers.
Unit tests are the most expensive programming pattern, and one whose benefits are limited to ensuring that refactoring doesn't break the tests. Cost wise, integration tests are far more effective because just a few tests can test how large amounts of code work in real environments, don't require a special way of writing code (only entry points are needed), and can be used by developers, testers and API integrators.
Avoid checking for errors that the infrastructure also checks
When the code doesn't check for errors that the infrastructure also checks, it's true that the infrastructure's exceptions give fewer details about what has happened, for example which accessed variable is null, but this can be compensated for in the bigger context with logging (which includes the stack trace on demand) and issue reproduction steps.
As a matter of fact, logging is good not only for saving the stack trace in the case of exceptions, but also for following the application's flow in case a detailed view of this flow is required to see what happens when errors and odd events occur.
Detailed logging should be available only when it's manually activated from an option. The programmer should not rely on a debug build for detailed logging. It has to be possible for the user to activate and deactivate this at any time (even if it's in a configuration file). In environments such as .Net, by using specific methods for detailed logging it's possible to later remove them from the build by applying to them the "[Conditional( "DEBUG" )]" conditional compilation attribute.
The detailed logging mechanism should send messages to the system's default listener (which can be seen with DebugView), and (possibly) also to a file (on the computer where the application runs, not remotely, for simplicity).
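A sketch of such a logging mechanism, under a few assumptions: the "DetailedLog" class and its members are hypothetical names, the "Enabled" option would in practice be read from a configuration file, and the TRACE symbol is assumed to be defined (otherwise the calls to "Trace.WriteLine" are omitted by the compiler).

```csharp
using System;
using System.Diagnostics;

public static class DetailedLog
{
    // Hypothetical runtime option, which the user can change at any time.
    public static bool Enabled;

    // Optionally sends the messages to a local file too, besides the default listener.
    public static void AlsoWriteToFile(string path)
    {
        Trace.Listeners.Add(new TextWriterTraceListener(path));
        Trace.AutoFlush = true;
    }

    // Returns true when the message was actually sent to the listeners.
    public static bool Write(string message)
    {
        if (!Enabled) return false;
        Trace.WriteLine(DateTime.Now.ToString("o") + " " + message);
        return true;
    }

    // Calls to this method are omitted by the compiler unless DEBUG is defined.
    [Conditional("DEBUG")]
    public static void WriteDebugOnly(string message)
    {
        Write(message);
    }
}

public static class Program
{
    public static void Main()
    {
        DetailedLog.Write("not logged, the option is off");
        DetailedLog.Enabled = true;
        DetailedLog.Write("logged, the option is on");
        DetailedLog.WriteDebugOnly("logged only in builds where DEBUG is defined");
    }
}
```

Note that "[Conditional]" can only be applied to methods which return void, which is why "WriteDebugOnly" doesn't return the result of "Write".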
Patterns which improve reliability
Validate the inputs (if the underlying framework doesn't).
Try to use immutable data structures, as close as they can be to immutability.
Immediately stop a flow which causes an exceptional case, and go into a default, safer mode.
Test where exceptions go, test if they stop faulty flows, test if those flows can be safely restarted.
Test the multithreading under stress.
Ask yourself what you will do when something does go wrong. Add extensive logging to help with debugging.
Write things like "if(index <= 0)" instead of "if(index == 0)" in case something does go wrong (here with a normally positive index), even if it appears impossible.
Duplicate the data processing and compare the results before deciding what to do.
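The last item of the list, duplicating the data processing and comparing the results, can be sketched with a small generic helper. The "Redundant" class and the "TryComputeTwice" method are invented names; the two computations should ideally be implemented independently of each other.

```csharp
using System;
using System.Collections.Generic;

public static class Redundant
{
    // Runs two implementations of the same processing and reports whether they agree.
    public static bool TryComputeTwice<T>(Func<T> primary, Func<T> secondary, out T result)
    {
        T first = primary();
        T second = secondary();
        result = first;
        return EqualityComparer<T>.Default.Equals(first, second);
    }
}

public static class Program
{
    public static void Main()
    {
        int[] amounts = { 10, 20, 30 };

        int total;
        bool trusted = Redundant.TryComputeTwice(
            () => { int sum = 0; foreach (int a in amounts) sum += a; return sum; },
            () => { int sum = 0; for (int i = amounts.Length - 1; i >= 0; i--) sum += amounts[i]; return sum; },
            out total);

        // Only act on the result when both computations agree.
        Console.WriteLine(trusted ? "Total: " + total : "The results differ, going into the safer mode");
    }
}
```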
Perhaps some people believe by now that reducing the number of pressed keys is a goal. However, the goal is to allow the programmers to concentrate on the architecture and logic of the code, not waste time with actions which wear down their creativity and lead to a desire to use shortcuts instead of high quality patterns.
Writing code is not a goal in itself, it's not an art. Its purpose is a reliable, secure, practical, delivered on time product. Increasing the development time, and consequently the cost, several times in order to remove a few bugs is not practical.
Any pattern has advantages and disadvantages, and in the end they all shift costs around, emphasizing them either during the development phase or throughout the lifetime of the product.
No matter what pattern the programmers choose, they must be prepared for the aftermath of potential bugs, and this is best done with comprehensive logging and working as detectives to find the cause of the bugs.
Evaluating people's skills
Pick a number of skills.
Ask people to rate themselves for each skill from 0 to 10, with an average of 5, taking into consideration that the sum of the ratings must be less than or equal to X (= 6 * the number of skills).
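The rule is simple enough to check automatically. In the sketch below, the "SkillRatings" class and its method names are assumptions; the limits are the ones stated above.

```csharp
using System;
using System.Linq;

public static class SkillRatings
{
    // X = 6 * the number of skills.
    public static int Budget(int numberOfSkills)
    {
        return 6 * numberOfSkills;
    }

    // Each rating must be from 0 to 10, and the sum must not exceed the budget.
    public static bool AreValid(int[] ratings)
    {
        return ratings.All(r => r >= 0 && r <= 10)
            && ratings.Sum() <= Budget(ratings.Length);
    }
}

public static class Program
{
    public static void Main()
    {
        // Four skills give a budget of 24 points.
        Console.WriteLine(SkillRatings.AreValid(new[] { 9, 8, 4, 3 }));  // sum 24: within the budget
        Console.WriteLine(SkillRatings.AreValid(new[] { 9, 9, 9, 9 }));  // sum 36: over the budget
    }
}
```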
If you have skills of different importance, you can group the skills by importance, but ensure that each group has its own total rating (calculated in the same way). You can also group skills by domain: general and specific.
The purpose of this test is to see what people believe are their strong and weak parts, and if they can manage scarce resources.
After evaluating the ratings and seeing who fits in the needed (job) profile, the people can be asked questions specific to the skills for which they have rated themselves highly.
In order to obtain the most accurate results, tell people all these things.
Examples of skills:
Personal experience highlight
I have developed, over the course of 8 years, in about 2500 hours, an instant messenger application which brings together the following technologies: .Net, WPF, C#, OOP, socket level networking, asynchronous client-server architecture based on request-response messaging with retry mechanism and database storage, (de)serializer based on reflection (which allows a private parameterless constructor), object relational mapper similar to Entity Framework to (de)serialize objects from / to the database, advanced cryptography.
The patterns used for the development of the messenger include the cheap and the expensive patterns, except for unit tests. A pessimistic handling of all scenarios has been used. Objects are always created in a valid state by using constructors with parameters or validated deserialization. The main reason for which these patterns have been employed and maintained is that this was a security related project in which bugs may have severe consequences.
The used expensive patterns have roughly doubled the lean development time.
The number of bugs found 8 months after release was 24, the most important of which would not have been detected by unit tests because the affected features were either working properly but were also affecting other data, or were logical flaws, not data state mishandling.
Published on 28.08.2014