Software contains bugs - this truth has held up ever since the inception of programming. With the ever-increasing complexity of software development today, there are so many possible causes of errors that it is infeasible for any one person to imagine everything that could go wrong with every change to code or infrastructure. To produce software with confidence, companies need to use multiple types of testing to ensure the quality, correctness and stability of their products.
Static analysis
Often overlooked, static analysis is a type of software testing that works solely by inspecting the source code, without actually executing any of it. Static analysis is mainly used to find common errors made by developers in a rush, like typos in variable names, null pointer dereferences, uninitialized arrays and so on. It can find performance-related issues like memory leaks or scope issues, and even security exploits like SQL injections. Running static analysis on source code is extremely cheap compared to other testing types, and it is often integrated directly into developer workflows, for example combined with a linting or code formatting pipeline.
It requires no further action from developers to get started and can be fully automated, but false positives and the configuration adjustments they require can initially increase development time.
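To illustrate, here is a minimal Go snippet containing the kind of mistake a static analyzer catches without ever running the program; `go vet`, for instance, flags the mismatched format verb in the `Printf` call (the `process` function itself is just a made-up example):

```go
package main

import "fmt"

// process prints every item with its index. The Printf call contains a
// classic rushed-developer mistake: the second %d expects an integer, but
// items[i] is a string. `go vet` reports this without executing the code.
func process(items []string) {
	for i := range items {
		fmt.Printf("item %d: %d\n", i, items[i]) // BUG: second %d should be %s
	}
}

func main() {
	process([]string{"apple", "banana"})
}
```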
Unit testing
Modern software development is done by groups of teams working on the same codebase simultaneously, meaning a function written once may be changed any number of times by different developers over its lifetime. It is easy to overlook edge case scenarios or expected behavior when adjusting a function to account for new features or requirements, especially when that change seems minor or needs to be done within a deadline.
Unit testing helps mitigate these issues by letting developers write tests for their functions to ensure that they behave correctly. For example, a developer writing a function to validate email addresses would add several unit tests ensuring that valid addresses are accepted and invalid ones are rejected. If another developer later needs to adjust the function, the tests remain in place and alert them if the new version changes any of the previous assumptions or behaviors.
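A minimal sketch of what that could look like in Go, assuming a hypothetical `ValidateEmail` function; the deliberately simplistic implementation is only there to make the example self-contained, and the test would normally live in a separate `_test.go` file:

```go
package validate

import (
	"strings"
	"testing"
)

// ValidateEmail is a deliberately simplistic placeholder; a real validator
// would be much stricter.
func ValidateEmail(addr string) bool {
	at := strings.Index(addr, "@")
	dot := strings.LastIndex(addr, ".")
	return at > 0 && dot > at+1 && dot < len(addr)-1
}

// TestValidateEmail documents the expected behavior: valid addresses are
// accepted, invalid ones are rejected. If a later change breaks one of these
// assumptions, the test fails and alerts the developer.
func TestValidateEmail(t *testing.T) {
	cases := []struct {
		addr string
		want bool
	}{
		{"user@example.com", true},
		{"user.name@sub.example.org", true},
		{"", false},
		{"no-at-sign.example.com", false},
		{"user@", false},
		{"@example.com", false},
	}
	for _, c := range cases {
		if got := ValidateEmail(c.addr); got != c.want {
			t.Errorf("ValidateEmail(%q) = %v, want %v", c.addr, got, c.want)
		}
	}
}
```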
Unit testing is a great way to enable teams of developers to move fast and change existing code with confidence, but it does come at a significant cost in terms of productivity: every developer now not only needs to write the logic needed by the application, but also many lines of code to verify it works as expected. The upfront cost of introducing unit testing to an application is quite high, but it makes maintaining the application easier in the long run. Test quality also hinges on the developers themselves, as only the cases they think to test are verified; if a developer doesn't consider an edge case, the software can still break after changes.
When code depends on external resources (databases, APIs, storage services etc.), unit testing needs to substitute fake versions of them, called mock implementations. These mocks have stark downsides: they do not fully validate that the code works with a real dependency (an SQL mock, for example, does not guarantee the code works against a real SQL database), and writing or setting up mocks for a unit test can be a lot of work depending on the extent of the mocked behavior. Tests that rely heavily on mocking are often better served by integration tests instead, reducing developer workload while increasing test accuracy.
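As a sketch of how mocking typically looks in Go, assuming a hypothetical `Store` dependency and `CreateUser` function: the hand-written mock records calls instead of talking to a database, which keeps the test fast but, as noted above, proves nothing about real database behavior.

```go
package user

import (
	"errors"
	"testing"
)

// Store is the dependency the code under test relies on; in production it
// would be backed by a real database.
type Store interface {
	SaveUser(name string) error
}

// CreateUser contains the logic we actually want to unit test.
func CreateUser(s Store, name string) error {
	if name == "" {
		return errors.New("name must not be empty")
	}
	return s.SaveUser(name)
}

// mockStore is a hand-written mock: it records calls in memory instead of
// persisting anything.
type mockStore struct {
	saved []string
	err   error
}

func (m *mockStore) SaveUser(name string) error {
	m.saved = append(m.saved, name)
	return m.err
}

func TestCreateUser(t *testing.T) {
	m := &mockStore{}
	if err := CreateUser(m, "alice"); err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if len(m.saved) != 1 || m.saved[0] != "alice" {
		t.Errorf("expected one saved user %q, got %v", "alice", m.saved)
	}
	if err := CreateUser(m, ""); err == nil {
		t.Error("expected an error for an empty name")
	}
}
```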
This type of testing also does nothing to catch logical errors caused by the interaction of multiple functions or external dependencies, thus making unit testing insufficient as the only type of testing.
Performance testing
There are three primary types of performance testing, focusing on validation, measurement and efficiency, respectively.
The first type validates that the software performs as expected under known conditions. A classic example is using tools like Valgrind to observe a program at runtime and ensure it does not leak memory over time. Other checks can be used as well, for example to ensure an application stays within a certain processing budget or to verify caching mechanisms. Another kind of validation in this category is regression testing, ensuring that a new version of the software does not unexpectedly degrade performance compared to the previous one.
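Performance regression testing can be as simple as a benchmark that runs on every release. The sketch below uses Go's built-in benchmarking against a made-up in-memory cache; comparing the `go test -bench=.` output between versions (for example with benchstat) makes unexpected slowdowns visible.

```go
package cache

import (
	"sync"
	"testing"
)

// Cache is a minimal concurrency-safe key/value store, included only to make
// the benchmark self-contained; any real component could be measured this way.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func New() *Cache { return &Cache{data: make(map[string]string)} }

func (c *Cache) Put(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

func (c *Cache) Lookup(k string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[k]
	return v, ok
}

// BenchmarkLookup measures the cost of a single cache hit. Comparing results
// across releases turns performance regressions into visible numbers.
func BenchmarkLookup(b *testing.B) {
	c := New()
	c.Put("user:42", "alice")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, ok := c.Lookup("user:42"); !ok {
			b.Fatal("expected cache hit")
		}
	}
}
```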
Secondly, performance testing includes a category of stress tests to measure how a system performs under heavy or maximum load. Often used to probe the theoretical limits of networked software, stress testing can help identify safe limits for production environments or thresholds for alerts and automatic scaling triggers.
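At its core a stress test is just many concurrent clients and a counter. The following sketch hammers a placeholder endpoint with a fixed number of workers for a fixed duration and reports throughput; dedicated load-testing tools such as k6, wrk or vegeta offer far more detail, but the principle is the same.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const (
		target   = "http://localhost:8080/health" // placeholder endpoint
		workers  = 50
		duration = 10 * time.Second
	)

	var ok, failed int64
	deadline := time.Now().Add(duration)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker sends requests back-to-back until the deadline.
			for time.Now().Before(deadline) {
				resp, err := http.Get(target)
				if err != nil || resp.StatusCode != http.StatusOK {
					atomic.AddInt64(&failed, 1)
					if resp != nil {
						resp.Body.Close()
					}
					continue
				}
				resp.Body.Close()
				atomic.AddInt64(&ok, 1)
			}
		}()
	}
	wg.Wait()

	fmt.Printf("%d ok, %d failed in %s (%.0f req/s)\n",
		ok, failed, duration, float64(ok)/duration.Seconds())
}
```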
Lastly, load testing is very similar to stress testing in that it observes how the application behaves under load, but it focuses on internal behavior rather than measuring throughput. This type of testing ensures that load-related mechanisms like load balancing and distribution work properly, and helps identify bottlenecks or underused resources in distributed systems.
Integration testing
Integration tests take a different approach than the previous testing types, in that they don't know or care what the internal structure of the application looks like. Instead, they simply confirm that a process works as intended and the components within an application work together as expected.
There are two approaches to integration tests: integrated unit tests and end-to-end tests.
Integrated unit tests are very similar to normal unit tests, but are hidden behind build tags or flags and test components against real dependencies rather than mocks. For example, a function creating user accounts wouldn't use a simulated mock database, but would instead connect to a real database and ensure it works as expected and produces the desired results. While slightly more complex to set up in the CI pipeline, this approach has significant advantages over mocked unit tests.
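A sketch of an integrated unit test in Go, hidden behind a build tag so it only runs when the CI pipeline asks for it; the `users` table, the `TEST_DATABASE_URL` variable and the choice of Postgres driver are illustrative assumptions:

```go
//go:build integration

package user

import (
	"database/sql"
	"os"
	"testing"

	_ "github.com/lib/pq" // example choice of Postgres driver
)

// TestCreateUserWithRealDatabase only runs via `go test -tags=integration ./...`
// and expects the CI pipeline to provide a disposable, real database instance.
func TestCreateUserWithRealDatabase(t *testing.T) {
	dsn := os.Getenv("TEST_DATABASE_URL")
	if dsn == "" {
		t.Skip("TEST_DATABASE_URL not set")
	}

	db, err := sql.Open("postgres", dsn)
	if err != nil {
		t.Fatalf("opening database: %v", err)
	}
	defer db.Close()

	// Exercise the real schema instead of a mock.
	if _, err := db.Exec(`INSERT INTO users (name) VALUES ($1)`, "alice"); err != nil {
		t.Fatalf("inserting user: %v", err)
	}

	var count int
	if err := db.QueryRow(`SELECT COUNT(*) FROM users WHERE name = $1`, "alice").Scan(&count); err != nil {
		t.Fatalf("querying user: %v", err)
	}
	if count != 1 {
		t.Errorf("expected exactly one user named alice, found %d", count)
	}
}
```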
End-to-end tests, also called blackbox tests, typically run the entire application and all dependencies like databases and caches in an environment as close to the production deployment as possible, then run tests against it to ensure complete workflows are functional, for example a user logging in, putting products in their shopping cart and checking out. This type of test is performed against a production build of the application (compiled/bundled) and no longer has access to the source code or individual units.
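An end-to-end test only needs a way to reach the deployed system, for example over HTTP. The sketch below assumes a `BASE_URL` environment variable and placeholder `/healthz` and `/login` endpoints; a real suite would continue through the whole shopping-cart workflow described above.

```go
//go:build e2e

package e2e

import (
	"net/http"
	"net/url"
	"os"
	"testing"
)

// TestHealthAndLogin treats the deployed application as a black box: it only
// talks to it over HTTP, with no access to its internals.
func TestHealthAndLogin(t *testing.T) {
	base := os.Getenv("BASE_URL")
	if base == "" {
		t.Skip("BASE_URL not set")
	}

	resp, err := http.Get(base + "/healthz")
	if err != nil {
		t.Fatalf("health check failed: %v", err)
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("health check returned %d", resp.StatusCode)
	}

	// A real suite would continue with adding products to the cart and
	// checking out, asserting on each response along the way.
	resp, err = http.PostForm(base+"/login", url.Values{
		"user":     {"test-user"},
		"password": {"test-password"},
	})
	if err != nil {
		t.Fatalf("login request failed: %v", err)
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Errorf("login returned %d, want %d", resp.StatusCode, http.StatusOK)
	}
}
```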
The aim of integration tests is not to pinpoint why an issue occurs or where it originates from, but rather to serve as a smoke test to catch major issues or show stoppers before deploying a new release to a production environment.
These tests can be further extended by mechanisms like rolling deployments (rollouts), where a new release is gradually exposed to a percentage of live traffic while being monitored for correctness and performance. The share of traffic it receives is slowly increased to 100% as long as it behaves as expected, with the option to immediately revert to the old version if issues arise during the deployment.
Integration tests can vary wildly in scope and complexity. More extensive tests require separate infrastructure dedicated to testing, which introduces overhead for operations personnel and additional cost, and can make them unsuitable for small companies or low-risk products.
Simulation testing
Verifying that a software's logic works as expected under normal conditions is important, but what happens if the environment itself is faulty? Where integration tests confirm that a program works when networking, disks and dependencies are normal, simulation testing intentionally injects faults into these to confirm the application doesn't cause severe issues like data loss or corruption during error conditions.
There are two approaches to simulation testing: deterministic simulation testing and chaos engineering.
Chaos engineering, as the name implies, works by causing chaos in the environment at random: unplugging virtual network connections, turning servers off, injecting packet loss and so on. It is a low-complexity approach to finding problems, but the low cost comes with decreased reliability: there is no guarantee that it finds all existing issues, or within what time frame, which means chaos experiments often have to run over long periods of time to build confidence. Debugging a found issue can also be tricky, because not all chaos engineering approaches make issues reproducible or traceable.
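Fault injection does not have to happen at the infrastructure level to convey the idea. The sketch below wraps a Go HTTP client so that a configurable fraction of requests randomly fails or is delayed; it is a miniature, in-process version of what chaos tools like Chaos Monkey or Toxiproxy do to whole environments.

```go
package chaos

import (
	"errors"
	"math/rand"
	"net/http"
	"time"
)

// FaultyTransport randomly fails or delays outgoing HTTP requests.
type FaultyTransport struct {
	Base     http.RoundTripper
	FailRate float64       // probability of returning an injected error
	MaxDelay time.Duration // upper bound for injected latency
	rng      *rand.Rand
}

func (f *FaultyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if f.rng.Float64() < f.FailRate {
		return nil, errors.New("chaos: injected connection failure")
	}
	if f.MaxDelay > 0 {
		time.Sleep(time.Duration(f.rng.Int63n(int64(f.MaxDelay))))
	}
	return f.Base.RoundTrip(req)
}

// NewFaultyClient wraps the default transport with random failures and delays.
func NewFaultyClient(failRate float64, maxDelay time.Duration) *http.Client {
	return &http.Client{
		Transport: &FaultyTransport{
			Base:     http.DefaultTransport,
			FailRate: failRate,
			MaxDelay: maxDelay,
			rng:      rand.New(rand.NewSource(time.Now().UnixNano())),
		},
	}
}
```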
As a more thorough approach, deterministic simulation testing injects predefined faults intentionally and reproducibly, so the exact steps that led to a found issue can be traced and replayed over and over again during debugging. This approach is the inverse of chaos engineering: it causes much more overhead to create the tests, but saves time when locating and debugging the issues they find. Deterministic simulation testing not only requires that operators think of and implement test scenarios, but also that the application itself is deterministic (behaves the same way every time). Achieving this can be tricky or require large rewrites, because non-deterministic portions need to be adjusted to work with dummy/mock implementations (for example, replacing a cryptographically secure randomness source with a pseudo-random number generator using a static seed).
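One common building block is abstracting the randomness source behind an interface, so production code keeps using secure randomness while simulation runs substitute a seeded pseudo-random generator. A minimal sketch in Go:

```go
package sim

import (
	crand "crypto/rand"
	"io"
	mrand "math/rand"
)

// RandomSource is the abstraction the application reads random bytes from,
// instead of calling crypto/rand directly.
type RandomSource interface {
	io.Reader
}

// ProductionSource uses the operating system's secure randomness.
func ProductionSource() RandomSource { return crand.Reader }

// SimulationSource returns a pseudo-random reader with a fixed seed: every
// run with the same seed produces exactly the same byte stream, which makes
// a failing simulation replayable during debugging.
func SimulationSource(seed int64) RandomSource {
	return mrand.New(mrand.NewSource(seed))
}
```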
Fuzzing
While not applicable to all applications, fuzzing can help some software meet stability requirements with less strain on developer time. It works by generating data, either fully random or pseudo-random, and throwing it at the application under test. Fuzz testing can take many forms, for example generating a million pseudo email addresses within a unit test or hundreds of fake product names for an integration test.
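Go ships with native fuzzing support, so a fuzz target can reuse the `ValidateEmail` sketch from the unit-testing example above; `go test -fuzz=FuzzValidateEmail` then mutates the seed corpus and checks the stated property on every generated input:

```go
package validate

import (
	"strings"
	"testing"
)

// FuzzValidateEmail throws generated strings at the validator. The property
// checked here is intentionally weak ("never accept an address without an @,
// and never panic"); real fuzz targets assert whatever invariants make sense.
func FuzzValidateEmail(f *testing.F) {
	// Seed corpus: interesting inputs the fuzzer mutates.
	f.Add("user@example.com")
	f.Add("")
	f.Add("@@..")

	f.Fuzz(func(t *testing.T, addr string) {
		if ValidateEmail(addr) && !strings.Contains(addr, "@") {
			t.Errorf("accepted %q although it contains no @", addr)
		}
	})
}
```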
Fuzzing has two benefits: first, it cuts down on the time needed to create test scenarios; second, it often exercises edge cases that developers did not think of themselves. It is important to understand that while fuzzing can be a valuable tool for testing software, it is also unreliable: data generated from any kind of randomness means a test is likely not easily reproducible, which can make debugging more difficult. Also noteworthy is that while fuzzing can find edge cases that escaped developer planning, there is no guarantee it does so (or when) - it may randomly generate a problematic test value on the first run, or the 100th, or never at all.
Security testing
Ensuring the security of products is an essential part of the software development and operation pipeline, especially for hosted services or software working with sensitive data (think personal data, medical records, financial transactions, ...).
As initially discussed, static analysis can help spot mistakes very early in the testing cycle just by processing the raw source code. It is a good first step, but nowhere near sufficient on its own to achieve a reasonably secure application.
A second type of security testing involves scanners that are applied directly after certain build steps. A good example of this is Clair, which scans container images for known vulnerabilities. Such vulnerabilities often don't come from the application itself, but rather from dependencies like installed packages or vulnerable base images. Automatically detecting them gives operators a better understanding of the threat surface and lets them decide whether a vulnerability applies to them, whether they can mitigate it at runtime/during operation (for example by blocking certain ports or setting proper permissions), or whether it requires an immediate fix.
As a third option, end-to-end tests can incorporate automated penetration tests by deploying the application to a staging area that mimics the real deployment environment, then running automated security scanners against it. Lynis is a great example of a tool that finds misconfigurations in the resulting Linux operating system, while tools like sqlmap or Metasploit provide more specific exploits that can be run against the staging system. This form of security testing is quite involved and requires a solid understanding of possible attack vectors to set up properly.
Penetration testing with human testers is also used by larger companies, because it provides a much better picture and catches critical vulnerabilities like zero-day exploits or vulnerabilities created by combining several services (databases, caches, DNS services etc.) in unexpected ways. However, the cost of this extensive labor from highly skilled professionals puts it out of budget for small and medium companies.
Finding the right approach to testing software depends heavily on the company, its clients and the type of software being built. A mobile game has vastly different testing, correctness and performance requirements than a database for a financial service provider. The complexity and time spent on testing and test-related tasks like writing unit tests need to be carefully weighed against the possible damage an unidentified issue in the software could cause, and the consequences that may follow (outages, liability claims, data loss etc.).
As software differs, so must its testing pipelines and requirements.