A common coding error in a CrowdStrike Falcon update caused critical system outages around the world starting on Friday, July 19th, 2024. The culprit? A Null Pointer Dereference (also known as CWE-476) in a piece of C++ code that ran with privileged access to the Windows operating system.
X posters offered all kinds of hypotheses about the specific coding errors and the access to Windows that managed to crash systems worldwide to the point of a blue screen. For example, one poster on X wrote, “This is an invalid region of memory for any program. Any program that tries to read from this region WILL IMMEDIATELY GET KILLED BY WINDOWS.” Another suggested the pointer led to a loop that kept crashing the systems, while others called out memory corruption. CrowdStrike itself described the cause as a ‘logic error’ in the execution on Windows, which doesn’t tell us much.
A Null Pointer Dereference is the likely culprit, but perhaps mixed with an uninitialized variable issue (CWE-457). Or, if the memory was clobbered, a buffer overrun might have occurred somewhere else (CWE-121 or CWE-122).
Ultimately, the crash was caused by a faulty line of code. So, how does a faulty line of code make it all the way through the release process without being detected through Static Application Security Testing (SAST), code reviews, unit testing, regression testing and such? And how does a faulty update get rolled out to such a large base of computers without the damaging error being detected in time to cancel the roll-out?
Beyond that, why did this piece of code have access to a privileged part of the Microsoft Windows kernel enabling it to crash the entire machine instead of just the application? I bet that the team at CrowdStrike is now taking a hard look at their software development and roll-out practices.
Since the crash, questions have poured into CodeSecure (the vendor of CodeSonar, a deep Static Application Security Testing solution) asking whether SAST can catch such an error.
Since the code that caused the problem has not been published (yet), this is a difficult question to answer. To detect such problems, a deep SAST solution must perform static analysis across compilation units with abstract execution, meaning that the solution needs to consider all paths through the source code.
Not all SAST solutions have this capability, but CodeSonar, an advanced SAST tool for developers, does. CodeSonar is well known for its abstract execution capability and for the lengths it goes to in calculating all possible paths through the source code to find problems such as dereferences of NULL pointers.
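To make "all paths" concrete, here is a minimal C++ sketch of the kind of path an analysis has to follow. It is not the CrowdStrike code; `Rule`, `find_rule`, and `priority_or` are hypothetical names invented for illustration. In a real code base the lookup and its caller might sit in different compilation units, which is why the null-returning path is easy for a reviewer to miss.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record type, invented for this sketch.
struct Rule {
    const char* name;
    int priority;
};

static const Rule g_rules[] = {{"alpha", 1}, {"beta", 2}};

// Returns a pointer to the matching rule, or nullptr when nothing matches.
// The nullptr return is the path every caller has to account for.
const Rule* find_rule(const char* name) {
    assert(name != nullptr);
    for (const Rule& r : g_rules) {
        if (std::strcmp(r.name, name) == 0) {
            return &r;
        }
    }
    return nullptr;
}

// Unsafe shape (left as a comment so the sketch stays well defined):
//   int p = find_rule("gamma")->priority;  // null dereference on the miss path

// Safe shape: the pointer is checked on every path that reaches the dereference.
int priority_or(const char* name, int fallback) {
    const Rule* r = find_rule(name);
    return (r != nullptr) ? r->priority : fallback;
}

// Small self-check; call it from main() to exercise both paths.
void sketch_checks() {
    assert(priority_or("beta", -1) == 2);    // hit path
    assert(priority_or("gamma", -1) == -1);  // miss path
}
```

A path-sensitive analysis would need to connect the `return nullptr` in one function to the dereference in another; the guarded version removes that path altogether.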
Problematically, most SAST tools need to approximate all of the possible paths and perform calculations on them, while avoiding generating too many false positives (that is, flagged warnings that aren’t actually real problems). Before alerting a developer to a potential problem, the tool needs to develop sufficient confidence that something bad can happen as a result of that potential defect.
The biggest challenge for SAST tools involves balancing False Negatives (problems the tool fails to detect) and False Positives (warnings that aren’t actually problems).
CodeSonar’s advanced SAST capabilities include Null Pointer Dereference checker algorithms, which are extensive enough to flag dereferences of NULL itself and also dereferences of values close to NULL. The coding error in the CrowdStrike case involves the program dereferencing address ‘0x9c’, that is, 156 bytes into what is possibly a struct pointed to by a NULL pointer. CodeSonar uses its configuration value NULL_POINTER_THRESHOLD to determine when to flag a Null Pointer Dereference; its default value is 4096.
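The arithmetic behind "close to NULL" can be sketched in a few lines of C++. This is an illustration under stated assumptions, not CodeSonar's algorithm and not the Falcon data layout: `Record`, `field_address`, `is_near_null`, and `kNearNullThreshold` are hypothetical names, the struct layout is invented so that one field lands at offset 0x9c, and the strict less-than comparison is a guess at how such a threshold might be applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: 156 bytes of earlier fields, then the field of interest.
struct Record {
    unsigned char header[0x9c];
    std::uint32_t flags;
};

static_assert(offsetof(Record, flags) == 0x9c, "flags sits 156 bytes into Record");

// Assumed stand-in for a near-null threshold; 4096 mirrors the default quoted above.
constexpr std::uintptr_t kNearNullThreshold = 4096;

// A field access touches base + offset. With a null base (address 0 on
// mainstream platforms) the touched address is simply the field offset.
constexpr std::uintptr_t field_address(std::uintptr_t base, std::size_t offset) {
    return base + offset;
}

// Treats any address below the threshold as "effectively null".
constexpr bool is_near_null(std::uintptr_t address) {
    return address < kNearNullThreshold;
}

// Guarded access: the field is only read when the base pointer is valid.
std::uint32_t read_flags(const Record* r, std::uint32_t fallback) {
    return (r != nullptr) ? r->flags : fallback;
}

// Small self-check; call it from main() to exercise the sketch.
void sketch_checks() {
    assert(field_address(0, offsetof(Record, flags)) == 0x9c);
    assert(is_near_null(0x9c));
    assert(!is_near_null(kNearNullThreshold));
    Record r{};
    r.flags = 7;
    assert(read_flags(&r, 0) == 7);
    assert(read_flags(nullptr, 42) == 42);
}
```

The point of a threshold is that a crash address such as 0x9c is not literally zero, yet it is almost certainly a field access through a null base pointer, so a checker that only looked for address 0 would miss it.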
The dereference, in this case, happened in a driver called csagent.sys. If CodeSonar had access to the driver’s source code, it could calculate the path through the code as part of its whole program analysis capability and would likely have detected this, depending on the complexity of the code path, the levels of indirection in the path, and so on. It is hard to tell without access to the CrowdStrike source code itself, but there is a good chance that CodeSonar would have caught this error if it did have that access.
CodeSonar’s advanced algorithms also scan for and detect the types of uninitialized variables and heap- and stack-based buffer overruns that might have been involved here too. An uninitialized variable of this kind formerly impacted some NASA programs (it has since been fixed).
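The sketch below is not the NASA code; it is a purely hypothetical C++ illustration of the uninitialized variable defect class (CWE-457), with `status_for` and its status codes invented for the purpose. The defective shape is kept in a comment because reading an uninitialized variable is undefined behaviour.

```cpp
#include <cassert>

// Defective shape (comment only):
//
//   int status;                        // no initial value
//   if (code == 1)      status = 10;
//   else if (code == 2) status = 20;
//   return status;                     // indeterminate when code is anything else
//
// Corrected shape: the variable has a defined value on every path.
int status_for(int code) {
    int status = -1;  // defined default for unhandled codes
    if (code == 1) {
        status = 10;
    } else if (code == 2) {
        status = 20;
    }
    return status;
}

// Small self-check; call it from main() to exercise all three paths.
void sketch_checks() {
    assert(status_for(1) == 10);
    assert(status_for(2) == 20);
    assert(status_for(3) == -1);  // the path the defective version left undefined
}
```

As with null dereferences, the defect only shows up on one path (an unhandled code), which is why it can survive testing that exercises only the common cases.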
Following the outage, people also asked CodeSecure whether following the MISRA C++ 2023 coding standard would have prevented this error. The MISRA standards, usually associated with automotive software, define a subset of C/C++ that limits the developer to a memory-safe version of the language.
In answer to these questions, a couple of MISRA rules come to mind, including:
Catch-all Categories: Catch-all categories dictate that developers need to prevent undefined or unspecified behavior and limit run-time failures. Specifically, MisraC++2023:4.1.3 states that there shall be no occurrence of undefined or critical unspecified behaviour. MISRA classifies this rule as ‘Undecidable’, meaning that it is impossible for a SAST tool to decide in every case whether the rule is violated. CodeSonar includes a set of checkers specific to rule 4.1.3 to assist developers in avoiding such failures.
Specific Rules: Without the Falcon update code itself, we can only make educated guesses as to what caused the pointer to become Null. That said, several MISRA rules specifically direct developers to keep code safe, easy to read, and easy to maintain in order to avoid the types of errors that lead to huge problems.
For example:
Rules 7.11.1, 8.2.6, 8.2.7 and 11.6.1 are decidable, while 8.7.1, 8.7.2 and 11.6.2 are undecidable, meaning that for these latter rules it is impossible for a tool to guarantee in every case that the code does not violate the rule.
In the case of the CrowdStrike Falcon update, MISRA 7.11.1 might apply. It directs the developer not to use NULL or 0 as a null pointer, but to use nullptr as the only form of the null-pointer-constant. This helps the compiler and the SAST tool detect problems earlier, before code goes out the door and causes damage.
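One reason the form of the null pointer constant matters to the compiler can be shown with a small C++ sketch. The overload set below is hypothetical (`classify` and `Chosen` are invented names) and says nothing about the Falcon code; it only demonstrates that the literal 0 is an integer first and a pointer only by conversion, whereas nullptr has its own type and can never be mistaken for an integer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

enum class Chosen { Integer, Pointer };

// Two hypothetical overloads: one takes an integer, one takes a pointer.
constexpr Chosen classify(int) { return Chosen::Integer; }
constexpr Chosen classify(const char*) { return Chosen::Pointer; }

// The literal 0 is an exact match for int, so the integer overload wins
// even when the author meant "no pointer".
static_assert(classify(0) == Chosen::Integer, "0 selects the integer overload");

// nullptr has type std::nullptr_t, which converts to pointer types only.
static_assert(classify(nullptr) == Chosen::Pointer, "nullptr selects the pointer overload");
static_assert(!std::is_integral<std::nullptr_t>::value, "nullptr_t is not an integer type");

// Small self-check; call it from main() to repeat the checks at run time.
void sketch_checks() {
    assert(classify(0) == Chosen::Integer);
    assert(classify(nullptr) == Chosen::Pointer);
}
```

The macro NULL is deliberately left out of the sketch: its definition is implementation-defined, so a call such as `classify(NULL)` may pick the integer overload, be ambiguous, or pick the pointer overload depending on the toolchain, which is the kind of inconsistency a single mandated form avoids.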
MISRA C++ 2023 as a standard has many rules that help developers write memory-safe C++ code and so prevent a lot of problems. Using advanced SAST in the software development and deployment process is key to preventing problems in deployed software.
Advanced SAST catches many Null Pointer Dereferences, Uninitialized Variables, and Buffer Overruns before the code hits production systems, saving product companies money and embarrassment, while also protecting their customers from damage and outages like the one that occurred last week. If CrowdStrike had time to provide its source code, we’d be happy to test it with CodeSonar. But I expect the CrowdStrike team may be a bit busy right now.
The post Using SAST and MISRA Memory Safety Standards to Prevent the Next CrowdStrike Debacle appeared first on CodeSecure.
*** This is a Security Bloggers Network syndicated blog from CodeSecure authored by Mark Hermeling. Read the original post at: https://codesecure.com/learn/using-sast-and-misra-memory-safety-standards-to-prevent-the-next-crowdstrike-debacle/