In today’s age of shifting left—an approach to coding that integrates security checks earlier into the software development lifecycle (SDLC)—developers are expected to be proficient at using security tools. This additional responsibility can be overwhelming for developers who don’t specialize in security. The main issue: on top of their normal responsibilities, developers have to sift through many false positive alerts to find and address the real, critical vulnerabilities.
But shifting left isn’t going anywhere. Its benefits have been proven. So, what can developers do to improve their security experience? They can start by understanding how different security tooling works, the latest advancements, and why they matter. By understanding the inner workings of a static application security testing (SAST) tool, developers can better interpret its results and fix vulnerable code, feel empowered to contribute to security discussions and decisions, and improve their relationship with security teams.
In this post, we’ll cover what our security experts, Sylwia Budzynska, Keith Hoodlet, and Nick Liffen, have written about SAST tools—from what they are and how they work—and break down why they’re important to developers who are coding in the age of security-first development.
GitHub Security Researcher Sylwia Budzynska wrote a post about common uses of SAST tools. Here’s a quick recap.
Developers and security experts rely on SAST tools to:
Expand vulnerability detection. Through a technique called variant analysis, SAST tools can find new vulnerabilities by detecting variants of a known vulnerability in different parts of the code base.
Assist with manual code reviews. In CodeQL, GitHub’s SAST tool, your code is treated and analyzed as data. This allows you to execute queries against the database to retrieve the data you want from your code, like patterns that highlight potential vulnerabilities. You can run standard CodeQL queries written by GitHub researchers and community contributors, or write your own to conduct a custom analysis.
GitHub’s security team used a combination of AI-generated models and variant analysis to discover a new vulnerability. Here’s how.
For comprehensive coverage, organizations often use SAST in tandem with other security testing, including:
Let’s start with the pros:
Now, some cons. Well, mainly the big one: false positives.
Problems and consequences | Solutions |
Your SAST tool might match vulnerable patterns in a database to patterns found in comments throughout the source code and in harmless function names. | Adding a lexical analysis function–which transforms code into tokens and ignores characters that aren’t related to the semantics of code—filters out pattern matches unrelated to the source code (like patterns found in your code comments). |
A legacy SAST tool might not be able to differentiate between input data that comes from a user (and therefore exploitable) and input data that comes from a local source (and therefore benign).
Your SAST tool might not detect when input data has been sanitized or validated as it moves throughout your source code (making the data safe). |
Abstracting your code into a hierarchical structure provides a better understanding of where input data enters and is used throughout your code. As a result, the SAST tool will better determine when input data is actually exploitable and raises fewer false positives. |
With so many false positives, developers and security experts may lose confidence in the tool’s data and get alert fatigue, which can cause them to skim past critical alerts. | A SAST tool with an alert system that can be set with custom and automated triage rules ensures that the most urgent security alerts are addressed first. Engineering teams should also be able to filter and search alerts to sift through all the results and focus on a particular type of alert. |
For further reading on false positives, read:
SQL injections—malicious SQL code that allows users to gain access to sensitive data—are a common vulnerability that is easier to find with SAST than with other testing methods. This is because SAST tools trace data flows, and SQL injections target applications that store and retrieve data in SQL databases.
Methods used by SAST tools to find vulnerabilities include:
🕵🏻♀️ Let’s focus on semantic and taint analysis, which provide more flexibility, precision, and broader coverage than signature-based matching (or other legacy methods of static analysis).
We’ll break down how an advanced SAST tool like CodeQL uses semantic and taint analysis to trace a full data flow and find vulnerabilities in your code. 👇
Here’s code that contains a user-controlled parameter, or a parameter where the user submits data:
@postmapping("/sqlinjection/attack11")
@responsebody
public Attackresult completed(@requestparam string name, @requestparam String auth_tan) {
return injectablequeryintegrity(name, auth_tan);
}
SAST tools use lexical analysis to transform code into tokens. Because tokens are categorized according to the grammar rules of a programming language, your code becomes a list of standardized parts that makes it easier to analyze. Tokenizing source code allows the SAST tool to conduct a semantic analysis that ignores characters unrelated to the semantics of your code.
To help decipher the meaning of source code and its structure, most SAST tools visualize your source code as a tree. An abstract syntax tree (AST) transforms lines of your code into a hierarchical structure to show relationships between code, which code belongs with which function, and more.
Here’s what an AST looks like and here’s how to view the AST of your source code.
Aided by an abstraction of the source code, this analysis allows the SAST tool to understand the code’s meaning and structure. As we mentioned above, a semantic analysis enables CodeQL to ignore tokens that aren’t related to the semantics of your source code. As a result, the SAST tool scans your source code (and not your code comments) for vulnerabilities.
Earlier, we noted that SQL injections enter your source code through unsanitized or invalidated user input data. But SAST tools look for vulnerabilities in the way your source code handles data, not the data itself. In other words, SAST tools scan the source code written by a developer, not input data entered by a user. This is where taint analysis comes in.
A SAST tool uses taint analysis to do three things:
Advanced SAST tools like CodeQL can evaluate how well these functions actually sanitize or validate the data, and use that judgment to decide whether or not to raise the path as a potential vulnerability. If any input data doesn’t pass through these sanitizing functions, the tool will flag the path as a potential vulnerability.
Above is an example of how CodeQL traces data flow. The two statements in the last step, Query might include code from this user input
is evidence of the CodeQL data flow at work.
The SAST tools recognizes that user-provided data, name
and auth_tan
, are directly embedded into the SQL query, SELECT * FROM employees WHERE last_name = ‘“ + name + “‘ AND auth_tan = ‘“ + auth_tan + “‘“
.
Executing this query might generate a security alert if the user input hasn’t passed through sufficient sanitizers or any sanitizers at all.
📝 Two important notes:
🔒 At the end of these four processes, engineering teams will receive security alerts from the SAST tool. To ensure teams address the most urgent security alerts first, it’s important to set custom and automated triage rules and have the ability to filter and search alerts to focus on a particular type.
Another way that some SAST tools can find vulnerabilities is when developers and security experts write queries to search for certain vulnerabilities. CodeQL, in particular, is known for its flexibility in that it allows developers to write custom queries that meet the needs of their codebase.
You can run standard CodeQL queries or write your own to conduct a custom analysis.
Learn how to practice writing your own CodeQL query with these resources:
Instead of manually pushing a code scan, developers can integrate a modern SAST tool into their current CI/CD pipeline, automating vulnerability code scans with every push or build.
A SAST tool that’s integrated into your build process while having access to your codebase means the tool can better understand the semantic elements of your code and conduct a more comprehensive taint analysis, according to Keith Hoodlet, principal security specialist at GitHub.
When you create or work with an application, you need to make sure that it handles data securely. For instance, an educational project designed to be used by students might be subject to the Children’s Online Privacy Protection Act (COPPA), which requires websites that gather data from children under 13 report any breaches to parents.
Wordplay, an educational programming language and web-based IDE is designed to be used by students, which means most project contributors are aspiring developers who are just starting to learn best practices of secure code writing. Consequently, the project needs to protect against data breaches that could expose the projects and identity of those students.
“When students submit pull requests that touch any private data, they need to know that they aren’t shipping common vulnerability patterns that might leak it,” says Amy Ko, founder.
But limited bandwidth makes it difficult to review every single line of submitted code. That’s why the Wordplay team relies on CodeQL to integrate vulnerability checks into its CI/CD pipeline.
“CodeQL is like having a community of experienced security developers regularly code reviewing our work,” Ko adds. “The key reason we use it is to expand our team’s expertise and capacity.”
In addition to using CodeQL to prevent data breaches, the Wordplay team has ideas about how it might use the tool to find patterns that could lead to accessibility issues, like lack of feedback in response to keyboard inputs. For example, CodeQL could be used to identify input sequences that don’t provide feedback.
Modern SAST tools help developers adapt to this new age of coding. As they take on additional security responsibilities in the shift left movement, developers can rely on SAST tools to trace data flows and locations of exploitable vulnerabilities throughout their projects. What’s more: as developers write more code with the help of AI coding assistants, they can feel more confident that SAST is analyzing their entire source code for vulnerabilities.
Getting to know the workings of a SAST tool, which is one of the most widely used security tools, will give developers a better understanding of its results and security alerts, and that can empower them to actively participate in security discussions and decisions—ultimately benefiting engineering and security teams alike, and organizations overall.
Harness the power of CodeQL. Learn more or get started now.