By Paweł Płatek
We are publishing a set of custom CodeQL queries for Go and C. We have used them to find critical issues that the standard CodeQL queries would have missed. This new release of a continuously updated repository of CodeQL queries joins our public Semgrep rules and Automated Testing Handbook in an effort to share our technical expertise with the community.
For the initial release of our internal CodeQL queries, we focused on issues like misused cryptography, insecure file permissions, and bugs in string methods:
Language | Query name | Vulnerability description |
Go | Message not hashed before signature verification | This query detects calls to (EC)DSA APIs with a message that was not hashed. If the message is longer than the expected hash digest size, it is silently truncated. |
Go | File permission flaws | This query finds non-octal (e.g., 755 vs 0o755 ) and unsupported (e.g., 04666 ) literals used as a filesystem permission parameter (FileMode ). |
Go | Trim functions misuse |
This query finds calls to string.{Trim,TrimLeft,TrimRight} with the second argument not being a cutset but a continuous substring to be trimmed. |
Go | Missing MinVersion in tls.Config |
This query finds cases when you do not set the tls.Config.MinVersion explicitly for servers. By default, version 1.0 is used, which is considered insecure. This query does not mark explicitly set insecure versions. |
C | CStrNFinder |
This query finds calls to functions that take a string and its size as separate arguments (e.g., strncmp , strncat ) but the size argument is wrong. |
C | Missing null terminator | This query finds incorrectly initialized strings that are passed to functions expecting null-byte-terminated strings. |
CodeQL is the static analysis tool powering GitHub Advanced Security and is widely used throughout the community to discover vulnerabilities. CodeQL operates by transforming the code being tested into a database that is queryable using a Datalog-like language. While the core engine of CodeQL remains proprietary and closed source, the tool offers open-source libraries implementing various analyses and sets of security queries.
To test our queries, install the CodeQL CLI by following the official documentation. Once the CodeQL CLI is ready, download Trail of Bits’ query packs and check whether the new queries are detected:
codeql pack download trailofbits/{cpp,go}-queries codeql resolve qlpacks | grep trailofbits
Now go to your project’s root directory and generate a CodeQL database, specifying either go
or cpp
as the programming language:
codeql database create codeql.db --language go
If the generation hasn’t succeeded or the project has a complex build system, use the command
flag. Finally, execute Trail of Bits’ queries against the database:
codeql database analyze database.db --format=sarif-latest --output=./tob.sarif -- trailofbits/go-queries
Output of the analysis is in the Static Analysis Results Interchange Format (SARIF). Use Visual Studio Code with SARIF Viewer plugin to open it and triage findings. Alternatively, upload results to GitHub or use --format
csv
to get results in text form.
Let’s sign the /etc/passwd file using ECDSA. Is the following implementation secure?
func main() { privateKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) if err != nil { panic(err) } data, err := os.ReadFile("/etc/passwd") if err != nil { panic(err) } sig, err := ecdsa.SignASN1(rand.Reader, privateKey, data) if err != nil { panic(err) } fmt.Printf("signature: %x\n", sig) valid := ecdsa.VerifyASN1(&privateKey.PublicKey, data, sig) fmt.Println("signature verified:", valid) }
Figure 1: An example signature generation and verification function
Of course it isn’t. The issue lies in passing raw, unhashed, and potentially long data to the ecdsa.SignASN1
and ecdsa.VerifyASN1
methods, while the Go crypto/ecdsa
package (and a few other packages) expects data for signing and verification to be a hash of the actual data.
This behavior means that the code signs and verifies only the first 32 bytes of the file, as the size of the P-256 curve used in the example is 32 bytes.
The silent truncation of input data occurs in the hashToNat
method, which is used internally by the ecdsa.{SignASN1,VerifyASN1}
methods:
// hashToNat sets e to the left-most bits of hash, according to // SEC 1, Section 4.1.3, point 5 and Section 4.1.4, point 3. func hashToNat[Point nistPoint[Point]](c *nistCurve[Point], e *bigmod.Nat, hash []byte) { // ECDSA asks us to take the left-most log2(N) bits of hash, and use them as // an integer modulo N. This is the absolute worst of all worlds: we still // have to reduce, because the result might still overflow N, but to take // the left-most bits for P-521 we have to do a right shift. if size := c.N.Size(); len(hash) > size { hash = hash[:size]
Figure 2: The silent truncation of input data (crypto/ecdsa/ecdsa.go)
We have seen this vulnerability in real-world codebases and the impact was critical. To address the issue, there are a couple of approaches:
go-ethereum
library.
func VerifySignature(pubkey, msg, signature []byte) bool { if len(msg) != 32 || len(signature) != 64 || len(pubkey) == 0 { return false }
Figure 3: Validation function from the go-ethereum library
(go-ethereum/crypto/secp256k1/secp256.go#126–129)
tob/go/msg-not-hashed-sig-verify
query, which detects all data flows to potentially problematic methods, ignoring flows that initiate from or go through a hashing function or slicing operation.
An interesting problem we had to solve was how to set starting points (sources) for the data flow analysis? We could have used the UntrustedFlowSource
class for that purpose. Then the analysis would be finding flows from any input potentially controlled by an attacker. However, UntrustedFlowSource
often needs to be extended per project to be useful, so using it for our analysis would result in a lot of flows missed for a lot of projects. Therefore, our query focuses on finding the longest data flows, which are more likely to indicate potential vulnerabilities.
Can you spot a bug in the code below?
if err := os.Chmod(“./secret_key”, 400); err != nil { return }
Figure 4: Buggy Go code
Okay, so file permissions are usually represented as octal integers. In our case, the secret key file would end up with the permission set to 0o620
(or rw--w----
), allowing non-owners to modify the file. The integer literal used in the call to the os.Chmod
method is—most probably—not the one that a developer wanted to use.
To find unexpected integer values used as FileModes
, we implemented a WYSIWYG (“what you see is what you get”) heuristic in the tob/go/file-perms-flaws
CodeQL query. The “what you see” is a cleaned-up integer literal (a hard-coded number of the FileMode
type)—with removed underscores, a removed base prefix, and left-padded zeros. The “what you get” is the same integer converted to an octal representation. If these two parts are not equal, there may be a bug present.
// what you see fileModeAsSeen = ("000" + fileModeLitStr.replaceAll("_", "").regexpCapture("(0o|0x|0b)?(.+)", 2)).regexpCapture("0*(.{3,})", 1) // what you get and fileModeAsOctal = octalFileMode(fileModeInt) // what you see != what you get and fileModeAsSeen != fileModeAsOctal
Figure 5: The WYSIWYG heuristic in CodeQL
To minimize false positives, we filter out numbers that are commonly used constants (like 0755
or 0644
) but in decimal or hexadecimal form. These known, valid constants are explicitly defined in the isKnownValidConstant predicate
. Here is how we implemented this predicate:
predicate isKnownValidConstant(string fileMode) { fileMode = ["365", "420", "436", "438", "511", "509", "493"] or fileMode = ["0x16d", "0x1a4", "0x1b4", "0x1b6", "0x1ff", "0x1fd", "0x1ed"] }
Figure 6: The CodeQL predicate that filters out common file permission constants
Using non-octal representation of numbers isn’t the only possible pitfall when dealing with file permissions. Another issue to be aware of is the use of more than nine bits in calls to permission-changing methods. File permissions are encoded only as the first nine bits, and the other bits encode file modes such as sticky
bit
or setuid
. Some permission changing methods—like os.Chmod
or os.Mkdir
—ignore a subset of the mode bits, depending on the operating system. The tob/go/file-perms-flaws
query warns about this issue as well.
API ambiguities are a common source of errors, especially when there are multiple methods with similar names and purposes accepting the same set of arguments. This is the case for Go’s strings.Trim
family of methods. Consider the following calls:
strings.TrimLeft("file://FinnAndHengest", "file://") strings.TrimPrefix("file://FinnAndHengest", "file://")
Figure 7: Ambiguous Trim methods
Can you tell the difference between these calls and determine which one works “as expected”?
According to the documentation, the strings.TrimLeft
method accepts a cutset
(i.e., a set of characters) for removal, rather than a prefix. Consequently, it deletes more characters than one would expect. While the above example may seem innocent, a bug in a cross-site scripting (XSS) sanitization function, for example, could have devastating consequences.
When looking for misused strings.Trim{Left,Right}
calls, the tricky part is defining what qualifies as “expected” behavior. To address this challenge, we developed the tob/go/trim-misuse
CodeQL query with simple heuristics to differentiate between valid and possibly mistaken calls, based on the cutset
argument. We consider a Trim
operation invalid if the argument contains repeated characters or meets all of the following conditions:
While the heuristics look oversimplified, they worked well enough in our audits. In CodeQL, the above rules are implemented as shown below. The cutset
is a variable corresponding to the cutset
argument of a strings.Trim{Left,Right}
method call.
// repeated characters imply the bug cutset.length() != unique(string c | c = cutset.charAt(_) | c).length() or ( // long strings are considered suspicious cutset.length() > 2 // at least one alphanumeric and exists(cutset.regexpFind("[a-zA-Z0-9]{2}", _, _)) // exclude probable false-positives and not cutset.matches("%1234567%") and not cutset.matches("%abcdefghijklmnopqrstuvwxyz%") )
Figure 8: CodeQL implementation of heuristics for a Trim operation
Interestingly, misuses of the strings.Trim
methods are so common that Go developers are considering deprecating and replacing the problematic functions.
When using static analysis tools, it’s important to know their limitations. The official go/insecure-tls
CodeQL query finds TLS configurations that accept insecure (outdated) TLS versions (e.g., SSLv3, TLSv1.1). It accomplishes that task by comparing values provided to the configuration’s MinVersion
and MaxVersion
settings against a list of deprecated versions. However, the query does not warn about configurations that do not explicitly set the MinVersion
.
Why should this be a concern? The reason is that the default MinVersion
for servers is TLSv1.0. Therefore, in the example below, the official query would mark only server_explicit
as insecurely configured, despite both servers using the same MinVersion
.
server_explicit := &http.Server{ TLSConfig: &tls.Config{MinVersion: tls.VersionTLS10} } server_default := &http.Server{TLSConfig: &tls.Config{}}
Figure 9: Explicit and default configuration of the MinVersion setting
The severity of this issue is rather low since the default MinVersion
for clients is a secure TLSv1.2. Nevertheless, we filled the gap and developed the tob/go/missing-min-version-tls
CodeQL query, which detects tls.Config
structures without the MinVersion
field explicitly set. The query skips reporting configurations used for clients and limits false positives by filtering out findings where the MinVersion
is set after the structure initialization.
Building on top of the insightful cstrnfinder
research conducted by one of my Trail of Bits colleagues, we developed the tob/cpp/cstrnfinder
query. This query aims to identify invalid numeric constants provided to calls to functions that expect a string and its corresponding size as input—such as strncmp
, strncpy
, and memmove
. We focused on detecting three erroneous cases:
if (!strncmp(argv[1], "org/tob/test/SafeData", 20)) { puts("Secure"); } else { puts("Not secure"); }
Figure 10: A buffer underread bug example
Here, the length of the "org/tob/test/SafeData"
string is 21 bytes (22 if we count the terminating null byte). However, we are comparing only the first 20 bytes. Therefore, a string like "org/tob/test/SafeDatX"
is incorrectly matched.
int check(const char *password) { const char pass[] = "Silmarillion"; return memcmp(password, pass, 14); }
Figure 11: A buffer overread bug example
In the example, the length of the "Silmarillion"
string is 12 bytes (13 with the null byte). If the password is longer than 13 bytes and starts with the "Silmarillion"
substring, then the memcmp
function reads data outside of the pass buffer. While functions operating on strings stop reading input buffers on a null byte and will not overread the input, the memcmp
function operates on bytes and does not stop on null bytes.
BUFSIZE-1
in the example below) is greater than the source string’s length (the length of “, Beowulf\x00”
, so 10 bytes), the size argument may be incorrectly interpreted as the destination buffer’s size (BUFSIZE
bytes in the example), instead of the input string’s size. This may indicate a buffer overflow vulnerability.
#define BUFSIZE 256 char all_books[BUFSIZE]; FILE *books_f = fopen("books.txt", "r"); fgets(all_books, BUFSIZE, books_f); fclose(books_f); strncat(all_books, ", Beowulf", BUFSIZE-1); // safe version: strncat(all_books, ", Beowulf", BUFSIZE-strlen(dest)-1);
Figure 12: A strncat function misuse bug example
In the code above, the all_books
buffer can hold a maximum 256 bytes of data. If the books.txt
file contains 250 characters, then the remaining space in the buffer before the call to the strncat
function is 6 bytes. However, we instruct the function to add up to 255 (BUFSIZE-1
) bytes to the end of the all_books
buffer. Therefore, a few bytes of the “, Beowulf”
string will end up outside the allocated space. What we should do instead is instruct the strncat
to add at most 5 bytes (leaving 1 byte for the terminating \x00
).
There is a similar built-in query with ID cpp/unsafe-strncat
, but it doesn’t work with constant sizes.
Both C and C++ allow developers to construct fixed-size strings with an initialization literal. If the length of the literal is greater than or equal to the allocated buffer size, then the literal is truncated and the terminating null byte is not appended to the string.
char b1[18] = "The Road Goes Ever On"; // missing null byte, warning char b2[13] = "Ancrene Wisse"; // missing null byte, NO WARNING char b3[] = "Farmer Giles of Ham"; // correct initialization char b4[3] = {'t', 'o', 'b'} // not a string, lack of null byte is expected
Figure 13: Example initializations of C strings
Interestingly, C compilers warn against initializers longer than the buffer size, but don’t raise alarms for initializers of a length equal to the buffer size—even though neither of the resulting strings are null-terminated. C++ compilers return errors for both cases.
The tob/cpp/no-null-terminator
query uses data flow analysis to find incorrectly initialized strings passed to functions expecting a null-terminated string. Such function calls result in out-of-bounds read or write vulnerabilities.
This will be a continuing project from Trail of Bits, so be on the lookout for more soon! One of our most valuable developments is our expertise in automated bug finding. This new CodeQL repository, the Semgrep rules, and the Automated Testing Handbook are key methods to helping others benefit from our work. Please use these resources and report any issues or improvements to them!
If you’d like to read more about our work on CodeQL, we have used its capabilities in several ways, such as detecting iterator invalidations, identifying unhandled errors, and uncovering divergent representations.
Contact us if you’re interested in customizing CodeQL queries for your project.
*** This is a Security Bloggers Network syndicated blog from Trail of Bits Blog authored by Trail of Bits. Read the original post at: https://blog.trailofbits.com/2023/12/06/publishing-trail-of-bits-codeql-queries/