Let’s talk about buffer overflow

A buffer overflow, or buffer overrun, occurs when more data is put into a fixed-length buffer than the buffer can hold.

Buffer overflow is possibly the best-known form of software security vulnerability. Most software developers know what a buffer overflow vulnerability is, but buffer overflow attacks against both legacy and newly developed applications are still quite common. Part of the problem is the wide variety of ways buffer overflows can happen, and part is the error-prone techniques often used to prevent them.

In a classic buffer overflow exploit, the attacker sends data to a program, which the program stores in an undersized stack buffer. The result is that data on the call stack is overwritten, including the function’s return pointer. The data sets the value of the return pointer so that when the function returns, it transfers control to malicious code contained in the attacker’s data.

Although this type of stack buffer overflow is still prevalent on some platforms and in some development communities, there is a variety of other types of buffer overflow, including heap buffer overflows and off-by-one errors, among others.

At the code level, buffer overflow vulnerabilities usually involve the violation of a programmer’s assumptions. Many memory manipulation functions in C and C++ do not perform bounds checking and can easily write past the allocated bounds of the buffers they operate upon. Even bounded functions, such as strncpy(), can cause vulnerabilities when used incorrectly. The combination of memory manipulation and mistaken assumptions about the size or makeup of a piece of data is the root cause of most buffer overflows.
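As one illustration of a bounded function being used incorrectly: strncpy() does not write a terminating null byte when the source is at least as long as the count it is given, so later string operations can read past the end of the buffer. The sketch below shows one common way to compensate. The helper name copy_truncating() is hypothetical, and the approach silently truncates, which is not always acceptable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original article).
 * Copies src into dst, truncating if necessary, and always leaves dst
 * null-terminated. dst_size is the full capacity of dst in bytes. */
void copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return;                        /* nothing can be stored safely */
    }
    /* Copy at most dst_size - 1 bytes so one byte remains for '\0'. */
    strncpy(dst, src, dst_size - 1);
    /* strncpy() adds no terminator when strlen(src) >= dst_size - 1. */
    dst[dst_size - 1] = '\0';
}
```

Passing sizeof(buf) as the count without the explicit termination step is the typical mistake this pattern guards against.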

Buffer overflow vulnerabilities typically occur in code that:

- Relies on external data to control its behaviour.

- Depends upon properties of the data that are enforced outside of the immediate scope of the code.

- Is so complex that a programmer cannot accurately predict its behaviour.

The following is an example of the second scenario, in which the code depends on properties of the data that are not verified locally. In this example, a function named lccopy() takes a string as its argument and returns a heap-allocated copy of the string with all uppercase letters converted to lowercase. The function does no bounds checking on its input because it expects str to always be smaller than BUFSIZE. If an attacker bypasses checks in the code that calls lccopy(), or if a change in that code makes the assumption about the size of str untrue, then lccopy() will overflow buf with the unbounded call to strcpy().

char *lccopy(const char *str) {
    char buf[BUFSIZE];
    char *p;

    strcpy(buf, str);
    for (p = buf; *p; p++) {
        if (isupper(*p)) {
            *p = tolower(*p);
        }
    }
    return strdup(buf);
}
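One way to remove the hidden assumption is to size the copy from the input itself instead of relying on callers to keep str below BUFSIZE. The following is a minimal sketch under that assumption. The name lccopy_checked() is hypothetical, and returning NULL for a NULL argument or a failed allocation is a design choice made here, not something the original function specifies.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of lccopy(): returns a heap-allocated lowercase copy
 * of str, or NULL if str is NULL or allocation fails. The caller frees it. */
char *lccopy_checked(const char *str)
{
    if (str == NULL) {
        return NULL;
    }
    size_t len = strlen(str);
    char *copy = malloc(len + 1);      /* sized from the actual input */
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        copy[i] = (char)tolower((unsigned char)str[i]);
    }
    copy[len] = '\0';
    return copy;
}
```

Because no fixed-size intermediate buffer exists, there is no size property for a caller to violate.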

The following sample code demonstrates a simple buffer overflow of the kind often caused by the first scenario, in which the code relies on external data to control its behaviour. The code uses the gets() function to read an arbitrary amount of data into a stack buffer. Because there is no way to limit the amount of data read by this function, the safety of the code depends on the user to always enter fewer than BUFSIZE characters.

...
char buf[BUFSIZE];
gets(buf);
...
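A bounded alternative to gets() is fgets(), which takes the buffer size as an argument. The sketch below wraps it in a hypothetical helper, read_line(), that also reports lines that do not fit rather than silently splitting them. The return convention is an assumption made for this example.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original article).
 * Reads one line from `in` into buf, which holds `size` bytes.
 * Returns 0 on success (newline stripped), -1 on EOF, error, or when the
 * line is too long for buf. An over-long line is discarded entirely. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size < 2 || size > INT_MAX) {
        return -1;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        return -1;                     /* EOF or read error */
    }
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    /* No newline stored: either the input ended or the line did not fit. */
    int c = getc(in);
    if (c == EOF || c == '\n') {
        return 0;                      /* the line fit exactly */
    }
    while ((c = getc(in)) != EOF && c != '\n') {
        /* discard the remainder of the over-long line */
    }
    return -1;
}
```

A caller would typically write `if (read_line(stdin, buf, sizeof buf) != 0)` and treat failure as invalid input.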

This example shows how easy it is to mimic the unsafe behaviour of the gets() function in C++ by using the >> operator to read input into a char[] string.

...
char buf[BUFSIZE];
cin >> (buf);
...

The code in this example also relies on user input to control its behaviour, but it adds a level of indirection with the use of the bounded memory copy function memcpy(). This function accepts a destination buffer, a source buffer, and the number of bytes to copy. The input buffer is filled by a bounded call to read(), but the user specifies the number of bytes that memcpy() copies.

...
char buf[64], in[MAX_SIZE];
printf("Enter buffer contents:\n");
read(0, in, MAX_SIZE-1);
printf("Bytes to copy:\n");
scanf("%d", &bytes);
memcpy(buf, in, bytes);
...
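The missing step in the code above is a check of the user-supplied count against the sizes of both buffers before memcpy() runs. A minimal sketch follows. The helper copy_checked() and its return convention are hypothetical. The count is kept as a signed int because the example reads it with "%d", and a negative value converted to size_t would become a very large length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original article).
 * Copies `count` bytes from src to dst only if the count is non-negative
 * and fits within both buffers. Returns 0 on success, -1 otherwise. */
int copy_checked(void *dst, size_t dst_size,
                 const void *src, size_t src_len, int count)
{
    if (count < 0) {
        return -1;                     /* would wrap to a huge size_t */
    }
    size_t n = (size_t)count;
    if (n > dst_size || n > src_len) {
        return -1;                     /* would overrun dst or over-read src */
    }
    memcpy(dst, src, n);
    return 0;
}
```

In the example above, dst_size would be sizeof(buf) and src_len the number of bytes that read() actually returned.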

The next example demonstrates the third scenario, in which the code is so complex that its behaviour cannot be easily predicted. This code is from the popular libPNG image decoder, which is used by a wide array of applications, including Mozilla and some other browsers.

The code appears to safely perform bounds checking because it checks the size of the variable length, which it later uses to control the amount of data copied by png_crc_read(). However, immediately before it tests length, the code performs a check on png_ptr->mode, and if this check fails a warning is issued and processing continues. Since length is tested in an else if block, length would not be tested if the first check fails, and is used blindly in the call to png_crc_read(), potentially allowing a stack buffer overflow.

Although the code in this example is not the most complex we have seen, it demonstrates why complexity should be minimized in code that performs memory operations.

if (!(png_ptr->mode & PNG_HAVE_PLTE)) {
    /* Should be an error, but we can cope with it */
    png_warning(png_ptr, "Missing PLTE before tRNS");
}
else if (length > (png_uint_32)png_ptr->num_palette) {
    png_warning(png_ptr, "Incorrect tRNS chunk length");
    png_crc_finish(png_ptr, length);
    return;
}
...
png_crc_read(png_ptr, readbuf, (png_size_t)length);
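The structural problem can be illustrated with a simplified model in which the check that protects the buffer is performed unconditionally, separate from any format-level checks. This is only a sketch: struct decoder, read_trns(), and MAX_PALETTE are hypothetical stand-ins invented for this example and are not libPNG's types, functions, or actual fix.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PALETTE 256  /* assumed capacity of the read buffer */

/* Hypothetical stand-in for decoder state; not libPNG's png_struct. */
struct decoder {
    int have_palette;    /* non-zero once a palette chunk has been seen */
    size_t num_palette;  /* number of palette entries */
};

/* Copies `length` bytes of chunk data into readbuf (MAX_PALETTE bytes).
 * Returns 0 on success, -1 if the chunk is rejected. */
int read_trns(const struct decoder *d, const unsigned char *chunk,
              size_t length, unsigned char *readbuf)
{
    /* Memory-safety bound: always checked, on every path. */
    if (length > MAX_PALETTE) {
        return -1;
    }
    /* Format-level check: only meaningful when a palette is present. */
    if (d->have_palette && length > d->num_palette) {
        return -1;
    }
    memcpy(readbuf, chunk, length);
    return 0;
}
```

Keeping the buffer bound out of the if/else chain means a lenient decision about one condition cannot skip the check that guards the copy.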

This example also demonstrates the third scenario in which the program’s complexity exposes it to buffer overflows. In this case, the exposure is due to the ambiguous interface of one of the functions rather than the structure of the code (as was the case in the previous example).

The getUserInfo() function takes a username specified as a multibyte string and a pointer to a structure for user information, and populates the structure with information about the user. Since Windows authentication uses Unicode for usernames, the username argument is first converted from a multibyte string to a Unicode string. The function, however, passes the size of unicodeUser in bytes rather than in characters. The call to MultiByteToWideChar() can therefore write up to (UNLEN+1)*sizeof(WCHAR) wide characters, or (UNLEN+1)*sizeof(WCHAR)*sizeof(WCHAR) bytes, to the unicodeUser array, which has only (UNLEN+1)*sizeof(WCHAR) bytes allocated. If the username string contains more than UNLEN characters, the call to MultiByteToWideChar() will overflow the buffer unicodeUser.

void getUserInfo(char *username, struct _USER_INFO_2 info){
    WCHAR unicodeUser[UNLEN+1];
    MultiByteToWideChar(CP_ACP, 0, username, -1,
                        unicodeUser, sizeof(unicodeUser));
    NetUserGetInfo(NULL, unicodeUser, 2, (LPBYTE *)&info);
}
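The same bytes-versus-elements confusion can be shown portably with the standard C function mbstowcs(), whose third argument is likewise a count of wide characters, not bytes. The sketch below is an analogue for illustration and does not use the Windows API. The helper to_wide() and the ARRAY_LEN macro are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of elements in an array (not valid for pointers). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper (not from the original article).
 * Converts the multibyte string src into dst, which holds dst_len wide
 * characters (elements, not bytes). Returns 0 on success, -1 if src is
 * invalid or does not fit together with its terminator. */
int to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    if (dst_len == 0) {
        return -1;
    }
    size_t n = mbstowcs(dst, src, dst_len);
    if (n == (size_t)-1) {
        return -1;                     /* invalid multibyte sequence */
    }
    if (n == dst_len) {
        dst[0] = L'\0';                /* no room was left for the terminator */
        return -1;
    }
    return 0;
}
```

A caller with `wchar_t name[21];` would pass ARRAY_LEN(name), which is 21, rather than sizeof(name), which is 21 multiplied by the size of wchar_t.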

Never use inherently unsafe functions, such as gets(), and avoid the use of functions that are difficult to use safely, such as strcpy(). Replace unbounded functions like strcpy() with their bounded equivalents, such as strncpy() or the WinAPI functions defined in strsafe.h.

Although the careful use of bounded functions can greatly reduce the risk of buffer overflow, this migration cannot be done blindly and does not go far enough on its own to ensure security. Whenever you manipulate memory, particularly strings, remember that buffer overflow vulnerabilities typically occur in code that:

- Relies on external data to control its behaviour

- Depends upon properties of the data that are enforced outside of the immediate scope of the code

- Is so complex that a programmer cannot accurately predict its behaviour.

Additionally, consider the following principles:

- Never trust an external source to provide correct control information to a memory operation.

- Never trust that properties about the data your program is manipulating will be maintained throughout the program. Sanity check data before you operate on it.

- Limit the complexity of memory manipulation and bounds-checking code. Keep it simple and clearly document the checks you perform, the assumptions that you test, and what the expected behaviour of the program is in the case that input validation fails.

- When input data is too large, be leery of truncating the data and continuing to process it. Truncation can change the meaning of the input.
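To make the last principle concrete, the sketch below rejects input that does not fit instead of truncating it. The helper copy_or_reject() is a hypothetical name, and treating oversized input as an error is an assumption about what the calling program wants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original article).
 * Copies src into dst only if the whole string, including its terminator,
 * fits in dst_size bytes. Returns 0 on success, -1 if it does not fit. */
int copy_or_reject(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (dst_size == 0 || len >= dst_size) {
        return -1;                     /* refuse rather than truncate */
    }
    memcpy(dst, src, len + 1);         /* len + 1 <= dst_size */
    return 0;
}
```

With a 20-byte destination, truncating "/home/user/data.txt.bak" would yield "/home/user/data.txt", which names a different file. Rejecting the input avoids acting on a value the user never supplied.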

Do not rely on tools, such as StackGuard, or non-executable stacks to prevent buffer overflow vulnerabilities. These approaches do not address heap buffer overflows and the more subtle stack overflows that can change the contents of variables that control the program. Additionally, many of these approaches are easily defeated, and even when they are working properly, they address the symptom of the problem and not its cause.

On Windows, less secure functions such as memcpy() can be replaced with their more secure versions, such as memcpy_s(). However, this still needs to be done with caution. Because the parameter validation provided by the _s family of functions varies, relying on it can lead to unexpected behaviour. Furthermore, incorrectly specifying the size of the destination buffer can still result in buffer overflows.