Sorry, but the answer is no. Several interesting buffer overflow flaws that surfaced recently in Notepad++, and which have just been fixed, illustrate this perfectly. The problem is that text is not necessarily simple. That’s because it needs to encode an awful lot of different characters in an efficient way. Various schemes have been invented to do this, and the modern way is to use one of the standard encodings for Unicode, the international set of characters used in every language on the planet.
These encodings aren’t that complicated, but they do have a lot of edge conditions waiting to entrap the unwary programmer. In the case of Notepad++, an assumption was made as to the number of characters represented by a sequence of bytes encoded using one of the common Unicode methods. If the bytes were a valid encoding of some characters, everything was fine. But if the sequence was malformed, with “half a character” tagged on the end, the size calculation went wrong, the buffer allocated was too small to contain the characters and a buffer overflow occurred, leading to remote code execution. In other words, when the application opens a carefully crafted malformed text file, the attacker gets their code to run inside the editor.
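To make the "half a character" problem concrete, here is a minimal Python sketch. It is not Notepad++'s code, and Python itself will not overflow a buffer. It only shows how a size estimate that assumes whole 2-byte UTF-16 units quietly ignores a stray trailing byte. The function names are hypothetical and chosen for illustration.

```python
def naive_unit_count(data: bytes) -> int:
    """Hypothetical size estimate: assumes the input is made of whole 2-byte units."""
    return len(data) // 2


def trailing_fragment(data: bytes) -> bytes:
    """Return the leftover bytes that do not form a complete 2-byte unit."""
    return data[len(data) - len(data) % 2:]


def decodes_cleanly(data: bytes) -> bool:
    """True if a strict UTF-16 (little-endian) decoder accepts the whole input."""
    try:
        data.decode("utf-16-le", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


well_formed = "hi".encode("utf-16-le")  # 4 bytes, two complete code units
malformed = well_formed + b"\x41"       # 5 bytes, "half a character" on the end

# The naive estimate is the same for both inputs, so code that sizes a buffer
# from it never notices that one input carries an extra, unaccounted-for byte.
print(naive_unit_count(well_formed), naive_unit_count(malformed))
print(trailing_fragment(malformed), decodes_cleanly(malformed))
```

In a language with manual memory management, a mismatch like this between the estimated and the actual amount of data is the kind of gap that can turn into a buffer overflow.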
How do you defend against this kind of attack, which exploits flaws that are hard to find? The obvious answer is to deploy a guard function in front of your applications that checks the data they receive. In the case of a text file, it would check that the file contains a valid encoding of some text. If the data is not valid text, it either gets blocked or gets fixed up so it is valid, meaning at least some of the content is delivered.
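A guard of this kind might look something like the following Python sketch. The function name `guard_text`, the `repair` flag and the default of UTF-8 are assumptions made for illustration, not a description of any particular product.

```python
from typing import Optional


def guard_text(data: bytes, encoding: str = "utf-8", repair: bool = False) -> Optional[bytes]:
    """Hypothetical guard: pass valid text through, block or fix up anything else.

    Returns the bytes to deliver, or None if the input is blocked.
    """
    try:
        text = data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        if not repair:
            return None  # block: the input is not a valid encoding of text
        # Fix up: malformed sequences become U+FFFD, the replacement character.
        text = data.decode(encoding, errors="replace")
    # Re-encode so the application only ever receives well-formed bytes.
    return text.encode(encoding)


print(guard_text(b"caf\xc3\xa9"))            # valid UTF-8 is delivered
print(guard_text(b"caf\xc3"))                # truncated sequence is blocked
print(guard_text(b"caf\xc3", repair=True))   # or fixed up and delivered
```

Blocking is the stricter choice. Fixing up delivers most of the content, at the cost of silently changing the bytes the sender provided.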
This works, but it doesn’t solve the problem; it just moves it. Now we must worry about the guard function. It is handling the text, so how do we know it doesn’t have the same kind of flaw that was found in Notepad++? How do we know the defence isn’t itself open to an attack via the text files it is checking? A successful attack here is bad news indeed, as it gives the attacker access to all data being exchanged and probably privileged access to the entire network.
Forcepoint solves this problem by adding an extra step to the defence. The first step decodes the file to create the text it encodes, represented in the simplest way possible. The second step, which is isolated from the first, takes this simple representation and turns it back into a file using the original encoding. If the data is well formed, this two-step process changes nothing. If the data is malformed, the first step may fail, but the second step prevents any damage from propagating. Forcepoint calls this Zero Trust Content Disarm and Reconstruction (CDR), and the principle applies to text files just as much as to Office and PDF files.
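The shape of the two-step idea can be sketched in Python as below. This is an illustration under stated assumptions, not Forcepoint's implementation. The "simplest representation" is assumed here to be a plain list of Unicode code points, the function names are hypothetical, and the isolation between the steps is only modelled by having two functions that share nothing but that list. In a real deployment the steps would run in separate, isolated components.

```python
from typing import List


def decode_step(data: bytes, encoding: str = "utf-8") -> List[int]:
    """Step 1: decode the file into a simple list of code points.

    Raises UnicodeDecodeError if the input is malformed.
    """
    return [ord(ch) for ch in data.decode(encoding, errors="strict")]


def encode_step(code_points: List[int], encoding: str = "utf-8") -> bytes:
    """Step 2: rebuild a file from the simple representation alone.

    It trusts nothing about step 1 and rechecks every value it is given.
    """
    for cp in code_points:
        in_range = isinstance(cp, int) and 0 <= cp <= 0x10FFFF
        if not in_range or 0xD800 <= cp <= 0xDFFF:  # surrogates are not characters
            raise ValueError(f"invalid code point: {cp!r}")
    return "".join(chr(cp) for cp in code_points).encode(encoding)


original = "café".encode("utf-8")
rebuilt = encode_step(decode_step(original))
print(rebuilt == original)  # well-formed data passes through unchanged
```

Because step 2 builds its output only from values it has validated itself, whatever it emits is a well-formed encoding, even if step 1 misbehaves and hands over something unexpected.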
For critical defence and infrastructure systems, where the impact of failure is extreme, yet another measure is put in place. For these systems, special hardware logic devices, placed between the two steps of the decoding/encoding process, verify that the encoded data is valid and safe. The hardware ensures the checks cannot be bypassed and, unlike software, cannot be changed by an attack. With this in place, no malformed text file would ever reach Notepad++, so its vulnerability would be hidden from an attacker. That’s the power of Forcepoint’s Cross Domain Solutions software and hardsec defences.
Regardless, patching the software tools you use is always a recommended strategy. So, if you are a Notepad++ user, now’s a great time to install version 8.5.7, to fix the four recent vulnerabilities.