This is part 5 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Posted by Maddie Stone, Project Zero
A deep-dive into the implant used by a high-tier attacker against Android devices in 2020
This post covers what happens once the Android device has been successfully rooted by one of the exploits described in the previous post. What’s especially notable is that while the exploit chain only used known, and some quite old, n-day exploits, the subsequent code is extremely well-engineered and thorough. This leads us to believe that the choice to use n-days is likely not due to a lack of technical expertise.
This post describes what happens post-exploitation of the exploit chain. For this post, I will be calling different portions of the exploit chain as “stage X”. These stage numbers refer to:
This post details stage 3, the code that runs post exploitation. Stage 3 is an ARM ELF file that expects to run as root. This stage 3 ELF is embedded in the stage 2 binary in the data section. Stage 3 is a downloader for stage 4.
As stated at the beginning, this stage, stage 3, is a very well-engineered piece of software. It is very thorough in its methods to hide its behavior and ensure that it is running on the correct targeted device. Stage 3 includes obfuscation, many anti-analysis checks, detailed logging, command and control (C2) server communications, and ultimately, the downloading and executing of Stage 4. Based on the size and modularity of the code, it seems likely that it was developed by a team rather than a single individual.
So let’s get into the fun!
Once stage 2 has successfully rooted the device and modified different security settings, it loads stage 3. Stage 3 is embedded in the data section of stage 2 and is 0x436C bytes in size. Stage 2 includes a variety of different methods to load the stage 3 ELF including writing it to /proc/self/mem. Once one of these methods is successful, execution transfers to stage 3.
This stage 3 ELF exports two functions: init and d. init is the function called by stage 2 to begin execution of stage 3. However, the main functionality for this binary is not in this function. Instead it is in two functions that are referenced by the ELF’s .init_array. The first function ensures that the environment variables PATH, ANDROID_DATA, and ANDROID_ROOT are set to expected values. The second function spawns a new thread that runs the heavy lifting of the behavior of the binary. The init function simply calls pthread_join on the thread spawned by the second function in the .init_array so it will wait for that thread to terminate.
In the newly spawned thread, first, it cleans up from the previous stage by deleting most of the environment variables that stage 2 set. Then it will kill any processes that include the word “knox” in the cmdline. Knox is a security platform that is built into Samsung devices.
Next, the code will check how often this binary has been running by reading a file that it drops on the device called state.parcel. The execution proceeds normally as long as it hasn’t been run more than 6 times on the current day. In other cases, execution changes as described in the state.parcel file section.
The binary will then iterate through the process’s open file descriptors 0-2 (usually stdin, stdout, and stderr) and points them to /dev/null. This will prevent output messages from appearing which may lead a user or others to detect the presence of the exploit chain. The code will then iterate through any other open file descriptors (/proc/self/fd/) for the process and close any that include “pipe:” or “anon_inode:” in their symlinks. It will also close any file descriptors with a number greater than 32 that include “socket:” in the link and any that don’t include /data/dalvik-cache/arm or /dev/ in the name. This may be to prevent debugging or to reduce accidental damage to the rest of the system.
The thread will then call into the function that includes significant functionality for the main behavior of the binary. It decrypts data, sets up configuration data, performs anti-analysis and debugging checks, and finally contacts the C2 server to download the next stage and executes it. This can be considered the main control loop for Stage 3.
The rest of this post explains the technical details of the Stage 3 binary’s behavior, categorized.
Stage 3 uses quite a few different layers of obfuscation to hide the behavior of the code. It uses a similar string obfuscation technique to stage 2. Another way that the binary obfuscates its behavior is that it uses a hash table to store dynamic configuration settings/status. Instead of using a descriptive string for the “key”, it uses a series of 16 AES-decrypted bytes as the “keys” that are passed to the hashing function.The binary encrypts its static configuration settings, communications with the C2, and a hash table that stores dynamic configuration setting with AES. The state.parcel file that is saved on the device is XOR encoded. The binary also includes multiple techniques to make it harder to understand the behavior of the device using dynamic analysis techniques. For example, it monitors what is mapped into the process’s memory, what file descriptors it has opened, and sends very detailed information to the C2 server.
Similar to the previous stages, Stage 3 seems to be well engineered with a variety of different techniques to make it more difficult for an analyst to determine its behavior, either statically or dynamically. The rest of this section will detail some of the different techniques.
The vast majority of the strings within the binary are obfuscated. The obfuscation method is very similar to that used in previous stages. The obfuscated string is passed to a deobfuscation function prior to use. The obfuscated strings are designated by 0x7E7E7E (“~~~”) at the end of the string. To deobfuscate these strings, we used an IDAPython script using flare_emu that emulated the behavior of the deobfuscation function on each string.
A data block within the binary, containing important configuration settings, is encrypted using AES256. It is decrypted upon entrance to the main control function. The decrypted contents are written back to the same location in memory where the encrypted contents were. The code uses OpenSSL to perform the AES256 decryption. The key and the IV are hardcoded into the binary.
Whenever this blog post refers to the “decrypted data block”, we mean this block of memory. The decrypted data includes things such as the C2 server url, the user-agent to use when contacting the C2 server, version information and more. Prior to returning from the main control function, the code will overwrite the decrypted data block to all zeros. This makes it more difficult for an analyst to dump the decrypted memory.
Once the decryption is completed, the code double checks that decryption was successful by looking at certain bytes and verifying their values. If any of these checks fail, the binary will not proceed with contacting the C2 server and downloading stage 4.
Another block of data that is 0x140 bytes long is then decrypted in the same way. This decrypted data doesn’t include any human-readable strings, but is instead used as “keys” for a hash table that stores configuration settings and status information. We’ll call this area the “decrypted keys block”. The information that is stored in the hash table can change whereas the configuration settings in the decrypted data block above are expected to stay the same throughout execution. The decrypted keys block, which serves as the hash table keys, is shown below.
00000000: 9669 d307 1994 4529 7b07 183e 1e0c 6225 .i....E){..>..b%
00000010: 335f 0f6e 3e41 1eca 1537 3552 188f 932d 3_.n>A...75R...-
00000020: 4bf4 79a4 c5fd 0408 49f4 b412 3fa3 ad23 K.y.....I...?..#
00000030: 837b 5af1 2862 15d9 be29 fd62 605c 6aca .{Z.(b...).b`\j.
00000040: ad5a dd9c 4548 ca3a 7683 5753 7fb9 970a .Z..EH.:v.WS....
00000050: fe71 a43d 78b1 72f5 c8d4 b8a4 0c9e 925c .q.=x.r........\
00000060: d068 f985 2446 136c 5cb0 d155 ad8d 448e .h..$F.l\..U..D.
00000070: 9307 54ba fc2d 8b72 ba4d 63b8 3109 67c9 ..T..-.r.Mc.1.g.
00000080: e001 77e2 99e8 add2 2f45 1504 557f 9177 ..w...../E..U..w
00000090: 9950 9f98 91e6 551b 6557 9c62 fea8 afef .P....U.eW.b....
000000a0: 18b8 8043 9071 0f10 38aa e881 9e84 e541 ...C.q..8......A
000000b0: 3fa0 4697 187f fb47 bbe4 6a76 fa4b 5875 ?.F....G..jv.KXu
000000c0: 04d1 2861 6318 69bd 7459 b48c b541 3323 ..(ac.i.tY...A3#
000000d0: 16cd c514 5c7f db99 96d9 5982 f6f1 88ee ....\.....Y.....
000000e0: f830 fb10 8192 2fea a308 9998 2e0c b798 .0..../.........
000000f0: 367f 7dde 0c95 8c38 8cf3 4dcd acc4 3cd3 6.}....8..M...<.
00000100: 4473 9877 10c8 68e0 1673 b0ad d9cd 085d Ds.w..h..s.....]
00000110: ab1c ad6f 049d d2d4 65d0 1905 c640 9f61 [email protected]
00000120: 1357 eb9a 3238 74bf ea2d 97e4 a747 d7b6 .W..28t..-...G..
00000130: fd6d 8493 2429 899d c05d 5b94 0096 4593 .m..$)...][...E.
The binary uses this hash table to keep track of important values such as for status and configuration. The code initializes a CRC table which is used in the hashing algorithm and then the hash table is initialized. The structure that manages the hashtable shown below:
struct hashtable_mgr {
int * hashtable_ptr;
int maxEntries;
int numEntries;
}
The first member of this struct points to the hash table which is allocated on the heap and has size 0x1400 bytes when it’s first initialized. The hash table uses sets of 0x10 bytes from the decrypted keys block as the key that gets passed to the hashing function.
There are two main functions that are used to interact with this hashtable throughout the binary: we’ll call them getValueFromHashtable and putValueInHashtable. Both functions take four arguments: pointer to the hashtable manager, pointer to the key (usually represented as an offset from the beginning of the decrypted keys block), a pointer for the value, and an int for the value length. Through the rest of this post, I will refer to values that are stored in the hash table. Because the key is a series of 0x10 bytes, I will refer to values as “the value for offset 0x20 in the hash table”. This means the value that is stored in the hashtable for the “key” that is 0x10 bytes and begins at the address of the start of the decrypted keys block + 0x20.
Each entry in the hashtable has the following structure.
struct hashtable_entry {
BYTE * key_ptr;
uint key_len;
uint in_use;
BYTE * value_ptr;
uint value_len;
};
I have documented the majority of the entries in the hashtable here. I use the key’s offset from the beginning of the decrypted keys block as the “key” instead of typing out the series of 0x10 bytes. As shown in the linked sheet, the hashtable contains the dynamic variables that stage 3 needs to keep track of. For example, the filename where to save stage 4 and the install and failure counts.
The hashtable is periodically written to a file named uierrors.txt as described in the Persistence section. This is to save state in case the process exits.
The whole exploit chain diligently cleans up after itself to leave as few indicators as possible of its presence. However, stage 3 does save a couple of files and adds environment variables in order to function. This is in addition to the stage 4 code which will be discussed in the “Executing the Next Stage” section. Each of the files and variables described in this section will be deleted as soon as they’re no longer needed, but they will be on a device for at least a period of time. For each of the files that are saved to the device, the directory path is often randomly selected from a set of potential paths. This makes it more time consuming for an analyst to detect the presence of the file on a device because the analyst would have to check 5 different paths for each file rather than 1.
During startup, the code will record the current time in a file named state.parcel. After it records the current time at the beginning of the file, it will then check how many times per day this has been done by reading all of the times currently in the file. If there are less than 6 entries for the current day, the code proceeds. If there are 6 entries in the file from the current day and there are at least 5 entries for each of the previous 3 days, the binary will set a variable that will tell the code to clean up and exit. If there are 6 entries for the current day and there’s at least one entry for each of the past 3 days, the binary will clean up the persistent files for both this and other stages and then do a max sleep: sleep(0xFFFFFFFF), which is the equivalent of sleeping for over 136 years.
If the effective UID is 0 (root), then the code will randomly choose one of the following paths to write the file to:
If the effective UID is not 0, then the state.parcel file will be written to whatever directory the binary is executing out of according to /proc/self/exe. The contents in state.parcel are obfuscated by XOR’ing each entry with 0xFF12EE34.
Stage 3 periodically writes the hash table that contains configuration and static information to a file named uierrors.txt. The code uses the same process as for state.parcel to decide which directory to write the file too.
Whenever the hashtable is written to uierrors.txt it is encrypted using AES256. The key is the same AES key used to decrypt the configuration settings data block, but it generates a set of 0x10 random bytes to use as the IV. The IV is written to the uierrors.txt file first and then is followed by the encrypted hash table contents. The CRC32 of the encrypted contents of the file is written to the file as the last 4 bytes.
On start-up, stage 3 will remove the majority of the environment variables set by the previous stage. It then sets its own new environment variables.
Environment Variable Name | Value |
abc | Address of the decryption data block |
def | Address of the function that will send logging messages to the C2 server |
def2 | Address of the function that adds logging messages to the error and/or informational logging message queues |
ghi | Points the the decrypted block of hashtable keys |
ddd | Address of the function that performs inflate (decompress) |
ccc | Address of the function that performs deflate (compress) |
0x10 bytes at 0x228CC | ??? |
0x10 bytes at 0x228DC | Pointer to the string representation of the hex_d_uuid |
0x10 bytes at 0x228F0 | Pointer to the C2 domain URL |
0x10 bytes at 0x22904 | Pointer to the port string for the C2 server |
0x10 bytes at 0x22918 | Pointer to the beginning of the certificate |
0x10 bytes at 0x2292C | 0x1000 |
0x10 bytes at 0x22940 | Pointer to +4AA in decrypted data block |
0x10 bytes at 0x22954 | 0x14 |
0x10 bytes at 0x22698 | Pointer to the user-agent string |
PPR | Selinux status such as “selinux-init-read-fail” or “selinux-no-mdm” |
PPMM | Set if there is no “persist.security.mdm.policy” string in /init |
PPQQ | Set if the “persist.security.mdm.policy” string is in /init |
The binary has a very detailed and mature logging mechanism. It tracks both “error” and “informational” logging messages. These messages are saved until they’re sent to the C2 server either when stage 3 is automatically reaching out to the C2 server, or “on-demand” by calling the subroutine that is saved as environment variable “def”. The subroutine saved as environment variable “def2”, adds messages to the error and/or informational message queues. There are hundreds of different logging messages throughout the binary. I have documented the meaning of some of the different logging codes here.
This code is very diligent with trying to clean up its tracks, both while it's running and once it finishes. While it’s running, the binary forks a new process which runs code that is responsible for cleaning up logs while the other code is executing. This other process does the following to clean up stage 3’s tracks:
There are also a couple of different functions that clean up all potential dropped files from both this stage and other stages and remove the set environment variables.
The whole point of this binary is to download the next stage from the command and control (C2) server. Once the previous unpacking steps and checks are completed, the binary will begin preparing the network communications. First the binary will perform a DNS test, then gather device information, and send the POST request to the C2 server. If all these steps are successful, it will receive back the next stage and prepare to execute that.
Prior to reaching out to the C2 server, the binary performs a DNS test. It takes a pointer to the decrypted data block as its argument. First the function generates a random hostname that is between 8-16 lowercase latin characters. It then calls getaddrinfo on this random hostname. It’s trying to find a host that will cause getaddrinfo to return EAI_NODATA, meaning that no address information could be found for that host. It will attempt 3 different addresses before it will bail if none of them return EAI_NODATA. Some disconnected analysis sandboxes will respond to all hostnames and so the code is trying to detect this type of malware analysis environment.
Once it finds a hostname that returns EAI_NODATA, stage 3 does a DNS query with that hostname. The DNS server address is found in the decrypted block in argument 1 at offset 0x14C7. In this binary that is 8.8.8.8:53, the Google DNS server. The code will connect to the DNS server via a socket and then send a Type A query for the randomly generated host name and parse the response. The only acceptable response from the server is NXDomain, meaning “Non-Existent Domain”. If the code receives back NXDomain from the DNS server, it will proceed with the code path that communicates with the C2 Server.
The C2 server hostname and port is read from the decrypted data block. The port number is at offset 0x84 and the hostname is at offset 0x4.
The binary first connects via a socket to the C2 server, then connects with SSL/TLS. The SSL/TLS certificate, a root certificate, is also in the decrypted data block at offset 0x4C7. The binary uses the OpenSSL library.
Once it successfully connects to the C2 server via SSL/TLS, the binary will then begin collecting all the device information that it would like to send to the C2 server. The code collects A LOT of data to be sent to the C2 server. Six different sets of information are collected, formatted, compressed, and encrypted prior to sending to the remote server. The different “sets” of data that are collected are:
For this set, the binary is collecting device characteristics such as the Android version, the serial number, model, battery temperature, st_mode of /dev/mem and /dev/kmem, the contents of /proc/net/arp and /proc/net/route, and more. The full list of device characteristics that are collected and sent to the server are documented here.
The binary uses a few different methods for collecting this data. The most common is to read system properties. They have 2 different ways to read system properties:
To get the device ID, subscriber ID, and MSISDN, the binary uses the service call shell command. To call a function from a service using this API, you need to know the code for the function. Basically, the code is the number that the function is listed in the AIDL file. This means it can change with each new Android release. The developers of this binary hardcoded the service code for each android SDK version from 8 (Froyo) through 29 (Android 10). For example, the getSubscriberId code in the iphonesubinfo service is 3 for Android SDK version 8-20, the code is 5 for SDK version 21, and the code is 7 for SDK versions 22-29.
The code also collects detailed networking information. For example, it collects the MAC address and IP address for each interface listed under the /sys/class/net/ directory.
To collect information about the applications installed on the device, the binary will send all of the contents of /data/system/packages.xml to the C2 server. This XML file includes data about both the user-installed and the system-installed packages on the device.
To gather information about the physical location of the device, the binary runs dumpsys location in a shell. It sends the full output of this data back to the C2 server. The output of the dumpsys location command includes data such as the last known GPS locations.
The binary collects information about the status of the exploits and subsequent stages (including this one) to send back to the C2 server. Most of these values are obtained from the hash storage table. There are 22 value pairs that are sent back to the server. These values include things such as the installation time and the “repair count”, the build id, and the major and minor version numbers for the binary. The full set of data that is sent to the C2 server is available here.
The binary sends information about every single running process back to the C2 server. It will iterate through each directory under /proc/ and send back the following information for each process:
As described in the Error Processing section, whenever the binary encounters an error, it creates an error message. The binary will send a maximum of 0x1F of these error messages back to the C2 server. It will also send a maximum of 0x1F “informational” messages back to the server. “Info” messages are similar to the error messages except that they are documenting a condition that is less severe than an error. These are distinctions that the developers included in their coding.
Once all of the “sets” of information are collected, they are compressed using the deflate function. The compressed “messages” each have the following compressedMessage structure. The messageCode is a type of identification code for the information that is contained in the message. It’s calculated by calculating the crc32 value for the 0x10 bytes at offset 0x1CD8 in the decrypted data block and then adding the “identification code”.
struct compressedMessage {
uint compressedDataLength;
uint uncompressedDataLength;
uint messageCode;
BYTE * dataPointer;
BYTE[4096] data;
};
Once each of the messages, or sets of data, have been individually compressed into the compressedMessage struct, the byte order is swapped to change the endianness and then the data is all encrypted using AES256. The key from the decrypted data block is used and the IV is a set of 0x10 random bytes. The IV is prepended to the beginning of the encrypted message.
The data is sent to the server as a POST request. The full header is shown below.
POST /api2/v9/pass HTTP/1.1
User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; SM-G600FY Build/LRX22C) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.0 Chrome/38.0.2125.102 Mobile Safari/537.3
Host: REDACTED:443
Connection: keep-alive
Content-Type:application/octet-stream
Content-Length:%u
Cookie: %s
The “Cookie” field is two values from the decrypted data block: sid and uid. The values for these two keys are base64 encoded values from the decrypted data block.
The body of the POST request is all of the data collected and compressed in the section above. This request is then sent to the C2 server via the SSL/TLS connection.
The response received back from the server is parsed. If the HTTP Response Code is not 200, it’s considered an error. The received data is first decrypted using AES256. The key used is the key that is included in the decrypted data block at offset 0x48A and the IV is sent back as the first 0x10 bytes of the response. After being decrypted, the byte order is swapped using bswap32 and the data is then decompressed using inflate. This inflated response body is an executable file or a series of commands.
The binary will also store and delete cookies for the C2 server domain and the exploit server domain. First, the binary will delete the cookie for the hostname of the exploit server that is the following name/value pair: session=<XXX>. This name/value is hardcoded into the decrypted data block within the binary. Then it will re-add that same cookie, but with an updated last accessed time and expire time.
As stated previously, stage 3’s role in the exploit chain is to check that the binary is not being analyzed and if not, collect detailed device data and send it to the C2 server to receive back the next stage of code and commands that should be executed. The detailed information that is sent back to the C2 server is likely used for high-fidelity targeting.
The developers of stage 3 purposefully built in a variety of different ways that the next stage of code can be executed: a series of commands passed to system or a shared library ELF file which can be executed by calling dlopen and dlsym, and more. This section will detail the different ways that the C2 server can instruct stage 3 to save and begin executing the next stage of code.
If the POST request to the C2 server is successful, the code will receive back either an executable file or a set of commands which it’ll “process”. The response is parsed differently based on the “message code” in the header of the response. This “message code” is similar to what was described in the “Constructing the Request” section. It’s an identification code + the CRC32 of the 0x10 bytes at 0x25E30. When processing the response, the binary calculates the CRC32 of these bytes again and subtracts them from the message code. This value is then used to determine how to treat the contents of the response. The majority of the message codes distinguish different ways for the response to be saved to the device and then be executed.
There are a few functions that are commonly used by multiple message codes, so they are described here first.
func1 - Writes the response contents to files in both the /data/dalvik-cache/arm and /mnt directories.
This function does the following:
func2 - Fork child process and inject code using ptrace.
This function forks a new process where the child will call the function init from an ELF library, then the parent will inject the code from the response into the child process using ptrace. The ELF library that is opened with dlopen and then init is called on is named /system/bin/%016lx%016lx with both values being the address of the buffer pointer.
func3 - Writes the buffer of the reply contents to file and sets the permissions and SELinux attributes.
This function will write the buffer to either the provided file path in the third argument or it will generate a new file path. If it’s generating a new temporary file name, the code will go down the following list of directory names beginning with /cache in the first directory that it can stat, it will create the temporary file using mkstemp(“%s/XXXXXX”).
After the new file is created, the code sets the permissions on the file path to those supplied to the function as the fourth argument. Then it will set the SELinux attributes of the file to those passed in in the fifth argument.
The following section gives a simplified summary of how the response from the C2 server is handled based on the response’s message code:
That concludes our analysis of Stage 3 in the Android exploit chain. We hypothesize that each Stage 2 (and thus Stage 3) includes different configuration variables that would allow the attackers to identify which delivered exploit chain is calling back to the C2 server. In addition, due to the detailed information sent to the C2 prior to stage 4 being returned to the device it seems unlikely that we would successfully determine the correct values to have a “legitimate” stage 4 returned to us.
It’s especially fascinating how complex and well-engineered this stage 3 code is when you consider that the attackers used all publicly known n-days in stage 2. The attackers used a Google Chrome 0-day in stage 1, public exploit for Android n-days in stage 2, and a mature, complex, and thoroughly designed and engineered stage 3. This leads us to believe that the actor likely has more device-specific 0-day exploits.
This is part 5 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 6: Windows Exploits.