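There are plenty of articles online about the ELF file structure, but most of them only cover the theory, and concrete code is rare (perhaps because open-source implementations already exist). Reading open-source code is not my strong suit (it gives me a headache), so I studied the ELF format the same way I once studied the PE format.
I wrote a parser modeled on the output of readelf, and finally a simple ELF loader.
The code supports both x86 and x64 ELF files:
◆The parser has two implementations, one for x86 and one for x64, so it can parse ELF files for both platforms
◆The loader depends on the build environment and can only load ELF files for the matching platform, so the x86 and x64 loaders must be compiled separately
◆The explanations and demos focus mainly on x86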
Environment & tools:
◆VMware pro 17.6.1
◆Kali Linux 2023.4 vmware amd64
◆gcc (Debian 14.2.0-8) 14.2.0
◆CLion 2024.2.3
◆010 Editor 13.0.1
◆IDA Pro 7.7
Attachments:
◆Sources.zip
◆CompiledTools.zip
◆TestFiles.zip
My knowledge is limited, so please bear with any mistakes in the content; corrections and criticism are welcome.
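I
ELF File Structure Overview
ELF was developed and published by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI), and it is the main executable file format on Linux. Its full name is Executable and Linking Format. The name matters, because it spells out the two functions ELF has to support: execution and linking.
An ELF file has three major parts: the ELF header, the sections, and the segments:
◆The section header table points to the sections. Like the PE section table, it describes each section
◆The program header table describes the segments. A segment can contain several sections, and the table tells the loader how to map the ELF file into memory
◆In object (OBJ) files segments are optional, and in executables sections are optional, but ELF files built with the NDK contain both
elf.h wraps a few basic data types: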
#include <stdint.h>
typedef uint16_t Elf32_Half;
typedef uint16_t Elf64_Half;
/* Types for signed and unsigned 32-bit quantities. */
typedef uint32_t Elf32_Word;
typedef int32_t Elf32_Sword;
typedef uint32_t Elf64_Word;
typedef int32_t Elf64_Sword;
/* Types for signed and unsigned 64-bit quantities. */
typedef uint64_t Elf32_Xword;
typedef int64_t Elf32_Sxword;
typedef uint64_t Elf64_Xword;
typedef int64_t Elf64_Sxword;
/* Type of addresses. */
typedef uint32_t Elf32_Addr;
typedef uint64_t Elf64_Addr;
/* Type of file offsets. */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;
/* Type for section indices, which are 16-bit quantities. */
typedef uint16_t Elf32_Section;
typedef uint16_t Elf64_Section;
/* Type for version symbol information. */
typedef Elf32_Half Elf32_Versym;
typedef Elf64_Half Elf64_Versym;
As you can see, the 32-bit and 64-bit definitions differ in width only for Addr and Off, so we can define matching generic types.
ELF type | Underlying type | Notes |
---|---|---|
Elfn_Half | uint16_t | |
Elfn_Word | uint32_t | |
Elfn_Sword | int32_t | |
Elfn_Xword | uint64_t | |
Elfn_Sxword | int64_t | |
Elf32_Addr | uint32_t | Address |
Elf64_Addr | uint64_t | Address |
Elf32_Off | uint32_t | File offset |
Elf64_Off | uint64_t | File offset |
Elfn_Section | uint16_t | Section index |
Elfn_Versym | uint16_t | |
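One possible way to define such generic types is to pick the 32-bit or 64-bit width at compile time. This is only a sketch: the ELF_PARSER_64 switch, the Elf_Addr/Elf_Off aliases and the elfAddrWidth helper are assumptions made for illustration, not the definitions used in the attached sources.

```c
#include <stdint.h>

/* Hypothetical switch: define ELF_PARSER_64 to build the 64-bit variant. */
#ifdef ELF_PARSER_64
typedef uint64_t Elf_Addr;   /* same width as Elf64_Addr */
typedef uint64_t Elf_Off;    /* same width as Elf64_Off  */
#else
typedef uint32_t Elf_Addr;   /* same width as Elf32_Addr */
typedef uint32_t Elf_Off;    /* same width as Elf32_Off  */
#endif

/* These types have the same width in both classes, so one alias is enough. */
typedef uint16_t Elf_Half;
typedef uint32_t Elf_Word;
typedef int32_t  Elf_Sword;
typedef uint64_t Elf_Xword;
typedef int64_t  Elf_Sxword;

/* Width in bytes of an address for the variant selected at compile time. */
static inline unsigned elfAddrWidth(void) {
    return (unsigned) sizeof(Elf_Addr);
}
```

With aliases like these, only the code that touches addresses and file offsets has to care which class is being built.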
Use gcc to build a 32-bit and a 64-bit ELF executable for testing.
#include <stdio.h>
int main(int argc, char* argv[]){
printf("Hello ELF!\n");
return 0;
}
gcc -m32 -O0 main.c -o HelloELF32
gcc -m64 -O0 main.c -o HelloELF64
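Before writing the ELF parser/loader, define a file-reading function. It reads the file at the given path and returns a byte pointer together with the number of bytes read.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Read a whole file; returns a malloc'ed buffer (the caller frees it) and stores the byte count in *readSize
uint8_t* readFileToBytes(const char *fileName, size_t* readSize) {
    FILE *file = fopen(fileName, "rb");
    if (file == NULL) {
        // fopen failed, so there is no stream to close here
        printf("Error opening file\n");
        return NULL;
    }
    fseek(file, 0, SEEK_END);
    long fileSize = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (fileSize <= 0) {
        // ftell reports errors as -1; an empty file is not a usable ELF either
        printf("Error getting file size\n");
        fclose(file);
        return NULL;
    }
    uint8_t *buffer = (uint8_t *) malloc((size_t) fileSize);
    if (buffer == NULL) {
        printf("Error allocating memory\n");
        fclose(file);
        return NULL;
    }
    size_t bytesRead = fread(buffer, 1, (size_t) fileSize, file);
    if (bytesRead != (size_t) fileSize) {
        printf("Read bytes not equal file size!\n");
        free(buffer);
        fclose(file);
        return NULL;
    }
    fclose(file);
    if (readSize)
        *readSize = bytesRead;
    return buffer;
}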
II
ELF Header
Defined in elf.h:
#define EI_NIDENT (16)
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;
// 64-bit
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
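The header can be viewed with readelf -h.
e_ident: 16-byte ELF identification. The first 4 bytes are the ELF magic "\x7fELF" and must not be modified.
010 Editor parses it as follows:
1.e_ident[EI_CLASS]: file class, 1 (ELFCLASS32) for 32-bit and 2 (ELFCLASS64) for 64-bit
2.e_ident[EI_DATA]: data encoding, 1 (ELFDATA2LSB) for little-endian and 2 (ELFDATA2MSB) for big-endian
3.e_ident[EI_VERSION]: ELF version, normally 1 (EV_CURRENT)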
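Before choosing between the 32-bit and 64-bit parsing paths, the magic and the class byte can be checked directly on the raw buffer. A minimal sketch, assuming the buffer comes from a reader such as readFileToBytes; the function name detectElfClass is made up for this example, while ELFMAG, SELFMAG and the ELFCLASS* constants come from elf.h.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns ELFCLASS32 or ELFCLASS64 for a plausible ELF image,
   or ELFCLASSNONE if the magic or the class byte is not valid. */
int detectElfClass(const uint8_t *buffer, size_t size) {
    if (buffer == NULL || size < EI_NIDENT) {
        return ELFCLASSNONE;                 /* too short to hold e_ident */
    }
    if (memcmp(buffer, ELFMAG, SELFMAG) != 0) {
        return ELFCLASSNONE;                 /* first 4 bytes are not "\x7fELF" */
    }
    unsigned char elfClass = buffer[EI_CLASS];
    if (elfClass != ELFCLASS32 && elfClass != ELFCLASS64) {
        return ELFCLASSNONE;
    }
    return elfClass;
}
```

A caller would then cast the buffer to Elf32_Ehdr or Elf64_Ehdr depending on the result.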
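e_type: 2 bytes, identifies the kind of object file.
Since Android 5.0, executables must be position-independent and therefore have the same type as a shared object (so), so on Android this field can only be 03 (ET_DYN) and must not be changed.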
/* Legal values for e_type (object file type). */
#define ET_NONE 0 /* No file type */
#define ET_REL 1 /* Relocatable file */
#define ET_EXEC 2 /* Executable file */
#define ET_DYN 3 /* Shared object file */
#define ET_CORE 4 /* Core file */
#define ET_NUM 5 /* Number of defined types */
#define ET_LOOS 0xfe00 /* OS-specific range start */
#define ET_HIOS 0xfeff /* OS-specific range end */
#define ET_LOPROC 0xff00 /* Processor-specific range start */
#define ET_HIPROC 0xffff /* Processor-specific range end */
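e_machine: 2 bytes, specifies the processor architecture the ELF file targets. Some of the definitions are listed below. For 32-bit Intel x86 the value is EM_386 (3); x86-64 files use EM_X86_64 (62), which lies outside this excerpt.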
#define EM_NONE 0 /* No machine */
#define EM_M32 1 /* AT&T WE 32100 */
#define EM_SPARC 2 /* SUN SPARC */
#define EM_386 3 /* Intel 80386 */
#define EM_68K 4 /* Motorola m68k family */
#define EM_88K 5 /* Motorola m88k family */
#define EM_IAMCU 6 /* Intel MCU */
#define EM_860 7 /* Intel 80860 */
#define EM_MIPS 8 /* MIPS R3000 big-endian */
#define EM_S370 9 /* IBM System/370 */
#define EM_MIPS_RS3_LE 10 /* MIPS R3000 little-endian */
/* reserved 11-14 */
#define EM_PARISC 15 /* HPPA */
/* reserved 16 */
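e_version: 4 bytes, the object file version.
Android does not check this field. IDA does, but the value has no effect on disassembly.
e_entry: 4 or 8 bytes, the program entry point (OEP). For an executable with e_type=2 (ET_EXEC) the field is a VA; for ET_DYN files it is relative to the load base (an RVA), and a shared library (so) without an entry point has 0.
e_phoff: 4 or 8 bytes, file offset (FOA) of the program header table, or 0 if the file has no program header table.
e_shoff: 4 or 8 bytes, file offset (FOA) of the section header table, or 0 if the file has no section header table.
The section header table is often stripped as an anti-analysis trick on Android.
e_flags: 4 bytes of processor-specific flags; unused on x86/x64.
e_ehsize: 2 bytes, size of the ELF header.
Android does not check it and assumes the default ELF header size of 52 bytes (64 bytes for 64-bit files); IDA checks it, but changing the field only triggers a warning and does not affect disassembly.
e_phentsize: 2 bytes, size of each program header table entry.
e_phnum: 2 bytes, number of entries in the program header table.
e_shentsize: 2 bytes, size of each section header table entry.
e_shnum: 2 bytes, number of entries in the section header table.
e_shstrndx: 2 bytes, index of the section header table entry that describes the section name string table.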
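Because every one of these offsets and counts comes straight from the file, a parser may want to confirm that both tables actually fit inside the buffer before indexing them. The following is a sketch under simplifying assumptions: the helper names are hypothetical, and the extended numbering cases (SHN_XINDEX, e_shnum of 0 with a non-zero e_shoff) are treated as invalid for brevity.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `count` entries of `entSize` bytes starting at `offset` fit in `fileSize` bytes.
   The operands are at most 32 and 16 bits wide, so the 64-bit math cannot overflow. */
static bool tableFits(uint64_t offset, uint64_t entSize, uint64_t count, uint64_t fileSize) {
    uint64_t bytes = entSize * count;
    return offset <= fileSize && bytes <= fileSize - offset;
}

/* Sanity-check the table locations recorded in a 32-bit ELF header. */
bool elfHeaderTablesInBounds32(const Elf32_Ehdr *ehdr, size_t fileSize) {
    if (ehdr == NULL || fileSize < sizeof(Elf32_Ehdr)) {
        return false;
    }
    if (ehdr->e_phoff != 0) {
        if (ehdr->e_phentsize < sizeof(Elf32_Phdr)) return false;
        if (!tableFits(ehdr->e_phoff, ehdr->e_phentsize, ehdr->e_phnum, fileSize)) return false;
    }
    if (ehdr->e_shoff != 0) {
        if (ehdr->e_shentsize < sizeof(Elf32_Shdr)) return false;
        if (!tableFits(ehdr->e_shoff, ehdr->e_shentsize, ehdr->e_shnum, fileSize)) return false;
        /* The name table index must refer to an existing section header. */
        if (ehdr->e_shstrndx >= ehdr->e_shnum) return false;
    }
    return true;
}
```

Running a check like this once up front lets the later printing code index the tables without repeating bounds tests.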
Based on these enumeration values, define matching string arrays to print the information.
// Print ELF Header
char ELF_Class[3][6] = {"NONE", "ELF32", "ELF64"};
char ELF_Data[3][14] = {"NONE", "Little Endian", "Big Endian"};
char objectFileType[7][7] = {"NONE", "REL", "EXEC", "DYN", "CORE", "LOPROC", "HIPROC"};
void printELFHeader32(const Elf32_Ehdr* pElfHeader) {
printf("ELF Header:\n");
printf("\tMagic:\t");
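for (int i = 0; i < EI_NIDENT; i++) {
// e_ident is a byte array inside the one header; indexing pElfHeader[i] would read past it
printf("%02x ", pElfHeader->e_ident[i]);
}
printf("\n");
// Guard the table lookups so a corrupted header cannot index out of bounds
printf("\t%-36s%s\n", "Class:", pElfHeader->e_ident[EI_CLASS] < 3 ? ELF_Class[pElfHeader->e_ident[EI_CLASS]] : "UNKNOWN");
printf("\t%-36s%s\n", "Data:", pElfHeader->e_ident[EI_DATA] < 3 ? ELF_Data[pElfHeader->e_ident[EI_DATA]] : "UNKNOWN");
printf("\t%-36s%#x\n", "Version:", pElfHeader->e_version);
printf("\t%-36s%#x\n", "Machine:", pElfHeader->e_machine);
printf("\t%-36s%s\n", "Type:", pElfHeader->e_type < ET_NUM ? objectFileType[pElfHeader->e_type] : "UNKNOWN");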
printf("\t%-36s%#x\n", "Size Of ELF Header:", pElfHeader->e_ehsize);
printf("\t%-36s%#x\n", "Entry point:", pElfHeader->e_entry);
printf("\t%-36s%#x\n", "Start Of Program Headers:", pElfHeader->e_phoff);
printf("\t%-36s%#x\n", "Start Of Section Headers:", pElfHeader->e_shoff);
printf("\t%-36s%#x\n", "Size Of Program Headers:", pElfHeader->e_phentsize);
printf("\t%-36s%#x\n", "Number Of Program Headers:", pElfHeader->e_phnum);
printf("\t%-36s%#x\n", "Size Of Section Headers:", pElfHeader->e_shentsize);
printf("\t%-36s%#x\n", "Number Of Sections:", pElfHeader->e_shnum);
printf("\t%-36s%d\n", "Section Header String Table Index:", pElfHeader->e_shstrndx);
printf("ELF Header End\n");
}
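The output looks like this:
III
Section Header
Similar to the PE section table (IMAGE_SECTION_HEADER).
The section header table stores the basic attributes of every section and is the most important structure in an ELF file after the file header. The compiler, linker and loader all rely on it to locate sections and read their attributes.
Entry 0 of the section header table is reserved: it corresponds to index SHN_UNDEF and has type SHT_NULL. The entry structure is defined as follows: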
typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;
typedef struct
{
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
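Use readelf -S to view the section header table.
sh_name: 4 bytes, an offset. Use e_shstrndx from the ELF file header to get the index of the section name string table's entry in the section header table,
then find that entry and take its sh_offset. sh_name + sh_offset is the FOA of the section's name string.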
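The lookup described above can be sketched as a single function over the whole file buffer. This is an illustration only: sectionNameAt32 is a hypothetical name (the sources use their own getSectionName helper, whose implementation is not shown here), and the sketch assumes sh_entsize equals sizeof(Elf32_Shdr).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the name of section `index`, or NULL if anything is out of range.
   `file` is the whole ELF32 image in memory and `fileSize` its length. */
const char *sectionNameAt32(const uint8_t *file, size_t fileSize, unsigned index) {
    if (file == NULL || fileSize < sizeof(Elf32_Ehdr)) return NULL;
    const Elf32_Ehdr *ehdr = (const Elf32_Ehdr *) file;
    if (index >= ehdr->e_shnum || ehdr->e_shstrndx >= ehdr->e_shnum) return NULL;

    /* The section header table itself must lie inside the file. */
    uint64_t tableEnd = (uint64_t) ehdr->e_shoff + (uint64_t) ehdr->e_shnum * sizeof(Elf32_Shdr);
    if (tableEnd > fileSize) return NULL;
    const Elf32_Shdr *shdr = (const Elf32_Shdr *) (file + ehdr->e_shoff);

    /* Step 1: e_shstrndx selects the header of the section name string table. */
    const Elf32_Shdr *names = &shdr[ehdr->e_shstrndx];
    uint64_t namesEnd = (uint64_t) names->sh_offset + names->sh_size;
    if (namesEnd > fileSize || shdr[index].sh_name >= names->sh_size) return NULL;

    /* Step 2: FOA of the name = sh_offset of the name table + sh_name of the section. */
    const char *name = (const char *) (file + names->sh_offset + shdr[index].sh_name);

    /* Reject names that are not NUL-terminated inside the table. */
    if (memchr(name, '\0', names->sh_size - shdr[index].sh_name) == NULL) return NULL;
    return name;
}
```

The returned pointer refers into the file buffer, so it stays valid only as long as the buffer does.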
sh_type: 4 bytes, indicates the section type, defined as follows:
/* Legal values for sh_type (section type). */
#define SHT_NULL 0 /* Section header table entry unused */
#define SHT_PROGBITS 1 /* Program data */
#define SHT_SYMTAB 2 /* Symbol table */
#define SHT_STRTAB 3 /* String table */
#define SHT_RELA 4 /* Relocation entries with addends */
#define SHT_HASH 5 /* Symbol hash table */
#define SHT_DYNAMIC 6 /* Dynamic linking information */
#define SHT_NOTE 7 /* Notes */
#define SHT_NOBITS 8 /* Program space with no data (bss) */
#define SHT_REL 9 /* Relocation entries, no addends */
#define SHT_SHLIB 10 /* Reserved */
#define SHT_DYNSYM 11 /* Dynamic linker symbol table */
#define SHT_INIT_ARRAY 14 /* Array of constructors */
#define SHT_FINI_ARRAY 15 /* Array of destructors */
#define SHT_PREINIT_ARRAY 16 /* Array of pre-constructors */
#define SHT_GROUP 17 /* Section group */
#define SHT_SYMTAB_SHNDX 18 /* Extended section indices */
#define SHT_RELR 19 /* RELR relative relocations */
#define SHT_NUM 20 /* Number of defined types. */
#define SHT_LOOS 0x60000000 /* Start OS-specific. */
#define SHT_GNU_ATTRIBUTES 0x6ffffff5 /* Object attributes. */
#define SHT_GNU_HASH 0x6ffffff6 /* GNU-style hash table. */
#define SHT_GNU_LIBLIST 0x6ffffff7 /* Prelink library list */
#define SHT_CHECKSUM 0x6ffffff8 /* Checksum for DSO content. */
#define SHT_LOSUNW 0x6ffffffa /* Sun-specific low bound. */
#define SHT_SUNW_move 0x6ffffffa
#define SHT_SUNW_COMDAT 0x6ffffffb
#define SHT_SUNW_syminfo 0x6ffffffc
#define SHT_GNU_verdef 0x6ffffffd /* Version definition section. */
#define SHT_GNU_verneed 0x6ffffffe /* Version needs section. */
#define SHT_GNU_versym 0x6fffffff /* Version symbol table. */
#define SHT_HISUNW 0x6fffffff /* Sun-specific high bound. */
#define SHT_HIOS 0x6fffffff /* End OS-specific type */
#define SHT_LOPROC 0x70000000 /* Start of processor-specific */
#define SHT_HIPROC 0x7fffffff /* End of processor-specific */
#define SHT_LOUSER 0x80000000 /* Start of application-specific */
#define SHT_HIUSER 0x8fffffff /* End of application-specific */
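The more common section types are:
SHT_NULL // Inactive section
SHT_STRTAB // The section is a string table; an ELF file can have several string table sections
SHT_RELA // Relocation section whose entries carry explicit addends
SHT_HASH // The section holds a symbol hash table; currently an ELF file can have at most one
SHT_DYNAMIC // The section holds dynamic linking information; currently an object file can have at most one dynamic section
SHT_NOBITS // The section occupies no space in the file (for example .bss), although it still takes up memory at run time
SHT_REL // Relocation section whose entries have no explicit addends
SHT_DYNSYM // The section is the dynamic linking symbol table, with the same entry layout as SHT_SYMTAB
sh_flags: 4 bytes (8 bytes in Elf64_Shdr), a set of flag bits
1.SHF_WRITE: the section is writable in the process
2.SHF_ALLOC: the section occupies memory at run time
3.SHF_EXECINSTR: the section contains instruction code
4.SHF_MASKPROC: all bits covered by this mask are reserved for processor-specific extensions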
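sh_addr: 4 bytes, the section's virtual address in memory
sh_offset: 4 bytes, the section's FOA
sh_size: 4 bytes, the size of the section
sh_link: 4 bytes, an index value
sh_info: 4 bytes, additional section information
The meaning of sh_info and sh_link depends on the section type.
sh_addralign: 4 bytes, the section's address alignment. 0 or 1 means the section has no alignment requirement; otherwise the value is the alignment itself and must be a power of two, for example 8 means 8-byte alignment.
The section's sh_addr must be divisible by sh_addralign, i.e. sh_addr % sh_addralign = 0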
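The alignment rule can be written as a small check. This sketch follows the System V ABI reading of the field, where sh_addralign holds the alignment value itself (0, 1, or a power of two); both function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0 and 1 mean "no constraint"; any other legal value is a power of two. */
bool sectionAlignmentValid(uint64_t align) {
    return align == 0 || (align & (align - 1)) == 0;
}

/* True if `addr` satisfies the alignment stored in sh_addralign. */
bool sectionAddrAligned(uint64_t addr, uint64_t align) {
    if (!sectionAlignmentValid(align)) {
        return false;
    }
    if (align <= 1) {
        return true;
    }
    /* For a power of two this is the same test as addr % align == 0. */
    return (addr & (align - 1)) == 0;
}
```

The parameters are 64 bits wide so the same helpers work for both Elf32_Shdr and Elf64_Shdr values.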
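sh_entsize: 4 bytes. Some sections hold a table of fixed-size entries (the symbol table, for example); this field gives the size of each entry, and 0 means the section is not such a table.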
// Print ELF Section Headers
char *getSectionTypeString(Elf_Word sectionType) {
switch (sectionType) {
case SHT_NULL: return "NULL";
case SHT_PROGBITS: return "PROGBITS";
case SHT_SYMTAB: return "SYMTAB";
case SHT_STRTAB: return "STRTAB";
case SHT_RELA: return "RELA";
case SHT_HASH: return "HASH";
case SHT_DYNAMIC: return "DYNAMIC";
case SHT_NOTE: return "NOTE";
case SHT_NOBITS: return "NOBITS";
case SHT_REL: return "REL";
case SHT_SHLIB: return "SHLIB";
case SHT_DYNSYM: return "DYNSYM";
case SHT_INIT_ARRAY: return "INIT_ARRAY";
case SHT_FINI_ARRAY: return "FINI_ARRAY";
case SHT_PREINIT_ARRAY: return "PREINIT_ARRAY";
case SHT_GROUP: return "GROUP";
case SHT_SYMTAB_SHNDX: return "SYMTAB_SHNDX";
case SHT_RELR: return "RELR";
case SHT_NUM: return "NUM";
case SHT_LOOS: return "LOOS";
case SHT_GNU_ATTRIBUTES: return "GNU_ATTRIBUTES";
case SHT_GNU_HASH: return "GNU_HASH";
case SHT_GNU_LIBLIST: return "GNU_LIBLIST";
case SHT_CHECKSUM: return "CHECKSUM";
case SHT_LOSUNW: return "LOSUNW";
case SHT_SUNW_COMDAT: return "SUNW_COMDAT";
case SHT_SUNW_syminfo: return "SUNW_syminfo";
case SHT_GNU_verdef: return "GNU_verdef";
case SHT_GNU_verneed: return "GNU_verneed";
case SHT_GNU_versym: return "GNU_versym";
case SHT_LOPROC: return "LOPROC";
case SHT_HIPROC: return "HIPROC";
case SHT_LOUSER: return "LOUSER";
case SHT_HIUSER: return "HIUSER";
default: return "UNKNOWN";
}
}
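const char* getSectionFlagStr(Elf_Word flags) {
    // Test each bit on its own so combinations with other flags (SHF_MERGE, SHF_STRINGS, ...) still print
    // Static buffer: the result is only valid until the next call
    static char sectionFlagStr[4];
    int count = 0;
    if (flags & SHF_WRITE) {
        sectionFlagStr[count++] = 'W';
    }
    if (flags & SHF_ALLOC) {
        sectionFlagStr[count++] = 'A';
    }
    if (flags & SHF_EXECINSTR) {
        sectionFlagStr[count++] = 'X';
    }
    sectionFlagStr[count] = '\0';
    return sectionFlagStr;
}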
void printElfSectionHeader32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pStringTable) {
printf("ELF Section Headers:\n");
printf("\t[Nr] Name\t\t\tType\t\t\tAddr\t\tOffset\t\tSize\t\tEntSize\tFlag\tLink\tInfo\tAlign\n");
for (int i = 0; i < sectionNum; i++) {
printf("\t[%2d] %-20s", i, (char *) &pStringTable[pSectionHeader[i].sh_name]);
printf("\t%-16s", getSectionTypeString(pSectionHeader[i].sh_type));
printf("\t%08x", pSectionHeader[i].sh_addr);
printf("\t%08x", pSectionHeader[i].sh_offset);
printf("\t%08x", pSectionHeader[i].sh_size);
printf("\t%x", pSectionHeader[i].sh_entsize);
printf("\t%s", getSectionFlagStr(pSectionHeader[i].sh_flags));
printf("\t%x", pSectionHeader[i].sh_link);
printf("\t%x", pSectionHeader[i].sh_info);
printf("\t%x\n", pSectionHeader[i].sh_addralign);
}
printf("ELF Section Headers End\n");
}
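The output looks like this:
IV
Program Header
The program header table describes how the ELF file is mapped into memory, expressed in terms of segments.
It is defined as follows: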
typedef struct
{
Elf32_Word p_type; /* Segment type */
Elf32_Off p_offset; /* Segment file offset */
Elf32_Addr p_vaddr; /* Segment virtual address */
Elf32_Addr p_paddr; /* Segment physical address */
Elf32_Word p_filesz; /* Segment size in file */
Elf32_Word p_memsz; /* Segment size in memory */
Elf32_Word p_flags; /* Segment flags */
Elf32_Word p_align; /* Segment alignment */
} Elf32_Phdr;
typedef struct
{
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment */
} Elf64_Phdr;
p_type: the type of the segment this program header describes (or how to interpret the header's information)
The segment types are:
/* Legal values for p_type (segment type). */
#define PT_NULL 0 /* Program header table entry unused */
#define PT_LOAD 1 /* Loadable program segment */
#define PT_DYNAMIC 2 /* Dynamic linking information */
#define PT_INTERP 3 /* Program interpreter */
#define PT_NOTE 4 /* Auxiliary information */
#define PT_SHLIB 5 /* Reserved */
#define PT_PHDR 6 /* Entry for header table itself */
#define PT_TLS 7 /* Thread-local storage segment */
#define PT_NUM 8 /* Number of defined types */
#define PT_LOOS 0x60000000 /* Start of OS-specific */
#define PT_GNU_EH_FRAME 0x6474e550 /* GCC .eh_frame_hdr segment */
#define PT_GNU_STACK 0x6474e551 /* Indicates stack executability */
#define PT_GNU_RELRO 0x6474e552 /* Read-only after relocation */
#define PT_GNU_PROPERTY 0x6474e553 /* GNU property */
#define PT_GNU_SFRAME 0x6474e554 /* SFrame segment. */
#define PT_LOSUNW 0x6ffffffa
#define PT_SUNWBSS 0x6ffffffa /* Sun Specific segment */
#define PT_SUNWSTACK 0x6ffffffb /* Stack segment */
#define PT_HISUNW 0x6fffffff
#define PT_HIOS 0x6fffffff /* End of OS-specific */
#define PT_LOPROC 0x70000000 /* Start of processor-specific */
#define PT_HIPROC 0x7fffffff /* End of processor-specific */
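p_offset: file offset of the segment
p_vaddr: virtual address of the segment in memory
p_paddr: physical address of the segment. Most modern operating systems are designed so that a segment's physical address cannot be known in advance, so this field is usually reserved
p_filesz: size of the segment in the file
p_memsz: size of the segment in memory
p_flags: segment attributes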
/* Legal values for p_flags (segment flags). */
#define PF_X (1 << 0) /* Segment is executable */
#define PF_W (1 << 1) /* Segment is writable */
#define PF_R (1 << 2) /* Segment is readable */
#define PF_MASKOS 0x0ff00000 /* OS-specific */
#define PF_MASKPROC 0xf0000000 /* Processor-specific */
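A loader eventually has to turn these PF_* bits into page protections. A minimal sketch of that translation for the mmap/mprotect interface on Linux; the function name segmentFlagsToProt is an assumption for this example and is not taken from the attached loader.

```c
#include <elf.h>
#include <sys/mman.h>

/* Translate ELF segment flags (PF_*) into mmap/mprotect protection bits (PROT_*).
   Note that the bit positions differ: PF_X is bit 0, while PROT_READ is bit 0. */
int segmentFlagsToProt(Elf32_Word p_flags) {
    int prot = PROT_NONE;
    if (p_flags & PF_R) {
        prot |= PROT_READ;
    }
    if (p_flags & PF_W) {
        prot |= PROT_WRITE;
    }
    if (p_flags & PF_X) {
        prot |= PROT_EXEC;
    }
    return prot;
}
```

Since the two encodings do not share bit positions, passing p_flags straight to mprotect would silently give the wrong permissions.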
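p_align: alignment of the segment in memory and in the file; for loadable segments p_vaddr and p_offset are congruent modulo p_align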
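The difference between p_filesz and p_memsz matters when a segment is loaded: only p_filesz bytes exist in the file, and the remaining p_memsz - p_filesz bytes (where .bss lives) have to be zero-filled. The sketch below copies one PT_LOAD segment into a caller-provided buffer; copyLoadSegment32 and its parameters are hypothetical and stand in for whatever mapping strategy a real loader uses.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one PT_LOAD segment from the file image into `dest` (at least p_memsz bytes).
   Bytes beyond p_filesz are zeroed. Returns false on any inconsistency. */
bool copyLoadSegment32(const Elf32_Phdr *ph, const uint8_t *file, size_t fileSize,
                       uint8_t *dest, size_t destSize) {
    if (ph == NULL || file == NULL || dest == NULL || ph->p_type != PT_LOAD) {
        return false;
    }
    /* The file part can never be larger than the memory part. */
    if (ph->p_filesz > ph->p_memsz || ph->p_memsz > destSize) {
        return false;
    }
    /* The file part must lie entirely inside the file buffer. */
    if (ph->p_offset > fileSize || ph->p_filesz > fileSize - ph->p_offset) {
        return false;
    }
    memcpy(dest, file + ph->p_offset, ph->p_filesz);
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return true;
}
```

In a real loader `dest` would be the mapped address for p_vaddr, rounded according to p_align.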
// Print ELF Program Headers
const char *getSegmentTypeStr(Elf32_Word segmentType) {
switch (segmentType) {
case PT_NULL:return "NULL";
case PT_LOAD: return "LOAD";
case PT_DYNAMIC: return "DYNAMIC";
case PT_INTERP:return "INTERP";
case PT_NOTE: return "NOTE";
case PT_SHLIB:return "SHLIB";
case PT_PHDR: return "PHDR";
case PT_TLS:return "TLS";
case PT_NUM: return "PT_NUM";
case PT_LOOS:return "LOOS";
case PT_GNU_EH_FRAME: return "GNU_EH_FRAME";
case PT_GNU_STACK:return "GNU_STACK";
case PT_GNU_RELRO: return "GNU_RELRO";
case PT_GNU_PROPERTY: return "GNU_PROPERTY";
case PT_GNU_SFRAME: return "GNU_SFRAME";
case PT_SUNWBSS: return "SUNWBSS";
case PT_SUNWSTACK: return "SUNWSTACK";
case PT_HIOS: return "HIOS";
case PT_LOPROC: return "LOPROC";
case PT_HIPROC: return "HIPROC";
default: return "UNKNOWN";
}
}
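const char* getSegmentFlagStr(Elf_Word segmentFlags) {
    // Static buffer: the result is only valid until the next call
    static char segmentFlagStr[4];
    int count = 0;
    if (segmentFlags & PF_R) {
        segmentFlagStr[count++] = 'R';
    }
    if (segmentFlags & PF_W) {
        segmentFlagStr[count++] = 'W';
    }
    if (segmentFlags & PF_X) {
        segmentFlagStr[count++] = 'X';
    }
    // Terminate here so letters left over from a previous call are not printed again
    segmentFlagStr[count] = '\0';
    return segmentFlagStr;
}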
void printElfProgramHeader32(const Elf32_Phdr *pProgramHeader,Elf_Half segmentNum,const uint8_t* pFileBuffer) {
printf("ELF ProgramHeader:\n");
printf("\t[Nr] Type\t\tFileOff\t\tVirAddr\t\tPhyAddr\t\tFileSize\tMemSize\t\tFlag\tAlign\n");
for (int i = 0; i < segmentNum; i++) {
printf("\t[%02d] %-16s", i, getSegmentTypeStr(pProgramHeader[i].p_type));
printf("\t%08x", pProgramHeader[i].p_offset);
printf("\t%08x", pProgramHeader[i].p_vaddr);
printf("\t%08x", pProgramHeader[i].p_paddr);
printf("\t%08x", pProgramHeader[i].p_filesz);
printf("\t%08x", pProgramHeader[i].p_memsz);
printf("\t%-4s", getSegmentFlagStr(pProgramHeader[i].p_flags));
printf("\t%#x\n", pProgramHeader[i].p_align);
if (pProgramHeader[i].p_type == PT_INTERP) {
printf("\t\t [Request Program Interpreter Path: %s]\n",(char *) (pFileBuffer + pProgramHeader[i].p_offset));
}
}
printf("ELF ProgramHeader End\n");
}
// print segment mapping
void printSectionToSegmentMapping32(const Elf32_Phdr* pProgramHeader,const Elf32_Shdr* pSectionHeader,Elf_Half segmentNum,Elf_Half sectionNum,const char* pSectionHeaderStringTable) {
printf("Section to Segment Mapping:\n");
printf("\tSegment\tSections\n");
//Traverse program headers
for (int i = 0; i < segmentNum; i++) {
Elf32_Addr segmentStartAddr = pProgramHeader[i].p_vaddr;
Elf32_Addr segmentEndAddr = segmentStartAddr + pProgramHeader[i].p_memsz;
printf("\t%02d\t\t", i);
//Traverse section headers
for (int j = 0; j < sectionNum; j++) {
Elf32_Addr sectionStartAddr = pSectionHeader[j].sh_addr;
//Check whether the start addr of a section is in the segment addr
if (sectionStartAddr >= segmentStartAddr && sectionStartAddr < segmentEndAddr) {
//SHF_ALLOC means need alloc memory, some control sections don't need mapping to memory
if (pSectionHeader[j].sh_flags & SHF_ALLOC) {
printf("%s ",(char *) pSectionHeaderStringTable + pSectionHeader[j].sh_name);
}
}
}
printf("\n");
}
}
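The output looks like this:
V
Special Sections
Some sections in an ELF file are predefined and hold instruction code or control information.
These sections are reserved for the operating system, and their types and attributes vary between operating systems.
Section name | Purpose |
---|---|
.text | Code section |
.data | Initialized global variables and local static variables |
.bss | Uninitialized global variables and local static variables |
.rodata | Read-only data such as string constants |
.comment | Compiler version information |
.debug | Debugging information |
.dynamic | Dynamic linking information; the dynamic linker (linker) parses this section to load the ELF file |
.hash | Symbol hash table (can look up both imported and exported symbols) |
.gnu.hash | GNU hash table (can look up exported symbols only, i.e. the export table) |
.line | Debug line-number table, mapping source line numbers to compiled instructions |
.note | Extra compiler information such as vendor name and version number |
.rel.dyn | Dynamic-linking relocation table, holds relocation entries for global variables |
.rel.plt | Dynamic-linking relocation table for function jumps, holds the PLT relocation entries |
.symtab | Symbol table |
.dynsym | Dynamic linking symbol table |
.strtab | String table |
.shstrtab | Section name string table |
.dynstr | Dynamic linking string table |
.plt | Dynamic linking jump table (procedure linkage table) |
.got | Dynamic linking global offset table |
.init | Program initialization code section |
.fini | Program termination code section |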
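VI
String Table
An ELF file contains many strings, such as section names and variable names. Because string lengths vary, they are hard to describe with a fixed-size structure.
The usual approach is to gather the strings into a string table and refer to each one by its index (a byte offset) into the table.
Common ones are:
1..strtab (string table, holds ordinary strings)
2..shstrtab (section header string table, holds the strings used by the section header table)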
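Looking up a string by index is just pointer arithmetic plus two safety checks. A minimal sketch, assuming the table has already been located through its section header; the helper name stringTableGet is invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Return the string starting at byte `index` of a string table, or NULL if the
   index is out of range or the string is not NUL-terminated inside the table. */
const char *stringTableGet(const char *table, size_t tableSize, size_t index) {
    if (table == NULL || index >= tableSize) {
        return NULL;
    }
    if (memchr(table + index, '\0', tableSize - index) == NULL) {
        return NULL;   /* the string would run past the end of the table */
    }
    return table + index;
}
```

An index may also point into the middle of an entry, which is how linkers let one string share the tail of another: in a table holding "main", index 3 relative to its start yields "in".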
The printing code is as follows:
// Print String Table
void printStringTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pSectionHeaderStringTable,const uint8_t* pFileBuffer) {
//Traverse the section header table then find string table
printf("ELF String Table:\n");
for (int i = 0; i < sectionNum; i++) {
//not only just one string table such as .dynstr .strtab
if (pSectionHeader[i].sh_type == SHT_STRTAB) {
printf("\t==========String Table %s==========\n",getSectionName(pSectionHeaderStringTable,pSectionHeader[i].sh_name));
char *pStringTable = (char *) (pFileBuffer + pSectionHeader[i].sh_offset);
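Elf32_Word stringTableSize = pSectionHeader[i].sh_size, pos = 0;
// Walk the table: print each non-empty string, then skip past its terminating 0
while (pos < stringTableSize) {
if (pStringTable[pos] != 0) {
// %.*s caps the read at the end of the table if the last string is unterminated
printf("\t%.*s\n", (int) (stringTableSize - pos), pStringTable + pos);
}
while (pos < stringTableSize && pStringTable[pos] != 0) {
pos++;
}
pos++;
}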
}
}
printf("ELF String Table End\n");
}
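VII
Symbol Table
The symbol table describes imported and exported symbols. A symbol can be a global variable, a function, an external reference and so on.
Together with the matching string table, it yields each symbol's name, size, address and other information.
.dynsym // dynamic linking symbol table
.symtab // symbol table
.dynstr // string table for the dynamic linking symbol table
.strtab // string table for the symbol table
Symbol table entry structure: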
typedef struct
{
Elf32_Word st_name; /* Symbol name (string tbl index) */
Elf32_Addr st_value; /* Symbol value */
Elf32_Word st_size; /* Symbol size */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf32_Section st_shndx; /* Section index */
} Elf32_Sym;
typedef struct
{
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
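st_name: symbol name, an index into a string table; the sh_link of the symbol table's section header says which string table
st_value: the value of the symbol. Depending on the symbol it may be an absolute value or an address; its meaning differs between symbols
st_size: symbol size. For a symbol that holds data, it is the size of that data type.
For example, a symbol of type double occupies 8 bytes. A value of 0 means the symbol has no size or its size is unknown
st_info: symbol type and binding. The high 4 bits hold the symbol binding and the low 4 bits hold the symbol type; together they form the symbol information
Three macros handle these values: two extract the binding and the type, and the third packs them back into st_info
/* How to extract and insert information held in the st_info field. */
#define ELF32_ST_BIND(val) (((unsigned char) (val)) >> 4)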
#define ELF32_ST_TYPE(val) ((val) & 0xf)
#define ELF32_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))
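A short round trip shows how the two nibbles relate. The wrapper functions below are hypothetical and exist only to make the macros from elf.h easy to exercise.

```c
#include <elf.h>

/* Split st_info into binding (high nibble) and type (low nibble). */
void splitSymbolInfo(unsigned char st_info, unsigned *bind, unsigned *type) {
    *bind = ELF32_ST_BIND(st_info);
    *type = ELF32_ST_TYPE(st_info);
}

/* Pack a binding and a type back into a single st_info byte. */
unsigned char packSymbolInfo(unsigned bind, unsigned type) {
    return (unsigned char) ELF32_ST_INFO(bind, type);
}
```

For instance, a global function has st_info 0x12: binding STB_GLOBAL (1) in the high nibble and type STT_FUNC (2) in the low nibble.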
The legal values for the symbol binding are:
/* Legal values for ST_BIND subfield of st_info (symbol binding). */
#define STB_LOCAL 0 /* Local symbol */
#define STB_GLOBAL 1 /* Global symbol */
#define STB_WEAK 2 /* Weak symbol */
#define STB_NUM 3 /* Number of defined types. */
#define STB_LOOS 10 /* Start of OS-specific */
#define STB_GNU_UNIQUE 10 /* Unique symbol. */
#define STB_HIOS 12 /* End of OS-specific */
#define STB_LOPROC 13 /* Start of processor-specific */
#define STB_HIPROC 15 /* End of processor-specific */
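The important bindings are explained below:
1.STB_LOCAL: local symbol, not visible outside the object file that defines it
2.STB_GLOBAL: global symbol, visible to all object files being combined
3.STB_WEAK: weak symbol, similar to a global symbol but with lower precedence
4.STB_LOPROC~STB_HIPROC: reserved for processor-specific semantics
/* Legal values for ST_TYPE subfield of st_info (symbol type). */
#define STT_NOTYPE 0 /* Symbol type is unspecified */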
#define STT_OBJECT 1 /* Symbol is a data object */
#define STT_FUNC 2 /* Symbol is a code object */
#define STT_SECTION 3 /* Symbol associated with a section */
#define STT_FILE 4 /* Symbol's name is file name */
#define STT_COMMON 5 /* Symbol is a common data object */
#define STT_TLS 6 /* Symbol is thread-local data object*/
#define STT_NUM 7 /* Number of defined types. */
#define STT_LOOS 10 /* Start of OS-specific */
#define STT_GNU_IFUNC 10 /* Symbol is indirect code object */
#define STT_HIOS 12 /* End of OS-specific */
#define STT_LOPROC 13 /* Start of processor-specific */
#define STT_HIPROC 15 /* End of processor-specific */
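The important symbol types are explained below:
1.STT_NOTYPE: the symbol type is unspecified
2.STT_OBJECT: the symbol is a data object such as a variable or an array
3.STT_FUNC: the symbol is a function or other executable code
4.STT_SECTION: the symbol is associated with a section, mainly for relocation
5.STT_FILE: the symbol's name is the name of the source file
6.STT_LOPROC~STT_HIPROC: reserved for processor-specific semantics
st_other: the low 2 bits hold the symbol visibility
st_shndx: index of the section the symbol belongs to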
// Print Symbol Table
const char *getSymbolBindingString(uint8_t symbolBinding) {
switch (symbolBinding) {
case STB_LOCAL: return "LOCAL";
case STB_GLOBAL: return "GLOBAL";
case STB_WEAK: return "WEAK";
case STB_NUM: return "STB_NUM";
case STB_GNU_UNIQUE: return "GNU_UNIQUE";
case STB_HIOS: return "STB_HIOS";
case STB_LOPROC: return "STB_LOPROC";
case STB_HIPROC: return "STB_HIPROC";
default: return "UNKNOWN";
}
}
const char *getSymbolTypeString(uint8_t symbolType) {
switch (symbolType) {
case STT_NOTYPE: return "NOTYPE";
case STT_OBJECT: return "OBJECT";
case STT_FUNC: return "FUNC";
case STT_SECTION: return "SECTION";
case STT_FILE: return "FILE";
case STT_COMMON: return "COMMON";
case STT_TLS: return "TLS";
case STT_NUM: return "STT_NUM";
case STT_GNU_IFUNC: return "GNU_IFUNC";
case STT_HIOS: return "HIOS";
case STT_LOPROC: return "LOPROC";
case STT_HIPROC: return "HIPROC";
default: return "UNKNOWN";
}
}
const char *getSymbolVisibility(uint8_t st_other) {
unsigned char visibility = st_other & 0x03;
switch (visibility) {
case 0: return "DEFAULT";
case 1: return "INTERNAL";
case 2: return "HIDDEN";
case 3: return "PROTECTED";
default: return "UNKNOWN";
}
}
void printSymbolTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pSectionHeaderStringTable,const uint8_t* pFileBuffer) {
printf("ELF Symbol Tables:\n");
for (int i = 0; i < sectionNum; i++) {
// Static symbol table (.symtab) and dynamic symbol table (.dynsym); skip a zero sh_entsize to avoid dividing by zero
if ((pSectionHeader[i].sh_type == SHT_SYMTAB || pSectionHeader[i].sh_type == SHT_DYNSYM) && pSectionHeader[i].sh_entsize != 0) {
Elf32_Word symbolNum = pSectionHeader[i].sh_size / pSectionHeader[i].sh_entsize;
// Each symbol table has its own string table: sh_link is that table's section index, and pFileBuffer + sh_offset is where its bytes start
char* pSymbolNameTable =(char*) pFileBuffer + pSectionHeader[pSectionHeader[i].sh_link].sh_offset;
printf("\tSymbol Table '%s' contains %#x entries:\n",(char*)getSectionName(pSectionHeaderStringTable,pSectionHeader[i].sh_name), symbolNum);
printf("\tNum \tValue\t\tSize\t\tType\t\tBind\t\tVisible\t\tIndex\t\tName\n");
Elf32_Sym *pSymbolTable = (Elf32_Sym *) (pFileBuffer + pSectionHeader[i].sh_offset);
for (int j = 0; j < symbolNum; j++) {
printf("\t%04d", j);
printf("\t%08x", pSymbolTable[j].st_value);
printf("\t%08x", pSymbolTable[j].st_size);
//symbol type and binding
printf("\t%s\t", getSymbolTypeString(ELF32_ST_TYPE(pSymbolTable[j].st_info)));
printf("\t%s\t", getSymbolBindingString(ELF32_ST_BIND(pSymbolTable[j].st_info)));
printf("\t%-10s", getSymbolVisibility(pSymbolTable[j].st_other));
if (pSymbolTable[j].st_shndx == SHN_UNDEF) {
printf("\t%4s\t", "UDEF");
} else if (pSymbolTable[j].st_shndx == SHN_ABS) {
printf("\t%4s\t", "ABS");
} else {
printf("\t%04x\t", pSymbolTable[j].st_shndx);
}
printf("\t%s\n", pSymbolNameTable + pSymbolTable[j].st_name);
}
printf("\n");
}
}
}
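VIII
Relocation Table
There are usually two relocation tables:
1..rel.plt fixes up external function addresses
2..rel.dyn fixes up global variable addresses
Relocation tables come in three types, SHT_REL, SHT_RELA and SHT_RELR, with the entry definitions below.
Note: the Intel x86 architecture uses only REL relocation entries, while x64 appears to use only RELA entries, as becomes clear later when the relocations are applied:
/* Relocation table entry without addend (in section of type SHT_REL). */
typedef struct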
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
} Elf32_Rel;
typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
} Elf64_Rel;
/* Relocation table entry with addend (in section of type SHT_RELA). */
typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
Elf32_Sword r_addend; /* Addend */
} Elf32_Rela;
typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
/* RELR relocation table entry */
typedef Elf32_Word Elf32_Relr;
typedef Elf64_Xword Elf64_Relr;
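r_offset: the location to relocate
For a relocatable file, it is the offset of the unit to relocate within its section
For an executable or shared library, it is the virtual address of the unit to relocate
r_info: gives the symbol table index and the relocation type for the unit to relocate
Macros extract the information:
SYM takes the high 24 bits (32-bit ELF) or high 32 bits (64-bit ELF), the symbol table index that identifies the symbol
TYPE takes the low 8 bits (32-bit ELF) or low 32 bits (64-bit ELF), the relocation type
/* How to extract and insert information held in the r_info field. */
#define ELF32_R_SYM(val) ((val) >> 8)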
#define ELF32_R_TYPE(val) ((val) & 0xff)
#define ELF32_R_INFO(sym, type) (((sym) << 8) + ((type) & 0xff))
#define ELF64_R_SYM(i) ((i) >> 32)
#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
#define ELF64_R_INFO(sym,type) ((((Elf64_Xword) (sym)) << 32) + (type))
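The two layouts can be compared side by side by decoding an r_info value of each class. In this sketch the RelInfo struct and both function names are made up for illustration; the macros and relocation constants are the ones from elf.h.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical holder for the two halves of r_info. */
typedef struct {
    uint32_t sym;    /* symbol table index */
    uint32_t type;   /* relocation type    */
} RelInfo;

/* 32-bit ELF: symbol index in the high 24 bits, type in the low 8 bits. */
RelInfo decodeRelInfo32(Elf32_Word r_info) {
    RelInfo info = { ELF32_R_SYM(r_info), ELF32_R_TYPE(r_info) };
    return info;
}

/* 64-bit ELF: symbol index in the high 32 bits, type in the low 32 bits. */
RelInfo decodeRelInfo64(Elf64_Xword r_info) {
    RelInfo info = { (uint32_t) ELF64_R_SYM(r_info), (uint32_t) ELF64_R_TYPE(r_info) };
    return info;
}
```

As an example, the 32-bit value 0x00000507 decodes to symbol index 5 with type 7 (R_386_JMP_SLOT).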
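r_addend: an explicit addend used to compute the value stored in the relocated field
Rela entries state the addend explicitly in this field; for Rel entries the addend is implicit, stored at the location being modified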
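The practical difference shows up when a "base + addend" value is computed. The sketch below contrasts the two cases for a relative relocation; the function names are hypothetical, and `image` is assumed to point at the byte that corresponds to virtual address 0 of the loaded file, so that image + r_offset is the location being relocated.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* REL (i386): the addend is whatever is already stored at the relocated location. */
uint32_t relativeValueRel32(const uint8_t *image, const Elf32_Rel *rel, uint32_t loadBase) {
    uint32_t implicitAddend;
    memcpy(&implicitAddend, image + rel->r_offset, sizeof implicitAddend);
    return loadBase + implicitAddend;
}

/* RELA (x86-64): the addend comes from the entry; the old contents are not read. */
uint64_t relativeValueRela64(const Elf64_Rela *rela, uint64_t loadBase) {
    return loadBase + (uint64_t) rela->r_addend;
}
```

memcpy is used for the read because a relocated location is not guaranteed to be naturally aligned.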
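A relocation section references two other sections: a symbol table and the section to be fixed up
The relocation section header's sh_link and sh_info give these references: sh_link is the section index of the symbol table, and sh_info is the section index of the section the relocations apply to
The meaning of a relocation entry's r_offset differs slightly between object file kinds
1.Relocatable file: offset from the start of the section to the unit being relocated
2.Executable / shared object file: virtual address of the unit being relocated
A relocation entry describes how to modify an instruction or data field (the relocated field)
The following symbols are used to describe the calculations: A is the addend, B is the base address at which the file is loaded, S is the value of the referenced symbol, and P is the address of the field being relocated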
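Common relocation types:
R_386_GLOB_DAT: sets a GOT entry to the address of the specified symbol
How to apply: after the ELF is loaded, write the symbol's real address into the entry
R_386_JMP_SLOT: a PLT entry used for dynamic linking
How to apply: after the ELF is loaded, change the jump target to the symbol's address
R_386_RELATIVE: relative-offset relocation
How to apply: dereference the location given by offset and add the ELF's load base address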
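The three fix-ups can be sketched as one function that applies a single i386 REL entry, covering R_386_GLOB_DAT, R_386_JMP_SLOT and R_386_RELATIVE. It is a simplified illustration, not the loader from the attachments: applyRel386 is a hypothetical name, symbol resolution is left to the caller, and `image` is assumed to point at the byte that corresponds to virtual address 0 of the loaded file.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply one i386 REL entry to an image already copied into memory.
   image      : start of the loaded image (virtual address 0 of the ELF maps here)
   imageSize  : number of valid bytes in image
   loadBase   : address the image is mapped at
   symbolAddr : resolved address of the referenced symbol (looked up by the caller) */
bool applyRel386(uint8_t *image, size_t imageSize, uint32_t loadBase,
                 const Elf32_Rel *rel, uint32_t symbolAddr) {
    if (rel->r_offset > imageSize || imageSize - rel->r_offset < sizeof(uint32_t)) {
        return false;                        /* the relocated field is out of range */
    }
    uint8_t *place = image + rel->r_offset;
    uint32_t value;
    memcpy(&value, place, sizeof value);     /* implicit addend of a REL entry */

    switch (ELF32_R_TYPE(rel->r_info)) {
        case R_386_GLOB_DAT:                 /* GOT entry: store the symbol address */
        case R_386_JMP_SLOT:                 /* PLT slot: store the jump target */
            value = symbolAddr;
            break;
        case R_386_RELATIVE:                 /* load base + value already stored */
            value = loadBase + value;
            break;
        default:
            return false;                    /* other types are left out of this sketch */
    }
    memcpy(place, &value, sizeof value);
    return true;
}
```

A loader would call this once per entry of .rel.dyn and .rel.plt, using ELF32_R_SYM(rel->r_info) to find the symbol whose address it passes in.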
The complete list of Intel x86 relocation types is:
/* Intel 80386 specific definitions. */
/* i386 relocs. */
#define R_386_NONE 0 /* No reloc */
#define R_386_32 1 /* Direct 32 bit */
#define R_386_PC32 2 /* PC relative 32 bit */
#define R_386_GOT32 3 /* 32 bit GOT entry */
#define R_386_PLT32 4 /* 32 bit PLT address */
#define R_386_COPY 5 /* Copy symbol at runtime */
#define R_386_GLOB_DAT 6 /* Create GOT entry */
#define R_386_JMP_SLOT 7 /* Create PLT entry */
#define R_386_RELATIVE 8 /* Adjust by program base */
#define R_386_GOTOFF 9 /* 32 bit offset to GOT */
#define R_386_GOTPC 10 /* 32 bit PC relative offset to GOT */
#define R_386_32PLT 11
#define R_386_TLS_TPOFF 14 /* Offset in static TLS block */
#define R_386_TLS_IE 15 /* Address of GOT entry for static TLS
block offset */
#define R_386_TLS_GOTIE 16 /* GOT entry for static TLS block
offset */
#define R_386_TLS_LE 17 /* Offset relative to static TLS
block */
#define R_386_TLS_GD 18 /* Direct 32 bit for GNU version of
general dynamic thread local data */
#define R_386_TLS_LDM 19 /* Direct 32 bit for GNU version of
local dynamic thread local data
in LE code */
#define R_386_16 20
#define R_386_PC16 21
#define R_386_8 22
#define R_386_PC8 23
#define R_386_TLS_GD_32 24 /* Direct 32 bit for general dynamic
thread local data */
#define R_386_TLS_GD_PUSH 25 /* Tag for pushl in GD TLS code */
#define R_386_TLS_GD_CALL 26 /* Relocation for call to
__tls_get_addr() */
#define R_386_TLS_GD_POP 27 /* Tag for popl in GD TLS code */
#define R_386_TLS_LDM_32 28 /* Direct 32 bit for local dynamic
thread local data in LE code */
#define R_386_TLS_LDM_PUSH 29 /* Tag for pushl in LDM TLS code */
#define R_386_TLS_LDM_CALL 30 /* Relocation for call to
__tls_get_addr() in LDM code */
#define R_386_TLS_LDM_POP 31 /* Tag for popl in LDM TLS code */
#define R_386_TLS_LDO_32 32 /* Offset relative to TLS block */
#define R_386_TLS_IE_32 33 /* GOT entry for negated static TLS
block offset */
#define R_386_TLS_LE_32 34 /* Negated offset relative to static
TLS block */
#define R_386_TLS_DTPMOD32 35 /* ID of module containing symbol */
#define R_386_TLS_DTPOFF32 36 /* Offset in TLS block */
#define R_386_TLS_TPOFF32 37 /* Negated offset in static TLS block */
#define R_386_SIZE32 38 /* 32-bit symbol size */
#define R_386_TLS_GOTDESC 39 /* GOT offset for TLS descriptor. */
#define R_386_TLS_DESC_CALL 40 /* Marker of call through TLS
descriptor for
relaxation. */
#define R_386_TLS_DESC 41 /* TLS descriptor containing
pointer to code and to
argument, returning the TLS
offset for the symbol. */
#define R_386_IRELATIVE 42 /* Adjust indirectly by program base */
#define R_386_GOT32X 43 /* Load from 32 bit GOT entry,
relaxable. */
/* Keep this the last entry. */
#define R_386_NUM 44
The x64 relocation types are defined as follows:
/* AMD x86-64 relocations. */
#define R_X86_64_NONE 0 /* No reloc */
#define R_X86_64_64 1 /* Direct 64 bit */
#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
#define R_X86_64_PLT32 4 /* 32 bit PLT address */
#define R_X86_64_COPY 5 /* Copy symbol at runtime */
#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
#define R_X86_64_RELATIVE 8 /* Adjust by program base */
#define R_X86_64_GOTPCREL 9 /* 32 bit signed PC relative
offset to GOT */
#define R_X86_64_32 10 /* Direct 32 bit zero extended */
#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
#define R_X86_64_16 12 /* Direct 16 bit zero extended */
#define R_X86_64_PC16 13 /* 16 bit sign extended pc relative */
#define R_X86_64_8 14 /* Direct 8 bit sign extended */
#define R_X86_64_PC8 15 /* 8 bit sign extended pc relative */
#define R_X86_64_DTPMOD64 16 /* ID of module containing symbol */
#define R_X86_64_DTPOFF64 17 /* Offset in module's TLS block */
#define R_X86_64_TPOFF64 18 /* Offset in initial TLS block */
#define R_X86_64_TLSGD 19 /* 32 bit signed PC relative offset
to two GOT entries for GD symbol */
#define R_X86_64_TLSLD 20 /* 32 bit signed PC relative offset
to two GOT entries for LD symbol */
#define R_X86_64_DTPOFF32 21 /* Offset in TLS block */
#define R_X86_64_GOTTPOFF 22 /* 32 bit signed PC relative offset
to GOT entry for IE symbol */
#define R_X86_64_TPOFF32 23 /* Offset in initial TLS block */
#define R_X86_64_PC64 24 /* PC relative 64 bit */
#define R_X86_64_GOTOFF64 25 /* 64 bit offset to GOT */
#define R_X86_64_GOTPC32 26 /* 32 bit signed pc relative
offset to GOT */
#define R_X86_64_GOT64 27 /* 64-bit GOT entry offset */
#define R_X86_64_GOTPCREL64 28 /* 64-bit PC relative offset
to GOT entry */
#define R_X86_64_GOTPC64 29 /* 64-bit PC relative offset to GOT */
#define R_X86_64_GOTPLT64 30 /* like GOT64, says PLT entry needed */
#define R_X86_64_PLTOFF64 31 /* 64-bit GOT relative offset
to PLT entry */
#define R_X86_64_SIZE32 32 /* Size of symbol plus 32-bit addend */
#define R_X86_64_SIZE64 33 /* Size of symbol plus 64-bit addend */
#define R_X86_64_GOTPC32_TLSDESC 34 /* GOT offset for TLS descriptor. */
#define R_X86_64_TLSDESC_CALL 35 /* Marker for call through TLS
descriptor. */
#define R_X86_64_TLSDESC 36 /* TLS descriptor. */
#define R_X86_64_IRELATIVE 37 /* Adjust indirectly by program base */
#define R_X86_64_RELATIVE64 38 /* 64-bit adjust by program base */
/* 39 Reserved was R_X86_64_PC32_BND */
/* 40 Reserved was R_X86_64_PLT32_BND */
#define R_X86_64_GOTPCRELX 41 /* Load from 32 bit signed pc relative
offset to GOT entry without REX
prefix, relaxable. */
#define R_X86_64_REX_GOTPCRELX 42 /* Load from 32 bit signed pc relative
offset to GOT entry with REX prefix,
relaxable. */
#define R_X86_64_NUM 43
// Print Relocation Table
const char *getRelocationTypeString32(Elf_Word value) {
switch (value) {
case R_386_NONE: return "R_386_NONE";
case 1: return "R_386_32";
case 2: return "R_386_PC32";
case 3: return "R_386_GOT32";
case 4: return "R_386_PLT32";
case 5: return "R_386_COPY";
case 6: return "R_386_GLOB_DAT";
case 7: return "R_386_JMP_SLOT";
case 8: return "R_386_RELATIVE";
case 9: return "R_386_GOTOFF";
case 10: return "R_386_GOTPC";
case 11: return "R_386_32PLT";
case 14: return "R_386_TLS_TPOFF";
case 15: return "R_386_TLS_IE";
case 16: return "R_386_TLS_GOTIE";
case 17: return "R_386_TLS_LE";
case 18: return "R_386_TLS_GD";
case 19: return "R_386_TLS_LDM";
case 20: return "R_386_16";
case 21: return "R_386_PC16";
case 22: return "R_386_8";
case 23: return "R_386_PC8";
case 24: return "R_386_TLS_GD_32";
case 25: return "R_386_TLS_GD_PUSH";
case 26: return "R_386_TLS_GD_CALL";
case 27: return "R_386_TLS_GD_POP";
case 28: return "R_386_TLS_LDM_32";
case 29: return "R_386_TLS_LDM_PUSH";
case 30: return "R_386_TLS_LDM_CALL";
case 31: return "R_386_TLS_LDM_POP";
case 32: return "R_386_TLS_LDO_32";
case 33: return "R_386_TLS_IE_32";
case 34: return "R_386_TLS_LE_32";
case 35: return "R_386_TLS_DTPMOD32";
case 36: return "R_386_TLS_DTPOFF32";
case 37: return "R_386_TLS_TPOFF32";
case 38: return "R_386_SIZE32";
case 39: return "R_386_TLS_GOTDESC";
case 40: return "R_386_TLS_DESC_CALL";
case 41: return "R_386_TLS_DESC";
case 42: return "R_386_IRELATIVE";
case 43: return "R_386_GOT32X";
default: return "Unknown relocation type";
}
}
void printRelocationTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer,const char* pSectionHeaderStringTable) {
printf("Relocation Tables:\n");
for (int i = 0; i < sectionNum; i++) {
if (pSectionHeader[i].sh_type == SHT_REL) {
const Elf32_Shdr *pRelocationTableHeader = &pSectionHeader[i];
Elf32_Rel *pRelocationTable = (Elf32_Rel *) (pFileBuffer + pRelocationTableHeader->sh_offset);
Elf32_Word relocItemNum = pRelocationTableHeader->sh_size / pRelocationTableHeader->sh_entsize;
// relocation table sh_link is index of symbol table header
const Elf32_Shdr *pSymbolTableHeader = &pSectionHeader[pSectionHeader[i].sh_link];
//real symbol table
Elf32_Sym *pSymbolTable = (Elf32_Sym *) (pFileBuffer + pSymbolTableHeader->sh_offset);
//string table for symbol name
char *pSymbolTableStringTable = (char *) pFileBuffer + pSectionHeader[pSymbolTableHeader->sh_link].sh_offset;
printf("Relocation Section '%s' at offset %#x contains %u entries\n", pSectionHeaderStringTable + pSectionHeader[i].sh_name, pRelocationTableHeader->sh_offset, relocItemNum);
printf("\tOffset\t\tInfo\t\tType\t\t\t\tSym.value\t\tSym.name\n");
for (int j = 0; j < relocItemNum; j++) {
printf("\t%08x", pRelocationTable[j].r_offset);
printf("\t%08x", pRelocationTable[j].r_info);
printf("\t%s\t", getRelocationTypeString32(ELF32_R_TYPE(pRelocationTable[j].r_info)));
printf("\t%08x\t", pSymbolTable[ELF32_R_SYM(pRelocationTable[j].r_info)].st_value);
//R_SYM get the index of symbol in symbol table, st_name is index of symbol name in string table
printf("\t%s", &pSymbolTableStringTable[pSymbolTable[ELF32_R_SYM(pRelocationTable[j].r_info)].st_name]);
printf("\n");
}
}
}
}
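r_offset gives the location to be fixed up. It is an RVA (an offset from the load base); for a RELATIVE entry, the value stored at that location must have the base address at which the ELF file was loaded added to it.
For example, readelf prints the following relocation tables: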
Relocation section '.rel.dyn' at offset 0x384 contains 8 entries:
Offset Info Type Sym.Value Sym. Name
00003ee8 00000008 R_386_RELATIVE
00003eec 00000008 R_386_RELATIVE
00003fec 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003fe0 00000206 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003fe4 00000306 R_386_GLOB_DAT 00000000 __cxa_finalize@GLIBC_2.1.3
00003fe8 00000506 R_386_GLOB_DAT 00000000 __gmon_start__
00003ff0 00000606 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]
Relocation section '.rel.plt' at offset 0x3c4 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00004000 00000107 R_386_JUMP_SLOT 00000000 __libc_start_main@GLIBC_2.34
00004004 00000407 R_386_JUMP_SLOT 00000000 puts@GLIBC_2.0
No processor specific unwind information to decode
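The Info column above packs two fields. As a small sketch (the helper names are hypothetical, they simply mirror what the ELF32_R_SYM and ELF32_R_TYPE macros in elf.h do for 32-bit files), the high 24 bits are the symbol table index and the low 8 bits are the relocation type:

```c
#include <stdint.h>

/* Hypothetical helpers mirroring ELF32_R_SYM / ELF32_R_TYPE from <elf.h>. */
static uint32_t rel32_sym(uint32_t r_info)  { return r_info >> 8; }   /* symbol table index */
static uint32_t rel32_type(uint32_t r_info) { return r_info & 0xff; } /* relocation type */
```

For the entry with Info 00000206, this gives symbol index 2 and type 6 (R_386_GLOB_DAT); for 00000008 the symbol index is 0 and the type is 8 (R_386_RELATIVE), which is why those rows show no symbol.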
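3ee8 and 3eec are in the .init_array and .fini_array sections respectively; both are RELATIVE relocations
3fec, 3fe0, 3fe4, 3fe8 and 3ff0 are GOT entries; 3fec (main_ptr) is RELATIVE, the others are GLOB_DAT
These entries are filled with the addresses of functions in the virtual extern segment, which does not actually exist in memory
4000 and 4004 are PLT entries (in .got.plt), both of type JUMP_SLOT; 400c is dso_handle, of type RELATIVE
The .got.plt entries are likewise filled with external function addresses, located in the virtual extern segment
IDA automatically appends the extern segment at the end of the ELF file (it does not exist in memory and is for analysis only)
To sum up, relocation comes down to two cases:
1.Read the value stored at the address to be relocated and add the base address at which the ELF was loaded
2.Load the shared library and write the address of the external function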
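Case 1 can be sketched on a toy image. This is only an illustration under stated assumptions: ToyRel32 and applyRelativeRelocs are hypothetical names, the struct stands in for Elf32_Rel, and entries of any other type are skipped because they need a symbol lookup (case 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for Elf32_Rel so the sketch does not depend on <elf.h>. */
typedef struct {
    uint32_t r_offset; /* RVA of the 32-bit slot to patch */
    uint32_t r_info;   /* (symbol index << 8) | type */
} ToyRel32;

enum { TOY_R_386_RELATIVE = 8 }; /* same value as R_386_RELATIVE */

/* Adds loadBase to the 32-bit value stored at every RELATIVE slot.
   Returns the number of slots patched. */
static size_t applyRelativeRelocs(uint8_t *image, size_t imageSize, uint32_t loadBase,
                                  const ToyRel32 *rels, size_t relCount) {
    size_t patched = 0;
    for (size_t i = 0; i < relCount; i++) {
        if ((rels[i].r_info & 0xff) != TOY_R_386_RELATIVE)
            continue; /* symbol-based types are not handled here */
        assert(rels[i].r_offset + sizeof(uint32_t) <= imageSize);
        uint32_t slot;
        memcpy(&slot, image + rels[i].r_offset, sizeof slot); /* alignment-safe read */
        slot += loadBase;
        memcpy(image + rels[i].r_offset, &slot, sizeof slot);
        patched++;
    }
    return patched;
}
```

If a slot holds 0x11a0 and the image is loaded at 0x56555000, the slot holds 0x565561a0 afterwards.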
9
Dynamic Segment
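If an object file takes part in dynamic linking, it must contain a program header entry of type PT_DYNAMIC; the corresponding section is named .dynamic (type=SHT_DYNAMIC)
The dynamic segment provides the information the dynamic linker needs, such as which shared libraries the file depends on, where the dynamic symbol table is, and where the dynamic relocation tables are.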
/* Dynamic section entry. */
typedef struct
{
Elf32_Sword d_tag; /* Dynamic entry type */
union
{
Elf32_Word d_val; /* Integer value */
Elf32_Addr d_ptr; /* Address value */
} d_un;
} Elf32_Dyn;
typedef struct
{
Elf64_Sxword d_tag; /* Dynamic entry type */
union
{
Elf64_Xword d_val; /* Integer value */
Elf64_Addr d_ptr; /* Address value */
} d_un;
} Elf64_Dyn;
d_tag determines how d_un is interpreted.
The legal d_tag values are defined as follows:
/* Legal values for d_tag (dynamic entry type). */
#define DT_NULL 0 /* Marks end of dynamic section */
#define DT_NEEDED 1 /* Name of needed library */
#define DT_PLTRELSZ 2 /* Size in bytes of PLT relocs */
#define DT_PLTGOT 3 /* Processor defined value */
#define DT_HASH 4 /* Address of symbol hash table */
#define DT_STRTAB 5 /* Address of string table */
#define DT_SYMTAB 6 /* Address of symbol table */
#define DT_RELA 7 /* Address of Rela relocs */
#define DT_RELASZ 8 /* Total size of Rela relocs */
#define DT_RELAENT 9 /* Size of one Rela reloc */
#define DT_STRSZ 10 /* Size of string table */
#define DT_SYMENT 11 /* Size of one symbol table entry */
#define DT_INIT 12 /* Address of init function */
#define DT_FINI 13 /* Address of termination function */
#define DT_SONAME 14 /* Name of shared object */
#define DT_RPATH 15 /* Library search path (deprecated) */
#define DT_SYMBOLIC 16 /* Start symbol search here */
#define DT_REL 17 /* Address of Rel relocs */
#define DT_RELSZ 18 /* Total size of Rel relocs */
#define DT_RELENT 19 /* Size of one Rel reloc */
#define DT_PLTREL 20 /* Type of reloc in PLT */
#define DT_DEBUG 21 /* For debugging; unspecified */
#define DT_TEXTREL 22 /* Reloc might modify .text */
#define DT_JMPREL 23 /* Address of PLT relocs */
#define DT_BIND_NOW 24 /* Process relocations of object */
#define DT_INIT_ARRAY 25 /* Array with addresses of init fct */
#define DT_FINI_ARRAY 26 /* Array with addresses of fini fct */
#define DT_INIT_ARRAYSZ 27 /* Size in bytes of DT_INIT_ARRAY */
#define DT_FINI_ARRAYSZ 28 /* Size in bytes of DT_FINI_ARRAY */
#define DT_RUNPATH 29 /* Library search path */
#define DT_FLAGS 30 /* Flags for the object being loaded */
#define DT_ENCODING 32 /* Start of encoded range */
#define DT_PREINIT_ARRAY 32 /* Array with addresses of preinit fct*/
#define DT_PREINIT_ARRAYSZ 33 /* size in bytes of DT_PREINIT_ARRAY */
#define DT_SYMTAB_SHNDX 34 /* Address of SYMTAB_SHNDX section */
#define DT_RELRSZ 35 /* Total size of RELR relative relocations */
#define DT_RELR 36 /* Address of RELR relative relocations */
#define DT_RELRENT 37 /* Size of one RELR relative relocaction */
#define DT_NUM 38 /* Number used */
#define DT_LOOS 0x6000000d /* Start of OS-specific */
#define DT_HIOS 0x6ffff000 /* End of OS-specific */
#define DT_LOPROC 0x70000000 /* Start of processor-specific */
#define DT_HIPROC 0x7fffffff /* End of processor-specific */
#define DT_PROCNUM DT_MIPS_NUM /* Most used by any processor */
/* DT_* entries which fall between DT_VALRNGHI & DT_VALRNGLO use the
Dyn.d_un.d_val field of the Elf*_Dyn structure. This follows Sun's
approach. */
#define DT_VALRNGLO 0x6ffffd00
#define DT_GNU_PRELINKED 0x6ffffdf5 /* Prelinking timestamp */
#define DT_GNU_CONFLICTSZ 0x6ffffdf6 /* Size of conflict section */
#define DT_GNU_LIBLISTSZ 0x6ffffdf7 /* Size of library list */
#define DT_CHECKSUM 0x6ffffdf8
#define DT_PLTPADSZ 0x6ffffdf9
#define DT_MOVEENT 0x6ffffdfa
#define DT_MOVESZ 0x6ffffdfb
#define DT_FEATURE_1 0x6ffffdfc /* Feature selection (DTF_*). */
#define DT_POSFLAG_1 0x6ffffdfd /* Flags for DT_* entries, effecting
the following DT_* entry. */
#define DT_SYMINSZ 0x6ffffdfe /* Size of syminfo table (in bytes) */
#define DT_SYMINENT 0x6ffffdff /* Entry size of syminfo */
#define DT_VALRNGHI 0x6ffffdff
#define DT_VALTAGIDX(tag) (DT_VALRNGHI - (tag)) /* Reverse order! */
#define DT_VALNUM 12
/* DT_* entries which fall between DT_ADDRRNGHI & DT_ADDRRNGLO use the
Dyn.d_un.d_ptr field of the Elf*_Dyn structure. If any adjustment is made to the ELF object after it has been
built these entries will need to be adjusted. */
#define DT_ADDRRNGLO 0x6ffffe00
#define DT_GNU_HASH 0x6ffffef5 /* GNU-style hash table. */
#define DT_TLSDESC_PLT 0x6ffffef6
#define DT_TLSDESC_GOT 0x6ffffef7
#define DT_GNU_CONFLICT 0x6ffffef8 /* Start of conflict section */
#define DT_GNU_LIBLIST 0x6ffffef9 /* Library list */
#define DT_CONFIG 0x6ffffefa /* Configuration information. */
#define DT_DEPAUDIT 0x6ffffefb /* Dependency auditing. */
#define DT_AUDIT 0x6ffffefc /* Object auditing. */
#define DT_PLTPAD 0x6ffffefd /* PLT padding. */
#define DT_MOVETAB 0x6ffffefe /* Move table. */
#define DT_SYMINFO 0x6ffffeff /* Syminfo table. */
#define DT_ADDRRNGHI 0x6ffffeff
#define DT_ADDRTAGIDX(tag) (DT_ADDRRNGHI - (tag)) /* Reverse order! */
#define DT_ADDRNUM 11
/* The versioning entry types. The next are defined as part of the GNU extension. */
#define DT_VERSYM 0x6ffffff0
#define DT_RELACOUNT 0x6ffffff9
#define DT_RELCOUNT 0x6ffffffa
/* These were chosen by Sun. */
#define DT_FLAGS_1 0x6ffffffb /* State flags, see DF_1_* below. */
#define DT_VERDEF 0x6ffffffc /* Address of version definition table */
#define DT_VERDEFNUM 0x6ffffffd /* Number of version definitions */
#define DT_VERNEED 0x6ffffffe /* Address of table with needed versions */
#define DT_VERNEEDNUM 0x6fffffff /* Number of needed versions */
#define DT_VERSIONTAGIDX(tag) (DT_VERNEEDNUM - (tag)) /* Reverse order! */
#define DT_VERSIONTAGNUM 16
/* Sun added these machine-independent extensions in the "processor-specific"
range. Be compatible. */
#define DT_AUXILIARY 0x7ffffffd /* Shared object to load before self */
#define DT_FILTER 0x7fffffff /* Shared object to get values from */
#define DT_EXTRATAGIDX(tag) ((Elf32_Word)-((Elf32_Sword) (tag) <<1>>1)-1)
#define DT_EXTRANUM 3
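A DT_NEEDED entry names a shared library that the ELF file depends on; its d_val is an offset into the dynamic string table
Looking that offset up in .dynstr gives the library name
The sh_link field of the .dynamic section header is the section index of the dynamic string table
In addition, the entry with d_tag==DT_STRTAB gives the location of .dynstr; strictly speaking this is a virtual address (d_ptr), which equals the file offset only when the containing segment has p_vaddr==p_offset
d_val holds an integer value
d_ptr holds a virtual address in the process address space
The interpretation rules are as follows: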
Name | Value | d_un | Executable | Shared object |
---|---|---|---|---|
DT_NULL | 0 | ignored | mandatory | mandatory |
DT_NEEDED | 1 | d_val | optional | optional |
DT_PLTRELSZ | 2 | d_val | optional | optional |
DT_PLTGOT | 3 | d_ptr | optional | optional |
DT_HASH | 4 | d_ptr | mandatory | mandatory |
DT_STRTAB | 5 | d_ptr | mandatory | mandatory |
DT_SYMTAB | 6 | d_ptr | mandatory | mandatory |
DT_RELA | 7 | d_ptr | mandatory | optional |
DT_RELASZ | 8 | d_val | mandatory | optional |
DT_RELAENT | 9 | d_val | mandatory | optional |
DT_STRSZ | 10 | d_val | mandatory | mandatory |
DT_SYMENT | 11 | d_val | mandatory | mandatory |
DT_INIT | 12 | d_ptr | optional | optional |
DT_FINI | 13 | d_ptr | optional | optional |
DT_SONAME | 14 | d_val | ignored | optional |
DT_RPATH | 15 | d_val | optional | ignored |
DT_SYMBOLIC | 16 | ignored | ignored | optional |
DT_REL | 17 | d_ptr | mandatory | optional |
DT_RELSZ | 18 | d_val | mandatory | optional |
DT_RELENT | 19 | d_val | mandatory | optional |
DT_PLTREL | 20 | d_val | optional | optional |
DT_DEBUG | 21 | d_ptr | optional | ignored |
DT_TEXTREL | 22 | ignored | optional | optional |
DT_JMPREL | 23 | d_ptr | optional | optional |
DT_BIND_NOW | 24 | ignored | optional | optional |
DT_LOPROC | 0x70000000 | unspecified | unspecified | unspecified |
DT_HIPROC | 0x7fffffff | unspecified | unspecified | unspecified |
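To make the d_val rule for DT_NEEDED concrete, here is a minimal sketch that walks a dynamic array until DT_NULL and resolves each DT_NEEDED entry against a string table. ToyDyn32 and collectNeeded are hypothetical names; the union is flattened into one field because d_val and d_ptr have the same size in a 32-bit file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Elf32_Dyn with the d_un union flattened to one field. */
typedef struct {
    int32_t  d_tag;
    uint32_t d_un;
} ToyDyn32;

enum { TOY_DT_NULL = 0, TOY_DT_NEEDED = 1 }; /* same values as DT_NULL / DT_NEEDED */

/* Collects pointers to the names of needed libraries. Stops at DT_NULL,
   which terminates the array. Returns how many names were stored in out
   (never more than outCap). */
static size_t collectNeeded(const ToyDyn32 *dyn, const char *dynstr, size_t dynstrSize,
                            const char **out, size_t outCap) {
    size_t n = 0;
    for (; dyn->d_tag != TOY_DT_NULL; dyn++) {
        if (dyn->d_tag != TOY_DT_NEEDED)
            continue;
        assert(dyn->d_un < dynstrSize); /* d_val is an offset into .dynstr */
        if (n < outCap)
            out[n++] = dynstr + dyn->d_un;
    }
    return n;
}
```

With a string table of "\0libc.so.6\0libm.so.6" and DT_NEEDED values 1 and 11, the two collected names are libc.so.6 and libm.so.6; entries after DT_NULL are never visited.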
// Print Dynamic Segment
#define DT_VAL 0
#define DT_PTR 1
const char *getDynamicType(Elf_Xword value) {
if (value >= DT_LOOS && value <= DT_HIOS)
return "OS-Specific";
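if (value >= DT_LOPROC && value <= DT_HIPROC && value != DT_AUXILIARY && value != DT_FILTER) // keep DT_AUXILIARY/DT_FILTER reachable in the switch below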
return "Processor-Specific";
switch (value) {
case DT_NULL: return "NULL";
case DT_NEEDED: return "NEEDED";
case DT_PLTRELSZ: return "PLTRELSZ";
case DT_PLTGOT: return "PLTGOT";
case DT_HASH: return "HASH";
case DT_STRTAB: return "STRTAB";
case DT_SYMTAB: return "SYMTAB";
case DT_RELA: return "RELA";
case DT_RELASZ: return "RELASZ";
case DT_RELAENT: return "RELAENT";
case DT_STRSZ: return "STRSZ";
case DT_SYMENT: return "SYMENT";
case DT_INIT: return "INIT";
case DT_FINI: return "FINI";
case DT_SONAME: return "SONAME";
case DT_RPATH: return "RPATH";
case DT_SYMBOLIC: return "SYMBOLIC";
case DT_REL: return "REL";
case DT_RELSZ: return "RELSZ";
case DT_RELENT: return "RELENT";
case DT_PLTREL: return "PLTREL";
case DT_DEBUG: return "DEBUG";
case DT_TEXTREL: return "TEXTREL";
case DT_JMPREL: return "JMPREL";
case DT_BIND_NOW: return "BIND_NOW";
case DT_INIT_ARRAY: return "INIT_ARRAY";
case DT_FINI_ARRAY: return "FINI_ARRAY";
case DT_INIT_ARRAYSZ: return "INIT_ARRAYSZ";
case DT_FINI_ARRAYSZ: return "FINI_ARRAYSZ";
case DT_RUNPATH: return "RUNPATH";
case DT_FLAGS: return "FLAGS";
case DT_ENCODING: return "ENCODING";
case DT_SYMTAB_SHNDX: return "SYMTAB_SHNDX";
case DT_RELRSZ: return "RELRSZ";
case DT_RELR: return "RELR";
case DT_RELRENT: return "RELRENT";
case DT_NUM: return "NUM";
case DT_VALRNGLO: return "VALRNGLO";
case DT_GNU_PRELINKED: return "GNU_PRELINKED";
case DT_GNU_CONFLICTSZ: return "GNU_CONFLICTSZ";
case DT_GNU_LIBLISTSZ: return "GNU_LIBLISTSZ";
case DT_CHECKSUM: return "CHECKSUM";
case DT_PLTPADSZ: return "PLTPADSZ";
case DT_MOVEENT: return "MOVEENT";
case DT_MOVESZ: return "MOVESZ";
case DT_FEATURE_1: return "FEATURE_1";
case DT_POSFLAG_1: return "POSFLAG_1";
case DT_SYMINSZ: return "SYMINSZ";
case DT_SYMINENT: return "SYMINENT";
case DT_ADDRRNGLO: return "ADDRRNGLO";
case DT_GNU_HASH: return "GNU_HASH";
case DT_TLSDESC_PLT: return "TLSDESC_PLT";
case DT_TLSDESC_GOT: return "TLSDESC_GOT";
case DT_GNU_CONFLICT: return "GNU_CONFLICT";
case DT_GNU_LIBLIST: return "GNU_LIBLIST";
case DT_CONFIG: return "CONFIG";
case DT_DEPAUDIT: return "DEPAUDIT";
case DT_AUDIT: return "AUDIT";
case DT_PLTPAD: return "PLTPAD";
case DT_MOVETAB: return "MOVETAB";
case DT_SYMINFO: return "SYMINFO";
case DT_VERSYM: return "VERSYM";
case DT_RELACOUNT: return "RELACOUNT";
case DT_RELCOUNT: return "RELCOUNT";
case DT_FLAGS_1: return "FLAGS_1";
case DT_VERDEF: return "VERDEF";
case DT_VERDEFNUM: return "VERDEFNUM";
case DT_VERNEED: return "VERNEED";
case DT_VERNEEDNUM: return "VERNEEDNUM";
case DT_AUXILIARY: return "AUXILIARY";
case DT_FILTER: return "FILTER";
default: return "Unknown Type";
}
}
uint32_t getDynamicDunType(Elf_Xword value) {
switch (value) {
case DT_NULL:
case DT_NEEDED:
case DT_PLTRELSZ:
case DT_RELASZ:
case DT_RELAENT:
case DT_STRSZ:
case DT_SYMENT:
case DT_SONAME:
case DT_RPATH:
case DT_SYMBOLIC:
case DT_RELSZ:
case DT_RELENT:
case DT_PLTREL:
case DT_TEXTREL:
case DT_BIND_NOW:
case DT_LOPROC:
case DT_HIPROC:
return DT_VAL;
case DT_PLTGOT:
case DT_HASH:
case DT_STRTAB:
case DT_SYMTAB:
case DT_RELA:
case DT_INIT:
case DT_FINI:
case DT_JMPREL:
case DT_DEBUG:
case DT_REL:
return DT_PTR;
default:
return DT_VAL;
}
}
void printDynamicSegment32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer) {
for (int i = 0; i < sectionNum; i++) {
if (pSectionHeader[i].sh_type == SHT_DYNAMIC) {
const Elf32_Shdr *pDynamicSection = &pSectionHeader[i];
Elf32_Word dynamicItemNum = pDynamicSection->sh_size / pDynamicSection->sh_entsize;
printf("Dynamic Section At File Offset %#x Contains %d Entries:\n", pDynamicSection->sh_offset,dynamicItemNum);
printf("\tTag \t\tType\t\t\t\tName/Value\n");
Elf32_Dyn *pDynamicTable = (Elf32_Dyn *) (pFileBuffer + pDynamicSection->sh_offset);
const Elf32_Shdr *pDynamicStringTableHeader = &pSectionHeader[pDynamicSection->sh_link];
// dynamic string table
char *pDynamicStringTable = (char *) pFileBuffer + pDynamicStringTableHeader->sh_offset;
for (int j = 0; j < dynamicItemNum; j++) {
printf("\t%08x", pDynamicTable[j].d_tag);
printf("\t%-16s", getDynamicType(pDynamicTable[j].d_tag));
printf("\t%08x\t", pDynamicTable[j].d_un.d_val);
if (getDynamicDunType(pDynamicTable[j].d_tag) == DT_PTR) //Some special item is ptr
printf("(PTR)");
//Index of shared library path in dynamic string table
switch (pDynamicTable[j].d_tag) {
case DT_NEEDED: printf("[%s]", pDynamicStringTable + pDynamicTable[j].d_un.d_val);
break;
case DT_SONAME: printf("[%s]", pDynamicStringTable + pDynamicTable[j].d_un.d_val);
break;
default: ;
}
printf("\n");
}
}
}
}
10
Hash Table(Export Table)
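A hash table is used to look up exported functions. There are two kinds; current ELF files mainly use the GNU hash table as the export table.
.hash //old style, can look up both imported and exported functions, DT_HASH
.gnu.hash //new style, can only look up exported functions, DT_GNU_HASH
The hash table is defined as follows: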
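//layout sketch only: C does not allow two flexible array members; chains[] starts right after the last bucket
struct ELFHash {
uint32_t nbucket; //number of buckets
uint32_t nchain; //number of chain entries, equal to the number of symbols in the dynamic symbol table
uint32_t buckets[]; //array of nbucket entries
uint32_t chains[]; //array of nchain entries
};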
The original ELF hash algorithm on Linux is as follows:
uint32_t elf_hash(const unsigned char* name)
{
uint32_t h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
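The ELF hash table finds a symbol by name as follows:
1.Compute the hash of the symbol name with elf_hash
2.index=buckets[hash%nbucket]
3.If index==STN_UNDEF(0), the symbol is not found; stop
4.If the name of the symbol at index matches, the symbol is found; otherwise take the next symbol index from chains[index] and go back to step 3
In code: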
uint32_t findSymbolIndexByElfHash(const char* symbolName,
uint32_t* pHashTable,
Elf32_Sym* pSymbolTable,
const char* pSymbolStringTable)
{
uint32_t nbucket=pHashTable[0],nchain=pHashTable[1];
uint32_t* buckets=&pHashTable[2],*chains=&pHashTable[2+nbucket];
uint32_t hash = elf_hash((const unsigned char*)symbolName);
for (uint32_t index=buckets[hash % nbucket]; index; index = chains[index]) {
if (strcmp(symbolName, &pSymbolStringTable[pSymbolTable[index].st_name]) == 0) {
return index;
}
}
return 0;
}
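A manual lookup example:
Because gcc on x86_64 emits only .gnu.hash by default, a 64-bit .so built with the Android NDK is used as the example
Locate the .hash section; here nbucket=nchain=0x36
Compute the bucket index with elfhash: index=hash%nbucket=48
Each bucket entry is 4 bytes, so starting from 0x960 the entry is at 0x960+48*4=0xA20
The entry holds dynamic symbol table index 0xE (14), and that symbol table entry is indeed the dlopen function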
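The arithmetic of this walkthrough can be reproduced with a short sketch. elfHashOf repeats the elf_hash algorithm so the block is self-contained, and bucketSlotOffset is a hypothetical helper; 0x960 is taken from the walkthrough as the file offset where the bucket array starts.

```c
#include <stdint.h>

/* Same algorithm as elf_hash, repeated so the sketch is self-contained. */
static uint32_t elfHashOf(const char *s) {
    const unsigned char *name = (const unsigned char *) s;
    uint32_t h = 0, g;
    while (*name) {
        h = (h << 4) + *name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* File offset of buckets[hash % nbucket], given where the bucket array starts. */
static uint32_t bucketSlotOffset(uint32_t bucketsStart, uint32_t hash, uint32_t nbucket) {
    return bucketsStart + (hash % nbucket) * (uint32_t) sizeof(uint32_t);
}
```

For "dlopen" the hash is 0x06b366be, 0x06b366be % 0x36 is 48, and bucketSlotOffset(0x960, hash, 0x36) is 0xA20, matching the steps above.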
Android's elfhash code looks slightly different but is equivalent to the original elfhash
See https://cs.android.com/android/platform/superproject/+/android-4.1.2_r2.1:bionic/linker/linker.c
static unsigned elfhash(const char *_name)
{
const unsigned char *name = (const unsigned char *) _name;
unsigned h = 0, g;
while(*name) {
h = (h << 4) + *name++;
g = h & 0xf0000000;
h ^= g;
h ^= g >> 24;
}
return h;
}
static Elf32_Sym *_elf_lookup(soinfo *si, unsigned hash, const char *name)
{
Elf32_Sym *s;
Elf32_Sym *symtab = si->symtab;
const char *strtab = si->strtab;
unsigned n;
TRACE_TYPE(LOOKUP, "%5d SEARCH %s in %s@0x%08x %08x %d\n", pid,
name, si->name, si->base, hash, hash % si->nbucket);
n = hash % si->nbucket;
for(n = si->bucket[hash % si->nbucket]; n != 0; n = si->chain[n]){
s = symtab + n;
if(strcmp(strtab + s->st_name, name)) continue;
/* only concern ourselves with global and weak symbol definitions */
switch(ELF32_ST_BIND(s->st_info)){
case STB_GLOBAL:
case STB_WEAK:
/* no section == undefined */
if(s->st_shndx == 0) continue;
TRACE_TYPE(LOOKUP, "%5d FOUND %s in %s (%08x) %d\n", pid,
name, si->name, s->st_value, s->st_size);
return s;
}
}
return NULL;
}
In the Android source tree the ELF hash also appears under the name SysV hash; see https://cs.android.com/android/platform/superproject/+/android14-qpr3-release:external/musl/ldso/dynlink.c
static uint32_t sysv_hash(const char *s0)
{
const unsigned char *s = (void *)s0;
uint_fast32_t h = 0;
while (*s) {
h = 16*h + *s++;
h ^= h>>24 & 0xf0;
}
return h & 0xfffffff;
}
static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso)
{
size_t i;
Sym *syms = dso->syms;
Elf_Symndx *hashtab = dso->hashtab;
char *strings = dso->strings;
for (i=hashtab[2+h%hashtab[0]]; i; i=hashtab[2+hashtab[0]+i]) {
if ((!dso->versym || dso->versym[i] >= 0)
&& (!strcmp(s, strings+syms[i].st_name)))
return syms+i;
}
return 0;
}
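The claim that these formulations are equivalent can be checked with a small sketch. The three functions below are my own restatements of the original, bionic and musl variants shown above (the function names are hypothetical); hashesAgree reports whether they return the same value for a given name.

```c
#include <stdint.h>

/* Original formulation: fold the top nibble into bits 4..7, then clear it. */
static uint32_t hashOriginal(const char *s) {
    const unsigned char *p = (const unsigned char *) s;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* bionic linker.c formulation: clear the top nibble with XOR, then fold it. */
static uint32_t hashBionic(const char *s) {
    const unsigned char *p = (const unsigned char *) s;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        h ^= g;
        h ^= g >> 24;
    }
    return h;
}

/* musl formulation: fold on every step, mask to 28 bits only at the end. */
static uint32_t hashMusl(const char *s) {
    const unsigned char *p = (const unsigned char *) s;
    uint_fast32_t h = 0;
    while (*p) {
        h = 16 * h + *p++;
        h ^= h >> 24 & 0xf0;
    }
    return (uint32_t) (h & 0xfffffff);
}

/* Returns 1 when all three variants agree for this name. */
static int hashesAgree(const char *s) {
    uint32_t a = hashOriginal(s);
    return a == hashBionic(s) && a == hashMusl(s);
}
```

The variants differ only in when the top four bits are discarded; those bits never flow back into the low 28 bits after being folded, so the results coincide.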
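The GNU hash table layout is as follows:
struct GnuHash {
uint32_t nbucket;
uint32_t symndx; //only symbols with index>=symndx can be found through the GNU hash table; those with index<symndx cannot
uint32_t bloomSize; //bloomSize, bloomShift and blooms[] are the three pieces of data the Bloom filter needs to quickly reject a symbol that is not present
uint32_t bloomShift;
ElfW(Addr) blooms[]; //array of bloomSize entries; the element type is uint32_t/uint64_t for 32/64-bit
uint32_t buckets[]; //array of nbucket entries
uint32_t chains[]; //one entry per symbol table index starting at symndx; the number of entries equals the number of exported symbols
};
GNU Hash has no nchain field, so how is it computed?
◆The chains array is preceded by the contiguous blooms and buckets arrays, so subtract the sizes of the preceding members from the size of the hash table
◆32-bit: nchain=GNUHashTable.sh_size/sizeof(uint32_t) - (4+bloomSize+nbucket)
◆64-bit: nchain=GNUHashTable.sh_size/sizeof(uint32_t) - (4+bloomSize*2+nbucket)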
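Both formulas can be folded into one helper that takes the size of a bloom word as a parameter. This is only a sketch: gnuHashChainCount is a hypothetical name, and the caller is assumed to pass 4 for ELF32 and 8 for ELF64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of chain entries in a .gnu.hash section of sectionSize bytes.
   bloomWordSize is the size of one bloom word: 4 for ELF32, 8 for ELF64. */
static uint32_t gnuHashChainCount(size_t sectionSize, uint32_t nbucket,
                                  uint32_t bloomSize, size_t bloomWordSize) {
    size_t preceding = 4 * sizeof(uint32_t)                /* nbucket, symndx, bloomSize, bloomShift */
                     + (size_t) bloomSize * bloomWordSize  /* blooms[] */
                     + (size_t) nbucket * sizeof(uint32_t); /* buckets[] */
    assert(sectionSize >= preceding);
    return (uint32_t) ((sectionSize - preceding) / sizeof(uint32_t));
}
```

For example, a 56-byte ELF32 section with nbucket=3 and bloomSize=2 has 56/4 - (4+2+3) = 5 chain entries; the same table in ELF64 occupies 64 bytes and also yields 5.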
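A lookup through the GNU hash table is illustrated as follows:
1.The dashed part of the chain table does not actually exist
2.Each chain entry stores the hash value of its symbol
For details, see the articles "ELF 通过 Sysv Hash & Gnu Hash 查找符号的实现及对比" and "ELF解析07_哈希表, 导出表"
See https://cs.android.com/android/platform/superproject/+/android14-qpr3-release:external/musl/ldso/dynlink.c
The lookup code in that source is as follows: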
uint32_t gnu_hash(const unsigned char* str)
{
uint32_t h = 5381;// 0x1505
while(*str != 0)
{
h += (h<<5) + *str++;// h = h*33 + c, written as h + (h<<5) + c
}
return h;
}
static Sym *gnu_lookup(uint32_t h1, uint32_t *hashtab, struct dso *dso, const char *s)
{
uint32_t nbuckets = hashtab[0];
uint32_t *buckets = hashtab + 4 + hashtab[2]*(sizeof(size_t)/4);
uint32_t i = buckets[h1 % nbuckets];
if (!i) return 0;
uint32_t *hashval = buckets + nbuckets + (i - hashtab[1]);
for (h1 |= 1; ; i++) {
uint32_t h2 = *hashval++;
if ((h1 == (h2|1)) && (!dso->versym || dso->versym[i] >= 0)
&& !strcmp(s, dso->strings + dso->syms[i].st_name))
return dso->syms+i;
if (h2 & 1) break;
}
return 0;
}
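The lookup above does not show the Bloom filter step mentioned in the table layout. As a hedged sketch for a 32-bit ELF (bloom words are 32 bits wide; gnuHashOf, gnuBloomMayContain and gnuBloomAdd are hypothetical names, and gnuBloomAdd only exists to build a toy filter the way a linker would), the pre-check tests two bits derived from the hash:

```c
#include <stdint.h>

/* GNU hash: h = h * 33 + c, starting from 5381. */
static uint32_t gnuHashOf(const char *s) {
    const unsigned char *p = (const unsigned char *) s;
    uint32_t h = 5381;
    while (*p)
        h = h * 33 + *p++;
    return h;
}

/* Returns 0 when the symbol is definitely absent, 1 when it may be present.
   bloomSize is assumed to be non-zero (in practice a power of two). */
static int gnuBloomMayContain(const uint32_t *blooms, uint32_t bloomSize,
                              uint32_t bloomShift, uint32_t hash) {
    uint32_t word = blooms[(hash / 32) % bloomSize];
    uint32_t mask = ((uint32_t) 1 << (hash % 32))
                  | ((uint32_t) 1 << ((hash >> bloomShift) % 32));
    return (word & mask) == mask;
}

/* Sets the two bits for a hash so that a later check reports "may be present". */
static void gnuBloomAdd(uint32_t *blooms, uint32_t bloomSize,
                        uint32_t bloomShift, uint32_t hash) {
    blooms[(hash / 32) % bloomSize] |= ((uint32_t) 1 << (hash % 32))
                                     | ((uint32_t) 1 << ((hash >> bloomShift) % 32));
}
```

A result of 1 is not proof that the symbol exists, so the bucket and chain walk is still required; a result of 0 lets the lookup stop before touching the symbol table. As a sanity check of the hash itself, gnuHashOf("printf") is 0x156b2bb8.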
unsigned int elf_hash(const char* _name)
{
const unsigned char* name=(const unsigned char*)_name;
unsigned int h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
void printHashTable32(Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer,const char* pSectionHeaderStringTable) {
printf("ELF Hash Tables:\n");
for(int i=0;i<sectionNum;i++) {
if(pSectionHeader[i].sh_type==SHT_HASH) {
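//SHT_HASH can look up both imported and exported functions; Linux toolchains drop it by default, Android keeps the section
//For SHT_HASH, index=buckets[elfhash(symbolName)%nbucket] is the symbol table index
//index==0 means the symbol does not exist; if the name differs, continue with index=chains[index]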
Elf32_Shdr* pDynamicSymbolTableHeader=&pSectionHeader[pSectionHeader[i].sh_link];
Elf32_Sym* pDynamicSymbolTable=(Elf32_Sym*)(pDynamicSymbolTableHeader->sh_offset+pFileBuffer);
const char* pDynamicSymbolStringTable=(const char*)(pSectionHeader[pDynamicSymbolTableHeader->sh_link].sh_offset+pFileBuffer);
uint32_t* pHashTable=(uint32_t*)(pSectionHeader[i].sh_offset+pFileBuffer);
uint32_t nbucket=pHashTable[0],nchain=pHashTable[1];
uint32_t* buckets=&pHashTable[2];
uint32_t* chains=&pHashTable[2+nbucket];
printf("\tHash Table '%s' contains %d entries\n",&pSectionHeaderStringTable[pSectionHeader[i].sh_name],nchain);
printf("\t\tNum\t\tHash %% Nbucket\t\tIndex\t\t\tValue\t\t\tName\n");
for(uint32_t j=0,count=0;j<nbucket;j++) {
//buckets[j] is the first symbol index of the bucket (0 means empty); chains[index] gives the next symbol with the same hash%nbucket, and 0 ends the chain
for(uint32_t index=buckets[j];index;index=chains[index]) {
printf("\t\t%d\t\t%08x\t\t%08x\t\t%08x\t\t%s\n",++count,elf_hash(&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name])%nbucket,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
}
}
}
if(pSectionHeader[i].sh_type==SHT_GNU_HASH) {
//SHT_GNU_HASH can only look up exported functions and serves as the ELF export table
Elf32_Shdr* pDynamicSymbolTableHeader=&pSectionHeader[pSectionHeader[i].sh_link];
Elf32_Sym* pDynamicSymbolTable=(Elf32_Sym*)(pDynamicSymbolTableHeader->sh_offset+pFileBuffer);
const char* pDynamicSymbolStringTable=(const char*)(pSectionHeader[pDynamicSymbolTableHeader->sh_link].sh_offset+pFileBuffer);
uint32_t* pGNUHashTable=(uint32_t*)(pSectionHeader[i].sh_offset+pFileBuffer);
uint32_t nbucket=pGNUHashTable[0];
uint32_t symndx=pGNUHashTable[1];
uint32_t bloomSize=pGNUHashTable[2];
uint32_t bloomShift=pGNUHashTable[3];
Elf32_Addr* blooms=(Elf32_Addr*)&pGNUHashTable[4];
uint32_t* buckets=pGNUHashTable+4+bloomSize;
uint32_t* chains=buckets+nbucket-symndx;
//the number of chain entries equals the number of exported symbols, but GNU hash has no nchain field, so compute it by hand
uint32_t nchain=pSectionHeader[i].sh_size/sizeof(uint32_t)-(4+bloomSize+nbucket);
printf("\tHash Table '%s' contains %d entries, nbucket: %d, symndx: %#x \n",&pSectionHeaderStringTable[pSectionHeader[i].sh_name],nchain,nbucket,symndx);
printf("\t\tNum\t\tIndex\t\t\tValue\t\t\tName\n");
for(int j=0,count=0;j<nbucket;j++) {
uint32_t index=buckets[j];
//an empty bucket holds 0 and has no chain to walk
if(index==0) continue;
//symbols of one bucket are contiguous in the symbol table; the lowest bit of a chain entry is 1 on the last symbol of the bucket and 0 otherwise
for(;;index++) {
printf("\t\t%d\t\t%08x\t\t%08x\t\t%s\n",++count,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
if(chains[index]&1) break;
}
}
}
}
}
11
ELF Loader
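The ELF program headers describe which segments of the ELF file must be mapped into memory. An ELF program is loaded as follows:
1.Read the ELF file into memory as the file buffer
2.Map the file buffer into the image buffer according to the program headers
3.Relocate: fix up global variable addresses and external references
4.Jump to the entry point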
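Step 2 first needs the size of the image buffer. A minimal sketch of that calculation, assuming a position-independent file whose lowest PT_LOAD address is 0 and a 4 KiB page size (ToyPhdr32, pageAlignUp and imageSizeOf are hypothetical names), takes the highest end address over all PT_LOAD segments and rounds it up to a page:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000u /* assumed page size */

/* Toy stand-in for the Elf32_Phdr fields that matter for sizing the image. */
typedef struct {
    uint32_t p_type;  /* 1 == PT_LOAD */
    uint32_t p_vaddr;
    uint32_t p_memsz;
} ToyPhdr32;

/* Rounds v up to the next multiple of the page size. */
static uint32_t pageAlignUp(uint32_t v) {
    return (v + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

/* Image size = highest end address of any PT_LOAD segment, page aligned.
   Scanning every entry avoids relying on the order of the table. */
static uint32_t imageSizeOf(const ToyPhdr32 *ph, size_t count) {
    uint32_t end = 0;
    for (size_t i = 0; i < count; i++) {
        if (ph[i].p_type != 1)
            continue;
        uint32_t segEnd = ph[i].p_vaddr + ph[i].p_memsz;
        if (segEnd > end)
            end = segEnd;
    }
    return pageAlignUp(end);
}
```

With PT_LOAD segments ending at 0x3f4, 0x1208 and 0x4018, the image needs 0x5000 bytes.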
Build loadelf32 and loadelf64 separately to load x86 and x64 ELF files:
gcc -m32 main.c LoadELF.h LoadELF.c -o loadelf32
gcc -m64 main.c LoadELF.h LoadELF.c -o loadelf64
main.c
// LoadELF
#include "LoadELF.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc!= 2) {
printf("Usage: %s <filepath>\n", argv[0]);
return 1;
}
LoadAndExecElf(argv[1]);
return 0;
}
LoadELF.h
#ifndef LOADELF_H
#define LOADELF_H
#include <stddef.h>
#include <stdint.h>
uint8_t* readFileToBytes(const char *fileName,size_t* readSize);
void LoadAndExecElf(const char* filePath);
#endif //LOADELF_H
LoadELF.c
The macros below are defined according to the build target (x86 or x64):
#include "LoadELF.h"
#include <stdio.h>
#include <elf.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>
#include <sys/mman.h>
#include <link.h>
#ifdef __x86_64__
#define Elf_Ehdr Elf64_Ehdr
#define Elf_Phdr Elf64_Phdr
#define Elf_Shdr Elf64_Shdr
#define Elf_Addr Elf64_Addr
#define Elf_Dyn Elf64_Dyn
#define Elf_Rel Elf64_Rela
#define Elf_Sym Elf64_Sym
#define ELF_R_TYPE ELF64_R_TYPE
#define ELF_R_SYM ELF64_R_SYM
#define DT_REL_ITEM DT_RELA
#define DT_REL_SZ DT_RELASZ
#else
#define Elf_Ehdr Elf32_Ehdr
#define Elf_Phdr Elf32_Phdr
#define Elf_Shdr Elf32_Shdr
#define Elf_Addr Elf32_Addr
#define Elf_Dyn Elf32_Dyn
#define Elf_Rel Elf32_Rel
#define Elf_Sym Elf32_Sym
#define ELF_R_TYPE ELF32_R_TYPE
#define ELF_R_SYM ELF32_R_SYM
#define DT_REL_ITEM DT_REL
#define DT_REL_SZ DT_RELSZ
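#endif
//Half and Word have the same width in ELF32 and ELF64; defined here in case no other header provides them
#ifndef Elf_Half
#define Elf_Half Elf32_Half
#endif
#ifndef Elf_Word
#define Elf_Word Elf32_Word
#endif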
uint8_t* readFileToBytes(const char *fileName,size_t* readSize) {
FILE *file = fopen(fileName, "rb");
if (file == NULL) {
printf("Error opening file\n");
return NULL;
}
fseek(file, 0,SEEK_END);
size_t fileSize = ftell(file);
fseek(file, 0,SEEK_SET);
uint8_t *buffer = (uint8_t *) malloc(fileSize);
if (buffer == NULL) {
printf("Error allocating memory\n");
fclose(file);
return NULL;
}
size_t bytesRead = fread(buffer, 1, fileSize, file);
if(bytesRead!=fileSize) {
printf("Read bytes not equal file size!\n");
free(buffer);
fclose(file);
return NULL;
}
fclose(file);
if(readSize)
*readSize=bytesRead;
return buffer;
}
//round value up to a multiple of alignment
uint64_t alignValue(uint64_t value, uint64_t alignment) {
return value % alignment ? (value / alignment + 1) * alignment : value;
}
size_t getElfMemorySize(Elf_Phdr* pProgramHeader,Elf_Half segmentNum) {
size_t size = 0;
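//walk the program header table from the end: the start address plus memory size of the last PT_LOAD segment, once aligned, is the image size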
for (int i = segmentNum - 1; i >= 0; i--) {
if (pProgramHeader[i].p_type == PT_LOAD) {
size = pProgramHeader[i].p_vaddr + pProgramHeader[i].p_memsz;
break;
}
}
return alignValue(size, 0x1000);
}
//returns d_un as an address-sized value so that 64-bit entries are not truncated
Elf_Addr getDynamicTableValueByType(Elf_Dyn *dynamicTable, size_t dynamicTableSize, int type) {
for (int i = 0; i < dynamicTableSize; i++) {
if (dynamicTable[i].d_tag == type) {
return dynamicTable[i].d_un.d_val;
}
}
return 0;
}
const char** getNeededLibraryPath(uint8_t* pElfBuffer,Elf_Dyn *pDynamicTable, size_t dynamicTableSize,size_t* neededLibraryNum) {
//Traverse dynamic segment find needed library
char** buffer = NULL;
int num=0;
char* pImageStringTable=(char*)pElfBuffer+getDynamicTableValueByType(pDynamicTable,dynamicTableSize,DT_STRTAB);
for (int i = 0; i < dynamicTableSize; i++) {
if (pDynamicTable[i].d_tag == DT_NEEDED) {
num++;
buffer=(char**)realloc(buffer,num*sizeof(char*));
if(buffer==NULL) {
printf("Error reallocating memory\n");
exit(-1);
}
buffer[num-1]=pImageStringTable+ pDynamicTable[i].d_un.d_val;
}
}
*neededLibraryNum=num;
return (const char**)buffer;
}
Elf_Addr getSymbolAddress(const char** neededLibrary, size_t neededLibraryNum, const char *symbolName) {
//Load needed dynamic libraries,and traverse libraries, get symbol address
for (int i = 0; i < neededLibraryNum; i++) {
void *handle = dlopen(neededLibrary[i],RTLD_NOW);
if (handle == NULL) {
printf("Error opening library %s\n", dlerror());
exit(1);
}
void *address = dlsym(handle, symbolName);
if (address == NULL) {
continue;
}
return (Elf_Addr)address;
}
printf("Can't find address of symbol: %s\n",symbolName);
return 0;
}
void mapSegmentToMemory(uint8_t* pImageBuffer,uint8_t* pFileBuffer,Elf_Phdr* pProgramHeader,Elf_Half segmentNum) {
for (int i = 0; i < segmentNum; i++) {
if (pProgramHeader[i].p_type == PT_LOAD) {
uint8_t *pImageAddr = pImageBuffer + pProgramHeader[i].p_vaddr;//map according to the in-memory address and size
size_t memorySize = pProgramHeader[i].p_memsz;
Elf_Word segmentFlags = pProgramHeader[i].p_flags;
int protection = 0;
memcpy(pImageAddr, pFileBuffer + pProgramHeader[i].p_offset, pProgramHeader[i].p_filesz);
if (segmentFlags & PF_R) {
protection |= PROT_READ;
}
if (segmentFlags & PF_W) {
protection |= PROT_WRITE;
}
if (segmentFlags & PF_X) {
protection |= PROT_EXEC;
}
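//set page protection; mprotect needs a page-aligned start address, so round the segment start down to a page boundary
uint8_t *pPageStart = (uint8_t *) ((uintptr_t) pImageAddr & ~(uintptr_t) 0xfff);
mprotect(pPageStart, alignValue(memorySize + (size_t) (pImageAddr - pPageStart), 0x1000), protection);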
}
}
}
void fixRelocationItem(Elf_Rel* pRelocationTable,Elf_Word relocationItemNum,uint8_t* pImageBuffer,const char* pDynamicStringTable,Elf_Sym* pDynamicSymbolTable,const char** neededLibrary,size_t neededLibraryNum) {
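//R_386_RELATIVE/GLOB_DAT/JMP_SLOT have the same values (8/6/7) as their R_X86_64_* counterparts, so the same cases serve both builds
Elf_Addr* fixItem=NULL;//a fix-up slot is 4 or 8 bytes depending on the ELF class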
Elf_Addr baseAddr=(Elf_Addr)pImageBuffer;
for(int i=0;i<relocationItemNum;i++) {
switch (ELF_R_TYPE(pRelocationTable[i].r_info)) {
//Relocate base address
case R_386_RELATIVE:
fixItem=(Elf_Addr*)(pImageBuffer+pRelocationTable[i].r_offset);
*fixItem+=baseAddr;
break;
//Fix GOT and PLT
case R_386_GLOB_DAT:
case R_386_JMP_SLOT:
// Get symbol name and real address
const char* symbolName=&pDynamicStringTable[ pDynamicSymbolTable[ELF_R_SYM(pRelocationTable[i].r_info)].st_name ];//符号表表项的name属性是字符串表下标
fixItem=(Elf_Addr*)(pImageBuffer+pRelocationTable[i].r_offset);
Elf_Addr symbolAddr=getSymbolAddress(neededLibrary,neededLibraryNum,symbolName);
*fixItem=symbolAddr;
break;
}
}
}
void LoadAndExecElf(const char* filePath) {
    //1. Read file to memory buffer
    size_t readFileSize = 0;
    uint8_t* pFileBuffer = readFileToBytes(filePath, &readFileSize);
    if (pFileBuffer == NULL) {
        printf("Error reading file\n");
        return;
    }
    Elf_Ehdr* pElfHeader = (Elf_Ehdr*)pFileBuffer;
    Elf_Phdr *pProgramHeader = (Elf_Phdr*)(pFileBuffer + pElfHeader->e_phoff);
    Elf_Half segmentNum = pElfHeader->e_phnum;
    uint8_t* pImageBuffer = NULL;
    //2. Map file buffer to image buffer
    size_t elfMemorySize = getElfMemorySize(pProgramHeader, segmentNum);
    if (elfMemorySize == 0) {
        printf("ELF memory size is 0!\n");
        return;
    }
    // Allocate page-aligned memory; posix_memalign reports failure through its return value
    if (posix_memalign((void**)&pImageBuffer, 0x1000, elfMemorySize) != 0 || pImageBuffer == NULL) {
        printf("Error allocating memory\n");
        return;
    }
    memset(pImageBuffer, 0, elfMemorySize);
    // Map segments to memory and set protection
    mapSegmentToMemory(pImageBuffer, pFileBuffer, pProgramHeader, segmentNum);
    //3. Relocate
    Elf_Phdr *pDynamicTableHeader = NULL;
    Elf_Dyn *pDynamicTable = NULL;
    for (int i = 0; i < segmentNum; i++) {
        if (pProgramHeader[i].p_type == PT_DYNAMIC) {
            pDynamicTableHeader = &pProgramHeader[i];
            break;
        }
    }
    if (pDynamicTableHeader == NULL) {
        // Without a PT_DYNAMIC segment there is nothing to relocate with
        printf("No PT_DYNAMIC segment found\n");
        return;
    }
    pDynamicTable = (Elf_Dyn *)(pImageBuffer + pDynamicTableHeader->p_vaddr);
    size_t dynamicItemNum = pDynamicTableHeader->p_filesz / sizeof(Elf_Dyn);
    Elf_Rel *pRelocationTable = NULL;
    size_t relocationItemNum = 0;
    Elf_Rel *pJmpRelocationTable = NULL; // Filled in by the DT_JMPREL case below
    size_t jmpRelocationItemNum = 0;
    Elf_Sym *pDynamicSymbolTable = NULL;
    char *pDynamicStringTable = NULL;
    for (size_t i = 0; i < dynamicItemNum; i++) {
        switch (pDynamicTable[i].d_tag) {
            case DT_REL_ITEM: pRelocationTable = (Elf_Rel*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            case DT_JMPREL: pJmpRelocationTable = (Elf_Rel*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            case DT_REL_SZ: relocationItemNum = pDynamicTable[i].d_un.d_val / sizeof(Elf_Rel); break;
            case DT_PLTRELSZ: jmpRelocationItemNum = pDynamicTable[i].d_un.d_val / sizeof(Elf_Rel); break;
            case DT_SYMTAB: pDynamicSymbolTable = (Elf_Sym*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            case DT_STRTAB: pDynamicStringTable = (char*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
        }
    }
    size_t neededLibraryNum = 0;
    const char** neededLibrary = getNeededLibraryPath(pImageBuffer, pDynamicTable, dynamicItemNum, &neededLibraryNum);
    fixRelocationItem(pRelocationTable, relocationItemNum, pImageBuffer, pDynamicStringTable, pDynamicSymbolTable, neededLibrary, neededLibraryNum);
    fixRelocationItem(pJmpRelocationTable, jmpRelocationItemNum, pImageBuffer, pDynamicStringTable, pDynamicSymbolTable, neededLibrary, neededLibraryNum);
    //4. Jump to entry point
    typedef void (*VoidFunctionPtr)(void);
    VoidFunctionPtr entry = (VoidFunctionPtr)(pImageBuffer + pElfHeader->e_entry);
    printf("Load ELF success! Jump to entry point: %p\n", (void*)entry);
    entry();
    printf("Come back\n");
}
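The relocation step in the loader comes down to decoding `r_info` and patching one slot in the image. Below is a minimal sketch of the `R_386_RELATIVE` case only, for a 32-bit image, assuming glibc's `<elf.h>` is available. `applyRelative32` and its `imageSize` bounds check are hypothetical names added for illustration and are not part of the loader's source.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Rel, ELF32_R_TYPE, R_386_RELATIVE (glibc, Linux) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the loader above.
 * Applies one R_386_RELATIVE entry: slot = slot + loadBase.
 * Returns 0 on success, -1 if the entry has another type or
 * points outside the image. */
int applyRelative32(uint8_t *image, size_t imageSize,
                    uint32_t loadBase, const Elf32_Rel *rel) {
    assert(image != NULL && rel != NULL);
    if (ELF32_R_TYPE(rel->r_info) != R_386_RELATIVE) {
        return -1;
    }
    if (imageSize < sizeof(uint32_t) ||
        rel->r_offset > imageSize - sizeof(uint32_t)) {
        return -1;
    }
    uint32_t slot;
    /* With Elf32_Rel the addend is implicit: it is already stored in the slot. */
    memcpy(&slot, image + rel->r_offset, sizeof(slot));
    slot += loadBase;
    memcpy(image + rel->r_offset, &slot, sizeof(slot));
    return 0;
}
```

In the loader, `loadBase` would correspond to `baseAddr`. The sketch uses `memcpy` instead of a pointer cast so that an unaligned `r_offset` is handled safely. It does not cover x86-64, where relocation entries are `Elf64_Rela` records carrying an explicit addend.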
Running the loader produces the following result:
[Screenshot: output of the loader]
Kanxue ID: 东方玻璃
https://bbs.kanxue.com/user-home-968342.htm