A Brief Analysis of the ELF File Structure: Implementing a Parser and a Loader
2024-12-31 10:1:0 Author: mp.weixin.qq.com

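There are plenty of articles online about the ELF file structure, but most of them cover the theory and few show a concrete implementation (perhaps because open-source code is available). Reading open-source code is not my strong suit (it gives me a headache), so I studied the ELF format the same way I once studied the PE format.

I wrote a parser modeled on the output of readelf, and finally a simple ELF loader.

The code supports both x86 and x64 ELF files:

◆The parser has two implementations, one for x86 and one for x64, so it can parse ELF files for both platforms

◆The loader depends on the build environment and can only load ELF files for the platform it was built for, so the x86 and x64 loaders have to be compiled separately

◆The explanations and demos mainly use x86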

Environment & tools:

◆VMware pro 17.6.1

◆Kali Linux 2023.4 vmware amd64

◆gcc (Debian 14.2.0-8) 14.2.0

◆CLion 2024.2.3

◆010 Editor 13.0.1

◆IDA Pro 7.7

Attachments:

◆Sources.zip

◆CompiledTools.zip

◆TestFiles.zip

My knowledge is limited, so please bear with any mistakes and feel free to point them out.


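Overview of the ELF File Structure

ELF was developed and published by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI), and it is the main executable file format on Linux. The name stands for Executable and Linking Format, which is quite telling: it names the two things ELF has to support, execution and linking.

An ELF file consists of three main parts: the ELF header, the sections, and the segments

◆The section header table points to the sections; like the PE section table, it describes each section

◆The program header table describes the segments; one segment can contain several sections, and the table tells the loader how to map the ELF file into memory

◆In object (relocatable) files the program header table is optional, and in executables the section header table is optional, but ELF files built with the NDK contain both

elf.h wraps a few basic data types: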

#include <stdint.h>

typedef uint16_t Elf32_Half;
typedef uint16_t Elf64_Half;

/* Types for signed and unsigned 32-bit quantities. */
typedef uint32_t Elf32_Word;
typedef int32_t Elf32_Sword;
typedef uint32_t Elf64_Word;
typedef int32_t Elf64_Sword;

/* Types for signed and unsigned 64-bit quantities. */
typedef uint64_t Elf32_Xword;
typedef int64_t Elf32_Sxword;
typedef uint64_t Elf64_Xword;
typedef int64_t Elf64_Sxword;

/* Type of addresses. */
typedef uint32_t Elf32_Addr;
typedef uint64_t Elf64_Addr;

/* Type of file offsets. */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;

/* Type for section indices, which are 16-bit quantities. */
typedef uint16_t Elf32_Section;
typedef uint16_t Elf64_Section;

/* Type for version symbol information. */
typedef Elf32_Half Elf32_Versym;
typedef Elf64_Half Elf64_Versym;

As you can see, among these types only Addr and Off differ in width between 32-bit and 64-bit, so we can define matching generic types.

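ELF type     | Underlying type | Notes
-------------|-----------------|------------------
Elfn_Half    | uint16_t        |
Elfn_Word    | uint32_t        |
Elfn_Sword   | int32_t         |
Elfn_Xword   | uint64_t        |
Elfn_Sxword  | int64_t         |
Elf32_Addr   | uint32_t        | Address
Elf64_Addr   | uint64_t        |
Elf32_Off    | uint32_t        | File offset
Elf64_Off    | uint64_t        |
Elfn_Section | uint16_t        | Section index
Elfn_Versym  | uint16_t        |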
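One way to act on that observation is a small set of width-independent aliases. The sketch below is only an assumption about how such generic types could look: the names `Elf_Half`, `Elf_Word`, `Elf_Addr`, `Elf_Off` and the helper `elfAddrSize` are illustrative, and the rule of following the pointer width of the build target is my own choice. It also assumes a Linux toolchain that ships `<elf.h>`.

```c
#include <stdint.h>
#include <elf.h>   /* assumption: Linux/glibc toolchain providing <elf.h> */

/* These have the same width in ELF32 and ELF64 */
typedef uint16_t Elf_Half;
typedef uint32_t Elf_Word;
typedef int32_t  Elf_Sword;
typedef uint64_t Elf_Xword;
typedef int64_t  Elf_Sxword;

/* Only addresses and file offsets depend on the ELF class.
   Hypothetical selection rule: follow the pointer width of the build target. */
#if UINTPTR_MAX > 0xffffffffu
typedef Elf64_Addr Elf_Addr;
typedef Elf64_Off  Elf_Off;
#else
typedef Elf32_Addr Elf_Addr;
typedef Elf32_Off  Elf_Off;
#endif

/* Size in bytes of an address field for the current build */
static inline unsigned elfAddrSize(void) {
    return (unsigned) sizeof(Elf_Addr);
}
```

With aliases like these, code that only touches Half/Word fields can be shared, while anything that reads Addr or Off still has to be built once per platform.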

Use gcc to build 32-bit and 64-bit ELF executables for testing.

#include <stdio.h>

int main(int argc, char* argv[]){
printf("Hello ELF!\n");
return 0;
}

gcc -m32 -O0 main.c -o HelloELF32
gcc -m64 -O0 main.c -o HelloELF64

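Before writing the ELF parser/loader, define a file-reading function. It reads the file at the given path and returns a byte pointer together with the number of bytes read.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Read a file; returns the buffer and stores the number of bytes read in *readSize
uint8_t* readFileToBytes(const char *fileName, size_t* readSize) {
    FILE *file = fopen(fileName, "rb");
    if (file == NULL) {
        // Nothing to close here: calling fclose(NULL) is undefined behavior
        printf("Error opening file\n");
        return NULL;
    }
    fseek(file, 0, SEEK_END);
    long fileSize = ftell(file);   // ftell returns long, -1 on failure
    fseek(file, 0, SEEK_SET);
    if (fileSize <= 0) {
        printf("Error getting file size\n");
        fclose(file);
        return NULL;
    }
    uint8_t *buffer = (uint8_t *) malloc((size_t) fileSize);
    if (buffer == NULL) {
        printf("Error allocating memory\n");
        fclose(file);
        return NULL;
    }
    size_t bytesRead = fread(buffer, 1, (size_t) fileSize, file);
    if (bytesRead != (size_t) fileSize) {
        printf("Read bytes not equal file size!\n");
        free(buffer);
        fclose(file);
        return NULL;
    }
    fclose(file);
    if (readSize)
        *readSize = bytesRead;
    return buffer;   // the caller owns the buffer and must free() it
}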


ELF Header

It is defined in elf.h:

#define EI_NIDENT (16)
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;

// 64-bit
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;

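You can inspect it with readelf (readelf -h).

e_ident

A 16-byte ELF identifier. The first 4 bytes are the ELF magic "\x7fELF" and must not be modified.

010 Editor parses it as follows:

1.e_ident[EI_CLASS]: file class, 1 = ELF32, 2 = ELF64

2.e_ident[EI_DATA]: data encoding, 1 = little endian, 2 = big endian

3.e_ident[EI_VERSION]: ELF version, must be 1 (EV_CURRENT)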
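Before trusting any other header field, a parser can check these identification bytes first. The following is a minimal sketch, not the code from the attachments; the function name `elfIdentClass` is hypothetical, and the constants (`ELFMAG`, `SELFMAG`, `ELFCLASS32`, `ELFDATA2LSB`, `EV_CURRENT`, ...) are assumed to come from the Linux `<elf.h>`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <elf.h>

/* Returns ELFCLASS32 or ELFCLASS64 when buf starts with a plausible e_ident,
   ELFCLASSNONE otherwise. */
int elfIdentClass(const uint8_t *buf, size_t size) {
    if (buf == NULL || size < EI_NIDENT) return ELFCLASSNONE;
    /* The first four bytes must be "\x7fELF" (ELFMAG, SELFMAG == 4) */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0) return ELFCLASSNONE;
    /* EI_DATA: 1 = little endian, 2 = big endian */
    if (buf[EI_DATA] != ELFDATA2LSB && buf[EI_DATA] != ELFDATA2MSB) return ELFCLASSNONE;
    /* EI_VERSION must be EV_CURRENT (1) */
    if (buf[EI_VERSION] != EV_CURRENT) return ELFCLASSNONE;
    /* EI_CLASS selects the 32-bit or 64-bit structure layout */
    if (buf[EI_CLASS] == ELFCLASS32 || buf[EI_CLASS] == ELFCLASS64) return buf[EI_CLASS];
    return ELFCLASSNONE;
}
```

The return value can then decide whether the buffer is handed to the 32-bit or the 64-bit parsing routines.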

e_type

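2 bytes; identifies the type of the object file

Since Android 5.0, every executable has to be a position-independent file (the same type as a shared object), so on Android this field can only be 03 (ET_DYN) and must not be changed.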

/* Legal values for e_type (object file type).  */

#define ET_NONE 0 /* No file type */
#define ET_REL 1 /* Relocatable file */
#define ET_EXEC 2 /* Executable file */
#define ET_DYN 3 /* Shared object file */
#define ET_CORE 4 /* Core file */
#define ET_NUM 5 /* Number of defined types */
#define ET_LOOS 0xfe00 /* OS-specific range start */
#define ET_HIOS 0xfeff /* OS-specific range end */
#define ET_LOPROC 0xff00 /* Processor-specific range start */
#define ET_HIPROC 0xffff /* Processor-specific range end */

e_machine

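2 bytes; specifies the processor architecture the ELF file targets. Some of the definitions are listed below. For 32-bit Intel x86 the value is always EM_386; x64 files use EM_X86_64 (62), which lies outside this excerpt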

#define EM_NONE		 0	/* No machine */
#define EM_M32 1 /* AT&T WE 32100 */
#define EM_SPARC 2 /* SUN SPARC */
#define EM_386 3 /* Intel 80386 */
#define EM_68K 4 /* Motorola m68k family */
#define EM_88K 5 /* Motorola m88k family */
#define EM_IAMCU 6 /* Intel MCU */
#define EM_860 7 /* Intel 80860 */
#define EM_MIPS 8 /* MIPS R3000 big-endian */
#define EM_S370 9 /* IBM System/370 */
#define EM_MIPS_RS3_LE 10 /* MIPS R3000 little-endian */
/* reserved 11-14 */
#define EM_PARISC 15 /* HPPA */
/* reserved 16 */

e_version

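4 bytes; the object file version

Android does not check this field. IDA does, but it has no effect on disassembly

e_entry

4 or 8 bytes; the program entry point (OEP). If e_type = 2 (an ET_EXEC executable), the field holds an absolute virtual address (VA); for an ET_DYN file it is an address relative to the load base (an RVA), and it is 0 when the file has no entry point, which is typical for a plain so

e_phoff

4 or 8 bytes; file offset (FOA) of the program header table, 0 if the file has no program header table

e_shoff

4 or 8 bytes; file offset (FOA) of the section header table, 0 if the file has no section header table

Android anti-analysis techniques often strip the section header table

e_flags

4 bytes of processor-specific flags; not used on x86/x64

e_ehsize

2 bytes; size of the ELF header

Android does not check it and assumes an ELF header size of 52 bytes (the 32-bit size; the 64-bit header is 64 bytes). IDA checks it, but changing the field only produces a warning and does not affect disassembly

e_phentsize

2 bytes; size of one entry in the program header table

e_phnum

2 bytes; number of entries in the program header table

e_shentsize

2 bytes; size of one entry in the section header table

e_shnum

2 bytes; number of entries in the section header table

e_shstrndx

2 bytes; index of the section header table entry that describes the section name string table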
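Because e_shoff, e_shentsize, e_shnum and e_shstrndx all come from the file and may have been tampered with, it helps to check that they describe a table that really fits inside the file before indexing into it. The sketch below shows one such check for 32-bit headers; the name `sectionTableFits32` is made up for illustration and the check is deliberately simple.

```c
#include <stddef.h>
#include <stdint.h>
#include <elf.h>

/* Returns 1 when the section header table described by hdr lies entirely
   inside a file of fileSize bytes, 0 otherwise. */
int sectionTableFits32(const Elf32_Ehdr *hdr, size_t fileSize) {
    if (hdr == NULL) return 0;
    if (hdr->e_shoff == 0 || hdr->e_shnum == 0) return 0;    /* no table present */
    if (hdr->e_shentsize != sizeof(Elf32_Shdr)) return 0;     /* unexpected entry size */
    /* 64-bit arithmetic so the multiplication and the addition cannot wrap */
    uint64_t end = (uint64_t) hdr->e_shoff
                 + (uint64_t) hdr->e_shnum * hdr->e_shentsize;
    if (end > fileSize) return 0;
    /* e_shstrndx has to name an existing entry */
    return hdr->e_shstrndx < hdr->e_shnum;
}
```

The same pattern applies to e_phoff, e_phentsize and e_phnum for the program header table.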

Printing the ELF header

Define string arrays matching the enum values so that the information can be printed.

// Print ELF Header
char ELF_Class[3][6] = {"NONE", "ELF32", "ELF64"};
char ELF_Data[3][14] = {"NONE", "Little Endian", "Big Endian"};
// Only ET_NONE..ET_CORE (0..4) are contiguous; the OS/processor ranges start at 0xfe00
char objectFileType[5][5] = {"NONE", "REL", "EXEC", "DYN", "CORE"};
void printELFHeader32(const Elf32_Ehdr* pElfHeader) {
    unsigned char elfClass = pElfHeader->e_ident[EI_CLASS];
    unsigned char elfData = pElfHeader->e_ident[EI_DATA];
    printf("ELF Header:\n");
    printf("\tMagic:\t");
    for (int i = 0; i < EI_NIDENT; i++) {
        // Index the e_ident array of the single header, not the header pointer itself
        printf("%02x ", pElfHeader->e_ident[i]);
    }
    printf("\n");
    // Guard every table lookup so a corrupted header cannot index out of bounds
    printf("\t%-36s%s\n", "Class:", elfClass < 3 ? ELF_Class[elfClass] : "UNKNOWN");
    printf("\t%-36s%s\n", "Data:", elfData < 3 ? ELF_Data[elfData] : "UNKNOWN");
    printf("\t%-36s%#x\n", "Version:", pElfHeader->e_version);
    printf("\t%-36s%#x\n", "Machine:", pElfHeader->e_machine);
    printf("\t%-36s%s\n", "Type:", pElfHeader->e_type < 5 ? objectFileType[pElfHeader->e_type] : "UNKNOWN");
    printf("\t%-36s%#x\n", "Size Of ELF Header:", pElfHeader->e_ehsize);
    printf("\t%-36s%#x\n", "Entry point:", pElfHeader->e_entry);
    printf("\t%-36s%#x\n", "Start Of Program Headers:", pElfHeader->e_phoff);
    printf("\t%-36s%#x\n", "Start Of Section Headers:", pElfHeader->e_shoff);
    printf("\t%-36s%#x\n", "Size Of Program Headers:", pElfHeader->e_phentsize);
    printf("\t%-36s%#x\n", "Number Of Program Headers:", pElfHeader->e_phnum);
    printf("\t%-36s%#x\n", "Size Of Section Headers:", pElfHeader->e_shentsize);
    printf("\t%-36s%#x\n", "Number Of Sections:", pElfHeader->e_shnum);
    printf("\t%-36s%d\n", "Section Header String Table Index:", pElfHeader->e_shstrndx);
    printf("ELF Header End\n");
}

The output looks like this:


Section Header

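Similar to the PE section table (IMAGE_SECTION_HEADER)

The section header table stores the basic attributes of each section and is the most important structure in an ELF file after the file header. The compiler, linker, and loader all rely on it to locate sections and read their attributes.

Entry 0 of the section header array is always the reserved null entry (index SHN_UNDEF). The entry structure is defined as follows: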

typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;

typedef struct
{
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;

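Viewing the section header table with readelf (readelf -S)

sh_name

4 bytes; an offset into the section name string table. ELF File Header.e_shstrndx gives the index of the section header entry that describes the section name string table

Look up that entry in the section header table and take its sh_offset; sh_name + sh_offset is then the file offset (FOA) of the section's name string.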
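The lookup chain just described (e_shstrndx, then the name table's sh_offset, then sh_name) can be written down directly. The sketch below is an illustration under a few assumptions: the function name `sectionNameAt32` is hypothetical, e_shentsize is taken to equal `sizeof(Elf32_Shdr)`, and headers are copied out with `memcpy` so that the file buffer needs no particular alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <elf.h>

/* Returns the name of section `index`, or NULL when anything is out of range. */
const char *sectionNameAt32(const uint8_t *file, size_t fileSize, uint16_t index) {
    Elf32_Ehdr eh;
    Elf32_Shdr strSec, sec;
    if (file == NULL || fileSize < sizeof eh) return NULL;
    memcpy(&eh, file, sizeof eh);
    if (index >= eh.e_shnum || eh.e_shstrndx >= eh.e_shnum) return NULL;
    uint64_t tableEnd = (uint64_t) eh.e_shoff + (uint64_t) eh.e_shnum * sizeof sec;
    if (tableEnd > fileSize) return NULL;
    /* Step 1: e_shstrndx selects the header of the section name string table */
    memcpy(&strSec, file + eh.e_shoff + (size_t) eh.e_shstrndx * sizeof sec, sizeof sec);
    /* Step 2: read sh_name of the section we are interested in */
    memcpy(&sec, file + eh.e_shoff + (size_t) index * sizeof sec, sizeof sec);
    /* Step 3: file offset of the name = strSec.sh_offset + sec.sh_name */
    if (sec.sh_name >= strSec.sh_size) return NULL;
    if ((uint64_t) strSec.sh_offset + strSec.sh_size > fileSize) return NULL;
    const char *name = (const char *) file + strSec.sh_offset + sec.sh_name;
    /* Reject a name that is not NUL-terminated inside the table */
    if (memchr(name, 0, strSec.sh_size - sec.sh_name) == NULL) return NULL;
    return name;
}
```

The null entry at index 0 has sh_name = 0, so it resolves to the empty string at the start of the name table.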

sh_type

4 bytes; the section type, defined as follows:

/* Legal values for sh_type (section type).  */

#define SHT_NULL 0 /* Section header table entry unused */
#define SHT_PROGBITS 1 /* Program data */
#define SHT_SYMTAB 2 /* Symbol table */
#define SHT_STRTAB 3 /* String table */
#define SHT_RELA 4 /* Relocation entries with addends */
#define SHT_HASH 5 /* Symbol hash table */
#define SHT_DYNAMIC 6 /* Dynamic linking information */
#define SHT_NOTE 7 /* Notes */
#define SHT_NOBITS 8 /* Program space with no data (bss) */
#define SHT_REL 9 /* Relocation entries, no addends */
#define SHT_SHLIB 10 /* Reserved */
#define SHT_DYNSYM 11 /* Dynamic linker symbol table */
#define SHT_INIT_ARRAY 14 /* Array of constructors */
#define SHT_FINI_ARRAY 15 /* Array of destructors */
#define SHT_PREINIT_ARRAY 16 /* Array of pre-constructors */
#define SHT_GROUP 17 /* Section group */
#define SHT_SYMTAB_SHNDX 18 /* Extended section indices */
#define SHT_RELR 19 /* RELR relative relocations */
#define SHT_NUM 20 /* Number of defined types. */
#define SHT_LOOS 0x60000000 /* Start OS-specific. */
#define SHT_GNU_ATTRIBUTES 0x6ffffff5 /* Object attributes. */
#define SHT_GNU_HASH 0x6ffffff6 /* GNU-style hash table. */
#define SHT_GNU_LIBLIST 0x6ffffff7 /* Prelink library list */
#define SHT_CHECKSUM 0x6ffffff8 /* Checksum for DSO content. */
#define SHT_LOSUNW 0x6ffffffa /* Sun-specific low bound. */
#define SHT_SUNW_move 0x6ffffffa
#define SHT_SUNW_COMDAT 0x6ffffffb
#define SHT_SUNW_syminfo 0x6ffffffc
#define SHT_GNU_verdef 0x6ffffffd /* Version definition section. */
#define SHT_GNU_verneed 0x6ffffffe /* Version needs section. */
#define SHT_GNU_versym 0x6fffffff /* Version symbol table. */
#define SHT_HISUNW 0x6fffffff /* Sun-specific high bound. */
#define SHT_HIOS 0x6fffffff /* End OS-specific type */
#define SHT_LOPROC 0x70000000 /* Start of processor-specific */
#define SHT_HIPROC 0x7fffffff /* End of processor-specific */
#define SHT_LOUSER 0x80000000 /* Start of application-specific */
#define SHT_HIUSER 0x8fffffff /* End of application-specific */

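The most common section types are:

SHT_NULL //Unused (invalid) section
SHT_STRTAB //String table; an ELF file may have several string table sections
SHT_RELA //Relocation section whose entries carry explicit addends
SHT_HASH //Symbol hash table; currently an ELF file may contain at most one
SHT_DYNAMIC //Dynamic linking information; currently an object file may contain at most one dynamic section
SHT_NOBITS //Occupies no space in the file (e.g. .bss), although it still takes up memory at run time
SHT_REL //Relocation section whose entries have no explicit addends
SHT_DYNSYM //Symbol table used for dynamic linking; same entry layout as SHT_SYMTAB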

sh_flags

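4 bytes (8 bytes in ELF64); a set of flag bits

1.SHF_WRITE: the section is writable in the process

2.SHF_ALLOC: the section occupies memory at run time

3.SHF_EXECINSTR: the section contains instruction code

4.SHF_MASKPROC: all bits covered by this mask are reserved for processor-specific extensions

sh_addr

4 bytes (8 in ELF64); virtual address of the section in memory

sh_offset

4 bytes (8 in ELF64); file offset (FOA) of the section

sh_size

4 bytes (8 in ELF64); size of the section

sh_link

4 bytes; an index value

sh_info

4 bytes; additional information about the section

The meaning of sh_info and sh_link depends on the section type.

sh_addralign

4 bytes (8 in ELF64); the address alignment of the section, in bytes. 0 or 1 means the section has no alignment requirement; otherwise the value must be a power of two, e.g. 8 means 8-byte alignment

The section's sh_addr must be divisible by sh_addralign, i.e. sh_addr % sh_addralign = 0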
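The alignment rule is easy to get wrong, so here is a small sketch of the check. It assumes, as the generic ELF specification states, that sh_addralign holds the alignment itself as a byte count that is 0, 1 or a power of two; the function name `sectionAddrAligned` is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when addr satisfies the alignment, 0 when it does not or when
   align is not a valid value (0, 1, or a power of two). */
int sectionAddrAligned(uint64_t addr, uint64_t align) {
    if (align == 0 || align == 1) return 1;       /* no alignment constraint */
    if ((align & (align - 1)) != 0) return 0;     /* not a power of two */
    /* For a power of two, addr % align == 0 is the same as a mask test */
    return (addr & (align - 1)) == 0;
}
```

For example, an address of 0x1000 satisfies an alignment of 16, while 0x1004 does not satisfy an alignment of 8.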

sh_entsize

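4 bytes (8 in ELF64); some sections hold a table of fixed-size entries (the symbol table, for example). This field gives the size of each entry; 0 means the section does not hold such a table.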

Printing the section header table

// Print ELF Section Headers
char *getSectionTypeString(Elf_Word sectionType) {
switch (sectionType) {
case SHT_NULL: return "NULL";
case SHT_PROGBITS: return "PROGBITS";
case SHT_SYMTAB: return "SYMTAB";
case SHT_STRTAB: return "STRTAB";
case SHT_RELA: return "RELA";
case SHT_HASH: return "HASH";
case SHT_DYNAMIC: return "DYNAMIC";
case SHT_NOTE: return "NOTE";
case SHT_NOBITS: return "NOBITS";
case SHT_REL: return "REL";
case SHT_SHLIB: return "SHLIB";
case SHT_DYNSYM: return "DYNSYM";
case SHT_INIT_ARRAY: return "INIT_ARRAY";
case SHT_FINI_ARRAY: return "FINI_ARRAY";
case SHT_PREINIT_ARRAY: return "PREINIT_ARRAY";
case SHT_GROUP: return "GROUP";
case SHT_SYMTAB_SHNDX: return "SYMTAB_SHNDX";
case SHT_RELR: return "RELR";
case SHT_NUM: return "NUM";
case SHT_LOOS: return "LOOS";
case SHT_GNU_ATTRIBUTES: return "GNU_ATTRIBUTES";
case SHT_GNU_HASH: return "GNU_HASH";
case SHT_GNU_LIBLIST: return "GNU_LIBLIST";
case SHT_CHECKSUM: return "CHECKSUM";
case SHT_LOSUNW: return "LOSUNW";
case SHT_SUNW_COMDAT: return "SUNW_COMDAT";
case SHT_SUNW_syminfo: return "SUNW_syminfo";
case SHT_GNU_verdef: return "GNU_verdef";
case SHT_GNU_verneed: return "GNU_verneed";
case SHT_GNU_versym: return "GNU_versym";
case SHT_LOPROC: return "LOPROC";
case SHT_HIPROC: return "HIPROC";
case SHT_LOUSER: return "LOUSER";
case SHT_HIUSER: return "HIUSER";
default: return "UNKNOWN";
}
}
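const char* getSectionFlagStr(Elf_Word flags) {
    // Test each bit separately so that every combination is reported, including
    // sections that also carry other flag bits (an exact-value switch misses those).
    // The static buffer is overwritten by each call.
    static char flagStr[4];
    int count = 0;
    if (flags & SHF_WRITE) flagStr[count++] = 'W';
    if (flags & SHF_ALLOC) flagStr[count++] = 'A';
    if (flags & SHF_EXECINSTR) flagStr[count++] = 'X';
    flagStr[count] = '\0';
    return flagStr;
}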
void printElfSectionHeader32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pStringTable) {
printf("ELF Section Headers:\n");
printf("\t[Nr] Name\t\t\tType\t\t\tAddr\t\tOffset\t\tSize\t\tEntSize\tFlag\tLink\tInfo\tAlign\n");
for (int i = 0; i < sectionNum; i++) {
printf("\t[%2d] %-20s", i, (char *) &pStringTable[pSectionHeader[i].sh_name]);
printf("\t%-16s", getSectionTypeString(pSectionHeader[i].sh_type));
printf("\t%08x", pSectionHeader[i].sh_addr);
printf("\t%08x", pSectionHeader[i].sh_offset);
printf("\t%08x", pSectionHeader[i].sh_size);
printf("\t%x", pSectionHeader[i].sh_entsize);
printf("\t%s", getSectionFlagStr(pSectionHeader[i].sh_flags));
printf("\t%x", pSectionHeader[i].sh_link);
printf("\t%x", pSectionHeader[i].sh_info);
printf("\t%x\n", pSectionHeader[i].sh_addralign);
}
printf("ELF Section Headers End\n");
}

The output looks like this:


Program Header

The program header table describes how the ELF file is mapped into memory, in units of segments.

It is defined as follows:

typedef struct
{
Elf32_Word p_type; /* Segment type */
Elf32_Off p_offset; /* Segment file offset */
Elf32_Addr p_vaddr; /* Segment virtual address */
Elf32_Addr p_paddr; /* Segment physical address */
Elf32_Word p_filesz; /* Segment size in file */
Elf32_Word p_memsz; /* Segment size in memory */
Elf32_Word p_flags; /* Segment flags */
Elf32_Word p_align; /* Segment alignment */
} Elf32_Phdr;

typedef struct
{
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment */
} Elf64_Phdr;

p_type

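Specifies the type of segment this program header describes (or how the information in this program header should be interpreted)

The segment types are: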

/* Legal values for p_type (segment type).  */

#define PT_NULL 0 /* Program header table entry unused */
#define PT_LOAD 1 /* Loadable program segment */
#define PT_DYNAMIC 2 /* Dynamic linking information */
#define PT_INTERP 3 /* Program interpreter */
#define PT_NOTE 4 /* Auxiliary information */
#define PT_SHLIB 5 /* Reserved */
#define PT_PHDR 6 /* Entry for header table itself */
#define PT_TLS 7 /* Thread-local storage segment */
#define PT_NUM 8 /* Number of defined types */
#define PT_LOOS 0x60000000 /* Start of OS-specific */
#define PT_GNU_EH_FRAME 0x6474e550 /* GCC .eh_frame_hdr segment */
#define PT_GNU_STACK 0x6474e551 /* Indicates stack executability */
#define PT_GNU_RELRO 0x6474e552 /* Read-only after relocation */
#define PT_GNU_PROPERTY 0x6474e553 /* GNU property */
#define PT_GNU_SFRAME 0x6474e554 /* SFrame segment. */
#define PT_LOSUNW 0x6ffffffa
#define PT_SUNWBSS 0x6ffffffa /* Sun Specific segment */
#define PT_SUNWSTACK 0x6ffffffb /* Stack segment */
#define PT_HISUNW 0x6fffffff
#define PT_HIOS 0x6fffffff /* End of OS-specific */
#define PT_LOPROC 0x70000000 /* Start of processor-specific */
#define PT_HIPROC 0x7fffffff /* End of processor-specific */

p_offset

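File offset of the segment

p_vaddr

Virtual address of the segment in memory

p_paddr

Physical address of the segment in memory. Most modern operating systems are designed so that the physical address of a segment cannot be known in advance, so this field is reserved in most cases

p_filesz

Size of the segment in the file

p_memsz

Size of the segment in memory; when it is larger than p_filesz, the extra bytes are zero-filled (this is how .bss gets its space)

p_flags

Attributes of the segment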

/* Legal values for p_flags (segment flags).  */

#define PF_X (1 << 0) /* Segment is executable */
#define PF_W (1 << 1) /* Segment is writable */
#define PF_R (1 << 2) /* Segment is readable */
#define PF_MASKOS 0x0ff00000 /* OS-specific */
#define PF_MASKPROC 0xf0000000 /* Processor-specific */
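A loader that maps segments with mmap/mprotect has to translate these bits, because the numeric values of PF_* and PROT_* do not line up (PF_X is 1 while PROT_READ is 1 on Linux). A minimal sketch of that translation follows; the function name `segmentFlagsToProt` is hypothetical and this is not necessarily how the attached loader does it.

```c
#include <stdint.h>
#include <sys/mman.h>
#include <elf.h>

/* Translate p_flags permission bits into mmap/mprotect protection bits */
int segmentFlagsToProt(uint32_t pFlags) {
    int prot = PROT_NONE;
    if (pFlags & PF_R) prot |= PROT_READ;
    if (pFlags & PF_W) prot |= PROT_WRITE;
    if (pFlags & PF_X) prot |= PROT_EXEC;
    return prot;
}
```

A typical code segment with PF_R | PF_X therefore ends up as PROT_READ | PROT_EXEC.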

p_align

Alignment of the segment in memory

Printing the program header table

// Print ELF Program Headers
const char *getSegmentTypeStr(Elf32_Word segmentType) {
switch (segmentType) {
case PT_NULL:return "NULL";
case PT_LOAD: return "LOAD";
case PT_DYNAMIC: return "DYNAMIC";
case PT_INTERP:return "INTERP";
case PT_NOTE: return "NOTE";
case PT_SHLIB:return "SHLIB";
case PT_PHDR: return "PHDR";
case PT_TLS:return "TLS";
case PT_NUM: return "PT_NUM";
case PT_LOOS:return "LOOS";
case PT_GNU_EH_FRAME: return "GNU_EH_FRAME";
case PT_GNU_STACK:return "GNU_STACK";
case PT_GNU_RELRO: return "GNU_RELRO";
case PT_GNU_PROPERTY: return "GNU_PROPERTY";
case PT_GNU_SFRAME: return "GNU_SFRAME";
case PT_SUNWBSS: return "SUNWBSS";
case PT_SUNWSTACK: return "SUNWSTACK";
case PT_HIOS: return "HIOS";
case PT_LOPROC: return "LOPROC";
case PT_HIPROC: return "HIPROC";
default: return "UNKNOWN";
}
}
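const char* getSegmentFlagStr(Elf_Word segmentFlags) {
    // The static buffer is rebuilt on every call; terminate it explicitly so that
    // letters left over from a previous call cannot leak into a shorter result
    static char segmentFlagStr[4];
    int count = 0;
    if (segmentFlags & PF_R) {
        segmentFlagStr[count++] = 'R';
    }
    if (segmentFlags & PF_W) {
        segmentFlagStr[count++] = 'W';
    }
    if (segmentFlags & PF_X) {
        segmentFlagStr[count++] = 'X';
    }
    segmentFlagStr[count] = '\0';
    return segmentFlagStr;
}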
void printElfProgramHeader32(const Elf32_Phdr *pProgramHeader,Elf_Half segmentNum,const uint8_t* pFileBuffer) {
printf("ELF ProgramHeader:\n");
printf("\t[Nr] Type\t\tFileOff\t\tVirAddr\t\tPhyAddr\t\tFileSize\tMemSize\t\tFlag\tAlign\n");
for (int i = 0; i < segmentNum; i++) {
printf("\t[%02d] %-16s", i, getSegmentTypeStr(pProgramHeader[i].p_type));
printf("\t%08x", pProgramHeader[i].p_offset);
printf("\t%08x", pProgramHeader[i].p_vaddr);
printf("\t%08x", pProgramHeader[i].p_paddr);
printf("\t%08x", pProgramHeader[i].p_filesz);
printf("\t%08x", pProgramHeader[i].p_memsz);
printf("\t%4s", getSegmentFlagStr(pProgramHeader[i].p_flags));
printf("\t%#x\n", pProgramHeader[i].p_align);
if (pProgramHeader[i].p_type == PT_INTERP) {
printf("\t\t [Request Program Interpreter Path: %s]\n",(char *) (pFileBuffer + pProgramHeader[i].p_offset));
}
}
printf("ELF ProgramHeader End\n");
}
// print segment mapping
void printSectionToSegmentMapping32(const Elf32_Phdr* pProgramHeader,const Elf32_Shdr* pSectionHeader,Elf_Half segmentNum,Elf_Half sectionNum,const char* pSectionHeaderStringTable) {
printf("Section to Segment Mapping:\n");
printf("\tSegment\tSections\n");
//Traverse program headers
for (int i = 0; i < segmentNum; i++) {
Elf32_Addr segmentStartAddr = pProgramHeader[i].p_vaddr;
Elf32_Addr segmentEndAddr = segmentStartAddr + pProgramHeader[i].p_memsz;
printf("\t%02d\t\t", i);
//Traverse section headers
for (int j = 0; j < sectionNum; j++) {
Elf32_Addr sectionStartAddr = pSectionHeader[j].sh_addr;
//Check whether the start addr of a section is in the segment addr
if (sectionStartAddr >= segmentStartAddr && sectionStartAddr < segmentEndAddr) {
//SHF_ALLOC means need alloc memory, some control sections don't need mapping to memory
if (pSectionHeader[j].sh_flags & SHF_ALLOC) {
printf("%s ",(char *) pSectionHeaderStringTable + pSectionHeader[j].sh_name);
}
}
}
printf("\n");
}
}

The output looks like this:


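Special Sections

Some sections in an ELF file are predefined and hold instruction code or control information.

These sections are reserved for the operating system, and their types and attributes differ between operating systems.

Section   | Purpose
----------|------------------------------------------------------------
.text     | Code section
.data     | Initialized global variables and local static variables
.bss      | Uninitialized global variables and local static variables
.rodata   | Read-only data, e.g. constant strings
.comment  | Compiler version information
.debug    | Debug information
.dynamic  | Dynamic linking information; the linker parses it to load the ELF file
.hash     | Symbol hash table (can look up both imported and exported symbols)
.gnu.hash | GNU hash table (can look up exported symbols only; the export table)
.line     | Debug line number table, mapping source lines to compiled instructions
.note     | Extra compiler information, e.g. company name, version number
.rel.dyn  | Dynamic linking relocation table; holds relocation entries for global variables
.rel.plt  | Dynamic linking function jump relocation table; holds PLT relocation entries
.symtab   | Symbol table
.dynsym   | Dynamic linking symbol table
.strtab   | String table
.shstrtab | Section name table
.dynstr   | Dynamic linking string table
.plt      | Dynamic linking jump table (procedure linkage table)
.got      | Dynamic linking global offset table
.init     | Program initialization code section
.fini     | Program termination code section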


String Table

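An ELF file contains many strings, such as section names and variable names. Because string lengths vary, describing them with a fixed-size structure is awkward.

The usual approach is to collect the strings in a string table and refer to each one by its offset into that table

Common ones are:

1..strtab (string table; holds ordinary strings)

2..shstrtab (section header string table; holds the strings used by the section header table)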
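Since every reference into a string table is just an offset taken from the file, a defensive lookup should confirm that the offset is inside the table and that a terminating NUL follows before the table ends. A minimal sketch, with the hypothetical name `stringTableGet`:

```c
#include <stddef.h>
#include <string.h>

/* Returns the string starting at `index`, or NULL when the index is out of
   range or the string is not NUL-terminated inside the table. */
const char *stringTableGet(const char *table, size_t tableSize, size_t index) {
    if (table == NULL || index >= tableSize) return NULL;
    if (memchr(table + index, '\0', tableSize - index) == NULL) return NULL;
    return table + index;
}
```

Note that an index may point into the middle of a stored string: with a table holding "\0.text\0.data", index 1 yields ".text" and index 3 yields "ext". Linkers use this to share suffixes between names.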

The printing code is as follows:

// Print String Table
void printStringTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pSectionHeaderStringTable,const uint8_t* pFileBuffer) {
    //Traverse the section header table then find string table
    printf("ELF String Table:\n");
    for (int i = 0; i < sectionNum; i++) {
        //not only just one string table such as .dynstr .strtab
        if (pSectionHeader[i].sh_type == SHT_STRTAB) {
            printf("\t==========String Table %s==========\n",getSectionName(pSectionHeaderStringTable,pSectionHeader[i].sh_name));
            const char *pStringTable = (const char *) (pFileBuffer + pSectionHeader[i].sh_offset);
            Elf32_Word stringTableSize = pSectionHeader[i].sh_size, pos = 0;

            //Walk the table: after a NUL byte, print the string starting at pos+1; otherwise skip ahead to the next NUL
            while (pos < stringTableSize) {
                if (pStringTable[pos] == 0) {
                    pos += 1;
                    //Stay inside the table (the last byte is a NUL) and skip empty strings
                    if (pos < stringTableSize && pStringTable[pos] != 0) {
                        printf("\t%s\n", pStringTable + pos);
                    }
                } else {
                    //find zero, without running past the end of the table
                    while (pos < stringTableSize && pStringTable[pos] != 0) {
                        pos++;
                    }
                }
            }
        }
    }
    printf("ELF String Table End\n");
}


Symbol Table

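The symbol table describes imported and exported symbols; a symbol can be a global variable, a function, an external reference, and so on

With the symbol table and its string table you can obtain each symbol's name, size, address and other information.

.dynsym //dynamic linking symbol table
.symtab //symbol table

.dynstr //string table for the dynamic linking symbol table
.strtab //string table for the symbol table

Symbol table entry structure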

typedef struct
{
Elf32_Word st_name; /* Symbol name (string tbl index) */
Elf32_Addr st_value; /* Symbol value */
Elf32_Word st_size; /* Symbol size */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf32_Section st_shndx; /* Section index */
} Elf32_Sym;

typedef struct
{
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;

st_name

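Symbol name, as an index into a string table; sh_link in the symbol table's section header says which string table is used

st_value

The value associated with the symbol. Depending on the symbol it may be an absolute value or an address; the meaning differs from symbol to symbol

st_size

Symbol size; for a symbol that holds data, this is the size of the data type

For example, a symbol of type double occupies 8 bytes. A value of 0 means the symbol has no size or the size is unknown

st_info

Type and binding of the symbol. The high 4 bits hold the symbol binding and the low 4 bits hold the symbol type; together they form the symbol information

Three macros work with these values: two extract the binding and the type, and the third packs them back into st_info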

/* How to extract and insert information held in the st_info field.  */

#define ELF32_ST_BIND(val) (((unsigned char) (val)) >> 4)
#define ELF32_ST_TYPE(val) ((val) & 0xf)
#define ELF32_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))
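A short worked example makes the bit layout concrete. The wrappers below are hypothetical helpers (`packSymbolInfo`, `symbolBind`, `symbolType`) around the macros from `<elf.h>`; they are only meant to show the arithmetic.

```c
#include <elf.h>

/* Pack a binding and a type into one st_info byte: (bind << 4) + (type & 0xf) */
unsigned char packSymbolInfo(unsigned bind, unsigned type) {
    return (unsigned char) ELF32_ST_INFO(bind, type);
}

/* High 4 bits: symbol binding */
unsigned symbolBind(unsigned char info) {
    return ELF32_ST_BIND(info);
}

/* Low 4 bits: symbol type */
unsigned symbolType(unsigned char info) {
    return ELF32_ST_TYPE(info);
}
```

A global function has binding STB_GLOBAL (1) and type STT_FUNC (2), so its st_info byte is (1 << 4) + 2 = 0x12. Going the other way, 0x21 decodes to binding 2 (STB_WEAK) and type 1 (STT_OBJECT).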

Symbol Binding

The legal values for the symbol binding are:

/* Legal values for ST_BIND subfield of st_info (symbol binding).  */

#define STB_LOCAL 0 /* Local symbol */
#define STB_GLOBAL 1 /* Global symbol */
#define STB_WEAK 2 /* Weak symbol */
#define STB_NUM 3 /* Number of defined types. */
#define STB_LOOS 10 /* Start of OS-specific */
#define STB_GNU_UNIQUE 10 /* Unique symbol. */
#define STB_HIOS 12 /* End of OS-specific */
#define STB_LOPROC 13 /* Start of processor-specific */
#define STB_HIPROC 15 /* End of processor-specific */

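The important ones are explained below:

1.STB_LOCAL: local symbol, not visible outside the object file that defines it

2.STB_GLOBAL: global symbol, visible to all object files being combined

3.STB_WEAK: weak symbol, similar to a global symbol but with lower precedence, so a global definition of the same name overrides it

4.STB_LOPROC~STB_HIPROC: range reserved for processor-specific semantics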

Symbol Type

/* Legal values for ST_TYPE subfield of st_info (symbol type).  */

#define STT_NOTYPE 0 /* Symbol type is unspecified */
#define STT_OBJECT 1 /* Symbol is a data object */
#define STT_FUNC 2 /* Symbol is a code object */
#define STT_SECTION 3 /* Symbol associated with a section */
#define STT_FILE 4 /* Symbol's name is file name */
#define STT_COMMON 5 /* Symbol is a common data object */
#define STT_TLS 6 /* Symbol is thread-local data object*/
#define STT_NUM 7 /* Number of defined types. */
#define STT_LOOS 10 /* Start of OS-specific */
#define STT_GNU_IFUNC 10 /* Symbol is indirect code object */
#define STT_HIOS 12 /* End of OS-specific */
#define STT_LOPROC 13 /* Start of processor-specific */
#define STT_HIPROC 15 /* End of processor-specific */

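The important types are explained below:

1.STT_NOTYPE: the symbol type is not specified

2.STT_OBJECT: the symbol is a data object, such as a variable or an array

3.STT_FUNC: the symbol is a function or other executable code

4.STT_SECTION: the symbol is associated with a section, mainly for relocation

5.STT_FILE: the symbol's name is the name of the source file

6.STT_LOPROC~STT_HIPROC: range reserved for processor-specific semantics

st_other

The low 2 bits hold the symbol visibility

st_shndx

Index of the section the symbol belongs to (special values such as SHN_UNDEF and SHN_ABS mark undefined and absolute symbols)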

Printing the symbol table

// Print Symbol Table
const char *getSymbolBindingString(uint8_t symbolBinding) {
switch (symbolBinding) {
case STB_LOCAL: return "LOCAL";
case STB_GLOBAL: return "GLOBAL";
case STB_WEAK: return "WEAK";
case STB_NUM: return "STB_NUM";
case STB_GNU_UNIQUE: return "GNU_UNIQUE";
case STB_HIOS: return "STB_HIOS";
case STB_LOPROC: return "STB_LOPROC";
case STB_HIPROC: return "STB_HIPROC";
default: return "UNKNOWN";
}
}
const char *getSymbolTypeString(uint8_t symbolType) {
switch (symbolType) {
case STT_NOTYPE: return "NOTYPE";
case STT_OBJECT: return "OBJECT";
case STT_FUNC: return "FUNC";
case STT_SECTION: return "SECTION";
case STT_FILE: return "FILE";
case STT_COMMON: return "COMMON";
case STT_TLS: return "TLS";
case STT_NUM: return "STT_NUM";
case STT_GNU_IFUNC: return "GNU_IFUNC";
case STT_HIOS: return "HIOS";
case STT_LOPROC: return "LOPROC";
case STT_HIPROC: return "HIPROC";
default: return "UNKNOWN";
}
}
const char *getSymbolVisibility(uint8_t st_other) {
unsigned char visibility = st_other & 0x03;
switch (visibility) {
case 0: return "DEFAULT";
case 1: return "INTERNAL";
case 2: return "HIDDEN";
case 3: return "PROTECTED";
default: return "UNKNOWN";
}
}

void printSymbolTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,const char* pSectionHeaderStringTable,const uint8_t* pFileBuffer) {
printf("ELF Symbol Tables:\n");
for (int i = 0; i < sectionNum; i++) {
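//static symbol table (.symtab) and dynamic symbol table (.dynsym)
if (pSectionHeader[i].sh_type == SHT_SYMTAB || pSectionHeader[i].sh_type == SHT_DYNSYM) {
//guard against sh_entsize == 0 in a malformed file to avoid dividing by zero
Elf32_Word symbolNum = pSectionHeader[i].sh_entsize ? pSectionHeader[i].sh_size / pSectionHeader[i].sh_entsize : 0;
//get the string table that belongs to this symbol table (.symtab and .dynsym may use different ones): sh_link is index of string table, fileBuffer+offset is real string table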
char* pSymbolNameTable =(char*) pFileBuffer + pSectionHeader[pSectionHeader[i].sh_link].sh_offset;
printf("\tSymbol Table '%s' contains %#x entries:\n",(char*)getSectionName(pSectionHeaderStringTable,pSectionHeader[i].sh_name), symbolNum);
printf("\tNum \tValue\t\tSize\t\tType\t\tBind\t\tVisible\t\tIndex\t\tName\n");
Elf32_Sym *pSymbolTable = (Elf32_Sym *) (pFileBuffer + pSectionHeader[i].sh_offset);
for (int j = 0; j < symbolNum; j++) {
printf("\t%04d", j);
printf("\t%08x", pSymbolTable[j].st_value);
printf("\t%08x", pSymbolTable[j].st_size);
//symbol type and binding
printf("\t%s\t", getSymbolTypeString(ELF32_ST_TYPE(pSymbolTable[j].st_info)));
printf("\t%s\t", getSymbolBindingString(ELF32_ST_BIND(pSymbolTable[j].st_info)));
printf("\t%-10s", getSymbolVisibility(pSymbolTable[j].st_other));
if (pSymbolTable[j].st_shndx == SHN_UNDEF) {
printf("\t%4s\t", "UDEF");
} else if (pSymbolTable[j].st_shndx == SHN_ABS) {
printf("\t%4s\t", "ABS");
} else {
printf("\t%04x\t", pSymbolTable[j].st_shndx);
}
printf("\t%s\n", pSymbolNameTable + pSymbolTable[j].st_name);
}
printf("\n");
}
}
}


Relocation Table

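There are usually two relocation tables:

1..rel.plt fixes up external function addresses

2..rel.dyn fixes up global variable addresses

Relocation tables come in three types, SHT_REL, SHT_RELA and SHT_RELR; the corresponding entry definitions are shown below.

Note: the Intel x86 architecture uses only REL entries, while the x64 architecture appears to use only RELA entries, as will become apparent when the relocations are fixed up later: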

/* Relocation table entry without addend (in section of type SHT_REL).  */

typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
} Elf32_Rel;

typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
} Elf64_Rel;

/* Relocation table entry with addend (in section of type SHT_RELA). */

typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
Elf32_Sword r_addend; /* Addend */
} Elf32_Rela;

typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;

/* RELR relocation table entry */

typedef Elf32_Word Elf32_Relr;
typedef Elf64_Xword Elf64_Relr;

r_offset

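The location to relocate

For a relocatable file, the value is the offset within the section of the unit to be relocated

For an executable or a shared library, the value is the virtual address of the unit to be relocated

r_info

Gives the symbol table index and the relocation type of the unit to be relocated

Macros for extracting the information:

SYM takes the high 24 bits (ELF32) or the high 32 bits (ELF64); it is the symbol table index that identifies the symbol

TYPE takes the low 8 bits (ELF32) or the low 32 bits (ELF64); it is the relocation type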

/* How to extract and insert information held in the r_info field.  */

#define ELF32_R_SYM(val) ((val) >> 8)
#define ELF32_R_TYPE(val) ((val) & 0xff)
#define ELF32_R_INFO(sym, type) (((sym) << 8) + ((type) & 0xff))

#define ELF64_R_SYM(i) ((i) >> 32)
#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
#define ELF64_R_INFO(sym,type) ((((Elf64_Xword) (sym)) << 32) + (type))
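As a worked example of these macros, the sketch below splits r_info for both classes. The struct `RelocInfo` and the two decode functions are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <elf.h>

typedef struct {
    uint32_t symIndex;  /* index into the associated symbol table */
    uint32_t type;      /* relocation type, e.g. R_386_JMP_SLOT */
} RelocInfo;

/* ELF32: symbol index in the high 24 bits, type in the low 8 bits */
RelocInfo decodeRelInfo32(Elf32_Word info) {
    RelocInfo r = { ELF32_R_SYM(info), ELF32_R_TYPE(info) };
    return r;
}

/* ELF64: symbol index in the high 32 bits, type in the low 32 bits */
RelocInfo decodeRelInfo64(Elf64_Xword info) {
    RelocInfo r = { (uint32_t) ELF64_R_SYM(info), (uint32_t) ELF64_R_TYPE(info) };
    return r;
}
```

For instance, a 32-bit r_info of 0x00000207 refers to symbol 2 with type 7 (R_386_JMP_SLOT), and a 64-bit r_info of 0x0000000300000007 refers to symbol 3 with type 7 (R_X86_64_JUMP_SLOT).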

r_addend

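Specifies the addend used to compute the value stored in the relocated field

Rela states the addend explicitly in this field; for Rel the addend is implicit and is stored at the location being modified

A relocation section refers to two other sections: the symbol table and the section to be fixed up

In the relocation section's header, sh_link gives the section index of the symbol table and sh_info gives the index of the section the relocations apply to

The meaning of a relocation entry's r_offset differs slightly depending on the kind of object file

1.Relocatable file: r_offset is the offset within the section of the unit to be relocated

2.Executable file / shared object file: r_offset is the virtual address of the unit to be relocated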

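Relocation Types

A relocation entry describes how to modify an instruction or data field (the relocated field)

The following notation makes the descriptions easier: A is the addend, B is the base address at which the file is loaded, P is the address of the unit being relocated (derived from r_offset), and S is the value of the symbol the entry refers to

Common relocation types:

R_386_GLOB_DAT

Sets a GOT entry to the address of the specified symbol

Fix-up: after the ELF file is loaded, write the real address of the symbol

R_386_JMP_SLOT

Used for PLT entries in dynamic linking

Fix-up: after the ELF file is loaded, change the jump address to the address of the symbol

R_386_RELATIVE

Relocation of an address relative to the load base

Fix-up: dereference the location given by offset and add the base address at which the ELF file was loaded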
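The three fix-up rules can be sketched in a few lines. This is an illustration under stated assumptions, not the loader from the attachments: the function name `applyRel32` is hypothetical, `image` is the in-memory copy of the module, `loadBase` is the address it is (or will be) loaded at, r_offset is treated as an offset from the load base (which holds for ET_DYN files), and `symAddr` is assumed to have been resolved by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <elf.h>

/* Apply one REL entry. Returns 0 on success, -1 for an out-of-range offset
   or a relocation type this sketch does not handle. */
int applyRel32(uint8_t *image, size_t imageSize, uint32_t loadBase,
               const Elf32_Rel *rel, uint32_t symAddr) {
    uint32_t value;
    if (image == NULL || rel == NULL) return -1;
    if (rel->r_offset > imageSize || imageSize - rel->r_offset < sizeof value) return -1;
    uint8_t *place = image + rel->r_offset;
    switch (ELF32_R_TYPE(rel->r_info)) {
    case R_386_GLOB_DAT:   /* GOT entry: store the symbol address */
    case R_386_JMP_SLOT:   /* PLT slot: store the symbol address */
        value = symAddr;
        break;
    case R_386_RELATIVE:   /* read the implicit addend, then add the load base */
        memcpy(&value, place, sizeof value);
        value += loadBase;
        break;
    default:
        return -1;
    }
    memcpy(place, &value, sizeof value);   /* memcpy avoids unaligned access */
    return 0;
}
```

With a load base of 0x56555000 and a stored addend of 0x1000, an R_386_RELATIVE entry rewrites the location to 0x56556000.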

The full list of Intel x86 relocation types:

/* Intel 80386 specific definitions.  */

/* i386 relocs. */

#define R_386_NONE 0 /* No reloc */
#define R_386_32 1 /* Direct 32 bit */
#define R_386_PC32 2 /* PC relative 32 bit */
#define R_386_GOT32 3 /* 32 bit GOT entry */
#define R_386_PLT32 4 /* 32 bit PLT address */
#define R_386_COPY 5 /* Copy symbol at runtime */
#define R_386_GLOB_DAT 6 /* Create GOT entry */
#define R_386_JMP_SLOT 7 /* Create PLT entry */
#define R_386_RELATIVE 8 /* Adjust by program base */
#define R_386_GOTOFF 9 /* 32 bit offset to GOT */
#define R_386_GOTPC 10 /* 32 bit PC relative offset to GOT */
#define R_386_32PLT 11
#define R_386_TLS_TPOFF 14 /* Offset in static TLS block */
#define R_386_TLS_IE 15 /* Address of GOT entry for static TLS
block offset */

#define R_386_TLS_GOTIE 16 /* GOT entry for static TLS block
offset */

#define R_386_TLS_LE 17 /* Offset relative to static TLS
block */

#define R_386_TLS_GD 18 /* Direct 32 bit for GNU version of
general dynamic thread local data */

#define R_386_TLS_LDM 19 /* Direct 32 bit for GNU version of
local dynamic thread local data
in LE code */

#define R_386_16 20
#define R_386_PC16 21
#define R_386_8 22
#define R_386_PC8 23
#define R_386_TLS_GD_32 24 /* Direct 32 bit for general dynamic
thread local data */

#define R_386_TLS_GD_PUSH 25 /* Tag for pushl in GD TLS code */
#define R_386_TLS_GD_CALL 26 /* Relocation for call to
__tls_get_addr() */

#define R_386_TLS_GD_POP 27 /* Tag for popl in GD TLS code */
#define R_386_TLS_LDM_32 28 /* Direct 32 bit for local dynamic
thread local data in LE code */

#define R_386_TLS_LDM_PUSH 29 /* Tag for pushl in LDM TLS code */
#define R_386_TLS_LDM_CALL 30 /* Relocation for call to
__tls_get_addr() in LDM code */

#define R_386_TLS_LDM_POP 31 /* Tag for popl in LDM TLS code */
#define R_386_TLS_LDO_32 32 /* Offset relative to TLS block */
#define R_386_TLS_IE_32 33 /* GOT entry for negated static TLS
block offset */

#define R_386_TLS_LE_32 34 /* Negated offset relative to static
TLS block */

#define R_386_TLS_DTPMOD32 35 /* ID of module containing symbol */
#define R_386_TLS_DTPOFF32 36 /* Offset in TLS block */
#define R_386_TLS_TPOFF32 37 /* Negated offset in static TLS block */
#define R_386_SIZE32 38 /* 32-bit symbol size */
#define R_386_TLS_GOTDESC 39 /* GOT offset for TLS descriptor. */
#define R_386_TLS_DESC_CALL 40 /* Marker of call through TLS
descriptor for
relaxation. */

#define R_386_TLS_DESC 41 /* TLS descriptor containing
pointer to code and to
argument, returning the TLS
offset for the symbol. */

#define R_386_IRELATIVE 42 /* Adjust indirectly by program base */
#define R_386_GOT32X 43 /* Load from 32 bit GOT entry,
relaxable. */

/* Keep this the last entry. */
#define R_386_NUM 44

The x64 relocation types are defined as follows:

/* AMD x86-64 relocations.  */
#define R_X86_64_NONE 0 /* No reloc */
#define R_X86_64_64 1 /* Direct 64 bit */
#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
#define R_X86_64_PLT32 4 /* 32 bit PLT address */
#define R_X86_64_COPY 5 /* Copy symbol at runtime */
#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
#define R_X86_64_RELATIVE 8 /* Adjust by program base */
#define R_X86_64_GOTPCREL 9 /* 32 bit signed PC relative
offset to GOT */

#define R_X86_64_32 10 /* Direct 32 bit zero extended */
#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
#define R_X86_64_16 12 /* Direct 16 bit zero extended */
#define R_X86_64_PC16 13 /* 16 bit sign extended pc relative */
#define R_X86_64_8 14 /* Direct 8 bit sign extended */
#define R_X86_64_PC8 15 /* 8 bit sign extended pc relative */
#define R_X86_64_DTPMOD64 16 /* ID of module containing symbol */
#define R_X86_64_DTPOFF64 17 /* Offset in module's TLS block */
#define R_X86_64_TPOFF64 18 /* Offset in initial TLS block */
#define R_X86_64_TLSGD 19 /* 32 bit signed PC relative offset
to two GOT entries for GD symbol */

#define R_X86_64_TLSLD 20 /* 32 bit signed PC relative offset
to two GOT entries for LD symbol */

#define R_X86_64_DTPOFF32 21 /* Offset in TLS block */
#define R_X86_64_GOTTPOFF 22 /* 32 bit signed PC relative offset
to GOT entry for IE symbol */

#define R_X86_64_TPOFF32 23 /* Offset in initial TLS block */
#define R_X86_64_PC64 24 /* PC relative 64 bit */
#define R_X86_64_GOTOFF64 25 /* 64 bit offset to GOT */
#define R_X86_64_GOTPC32 26 /* 32 bit signed pc relative
offset to GOT */

#define R_X86_64_GOT64 27 /* 64-bit GOT entry offset */
#define R_X86_64_GOTPCREL64 28 /* 64-bit PC relative offset
to GOT entry */

#define R_X86_64_GOTPC64 29 /* 64-bit PC relative offset to GOT */
#define R_X86_64_GOTPLT64 30 /* like GOT64, says PLT entry needed */
#define R_X86_64_PLTOFF64 31 /* 64-bit GOT relative offset
to PLT entry */

#define R_X86_64_SIZE32 32 /* Size of symbol plus 32-bit addend */
#define R_X86_64_SIZE64 33 /* Size of symbol plus 64-bit addend */
#define R_X86_64_GOTPC32_TLSDESC 34 /* GOT offset for TLS descriptor. */
#define R_X86_64_TLSDESC_CALL 35 /* Marker for call through TLS
descriptor. */

#define R_X86_64_TLSDESC 36 /* TLS descriptor. */
#define R_X86_64_IRELATIVE 37 /* Adjust indirectly by program base */
#define R_X86_64_RELATIVE64 38 /* 64-bit adjust by program base */
/* 39 Reserved was R_X86_64_PC32_BND */
/* 40 Reserved was R_X86_64_PLT32_BND */
#define R_X86_64_GOTPCRELX 41 /* Load from 32 bit signed pc relative
offset to GOT entry without REX
prefix, relaxable. */

#define R_X86_64_REX_GOTPCRELX 42 /* Load from 32 bit signed pc relative
offset to GOT entry with REX prefix,
relaxable. */

#define R_X86_64_NUM 43

Printing the relocation table

// Print Relocation Table
const char *getRelocationTypeString32(Elf_Word value) {
switch (value) {
case R_386_NONE: return "R_386_NONE";
case 1: return "R_386_32";
case 2: return "R_386_PC32";
case 3: return "R_386_GOT32";
case 4: return "R_386_PLT32";
case 5: return "R_386_COPY";
case 6: return "R_386_GLOB_DAT";
case 7: return "R_386_JMP_SLOT";
case 8: return "R_386_RELATIVE";
case 9: return "R_386_GOTOFF";
case 10: return "R_386_GOTPC";
case 11: return "R_386_32PLT";
case 14: return "R_386_TLS_TPOFF";
case 15: return "R_386_TLS_IE";
case 16: return "R_386_TLS_GOTIE";
case 17: return "R_386_TLS_LE";
case 18: return "R_386_TLS_GD";
case 19: return "R_386_TLS_LDM";
case 20: return "R_386_16";
case 21: return "R_386_PC16";
case 22: return "R_386_8";
case 23: return "R_386_PC8";
case 24: return "R_386_TLS_GD_32";
case 25: return "R_386_TLS_GD_PUSH";
case 26: return "R_386_TLS_GD_CALL";
case 27: return "R_386_TLS_GD_POP";
case 28: return "R_386_TLS_LDM_32";
case 29: return "R_386_TLS_LDM_PUSH";
case 30: return "R_386_TLS_LDM_CALL";
case 31: return "R_386_TLS_LDM_POP";
case 32: return "R_386_TLS_LDO_32";
case 33: return "R_386_TLS_IE_32";
case 34: return "R_386_TLS_LE_32";
case 35: return "R_386_TLS_DTPMOD32";
case 36: return "R_386_TLS_DTPOFF32";
case 37: return "R_386_TLS_TPOFF32";
case 38: return "R_386_SIZE32";
case 39: return "R_386_TLS_GOTDESC";
case 40: return "R_386_TLS_DESC_CALL";
case 41: return "R_386_TLS_DESC";
case 42: return "R_386_IRELATIVE";
case 43: return "R_386_GOT32X";
default: return "Unknown relocation type";
}
}
void printRelocationTable32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer,const char* pSectionHeaderStringTable) {
printf("Relocation Tables:\n");
for (int i = 0; i < sectionNum; i++) {
if (pSectionHeader[i].sh_type == SHT_REL) {
Elf32_Shdr *pRelocationTableHeader = &pSectionHeader[i];
Elf32_Rel *pRelocationTable = (Elf32_Rel *) (pFileBuffer + pRelocationTableHeader->sh_offset);
Elf32_Word relocItemNum = pRelocationTableHeader->sh_size / pRelocationTableHeader->sh_entsize;
// relocation table sh_link is index of symbol table header
Elf32_Shdr *pSymbolTableHeader = (Elf32_Shdr *) &pSectionHeader[pSectionHeader[i].sh_link];
//real symbol table
Elf32_Sym *pSymbolTable = (Elf32_Sym *) (pFileBuffer + pSymbolTableHeader->sh_offset);
//string table for symbol name
char *pSymbolTableStringTable = (char *) pFileBuffer + pSectionHeader[pSymbolTableHeader->sh_link].sh_offset;

printf("Relocation Section '%s' at offset %#x contains %u entries\n", pSectionHeaderStringTable + pSectionHeader[i].sh_name, pRelocationTableHeader->sh_offset, relocItemNum);
printf("\tOffset\t\tInfo\t\tType\t\t\t\tSym.value\t\tSym.name\n");
for (int j = 0; j < relocItemNum; j++) {
printf("\t%08x", pRelocationTable[j].r_offset);
printf("\t%08x", pRelocationTable[j].r_info);
printf("\t%s\t", getRelocationTypeString32(ELF32_R_TYPE(pRelocationTable[j].r_info)));
printf("\t%08x\t", pSymbolTable[ELF32_R_SYM(pRelocationTable[j].r_info)].st_value);
//R_SYM get the index of symbol in symbol table, st_name is index of symbol name in string table
printf("\t%s", &pSymbolTableStringTable[pSymbolTable[ELF32_R_SYM(pRelocationTable[j].r_info)].st_name]);
printf("\n");
}
}
}
}
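The printing code relies on ELF32_R_SYM and ELF32_R_TYPE to split r_info. As a minimal sketch of what those macros compute, the helpers below do the same split by hand. The names rel32_sym, rel32_type, rel64_sym and rel64_type are hypothetical and are not part of elf.h.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring ELF32_R_SYM / ELF32_R_TYPE:
 * the symbol table index lives in the upper 24 bits, the type in the low 8. */
static uint32_t rel32_sym(uint32_t r_info)  { return r_info >> 8; }
static uint32_t rel32_type(uint32_t r_info) { return r_info & 0xff; }

/* Hypothetical helpers mirroring ELF64_R_SYM / ELF64_R_TYPE:
 * the symbol index is the upper 32 bits, the type the lower 32 bits. */
static uint32_t rel64_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }
static uint32_t rel64_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffffu); }
```

For instance, an Info value of 0x00000206 decodes to symbol index 2 and type 6 (R_386_GLOB_DAT), and 0x00000008 decodes to symbol index 0 and type 8 (R_386_RELATIVE).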

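Fixing up relocations

r_offset gives the address to be patched. It is an RVA, so the patch location is the load base plus r_offset, and the value stored there must have the ELF load base added to it.

For example, readelf prints the following relocation tables: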

Relocation section '.rel.dyn' at offset 0x384 contains 8 entries:
Offset Info Type Sym.Value Sym. Name
00003ee8 00000008 R_386_RELATIVE
00003eec 00000008 R_386_RELATIVE
00003fec 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003fe0 00000206 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003fe4 00000306 R_386_GLOB_DAT 00000000 __cxa_finalize@GLIBC_2.1.3
00003fe8 00000506 R_386_GLOB_DAT 00000000 __gmon_start__
00003ff0 00000606 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]

Relocation section '.rel.plt' at offset 0x3c4 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00004000 00000107 R_386_JUMP_SLOT 00000000 __libc_start_main@GLIBC_2.34
00004004 00000407 R_386_JUMP_SLOT 00000000 puts@GLIBC_2.0
No processor specific unwind information to decode

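0x3ee8 and 0x3eec lie in the .init_array and .fini_array sections respectively; both are R_386_RELATIVE entries.

0x3fec, 0x3fe0, 0x3fe4, 0x3fe8 and 0x3ff0 are GOT entries. 0x3fec (main_ptr) is of type RELATIVE; the others are of type GLOB_DAT.

In IDA these entries appear filled with the addresses of functions in a virtual extern segment, which does not actually exist in memory.

0x4000 and 0x4004 are PLT slots (in .got.plt), both of type JUMP_SLOT; 0x400c is dso_handle, of type RELATIVE.

The .got.plt entries are likewise filled with external function addresses in the virtual extern segment.

IDA appends the extern segment at the end of the ELF file automatically (it exists only for analysis, not in memory).

To summarize, relocation falls into two cases:

1. RELATIVE: read the value stored at the address being relocated and add the ELF load base to it.

2. GLOB_DAT / JUMP_SLOT: load the required shared libraries and write the external function's address into the slot.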
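The two cases can be sketched on a plain byte buffer. This is only an illustration under stated assumptions: the function names apply_relative32 and apply_symbol32 are hypothetical, the image is treated as an in-memory copy of a 32-bit ELF, and the symbol address is assumed to have been resolved already.

```c
#include <stdint.h>
#include <string.h>

/* Case 1 (R_386_RELATIVE): the word at image + r_offset holds a link-time
 * address; add the load base to it. memcpy avoids unaligned access. */
static void apply_relative32(uint8_t *image, uint32_t r_offset, uint32_t load_base)
{
    uint32_t value;
    memcpy(&value, image + r_offset, sizeof value);  /* read the stored value */
    value += load_base;                              /* rebase it */
    memcpy(image + r_offset, &value, sizeof value);  /* write it back */
}

/* Case 2 (R_386_GLOB_DAT / R_386_JMP_SLOT): overwrite the slot with the
 * already resolved address of the external symbol. */
static void apply_symbol32(uint8_t *image, uint32_t r_offset, uint32_t symbol_addr)
{
    memcpy(image + r_offset, &symbol_addr, sizeof symbol_addr);
}
```

With a load base of 0x56550000, a slot that holds 0x1234 becomes 0x56551234 after apply_relative32, while apply_symbol32 simply replaces whatever the slot held.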


Dynamic Segment

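If an object file takes part in dynamic linking, it must contain a program header entry of type PT_DYNAMIC; the corresponding section is named .dynamic (type=SHT_DYNAMIC).

The dynamic segment provides the information the dynamic linker needs, such as which shared libraries the file depends on and where the dynamic symbol table and the dynamic relocation tables are located.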

/* Dynamic section entry.  */
typedef struct
{
Elf32_Sword d_tag; /* Dynamic entry type */
union
{
Elf32_Word d_val; /* Integer value */
Elf32_Addr d_ptr; /* Address value */
} d_un;
} Elf32_Dyn;

typedef struct
{
Elf64_Sxword d_tag; /* Dynamic entry type */
union
{
Elf64_Xword d_val; /* Integer value */
Elf64_Addr d_ptr; /* Address value */
} d_un;
} Elf64_Dyn;

d_tag

d_tag determines how d_un is interpreted.

The legal d_tag values are defined as follows:

/* Legal values for d_tag (dynamic entry type).  */

#define DT_NULL 0 /* Marks end of dynamic section */
#define DT_NEEDED 1 /* Name of needed library */
#define DT_PLTRELSZ 2 /* Size in bytes of PLT relocs */
#define DT_PLTGOT 3 /* Processor defined value */
#define DT_HASH 4 /* Address of symbol hash table */
#define DT_STRTAB 5 /* Address of string table */
#define DT_SYMTAB 6 /* Address of symbol table */
#define DT_RELA 7 /* Address of Rela relocs */
#define DT_RELASZ 8 /* Total size of Rela relocs */
#define DT_RELAENT 9 /* Size of one Rela reloc */
#define DT_STRSZ 10 /* Size of string table */
#define DT_SYMENT 11 /* Size of one symbol table entry */
#define DT_INIT 12 /* Address of init function */
#define DT_FINI 13 /* Address of termination function */
#define DT_SONAME 14 /* Name of shared object */
#define DT_RPATH 15 /* Library search path (deprecated) */
#define DT_SYMBOLIC 16 /* Start symbol search here */
#define DT_REL 17 /* Address of Rel relocs */
#define DT_RELSZ 18 /* Total size of Rel relocs */
#define DT_RELENT 19 /* Size of one Rel reloc */
#define DT_PLTREL 20 /* Type of reloc in PLT */
#define DT_DEBUG 21 /* For debugging; unspecified */
#define DT_TEXTREL 22 /* Reloc might modify .text */
#define DT_JMPREL 23 /* Address of PLT relocs */
#define DT_BIND_NOW 24 /* Process relocations of object */
#define DT_INIT_ARRAY 25 /* Array with addresses of init fct */
#define DT_FINI_ARRAY 26 /* Array with addresses of fini fct */
#define DT_INIT_ARRAYSZ 27 /* Size in bytes of DT_INIT_ARRAY */
#define DT_FINI_ARRAYSZ 28 /* Size in bytes of DT_FINI_ARRAY */
#define DT_RUNPATH 29 /* Library search path */
#define DT_FLAGS 30 /* Flags for the object being loaded */
#define DT_ENCODING 32 /* Start of encoded range */
#define DT_PREINIT_ARRAY 32 /* Array with addresses of preinit fct*/
#define DT_PREINIT_ARRAYSZ 33 /* size in bytes of DT_PREINIT_ARRAY */
#define DT_SYMTAB_SHNDX 34 /* Address of SYMTAB_SHNDX section */
#define DT_RELRSZ 35 /* Total size of RELR relative relocations */
#define DT_RELR 36 /* Address of RELR relative relocations */
#define DT_RELRENT 37 /* Size of one RELR relative relocaction */
#define DT_NUM 38 /* Number used */
#define DT_LOOS 0x6000000d /* Start of OS-specific */
#define DT_HIOS 0x6ffff000 /* End of OS-specific */
#define DT_LOPROC 0x70000000 /* Start of processor-specific */
#define DT_HIPROC 0x7fffffff /* End of processor-specific */
#define DT_PROCNUM DT_MIPS_NUM /* Most used by any processor */

/* DT_* entries which fall between DT_VALRNGHI & DT_VALRNGLO use the
Dyn.d_un.d_val field of the Elf*_Dyn structure. This follows Sun's
approach. */

#define DT_VALRNGLO 0x6ffffd00
#define DT_GNU_PRELINKED 0x6ffffdf5 /* Prelinking timestamp */
#define DT_GNU_CONFLICTSZ 0x6ffffdf6 /* Size of conflict section */
#define DT_GNU_LIBLISTSZ 0x6ffffdf7 /* Size of library list */
#define DT_CHECKSUM 0x6ffffdf8
#define DT_PLTPADSZ 0x6ffffdf9
#define DT_MOVEENT 0x6ffffdfa
#define DT_MOVESZ 0x6ffffdfb
#define DT_FEATURE_1 0x6ffffdfc /* Feature selection (DTF_*). */
#define DT_POSFLAG_1 0x6ffffdfd /* Flags for DT_* entries, effecting
the following DT_* entry. */

#define DT_SYMINSZ 0x6ffffdfe /* Size of syminfo table (in bytes) */
#define DT_SYMINENT 0x6ffffdff /* Entry size of syminfo */
#define DT_VALRNGHI 0x6ffffdff
#define DT_VALTAGIDX(tag) (DT_VALRNGHI - (tag)) /* Reverse order! */
#define DT_VALNUM 12

/* DT_* entries which fall between DT_ADDRRNGHI & DT_ADDRRNGLO use the
Dyn.d_un.d_ptr field of the Elf*_Dyn structure.

If any adjustment is made to the ELF object after it has been
built these entries will need to be adjusted. */


#define DT_ADDRRNGLO 0x6ffffe00
#define DT_GNU_HASH 0x6ffffef5 /* GNU-style hash table. */
#define DT_TLSDESC_PLT 0x6ffffef6
#define DT_TLSDESC_GOT 0x6ffffef7
#define DT_GNU_CONFLICT 0x6ffffef8 /* Start of conflict section */
#define DT_GNU_LIBLIST 0x6ffffef9 /* Library list */
#define DT_CONFIG 0x6ffffefa /* Configuration information. */
#define DT_DEPAUDIT 0x6ffffefb /* Dependency auditing. */
#define DT_AUDIT 0x6ffffefc /* Object auditing. */
#define DT_PLTPAD 0x6ffffefd /* PLT padding. */
#define DT_MOVETAB 0x6ffffefe /* Move table. */
#define DT_SYMINFO 0x6ffffeff /* Syminfo table. */
#define DT_ADDRRNGHI 0x6ffffeff
#define DT_ADDRTAGIDX(tag) (DT_ADDRRNGHI - (tag)) /* Reverse order! */
#define DT_ADDRNUM 11

/* The versioning entry types. The next are defined as part of the GNU extension. */
#define DT_VERSYM 0x6ffffff0

#define DT_RELACOUNT 0x6ffffff9
#define DT_RELCOUNT 0x6ffffffa

/* These were chosen by Sun. */
#define DT_FLAGS_1 0x6ffffffb /* State flags, see DF_1_* below. */
#define DT_VERDEF 0x6ffffffc /* Address of version definition table */
#define DT_VERDEFNUM 0x6ffffffd /* Number of version definitions */
#define DT_VERNEED 0x6ffffffe /* Address of table with needed versions */
#define DT_VERNEEDNUM 0x6fffffff /* Number of needed versions */
#define DT_VERSIONTAGIDX(tag) (DT_VERNEEDNUM - (tag)) /* Reverse order! */
#define DT_VERSIONTAGNUM 16

/* Sun added these machine-independent extensions in the "processor-specific"
range. Be compatible. */

#define DT_AUXILIARY 0x7ffffffd /* Shared object to load before self */
#define DT_FILTER 0x7fffffff /* Shared object to get values from */
#define DT_EXTRATAGIDX(tag) ((Elf32_Word)-((Elf32_Sword) (tag) <<1>>1)-1)
#define DT_EXTRANUM 3

DT_NEEDED

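Entries with this tag name the shared libraries the ELF file depends on. Interpreted as d_val, the entry yields an index (a byte offset).

Looking that offset up in .dynstr gives the library name.

The sh_link field of the .dynamic section header holds the section index of the dynamic string table.

Alternatively, the entry with d_tag==DT_STRTAB gives the location of .dynstr through d_ptr. That value is a virtual address; it coincides with the file offset only when .dynstr lies in a segment whose p_vaddr equals its p_offset, which is typical for the first PT_LOAD segment of a PIE or shared object.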
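As a minimal sketch of the DT_NEEDED lookup described above, the function below walks a dynamic array until DT_NULL and resolves each DT_NEEDED entry against a string table. The function name collect_needed32 and its parameters are hypothetical; the types and tags come from the system elf.h.

```c
#include <elf.h>
#include <stddef.h>

/* Store up to max library names in out and return how many DT_NEEDED
 * entries were seen. dyn must be terminated by a DT_NULL entry;
 * strtab points at the dynamic string table (.dynstr). */
static size_t collect_needed32(const Elf32_Dyn *dyn, const char *strtab,
                               const char **out, size_t max)
{
    size_t count = 0;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag != DT_NEEDED)
            continue;
        if (count < max)
            out[count] = strtab + dyn->d_un.d_val;  /* d_val is an offset into .dynstr */
        count++;
    }
    return count;
}
```

Stopping at DT_NULL means the caller does not need the section size, so the same loop works when only the PT_DYNAMIC segment is available.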

d_un

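d_val holds an integer value.

d_ptr holds a virtual address in the process address space.

The interpretation rules are as follows:

Name          Value        d_un         Executable   Shared object
DT_NULL       0            ignored      mandatory    mandatory
DT_NEEDED     1            d_val        optional     optional
DT_PLTRELSZ   2            d_val        optional     optional
DT_PLTGOT     3            d_ptr        optional     optional
DT_HASH       4            d_ptr        mandatory    mandatory
DT_STRTAB     5            d_ptr        mandatory    mandatory
DT_SYMTAB     6            d_ptr        mandatory    mandatory
DT_RELA       7            d_ptr        mandatory    optional
DT_RELASZ     8            d_val        mandatory    optional
DT_RELAENT    9            d_val        mandatory    optional
DT_STRSZ      10           d_val        mandatory    mandatory
DT_SYMENT     11           d_val        mandatory    mandatory
DT_INIT       12           d_ptr        optional     optional
DT_FINI       13           d_ptr        optional     optional
DT_SONAME     14           d_val        ignored      optional
DT_RPATH      15           d_val        optional     ignored
DT_SYMBOLIC   16           ignored      ignored      optional
DT_REL        17           d_ptr        mandatory    optional
DT_RELSZ      18           d_val        mandatory    optional
DT_RELENT     19           d_val        mandatory    optional
DT_PLTREL     20           d_val        optional     optional
DT_DEBUG      21           d_ptr        optional     ignored
DT_TEXTREL    22           ignored      optional     optional
DT_JMPREL     23           d_ptr        optional     optional
DT_BIND_NOW   24           ignored      optional     optional
DT_LOPROC     0x70000000   unspecified  unspecified  unspecified
DT_HIPROC     0x7fffffff   unspecified  unspecified  unspecified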

Printing the dynamic segment

// Print Dynamic Segment
#define DT_VAL 0
#define DT_PTR 1
const char *getDynamicType(Elf_Xword value) {
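// DT_AUXILIARY and DT_FILTER lie inside the processor-specific range, so test them first
if (value == DT_AUXILIARY)
return "AUXILIARY";
if (value == DT_FILTER)
return "FILTER";
if (value >= DT_LOOS && value <= DT_HIOS)
return "OS-Specific";
if (value >= DT_LOPROC && value <= DT_HIPROC)
return "Processor-Specific";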
switch (value) {
case DT_NULL: return "NULL";
case DT_NEEDED: return "NEEDED";
case DT_PLTRELSZ: return "PLTRELSZ";
case DT_PLTGOT: return "PLTGOT";
case DT_HASH: return "HASH";
case DT_STRTAB: return "STRTAB";
case DT_SYMTAB: return "SYMTAB";
case DT_RELA: return "RELA";
case DT_RELASZ: return "RELASZ";
case DT_RELAENT: return "RELAENT";
case DT_STRSZ: return "STRSZ";
case DT_SYMENT: return "SYMENT";
case DT_INIT: return "INIT";
case DT_FINI: return "FINI";
case DT_SONAME: return "SONAME";
case DT_RPATH: return "RPATH";
case DT_SYMBOLIC: return "SYMBOLIC";
case DT_REL: return "REL";
case DT_RELSZ: return "RELSZ";
case DT_RELENT: return "RELENT";
case DT_PLTREL: return "PLTREL";
case DT_DEBUG: return "DEBUG";
case DT_TEXTREL: return "TEXTREL";
case DT_JMPREL: return "JMPREL";
case DT_BIND_NOW: return "BIND_NOW";
case DT_INIT_ARRAY: return "INIT_ARRAY";
case DT_FINI_ARRAY: return "FINI_ARRAY";
case DT_INIT_ARRAYSZ: return "INIT_ARRAYSZ";
case DT_FINI_ARRAYSZ: return "FINI_ARRAYSZ";
case DT_RUNPATH: return "RUNPATH";
case DT_FLAGS: return "FLAGS";
case DT_PREINIT_ARRAY: return "PREINIT_ARRAY"; // same value (32) as DT_ENCODING
case DT_PREINIT_ARRAYSZ: return "PREINIT_ARRAYSZ";
case DT_SYMTAB_SHNDX: return "SYMTAB_SHNDX";
case DT_RELRSZ: return "RELRSZ";
case DT_RELR: return "RELR";
case DT_RELRENT: return "RELRENT";
case DT_NUM: return "NUM";
case DT_VALRNGLO: return "VALRNGLO";
case DT_GNU_PRELINKED: return "GNU_PRELINKED";
case DT_GNU_CONFLICTSZ: return "GNU_CONFLICTSZ";
case DT_GNU_LIBLISTSZ: return "GNU_LIBLISTSZ";
case DT_CHECKSUM: return "CHECKSUM";
case DT_PLTPADSZ: return "PLTPADSZ";
case DT_MOVEENT: return "MOVEENT";
case DT_MOVESZ: return "MOVESZ";
case DT_FEATURE_1: return "FEATURE_1";
case DT_POSFLAG_1: return "POSFLAG_1";
case DT_SYMINSZ: return "SYMINSZ";
case DT_SYMINENT: return "SYMINENT";
case DT_ADDRRNGLO: return "ADDRRNGLO";
case DT_GNU_HASH: return "GNU_HASH";
case DT_TLSDESC_PLT: return "TLSDESC_PLT";
case DT_TLSDESC_GOT: return "TLSDESC_GOT";
case DT_GNU_CONFLICT: return "GNU_CONFLICT";
case DT_GNU_LIBLIST: return "GNU_LIBLIST";
case DT_CONFIG: return "CONFIG";
case DT_DEPAUDIT: return "DEPAUDIT";
case DT_AUDIT: return "AUDIT";
case DT_PLTPAD: return "PLTPAD";
case DT_MOVETAB: return "MOVETAB";
case DT_SYMINFO: return "SYMINFO";
case DT_VERSYM: return "VERSYM";
case DT_RELACOUNT: return "RELACOUNT";
case DT_RELCOUNT: return "RELCOUNT";
case DT_FLAGS_1: return "FLAGS_1";
case DT_VERDEF: return "VERDEF";
case DT_VERDEFNUM: return "VERDEFNUM";
case DT_VERNEED: return "VERNEED";
case DT_VERNEEDNUM: return "VERNEEDNUM";
case DT_AUXILIARY: return "AUXILIARY";
case DT_FILTER: return "FILTER";
default: return "Unknown Type";
}
}
uint32_t getDynamicDunType(Elf_Xword value) {
switch (value) {
case DT_NULL:
case DT_NEEDED:
case DT_PLTRELSZ:
case DT_RELASZ:
case DT_RELAENT:
case DT_STRSZ:
case DT_SYMENT:
case DT_SONAME:
case DT_RPATH:
case DT_SYMBOLIC:
case DT_RELSZ:
case DT_RELENT:
case DT_PLTREL:
case DT_TEXTREL:
case DT_BIND_NOW:
case DT_LOPROC:
case DT_HIPROC:
return DT_VAL;
case DT_PLTGOT:
case DT_HASH:
case DT_STRTAB:
case DT_SYMTAB:
case DT_RELA:
case DT_INIT:
case DT_FINI:
case DT_JMPREL:
case DT_DEBUG:
case DT_REL:
return DT_PTR;
default:
return DT_VAL;
}
}
void printDynamicSegment32(const Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer) {
for (int i = 0; i < sectionNum; i++) {
if (pSectionHeader[i].sh_type == SHT_DYNAMIC) {
Elf32_Shdr *pDynamicSection = &pSectionHeader[i];
Elf32_Word dynamicItemNum = pDynamicSection->sh_size / pDynamicSection->sh_entsize;
printf("Dynamic Section At File Offset %#x Contains %d Entries:\n", pDynamicSection->sh_offset,dynamicItemNum);
printf("\tTag \t\tType\t\t\t\tName/Value\n");
Elf32_Dyn *pDynamicTable = (Elf32_Dyn *) (pFileBuffer + pDynamicSection->sh_offset);
Elf32_Shdr *pDynamicStringTableHeader = &pSectionHeader[pDynamicSection->sh_link];
// dynamic string table
char *pDynamicStringTable = (char *) pFileBuffer + pDynamicStringTableHeader->sh_offset;
for (int j = 0; j < dynamicItemNum; j++) {
printf("\t%08x", pDynamicTable[j].d_tag);
printf("\t%-16s", getDynamicType(pDynamicTable[j].d_tag));
printf("\t%08x\t", pDynamicTable[j].d_un.d_val);
if (getDynamicDunType(pDynamicTable[j].d_tag) == DT_PTR) //Some special item is ptr
printf("(PTR)");
//Index of shared library path in dynamic string table
switch (pDynamicTable[j].d_tag) {
case DT_NEEDED: printf("[%s]", pDynamicStringTable + pDynamicTable[j].d_un.d_val);
break;
case DT_SONAME: printf("[%s]", pDynamicStringTable + pDynamicTable[j].d_un.d_val);
break;
default: ;
}
printf("\n");
}
}
}
}


Hash Table(Export Table)

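A hash table is used to look up exported functions. There are two kinds; current ELF files mainly use the GNU hash table as the export table.

.hash 		// old style, can look up both imported and exported functions, DT_HASH
.gnu.hash // new style, can look up exported functions only, DT_GNU_HASH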

ELF Hash

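The hash table is laid out as follows (a conceptual layout, since the two arrays simply follow each other in memory):

struct ELFHash {
uint32_t nbucket; // number of buckets
uint32_t nchain; // number of chain entries, equal to the number of symbols in the dynamic symbol table
uint32_t buckets[]; // array of nbucket entries
uint32_t chains[]; // array of nchain entries
};

The original Linux ELF hash algorithm is: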

uint32_t elf_hash(const unsigned char* name)
{
uint32_t h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}

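An ELF hash table resolves a symbol name to a symbol address as follows:

1. Compute the hash of the symbol name with elf_hash.

2. index=buckets[hash%nbucket]

3. If index==STN_UNDEF(0), the symbol is not present; stop.

4. Compare the name of the symbol at index with the wanted name. If they match, the symbol is found; otherwise take the next symbol index from chains[index] and go back to step 3.

In code: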

uint32_t findSymbolIndexByElfHash(const char* symbolName, 
uint32_t* pHashTable,
Elf32_Sym* pSymbolTable,
const char* pSymbolStringTable)

{
uint32_t nbucket=pHashTable[0],nchain=pHashTable[1];
uint32_t* buckets=&pHashTable[2],*chains=&pHashTable[2+nbucket];
uint32_t hash = elf_hash((const unsigned char *)symbolName); // elf_hash takes unsigned char
for (uint32_t index=buckets[hash % nbucket]; index; index = chains[index]) {
if (strcmp(symbolName, &pSymbolStringTable[pSymbolTable[index].st_name]) == 0) {
return index;
}
}
return 0;
}
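To see the bucket and chain mechanics end to end, the sketch below builds a small SysV hash table in memory and then looks names up in it. It is a simplification under stated assumptions: an array of names stands in for the symbol table plus string table, and build_sysv_hash and lookup_sysv_hash are hypothetical helper names. Real linkers may order chains differently; the lookup does not depend on that order.

```c
#include <stdint.h>
#include <string.h>

static uint32_t elf_hash(const char *s)
{
    const unsigned char *name = (const unsigned char *)s;
    uint32_t h = 0, g;
    while (*name) {
        h = (h << 4) + *name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* table layout: [nbucket, nchain, buckets[nbucket], chains[nsym]].
 * names[0] is the reserved null symbol and is never inserted. */
static void build_sysv_hash(uint32_t *table, uint32_t nbucket,
                            const char *const *names, uint32_t nsym)
{
    uint32_t *buckets = table + 2, *chains = table + 2 + nbucket;
    table[0] = nbucket;
    table[1] = nsym;
    memset(buckets, 0, (nbucket + nsym) * sizeof *table);
    for (uint32_t i = 1; i < nsym; i++) {
        uint32_t b = elf_hash(names[i]) % nbucket;
        chains[i] = buckets[b];  /* link the previous head behind the new symbol */
        buckets[b] = i;          /* the new symbol becomes the head of the bucket */
    }
}

/* Returns the symbol index, or 0 (STN_UNDEF) when the name is absent. */
static uint32_t lookup_sysv_hash(const uint32_t *table,
                                 const char *const *names, const char *wanted)
{
    uint32_t nbucket = table[0];
    const uint32_t *buckets = table + 2, *chains = table + 2 + nbucket;
    for (uint32_t i = buckets[elf_hash(wanted) % nbucket]; i != 0; i = chains[i])
        if (strcmp(names[i], wanted) == 0)
            return i;
    return 0;
}
```

Because chains is indexed by symbol index, nchain has to equal the number of symbols, which is why the table needs no separate per-bucket length.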

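A manual lookup example:

Because ELF programs built by gcc on x86_64 use only .gnu.hash by default, a 64-bit .so built with the Android NDK serves as the example.

Locate the .hash section; it shows nbucket=nchain=0x36.

Compute the bucket index with elf_hash: index=hash%nbucket=48.

Each bucket entry is 4 bytes and the buckets array starts at 0x960, so the entry is at 0x960+48*4=0xA20.

The value there is dynamic symbol table index 0xE (14), and that symbol table entry is indeed the dlopen function.

Android Elf Hash

Android's elfhash code looks different but is equivalent to the original elf_hash.

See https://cs.android.com/android/platform/superproject/+/android-4.1.2_r2.1:bionic/linker/linker.c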

static unsigned elfhash(const char *_name)
{
const unsigned char *name = (const unsigned char *) _name;
unsigned h = 0, g;

while(*name) {
h = (h << 4) + *name++;
g = h & 0xf0000000;
h ^= g;
h ^= g >> 24;
}
return h;
}

static Elf32_Sym *_elf_lookup(soinfo *si, unsigned hash, const char *name)
{
Elf32_Sym *s;
Elf32_Sym *symtab = si->symtab;
const char *strtab = si->strtab;
unsigned n;

TRACE_TYPE(LOOKUP, "%5d SEARCH %s in %s@0x%08x %08x %d\n", pid,
name, si->name, si->base, hash, hash % si->nbucket);
n = hash % si->nbucket;

for(n = si->bucket[hash % si->nbucket]; n != 0; n = si->chain[n]){
s = symtab + n;
if(strcmp(strtab + s->st_name, name)) continue;

/* only concern ourselves with global and weak symbol definitions */
switch(ELF32_ST_BIND(s->st_info)){
case STB_GLOBAL:
case STB_WEAK:
/* no section == undefined */
if(s->st_shndx == 0) continue;

TRACE_TYPE(LOOKUP, "%5d FOUND %s in %s (%08x) %d\n", pid,
name, si->name, s->st_value, s->st_size);
return s;
}
}

return NULL;
}

Sysv Hash

In the Android source tree, ELF hash also appears under the name SysV hash (in the bundled musl); see https://cs.android.com/android/platform/superproject/+/android14-qpr3-release:external/musl/ldso/dynlink.c

static uint32_t sysv_hash(const char *s0)
{
const unsigned char *s = (void *)s0;
uint_fast32_t h = 0;
while (*s) {
h = 16*h + *s++;
h ^= h>>24 & 0xf0;
}
return h & 0xfffffff;
}
static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso)
{
size_t i;
Sym *syms = dso->syms;
Elf_Symndx *hashtab = dso->hashtab;
char *strings = dso->strings;
for (i=hashtab[2+h%hashtab[0]]; i; i=hashtab[2+hashtab[0]+i]) {
if ((!dso->versym || dso->versym[i] >= 0)
&& (!strcmp(s, strings+syms[i].st_name)))
return syms+i;
}
return 0;
}
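The three hash functions shown in this section are written differently, so a quick way to convince oneself that they agree is to run them side by side. The sketch below restates each variant with uint32_t arithmetic; the names hash_original, hash_android and hash_musl are hypothetical labels, and the musl variant uses uint32_t where the quoted source uses uint_fast32_t.

```c
#include <stdint.h>

/* Original form: fold the top nibble into bits 4..7, then clear it. */
static uint32_t hash_original(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Android 4.1 linker form: h ^= g clears the top nibble instead of h &= ~g. */
static uint32_t hash_android(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        h ^= g;
        h ^= g >> 24;
    }
    return h;
}

/* musl form: the top nibble is left in place and masked off at the end. */
static uint32_t hash_musl(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = 0;
    while (*p) {
        h = 16 * h + *p++;
        h ^= h >> 24 & 0xf0;
    }
    return h & 0xfffffff;
}
```

Strings of seven or more characters, such as syscall, are the interesting inputs, because only then does the top nibble become non-zero and the folding step take effect.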

GNU Hash

The GNU hash table is laid out as follows:

struct GnuHash {  
uint32_t nbucket;
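uint32_t symndx; // only symbols with index>=symndx can be looked up; symbols with index<symndx cannot be found directly through the GNU hash table
uint32_t bloomSize; // bloomSize, bloomShift and blooms are the three pieces of data the bloom filter needs; it quickly tells that a symbol is definitely absent
uint32_t bloomShift; //
ElfW(Addr) blooms[]; // array of bloomSize entries; the element type is uint32_t/uint64_t on 32/64-bit
uint32_t buckets[]; // array of nbucket entries
uint32_t chains[]; // corresponds one-to-one to symbol table indices; the chain size equals the number of exported functions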
};

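Note that GNU hash has no nchain field. How can it be computed?

◆The chains array is preceded by the header fields and the contiguous blooms and buckets arrays, so subtracting the size of those members from the size of the hash table gives the answer

◆32-bit: nchain=GNUHashTable.sh_size/sizeof(uint32_t) - (4+bloomSize+nbucket)

◆64-bit: nchain=GNUHashTable.sh_size/sizeof(uint32_t) - (4+bloomSize*2+nbucket)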
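The two formulas differ only in how many 32-bit words one bloom entry occupies, so they can be folded into one helper. This is a small sketch; the function name gnu_hash_nchain and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* sh_size is the size of the .gnu.hash section in bytes.
 * A bloom entry is one 32-bit word for ELFCLASS32 and two for ELFCLASS64. */
static uint32_t gnu_hash_nchain(size_t sh_size, uint32_t bloom_size,
                                uint32_t nbucket, int is_64bit)
{
    size_t total_words = sh_size / sizeof(uint32_t);
    size_t bloom_words = (size_t)bloom_size * (is_64bit ? 2 : 1);
    /* 4 header words: nbucket, symndx, bloomSize, bloomShift */
    return (uint32_t)(total_words - (4 + bloom_words + nbucket));
}
```

As a worked case, a 32-bit table of 56 bytes with bloomSize=2 and nbucket=3 has 56/4 - (4+2+3) = 5 chain entries; the same counts in a 64-bit file occupy 64 bytes, and 64/4 - (4+2*2+3) is again 5.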

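Figure: lookup in a GNU hash table. Two notes on the figure:

1. The dashed part of the chain table does not exist.

2. Each chain entry stores the hash value of its symbol.

For details, see "ELF 通过 Sysv Hash & Gnu Hash 查找符号的实现及对比" and "ELF解析07_哈希表, 导出表".

The implementation in the Android source tree (musl's dynamic linker, https://cs.android.com/android/platform/superproject/+/android14-qpr3-release:external/musl/ldso/dynlink.c) is essentially as follows: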

uint32_t gnu_hash(const unsigned char* str)
{
uint32_t h = 5381;// 0x1505
while(*str != 0)
{
h += (h<<5) +*str++;// h = h*33 + c, written as h + (h<<5) + c
}
return h;
}

static Sym *gnu_lookup(uint32_t h1, uint32_t *hashtab, struct dso *dso, const char *s)
{
uint32_t nbuckets = hashtab[0];
uint32_t *buckets = hashtab + 4 + hashtab[2]*(sizeof(size_t)/4);
uint32_t i = buckets[h1 % nbuckets];

if (!i) return 0;

uint32_t *hashval = buckets + nbuckets + (i - hashtab[1]);

for (h1 |= 1; ; i++) {
uint32_t h2 = *hashval++;
if ((h1 == (h2|1)) && (!dso->versym || dso->versym[i] >= 0)
&& !strcmp(s, dso->strings + dso->syms[i].st_name))
return dso->syms+i;
if (h2 & 1) break;
}

return 0;
}
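The lookup above starts directly at the buckets and does not show how bloomSize, bloomShift and blooms are used. As a hedged sketch of the usual bloom filter test for a 32-bit ELF, the helpers below set and check the two bits that a hash maps to. The function names are hypothetical, and a 64-bit ELF would use 64-bit bloom words and 64 in place of 32.

```c
#include <stdint.h>

#define BLOOM_WORD_BITS 32u  /* bits per bloom word for ELFCLASS32 */

/* Record hash h in the filter, as a linker would when building the table. */
static void gnu_bloom_add32(uint32_t *blooms, uint32_t bloomSize,
                            uint32_t bloomShift, uint32_t h)
{
    blooms[(h / BLOOM_WORD_BITS) % bloomSize] |=
        (UINT32_C(1) << (h % BLOOM_WORD_BITS)) |
        (UINT32_C(1) << ((h >> bloomShift) % BLOOM_WORD_BITS));
}

/* Returns 0 when the symbol is definitely absent, 1 when it may be present
 * (the bucket and chain lookup is still needed to confirm). */
static int gnu_bloom_may_contain32(const uint32_t *blooms, uint32_t bloomSize,
                                   uint32_t bloomShift, uint32_t h)
{
    uint32_t word = blooms[(h / BLOOM_WORD_BITS) % bloomSize];
    uint32_t mask = (UINT32_C(1) << (h % BLOOM_WORD_BITS)) |
                    (UINT32_C(1) << ((h >> bloomShift) % BLOOM_WORD_BITS));
    return (word & mask) == mask;
}
```

A negative answer lets the loader skip the strcmp loop entirely, which is the main speed advantage of the GNU table when a symbol is searched across many libraries.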

Printing the hash tables

unsigned int elf_hash(const char* _name)
{
const unsigned char* name=(const unsigned char*)_name;
unsigned int h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
void printHashTable32(Elf32_Shdr* pSectionHeader,Elf_Half sectionNum,uint8_t* pFileBuffer,const char* pSectionHeaderStringTable) {
printf("ELF Hash Tables:\n");
for(int i=0;i<sectionNum;i++) {
if(pSectionHeader[i].sh_type==SHT_HASH) {
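//SHT_HASH can look up both imported and exported functions; Linux drops it by default, Android keeps this section
//For SHT_HASH, index=buckets[elfhash(symbolName)%nbucket] is a symbol table index
//index==0 means the symbol does not exist; if the names differ, continue with index=chains[index]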
Elf32_Shdr* pDynamicSymbolTableHeader=&pSectionHeader[pSectionHeader[i].sh_link];
Elf32_Sym* pDynamicSymbolTable=(Elf32_Sym*)(pDynamicSymbolTableHeader->sh_offset+pFileBuffer);
const char* pDynamicSymbolStringTable=(const char*)(pSectionHeader[pDynamicSymbolTableHeader->sh_link].sh_offset+pFileBuffer);
uint32_t* pHashTable=(uint32_t*)(pSectionHeader[i].sh_offset+pFileBuffer);
uint32_t nbucket=pHashTable[0],nchain=pHashTable[1];
uint32_t* buckets=&pHashTable[2];
uint32_t* chains=&pHashTable[2+nbucket];
printf("\tHash Table '%s' contains %d entries\n",&pSectionHeaderStringTable[pSectionHeader[i].sh_name],nchain);
printf("\t\tNum\t\tHash %% Nbucket\t\tIndex\t\t\tValue\t\t\tName\n");
for(uint32_t j=0,count=0;j<nbucket;j++) {
//Walk bucket j: the bucket holds the first symbol index, chains[index] gives the next one, and 0 ends the chain
for(uint32_t index=buckets[j];index!=0;index=chains[index]) {
printf("\t\t%u\t\t%08x\t\t%08x\t\t%08x\t\t%s\n",++count,elf_hash(&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name])%nbucket,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
}
}
}
if(pSectionHeader[i].sh_type==SHT_GNU_HASH) {
//SHT_GNU_HASH can look up exported functions only; it serves as the ELF export table
Elf32_Shdr* pDynamicSymbolTableHeader=&pSectionHeader[pSectionHeader[i].sh_link];
Elf32_Sym* pDynamicSymbolTable=(Elf32_Sym*)(pDynamicSymbolTableHeader->sh_offset+pFileBuffer);
const char* pDynamicSymbolStringTable=(const char*)(pSectionHeader[pDynamicSymbolTableHeader->sh_link].sh_offset+pFileBuffer);
uint32_t* pGNUHashTable=(uint32_t*)(pSectionHeader[i].sh_offset+pFileBuffer);
uint32_t nbucket=pGNUHashTable[0];
uint32_t symndx=pGNUHashTable[1];
uint32_t bloomSize=pGNUHashTable[2];
uint32_t bloomShift=pGNUHashTable[3];
Elf32_Addr* blooms=(Elf32_Addr*)&pGNUHashTable[4];
uint32_t* buckets=pGNUHashTable+4+bloomSize;
uint32_t* chains=buckets+nbucket-symndx;
//The number of chain entries equals the number of exported symbols, but GNU hash has no nchain field, so compute it by hand
uint32_t nchain=pSectionHeader[i].sh_size/sizeof(uint32_t)-(4+bloomSize+nbucket);
printf("\tHash Table '%s' contains %d entries, nbucket: %d, symndx: %#x \n",&pSectionHeaderStringTable[pSectionHeader[i].sh_name],nchain,nbucket,symndx);
printf("\t\tNum\t\tIndex\t\t\tValue\t\t\tName\n");
for(uint32_t j=0,count=0;j<nbucket;j++) {
uint32_t index=buckets[j];
if(index==0) continue; //empty bucket: no symbol hashes here, and chains[0] must not be read
//Symbols of one bucket are contiguous; a chain entry with its lowest bit set marks the last one
for(;;index++) {
printf("\t\t%u\t\t%08x\t\t%08x\t\t%s\n",++count,index,pDynamicSymbolTable[index].st_value,&pDynamicSymbolStringTable[pDynamicSymbolTable[index].st_name]);
if(chains[index]&1) break;
}
}
}
}
}


ELF Loader

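The ELF program headers describe which segments of the ELF file must be mapped into memory. An ELF program is loaded as follows:

1. Read the ELF file into memory as the file buffer.

2. Following the program headers, map the file buffer into the image buffer.

3. Relocate: fix up global variable addresses and external references.

4. Jump to the entry point.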
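Step 2 has to respect page granularity, because page protections apply to whole pages while a segment's p_vaddr is often not page-aligned. The sketch below computes the page-aligned range that covers one segment. It assumes 4 KiB pages, and the names LOADER_PAGE_SIZE, page_floor, page_ceil and segment_page_range are hypothetical.

```c
#include <stdint.h>

#define LOADER_PAGE_SIZE ((uintptr_t)0x1000)  /* assumption: 4 KiB pages */

/* Round an address down / up to a page boundary. */
static uintptr_t page_floor(uintptr_t addr)
{
    return addr & ~(LOADER_PAGE_SIZE - 1);
}

static uintptr_t page_ceil(uintptr_t addr)
{
    return (addr + LOADER_PAGE_SIZE - 1) & ~(LOADER_PAGE_SIZE - 1);
}

/* Page-aligned start and length covering [image_base + p_vaddr,
 * image_base + p_vaddr + p_memsz). */
static void segment_page_range(uintptr_t image_base, uintptr_t p_vaddr,
                               uintptr_t p_memsz,
                               uintptr_t *start, uintptr_t *length)
{
    *start  = page_floor(image_base + p_vaddr);
    *length = page_ceil(image_base + p_vaddr + p_memsz) - *start;
}
```

For an image at 0x10000 and a segment with p_vaddr=0x3ee8 and p_memsz=0x12c, the covering range starts at 0x13000 and is 0x2000 bytes long, that is, two pages rather than one.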

Build loadelf32 and loadelf64 separately to load x86 and x64 ELF files (the header is pulled in by #include and need not be listed):

gcc -m32 main.c LoadELF.c -o loadelf32
gcc -m64 main.c LoadELF.c -o loadelf64

main.c

// LoadELF
#include "LoadELF.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc!= 2) {
printf("Usage: %s <filepath>\n", argv[0]);
return 1;
}
LoadAndExecElf(argv[1]);
return 0;
}

LoadELF.h

#ifndef LOADELF_H
#define LOADELF_H
#include <stddef.h>
#include <stdint.h>
uint8_t* readFileToBytes(const char *fileName,size_t* readSize);
void LoadAndExecElf(const char* filePath);
#endif //LOADELF_H

LoadELF.c

Define the matching macros for the x86 or x64 build:

#include "LoadELF.h"
#include <stdio.h>
#include <elf.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>
#include <sys/mman.h>
#include <link.h>
#ifdef __x86_64__
#define Elf_Ehdr Elf64_Ehdr
#define Elf_Phdr Elf64_Phdr
#define Elf_Shdr Elf64_Shdr
#define Elf_Addr Elf64_Addr
#define Elf_Dyn Elf64_Dyn
#define Elf_Rel Elf64_Rela
#define Elf_Sym Elf64_Sym
#define ELF_R_TYPE ELF64_R_TYPE
#define ELF_R_SYM ELF64_R_SYM
#define DT_REL_ITEM DT_RELA
#define DT_REL_SZ DT_RELASZ
#else
#define Elf_Ehdr Elf32_Ehdr
#define Elf_Phdr Elf32_Phdr
#define Elf_Shdr Elf32_Shdr
#define Elf_Addr Elf32_Addr
#define Elf_Dyn Elf32_Dyn
#define Elf_Rel Elf32_Rel
#define Elf_Sym Elf32_Sym
#define ELF_R_TYPE ELF32_R_TYPE
#define ELF_R_SYM ELF32_R_SYM
#define DT_REL_ITEM DT_REL
#define DT_REL_SZ DT_RELSZ
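#endif
// Elf_Half and Elf_Word are not provided by elf.h; they are 16/32 bits wide in both ELF classes
typedef Elf32_Half Elf_Half;
typedef Elf32_Word Elf_Word;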
uint8_t* readFileToBytes(const char *fileName,size_t* readSize) {
FILE *file = fopen(fileName, "rb");
if (file == NULL) {
printf("Error opening file\n");
return NULL; // fopen failed, so there is nothing to close
}
fseek(file, 0,SEEK_END);
size_t fileSize = ftell(file);
fseek(file, 0,SEEK_SET);
uint8_t *buffer = (uint8_t *) malloc(fileSize);
if (buffer == NULL) {
printf("Error allocating memory\n");
fclose(file);
return NULL;
}
size_t bytesRead = fread(buffer, 1, fileSize, file);
if(bytesRead!=fileSize) {
printf("Read bytes not equal file size!\n");
free(buffer);
fclose(file);
return NULL;
}
fclose(file);
if(readSize)
*readSize=bytesRead;
return buffer;
}
//Round value up to a multiple of alignment
uint64_t alignValue(uint64_t value, uint64_t alignment) {
return value % alignment ? (value / alignment + 1) * alignment : value;
}
size_t getElfMemorySize(Elf_Phdr* pProgramHeader,Elf_Half segmentNum) {
size_t size = 0;
//Walk the program headers backwards: start address plus size of the last PT_LOAD segment, once aligned, is the image size
for (int i = segmentNum - 1; i >= 0; i--) {
if (pProgramHeader[i].p_type == PT_LOAD) {
size = pProgramHeader[i].p_vaddr + pProgramHeader[i].p_memsz;
break;
}
}
return alignValue(size, 0x1000);
}
Elf_Word getDynamicTableValueByType(Elf_Dyn *dynamicTable, size_t dynamicTableSize, int type) {
for (int i = 0; i < dynamicTableSize; i++) {
if (dynamicTable[i].d_tag == type) {
return dynamicTable[i].d_un.d_val;
}
}
return 0;
}

const char** getNeededLibraryPath(uint8_t* pElfBuffer,Elf_Dyn *pDynamicTable, size_t dynamicTableSize,size_t* neededLibraryNum) {
//Traverse dynamic segment find needed library
char** buffer = NULL;
int num=0;
char* pImageStringTable=(char*)pElfBuffer+getDynamicTableValueByType(pDynamicTable,dynamicTableSize,DT_STRTAB);
for (int i = 0; i < dynamicTableSize; i++) {
if (pDynamicTable[i].d_tag == DT_NEEDED) {
num++;
buffer=(char**)realloc(buffer,num*sizeof(char*));
if(buffer==NULL) {
printf("Error reallocating memory\n");
exit(-1);
}
buffer[num-1]=pImageStringTable+ pDynamicTable[i].d_un.d_val;
}
}
*neededLibraryNum=num;
return (const char**)buffer;
}

Elf_Addr getSymbolAddress(const char** neededLibrary, size_t neededLibraryNum, const char *symbolName) {
//Load needed dynamic libraries,and traverse libraries, get symbol address
for (int i = 0; i < neededLibraryNum; i++) {
void *handle = dlopen(neededLibrary[i],RTLD_NOW);
if (handle == NULL) {
printf("Error opening library %s\n", dlerror());
exit(1);
}
void *address = dlsym(handle, symbolName);
if (address == NULL) {
continue;
}
return (Elf_Addr)address;
}
printf("Can't find address of symbol: %s\n",symbolName);
return 0;
}

void mapSegmentToMemory(uint8_t* pImageBuffer,uint8_t* pFileBuffer,Elf_Phdr* pProgramHeader,Elf_Half segmentNum) {
for (int i = 0; i < segmentNum; i++) {
if (pProgramHeader[i].p_type == PT_LOAD) {
uint8_t *pImageAddr = pImageBuffer + pProgramHeader[i].p_vaddr;//map according to the segment's memory address and size
size_t memorySize = pProgramHeader[i].p_memsz;
Elf_Word segmentFlags = pProgramHeader[i].p_flags;
int protection = 0;
memcpy(pImageAddr, pFileBuffer + pProgramHeader[i].p_offset, pProgramHeader[i].p_filesz);
if (segmentFlags & PF_R) {
protection |= PROT_READ;
}
if (segmentFlags & PF_W) {
protection |= PROT_WRITE;
}
if (segmentFlags & PF_X) {
protection |= PROT_EXEC;
}
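//Set page protection; mprotect needs a page-aligned start address, so round the segment start down to its page
uint8_t *pPageStart = (uint8_t *)((uintptr_t)pImageAddr & ~(uintptr_t)0xfff);
mprotect(pPageStart, alignValue((uint64_t)(pImageAddr - pPageStart) + memorySize, 0x1000), protection);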
}
}
}
void fixRelocationItem(Elf_Rel* pRelocationTable, Elf_Word relocationItemNum, uint8_t* pImageBuffer, const char* pDynamicStringTable, Elf_Sym* pDynamicSymbolTable, const char** neededLibrary, size_t neededLibraryNum) {
    Elf_Addr* fixItem = NULL; // The slot to patch is 4 or 8 bytes wide, depending on the ELF class
    Elf_Addr baseAddr = (Elf_Addr)pImageBuffer;
    for (Elf_Word i = 0; i < relocationItemNum; i++) {
        switch (ELF_R_TYPE(pRelocationTable[i].r_info)) {
            // Relocate base address
            case R_386_RELATIVE:
                fixItem = (Elf_Addr*)(pImageBuffer + pRelocationTable[i].r_offset);
                *fixItem += baseAddr;
                break;
            // Fix GOT and PLT
            case R_386_GLOB_DAT:
            case R_386_JMP_SLOT: {
                // Get symbol name and real address; st_name of a symbol table entry is an index into the string table
                const char* symbolName = &pDynamicStringTable[pDynamicSymbolTable[ELF_R_SYM(pRelocationTable[i].r_info)].st_name];
                fixItem = (Elf_Addr*)(pImageBuffer + pRelocationTable[i].r_offset);
                Elf_Addr symbolAddr = getSymbolAddress(neededLibrary, neededLibraryNum, symbolName);
                *fixItem = symbolAddr;
                break;
            }
            default:
                break;
        }
    }
}
void LoadAndExecElf(const char* filePath) {
    //1. Read file to memory buffer
    size_t readFileSize = 0;
    uint8_t* pFileBuffer = readFileToBytes(filePath, &readFileSize);
    if (pFileBuffer == NULL) {
        printf("Error reading file\n");
        return;
    }
    Elf_Ehdr* pElfHeader = (Elf_Ehdr*)pFileBuffer;
    Elf_Phdr *pProgramHeader = (Elf_Phdr*)(pFileBuffer + pElfHeader->e_phoff);
    Elf_Half segmentNum = pElfHeader->e_phnum;
    uint8_t* pImageBuffer = NULL;

    //2. Mapping file buffer to image buffer
    size_t elfMemorySize = getElfMemorySize(pProgramHeader, segmentNum);
    if (elfMemorySize == 0) {
        printf("ELF memory size is 0!\n");
        return;
    }
    // Allocate page-aligned memory; posix_memalign reports failure through its return value
    if (posix_memalign((void**)&pImageBuffer, 0x1000, elfMemorySize) != 0 || pImageBuffer == NULL) {
        printf("Error allocating memory\n");
        return;
    }
    memset(pImageBuffer, 0, elfMemorySize);
    // Mapping segments to memory and set protection
    mapSegmentToMemory(pImageBuffer, pFileBuffer, pProgramHeader, segmentNum);

    //3. Relocate
    Elf_Phdr *pDynamicTableHeader = NULL;
    Elf_Dyn *pDynamicTable = NULL;
    for (int i = 0; i < segmentNum; i++) {
        if (pProgramHeader[i].p_type == PT_DYNAMIC) {
            pDynamicTableHeader = &pProgramHeader[i];
            break;
        }
    }
    if (pDynamicTableHeader == NULL) {
        // Without a PT_DYNAMIC segment there is nothing to relocate with this loader
        printf("No PT_DYNAMIC segment found\n");
        return;
    }
    pDynamicTable = (Elf_Dyn *)(pImageBuffer + pDynamicTableHeader->p_vaddr);
    size_t dynamicItemNum = pDynamicTableHeader->p_filesz / sizeof(Elf_Dyn);
    Elf_Rel *pRelocationTable = NULL;
    size_t relocationItemNum = 0;
    Elf_Rel *pJmpRelocationTable = NULL;
    size_t jmpRelocationItemNum = 0;
    Elf_Sym *pDynamicSymbolTable = NULL;
    char *pDynamicStringTable = NULL;
    // Collect the relocation tables, symbol table and string table from the dynamic segment
    for (size_t i = 0; i < dynamicItemNum; i++) {
        switch (pDynamicTable[i].d_tag) {
            case DT_REL_ITEM: pRelocationTable = (Elf_Rel*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            case DT_JMPREL: pJmpRelocationTable = (Elf_Rel*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            case DT_REL_SZ: relocationItemNum = pDynamicTable[i].d_un.d_val / sizeof(Elf_Rel); break;
            case DT_PLTRELSZ: jmpRelocationItemNum = pDynamicTable[i].d_un.d_val / sizeof(Elf_Rel); break;
            case DT_SYMTAB: pDynamicSymbolTable = (Elf_Sym*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            case DT_STRTAB: pDynamicStringTable = (char*)(pImageBuffer + pDynamicTable[i].d_un.d_val); break;
            default: break;
        }
    }

    size_t neededLibraryNum = 0;
    const char** neededLibrary = getNeededLibraryPath(pImageBuffer, pDynamicTable, dynamicItemNum, &neededLibraryNum);
    fixRelocationItem(pRelocationTable, relocationItemNum, pImageBuffer, pDynamicStringTable, pDynamicSymbolTable, neededLibrary, neededLibraryNum);
    fixRelocationItem(pJmpRelocationTable, jmpRelocationItemNum, pImageBuffer, pDynamicStringTable, pDynamicSymbolTable, neededLibrary, neededLibraryNum);

    //4. Jump to entry point
    typedef void (*VoidFunctionPtr)();
    uint8_t* pEntry = pImageBuffer + pElfHeader->e_entry;
    VoidFunctionPtr entry = (VoidFunctionPtr)pEntry;
    printf("Load ELF success! Jump to entry point: %p\n", (void*)pEntry);
    entry();
    printf("Come back\n");
}

The result is shown below:

[Figure: screenshot of the loader output]
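
One detail of the mapping step is worth spelling out. mprotect only accepts a page-aligned start address, while the p_vaddr of a PT_LOAD segment is often not page-aligned, so the protected range generally has to be widened to whole pages. The block below is a minimal sketch of that calculation and of the PF_* to PROT_* translation, not the loader's own code: segment_flags_to_prot, page_range and PageRange are hypothetical names introduced here for illustration, and the PF_* constants are assumed to come from the <elf.h> header of a Linux toolchain.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: translate ELF segment flags (PF_*) into mprotect bits (PROT_*). */
int segment_flags_to_prot(uint32_t p_flags) {
    int prot = PROT_NONE;
    if (p_flags & PF_R) prot |= PROT_READ;
    if (p_flags & PF_W) prot |= PROT_WRITE;
    if (p_flags & PF_X) prot |= PROT_EXEC;
    return prot;
}

/* Hypothetical type: a page-aligned address range suitable for mprotect. */
typedef struct {
    uintptr_t start;  /* rounded down to a page boundary */
    size_t length;    /* multiple of the page size covering the whole segment */
} PageRange;

/* Hypothetical helper: widen [addr, addr + size) to whole pages.
 * page_size must be a non-zero power of two (for example 0x1000). */
PageRange page_range(uintptr_t addr, size_t size, size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t mask = ~(uintptr_t)(page_size - 1);
    PageRange range;
    range.start = addr & mask;                                  /* round start down */
    uintptr_t end = (addr + size + page_size - 1) & mask;       /* round end up */
    range.length = (size_t)(end - range.start);
    return range;
}
```

For example, a segment at virtual address 0x3ef8 with p_memsz 0x130 and 4 KiB pages ends at 0x4028, so page_range returns start 0x3000 and length 0x2000: the segment spills into a second page and both pages need the protection.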

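Kanxue ID: 东方玻璃

https://bbs.kanxue.com/user-home-968342.htm

*This article is a featured post from the Kanxue forum, originally written by 东方玻璃. Please credit the Kanxue community when reposting.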

Source: https://mp.weixin.qq.com/s?__biz=MjM5NTc2MDYxMw==&mid=2458587912&idx=1&sn=4ad15eeb82b8a9aa74549d38c434f8bc&chksm=b18c238286fbaa946e5eb7029be4c82277cba5b76f615a477adb6facc0f2e2e5102db924d3d4&scene=58&subscene=0#rd