Relative relocations and RELR
2021-10-31 16:00:00 Author: maskray.me

An ELF linker performs the following steps to process an absolute relocation type whose width equals the word size (e.g. R_AARCH64_ABS64, R_X86_64_64).

if (undefined_weak || (!preemptible && (no_pie || is_shn_abs)))
  link-time constant
else if (SHF_WRITE || znotext) {
  if (preemptible)
    emit a symbolic relocation (e.g. R_X86_64_64)
  else
    emit a relative relocation (e.g. R_X86_64_RELATIVE)
} else if (!shared && (copy_relocation || canonical_plt_entry)) {
  ...
} else {
  error
}
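
The decision tree above can be written as a small C function. This is only a sketch: the struct, its field names, and the enum are hypothetical names chosen for illustration, not any linker's actual data structures, and the elided `...` branch is collapsed into a single return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags describing one absolute, word-sized relocation. */
struct reloc_ctx {
  bool undefined_weak;        /* target is an undefined weak symbol */
  bool preemptible;           /* target may be interposed at run time */
  bool no_pie;                /* linking a position-dependent executable */
  bool is_shn_abs;            /* target is an absolute (SHN_ABS) symbol */
  bool writable;              /* relocated section has SHF_WRITE */
  bool znotext;               /* -z notext: text relocations are allowed */
  bool shared;                /* linking with -shared */
  bool copy_or_canonical_plt; /* copy relocation or canonical PLT entry applies */
};

enum reloc_action {
  LINK_TIME_CONSTANT,
  SYMBOLIC_RELOC,
  RELATIVE_RELOC,
  COPY_OR_PLT,
  LINK_ERROR
};

/* Mirrors the branches of the pseudocode, top to bottom. */
static enum reloc_action classify(const struct reloc_ctx *c) {
  assert(c != NULL);
  if (c->undefined_weak || (!c->preemptible && (c->no_pie || c->is_shn_abs)))
    return LINK_TIME_CONSTANT;
  if (c->writable || c->znotext)
    return c->preemptible ? SYMBOLIC_RELOC : RELATIVE_RELOC;
  if (!c->shared && c->copy_or_canonical_plt)
    return COPY_OR_PLT;
  return LINK_ERROR;
}
```

With every flag cleared except `writable`, the function returns RELATIVE_RELOC, which is the -pie/-shared case this article is about.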

In -pie or -shared mode, the linker produces an R_*_RELATIVE dynamic relocation if the symbol is non-preemptible. This kind of dynamic relocation is called a relative relocation.

.section .meta.foo,"a",@progbits
.quad .text.foo-. # link-time constant

# Without 'w', text relocation.
.section .meta.foo,"aw",@progbits
.quad .text.foo # R_*_RELATIVE dynamic relocation if -pie or -shared

The other source of R_*_RELATIVE relocations is GOT entries. See All about Global Offset Table.

if (preemptible)
  emit an R_*_GLOB_DAT
else if (!pic || is_shn_abs)
  link-time constant
else
  emit a relative relocation

Representation

ELF has two relocation formats, REL and RELA. 64-bit capability and RELA are improvements on previous formats used by a.out and COFF.

typedef struct {
  Elf64_Addr r_offset;
  Elf64_Xword r_info;
} Elf64_Rel;

typedef struct {
  Elf64_Addr r_offset;
  Elf64_Xword r_info;
  Elf64_Sxword r_addend;
} Elf64_Rela;

RELA is nice for static relocations on RISC architectures but is very space-inefficient for dynamic relocations. REL is 33% smaller (16 instead of 24 bytes per entry on a 64-bit target) but is still bloated when encoding relative relocations. For a relative relocation, the symbol index is 0, but we still have to pay a word for r_info.
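
To make the sizes concrete, here is a small sketch using fixed-width mirrors of the two ELF64 records. The names Rel64, Rela64, r_info64, r_sym64, r_type64, rel_bytes, and rela_bytes are made up for this example so that it does not depend on <elf.h>. In the ELF64 encoding the symbol index occupies the high 32 bits of r_info and the type the low 32 bits; the value 8 for R_X86_64_RELATIVE comes from the x86-64 psABI.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width stand-ins for Elf64_Rel and Elf64_Rela. */
typedef struct { uint64_t r_offset; uint64_t r_info; } Rel64;
typedef struct { uint64_t r_offset; uint64_t r_info; int64_t r_addend; } Rela64;

_Static_assert(sizeof(Rel64) == 16, "REL entry is two words");
_Static_assert(sizeof(Rela64) == 24, "RELA entry is three words");

#define R_X86_64_RELATIVE 8u /* value from the x86-64 psABI */

/* ELF64 packing: symbol index in the high half, type in the low half. */
static uint64_t r_info64(uint32_t sym, uint32_t type) {
  return ((uint64_t)sym << 32) | type;
}
static uint32_t r_sym64(uint64_t info) { return (uint32_t)(info >> 32); }
static uint32_t r_type64(uint64_t info) { return (uint32_t)(info & 0xffffffffu); }

/* Bytes needed to describe n relative relocations in each format. */
static uint64_t rel_bytes(uint64_t n) { return n * sizeof(Rel64); }
static uint64_t rela_bytes(uint64_t n) { return n * sizeof(Rela64); }
```

For a relative relocation, r_info64(0, R_X86_64_RELATIVE) is the same constant in every entry, which is why spending a full word on it per relocation is wasteful.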

The dynamic loader performs *(Elf_Addr*)(base+r_offset) = base + addend;. For the REL format (used by arm and x86-32), addend is an implicit addend read from the to-be-relocated location. For the RELA format (used by most architectures), addend is r_addend stored in the relocation record.
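
A minimal sketch of the two cases, assuming a 64-bit target. A byte buffer `image` stands in for the loaded object and `load_base` is a made-up load address; both function names are hypothetical and not taken from any real loader. In the REL case the stored word is the addend, so the load base is added to it. In the RELA case the previous contents of the location are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* REL: the addend is implicit, read from the location being relocated. */
static void apply_relative_rel(unsigned char *image, size_t image_size,
                               uint64_t load_base, uint64_t r_offset) {
  uint64_t word;
  assert(r_offset <= image_size && image_size - r_offset >= sizeof word);
  memcpy(&word, image + r_offset, sizeof word); /* implicit addend */
  word += load_base;
  memcpy(image + r_offset, &word, sizeof word);
}

/* RELA: the addend is explicit, taken from the relocation record. */
static void apply_relative_rela(unsigned char *image, size_t image_size,
                                uint64_t load_base, uint64_t r_offset,
                                int64_t r_addend) {
  uint64_t word = load_base + (uint64_t)r_addend; /* old contents ignored */
  assert(r_offset <= image_size && image_size - r_offset >= sizeof word);
  memcpy(image + r_offset, &word, sizeof word);
}
```

memcpy is used instead of a pointer cast so that the sketch does not rely on the buffer being suitably aligned.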

Relative relocations have great locality. On a 64-bit platform, consecutive 8-byte locations are typically all relocated, e.g.

{r_offset = 0x2000, r_info = (0 << 32) | R_X86_64_RELATIVE},
{r_offset = 0x2008, r_info = (0 << 32) | R_X86_64_RELATIVE},
{r_offset = 0x2010, r_info = (0 << 32) | R_X86_64_RELATIVE},
{r_offset = 0x2018, r_info = (0 << 32) | R_X86_64_RELATIVE},

Sources of relative relocations

In a position-independent executable, R_*_RELATIVE relocations typically dominate: they easily take 90% of the total dynamic relocations. The ratio is larger if there are fewer R_*_JUMP_SLOT relocations. For a mostly statically linked position-independent executable, the ratio can be as large as 99%.

A symbolic shared object has a similar distribution of dynamic relocations. But non-symbolic shared objects are much more common and R_*_RELATIVE relocations usually take a very small portion of their dynamic relocations. (See ELF interposition and -Bsymbolic about symbol interposition.)

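// Pointer with external linkage initialized with an address.
void *ptr = &foo;

// Alternatively, the same pointer with internal linkage.
static void *ptr = &foo;

// Every element of an array of string literal pointers needs one relocation.
const char *str[] = {"a", "b"};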

For C++, virtual tables can contribute many R_*_RELATIVE relocations through function pointers. Fuchsia contributed -fexperimental-relative-c++-abi-vtables to Clang, which is also available on Linux. (Underneath, an i64 sub (lhs, rhs) constant expression in LLVM IR lowers to a PC-relative relocation.) This can make a large portion of the memory image read-only and save a lot of space (32-bit PC-relative offsets instead of 64-bit absolute addresses), but it is difficult to deploy in practice because of the ABI change.

For some modern architectures (AArch64, RISC-V, x86-64), PIC does not have a size penalty on text sections compared to non-PIC. The number of R_*_RELATIVE relocations is the most significant source of code size bloat when changing from non-PIC to PIC.

Compressing REL/RELA relocations

One intuitive idea is to omit r_info, which for a relative relocation carries only a fixed type and a zero symbol index. We will need a new section, but the size is cut in half compared to REL.

.section naive,"a"
.quad 0x2000
.quad 0x2008
.quad 0x2010
.quad 0x2018

Next, we can think of delta encoding and narrower entries. But note that ELF tries to avoid unaligned/packed structures. In addition, delta encoding, if not designed carefully, can make medium/large code models hard.


Over the years, there have been multiple attempts at compressing the ELF relocation formats.

In 2010, Mike Hommey added "elfhack" to Firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=606145). The post "Improving libxul startup I/O by hacking the ELF format" is a write-up. It appears to move most relative relocations from .rel.dyn/.rela.dyn into a custom section. The section basically holds multiple pairs of a base offset and a count of subsequent consecutive relocations. The savings are quite significant.
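
The idea of base/count pairs can be sketched as a simple run-length encoding. This is not elfhack's actual on-disk format: the struct layout, the names, the 8-byte word size, and the choice to let the count include the base location itself are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RUN_WORD_SIZE = 8 }; /* assumed 64-bit target */

/* Hypothetical record: `count` consecutive words starting at `base`. */
struct reloc_run {
  uint64_t base;
  uint64_t count;
};

/* Collapse strictly increasing offsets into runs of consecutive words.
   `out` must have room for n entries. Returns the number of runs. */
static size_t encode_runs(const uint64_t *offsets, size_t n,
                          struct reloc_run *out) {
  size_t nruns = 0;
  for (size_t i = 0; i < n; i++) {
    assert(i == 0 || offsets[i] > offsets[i - 1]);
    if (nruns > 0 &&
        offsets[i] == out[nruns - 1].base + out[nruns - 1].count * RUN_WORD_SIZE) {
      out[nruns - 1].count++; /* extends the current run */
    } else {
      out[nruns].base = offsets[i]; /* starts a new run */
      out[nruns].count = 1;
      nruns++;
    }
  }
  return nruns;
}
```

Such a scheme works well for long unbroken runs but degrades when relocated words alternate with non-relocated ones, since every gap starts a new pair.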

In 2015, Android bionic got DT_ANDROID_REL/DT_ANDROID_RELA. This is somewhat over-engineered but does not optimize relative relocations well.

In 2017, Cary Coutant proposed a prototype of the RELR relocation format (https://sourceware.org/legacy-ml/gnu-gabi/2017-q2/msg00003.html). Ali Bahrami refined it to the format we have today: an even entry indicates an address while an odd entry indicates a bitmap. In 2018, Rahul Chaudhry added LLD/llvm-readelf support. It is a very simple format that yields significant savings. Read on.

RELR relocation format

This is currently a generic-abi pre-standard. We have Cary Coutant's written agreement that after he converts generic-abi to a more open format, RELR will be applied.

From "Proposal for a new section type SHT_RELR":

Description

SHT_RELR: The section holds an array of relocation entries, used to encode relative relocations that do not require explicit addends or other information. Array elements are of type Elf32_Relr for ELFCLASS32 objects, and Elf64_Relr for ELFCLASS64 objects. SHT_RELR sections are for dynamic linking, and may only appear in object files of type ET_EXEC or ET_DYN. An object file may have multiple relocation sections. See ``Relocation'' below for details.

[...]

The format is best described by code. When the feature is enabled, LLD creates .relr.dyn (of type SHT_RELR) which holds an array of relocation entries, used to encode relative relocations that do not require explicit addends. Regular R_*_RELATIVE from .rel.dyn/.rela.dyn are removed.

In the .relr.dyn section,

  • An even entry indicates a location which needs a relocation and sets up where for subsequent odd entries.
  • An odd entry indicates a bitmap encoding up to 63 locations following where.
  • Odd entries can be chained.

relrlim = (const Elf_Relr *)((const char *)obj->relr + obj->relrsize);
for (relr = obj->relr; relr < relrlim; relr++) {
  Elf_Relr entry = *relr;
  if ((entry & 1) == 0) {
    where = (Elf_Addr *)(obj->relocbase + entry);
    *where++ += (Elf_Addr)obj->relocbase;
  } else {
    for (long i = 0; (entry >>= 1) != 0; i++)
      if ((entry & 1) != 0)
        where[i] += (Elf_Addr)obj->relocbase;
    where += CHAR_BIT * sizeof(Elf_Relr) - 1;
  }
}
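
The producing side can be sketched too. Below is one straightforward greedy encoder for a 64-bit target, paired with a decoder that lists offsets instead of patching memory so the round trip can be checked. The function names and constants are invented for this example, and the code is an illustration of the format rather than any linker's actual implementation. It assumes the input offsets are strictly increasing and even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RELR_WORD = 8, RELR_NBITS = 63 }; /* 64-bit entries: 63 bitmap bits */

/* Encode offsets into RELR entries. `out` needs room for n entries, since
   every emitted entry covers at least one offset. Returns the entry count. */
static size_t relr_encode(const uint64_t *offsets, size_t n, uint64_t *out) {
  size_t nout = 0, i = 0;
  while (i < n) {
    assert((offsets[i] & 1) == 0); /* address entries must be even */
    out[nout++] = offsets[i];      /* leading address entry */
    uint64_t base = offsets[i] + RELR_WORD;
    i++;
    for (;;) { /* fold following offsets into chained bitmaps */
      uint64_t bitmap = 0;
      for (; i < n; i++) {
        uint64_t d = offsets[i] - base;
        if (d >= (uint64_t)RELR_NBITS * RELR_WORD || d % RELR_WORD != 0)
          break;
        bitmap |= (uint64_t)1 << (d / RELR_WORD);
      }
      if (bitmap == 0)
        break;
      out[nout++] = (bitmap << 1) | 1; /* odd entry marks a bitmap */
      base += (uint64_t)RELR_NBITS * RELR_WORD;
    }
  }
  return nout;
}

/* Expand RELR entries back into the list of relocated offsets. */
static size_t relr_decode(const uint64_t *relr, size_t n, uint64_t *out) {
  size_t nout = 0;
  uint64_t where = 0;
  for (size_t k = 0; k < n; k++) {
    uint64_t entry = relr[k];
    if ((entry & 1) == 0) {
      out[nout++] = entry;
      where = entry + RELR_WORD;
    } else {
      for (uint64_t i = 0; (entry >>= 1) != 0; i++)
        if ((entry & 1) != 0)
          out[nout++] = where + i * RELR_WORD;
      where += (uint64_t)RELR_NBITS * RELR_WORD;
    }
  }
  return nout;
}
```

For the four consecutive offsets 0x2000 through 0x2018, relr_encode produces two entries, the address 0x2000 and the bitmap 0xf. That is 16 bytes, where four RELA records would take 96.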

RELR can typically encode the same information as .rela.dyn in less than 3% of the space.

q() { file $1 | grep -q ELF || return; printf "$1\t"; bc <<< "scale=2; $(readelf -Wr $1 | grep -c _RELATIVE)*24 * 100 / $(stat -c %s $1)" | sed 's/^\./0./' }

for i in /usr/bin/*(.x); do nm $i | grep -q google && continue; q $i; done > /tmp/0

awk '{c+=$2*$3/100; s+=$2} END {print s, " ", c/s}'

On my Arch Linux, among /usr/bin/* executables, relative relocations take 7.9% of the total file size. These large executables spend 10+% in the .rela.dyn section: as, audacity, fzf, ndisasm, objdump, ocaml*, perf_*, qemu-system-*, pdftosrc, php*, strace, virt-*.

Application

Android, Chrome OS, and Fuchsia have used ld.lld --pack-dyn-relocs=relr and a DT_RELR capable loader for quite a while now.

FreeBSD

I added DT_RELR support to libexec/rtld-elf in https://reviews.freebsd.org/D32524. Thanks to [email protected] for review and moving it to the future 13.1 release.

musl

There is an unapplied patch https://www.openwall.com/lists/musl/2019/03/06/3. The BDFL says it is a nice improvement but not a critical change blocking anything, so the fate of the feature is still unclear.

glibc

I submitted [PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924] in October 2021. The good news is that it got some discussion and people agree that it is useful. The bad news is that due to complicated factors it is unclear when the patch can be merged.

Lack of binutils support

  • GNU ld does not support --pack-dyn-relocs=relr.
  • readelf and objdump cannot dump RELR relocation sections.
  • objcopy reports an error when operating on a DT_RELR object.
  • gold does not support synthesizing the .relr.dyn section. ChromeOS folks maintain a patch.

How to test?

[PATCH v2] elf: Support DT_RELR relative relocation format [BZ #27924] includes a test.

LLD 13 can now link glibc and pass the whole test suite for aarch64 and x86-64, but LLD is not in the build configuration of build-many-glibcs.py, which is more commonly used by maintainers.

glibc ld.so has some severe architectural problems. I find that linking ld.so itself with --pack-dyn-relocs=relr may make symbol lookup fail because its own relative relocations may be applied twice. DT_REL/DT_RELA relocations are applied twice as well, but some complex if guards prevent serious issues.

Time travel compatibility

Programs with new features run the risk of mysterious crashes or malfunction on old systems. Ali Bahrami calls new objects on old systems "time travel compatibility". "And yet, our experience is that although we don't go to great effort to catch time traveling objects, they very rarely cause problems for us. People seem to understand that they can't do that."

DT_RELR objects will immediately segfault on a loader that does not support them. But some folks within the glibc community are (overly) cautious about this. (This is the very reason GNU symbol versioning exists.)

I wish the glibc community would do what Ali suggested in the generic-abi post (January 2018):

My free advice (worth what you paid for it) is to roll out the support, and then wait a bit before turning on the use widely, so that the support is in place before it is needed, and to not complicate things with a way to catch time travelers. The window of time where this can be a problem is finite, and once you're past it, you'll be glad to have a simpler system.

EI_ABIVERSION

Some might ask whether we could use EI_ABIVERSION.

The System V ABI says:

Byte e_ident[EI_ABIVERSION] identifies the version of the ABI to which the object is targeted. This field is used to distinguish among incompatible versions of an ABI. The interpretation of this version number is dependent on the ABI identified by the EI_OSABI field. If no values are specified for the EI_OSABI field by the processor supplement or no version values are specified for the ABI determined by a particular value of the EI_OSABI byte, the value 0 shall be used for the EI_ABIVERSION byte; it indicates unspecified.

EI_ABIVERSION is dependent on EI_OSABI. Operating systems decide their EI_ABIVERSION. Some operating systems benefit from ld.lld --pack-dyn-relocs=relr not bumping the ABI version. We know that FreeBSD/Fuchsia/ChromeOS don't find it necessary (or don't want) to bump the ABI version. Bumping the ABI version can immediately lock out many ELF utilities which only deal with e_ident[EI_ABIVERSION] == 0 objects.

Even within ELFOSABI_GNU (used by Linux), different architectures may have different EI_ABIVERSION values. I know that mips may use EI_ABIVERSION==1 for (e_eflags & (EF_MIPS_PIC | EF_MIPS_CPIC)) == EF_MIPS_CPIC position-dependent executables.
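
For readers who want to inspect these bytes, here is a small sketch that pulls EI_OSABI and EI_ABIVERSION out of an e_ident buffer. The index and ELFOSABI_GNU values are those of the generic ABI, defined locally so the example does not depend on <elf.h>. The helper read_elf_abi and struct elf_abi are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* e_ident indices and an OSABI value from the generic ABI. */
enum { EI_OSABI = 7, EI_ABIVERSION = 8, EI_NIDENT = 16 };
enum { ELFOSABI_NONE = 0, ELFOSABI_GNU = 3 };

struct elf_abi {
  unsigned char osabi;
  unsigned char abiversion;
};

/* Returns 0 and fills *out if the buffer starts with a full e_ident
   carrying the ELF magic, -1 otherwise. */
static int read_elf_abi(const unsigned char *e_ident, size_t len,
                        struct elf_abi *out) {
  static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
  assert(e_ident != NULL && out != NULL);
  if (len < EI_NIDENT || memcmp(e_ident, magic, sizeof magic) != 0)
    return -1;
  out->osabi = e_ident[EI_OSABI];
  out->abiversion = e_ident[EI_ABIVERSION];
  return 0;
}
```

A tool that rejects everything with a nonzero abiversion is exactly the kind of utility a version bump would lock out.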

Synthesized undefined dynamic symbol

The linker can synthesize an undefined symbol in the dynamic symbol table (.dynsym) to indicate the usage of DT_RELR. A DT_RELR capable glibc can define the symbol. When a new object runs on an old glibc, there will be an undefined symbol error.

If GNU ld adds, say, -z relr=glibc, with the functionality, I will probably just add a compatibility alias to LLD but not actually add the symbol (https://sourceware.org/pipermail/libc-alpha/2021-October/132460.html):

  • some users don't need "time travel compatibility"
  • --pack-dyn-relocs=relr would still be usable
  • I don't want users to migrate away from --pack-dyn-relocs=relr (churn) just because glibc has a different development model.

glibc 2.35 will be released around 2022-02-01. I hope that DT_RELR can catch the release.

When will Linux distributions adopt?

I have filed tickets for several large Linux distributions to bring the size saving to their attention. I also hope that distributions' support or petitions can move the glibc patch forward.


Source: http://maskray.me/blog/2021-10-31-relative-relocations-and-relr