SECTIONS and OVERWRITE

The main task of a linker script is to define additional symbols and how input sections map into output sections. It has other miscellaneous features which can be implemented via command line options:

The ENTRY command can be replaced by --entry.
The OUTPUT_FORMAT command can usually be replaced by -m.
The SEARCH_DIRS command can be replaced by -L.
The VERSION command can be replaced by --version-script.
The INPUT and GROUP commands can add other files as input. This provides a mechanism to split an archive/shared object into multiple files.

This post describes the SECTIONS command. Here is the syntax:

SECTIONS { section-command section-command ... } [INSERT [AFTER|BEFORE] anchor_section;]

The INSERT part (in the bracket) is optional.

Each section-command can be a symbol assignment, an output section description, or an overlay description. The linker processes section-commands in order and maintains the current location (which can be referenced via .).

a = .; is a symbol assignment. It defines a symbol called a at the current location. If the symbol is also defined by a relocatable object file or an extracted archive member, the symbol assignment takes precedence. (In GNU ld, linker scripts and archives are processed in order, so a symbol assignment can suppress the extraction of a subsequent archive member.)

Output section descriptions

An output section description defines an output section. An output section represents a section in the linker output. A simplest output output description is .text : {} where the section name is mandatory. Section attributes (type, AT, ALIGN, VMA memory region, LMA memory region, etc) are optional:

section [address] [(type)] : [AT(lma)] [ALIGN(section_align)] [SUBALIGN](subsection_align)] {
  output-section-command
  ...
} [>region] [AT>lma_region] [:phdr ...] [=fillexp] [,]

An output section description consists of multiple output-section-commands. An output-section-command is one of:

a symbol assignment
an input section description
a BYTE, SHORT, LONG, QUAD command
an output section keyword

Input section descriptions and symbol assignments are common.

.text : { *(.text .text.*) BYTE(0) a = .; } is an example involves an input section description, a data command, and a symbol assignment. BYTE(0) and a = .; are pretty self-descriptive.

Input section descriptions

An input section description consists of a file name pattern and a list of space-separated section patterns. The most common file name pattern is the wildcard *. Section patterns may be exact names (.text) or a glob (.text.*).

The section patterns are unordered: *(.text .text.*) does not place an ordering requirement on .text and .text.*. If the linker sees a .text.a before a .text, it will place the .text.a before the .text.

The KEEP keyword can be used to retain some input sections, e.g. .retain : { KEEP(*(.retain)) }.

Symbol assignments

Symbol assignments can be inside an output section description. They are commonly used to define encapsulation symbols, e.g. foo : { __start_foo = .; *(foo); __stop_foo = .; }.

An anti-pattern is:

1
2
3

__start_foo = .;
foo : { *(foo) }
__stop_foo = .;

This is problematic because the linker's orphan section placement rules may place orphans between __start_foo and foo, breaking the intention of __start_foo:

__start_foo = .;
/* Conceptually */ bar : { *(bar) }
foo : { *(foo) }
__stop_foo = .;

As an example:

# RUN: split-file %s %t
# RUN: cc -c %t/a.s -o %t/a.o
# RUN: ld.lld %t/lds %t/a.o

#--- a.s
.section bar,"ax"; .byte 0
.section foo,"aw"; .byte 0

#--- lds
sections {
  . = SIZEOF_HEADERS;

  __start_foo = .;
  foo : {}
  __stop_foo = .;
}

The orphans and foo have different ranks. The linker does have a special rule to skip non-dot assignments like __stop_foo after an output section description. So sections of the same rank are usually benign.

`INSERT BEFORE` and `INSERT AFTER`

We have delved into the details of output section descriptions for a while. Let's return to the optional INSERT keyword for which we omitted a description. This feature gives us a way to reorder output sections.

SECTIONS {
  .foo : { *(.foo) }
  .bar : { *(.bar) }
} INSERT AFTER .bss;

.bss can be defined by an output section description or implicitly defined by orphan section placement. I added the orphan section support in D74375 (target: LLD 11.0.0).

This is cool for CUDA sections like .nv_fatbin: by moving them after .bss we can mitigate R_X86_64_PC32 relocation overflows for large executables.

While the documentation is not clear, I think for INSERT AFTER the output section descriptions should be in order, i.e. .bss .foo .bar instead of .bss .bar .foo. I recently fixed this in D105158 (target: LLD 13.0.0).

Internal linker script vs external linker script

GNU ld has the concept of an internal linker script. Many layout rules are implemented in a file under ldscripts/ installed along with ld.bfd (e.g. /usr/lib/x86_64-linux-gnu/ldscripts/elf_x86_64.* on Debian).

GNU ld picks a linker script according to whether certain command line options are enabled (e.g. -z combreloc, -z separate-code, -z now). Note that some options may be enabled at configure time. For example -z combreloc is usually the default. -z separte-code is the default on Linux x86. Many GCC installations pass -z relro -z now to ld by default.

Appending --verbose to a GNU ld command line can tell whether an internal linker script or an external one is used.

If you specify a linker script without an option, its SECTIONS commands will be appended to the internal linker script's. Non-INSERT concatenated SECTIONS is usually not desired.

You may specify -T or --default-script to provide an external linker script. The internal linker script will be completely ignored.

A linker script cannot represent conditions like -z separate-code/-z noseparate-code distinction, or -z now/-z lazy distinction.

-z relro/-z norelro does not affect the internal linker script.

-z lazy internal linker script can be used by -z norelro and -z relro -z now links.
-z now internal linker script can be used by -z relro -z now links but cannot be used by -z norelro or -z relro -z lazy links.

gold and LLD

For more than two decades GNU ld was the only usable linker on Linux systems. Then gold was developed as a faster alternative. gold encodes many layout rules in code instead of ldscripts/ files. LLD inherited this property. IMO This is a superior design.

In the case where no linker script has been provided or every SECTIONS command is followed by INSERT, LLD applies built-in rules which are similar to GNU ld's internal linker scripts.

Align the first section in a PT_LOAD segment according to -z noseparate-code, -z separate-code, or -z separate-loadable-segments
Define __bss_start, end, _end, etext, _etext, edata, _edata
Sort .ctors.*/.dtors.*/.init_array.*/.fini_array.* and PowerPC64 specific .toc
Place input .text.* into output .text, and handle certain variants (.text.hot., .text.unknown., .text.unlikely., etc) in the precense of -z keep-text-section-prefix.

LLD can read GNU ld's internal linker scripts. You may also supply a minimal linker script like the following:

SECTIONS
{
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;

  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
  .init_array    :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*)))
    KEEP (*(.init_array))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array    :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*)))
    KEEP (*(.fini_array))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  . = DATA_SEGMENT_RELRO_END (0, .);
  .data           : { *(.data .data.*) }
  . = .;
  .bss            : { *(.bss .bss.*) *(COMMON) }
  . = DATA_SEGMENT_END (.);
}

`OVERWRITE_SECTIONS`

An INSERT-flavored SECTIONS command does not suppress built-in layout rules, so it is useful to extend an existing internal linker script. However, there are two properties which are not ideal:

The output section descriptions impose an order.
The anchor section (as in INSERT AFTER where) must be present.

Sometimes we just want built-in orphan section placement and don't want to specify an oder. The OVERWRITE_SECTIONS command is useful in this case. I proposed it in https://sourceware.org/bugzilla/show_bug.cgi?id=26404 and implemented it for LLD in D103303 (target: LLD 13.0.0).

This feature is versatile. To list a few usage:

Use section : { KEEP(...) } to retain input sections under GC
Define encapsulation symbols (start/end) for an output section
Use section : ALIGN(...) : { ... } to overalign an output section (similar to ld64 -sectalign)