Remarks on SFrame

UNDER CONSTRUCTION

The .sframe format is a lightweight alternative to .eh_frame for storing stack unwinding information. By trading some flexibility for compactness, SFrame aims for a smaller encoding while keeping the essential unwinding capabilities needed by profilers and debuggers.

SFrame focuses on three fundamental elements:

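Canonical Frame Address (CFA): The base address for stack frame calculations
Return address (RA): The location of the saved return address, relative to the CFA
Frame pointer (FP): The location of the saved frame pointer, relative to the CFA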

An .sframe section follows a straightforward layout:

Header: Contains metadata and offset information
Auxiliary header (optional): Reserved for future extensions
Function Descriptor Entries (FDEs): Array describing each function
Frame Row Entries (FREs): Arrays of unwinding information per function

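struct sframe_header {
  struct {
    uint16_t sfp_magic;   // Magic number identifying an SFrame section
    uint8_t sfp_version;  // Format version
    uint8_t sfp_flags;    // Section-wide flags
  } sfh_preamble;
  uint8_t sfh_abi_arch;            // ABI/architecture identifier
  int8_t sfh_cfa_fixed_fp_offset;  // Fixed FP offset from the CFA, if the ABI has one
  int8_t sfh_cfa_fixed_ra_offset;  // Fixed RA offset from the CFA, if the ABI has one
  uint8_t sfh_auxhdr_len;          // Size of the optional auxiliary header in bytes
  uint32_t sfh_num_fdes;           // Number of FDEs
  uint32_t sfh_num_fres;           // Number of FREs
  uint32_t sfh_fre_len;            // Size of the FRE sub-section in bytes
  uint32_t sfh_fdeoff;             // Offset of the FDE sub-section
  uint32_t sfh_freoff;             // Offset of the FRE sub-section
} ATTRIBUTE_PACKED;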
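
As a rough illustration of how a consumer might use these fields, the sketch below checks the magic number and computes where the FDE and FRE sub-sections begin. It rests on several assumptions that are not stated above: the magic value 0xdee2 and the 20-byte FDE size are taken from binutils' sframe.h, sfh_fdeoff and sfh_freoff are treated as relative to the end of the header (including any auxiliary header), and the section is assumed to use the host's endianness. `SFrameHeader`, `SFrameView`, and `locate` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the header layout shown above (GCC/Clang packed attribute).
struct __attribute__((packed)) SFrameHeader {
  uint16_t magic;
  uint8_t version;
  uint8_t flags;
  uint8_t abi_arch;
  int8_t cfa_fixed_fp_offset;
  int8_t cfa_fixed_ra_offset;
  uint8_t auxhdr_len;
  uint32_t num_fdes;
  uint32_t num_fres;
  uint32_t fre_len;
  uint32_t fdeoff;
  uint32_t freoff;
};
static_assert(sizeof(SFrameHeader) == 28, "unexpected header size");

constexpr uint16_t kSFrameMagic = 0xdee2; // assumed, per binutils sframe.h
constexpr uint64_t kFdeSize = 20;         // assumed size of one FDE in bytes

// Hypothetical view: byte offsets of the two sub-sections within .sframe.
struct SFrameView {
  SFrameHeader hdr;
  size_t fde_begin;
  size_t fre_begin;
};

// Returns false if the section is too small, has the wrong magic, or
// describes sub-sections that run past the end of the section.
bool locate(const uint8_t *sec, size_t size, SFrameView &out) {
  if (size < sizeof(SFrameHeader))
    return false;
  std::memcpy(&out.hdr, sec, sizeof out.hdr);
  if (out.hdr.magic != kSFrameMagic)
    return false;
  size_t hdr_end = sizeof(SFrameHeader) + out.hdr.auxhdr_len;
  out.fde_begin = hdr_end + out.hdr.fdeoff;
  out.fre_begin = hdr_end + out.hdr.freoff;
  if (out.fde_begin > size ||
      uint64_t(out.hdr.num_fdes) * kFdeSize > size - out.fde_begin)
    return false;
  if (out.fre_begin > size || out.hdr.fre_len > size - out.fre_begin)
    return false;
  return true;
}
```

A real consumer would also check sfp_version and handle sections whose endianness differs from the host's.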

While magic and version fields are popular choices for file formats, they deviate from established ELF conventions. ELF supports virtually unlimited section types and makes it trivial to allocate a new section type for each version, so distinguishing versions by section type would be more consistent with ELF's design philosophy.

However, SFrame will likely evolve over time, unlike ELF's more stable control structures. Producers and consumers will then probably need to evolve in lockstep, which creates a stronger case for internal versioning. An internal version field would allow linkers to upgrade or ignore input sections that use an older, unsupported version, providing more flexibility in handling version mismatches.

Data structures

Function Descriptor Entries (FDEs)

Each Function Descriptor Entry describes a function's location and links to its unwinding information through Frame Row Entries (FREs).

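struct sframe_func_desc_entry {
  int32_t sfde_func_start_address;   // Signed offset to the function's start
  uint32_t sfde_func_size;           // Function size in bytes
  uint32_t sfde_func_start_fre_off;  // Offset of the function's first FRE in the FRE sub-section
  uint32_t sfde_func_num_fres;       // Number of FREs for the function
  uint8_t sfde_func_info;            // FDE type and FRE start-address width
  uint8_t sfde_func_rep_size;        // Size of each repeated block (e.g. a PLT entry)
  uint16_t sfde_func_padding2;       // Padding
} ATTRIBUTE_PACKED;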
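
To show how an unwinder might map a program counter to its FDE, here is a minimal binary-search sketch. It assumes the FDEs are sorted by start address and that each sfde_func_start_address has already been resolved to an absolute address. `Fde` and `find_fde` are hypothetical names, not part of any SFrame library.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified FDE whose start address is already absolute (an assumption).
struct Fde {
  uint64_t start;
  uint32_t size;
  uint32_t start_fre_off;
  uint32_t num_fres;
};

// Returns the FDE covering pc, or nullptr if none does.
// Precondition: fdes is sorted by ascending start address.
const Fde *find_fde(const std::vector<Fde> &fdes, uint64_t pc) {
  // Find the first entry that starts after pc; the candidate precedes it.
  auto it = std::upper_bound(
      fdes.begin(), fdes.end(), pc,
      [](uint64_t addr, const Fde &f) { return addr < f.start; });
  if (it == fdes.begin())
    return nullptr;
  --it;
  // pc may fall into a gap between two functions.
  return pc - it->start < it->size ? &*it : nullptr;
}
```

The lookup costs O(log n) comparisons and needs no separate index, which is the role .eh_frame_hdr plays for .eh_frame.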

Frame Row Entries (FREs)

Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function.
The template parameter reflects that sfre_start_address can be 1, 2, or 4 bytes wide; the width is chosen per function, so small functions use narrower fields.

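template <class AddrType>
struct sframe_frame_row_entry {
  // Start of the PC range, as an offset from the function's start address
  AddrType sfre_start_address;
  // Packed bits: CFA base register, number of offsets, offset size
  sframe_fre_info sfre_info;
  // The variable-length stack offsets follow this fixed part
} ATTRIBUTE_PACKED;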

Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field in sfre_info determines whether each offset uses 1, 2, or 4 bytes (int8_t, int16_t, or int32_t), allowing optimal space usage based on stack frame sizes. The offsets are signed because saved registers typically live below the CFA.
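
The following sketch decodes an sfre_info byte and reads one trailing offset. The bit layout (bit 0 for the CFA base register, bits 1-4 for the offset count, bits 5-6 for the offset size code) and the signed interpretation of offsets are assumptions based on my reading of the SFrame V2 specification and libsframe. `FreInfo`, `decode_fre_info`, and `read_offset` are hypothetical names, and host endianness is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct FreInfo {
  bool cfa_base_is_sp;   // bit 0: false = FP-based CFA, true = SP-based (assumed)
  unsigned offset_count; // bits 1-4
  unsigned offset_bytes; // bits 5-6 decoded to 1, 2, or 4; 0 if unknown
};

FreInfo decode_fre_info(uint8_t info) {
  unsigned size_code = (info >> 5) & 3u;
  return {(info & 1u) != 0, (info >> 1) & 0xfu,
          size_code < 3 ? 1u << size_code : 0u};
}

// Reads the idx-th trailing offset and sign-extends it to 32 bits.
int32_t read_offset(const uint8_t *offsets, const FreInfo &fi, unsigned idx) {
  const uint8_t *p = offsets + size_t(idx) * fi.offset_bytes;
  switch (fi.offset_bytes) {
  case 1: { int8_t v; std::memcpy(&v, p, sizeof v); return v; }
  case 2: { int16_t v; std::memcpy(&v, p, sizeof v); return v; }
  case 4: { int32_t v; std::memcpy(&v, p, sizeof v); return v; }
  default: return 0; // unknown size code; a real consumer would report an error
  }
}
```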

Architecture-Specific Interpretations

SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.

x86-64 Convention

The x86-64 implementation takes advantage of the architecture's predictable stack layout:

First offset: Encodes CFA as BASE_REG + offset
Second offset (if present): Encodes FP as CFA + offset
Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)

AArch64 Convention

AArch64's more flexible calling conventions require explicit return address tracking:

First offset: Encodes CFA as BASE_REG + offset
Second offset: Encodes return address as CFA + offset
Third offset (if present): Encodes FP as CFA + offset

The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
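
The two conventions can be sketched side by side as follows. This is only an illustration of the offset ordering described above: `Frame`, `apply_x86_64`, and `apply_aarch64` are hypothetical names, the offsets are assumed to be already decoded into int32_t values, and the AArch64 function assumes at least two offsets are present.

```cpp
#include <cstdint>

// Hypothetical result of applying one FRE to the current register state.
struct Frame {
  uint64_t cfa;
  uint64_t ra_slot;   // address holding the saved return address
  bool has_fp_slot;
  uint64_t fp_slot;   // address holding the saved FP, valid if has_fp_slot
};

// x86-64: offsets[0] is the CFA offset and offsets[1], if present, the FP
// offset. The RA offset comes from the header (sfh_cfa_fixed_ra_offset).
Frame apply_x86_64(uint64_t base_reg, const int32_t *offsets, unsigned n,
                   int8_t fixed_ra_offset) {
  Frame f{};
  f.cfa = base_reg + int64_t(offsets[0]);
  f.ra_slot = f.cfa + int64_t(fixed_ra_offset);
  f.has_fp_slot = n >= 2;
  if (f.has_fp_slot)
    f.fp_slot = f.cfa + int64_t(offsets[1]);
  return f;
}

// AArch64: offsets are CFA, RA, and optionally FP, in that order.
// Precondition for this sketch: n >= 2.
Frame apply_aarch64(uint64_t base_reg, const int32_t *offsets, unsigned n) {
  Frame f{};
  f.cfa = base_reg + int64_t(offsets[0]);
  f.ra_slot = f.cfa + int64_t(offsets[1]);
  f.has_fp_slot = n >= 3;
  if (f.has_fp_slot)
    f.fp_slot = f.cfa + int64_t(offsets[2]);
  return f;
}
```

For example, after the usual x86-64 prologue `push %rbp; mov %rsp, %rbp`, an FP-based FRE would carry offsets 16 and -16, and the header would supply -8 for the return address.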

`.eh_frame` and `.sframe`

SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:

Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
Replacing CIE pointers with direct FDE-to-FRE references
Using variable-width sfre_start_address fields: start addresses in a small function take 1 or 2 bytes, which is more compact than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4)
Storing start addresses instead of address ranges
Hard-coding stack offsets rather than using flexible register specifications

However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.

SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.

In practice, executables and shared objects will likely contain all three sections:

.eh_frame: Complete unwinding information for exception handling
.eh_frame_hdr: Fast lookup table for .eh_frame
.sframe: Compact unwinding information for profilers

The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.

Large text section support

The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GiB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.

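However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64: once a function in .ltext is more than 2GiB away from its FDE, the 32-bit offset can no longer represent the distance. The typical section layout creates significant gaps between .sframe and .ltext: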

.ltext          // Large text section
.lrodata        // Large read-only data
.rodata         // Regular read-only data
// .eh_frame and .sframe position
.text           // Regular text section
.data
.bss
.ldata          // Large data
.lbss           // Large BSS
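
To make the limit concrete, the sketch below checks whether the distance from an FDE field to a function fits in a signed 32-bit value. `fits_in_sfde` is a hypothetical helper, and the addresses in the usage note are made up for illustration.

```cpp
#include <cstdint>
#include <limits>

// Returns true if (target_addr - field_addr) is representable in the
// int32_t sfde_func_start_address field.
bool fits_in_sfde(uint64_t field_addr, uint64_t target_addr) {
  // Unsigned subtraction wraps; reinterpret the result as a signed distance.
  int64_t delta = static_cast<int64_t>(target_addr - field_addr);
  return delta >= std::numeric_limits<int32_t>::min() &&
         delta <= std::numeric_limits<int32_t>::max();
}
```

With .sframe at 0x100400000 and a function in .ltext at 0x1000, the distance is more than 4GiB and the check fails, while a function in .text a few megabytes away passes.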

Monolithic section violates section group rule

The current design and assembler implementation generate a monolithic .sframe section with relocations to STB_LOCAL section symbols of multiple text sections. This violates the section group rule, as the ELF specification states:

A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.

Generally, if you want to reference a local symbol relative to a section in a COMDAT group, the referencing section should be part of the same group.

The violation shows up as linker errors from gold and LLD:

cat > a.cc <<'eof'
[[gnu::noinline]] inline int inl() { return 0; }
auto *fa = inl;
eof
cat > b.cc <<'eof'
[[gnu::noinline]] inline int inl() { return 0; }
auto *fb = inl;
eof
~/opt/gcc-15/bin/g++ -Wa,--gsframe -c a.cc b.cc

% ld.lld a.o b.o
ld.lld: error: relocation refers to a discarded section: .text._Z3inlv
>>> defined in b.o
>>> referenced by b.cc
>>>               b.o:(.sframe+0x1c)

% gold a.o b.o
b.o(.sframe+0x1c): error: relocation refers to local symbol ".text._Z3inlv" [2], which is defined in a discarded section
  section group signature: "inl()"
  prevailing definition is from a.o

This violation represents a significant concern that should be addressed in the next compiler version. To resolve this issue, the assembler should be modified to generate a dedicated SFrame section for each corresponding text section. When a text section belongs to a section group, the associated SFrame section should be placed within the same group. For a standalone text section, the SFrame section should set the SHF_LINK_ORDER flag and link to the associated text section.

With one SFrame section per text section, the header overhead is paid many times over. This could be partially addressed by having a dedicated format for assemblers' relocatable output, eliminating certain fields:

sfp_magic, sfh_abi_arch, sfh_num_fdes, sfh_num_fres, sfh_fdeoff, sfh_freoff
sfde_func_padding2

Endianness considerations

The SFrame format currently includes endianness variations that complicate toolchain support. While runtime consumers typically handle a single target endianness, development tools must support both endianness variants and cross-compilation.

A universal little-endian format could reduce implementation complexity by eliminating the need for:

Endianness-aware function calls like read32(config, p), where config->endian specifies the object file's endianness
Template-based abstractions such as template <class Endian> wrapping every relevant function

Instead, code could simply use direct calls like read32le(p), streamlining both implementation and maintenance.
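
The difference between the two styles can be sketched as follows. `Config`, `Endian`, and `read32` are hypothetical names used only for contrast; `read32le` is one portable way to write the fixed-endianness helper.

```cpp
#include <cstdint>

// Fixed little-endian load: independent of host byte order and alignment.
inline uint32_t read32le(const void *p) {
  const uint8_t *b = static_cast<const uint8_t *>(p);
  return uint32_t(b[0]) | uint32_t(b[1]) << 8 | uint32_t(b[2]) << 16 |
         uint32_t(b[3]) << 24;
}

// The style a dual-endian format forces: every reader needs the file's
// endianness threaded through (Config is a hypothetical context type).
enum class Endian { Little, Big };
struct Config {
  Endian endian;
};

inline uint32_t read32(const Config *config, const void *p) {
  if (config->endian == Endian::Little)
    return read32le(p);
  const uint8_t *b = static_cast<const uint8_t *>(p);
  return uint32_t(b[3]) | uint32_t(b[2]) << 8 | uint32_t(b[1]) << 16 |
         uint32_t(b[0]) << 24;
}
```

Compilers such as GCC and Clang generally recognize the shift-and-or pattern and emit a single load, plus a byte swap on big-endian hosts.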

Even on big-endian architectures like IBM z/Architecture and POWER, this approach remains efficient. For example, z/Architecture's LOAD REVERSED instructions handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While there may be slight performance differences compared to native endian operations, the toolchain simplification benefits typically outweigh these concerns.

// Compare the code generated for native loads and byte-swapped loads of
// unaligned 16/32/64-bit values (uses GCC/Clang builtins and macros).
#define WIDTH(x) \
typedef __UINT##x##_TYPE__ [[gnu::aligned(1)]] uint##x; \
uint##x load_inc##x(uint##x *p) { return *p+1; } \
uint##x load_bswap_inc##x(uint##x *p) { return __builtin_bswap##x(*p)+1; } \
uint##x load_eq##x(uint##x *p) { return *p==3; } \
uint##x load_bswap_eq##x(uint##x *p) { return __builtin_bswap##x(*p)==3; }

WIDTH(16)
WIDTH(32)
WIDTH(64)

However, my opinion is probably not popular within the object file format community and would likely face resistance from stakeholders with significant big-endian investments.