I do not use Apple products, but I sometimes like investigating Mach-O as an object file format, and my llvm-project changes sometimes need to work around its quirks.
LLVM has a function call tracing system called XRay. It supports many
architectures on Linux and some BSDs but does not support Apple systems.
If the target triple is x86_64-apple-darwin*, you may
notice that Clang will allow you to perform compilation, but linking
will fail. For other architectures, Clang will reject it.
% clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
So I dove down the rabbit hole.
.section __DATA,xray_instr_map
The directive .quad Lxray_sled_0-Ltmp0 is represented as a pair of
relocations (as shown by llvm-readobj -r a.o):
0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
X86_64_RELOC_SUBTRACTOR is expected to be an external relocation
(r_extern==1) whose r_symbolnum references a
symbol table entry. Linkers will give an error if
r_extern==0. The symbols with the "L" prefix are called
temporary symbols in LLVMMC and are not present in the symbol table.
The LLVM integrated assembler tries to convert the subtractor symbol to an
atom, that is, a non-temporary symbol defined in the same section.
However, since xray_instr_map does not define a
non-temporary symbol, the X86_64_RELOC_SUBTRACTOR
relocation will have no associated symbol and its r_extern
will be 0.
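To make the r_extern bit concrete, here is a minimal Python sketch of how a little-endian Mach-O relocation_info entry packs its fields, together with a toy version of the linker check described above. The symbol index 7 and the helper names are hypothetical, and the check is only a model of the rule, not the code of any real linker.

```python
import struct

X86_64_RELOC_UNSIGNED = 0
X86_64_RELOC_SUBTRACTOR = 5

def pack_reloc(address, symbolnum, pcrel, length, extern, rtype):
    """Pack an 8-byte little-endian relocation_info entry."""
    word = ((symbolnum & 0xFFFFFF) | (pcrel << 24) | (length << 25)
            | (extern << 27) | (rtype << 28))
    return struct.pack("<iI", address, word)

def unpack_reloc(data):
    """Split the 8 bytes back into named fields."""
    address, word = struct.unpack("<iI", data)
    return {
        "address": address,
        "symbolnum": word & 0xFFFFFF,
        "pcrel": (word >> 24) & 1,
        "length": (word >> 25) & 3,   # 3 means an 8-byte field (.quad)
        "extern": (word >> 27) & 1,
        "type": word >> 28,
    }

def linker_accepts(reloc):
    """Toy rule: a SUBTRACTOR relocation must have r_extern == 1."""
    return not (reloc["type"] == X86_64_RELOC_SUBTRACTOR and reloc["extern"] == 0)

# r_extern == 0, as in the failing output above.
bad = unpack_reloc(pack_reloc(0, 0, 0, 3, 0, X86_64_RELOC_SUBTRACTOR))
# r_extern == 1 with a hypothetical symbol table index.
good = unpack_reloc(pack_reloc(0, 7, 0, 3, 1, X86_64_RELOC_SUBTRACTOR))
print(linker_accepts(bad), linker_accepts(good))  # False True
```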
To fix this issue, we need to define a non-temporary symbol. We can
accomplish this by renaming Lxray_sleds_start0 to
lxray_sleds_start0. In LLVMMC,
LinkerPrivateGlobalPrefix is set to "l" for Apple targets.
We can define an overload of
MCContext::createLinkerPrivateTempSymbol(const Twine &Name)
to allow LLVMMC to select an unused symbol starting with
lxray_sleds_start. (There is a pitfall: "ltmp" should be
compiler internal.) For ELF targets, the
MCContext::createLinkerPrivateTempSymbol function creates a
temporary symbol starting with ".L".
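The naming scheme can be sketched with a toy symbol context in Python. This is not LLVMMC code: the class and method names are hypothetical, and the only facts taken from the text are the "l" prefix for Apple targets and the ".L" prefix for ELF. The sketch always appends a numeric suffix and skips names that are already taken.

```python
class ToyContext:
    """Toy stand-in for a context that hands out unique symbol names."""

    def __init__(self, is_macho):
        # "l" is the linker-private prefix on Apple targets;
        # ".L" marks a temporary symbol on ELF.
        self.prefix = "l" if is_macho else ".L"
        self.used = set()
        self.next_id = {}

    def create_linker_private_temp_symbol(self, name):
        """Return an unused symbol of the form <prefix><name><N>."""
        n = self.next_id.get(name, 0)
        while True:
            candidate = f"{self.prefix}{name}{n}"
            n += 1
            if candidate not in self.used:
                break
        self.next_id[name] = n
        self.used.add(candidate)
        return candidate

macho = ToyContext(is_macho=True)
print(macho.create_linker_private_temp_symbol("xray_sleds_start"))  # lxray_sleds_start0
print(macho.create_linker_private_temp_symbol("xray_sleds_start"))  # lxray_sleds_start1
```

On Mach-O the result survives into the symbol table and can serve as the atom for the subtractor relocation, while the ELF spelling stays a plain temporary symbol.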
Oleksii Lozovskyi reported that the
-fxray-function-index option was broken:
- (default): no function index
- -fxray-function-index: no function index
- -fno-xray-function-index: xray_fn_idx section is present
-fxray-function-index was originally the default. It turns out that
a clangDriver
refactoring accidentally caused this regression, but the negative
variable name was probably the main reason. The XRay tests were not great
and there was no driver test to catch this. I fixed
this.
Now that -fxray-function-index is back, we get the
xray_fn_idx section by default. The section contains
entries like the following:
.section __DATA,xray_fn_idx
BTW: I noticed an old workaround (2015) for ld64 and proposed removing it: https://reviews.llvm.org/D152831.
These absolute addresses require rebase opcodes in the special
section __LINKEDIT,__rebase. This is not great; I wanted
to fix it back in 2020 but never got around to doing it. This time I
actually fixed the issue and created https://reviews.llvm.org/D152661 to change the
[start,end) representation to the
(pc_relative_start, size) representation.
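The benefit of the new representation can be sketched in Python. The addresses and the load slide below are hypothetical, the 32-byte sled entry size is an assumption taken from the right shift by 5 used later in this post, and the helper names are made up for illustration. The point is that the relative encoding produces the same bytes wherever the image is loaded, so nothing needs rebasing.

```python
import struct

SLED_ENTRY_SIZE = 32  # assumption: 1 << 5 bytes per sled entry

def encode_absolute(start, end):
    """Old style: two absolute addresses, which depend on the load address."""
    return struct.pack("<QQ", start, end)

def encode_relative(entry_addr, start, end):
    """New style: start relative to the entry itself, plus a sled count."""
    return struct.pack("<qQ", start - entry_addr, (end - start) // SLED_ENTRY_SIZE)

def decode_relative(entry_addr, data):
    """Recover [start, end) at run time from the entry's own address."""
    rel, count = struct.unpack("<qQ", data)
    start = entry_addr + rel
    return start, start + count * SLED_ENTRY_SIZE

# Hypothetical link-time addresses and a hypothetical load slide.
entry, start, end = 0x4000, 0x3000, 0x3060
slide = 0x100000000

rel = encode_relative(entry, start, end)
print(rel == encode_relative(entry + slide, start + slide, end + slide))   # True
print(encode_absolute(start, end)
      == encode_absolute(start + slide, end + slide))                      # False
print(decode_relative(entry + slide, rel) == (start + slide, end + slide)) # True
```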
My initial attempt looked something like this. I took the difference of two labels and right shifted it by 5 to get the number of sleds.
.section __DATA,xray_fn_idx,regular,live_support
This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.
% clang -c --target=x86_64-apple-darwin a.s
Assembler issues
When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:
- Parsing time. At this stage, we have a MCAssembler object but no MCAsmLayout object. Instruction operands and certain directives like .if require the ability to evaluate an expression early.
- Object file writing time. At this stage, we have both a MCAssembler object and a MCAsmLayout object. The MCAsmLayout object provides information about the offset of each fragment.
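The difference between the two stages can be modeled with a small Python sketch. It is a toy, not LLVMMC: the Symbol class and evaluate_difference function are hypothetical, and the optional fragment_offsets argument plays the role of the layout information that only exists at object file writing time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    name: str
    fragment: int  # index of the fragment that contains the symbol
    offset: int    # offset of the symbol inside that fragment

def evaluate_difference(a, b, fragment_offsets=None):
    """Return a - b, or None when it cannot be computed yet.

    fragment_offsets is None at parsing time (no layout) and a list of
    fragment start offsets at object file writing time.
    """
    if a.fragment == b.fragment:
        # Same fragment: the answer does not depend on layout.
        return a.offset - b.offset
    if fragment_offsets is None:
        return None  # needs layout, so evaluation must be deferred
    return ((fragment_offsets[a.fragment] + a.offset)
            - (fragment_offsets[b.fragment] + b.offset))

start = Symbol("start", fragment=0, offset=0)
mid = Symbol("mid", fragment=0, offset=4)
end = Symbol("end", fragment=1, offset=8)

print(evaluate_difference(mid, start))           # 4
print(evaluate_difference(end, start))           # None
print(evaluate_difference(end, start, [0, 16]))  # 24
```

A directive like .if needs a value immediately, so the second case is where an early "expected absolute expression" style failure would come from in this model.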
The first issue is not specific to this case and is also encountered
in ELF. The following assembly code should assemble to the hex pairs
01000001, but Clang fails to compute .if .-1b == 3.
% cat x.s
Jian Cai implemented limited expression folding support in the LLVM integrated assembler to support the Linux kernel arm use case.
arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
I have added support for MCFillFragment
(.space and .fill) and for A-B, where A is a
pending label (which will be reassigned to a real fragment in
flushPendingLabels()). Now, the LLVM integrated assembler
can successfully assemble x.s when a
MCAssembler object is present. However, evaluation still
does not work without a MCAssembler object, which is
expected.
% llvm-mc x.s -filetype=null
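The idea behind folding across .space and .fill can be sketched as follows: even without layout, the distance between two labels is known if every fragment between them has a fixed size. This Python model is an assumption-laden toy (the Fragment class, the kind strings, and fold_distance are all hypothetical) and does not mirror the actual LLVMMC data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fragment:
    kind: str            # "data", "fill", or "relaxable"
    size: Optional[int]  # None when the size is only known after layout

def fold_distance(fragments, a, b):
    """Return a - b without layout, or None if it cannot be folded.

    a and b are (fragment_index, offset) pairs; b must not come after a.
    """
    (fa, oa), (fb, ob) = a, b
    if fa < fb:
        return None  # keep the sketch simple: only forward distances
    distance = oa - ob
    for frag in fragments[fb:fa]:
        # Only fixed-size fragments contribute a known amount.
        if frag.kind not in ("data", "fill") or frag.size is None:
            return None
        distance += frag.size
    return distance

# One byte of data, a 3-byte fill, then the fragment holding the second label.
fixed = [Fragment("data", 1), Fragment("fill", 3), Fragment("data", 0)]
print(fold_distance(fixed, (2, 0), (0, 0)))    # 4

# A fragment of unknown size in between blocks the folding.
unknown = [Fragment("data", 1), Fragment("relaxable", None), Fragment("data", 0)]
print(fold_distance(unknown, (2, 0), (0, 0)))  # None
```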
Then I noticed a potential pitfall for Mach-O in
MCSection::flushPendingLabels. When flushing pending
labels, it did not ensure that the new fragment inherits the previous
atom symbol. I fixed this issue, although I haven't been able to create
a test case to verify this behavior.
After this fix,
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 can
be successfully assembled. However, during "direct
object emission", an error
expected relocatable expression will be reported. The issue
is quite subtle.
In the case of direct object emission, where LLVM IR is directly
lowered to an object file bypassing assembly (e.g., using
clang -c a.c instead of clang -c a.s or
clang -c --save-temps a.c), the assembler information is
not used for parsing
(MCStreamer::UseAssemblerInfoForParsing). As a result, the
assembly code
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 will
be transformed into a fixup.
During object writing time, we have a MCAsmLayout object
and atom information for fragments. However, the label
Lxray_sleds_end0 will belong to the next fragment, causing
the condition in
MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl to
fail. In my opinion, it may be necessary to relax the condition in this
case.
Linker dead stripping
To support linker dead stripping, also known as linker garbage
collection, we need to add the S_ATTR_LIVE_SUPPORT
attribute to the two sections xray_instr_map and
xray_fn_idx.
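As a rough model of what live_support means for dead stripping, the Python sketch below marks atoms reachable from the roots and additionally keeps an atom from a live_support section once something it references is live. The atom names are hypothetical, the flag value is the one from mach-o/loader.h as I understand it, and the marking loop is a toy rather than the algorithm of any particular linker.

```python
S_REGULAR = 0x0
S_ATTR_LIVE_SUPPORT = 0x08000000  # section attribute bit for live_support

def live_atoms(roots, refs, live_support):
    """Toy mark phase.

    refs maps an atom to the set of atoms it references.
    live_support is the set of atoms placed in live_support sections.
    """
    live = set()
    work = list(roots)
    while work:
        atom = work.pop()
        if atom in live:
            continue
        live.add(atom)
        work.extend(refs.get(atom, ()))
        # A live_support atom becomes live when it references a live atom.
        for candidate in live_support:
            if candidate not in live and atom in refs.get(candidate, ()):
                work.append(candidate)
    return live

refs = {
    "_main": {"_foo"},
    "sled_foo": {"_foo"},  # hypothetical xray_instr_map entry for _foo
    "sled_bar": {"_bar"},  # hypothetical entry for the unused _bar
}
result = live_atoms({"_main"}, refs, {"sled_foo", "sled_bar"})
print(sorted(result))  # ['_foo', '_main', 'sled_foo']
print(hex(S_REGULAR | S_ATTR_LIVE_SUPPORT))  # 0x8000000
```

In this model the metadata for _foo is kept because _foo is live, while the metadata for _bar neither keeps _bar alive nor survives itself.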
Runtime issue
compiler-rt/lib/xray/xray_trampoline_x86_64.S used
.Ltmp* symbols which are temporary for ELF but
non-temporary for Mach-O. The non-temporary labels become atoms and can
cause bad dead stripping behaviors.
I fixed the problem by using the LOCAL_LABEL macro,
which generates an "L" symbol specifically for Mach-O.
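The prefix mismatch is easy to see in a small Python sketch. The two functions are hypothetical helpers written for illustration; the only facts they encode are the ones from the text, namely that "L" marks a temporary symbol on Mach-O and ".L" does so on ELF. The exact label spelling produced by the real LOCAL_LABEL macro is not reproduced here.

```python
def is_temporary(name, object_format):
    """Does the assembler treat this label as temporary for the format?"""
    prefix = "L" if object_format == "macho" else ".L"
    return name.startswith(prefix)

def local_label(name, object_format):
    """Toy analogue of a LOCAL_LABEL-style macro: pick the right prefix."""
    return ("L" if object_format == "macho" else ".L") + name

# A hard-coded ELF-style label is not temporary on Mach-O.
print(is_temporary(".Ltmp0", "elf"))    # True
print(is_temporary(".Ltmp0", "macho"))  # False

# Choosing the prefix per format keeps the label temporary on both.
print(is_temporary(local_label("tmp0", "elf"), "elf"))      # True
print(is_temporary(local_label("tmp0", "macho"), "macho"))  # True
```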
Driver change
Once AArch64 works, we can make the Clang driver accept
--target=arm64-apple-darwin for XRay.