[llvm-dev] LLVM trunk generates different machine code for JCC instruction w/ or w/o debug info
Fangrui Song via llvm-dev
llvm-dev at lists.llvm.org
Tue Dec 29 12:09:12 PST 2020
On 2020-12-29, Neil Nelson via llvm-dev wrote:
>Bug 37728 - [meta] Make llvm passes debug info invariant
>https://bugs.llvm.org/show_bug.cgi?id=37728
>
>Further discussion on methods.
>https://groups.google.com/g/llvm-dev/c/yvbWr4azdh0/m/gy1tQIzIDwAJ
>
>Neil Nelson
Thanks for the links:)
>On 12/29/20 7:25 AM, 陈志伟 via llvm-dev wrote:
>>Hi folks, it’s my first post in llvm-dev mailing list, and
>>definitely not the last :-)
>>
>>Recently, I found that an ELF file built with and without debug info
>>has different machine code generated. Sadly, it cannot be reproduced
>>in a small piece of code. Here is my investigation.
>>
>>> clang -S -emit-llvm foo.cc -O3 -ggdb3 -o dbg.ll
>>> clang -S -emit-llvm foo.cc -O3 -o rel.ll
>>
>>Where foo.cc is a source file at my company with 10k+ LOC that
>>depends on tons of third-party libraries.
>>
>>The only differences between dbg.ll and rel.ll are the LLVM debug
>>intrinsics. Emmmm, looks fine.
>>
>>> llc dbg.ll -o dbg.s
>>> llc rel.ll -o rel.s
>>
>>And the asm instructions are the same. Emmm, fine again.
>>
>>> llvm-mc -filetype=obj dbg.s -o dbg.o
>>> llvm-mc -filetype=obj rel.s -o rel.o
>>
>>The two object files generated by the LLVM assembler have DIFFERENT machine code.
>>
>>> 74 19 je f20
>>
>>The object compiled with debug info uses 0x74 to encode the JE
>>instruction, while
>>
>>> 0f 84 15 00 00 00 je f20
>>
>>The object compiled without debug info uses 0x0f 0x84 instead.
>>
>>What? Why does debug info affect the generated machine code? As
>>an LLVM beginner, I’m willing to dive deeper to find the root cause.
>>
>>Thanks in advance.
llvm.dbg.* are intrinsics (subset of Instruction).
  DbgInfoIntrinsic
    DbgLabelInst
    DbgVariableIntrinsic
      DbgValueInst: llvm.dbg.value
      DbgAddrIntrinsic: llvm.dbg.addr
      DbgDeclareInst: llvm.dbg.declare (similar to llvm.dbg.addr, but not control-dependent)
It is very easy to forget to account for them in an optimization pass. A pass that iterates over instructions usually has to skip them explicitly:
  for (Instruction &I : BB) {
    if (isa<DbgInfoIntrinsic>(I))
      continue;
    ...
  }

  for (Instruction &I : instructions(F)) {
    if (isa<DbgInfoIntrinsic>(I))
      continue;
    ...
  }
If an optimization pass does not skip llvm.dbg.* and lets their presence affect its heuristics (for example, when counting the number of instructions in a basic block), the transformation result may differ with and without llvm.dbg.*.
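To make the instruction-counting example concrete, here is a minimal, self-contained sketch. It deliberately does not use the LLVM API: ToyInst, ToyBlock, makeBlock, isSmallEnough and the threshold are hypothetical stand-ins invented for illustration, not code from any real pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for llvm::Instruction (hypothetical, not the real class).
struct ToyInst {
  bool IsDebugIntrinsic; // models isa<DbgInfoIntrinsic>(I)
};

using ToyBlock = std::vector<ToyInst>;

// Builds a block with NumReal ordinary instructions followed by
// NumDebug debug intrinsics.
ToyBlock makeBlock(std::size_t NumReal, std::size_t NumDebug) {
  ToyBlock BB(NumReal, ToyInst{false});
  BB.insert(BB.end(), NumDebug, ToyInst{true});
  return BB;
}

// Debug-variant size: every instruction counts, including llvm.dbg.*.
std::size_t sizeCountingDebug(const ToyBlock &BB) { return BB.size(); }

// Debug-invariant size: debug intrinsics are skipped.
std::size_t sizeSkippingDebug(const ToyBlock &BB) {
  std::size_t N = 0;
  for (const ToyInst &I : BB)
    if (!I.IsDebugIntrinsic)
      ++N;
  return N;
}

// Hypothetical heuristic: only transform blocks at or below a size limit.
bool isSmallEnough(std::size_t Size, std::size_t Threshold) {
  assert(Threshold > 0 && "a zero threshold would reject every non-empty block");
  return Size <= Threshold;
}
```

With three real instructions and a threshold of 4, the size that counts debug intrinsics accepts the block in the build without debug info but rejects it once two debug intrinsics are present, so the two builds diverge. The size that skips them gives the same answer in both builds.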
GCC has -fcompare-debug, and it seems that its developers fought diligently with debug-affecting-codegen problems in the past as well. (I am happy to take a stab at implementing it if others think it is mildly useful.)
It is not clear how serious the problem is in LLVM. If, for example, the llvm-project codebase can be fixed relatively easily, we could probably add a build bot to detect new problems.
Yes, reducing the source with a tool like creduce is important.
With the new pass manager (-fno-legacy-pass-manager, which will hopefully become the default in the next release), you can dump changed IR with -print-changed, e.g.

  clang -fno-legacy-pass-manager -mllvm -print-changed -S -O2 a.c 2> log

This is usually more readable than -print-after-all.