[cfe-users] eBPF: Odd optimization results with clang-5.0
Yonghong Song via cfe-users
cfe-users at lists.llvm.org
Mon Jan 8 10:36:34 PST 2018
On 1/8/18 5:45 AM, Jiong Wang wrote:
> On 05/01/2018 20:05, Alexei Starovoitov wrote:
>
>> On Fri, Jan 5, 2018 at 7:01 AM, Charlemagne Lasse
>> <charlemagnelasse at gmail.com> wrote:
>>> First thing is the odd way 8-bit loads into a uint8_t are handled
>>> (see bug1_sec):
>
> I could reproduce both issues on other targets with latest LLVM trunk at r321940,
> for example AArch64 (need to remove the asm("llvm.bpf.load.byte") from the
> testcase).
>
> For the first issue, it seems that i8/i16 values are promoted to i32, so for
> bug1_sec the sequence is:
>
> t6: i32 = truncate t5
> t8: i32 = and t6, Constant:i32<255>
> t9: i64 = any_extend t8
>
> while for ok1 it is:
>
> t6: i32 = truncate t5
> t9: i64 = any_extend t6
>
> For the ok1 sequence, LLVM does the (aext (trunc x)) -> x fold, while for the
> bug1_sec sequence it does not do the combination, which is correct: it doesn't
> know that the return value of llvm.bpf.load.byte is zero extended to i64, so
> combining the bug1_sec sequence would lose the effect of the and instruction.
Thanks for the investigation.
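For reference, the fold Jiong describes is the generic one in
DAGCombiner::visitANY_EXTEND; from memory it is roughly the following
(paraphrased, the exact form differs between revisions):

  // fold (aext (truncate x))
  if (N0.getOpcode() == ISD::TRUNCATE)
    return DAG.getAnyExtOrTrunc(N0.getOperand(0), SDLoc(N), VT);

With the "and" sitting between the truncate and the any_extend, that pattern
simply does not match, and the combiner has no way to know the "and" itself
is redundant here.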
Looks like before the "and" operation is introduced, the IR is:
%call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
%conv = trunc i64 %call to i8
%conv1 = zext i8 %conv to i32
ret i32 %conv1
and the "Combine redundant instructions" phase changes it to:
%call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
%conv = trunc i64 %call to i32
%conv1 = and i32 %conv, 255
ret i32 %conv1
while for ok1, the IR looks like:
%call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
%conv = trunc i64 %call to i32
ret i32 %conv
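For readers without the original report: both IR sequences above come from
source roughly like the one below. This is my reconstruction, not the
reporter's exact testcase, and the section name for ok1 is made up; load_byte
is the usual asm-renamed declaration for the llvm.bpf.load.byte intrinsic.

  unsigned long long load_byte(void *skb, unsigned long long off)
      asm("llvm.bpf.load.byte");

  __attribute__((section("bug1_sec"), used))
  int bug1(void *skb)
  {
      unsigned char v = load_byte(skb, 0); /* trunc to i8, zext back to i32 */
      return v;
  }

  __attribute__((section("ok1_sec"), used))
  int ok1(void *skb)
  {
      return load_byte(skb, 0);            /* plain trunc i64 -> i32 */
  }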
One thing we could do is to perform this optimization in the BPF backend during
the DAG2DAG transformation, since the backend understands the llvm.bpf.load.byte
semantics.
>
>>> For unknown reasons, the line "6:" was changed from a JNE to a JEQ.
>
> LLVM is doing generic canonicalizations inside
>
> lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:
>
> // If the lhs block is the next block, invert the condition so that we can
> // fall through to the lhs instead of the rhs block.
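For context, that comment is in SelectionDAGBuilder::visitSwitchCase; from
memory the inversion is roughly this (paraphrased, details vary by revision):

  if (CB.TrueBB == NextBlock(SwitchBB)) {
    std::swap(CB.TrueBB, CB.FalseBB);
    SDValue True = DAG.getConstant(1, dl, Cond.getValueType());
    Cond = DAG.getNode(ISD::XOR, dl, Cond.getValueType(), Cond, True);
  }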
I disabled this optimization and the original condition "==" is preserved, but
we still have the inefficient code:
Disassembly of section bug2_sec:
bug2:
0: bf 16 00 00 00 00 00 00 r6 = r1
1: b7 07 00 00 00 00 00 00 r7 = 0
2: 30 00 00 00 00 00 00 00 r0 = *(u8 *)skb[0]
3: 15 00 01 00 01 00 00 00 if r0 == 1 goto +1 <LBB4_1>
4: 05 00 04 00 00 00 00 00 goto +4 <LBB4_3>
LBB4_1:
5: 30 00 00 00 01 00 00 00 r0 = *(u8 *)skb[1]
6: b7 07 00 00 15 00 00 00 r7 = 21
7: 15 00 01 00 01 00 00 00 if r0 == 1 goto +1 <LBB4_3>
8: b7 07 00 00 00 00 00 00 r7 = 0
LBB4_3:
9: bf 70 00 00 00 00 00 00 r0 = r7
10: 95 00 00 00 00 00 00 00 exit
Right, insns 7 and 8 can be removed. But since the "switch" to "cond"
transformation happens during instruction selection, it may be too late there
for the redundant condition elimination...
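For reference, bug2 presumably boils down to something like the following (my
reconstruction from the disassembly, not the original testcase). The compiler
materializes the return value 21 into r7 (insn 6) before the second compare,
and then has to reset it to 0 (insn 8) when the compare fails, which is the
redundancy in question:

  unsigned long long load_byte(void *skb, unsigned long long off)
      asm("llvm.bpf.load.byte");

  __attribute__((section("bug2_sec"), used))
  int bug2(void *skb)
  {
      if (load_byte(skb, 0) == 1 && load_byte(skb, 1) == 1)
          return 21;
      return 0;
  }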