[cfe-users] eBPF: Odd optimization results with clang-5.0
Yonghong Song via cfe-users
cfe-users at lists.llvm.org
Mon Jan 8 10:36:34 PST 2018
On 1/8/18 5:45 AM, Jiong Wang wrote:
> On 05/01/2018 20:05, Alexei Starovoitov wrote:
>
>> On Fri, Jan 5, 2018 at 7:01 AM, Charlemagne Lasse
>> <charlemagnelasse at gmail.com> wrote:
>>> First thing is the odd way 8-bit loads into a uint8_t are handled
>>> (see bug1_sec):
>
> I could reproduce both issues on other targets with latest LLVM trunk at r321940,
> for example AArch64 (need to remove the asm("llvm.bpf.load.byte") from the
> testcase).
>
> For the first issue, it seems that i8/i16 values are promoted to i32, so for
> bug1_sec the sequence is:
>
> t6: i32 = truncate t5
> t8: i32 = and t6, Constant:i32<255>
> t9: i64 = any_extend t8
>
> while for ok1 it is:
>
> t6: i32 = truncate t5
> t9: i64 = any_extend t6
>
> For the ok1 sequence, LLVM does the (aext (trunc x)) -> x fold, while for the
> bug1_sec sequence it does not do the combination, which is correct: it doesn't
> know that the return value of llvm.bpf.load.byte is zero extended to i64, so
> combining the bug1_sec sequence would lose the effect of the and instruction.
Thanks for the investigation.
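For reference, the fold Jiong describes is the generic one in
DAGCombiner::visitANY_EXTEND; from memory it is roughly the following
(paraphrased, the exact form differs between revisions):

  // fold (aext (truncate x))
  if (N0.getOpcode() == ISD::TRUNCATE)
    return DAG.getAnyExtOrTrunc(N0.getOperand(0), SDLoc(N), VT);

With the "and" sitting between the truncate and the any_extend, that pattern
simply does not match, and the combiner has no way to know the "and" itself
is redundant here.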
Looks like before the "and" operation is introduced, the IR is:
%call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
%conv = trunc i64 %call to i8
%conv1 = zext i8 %conv to i32
ret i32 %conv1
and the "Combine redundant instructions" phase changes it to:
%call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
%conv = trunc i64 %call to i32
%conv1 = and i32 %conv, 255
ret i32 %conv1
while for ok1, the IR looks like:
%call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
%conv = trunc i64 %call to i32
ret i32 %conv
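For readers without the original report: both IR sequences above come from
source roughly like the one below. This is my reconstruction, not the
reporter's exact testcase, and the section name for ok1 is made up; load_byte
is the usual asm-renamed declaration for the llvm.bpf.load.byte intrinsic.

  unsigned long long load_byte(void *skb, unsigned long long off)
      asm("llvm.bpf.load.byte");

  __attribute__((section("bug1_sec"), used))
  int bug1(void *skb)
  {
      unsigned char v = load_byte(skb, 0); /* trunc to i8, zext back to i32 */
      return v;
  }

  __attribute__((section("ok1_sec"), used))
  int ok1(void *skb)
  {
      return load_byte(skb, 0);            /* plain trunc i64 -> i32 */
  }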
One thing we could do is to perform this optimization in the BPF backend during
the DAG2DAG transformation, since the backend understands the llvm.bpf.load.byte
semantics.
>
>>> For unknown reasons, the line "6:" was changed from a JNE to a JEQ.
>
> LLVM is doing generic canonicalizations inside
>
> lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:
>
> // If the lhs block is the next block, invert the condition so that we can
> // fall through to the lhs instead of the rhs block.
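For context, that comment is in SelectionDAGBuilder::visitSwitchCase; from
memory the inversion is roughly this (paraphrased, details vary by revision):

  if (CB.TrueBB == NextBlock(SwitchBB)) {
    std::swap(CB.TrueBB, CB.FalseBB);
    SDValue True = DAG.getConstant(1, dl, Cond.getValueType());
    Cond = DAG.getNode(ISD::XOR, dl, Cond.getValueType(), Cond, True);
  }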
I disabled this optimization and the original condition "==" is preserved, but
we still have the inefficient code:
Disassembly of section bug2_sec:
bug2:
0: bf 16 00 00 00 00 00 00 r6 = r1
1: b7 07 00 00 00 00 00 00 r7 = 0
2: 30 00 00 00 00 00 00 00 r0 = *(u8 *)skb[0]
3: 15 00 01 00 01 00 00 00 if r0 == 1 goto +1 <LBB4_1>
4: 05 00 04 00 00 00 00 00 goto +4 <LBB4_3>
LBB4_1:
5: 30 00 00 00 01 00 00 00 r0 = *(u8 *)skb[1]
6: b7 07 00 00 15 00 00 00 r7 = 21
7: 15 00 01 00 01 00 00 00 if r0 == 1 goto +1 <LBB4_3>
8: b7 07 00 00 00 00 00 00 r7 = 0
LBB4_3:
9: bf 70 00 00 00 00 00 00 r0 = r7
10: 95 00 00 00 00 00 00 00 exit
Right, insns 7 and 8 can be removed. But since the "switch" to "cond"
transformation happens during instruction selection, it may be too late there
for the redundant condition elimination...
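For reference, bug2 presumably boils down to something like the following (my
reconstruction from the disassembly, not the original testcase). The compiler
materializes the return value 21 into r7 (insn 6) before the second compare,
and then has to reset it to 0 (insn 8) when the compare fails, which is the
redundancy in question:

  unsigned long long load_byte(void *skb, unsigned long long off)
      asm("llvm.bpf.load.byte");

  __attribute__((section("bug2_sec"), used))
  int bug2(void *skb)
  {
      if (load_byte(skb, 0) == 1 && load_byte(skb, 1) == 1)
          return 21;
      return 0;
  }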