[PATCH] D80863: [WebAssembly] Eliminate range checks on br_tables
Paolo Severini via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 8 01:02:59 PDT 2020
paolosev added a comment.
I was testing the performance of a program with a big switch statement in a loop, a very common pattern in C/C++, and I came across the problem of needless range checks on `br_table` that this patch fixes. (Sweet! :-))
However, testing the latest version of Emscripten, which includes this fix, I am finding that Clang now emits "worse" bytecode for my small test program (attached: F12106893 <https://reviews.llvm.org/F12106893>, compiled with `emcc -O3 micro-interp.c -o micro-interp.js`).
The code looks like this:
typedef enum {
  A = 0, B, C, D, E, F, G, H, I, J, K, L
} Op;
int f(Op* ops, int len) {
  int result = 0;
  for (int i = 0; i < len; i++) {
    Op op = ops[i];
    switch (op) {
      case A: { ... break; }
      case B: { ... break; }
      case C: { ... break; }
      ...
      default: { ... break; }
    }
  }
  return result;
}
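For readers who want something they can compile, here is a minimal, self-contained sketch of the same pattern. It is not the attached micro-interp.c: the opcode names, the function name `run`, and the case bodies are all invented for illustration, and it uses four opcodes instead of twelve.

```c
#include <stddef.h>

/* Hypothetical opcodes; the real test program uses twelve (A..L). */
typedef enum { OP_ADD1, OP_SUB1, OP_DOUBLE, OP_NEG } Op;

/* Interprets ops in order. The case bodies are made up for this sketch. */
int run(const Op *ops, size_t len) {
    int result = 0;
    for (size_t i = 0; i < len; i++) {
        switch (ops[i]) {
        case OP_ADD1:   result += 1;      break;
        case OP_SUB1:   result -= 1;      break;
        case OP_DOUBLE: result *= 2;      break;
        case OP_NEG:    result = -result; break;
        default:        result = 0;       break; /* unknown opcode */
        }
    }
    return result;
}
```

The important property is only the shape: a dense switch with a default case, executed once per loop iteration, which is what a compiler will typically lower to a jump table.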
Before this change, the function was compiled into this Wasm code, with a single `br_table` that was not using its default label (see WAT file attached: F12107110: emcc-0-base.OLD.wat <https://reviews.llvm.org/F12107110>):
(func (;11;) (type 2) (param i32)
(local i32 i32 i32 i32 i32 i64)
...
block ;; label = @1
block ;; label = @2
loop ;; label = @3
local.get 4
i32.load16_u
local.tee 3
i32.const 11
i32.gt_u
br_if 1 (;@2;)
block ;; label = @4
block ;; label = @5
block ;; label = @6
block ;; label = @7
block ;; label = @8
block ;; label = @9
block ;; label = @10
block ;; label = @11
block ;; label = @12
block ;; label = @13
block ;; label = @14
local.get 3
br_table 10 (;@4;) 0 (;@14;) 1 (;@13;) 2 (;@12;) 3 (;@11;) 4 (;@10;) 5 (;@9;) 6 (;@8;) 7 (;@7;) 8 (;@6;) 13 (;@1;) 9 (;@5;) 10 (;@4;)
end
// case 1
end
// case 2
end
...
The x64 code jitted by V8 for the switch was reasonably compact, though not optimal:
00000000BD10FA43 83 460fb73423 movzxwl r14,[rbx+r12*1]
00000000BD10FA48 88 4183fe0b cmpl r14,0xB
00000000BD10FA4C 8c 0f8741020000 ja 00000000BD10FC93 // jmp to default case
00000000BD10FA52 92 4183ee01 subl r14,0x1
00000000BD10FA56 96 458bf6 movl r14,r14
00000000BD10FA59 99 4183fe0b cmpl r14,0xB
00000000BD10FA5D 9d 0f830d000000 jnc 00000000BD10FA70 // jmp to br_table default label
00000000BD10FA63 a3 4c8d1556030000 leaq r10,[rip+0x356]
00000000BD10FA6A aa 43ff24f2 jmp [r10+r14*8] // br_table jump
There were two range checks (a `cmp` followed by a conditional jump): the first for the switch default case, the second for the implementation of `br_table`.
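To make the two checks concrete, here is a rough C model of the x64 sequence listed above. The function name `dispatch_slot` and the negative return codes are made up for this sketch; the constants are read off the disassembly. It returns -1 when the first check fires, -2 when the second one fires, and otherwise the jump-table slot that would be used.

```c
#include <stdint.h>

/* Rough model of the jitted dispatch; names and return codes are illustrative. */
int dispatch_slot(uint16_t op) {
    uint32_t r14 = op;          /* movzxwl: zero-extending 16-bit load        */
    if (r14 > 0xB) return -1;   /* cmpl + ja: switch default case             */
    r14 -= 1;                   /* subl: op 0 wraps around to 0xFFFFFFFF      */
    if (r14 >= 0xB) return -2;  /* cmpl + jnc: br_table default label         */
    return (int)r14;            /* jmp [r10+r14*8]: index into the jump table */
}
```

In this model the second check can only be taken for `op == 0`, presumably because index 0 of the `br_table` shares its target with the default label; every value above 11 has already been filtered out by the first check.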
But now, with this change, I see this code being generated (see WAT file attached: F12107026: emcc-0-base.NEW.wat <https://reviews.llvm.org/F12107026>):
(func (;6;) (type 6) (param i32)
(local i32 i32 i32 i32 i64 i32)
...
block ;; label = @1
block ;; label = @2
block ;; label = @3
block ;; label = @4
block ;; label = @5
block ;; label = @6
block ;; label = @7
block ;; label = @8
block ;; label = @9
block ;; label = @10
block ;; label = @11
block ;; label = @12
block ;; label = @13
block ;; label = @14
local.get 4
i32.load16_u
br_table 1 (;@13;) 0 (;@14;) 2 (;@12;) 3 (;@11;) 4 (;@10;) 5 (;@9;) 6 (;@8;) 7 (;@7;) 8 (;@6;) 9 (;@5;) 13 (;@1;) 10 (;@4;) 12 (;@2;)
end
i32.const 0
local.set 3
br 10 (;@3;)
end
i32.const 10
local.set 3
br 9 (;@3;)
end
i32.const 1
local.set 3
br 8 (;@3;)
end
i32.const 2
local.set 3
br 7 (;@3;)
end
i32.const 3
local.set 3
br 6 (;@3;)
end
i32.const 4
local.set 3
br 5 (;@3;)
end
i32.const 5
local.set 3
br 4 (;@3;)
end
i32.const 6
local.set 3
br 3 (;@3;)
end
i32.const 7
local.set 3
br 2 (;@3;)
end
i32.const 8
local.set 3
br 1 (;@3;)
end
i32.const 9
local.set 3
end
loop ;; label = @3
block ;; label = @4
block ;; label = @5
block ;; label = @6
block ;; label = @7
block ;; label = @8
block ;; label = @9
block ;; label = @10
block ;; label = @11
block ;; label = @12
block ;; label = @13
block ;; label = @14
block ;; label = @15
block ;; label = @16
block ;; label = @17
block ;; label = @18
block ;; label = @19
block ;; label = @20
block ;; label = @21
block ;; label = @22
block ;; label = @23
block ;; label = @24
block ;; label = @25
local.get 3
br_table 0 (;@25;) 1 (;@24;) 2 (;@23;) 3 (;@22;) 4 (;@21;) 5 (;@20;) 6 (;@19;) 7 (;@18;) 8 (;@17;) 9 (;@16;) 10 (;@15;) 10 (;@15;)
end
// ... case 0
...
br_table 19 (;@5;) 20 (;@4;) 10 (;@14;) 11 (;@13;) 12 (;@12;) 13 (;@11;) 14 (;@10;) 15 (;@9;) 16 (;@8;) 17 (;@7;) 23 (;@1;) 18 (;@6;) 22 (;@2;)
end
// ... case 1
br_table 18 (;@5;) 19 (;@4;) 9 (;@14;) 10 (;@13;) 11 (;@12;) 12 (;@11;) 13 (;@10;) 14 (;@9;) 15 (;@8;) 16 (;@7;) 22 (;@1;) 17 (;@6;) 21 (;@2;)
end
...
There is now a first `br_table` that jumps indirectly to a stub of the form `i32.const X`, `local.set 3`, `br Y`, which in turn jumps to the actual code for the case branches, and each of those ends with another strange `br_table`.
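The control-flow shape described above can be sketched in C roughly as follows. This is only one reading of the WAT, not compiler output: the two opcodes, the case bodies, and the names `run_threaded` and `DISPATCH_NEXT` are assumptions made for illustration.

```c
/* Hypothetical opcodes; any other value takes the default case. */
enum { OP_INC = 0, OP_DEC = 1 };

/* Tail dispatch copied into every case: load the next op, jump to its case. */
#define DISPATCH_NEXT()                    \
    do {                                   \
        if (++i >= len) return result;     \
        switch (ops[i]) {                  \
        case OP_INC: goto do_inc;          \
        case OP_DEC: goto do_dec;          \
        default:     goto do_default;      \
        }                                  \
    } while (0)

int run_threaded(const unsigned short *ops, int len) {
    int result = 0, i = 0, state;
    if (len <= 0) return result;

    /* First br_table: select a stub that only records a case number. */
    switch (ops[0]) {
    case OP_INC: state = 0; break;  /* like: i32.const X; local.set 3; br Y */
    case OP_DEC: state = 1; break;
    default:     state = 2; break;
    }

    /* Loop header: second br_table, indexed by the recorded case number. */
    switch (state) {
    case 0:  goto do_inc;
    case 1:  goto do_dec;
    default: goto do_default;
    }

do_inc:     result += 1; DISPATCH_NEXT();
do_dec:     result -= 1; DISPATCH_NEXT();
do_default: result = 0;  DISPATCH_NEXT();
}
```

The result is the same as for a plain switch in a loop, but the first element goes through two table jumps (stub, then loop header), and every case carries its own copy of the dispatch.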
Unsurprisingly, the jitted native code is also much more convoluted. As a result, my small benchmark is about 34% slower (33.9 sec vs. 25.2 sec).
I see that this issue disappears if I comment out `addPass(createWebAssemblyFixBrTableDefaults());` in `WebAssemblyPassConfig::addInstSelector()` and reintroduce `Ops.push_back(DAG.getBasicBlock(MBBs[0]))` in `WebAssemblyTargetLowering::LowerBR_JT()`.
Maybe there is something wrong with the configuration on my machine, but I double-checked by reinstalling the latest Emscripten, and I can reliably reproduce the problem.
Could you take a look?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80863/new/
https://reviews.llvm.org/D80863