[PATCH] D130258: [AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 21 07:44:22 PDT 2022
foad added a comment.
I've looked at the effect of this patch on our corpus of 10000 graphics shaders (compiled for gfx1030 using a frontend that includes CodeSinking in the IR optimization pipeline).
Out of a total of ~8 million instructions, the instruction frequency deltas look like this:
delta   before    after  instruction
  -89     2935     2846  v_lshl_add_u32
   -9     3447     3438  v_mad_u64_u32
   -6    59035    59029  s_load_dwordx4
   -4   115017   115013  s_clause
   -3    56723    56720  s_load_dwordx8
   -3    61717    61714  s_mov_b64
   -2     1332     1330  v_cmpx_gt_u32
   -1     2609     2608  v_cmpx_eq_u32
    1    11908    11909  v_cmp_eq_u32
    1      257      258  v_ceil_f32
    1     5859     5860  v_or_b32
    2     1468     1470  v_and_or_b32
    2    25853    25855  v_and_b32
    2     2924     2926  v_cmp_gt_u32
    3    15607    15610  s_and_saveexec_b64
    3     6897     6900  s_load_dwordx16
    4  1211934  1211938  v_mul_f32
    5    14578    14583  s_cbranch_vccnz
    5    23221    23226  s_branch
    5   301760   301765  s_waitcnt
    5    34280    34285  s_and_b64
    5     9682     9687  v_cmp_ngt_f32
    5    98695    98700  s_buffer_load_dword
    9    12325    12334  v_mul_lo_u32
   24      760      784  v_subrev_nc_u32
   29   187496   187525  v_mov_b32
  101    19167    19268  v_lshlrev_b32
  144    43422    43566  v_add_nc_u32
For each instruction the 3 numbers are: (delta from A to B), (A = # of occurrences before this patch), (B = # of occurrences after this patch).
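For readers who want to produce a similar comparison, here is a minimal sketch of how such delta rows could be computed. It assumes the per-instruction counts have already been gathered into two mappings; it is not the script used for the numbers above. The function name `frequency_deltas` is hypothetical, and the sample counts are copied from three rows of the table.

```python
from collections import Counter


def frequency_deltas(before, after):
    """Return (delta, a, b, name) rows for each instruction whose count changed.

    `before` and `after` map instruction names to occurrence counts.
    Rows are sorted by delta, then by the remaining tuple fields.
    """
    rows = []
    for name in set(before) | set(after):
        a = before.get(name, 0)  # occurrences before the patch
        b = after.get(name, 0)   # occurrences after the patch
        if a != b:               # unchanged instructions are omitted
            rows.append((b - a, a, b, name))
    return sorted(rows)


# Sample counts taken from three rows of the table above.
before = Counter({"v_lshl_add_u32": 2935, "v_lshlrev_b32": 19167, "v_add_nc_u32": 43422})
after = Counter({"v_lshl_add_u32": 2846, "v_lshlrev_b32": 19268, "v_add_nc_u32": 43566})

for delta, a, b, name in frequency_deltas(before, after):
    print(f"{delta:>5}  {a:>7}  {b:>7}  {name}")
```

Instructions that appear on only one side are treated as having a count of zero on the other, so newly introduced or entirely eliminated opcodes still show up in the output.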
So there are maybe 100 shift/add pairs that are no longer combined into v_lshl_add_u32 because the shift and the add end up in different basic blocks. I have looked at a couple of examples and I don't think there is anything systematically wrong there; it is just bad luck (and GlobalISel might fix it anyway by being able to pattern match across basic blocks). Apart from that, the diffs seem to be way down in the noise.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130258/new/
https://reviews.llvm.org/D130258