[PATCH] D130258: [AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline

Thu Jul 21 07:44:22 PDT 2022

foad added a comment.

I've looked at the effect of this patch on our corpus of 10000 graphics shaders (compiled for gfx1030 using a frontend that includes CodeSinking in the IR optimization pipeline).

Out of a total of ~8 million instructions, the instruction frequency deltas look like this:

  -89	2935	2846	v_lshl_add_u32
  -9	3447	3438	v_mad_u64_u32
  -6	59035	59029	s_load_dwordx4
  -4	115017	115013	s_clause
  -3	56723	56720	s_load_dwordx8
  -3	61717	61714	s_mov_b64
  -2	1332	1330	v_cmpx_gt_u32
  -1	2609	2608	v_cmpx_eq_u32
  1	11908	11909	v_cmp_eq_u32
  1	257	258	v_ceil_f32
  1	5859	5860	v_or_b32
  2	1468	1470	v_and_or_b32
  2	25853	25855	v_and_b32
  2	2924	2926	v_cmp_gt_u32
  3	15607	15610	s_and_saveexec_b64
  3	6897	6900	s_load_dwordx16
  4	1211934	1211938	v_mul_f32
  5	14578	14583	s_cbranch_vccnz
  5	23221	23226	s_branch
  5	301760	301765	s_waitcnt
  5	34280	34285	s_and_b64
  5	9682	9687	v_cmp_ngt_f32
  5	98695	98700	s_buffer_load_dword
  9	12325	12334	v_mul_lo_u32
  24	760	784	v_subrev_nc_u32
  29	187496	187525	v_mov_b32
  101	19167	19268	v_lshlrev_b32
  144	43422	43566	v_add_nc_u32

For each instruction the 3 numbers are: (delta from A to B), (A = # of occurrences before this patch), (B = # of occurrences after this patch).
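
As an illustration of how a table like this can be produced, here is a minimal Python sketch. It assumes the per-opcode counts have already been collected into two mappings; the function name opcode_deltas and the unchanged s_nop entry are hypothetical, and this is not the script that generated the numbers above. Ties between equal deltas are broken by opcode name here, so the row order may differ from the listing.

```python
from collections import Counter

def opcode_deltas(before, after):
    """Return (delta, before, after, opcode) rows for opcodes whose count changed."""
    rows = []
    for op in set(before) | set(after):
        a = before.get(op, 0)
        b = after.get(op, 0)
        if a != b:  # unchanged opcodes are omitted, as in the listing above
            rows.append((b - a, a, b, op))
    # Sort by delta, then by opcode name to make the order deterministic.
    rows.sort(key=lambda r: (r[0], r[3]))
    return rows

# Three rows taken from the table above, plus one made-up unchanged opcode
# (s_nop, hypothetical count) to show that unchanged entries are dropped.
before = Counter({"v_lshl_add_u32": 2935, "v_lshlrev_b32": 19167,
                  "v_add_nc_u32": 43422, "s_nop": 10})
after = Counter({"v_lshl_add_u32": 2846, "v_lshlrev_b32": 19268,
                 "v_add_nc_u32": 43566, "s_nop": 10})

for delta, a, b, op in opcode_deltas(before, after):
    print(f"{delta}\t{a}\t{b}\t{op}")
```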

So there are maybe 100 shift/add pairs that are no longer combined into v_lshl_add_u32 because the shift and the add end up in different basic blocks. I have looked at a couple of examples and I don't think there is anything systematically wrong there; it is just bad luck (and GlobalISel might fix it anyway by being able to pattern match across basic blocks). Apart from that, the diffs seem to be way down in the noise.
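
To make the shift/add point concrete, the sketch below models the three instructions involved as 32-bit integer functions in Python. The operand order and the masking of the shift amount to 5 bits reflect my reading of the ISA semantics and should be treated as assumptions; the input values are arbitrary. It only shows that the fused form and the separate shift plus add compute the same value, so the regression is one extra instruction per missed pair rather than a correctness issue.

```python
MASK32 = 0xFFFFFFFF

def v_lshlrev_b32(shift, value):
    # "rev" form: the shift amount is the first operand (assumed operand order).
    return (value << (shift & 31)) & MASK32

def v_add_nc_u32(a, b):
    # 32-bit add with no carry-out; wraps modulo 2**32.
    return (a + b) & MASK32

def v_lshl_add_u32(value, shift, addend):
    # Fused form: (value << shift) + addend in a single instruction.
    return ((value << (shift & 31)) + addend) & MASK32

# Arbitrary example values: scale an index by 4 and add a base offset.
index, base = 5, 0x1000
fused = v_lshl_add_u32(index, 2, base)
split = v_add_nc_u32(v_lshlrev_b32(2, index), base)
assert fused == split == 0x1014
```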

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130258/new/

https://reviews.llvm.org/D130258