[PATCH] D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate

Fri Apr 1 16:22:20 PDT 2022

Carrot added a comment.

I still failed to reproduce it in plain mode.

But now I understand the problem more clearly. It looks like this patch triggers an inefficiency in the optimization passes that run afterwards.

After the LICM pass, the two versions of the IR differ significantly, but the BB we are interested in is the same in both:

  %236 = fmul float %162, %6, !dbg !2160
  %237 = mul nsw i64 %19, %5, !dbg !2161
  %238 = getelementptr inbounds float, float* %4, i64 %237, !dbg !2162
  %239 = load float, float* %238, align 4, !dbg !2163
  %240 = fadd float %239, %236, !dbg !2163
  store float %240, float* %238, align 4, !dbg !2163
  %241 = fmul float %163, %6, !dbg !2164
  %242 = or i64 %19, 1, !dbg !2165                                             // *
  %243 = mul nsw i64 %242, %5, !dbg !2166
  %244 = getelementptr inbounds float, float* %4, i64 %243, !dbg !2167
  %245 = load float, float* %244, align 4, !dbg !2168
  %246 = fadd float %245, %241, !dbg !2168
  store float %246, float* %244, align 4, !dbg !2168
  %247 = fmul float %164, %6, !dbg !2169
  %248 = or i64 %19, 2, !dbg !2170                                             // *
  %249 = mul nsw i64 %248, %5, !dbg !2171
  %250 = getelementptr inbounds float, float* %4, i64 %249, !dbg !2172
  %251 = load float, float* %250, align 4, !dbg !2173
  %252 = fadd float %251, %247, !dbg !2173
  store float %252, float* %250, align 4, !dbg !2173
  %253 = fmul float %165, %6, !dbg !2174
  %254 = or i64 %19, 3, !dbg !2175                                            // *
  %255 = mul nsw i64 %254, %5, !dbg !2176
  %256 = getelementptr inbounds float, float* %4, i64 %255, !dbg !2177
  ...

Notice the or instructions marked with *. Together with the mul/GEP instructions that follow them, they are used to access consecutive array elements.
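
A minimal sketch of why or can stand in for add in these index computations. It assumes (this is not shown in the IR above) that the base index %19 is a multiple of 4, so its two low bits are zero; the helper name unrolled_indices is made up for illustration.

```python
def unrolled_indices(base: int, factor: int = 4) -> list[int]:
    """Indices produced by `or base, k` for k = 0 .. factor-1."""
    return [base | k for k in range(factor)]


# Assumption: base is a multiple of the (power-of-two) unroll factor.
# Then the low bits of base are zero and `base | k` equals `base + k`.
for base in range(0, 64, 4):
    assert unrolled_indices(base) == [base + k for k in range(4)]
```

If the base were not suitably aligned, the or results would no longer be base+1, base+2, base+3, which is why an analysis has to know about the alignment before treating them as consecutive offsets.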

In the old version of the IR, the loop header contains the same group of or instructions. GVNPass detects this, deletes the or instructions in the BB of interest, and reuses the results of the ones in the loop header. Later, SLPVectorizer can still see that the GEP instructions compute consecutive memory addresses, so it vectorizes this BB.

In the new version of the IR, the loop header doesn't contain those or instructions; instead, one of the predecessors of this BB contains them. It looks like this:

  BB1:
    br i1 %cond, label %BB2, label %BB3

  BB2:
    ...
    br label %BBX

  BB3:
    ...
    %179 = or i64 %24, 1
    ...
    br label %BBX

  BBX:
    // our interesting bb
    ...
    %242 = or i64 %24, 1, !dbg !2165
    ...

GVN then inserts an or instruction into BB2 and replaces the or instruction in BBX with a PHI:

  BB1:
    br i1 %cond, label %BB2, label %BB3

  BB2:
    ...
    %161 = or i64 %24, 1
    ...
    br label %BBX

  BB3:
    ...
    %179 = or i64 %24, 1
    ...
    br label %BBX

  BBX:
    // our interesting bb
    %245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
    ...
    %296 = mul nsw i64 %245, %5, !dbg !2155
    %297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
    ...
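
The transformation above can be pictured with a toy model of partial redundancy elimination. This is only a sketch, not GVNPass's actual implementation: the block/instruction representation, the function name eliminate_partial_redundancy, and the %pre.* value names are all hypothetical.

```python
def eliminate_partial_redundancy(blocks, preds, target, expr):
    """Toy PRE step on a dict mapping block name -> list of (name, expr).

    If expr is computed in `target` and is already available in at least one
    predecessor, insert it into the predecessors that lack it and replace the
    computation in `target` with a phi over the per-predecessor values.
    """
    available = {}
    for pred in preds[target]:
        for name, e in blocks[pred]:
            if e == expr:
                available[pred] = name
    if not available:
        return blocks  # not redundant on any path, nothing to do

    new_blocks = {b: list(insts) for b, insts in blocks.items()}
    incoming = []
    for pred in preds[target]:
        if pred not in available:
            name = f"%pre.{pred}"  # hypothetical name for the inserted copy
            new_blocks[pred].append((name, expr))
            available[pred] = name
        incoming.append((available[pred], pred))

    new_blocks[target] = [
        (name, ("phi", tuple(incoming))) if e == expr else (name, e)
        for name, e in new_blocks[target]
    ]
    return new_blocks


blocks = {
    "BB2": [],
    "BB3": [("%179", ("or", "%24", 1))],
    "BBX": [("%242", ("or", "%24", 1))],
}
result = eliminate_partial_redundancy(
    blocks, {"BBX": ["BB2", "BB3"]}, "BBX", ("or", "%24", 1)
)
```

In this model the or in BBX becomes a phi whose two incoming values are separate instructions that compute the same expression, which is the shape shown in the dump.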

Although all of the PHI's operands have the same value, SLPVectorizer doesn't recognize this, so it can't determine that the GEPs compute consecutive memory addresses, and it fails to vectorize this BB.
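
As a rough illustration of what gets lost, here is a toy "offset from base" query. It is not SLPVectorizer's real analysis: the function offset_from_base and the dictionary encoding are hypothetical, and it assumes that or base, k can be treated as base + k.

```python
def offset_from_base(value, defs, look_through_phi=False):
    """Return (base, offset) for `value`, or None if it cannot be determined.

    defs maps a value name to ("or", base, k) or ("phi", ((value, pred), ...)).
    Assumption: `or base, k` is treated as `base + k` (low bits of base zero).
    """
    expr = defs.get(value)
    if expr is None:
        return (value, 0)  # not defined here: treat it as a base of its own
    if expr[0] == "or" and isinstance(expr[2], int):
        return (expr[1], expr[2])
    if expr[0] == "phi" and look_through_phi:
        results = {offset_from_base(v, defs, True) for v, _pred in expr[1]}
        if len(results) == 1:  # every incoming value agrees
            return results.pop()
    return None


defs = {
    "%161": ("or", "%24", 1),
    "%179": ("or", "%24", 1),
    "%245": ("phi", (("%161", "BB2"), ("%179", "BB3"))),
}
blind = offset_from_base("%245", defs)                            # None
seen = offset_from_base("%245", defs, look_through_phi=True)      # ("%24", 1)
```

Without looking through the PHI the query gives up, so the index feeding the GEP has no known relation to %24; with the look-through it recovers the same base and offset as the original or.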

There are 3 potential solutions.

- GVNPass: if all operands of a new PHI have the same value, move the computation to a dominator and delete the PHI and its operands.
- InstCombinePass: do the same thing as described above, but as cleanup work in a later pass.
- SLPVectorizerPass: teach it to look into PHI operands. When a PHI's operands all have the same value, it can extract more useful information from them.
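
The first two options amount to the same rewrite, which can be sketched on a toy IR as follows. This is only an illustration under stated assumptions, not code from GVNPass or InstCombinePass: the function name and data layout are hypothetical, structural equality stands in for real value numbering, and the operands of the shared expression are assumed to be available in the dominator.

```python
def hoist_identical_phi_operands(blocks, dominator, target):
    """Replace PHIs in `target` whose incoming values all compute one expression.

    blocks maps a block name to a list of (name, expr) pairs, where expr is a
    tuple such as ("or", "%24", 1) or ("phi", ((value, pred), ...)).
    Assumption: the shared expression's operands are available in `dominator`;
    a real pass would have to prove this before hoisting.
    """
    defs = {name: expr for insts in blocks.values() for name, expr in insts}
    new_blocks = {b: list(insts) for b, insts in blocks.items()}
    for name, expr in blocks[target]:
        if expr[0] != "phi":
            continue
        incoming = {defs.get(value) for value, _pred in expr[1]}
        if len(incoming) != 1 or None in incoming:
            continue  # incoming values differ or are defined elsewhere
        (shared,) = incoming
        if shared[0] == "phi":
            continue  # do not try to hoist another phi
        # Compute the value once in the dominator, under the PHI's name,
        # so existing users of the PHI need no rewriting.
        new_blocks[dominator].append((name, shared))
        new_blocks[target] = [i for i in new_blocks[target] if i[0] != name]
    return new_blocks


blocks = {
    "BB1": [],
    "BB2": [("%161", ("or", "%24", 1))],
    "BB3": [("%179", ("or", "%24", 1))],
    "BBX": [
        ("%245", ("phi", (("%161", "BB2"), ("%179", "BB3")))),
        ("%296", ("mul", "%245", "%5")),
    ],
}
result = hoist_identical_phi_operands(blocks, "BB1", "BBX")
```

The sketch leaves %161 and %179 in place; deleting them, as the first option proposes, would additionally require rewriting their remaining users to the hoisted value.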

Which approach do you think is best?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119965/new/

https://reviews.llvm.org/D119965