[PATCH] D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate
Guozhi Wei via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 1 16:22:20 PDT 2022
Carrot added a comment.
I still haven't been able to reproduce it in plain mode.
But now I understand the problem more clearly. It looks like this patch exposes an inefficiency in later optimizations.
After the LICM pass, the two versions of the IR differ significantly, but the BB we are interested in is the same in both:
%236 = fmul float %162, %6, !dbg !2160
%237 = mul nsw i64 %19, %5, !dbg !2161
%238 = getelementptr inbounds float, float* %4, i64 %237, !dbg !2162
%239 = load float, float* %238, align 4, !dbg !2163
%240 = fadd float %239, %236, !dbg !2163
store float %240, float* %238, align 4, !dbg !2163
%241 = fmul float %163, %6, !dbg !2164
%242 = or i64 %19, 1, !dbg !2165 ; *
%243 = mul nsw i64 %242, %5, !dbg !2166
%244 = getelementptr inbounds float, float* %4, i64 %243, !dbg !2167
%245 = load float, float* %244, align 4, !dbg !2168
%246 = fadd float %245, %241, !dbg !2168
store float %246, float* %244, align 4, !dbg !2168
%247 = fmul float %164, %6, !dbg !2169
%248 = or i64 %19, 2, !dbg !2170 ; *
%249 = mul nsw i64 %248, %5, !dbg !2171
%250 = getelementptr inbounds float, float* %4, i64 %249, !dbg !2172
%251 = load float, float* %250, align 4, !dbg !2173
%252 = fadd float %251, %247, !dbg !2173
store float %252, float* %250, align 4, !dbg !2173
%253 = fmul float %165, %6, !dbg !2174
%254 = or i64 %19, 3, !dbg !2175 ; *
%255 = mul nsw i64 %254, %5, !dbg !2176
%256 = getelementptr inbounds float, float* %4, i64 %255, !dbg !2177
...
Note the or instructions marked with *: together with the mul/GEP instructions that follow them, they are used to access consecutive array elements.
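To see why an or can stand in for an add here, the following Python sketch is a toy model, not LLVM code. It assumes that %19 is a multiple of 4 (for example, an induction variable of a loop unrolled by 4); the snippet above does not show where %19 comes from. Under that assumption, or %19, k for k < 4 equals %19 + k, because x | k == x + k exactly when x and k share no set bits. The helper names are hypothetical.

```python
def or_index(x, k):
    """Mirror `or i64 %x, k` on plain integers."""
    return x | k


def is_disjoint_or(x, k):
    """x | k == x + k holds exactly when x and k share no set bits."""
    return x & k == 0


def lane_indices(x, stride, lanes=4):
    """Values fed to the GEPs: (x | k) * stride for each lane k.

    Lane 0 uses x directly (like `mul nsw i64 %19, %5`); since x | 0 == x,
    the same formula covers it.
    """
    return [or_index(x, k) * stride for k in range(lanes)]


# Hypothetical values: x stands for %19 (assumed multiple of 4), stride for %5.
print(lane_indices(8, 1))    # [8, 9, 10, 11]
print(is_disjoint_or(8, 3))  # True
print(is_disjoint_or(9, 1))  # False: 9 | 1 == 9, not 10
```

The disjoint-bits condition is what lets an analysis treat the or operands as x + 1, x + 2, x + 3 and therefore see the lanes as adjacent.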
In the old version of the IR, the loop header contains the same group of or instructions. GVNPass detects this, deletes the or instructions in our BB, and reuses the results of the ones in the loop header. SLPVectorizer can then still tell that the GEP instructions compute consecutive memory addresses, and it vectorizes this BB.
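The reuse GVN performs in the old version is plain elimination of fully redundant instructions. The following Python sketch is a toy value-numbering model, not GVNPass itself. It assumes the blocks are listed so that each one is dominated by all earlier ones (the loop header before our BB). Instructions are (name, opcode, operands) tuples, and the header names %h1/%h2 are hypothetical. Unlike real GVN, it does not canonicalize commutative operands or reason about memory.

```python
def value_number(blocks):
    """Remove instructions that recompute an already-available expression.

    blocks: list of (label, [(name, opcode, operands), ...]); every block is
    assumed to be dominated by all blocks before it in the list.
    Returns (kept_blocks, replacements); replacements maps each removed
    name to the earlier name that now supplies its value.
    """
    table = {}    # (opcode, operands) -> name of the first (leader) instruction
    replace = {}  # redundant name -> leader name
    kept = []
    for label, insts in blocks:
        out = []
        for name, opcode, operands in insts:
            # Rewrite operands that refer to already-removed instructions.
            ops = tuple(replace.get(o, o) for o in operands)
            key = (opcode, ops)
            if key in table:
                replace[name] = table[key]  # fully redundant: reuse the leader
            else:
                table[key] = name
                out.append((name, opcode, ops))
        kept.append((label, out))
    return kept, replace


header = ("header", [("%h1", "or", ("%19", "1")), ("%h2", "or", ("%19", "2"))])
body = ("bb", [("%242", "or", ("%19", "1")),
               ("%243", "mul", ("%242", "%5")),
               ("%248", "or", ("%19", "2"))])
kept, replacements = value_number([header, body])
print(replacements)  # {'%242': '%h1', '%248': '%h2'}
print(kept[1])       # ('bb', [('%243', 'mul', ('%h1', '%5'))])
```

After this rewrite the mul still sees an or of %19 as its operand, which is why the later SLP analysis keeps working in the old version.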
In the new version of the IR, the loop header doesn't contain those or instructions. Instead, one of the predecessors of this BB contains them:
BB1:
br i1 %cond, label %BB2, label %BB3
BB2:
...
br label %BBX
BB3:
...
%179 = or i64 %24, 1
...
br label %BBX
BBX:
; our interesting BB
...
%242 = or i64 %24, 1, !dbg !2165
...
GVN then inserts the or instructions into BB2 and replaces the ones in BBX with PHIs:
BB1:
br i1 %cond, label %BB2, label %BB3
BB2:
...
%161 = or i64 %24, 1
...
br label %BBX
BB3:
...
%179 = or i64 %24, 1
...
br label %BBX
BBX:
; our interesting BB
%245 = phi i64 [ %161, %BB2 ], [ %179, %BB3 ]
...
%296 = mul nsw i64 %245, %5, !dbg !2155
%297 = getelementptr inbounds float, float* %4, i64 %296, !dbg !2156
...
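What GVN does here is partial redundancy elimination (PRE). The or in BBX is already computed on the path through BB3, so GVN also makes it available on the BB2 path and merges the two copies with a PHI. The following Python toy sketches that step. The data layout and the %or.pre.BB2 naming are hypothetical. The sketch only inserts when exactly one predecessor lacks the value, which we believe mirrors a restriction in GVN's scalar PRE. It ignores the profitability and safety checks the real pass makes.

```python
def pre_at_join(expr, preds):
    """Make `expr` available at the end of every predecessor of a join block.

    expr:  (opcode, operands) computed in the join block.
    preds: dict pred_label -> dict mapping expr -> name available at its end.
    Returns the PHI's incoming list [(value, pred_label), ...], or None when
    the sketch declines (value missing from zero or from several preds).
    Mutates `preds` to record the inserted copy.
    """
    missing = [label for label, avail in preds.items() if expr not in avail]
    if len(missing) != 1 or len(preds) < 2:
        return None
    label = missing[0]
    # Insert a copy at the end of the one predecessor that lacks the value.
    preds[label][expr] = f"%{expr[0]}.pre.{label}"
    # Build the PHI that replaces the original instruction in the join block.
    return [(avail[expr], lbl) for lbl, avail in preds.items()]


or_expr = ("or", ("%24", "1"))
preds = {"BB2": {}, "BB3": {or_expr: "%179"}}
phi = pre_at_join(or_expr, preds)
print(phi)  # [('%or.pre.BB2', 'BB2'), ('%179', 'BB3')]
```

Both incoming values of the resulting PHI compute the same or of %24, but the PHI itself is a new, distinct value.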
Although all the PHI operands have the same value, SLPVectorizer doesn't recognize this. It therefore can't figure out that the GEPs compute consecutive memory addresses, and it fails to vectorize this BB.
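The failure can be reproduced in a toy model. The following Python sketch is not SLPVectorizer's actual analysis. It is a hypothetical decomposition of each index into a base plus a constant offset, and it assumes that every or has disjoint bits, so it acts like an add. It also assumes that lane 0 in BBX indexes with %24 directly, as %19 does in the first snippet. If the PHI is treated as opaque, %24 and %245 get different bases and cannot be proven consecutive. Looking through a PHI whose incoming values all decompose identically, which is the SLPVectorizer option, makes the check succeed.

```python
def decompose(value, defs, look_through_phis=False):
    """Return (base, offset) with value == base + offset in the toy model.

    defs maps a name to ("or", x, k) or ("phi", [incoming values]); names
    not in defs (e.g. %24) are opaque. Assumption: every `or` has disjoint
    bits, so it behaves like an add.
    """
    d = defs.get(value)
    if d is None:
        return value, 0
    if d[0] == "or":
        base, off = decompose(d[1], defs, look_through_phis)
        return base, off + d[2]
    if d[0] == "phi" and look_through_phis:
        parts = {decompose(v, defs, True) for v in d[1]}
        if len(parts) == 1:  # all incoming values are the same base + offset
            return parts.pop()
    return value, 0  # opaque: the value is its own base


def consecutive(indices, defs, look_through_phis=False):
    """True if indices[i] == indices[0] + i for every lane i."""
    parts = [decompose(i, defs, look_through_phis) for i in indices]
    base0, off0 = parts[0]
    return all(b == base0 and o == off0 + lane
               for lane, (b, o) in enumerate(parts))


defs = {
    "%161": ("or", "%24", 1),           # copy GVN inserted in BB2
    "%179": ("or", "%24", 1),           # original copy in BB3
    "%245": ("phi", ["%161", "%179"]),  # PHI in BBX
}
print(consecutive(["%24", "%245"], defs))        # False: the PHI is opaque
print(consecutive(["%24", "%245"], defs, True))  # True
```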
There are 3 potential solutions:
- GVNPass: if the operands of a newly created PHI all have the same value, hoist a single copy into the common dominator and delete the PHI and its operands.
- InstCombinePass: do the same thing, but as a cleanup in a later pass.
- SLPVectorizerPass: teach it to look through PHIs. When a PHI's operands have the same value, it can extract more useful information from them.
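The first two options perform the same rewrite and differ only in which pass runs it. The following Python sketch shows that rewrite on a hypothetical data layout. Blocks are lists of instruction names ending in a terminator, and defs maps names to (opcode, operands). It assumes the operands (%24 here) are available in the dominator and that the instruction is safe to execute there unconditionally. That holds for an or but would need checking in general.

```python
def fold_identical_phi(incoming, defs, blocks, dominator):
    """Replace a PHI whose incoming values compute the same expression.

    incoming: [(value, pred_label), ...] of the PHI.
    defs:     name -> (opcode, operands) for instructions.
    blocks:   label -> list of instruction names, terminator last.
    Returns the name that should replace every use of the PHI (and of the
    removed copies), or None if the sketch declines.
    """
    values = [v for v, _ in incoming]
    if len(set(values)) == 1:
        return values[0]          # trivially redundant PHI
    if len(set(values)) != len(values) or any(v not in defs for v in values):
        return None               # toy assumption: one distinct copy per pred
    if len({defs[v] for v in values}) != 1:
        return None               # incoming values compute different things
    keep = values[0]
    for value, pred in incoming:
        blocks[pred].remove(value)  # delete every per-predecessor copy
    term = len(blocks[dominator]) - 1
    blocks[dominator].insert(term, keep)  # single copy before the terminator
    return keep


defs = {"%161": ("or", ("%24", 1)), "%179": ("or", ("%24", 1))}
blocks = {"BB1": ["br"], "BB2": ["%161", "br"], "BB3": ["%179", "br"]}
new = fold_identical_phi([("%161", "BB2"), ("%179", "BB3")], defs, blocks, "BB1")
print(new)     # %161
print(blocks)  # {'BB1': ['%161', 'br'], 'BB2': ['br'], 'BB3': ['br']}
```

Because the kept copy now sits in BB1, it dominates both predecessors and BBX. Replacing uses of the PHI and of %179 with it is therefore valid, and BBX again sees a plain or of %24.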
Which approach do you think is better?
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D119965