[PATCH] D47113: [CVP] Teach CorrelatedValuePropagation to reduce the width of lshr instruction.
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 24 07:27:03 PDT 2018
dmgreen added a comment.
So yes, I ran some quick benchmarks and I believe this will cause regressions in some circumstances. In one case I looked at (which is running under our special LTO pipeline and may be a little difficult to replicate), we start off with this:
%shr = lshr i32 %sub, 6
%arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr
This is turned into:
%shr.lhs.trunc = trunc i32 %sub to i16
%shr.rhs.trunc = trunc i32 6 to i16
%shr = lshr i16 %shr.lhs.trunc, %shr.rhs.trunc
%shr.zext = zext i16 %shr to i32
%arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
Which gets turned right back into:
%shr = lshr i32 %sub, 6
%shr.zext = and i32 %shr, 1023
%arrayidx11 = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
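For anyone wondering where the 1023 comes from, here is a small Python sketch of the three forms above (not LLVM code; the function names are made up for illustration). It models the i32 and i16 values as plain integers and checks that the trunc/lshr/zext sequence always equals the lshr plus And 1023 form, since 0xFFFF >> 6 == 1023. It also assumes the narrowing was only done because %sub was known to fit in 16 bits, in which case the And is redundant with respect to the original lshr.

```python
MASK32 = 0xFFFFFFFF  # model an i32 as an unsigned 32-bit integer


def original(sub: int) -> int:
    """%shr = lshr i32 %sub, 6"""
    return (sub & MASK32) >> 6


def narrowed(sub: int) -> int:
    """trunc i32 -> i16, lshr i16 by 6, zext i16 -> i32."""
    lhs = sub & 0xFFFF   # trunc i32 %sub to i16
    shr = lhs >> 6       # lshr i16 %shr.lhs.trunc, 6
    return shr           # zext i16 %shr to i32 (value is unchanged)


def recombined(sub: int) -> int:
    """lshr i32 %sub, 6 followed by and i32 %shr, 1023."""
    return ((sub & MASK32) >> 6) & 1023


# The narrowed and recombined forms agree for every 32-bit input.
for x in (0, 63, 64, 0xFFFF, 0x10000, 0x12345678, 0xFFFFFFFF):
    assert narrowed(x) == recombined(x)

# When %sub fits in 16 bits, the mask changes nothing.
for x in range(0, 1 << 16, 257):
    assert original(x) == recombined(x)
```

The second loop is the point: the mask only matters for inputs outside the 16-bit range, which is exactly the range information that is no longer visible once the sequence has been folded back to i32.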
I think the extra And node will, under most circumstances, be removed during isel. But here it is part of a loop, and the extra cost pushes us over the loop unroll threshold, so the loop is no longer fully unrolled.
Another case on v6m (thumb1only) looks more like a simple extra instruction in the final assembly. In either case the extra And 1023 seems to be causing nothing but trouble.
I'm running some more benchmarks and will see what happens on other cores/benchmarks.
Repository:
rL LLVM
https://reviews.llvm.org/D47113