[llvm-bugs] [Bug 38691] New: addus/subus-as-native IR can be defeated by optimizer
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Aug 24 10:53:12 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=38691
Bug ID: 38691
Summary: addus/subus-as-native IR can be defeated by optimizer
Product: new-bugs
Version: trunk
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: srj at google.com
CC: llvm-bugs at lists.llvm.org
The change in https://reviews.llvm.org/D46179#1211902 (Lowering addus/subus
intrinsics to native IR) requires IR to be emitted in specific patterns in
order to produce paddus/psubus instructions. However, it is not hard to emit
IR that the LLVM optimizer rearranges so that those instructions are no longer
produced, and a much slower combination of instructions is generated instead.
For example, if user code assembles a vector from smaller pieces (e.g., on
sse2, by loading two 8-byte halves rather than a single 16-byte whole), code
might have formerly been something like:
```
; Do a saturating unsigned add on two <8 x i8> vectors by widening them to
; <16 x i8>, then take the low 8 lanes as the <8 x i8> result
%20 = load <8 x i8>
%21 = load <8 x i8>
%22 = shufflevector <8 x i8> %20, <8 x i8> undef, <16 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32
undef, i32 undef, i32 undef, i32 undef, i32 undef>
%23 = shufflevector <8 x i8> %21, <8 x i8> undef, <16 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32
undef, i32 undef, i32 undef, i32 undef, i32 undef>
%24 = call <16 x i8> @llvm.x86.sse2.paddus.b(<16 x i8> %22, <16 x i8> %23) #5
%25 = shufflevector <16 x i8> %24, <16 x i8> undef, <8 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
To work with this patch, I revised my project's code to emit inline code that
should pattern-match properly (based on the new self-tests for the IR),
something like:
```
%20 = load <8 x i8>
%21 = load <8 x i8>
%22 = shufflevector <8 x i8> %20, <8 x i8> undef, <16 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32
undef, i32 undef, i32 undef, i32 undef, i32 undef>
%23 = shufflevector <8 x i8> %21, <8 x i8> undef, <16 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32
undef, i32 undef, i32 undef, i32 undef, i32 undef>
; Here's the inline pattern that should match paddusb
%24 = add <16 x i8> %22, %23
%25 = icmp ugt <16 x i8> %22, %24
%26 = select <16 x i1> %25, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8
-1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16
x i8> %24
; Narrow back to the low 8 lanes
%27 = shufflevector <16 x i8> %26, <16 x i8> undef, <8 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
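As a scalar sanity check of why the add/icmp/select idiom is expected to map
to paddusb, the per-lane arithmetic can be modeled in a few lines of Python.
This is only an illustrative sketch: the function names are made up here, and
it models a single unsigned 8-bit lane, not anything LLVM does internally.

```python
def saturating_add_u8(a: int, b: int) -> int:
    """Reference semantics of one paddusb lane: clamp the sum to 255."""
    return min(a + b, 255)


def add_icmp_select_u8(a: int, b: int) -> int:
    """Per-lane model of the add / icmp ugt / select idiom (hypothetical name)."""
    wrapped = (a + b) & 0xFF                 # add on i8 wraps modulo 256
    overflowed = a > wrapped                 # icmp ugt %a, %sum: true only on wrap
    return 0xFF if overflowed else wrapped   # select: i8 -1 is 0xFF when unsigned


# Exhaustively compare both models over every pair of 8-bit inputs.
mismatches = [
    (a, b)
    for a in range(256)
    for b in range(256)
    if saturating_add_u8(a, b) != add_icmp_select_u8(a, b)
]
```

The compare works because an unsigned 8-bit add that wraps always yields a
result smaller than either operand, so `a > wrapped` is exactly the overflow
condition.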
And, in fact, if I don't run any optimizer passes, this IR produces paddusb as
expected. Unfortunately, the LLVM optimizer passes can rearrange it, e.g. into
a form something like this:
```
%20 = load <8 x i8>
%21 = load <8 x i8>
%22 = add <8 x i8> %20, %21
%23 = shufflevector <8 x i8> %22, <8 x i8> undef, <16 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32
undef, i32 undef, i32 undef, i32 undef, i32 undef>
%24 = icmp ult <8 x i8> %22, %20
%25 = shufflevector <8 x i1> %24, <8 x i1> undef, <16 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32
undef, i32 undef, i32 undef, i32 undef, i32 undef>
%26 = select <16 x i1> %25, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8
-1, i8 -1, i8 -1, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef,
i8 undef, i8 undef>, <16 x i8> %23
%27 = shufflevector <16 x i8> %26, <16 x i8> undef, <8 x i32> <i32 0, i32 1,
i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
...which no longer gets recognized as a pattern that produces paddusb, since
the select no longer refers directly to the result of the compare (but rather
to an intermediate shuffle).
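
The rearrangement itself looks value-preserving for the lanes that are
actually used, which is presumably why the optimizer is free to do it. The
Python sketch below models both shapes on plain lists, under stated
assumptions: `UNDEF`, `widen`, `original_form`, and `rearranged_form` are
hypothetical names introduced here, undef lanes are modeled as `None`, and
nothing in it reflects how the backend matcher is implemented.

```python
UNDEF = None  # stand-in for an IR undef lane


def widen(v8):
    """Model the <8 x T> to <16 x T> shufflevector: upper 8 lanes are undef."""
    return list(v8) + [UNDEF] * 8


def original_form(a8, b8):
    """Widen first, then add / icmp ugt / select on 16 lanes, then narrow."""
    out = []
    for a, b in zip(widen(a8), widen(b8)):
        if a is UNDEF or b is UNDEF:
            out.append(UNDEF)
            continue
        s = (a + b) & 0xFF
        out.append(0xFF if a > s else s)
    return out[:8]


def rearranged_form(a8, b8):
    """Add and icmp ult on 8 lanes, widen both results, select, then narrow."""
    sums = [(a + b) & 0xFF for a, b in zip(a8, b8)]
    cmps = [s < a for s, a in zip(sums, a8)]
    selected = [
        UNDEF if c is UNDEF else (0xFF if c else s)
        for c, s in zip(widen(cmps), widen(sums))
    ]
    return selected[:8]


a8 = [0, 1, 127, 128, 200, 254, 255, 255]
b8 = [0, 255, 1, 128, 100, 1, 0, 255]
same_result = original_form(a8, b8) == rearranged_form(a8, b8)
```

Both functions return the same low 8 lanes, so the difference is purely
structural: in the second shape the select's condition is a widened compare
rather than the compare itself, which is the detail the pattern matcher trips
over.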
Either the recognizer needs to be smarter about this, or there needs to be an
explicit way to emit code that is guaranteed to produce the expected
instruction(s).