[llvm-bugs] [Bug 34694] New: Improve lshr on <16 x i8> with variable shift amounts on X86
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Sep 21 13:41:01 PDT 2017
https://bugs.llvm.org/show_bug.cgi?id=34694
Bug ID: 34694
Summary: Improve lshr on <16 x i8> with variable shift amounts
on X86
Product: libraries
Version: 5.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: llvm at henning-thielemann.de
CC: llvm-bugs at lists.llvm.org
I want to shift byte/character vectors right by variable amounts.
In the first case all bytes shall be shifted by the same amount;
in the second case each byte shall be shifted by its own individual amount:
define <16 x i8> @shift_vector_uniform(<16 x i8>, i8) {
_L1:
  %nv = insertelement <1 x i8> undef, i8 %1, i32 0
  %shift = shufflevector <1 x i8> %nv, <1 x i8> undef, <16 x i32> zeroinitializer
  %v = lshr <16 x i8> %0, %shift
  ret <16 x i8> %v
}
define <16 x i8> @shift_vector_mixed(<16 x i8>, <16 x i8>) {
_L1:
  %v = lshr <16 x i8> %0, %1
  ret <16 x i8> %v
}
X86 has no instructions for shifting byte vectors, so LLVM's backend must
emulate them with shifts of i16 vectors. It currently does so using three
shifts: psrlw $4, %xmm0; psrlw $2, %xmm0; psrlw $1, %xmm0.
The resulting code is pretty lengthy. I have not tried it, but I think we can
do better.
The first case should need only one psrlw plus an appropriate mask. We could
build the required mask with a scalar shift and then broadcast it to all
vector elements.
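A sketch of that idea using SSE2 intrinsics (this is only an illustration of the proposed lowering, not LLVM's actual codegen; `lshr_bytes_uniform` is a hypothetical helper name):

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical sketch of the proposed uniform-shift lowering: one
 * psrlw (_mm_srl_epi16) shifts all 16-bit lanes at once, and a
 * broadcast mask clears the bits that leaked in from the
 * neighboring byte of each lane. */
static __m128i lshr_bytes_uniform(__m128i v, uint8_t amt) {
    __m128i count = _mm_cvtsi32_si128(amt);    /* shift count in low xmm bits */
    __m128i shifted = _mm_srl_epi16(v, count); /* shift as 8 x i16 */
    uint8_t m = (uint8_t)(0xFFu >> amt);       /* scalar shift builds the mask */
    __m128i mask = _mm_set1_epi8((char)m);     /* broadcast mask to all bytes */
    return _mm_and_si128(shifted, mask);       /* drop cross-byte bits */
}
```

For example, shifting a vector of 0x81 bytes right by 1 yields 0x40 in every byte: the 16-bit shift produces 0x40C0 per lane, and the broadcast 0x7F mask removes the bit carried over from the adjacent byte.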
We could implement the second case using two psrlw's: one applied to the even
and one to the odd vector elements, and then blend the results.
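A scalar model of that even/odd decomposition on a single 16-bit lane (an illustrative sketch only, assuming little-endian lanes with the even byte in the low half; `lshr_pair` is a made-up name):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the proposed even/odd split: mask off one byte of
 * the 16-bit lane, shift the whole lane right by that byte's amount,
 * and the correct per-byte result lands back in that byte (a logical
 * right shift cannot carry bits upward into the other byte). Blending
 * the two partial results recombines the lane. */
static uint16_t lshr_pair(uint16_t lane, unsigned even_amt, unsigned odd_amt) {
    uint16_t even = (uint16_t)((lane & 0x00FFu) >> even_amt); /* even-byte result in low byte */
    uint16_t odd  = (uint16_t)((lane & 0xFF00u) >> odd_amt);  /* odd-byte result in high byte */
    return (uint16_t)((even & 0x00FFu) | (odd & 0xFF00u));    /* blend the halves */
}
```

E.g. for the lane 0xF0F0 with even amount 4 and odd amount 2, the even half gives 0x000F and the odd half gives 0x3C00, blending to 0x3C0F — each byte shifted by its own amount.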