[PATCH] [AArch64] Support ISD::SIGN_EXTEND_INREG
Ana Pazos
apazos at codeaurora.org
Wed Jan 8 18:45:43 PST 2014
Thanks Jiangning,
It looks like the failing patterns are corrected in this patch.
I will rerun the benchmarks with -mfpu=neon using this patch and let you
know if I encounter failures.
Your solution is fine, but I think doing it at the time of lowering
sext_inreg would be simpler, since you do not have to re-discover that the
pattern SHL/SHRA is doing a sext_inreg.
Thanks,
Ana.
From: Jiangning Liu [mailto:liujiangning1 at gmail.com]
Sent: Tuesday, January 07, 2014 11:22 PM
To: Ana Pazos
Cc: llvm-commits at cs.uiuc.edu for LLVM; mcrosier at codeaurora.org
Subject: Re: [PATCH] [AArch64] Support ISD::SIGN_EXTEND_INREG
Hi Ana,
I'm attaching a new patch and now we can generate more SSHLL instructions.
Refer to my test cases, please!
However, there is a gap in the lowering of shuffle_vector, so we cannot generate
the uzp1 instruction yet.
Kevin is following up and will upstream a separate patch to generate
uzp1, and he will also add more CHECK lines to my test case to capture this
instruction.
Thanks,
-Jiangning
2014/1/8 Jiangning Liu <liujiangning1 at gmail.com>
Hi Ana,
Having thought more about this optimization, I now think the second patch I
gave is incorrect.
Basically, we have the following three cases to cover:
1) sext_inreg(v2i16, v2i8)
sshll v0.8h, v0.8b, #0
shuffle_vector(<8xi16>, <8xi16>, <i32, i32> (0, 4))
2) sext_inreg(v4i16, v4i8)
sshll v0.8h, v0.8b, #0
shuffle_vector(<8xi16>, <8xi16>, <i32, i32, i32, i32> (0, 2, 4, 6))
3) sext_inreg(v2i32, v2i16)
sshll v0.4s, v0.4h, #0
shuffle_vector(<4xi32>, <4xi32>, <i32, i32> (0, 2))
All other cases should have been covered by the 1st patch.
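
The three cases above can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the patch: the CASES table and the function name sshll_then_shuffle are hypothetical, and it assumes a 64-bit register with little-endian lane order. The shuffle masks are the ones listed above.

```python
import struct

# Hypothetical table for the three cases:
# case name -> (struct code of the narrow lane, narrow lane count, shuffle mask)
CASES = {
    "v2i16<-v2i8": ("b", 8, (0, 4)),
    "v4i16<-v4i8": ("b", 8, (0, 2, 4, 6)),
    "v2i32<-v2i16": ("h", 4, (0, 2)),
}


def sshll_then_shuffle(reg64: bytes, case: str):
    """Model 'sshll #0' on a 64-bit register followed by the lane shuffle."""
    fmt, count, mask = CASES[case]
    # Reading the lanes as signed integers plays the role of sshll #0:
    # every narrow lane becomes its sign-extended value.
    widened = struct.unpack(f"<{count}{fmt}", reg64)
    # The shuffle keeps only the lanes that held real data.
    return [widened[i] for i in mask]


# Two i8 values, 0x80 and 0x7F, held in the low byte of each 32-bit lane.
print(sshll_then_shuffle(bytes([0x80, 0, 0, 0, 0x7F, 0, 0, 0]), "v2i16<-v2i8"))
# prints [-128, 127]
```

The values are returned as signed Python integers, so a negative result corresponds to a lane whose upper bits were filled with ones.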
Thanks,
-Jiangning
2014/1/7 Jiangning Liu <liujiangning1 at gmail.com>
Ana,
I see your point now.
Actually, with my patch sign_extend_inreg(v8i16, v8i8) can generate
SXTL (8b->8h), as shown by my test case below:
define <8 x i8> @test_sext_inreg_v8i8i16(<8 x i8> %v1, <8 x i8> %v2)
nounwind readnone {
; CHECK-LABEL: test_sext_inreg_v8i8i16
; CHECK: sshll v0.8h, v0.8b, #0
; CHECK: sshll v1.8h, v1.8b, #0
%1 = sext <8 x i8> %v1 to <8 x i16>
%2 = sext <8 x i8> %v2 to <8 x i16>
%3 = shufflevector <8 x i16> %1, <8 x i16> %2, <8 x i32> <i32 0, i32 2,
i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%4 = trunc <8 x i16> %3 to <8 x i8>
ret <8 x i8> %4
}
And sign_extend_inreg(v2i64, v2i32) doesn't exist, because we always use
sign_extend(v2i64, v2i32) to handle it, as shown by the test case below:
define <2 x i32> @test_sext_inreg_v2i32i64(<2 x i32> %v1, <2 x i32> %v2)
nounwind readnone {
; CHECK-LABEL: test_sext_inreg_v2i32i64
; CHECK: sshll v0.2d, v0.2s, #0
; CHECK: sshll v1.2d, v1.2s, #0
%1 = sext <2 x i32> %v1 to <2 x i64>
%2 = sext <2 x i32> %v2 to <2 x i64>
%3 = shufflevector <2 x i64> %1, <2 x i64> %2, <2 x i32> <i32 0, i32 2>
%4 = trunc <2 x i64> %3 to <2 x i32>
ret <2 x i32> %4
}
However, yes, sign_extend_inreg(v2i32, v2i16) would be an issue, so I modified
my patch as attached and changed the test test_sext_inreg_v2i16i32 as shown
below to use the sshll instruction.
define <2 x i16> @test_sext_inreg_v2i16i32(<2 x i16> %v1, <2 x i16> %v2)
nounwind readnone {
; CHECK-LABEL: test_sext_inreg_v2i16i32
; CHECK: sshll v0.4s, v0.4h, #0
; CHECK: sshll v1.4s, v1.4h, #0
%1 = sext <2 x i16> %v1 to <2 x i32>
%2 = sext <2 x i16> %v2 to <2 x i32>
%3 = shufflevector <2 x i32> %1, <2 x i32> %2, <2 x i32> <i32 0, i32 2>
%4 = trunc <2 x i32> %3 to <2 x i16>
ret <2 x i16> %4
}
The solution is to add a combine that captures this special sha/shl pair. Are
there more missing cases?
Thanks,
-Jiangning
2014/1/7 Ana Pazos <apazos at codeaurora.org>
Hi Jiangning,
The test cases where I see failures are
sign_extend_inreg(v2i32, v2i16) and
sign_extend_inreg(v4i16, v4i8). Sorry, I had a typo: I wrote v8i8 but meant
v4i8, which confused you.
So it seems your patch addresses both cases I was concerned about.
But for such cases I think the SXTL instruction could be used instead of the
combo shift right + shift left.
For example sign_extend_inreg(v2i32, v2i16):
- Inputs are 16-bit values in a 2S register
- Reinterpret register as 4H register
- SXTL (4S <- 4H)
- Ins/uzp1 (to extract the vector indexes 0, 2 we need into a 2S
register)
The same can be done for sign_extend_inreg(v8i16, v8i8) and
sign_extend_inreg(v2i64, v2i32).
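
The four steps listed above for sign_extend_inreg(v2i32, v2i16) can be sketched in Python as follows. This is a rough model under stated assumptions, not the actual lowering code: the function name is hypothetical, the register is taken to be 64 bits wide, and lanes are laid out little-endian.

```python
import struct


def sext_inreg_v2i32_from_i16(lanes_2s):
    """Model: reinterpret 2S as 4H, SXTL to 4S, then keep lanes 0 and 2."""
    # Step 1: the inputs are 16-bit values sitting in two 32-bit lanes.
    raw = struct.pack("<2I", *lanes_2s)
    # Step 2: reinterpret the same 64 bits as four signed 16-bit lanes (4H).
    halves = struct.unpack("<4h", raw)
    # Step 3: SXTL (4S <- 4H): sign-extend each half to a 32-bit lane.
    widened = [h & 0xFFFFFFFF for h in halves]
    # Step 4: ins/uzp1: extract lanes 0 and 2 into a 2S result.
    return [widened[0], widened[2]]


result = sext_inreg_v2i32_from_i16([0x00008000, 0x12340001])
print([hex(x) for x in result])  # prints ['0xffff8000', '0x1']
```

Lanes 1 and 3 of the widened vector hold the extended upper halves of the original lanes, which is why only the even lanes are kept.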
I think in some cases the extraction of vector indexes we are interested in
will be a no-op and an instruction will be saved.
I am just suggesting that we use a hardware instruction that does the sign
extension for the vector types it supports.
Do you agree?
Thanks,
Ana.
From: Jiangning Liu [mailto:liujiangning1 at gmail.com]
Sent: Sunday, January 05, 2014 10:44 PM
To: Ana Pazos
Cc: llvm-commits at cs.uiuc.edu for LLVM; mcrosier at codeaurora.org
Subject: Re: [PATCH] [AArch64] Support ISD::SIGN_EXTEND_INREG
Hi Ana,
Sorry, I don't quite understand what you said. Do you have a small test to
articulate what you mentioned?
For sign_extend_inreg(v2i32, v2i16), my test case below should show that my
patch works:
define <2 x i16> @test_sext_inreg_v2i16i32(<2 x i16> %v1, <2 x i16> %v2)
nounwind readnone {
; CHECK-LABEL: test_sext_inreg_v2i16i32
; CHECK: shl v0.2s, v0.2s, #16
; CHECK: sshr v0.2s, v0.2s, #16
; CHECK: shl v1.2s, v1.2s, #16
; CHECK: sshr v1.2s, v1.2s, #16
%1 = sext <2 x i16> %v1 to <2 x i32>
%2 = sext <2 x i16> %v2 to <2 x i32>
%3 = shufflevector <2 x i32> %1, <2 x i32> %2, <2 x i32> <i32 0, i32 2>
%4 = trunc <2 x i32> %3 to <2 x i16>
ret <2 x i16> %4
}
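
For reference, the effect of the shl #16 / sshr #16 pair checked in this test can be modelled per lane in Python. This is a minimal sketch with a hypothetical helper name, assuming 32-bit lanes holding 16-bit values by default.

```python
def sext_inreg_shift_pair(lane, lane_bits=32, from_bits=16):
    """Sign-extend the low from_bits of one lane using shl followed by sshr."""
    mask = (1 << lane_bits) - 1
    shift = lane_bits - from_bits
    # shl: move the narrow value's sign bit into the top bit of the lane.
    shifted = (lane << shift) & mask
    # Reinterpret the lane as signed so that >> behaves like sshr.
    if shifted & (1 << (lane_bits - 1)):
        shifted -= 1 << lane_bits
    # sshr: the arithmetic shift right copies the sign bit back down.
    return (shifted >> shift) & mask


print(hex(sext_inreg_shift_pair(0x00008000)))  # prints 0xffff8000
print(hex(sext_inreg_shift_pair(0x12347FFF)))  # prints 0x7fff
```

Whatever was in the upper 16 bits of the lane is discarded by the left shift, so only the low half determines the result.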
For sign_extend_inreg(v4i16, v8i8), is this valid? I thought it should be
sign_extend_inreg(v8i16, v8i8). If that is the case, my test below should
also show that my patch works:
define <8 x i8> @test_sext_inreg_v8i8i16(<8 x i8> %v1, <8 x i8> %v2)
nounwind readnone {
; CHECK-LABEL: test_sext_inreg_v8i8i16
; CHECK: sshll v0.8h, v0.8b, #0
; CHECK: sshll v1.8h, v1.8b, #0
%1 = sext <8 x i8> %v1 to <8 x i16>
%2 = sext <8 x i8> %v2 to <8 x i16>
%3 = shufflevector <8 x i16> %1, <8 x i16> %2, <8 x i32> <i32 0, i32 2,
i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%4 = trunc <8 x i16> %3 to <8 x i8>
ret <8 x i8> %4
}
Thanks,
-Jiangning