[PATCH] Implement aarch64 neon instruction class SIMD lsone and lsone-post - LLVM

Tue Nov 26 00:03:45 PST 2013

Tim,

Thanks for your detailed explanation. I understand your point now.

In my mind, an LLVM IR intrinsic can be treated as a kind of extension to
common LLVM IR that implements special semantics for specific targets.
Generally, a common LLVM IR opcode like add/sub can accept different types of
operands as a form of overloading. So, compared to common LLVM IR, it makes
sense to overload an LLVM IR intrinsic with different operand types, as long
as every overloaded intrinsic finally maps to a unique instruction. This
amounts to an n:1 mapping from LLVM IR intrinsics to hardware instructions.

For this specific case, I don't really see how llvm.arm.neon.vld2lane.v8i8
would introduce side effects in the middle-end or back-end. Semantically it is
almost the same as llvm.arm.neon.vld2lane.v16i8, and we can say both will be
translated into a unique hardware instruction at the very beginning of the
compilation stages. As far as implementation complexity is concerned, I
think they are comparable.
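
To make the n:1 idea concrete, here is a toy sketch in plain Python (not LLVM
code). The table, the function names, and the "ld2 (single structure, .b)"
label are all made up for illustration; the only things taken from this thread
are the two intrinsic names.

```python
# Hypothetical lookup table: overloaded intrinsic name -> instruction label.
# The label string is only a placeholder for "the same hardware instruction".
INTRINSIC_TO_INSN = {
    "llvm.arm.neon.vld2lane.v8i8": "ld2 (single structure, .b)",
    "llvm.arm.neon.vld2lane.v16i8": "ld2 (single structure, .b)",
}


def select_instruction(intrinsic: str) -> str:
    """Return the instruction label a given intrinsic overload maps to."""
    return INTRINSIC_TO_INSN[intrinsic]


def is_one_to_one(table: dict) -> bool:
    """True only if no two intrinsics share the same instruction label."""
    return len(set(table.values())) == len(table)
```

With this table, both overloads select the same label, so is_one_to_one()
reports False: that is the n:1 situation, as opposed to the 1:1 mapping Tim
prefers.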

However, I think you are right that a 1:1 mapping from LLVM IR intrinsics to
hardware instructions would help maintain a robust compiler infrastructure,
so I'm now OK with moving forward and refactoring this piece of code.

Thanks,
-Jiangning

2013/11/25 Tim Northover <t.p.northover at gmail.com>

> > This sounds good, and we are suggesting using EXTRACT_SUBREG and friends,
> > so we are on the same page, right?
>
> I'm not suggesting creating the EXTRACT_SUBREGs as a special case in
> AArch64 code, but leaving it to generic handling of shuffles. The fewer
> special cases we have to deal with, the better.
>
> > I don't understand about this. Can you explicitly point out what
> > intrinsics names are really unnecessary?
>
> These two are distinct intrinsics:
>
> { <16 x i8>, <16 x i8> }
>     @llvm.arm.neon.vld2lane.v16i8(i8*, <16 x i8>, <16 x i8>, i32, i32)
> { <8 x i8>, <8 x i8> }
>     @llvm.arm.neon.vld2lane.v8i8(i8*, <8 x i8>, <8 x i8>, i32, i32)
>
> The first maps directly to the AArch64 instruction, the second needs
> backend hacks to put it into a form suitable for precisely the same
> instruction as the first.
>
> Cheers.
>
> Tim.
>

-- 
Thanks,
-Jiangning