[PATCH][AArch64]RE: patches with initial implementation of Neon scalar instructions

Fri Sep 13 00:59:26 PDT 2013

Hi Ana,

> 2) Using v1ix and v1fx to represent Neon scalar data types in the
> backend.

On this point, I agree with introducing v1i8 and v1i16 to protect the 8-bit
and 16-bit integer value types, but why do we need v1i32, v1i64, v1f32 and
v1f64?

From my perspective, the DAG should only hold operations on value types,
not on a particular register class. Which register class to use is decided
by the compiler after some cost calculation. If we bind v1i32 and v1i64 to
FPR, it becomes hard for the compiler to make this optimization.

Consider this case: if we introduce v1i32 and v1i64, what should the output
value type of EXTRACT_VECTOR_ELT be? If the value is i32, then only
'umov Wx, Vx.s[lane]' can be selected (i32 is bound to GPR); if the value is
v1i32, only 'ins Vx.s[0], Vx.s[lane]' can be selected (v1i32 is bound to
FPR). This may introduce extra mov instructions, or even load/store
instructions if there are not enough registers. A better choice is to bind
i32 to both GPR and FPR and let the compiler decide which instruction to
emit, considering context and register pressure.
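
To make the concern concrete, here is a small toy model in Python. It is not
LLVM code and does not reflect how instruction selection is implemented; the
names Bank, EXTRACT, CROSS_BANK_COPY and lower_extract are hypothetical, and
the fixed register numbers and lane index are chosen only for illustration.
It assumes the only cost is the number of instructions and that a cross-bank
copy is a single fmov.

```python
from enum import Enum


class Bank(Enum):
    GPR = "GPR"  # general-purpose registers (Wn/Xn)
    FPR = "FPR"  # FP/SIMD registers (Sn/Dn/Vn)


# Instruction that delivers lane 1 of v1 directly into each bank.
EXTRACT = {
    Bank.GPR: "umov w0, v1.s[1]",
    Bank.FPR: "ins v0.s[0], v1.s[1]",
}

# Extra copy needed when the value was produced in the wrong bank.
CROSS_BANK_COPY = {
    (Bank.GPR, Bank.FPR): "fmov s0, w0",
    (Bank.FPR, Bank.GPR): "fmov w0, s0",
}


def lower_extract(allowed_banks, consumer_bank):
    """Return the instructions needed so consumer_bank can use the lane.

    allowed_banks is a tuple of the banks the result type may live in.
    """
    if consumer_bank in allowed_banks:
        # The type may live where the consumer wants it: one instruction.
        return [EXTRACT[consumer_bank]]
    # The type is pinned to another bank: extract there, then copy across.
    produced = allowed_banks[0]
    return [EXTRACT[produced], CROSS_BANK_COPY[(produced, consumer_bank)]]
```

In this model, lower_extract((Bank.GPR,), Bank.FPR) needs two instructions,
while lower_extract((Bank.GPR, Bank.FPR), Bank.FPR) needs only the ins.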

Also, f32 and f64 are already bound to FPR with no conflict, so I don't see
why we need v1f32 and v1f64. I know v1f64 is used in the ACLE intrinsics,
but we can convert it to f64 in LLVM IR, so it should not be seen in either
the middle end or the backend.

2013/9/13 Ana Pazos <apazos at codeaurora.org>

> Hi Tim and Jiangning,
>
> The patches bring up a couple of discussion points:
>
> 1) Type of code generated by ACLE Neon intrinsics
>
> From what I have experimented with, to guarantee that only Neon code is
> generated for the ACLE Neon intrinsics, you will need to use builtins and
> translate those builtins into LLVM intrinsics.
>
> Otherwise you are vulnerable to the compiler capabilities (e.g.,
> current/future optimizations, data layout changes) and might not generate
> the expected Neon instructions.
>
> If this is not a requirement, then the way we generate tests for ACLE Neon
> intrinsics in NeonCodeEmitter needs to be fixed. We cannot auto-generate
> “//CHECK” strings with the Neon instructions.
>
> 2) Using v1ix and v1fx to represent Neon scalar data types in the
> backend.
>
> This is the important decision we need to make soon.
>
> ARMv8 supports 64, 32, 16 and 8 bit scalar operations in Neon.
>
> I think the compiler should be able to distinguish when to generate Neon
> scalar from non-Neon scalar operations.
>
> How to achieve that without defining different data types? Only through
> using Neon intrinsics?
>
> Regarding the impact on the effectiveness of middle-end optimizations:
>
> This is my understanding. Tim and others, correct me if I got it wrong.
>
> The data layout string defined for AArch64 only contains 32 and 64 as
> native types.
>
> See AArch64TargetInfo::DescriptionString in
> tools\clang\lib\Basic\Targets.cpp: n32:64
>
> The middle end uses this data layout information to perform the
> optimizations.
>
> Right now it promotes sub-word data types to 32-bit. You can see the
> generation of “sext” IR operations when you emit LLVM code. I do not see it
> doing sub-word optimizations.
>
> If this data layout is changed in the future to n8:16:32:64 and we use
> ixx and fxx for Neon scalar types, we will have more mixing of Neon and
> non-Neon code and more copy operations between Neon and non-Neon
> registers, which can have a bad impact on performance.
>
> Hope Tim and the community can give me some more guidance in this area.
>
> Thanks,
>
> Ana.
>
> From: Jiangning Liu [mailto:liujiangning1 at gmail.com]
> Sent: Wednesday, September 11, 2013 11:32 PM
> To: Ana Pazos
> Cc: Tim Northover; rajav at codeaurora.org; llvm-commits;
> cfe-commits at cs.uiuc.edu
> Subject: Re: [PATCH][AArch64]RE: patches with initial implementation of
> Neon scalar instructions
>
> Ana,
>
> I personally think ACLE functions for Neon should be expected to generate
> Neon instructions, because that lets the user ask the compiler to generate
> special instructions supporting complex functionality.
>
> The test case given by Tim should still be able to generate "add d0, d1,
> d0" if you define vaddd_s64 using vadd_s64 rather than using an IR
> intrinsic.
>
> Since most middle-end optimizations are based on scalar data types, if
> we use v1ixx instead of ixx, is there any scenario where we lose
> optimization opportunities in the middle end? Or do we not care about that
> at all, because this is being introduced by ACLE intrinsics? I'm also fine
> with this conclusion.
>
> Thanks,
>
> -Jiangning

-- 
Best Regards,

Kevin Qin