[llvm-dev] Pattern transformation between scalar and vector on IR.

Mon Sep 26 13:45:00 PDT 2016

I don't know anything specific about the scalar-to-vector problem that you
are seeing, but you might get some ideas about how to solve it by seeing
what x86 does for ops that only exist in vector form: ISD::SCALAR_TO_VECTOR
and some pattern matching in tablegen?

Patches to get reciprocal settings out of target options and into function
attributes are up for review here:
https://reviews.llvm.org/D24816
https://reviews.llvm.org/D24815

On Mon, Sep 26, 2016 at 6:50 AM, Jojo Ma <jojo.ma at linaro.org> wrote:

> Hi James & all,
>
> I don't know if you have saw the topic disscusion about Evandro's patch on
> AArch64
> ([llvm] r268539 - [AArch64] Use the reciprocal estimation machinery
> <http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160919/391140.html>).The
> patchset is relate
> to this topic. And Actually I did a draft patch on ARM which exactly the
> same with this before
> starting this thread. Any more discussion about this topic will help me
> doing things on the right way.
>
> More introduce about what i was doing:
> I rasied this thread because I am looking on Bug 27107
> <https://llvm.org/bugs/show_bug.cgi?id=27107>- LLVM misses reciprocal
> estimate instructions in ISel on ARMv7/8.
> My planning about this task is :
>
>    - Emitting rsqrt[es].
>
>    Necessary preparation. I supposed we may not necessay to give a final
>    design at this step. Just be able to emit.
>    My patch on ARM is exactly the same with Evandro's. But I was troubled
>    with scalar-to-vector
>    when I tried to support the operation on f32 in rsqrt as my first
>    step(rsqrt[es] of ARM is just for vector).
>    I would appreciate it if you could help me take a look at it and show
>    me the problems.
>    https://reviews.llvm.org/D24911
>
>    Or would it  make sense if I do validating on AArch64 and porting to
>    ARM after that?
>
>    - Benchmarking on both strategies.(N-body and others)
>    Validating whether the replacing profitable or not.
>
>    - Giving a final decision.
>    Hi James, Thanks again for your detailed instroduction. I think what
>    you instroduct would be our final goal,
>    and changes based on MachineCombiner should be done for this.I have no
>    specific progress on this yet.
>
>
> Hi Evandro,
>
> I believe we will be working for the same goal, I think we could work
> together to make a patch that
> not only won't break the LTO but also based on the context. I'm a new
> recruit, any comments from
>  this thread or bugzilla are welcome. Thanks!
>
> Best Regards,
> Jojo
>
>
> On 8 September 2016 at 18:59, James Molloy <James.Molloy at arm.com> wrote:
>
>> Hi,
>>
>> The cost model for this transform is really difficult to get right. The
>> latencies and throughputs for VRSQRTE/SQRT vary between microarchitectures
>> but in general it is fair to say that;
>>
>> * There are two possible fast sequences for calculating 1/sqrt(x):
>>   a) (1 / x)  * sqrt(x)  (DIV, SQRT, MUL, where the DIV and SQRT are
>> *independent* and can issue in parallel)
>>   b) VRSQRTE + s*(VRSQRTS + MUL + MUL) where s is the number of
>> newton-raphson steps required - 2 for 32-bit floats and 4 for doubles.
>>
>> * SQRT and DIV are iterative instructions and commonly the hardware for
>> this, because it must iterate, is not pipelined.
>>   * As a consequence of this, these instructions can also commonly *exit
>> early* if the calculation converges early.
>>
>> * SQRT and DIV will almost always have a shorter latency than the
>> equivalent VRSQRTE sequence due to the sheer number of instructions in that
>> sequence and the early exit capability of SQRT/DIV.
>>
>> So the calculation on which to choose depends on several factors:
>>   1) Is the calculation throughput or latency limited? This loop is
>> throughput limited - the result of the sqrt is not on the cyclic critical
>> path, so we expect to be able to vectorize it or at least look ahead and
>> have the core execute multiple iterations in parallel. We’d probably then
>> want to use VRSQRTE.
>>      for (int i = 0; i < n; ++i)
>>        p[i] = 1.0 / q[i];
>>
>>       This loop is latency limited. Here, we don’t care about throughput
>> as only one iteration can ever be executed in the core at once due to the
>> critical path. We’d want to optimise for latency over anything else, so
>> we’d use SQRT + DIV.
>>       for (int i = 0; i < n; ++i)
>>         p = 1.0 / p + q[i];
>>
>>   2) Can a SQRT and DIV execute in parallel on the microarchitecture?
>> both these instructions use similar hardware, so it’s possible that they
>> both need the same functional unit that isn’t pipelined. If so, the sqrt()
>> sequence’s latency gets drastically increased and the profitability
>> calculation changes.
>>
>> The major one is latency versus throughput. This is very difficult to do
>> at the IR level, but at the MachineInstr level we have MachineTraceMetrics
>> which is able to analyze loops and obtain their functional unit usage
>> (“resource height”) and critical path length (“depth”). Using these two
>> metrics we can determine if it’d make sense to swap the SQRT for a VRSQRTE.
>>
>> So in summary it is a hard problem with a difficult cost model, that can
>> only reasonably be done at the MachineInstr level.
>>
>> Cheers,
>>
>> James
>>
>> On 8 Sep 2016, at 10:32, Jojo Ma <jojo.ma at linaro.org> wrote:
>>
>> Hi All,
>>
>> I'm tring to use RSQRT instructions on follow case for ARM
>> (now what using is sqrt):
>>
>>     1.0 / sqrt(x)
>>
>> The RSQRT instructions(VRSQRTE/VRSQRTS) are vector type,
>> but above operation is scalar type. So a transformation must be
>> done(transform sqrt pattern to rsqrt).
>>
>> I have completed a patch for this, but I made the transformation in the
>> backend which will leads to additional latencies.And actually it's not
>> reasonable doing transformation in backend.
>> I think it would be better done that on IR. I am a novice to llvm.I don't
>> know  anything about this subject. If anyone could
>> give me some advice would be appreciated.
>>
>> Thanks!
>>
>> --
>> Best Regards,
>> Jojo
>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
>
>
> --
> Best Regards,
> Jojo
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160926/5219f35a/attachment-0001.html>