[llvm-dev] Pattern transformation between scalar and vector on IR.
Sanjay Patel via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 26 13:45:00 PDT 2016
I don't know anything specific about the scalar-to-vector problem that you
are seeing, but you might get some ideas about how to solve it by seeing
what x86 does for ops that only exist in vector form: ISD::SCALAR_TO_VECTOR
and some pattern matching in tablegen?
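As a rough illustration of that idea, here is a toy C model of running a scalar operation through an instruction that only exists in vector form: put the scalar in lane 0, apply the vector op, read lane 0 back. This is not LLVM code; the v2f32 type and every function name below are made up for the sketch.

```c
/* Toy 2-lane float "vector register". Hypothetical type, for illustration only. */
typedef struct { float lane[2]; } v2f32;

/* Models the role of ISD::SCALAR_TO_VECTOR: the scalar lands in lane 0.
 * The real node leaves the remaining lanes undefined; this sketch
 * zero-fills them only to stay deterministic. */
v2f32 scalar_to_vector(float x) {
    v2f32 v = {{x, 0.0f}};
    return v;
}

/* Stand-in for an instruction that only exists in vector form:
 * applies op to every lane. */
v2f32 vector_only_op(v2f32 v, float (*op)(float)) {
    v2f32 r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = op(v.lane[i]);
    return r;
}

/* Scalar wrapper: insert, operate on the whole vector, extract lane 0. */
float scalar_via_vector(float x, float (*op)(float)) {
    return vector_only_op(scalar_to_vector(x), op).lane[0];
}

/* Example lane operation (assumed, just to have something to call). */
float halve(float x) { return 0.5f * x; }
```

Calling scalar_via_vector(8.0f, halve) goes through all three stages and hands back the lane-0 result. In a real backend the work done on the unused lanes is wasted, and any side effects of operating on undefined lanes would need care.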
Patches to get reciprocal settings out of target options and into function
attributes are up for review here:
On Mon, Sep 26, 2016 at 6:50 AM, Jojo Ma <jojo.ma at linaro.org> wrote:
> Hi James & all,
> I don't know if you have seen the discussion about Evandro's patch on
> ([llvm] r268539 - [AArch64] Use the reciprocal estimation machinery
> patchset is related
> to this topic. Actually, I wrote a draft patch for ARM that does exactly the
> same thing before
> starting this thread. Any further discussion of this topic will help me
> do things the right way.
> More about what I am doing:
> I raised this thread because I am looking at Bug 27107
> <https://llvm.org/bugs/show_bug.cgi?id=27107> - LLVM misses reciprocal
> estimate instructions in ISel on ARMv7/8.
> My plan for this task is:
> - Emitting rsqrt[es].
> Necessary preparation. I suppose we do not necessarily need a final
> design at this step; we just need to be able to emit the instructions.
> My patch on ARM is exactly the same as Evandro's. But I ran into trouble
> with scalar-to-vector
> when I tried to support the f32 operation in rsqrt as my first
> step (rsqrt[es] on ARM is vector-only).
> I would appreciate it if you could take a look at it and point out
> the problems.
> Or would it make sense to validate on AArch64 first and port to
> ARM after that?
> - Benchmarking both strategies (N-body and others).
> Validating whether the replacement is profitable or not.
> - Making a final decision.
> Hi James, thanks again for your detailed introduction. I think what
> you described would be our final goal,
> and changes based on MachineCombiner should be made for this. I have no
> specific progress on this yet.
> Hi Evandro,
> I believe we are working toward the same goal. I think we could work
> together on a patch that
> not only won't break LTO but also takes the context into account. I'm a new
> recruit; any comments from
> this thread or bugzilla are welcome. Thanks!
> Best Regards,
> On 8 September 2016 at 18:59, James Molloy <James.Molloy at arm.com> wrote:
>> The cost model for this transform is really difficult to get right. The
>> latencies and throughputs for VRSQRTE/SQRT vary between microarchitectures
>> but in general it is fair to say that:
>> * There are two possible fast sequences for calculating 1/sqrt(x):
>>   a) (1 / x) * sqrt(x) (DIV, SQRT, MUL, where the DIV and SQRT are
>>      *independent* and can issue in parallel)
>>   b) VRSQRTE + s*(VRSQRTS + MUL + MUL), where s is the number of
>>      Newton-Raphson steps required - 2 for 32-bit floats and 4 for doubles.
>> * SQRT and DIV are iterative instructions and commonly the hardware for
>> this, because it must iterate, is not pipelined.
>> * As a consequence of this, these instructions can also commonly *exit
>> early* if the calculation converges early.
>> * SQRT and DIV will almost always have a shorter latency than the
>> equivalent VRSQRTE sequence due to the sheer number of instructions in that
>> sequence and the early exit capability of SQRT/DIV.
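Sequence (b) can be sketched in scalar C as follows. This is only a model under stated assumptions: rsqrt_estimate is a stand-in for VRSQRTE that uses the well-known integer bit trick rather than whatever the hardware does, rsqrt_step models the (3 - a*b) / 2 operation of VRSQRTS, and all function names are invented for the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for VRSQRTE: a coarse initial guess for 1/sqrt(d), d > 0.
 * Uses the well-known integer bit trick, NOT the hardware's method. */
float rsqrt_estimate(float d) {
    uint32_t bits;
    memcpy(&bits, &d, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Scalar model of the VRSQRTS step: (3 - a*b) / 2. */
float rsqrt_step(float a, float b) {
    return (3.0f - a * b) * 0.5f;
}

/* Estimate followed by `steps` Newton-Raphson refinements.
 * Each refinement costs one MUL (d * x), one step op, and one more MUL. */
float rsqrt_refined(float d, int steps) {
    float x = rsqrt_estimate(d);
    for (int i = 0; i < steps; ++i)
        x = x * rsqrt_step(d * x, x);
    return x;
}
```

Each refinement roughly squares the relative error, which is why the step count depends on the target precision. Because the stand-in estimate is not the hardware's, the number of steps needed for a given precision in this model need not match the counts quoted above.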
>> So the decision about which to choose depends on several factors:
>> 1) Is the calculation throughput or latency limited? The following loop is
>> throughput limited - the result of the sqrt is not on the cyclic critical
>> path, so we expect to be able to vectorize it or at least look ahead and
>> have the core execute multiple iterations in parallel. We’d probably then
>> want to use VRSQRTE.
>>
>>     for (int i = 0; i < n; ++i)
>>       p[i] = 1.0 / q[i];
>>
>> The following loop is latency limited. Here, we don’t care about throughput
>> as only one iteration can ever be executed in the core at once due to the
>> critical path. We’d want to optimise for latency over anything else, so
>> we’d use SQRT + DIV.
>>
>>     for (int i = 0; i < n; ++i)
>>       p = 1.0 / p + q[i];
>>
>> 2) Can a SQRT and a DIV execute in parallel on the microarchitecture?
>> Both of these instructions use similar hardware, so it’s possible that they
>> both need the same functional unit that isn’t pipelined. If so, the latency
>> of the SQRT + DIV sequence increases drastically and the profitability
>> calculation changes.
>> The major factor is latency versus throughput. This is very difficult to
>> assess at the IR level, but at the MachineInstr level we have
>> MachineTraceMetrics, which is able to analyze loops and obtain their
>> functional unit usage (“resource height”) and critical path length
>> (“depth”). Using these two metrics we can determine if it’d make sense to
>> swap the SQRT for a VRSQRTE.
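One way to picture that decision is the plain C sketch below. It is not LLVM's MachineTraceMetrics API: the structs, field names, and functions are all hypothetical, and the model simply takes the larger of critical path length and functional unit usage as the estimated cost of one loop iteration.

```c
/* Hypothetical cost summary of one candidate instruction sequence. */
struct seq_cost {
    unsigned latency;    /* cycles from input to result */
    unsigned occupancy;  /* cycles it keeps the bottleneck unit busy */
};

/* Hypothetical per-iteration summary of the rest of the loop. */
struct loop_info {
    unsigned depth;            /* critical path length without the sequence */
    unsigned height;           /* busiest-unit cycles without the sequence */
    int      on_critical_path; /* nonzero if the result feeds the next iteration */
};

/* Estimated cycles per iteration: the larger of critical path and resource use. */
unsigned cycles_per_iteration(struct loop_info loop, struct seq_cost seq) {
    unsigned depth  = loop.depth + (loop.on_critical_path ? seq.latency : 0);
    unsigned height = loop.height + seq.occupancy;
    return depth > height ? depth : height;
}

/* Nonzero if the estimate sequence is expected to beat SQRT + DIV. */
int prefer_estimate(struct loop_info loop,
                    struct seq_cost sqrt_div,
                    struct seq_cost estimate) {
    return cycles_per_iteration(loop, estimate) <
           cycles_per_iteration(loop, sqrt_div);
}
```

With invented numbers (latency 30 and occupancy 30 for SQRT + DIV, latency 40 and occupancy 7 for the estimate sequence, and a loop whose remaining work has depth 4 and height 4), the sketch prefers the estimate sequence when the result is off the critical path and SQRT + DIV when it is on it. Real values would have to come from the target's scheduling model.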
>> So in summary it is a hard problem with a difficult cost model, that can
>> only reasonably be done at the MachineInstr level.
>> On 8 Sep 2016, at 10:32, Jojo Ma <jojo.ma at linaro.org> wrote:
>> Hi All,
>> I'm trying to use the RSQRT instructions in the following case on ARM
>> (currently sqrt is used):
>> 1.0 / sqrt(x)
>> The RSQRT instructions (VRSQRTE/VRSQRTS) operate on vectors,
>> but the operation above is scalar, so a transformation must be
>> done (from the sqrt pattern to rsqrt).
>> I have completed a patch for this, but I made the transformation in the
>> backend, which leads to additional latency. And it does not actually seem
>> reasonable to do the transformation in the backend.
>> I think it would be better to do it on IR. I am a novice with LLVM and don't
>> know much about this subject. Any advice
>> would be appreciated.
>> Best Regards,