[llvm-dev] Enabling scalarized conditional stores in the loop vectorizer

Michael Kuperstein via llvm-dev llvm-dev at lists.llvm.org
Thu Dec 15 08:49:34 PST 2016


SGTM.

On Dec 15, 2016 08:09, "Matthew Simpson" <mssimpso at codeaurora.org> wrote:

> If there are no objections, I'll submit a patch for review that sets the
> default value of "-enable-cond-stores-vec" to "true". Thanks!
>
> -- Matt
>
>
> On Wed, Dec 14, 2016 at 12:55 PM, Michael Kuperstein via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I haven't verified that what Matt described is what actually happens, but
>> assuming it is, that is a known issue in the x86 cost model.
>>
>> Vectorizing interleaved memory accesses on x86 was, until recently,
>> disabled by default. It's been enabled since r284779, but the cost model is
>> very conservative, and basically assumes we're going to scalarize
>> interleaved ops.
>>
>> I believe Farhana is working on improving that.
>>
>> Michael
>>
>>
>> On Wed, Dec 14, 2016 at 8:44 AM, Das, Dibyendu <Dibyendu.Das at amd.com>
>> wrote:
>>
>>> Hi Matt-
>>>
>>>
>>>
>>> Yeah, I used a pretty recent LLVM (post 3.9) on x86-64 (both AMD and
>>> Intel).
>>>
>>>
>>>
>>> -dibyendu
>>>
>>>
>>>
>>> From: Matthew Simpson [mailto:mssimpso at codeaurora.org]
>>> Sent: Wednesday, December 14, 2016 10:03 PM
>>> To: Das, Dibyendu <Dibyendu.Das at amd.com>
>>> Cc: Michael Kuperstein <mkuper at google.com>; llvm-dev at lists.llvm.org
>>> Subject: Re: [llvm-dev] Enabling scalarized conditional stores in the
>>> loop vectorizer
>>>
>>>
>>>
>>> Hi Dibyendu,
>>>
>>>
>>>
>>> Are you using a recent compiler? What architecture are you targeting?
>>> The target will determine whether the vectorizer thinks vectorization is
>>> profitable without having to manually force the vector width.
>>>
>>>
>>>
>>> For example, top-of-trunk vectorizes your snippet with "clang -O2 -mllvm
>>> -enable-cond-stores-vec" and "--target=aarch64-unknown-linux-gnu".
>>> However, with "--target=x86_64-unknown-linux-gnu" the vectorizer
>>> doesn't find the snippet profitable to vectorize.
>>>
>>>
>>>
>>> This is probably due to the interleaved load in the loop. When targeting
>>> AArch64, the cost model reports the interleaved load as inexpensive
>>> (AArch64 has dedicated instructions for interleaved memory accesses), but
>>> when targeting X86 it doesn't. You can take a look at the costs with
>>> "-mllvm -debug-only=loop-vectorize".
>>>
>>>
>>>
>>> Hope that helps.
>>>
>>>
>>>
>>> -- Matt
>>>
>>>
>>>
>>> On Wed, Dec 14, 2016 at 12:59 AM, Das, Dibyendu via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> Hi Michael-
>>>
>>>
>>>
>>> Since you bring up libquantum performance, can you let me know what the
>>> IR will look like for this small code snippet (libquantum-like) with
>>> -enable-cond-stores-vec? I ask because I don't see vectorization kicking
>>> in unless -force-vector-width=<> is specified. Let me know if I am missing
>>> something.
>>>
>>>
>>>
>>> -Thx
>>>
>>>
>>>
>>> struct nodeTy
>>> {
>>>     unsigned int c1;
>>>     unsigned int c2;
>>>     unsigned int state;
>>> };
>>>
>>> struct quantum_reg
>>> {
>>>     struct nodeTy node[32];
>>>     unsigned int size;
>>> };
>>>
>>> void
>>> quantum_toffoli(int control1, int control2, int target,
>>>                 struct quantum_reg *reg, int n)
>>> {
>>>     int i;
>>>     int N = reg->size;
>>>
>>>     for (i = 0; i < N; i++)
>>>     {
>>>         if (reg->node[i].state & ((unsigned int)1 << control1))
>>>             if (reg->node[i].state & ((unsigned int)1 << control2))
>>>                 reg->node[i].state ^= ((unsigned int)1 << target);
>>>     }
>>> }
>>>
>>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>>> Matthew Simpson via llvm-dev
>>> Sent: Tuesday, December 13, 2016 7:12 PM
>>> To: Michael Kuperstein <mkuper at google.com>
>>> Cc: llvm-dev <llvm-dev at lists.llvm.org>
>>> Subject: Re: [llvm-dev] Enabling scalarized conditional stores in the
>>> loop vectorizer
>>>
>>>
>>>
>>> Hi Michael,
>>>
>>>
>>>
>>> Thanks for testing this on your benchmarks and target. I think the
>>> results will help guide the direction we go. I tested the feature with
>>> spec2k/2k6 on AArch64/Kryo and saw minor performance swings, aside from a
>>> large (30%) improvement in spec2k6/libquantum. The primary loop in that
>>> benchmark has a conditional store, so I expected it to benefit.
>>>
>>>
>>>
>>> Regarding the cost model, I think the vectorizer's modeling of the
>>> conditional stores is good. We could potentially improve it by using
>>> profile information if available. But I'm not sure of the quality of the
>>> individual TTI implementations other than AArch64. I assume they are
>>> adequate.
>>>
>>>
>>>
>>> Since the conditional stores remain scalar in the vector loop, their
>>> cost is essentially the same as it is in the scalar loop (aside from
>>> scalarization overhead, which we account for). So when we compare the cost
>>> of the scalar and vector loops when deciding to vectorize, we're basically
>>> comparing the cost of everything else.
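>>>
>>> To illustrate with purely made-up numbers: if a scalar iteration costs
>>> 1 (load) + 1 (compare) + 1 (conditional store) = 3, and the VF=4 vector
>>> body costs 1 (vector load) + 1 (vector compare) + 4 x 1 (scalar stores)
>>> + 2 (scalarization overhead) = 8, the per-iteration comparison is
>>> 3 vs. 8/4 = 2, and the vector loop wins on the strength of the load and
>>> compare rather than the stores. The real numbers come from the target's
>>> TTI hooks, of course.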
>>>
>>>
>>>
>>> -- Matt
>>>
>>>
>>>
>>> On Mon, Dec 12, 2016 at 7:03 PM, Michael Kuperstein via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> Conceptually speaking, I think we really ought to enable this.
>>>
>>>
>>>
>>> Practically, I'm going to test it on our benchmarks (on x86), and see if
>>> we have any regressions - this seems like a fairly major change.
>>>
>>> Re targets - let's see where we stand w.r.t regressions first. What kind
>>> of performance testing have you already run on this? Do you know of
>>> specific targets where the cost model is known to be good enough, so it's
>>> clearly beneficial?
>>>
>>>
>>>
>>> (+Arnold, who probably knows why this is disabled by default. :-) )
>>>
>>>
>>>
>>> Thanks,
>>>
>>>   Michael
>>>
>>>
>>>
>>> On Mon, Dec 12, 2016 at 2:52 PM, Matthew Simpson <
>>> mssimpso at codeaurora.org> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> I'd like to enable the scalarized conditional stores feature in the loop
>>> vectorizer (-enable-cond-stores-vec=true). The feature allows us to
>>> vectorize loops containing conditional stores that must be scalarized and
>>> predicated in the vectorized loop.
>>>
>>>
>>>
>>> Note that this flag does not affect the decision to generate masked
>>> vector stores. That is a separate feature and is guarded by a TTI hook.
>>> Currently, we give up on loops containing conditional stores that must be
>>> scalarized (i.e., conditional stores that can't be represented with masked
>>> vector stores). If the feature is enabled, we attempt to vectorize those
>>> loops if profitable, while scalarizing and predicating the conditional
>>> stores.
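>>>
>>> To make "scalarize and predicate" concrete, here is a rough C-level
>>> sketch (names are illustrative, and the vectorizer of course emits LLVM
>>> IR with one predicated block per lane, not C) of what the VF=4 vector
>>> loop looks like for a simple conditional store:
>>>
>>> void cond_store_sketch(unsigned *state, unsigned test_bit,
>>>                        unsigned flip_bit, int n)
>>> {
>>>     int i, lane;
>>>     for (i = 0; i + 3 < n; i += 4) {
>>>         unsigned val[4], pred[4];
>>>         /* the compare and the xor are vectorized */
>>>         for (lane = 0; lane < 4; ++lane) {
>>>             pred[lane] = (state[i + lane] & test_bit) != 0;
>>>             val[lane]  = state[i + lane] ^ flip_bit;
>>>         }
>>>         /* the store stays scalar: one guarded store per lane */
>>>         if (pred[0]) state[i + 0] = val[0];
>>>         if (pred[1]) state[i + 1] = val[1];
>>>         if (pred[2]) state[i + 2] = val[2];
>>>         if (pred[3]) state[i + 3] = val[3];
>>>     }
>>>     /* scalar remainder loop */
>>>     for (; i < n; i++)
>>>         if (state[i] & test_bit)
>>>             state[i] ^= flip_bit;
>>> }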
>>>
>>>
>>>
>>> I think these stores are fairly well modeled in the cost model at this
>>> point using the static estimates. They're modeled similarly to the way we
>>> model other non-store conditional instructions that must be scalarized and
>>> predicated (e.g., instructions that may divide by zero); however, only the
>>> conditional stores are currently disabled by default.
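>>>
>>> For example, a made-up loop like the one below contains a conditional
>>> division; the divide can trap for inactive lanes, so it is scalarized
>>> and predicated in the vector loop in the same way, and that case is
>>> already enabled by default:
>>>
>>> int sum_of_quotients(const int *b, const int *c, int n)
>>> {
>>>     int i, sum = 0;
>>>     for (i = 0; i < n; i++) {
>>>         int q = 0;
>>>         if (b[i] != 0)
>>>             q = c[i] / b[i]; /* must stay guarded against divide-by-zero */
>>>         sum += q;
>>>     }
>>>     return sum;
>>> }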
>>>
>>>
>>>
>>> I'd appreciate any opinions on how/if we can enable this feature. For
>>> example, can we enable it for all targets or would a target-by-target
>>> opt-in mechanism using a TTI hook be preferable? If you'd like to test the
>>> feature on your target, please report any significant regressions and
>>> improvements you find.
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>> -- Matt