[llvm-dev] Enabling scalarized conditional stores in the loop vectorizer

Matthew Simpson via llvm-dev llvm-dev at lists.llvm.org
Thu Dec 15 08:09:01 PST 2016


If there are no objections, I'll submit a patch for review that sets the
default value of "-enable-cond-stores-vec" to "true". Thanks!
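
For anyone curious, the patch itself should just flip the default of the
existing option in lib/Transforms/Vectorize/LoopVectorize.cpp. A minimal
sketch of the change (the identifier and description string are quoted from
memory, so treat them as approximate):

    static cl::opt<bool> EnableCondStoresVectorization(
        "enable-cond-stores-vec", cl::init(true), cl::Hidden,  // was cl::init(false)
        cl::desc("Enable if predication of stores during vectorization."));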

-- Matt


On Wed, Dec 14, 2016 at 12:55 PM, Michael Kuperstein via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I haven't verified that what Matt described is what actually happens, but
> assuming it is, that is a known issue in the x86 cost model.
>
> Vectorizing interleaved memory accesses on x86 was, until recently,
> disabled by default. It's been enabled since r284779, but the cost model is
> very conservative, and basically assumes we're going to scalarize
> interleaved ops.
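>
> For anyone unfamiliar with the pattern in question: the loop further down
> the thread touches only the "state" field of a three-word struct, so the
> vectorizer sees it as a stride-3 interleaved access (with gaps). A trimmed,
> hypothetical sketch of that shape (not taken verbatim from libquantum):
>
>   struct nodeTy { unsigned int c1, c2, state; };
>
>   /* Each iteration reads one 32-bit word out of every three; the
>      vectorizer models this as an interleave group with gaps. */
>   unsigned int count_matching(const struct nodeTy *node, int n,
>                               unsigned int mask) {
>     unsigned int count = 0;
>     for (int i = 0; i < n; i++)
>       if (node[i].state & mask)
>         count++;
>     return count;
>   }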
>
> I believe Farhana is working on improving that.
>
> Michael
>
>
> On Wed, Dec 14, 2016 at 8:44 AM, Das, Dibyendu <Dibyendu.Das at amd.com>
> wrote:
>
>> Hi Matt-
>>
>>
>>
>> Yeah, I used a pretty recent LLVM (post-3.9) on x86-64 (both AMD and
>> Intel).
>>
>>
>>
>> -dibyendu
>>
>>
>>
>> From: Matthew Simpson [mailto:mssimpso at codeaurora.org]
>> Sent: Wednesday, December 14, 2016 10:03 PM
>> To: Das, Dibyendu <Dibyendu.Das at amd.com>
>> Cc: Michael Kuperstein <mkuper at google.com>; llvm-dev at lists.llvm.org
>>
>> Subject: Re: [llvm-dev] Enabling scalarized conditional stores in the
>> loop vectorizer
>>
>>
>>
>> Hi Dibyendu,
>>
>>
>>
>> Are you using a recent compiler? What architecture are you targeting? The
>> target will determine whether the vectorizer thinks vectorization is
>> profitable without having to manually force the vector width.
>>
>>
>>
>> For example, top-of-trunk vectorizes your snippet with "clang -O2 -mllvm
>> -enable-cond-stores-vec" and "--target=aarch64-unknown-linux-gnu".
>> However, with "--target=x86_64-unknown-linux-gnu" the vectorizer doesn't
>> find the snippet profitable to vectorize.
>>
>>
>>
>> This is probably due to the interleaved load in the loop. When targeting
>> AArch64, the cost model reports the interleaved load as inexpensive
>> (AArch64 has dedicated instructions for interleaved memory accesses), but
>> when targeting x86 it doesn't. You can take a look at the costs with
>> "-mllvm -debug-only=loop-vectorize".
>>
>>
>>
>> Hope that helps.
>>
>>
>>
>> -- Matt
>>
>>
>>
>> On Wed, Dec 14, 2016 at 12:59 AM, Das, Dibyendu via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Michael-
>>
>>
>>
>> Since you bring up libquantum performance, can you let me know what the IR
>> will look like for this small code snippet (libquantum-like) with
>> -enable-cond-stores-vec? I ask because I don't see vectorization kicking
>> in unless -force-vector-width=<> is specified. Let me know if I am missing
>> something.
>>
>>
>>
>> -Thx
>>
>>
>>
>> struct nodeTy
>> {
>>     unsigned int c1;
>>     unsigned int c2;
>>     unsigned int state;
>> };
>>
>> struct quantum_reg
>> {
>>     struct nodeTy node[32];
>>     unsigned int size;
>> };
>>
>> void
>> quantum_toffoli(int control1, int control2, int target,
>>                 struct quantum_reg *reg, int n)
>> {
>>     int i;
>>     int N = reg->size;
>>
>>     for (i = 0; i < N; i++)
>>     {
>>         if (reg->node[i].state & ((unsigned int)1 << control1))
>>             if (reg->node[i].state & ((unsigned int)1 << control2))
>>                 reg->node[i].state ^= ((unsigned int)1 << target);
>>     }
>> }
>>
>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>> Matthew Simpson via llvm-dev
>> Sent: Tuesday, December 13, 2016 7:12 PM
>> To: Michael Kuperstein <mkuper at google.com>
>> Cc: llvm-dev <llvm-dev at lists.llvm.org>
>> Subject: Re: [llvm-dev] Enabling scalarized conditional stores in the
>> loop vectorizer
>>
>>
>>
>> Hi Michael,
>>
>>
>>
>> Thanks for testing this on your benchmarks and target. I think the
>> results will help guide the direction we go. I tested the feature with
>> spec2k/2k6 on AArch64/Kryo and saw minor performance swings, aside from a
>> large (30%) improvement in spec2k6/libquantum. The primary loop in that
>> benchmark has a conditional store, so I expected it to benefit.
>>
>>
>>
>> Regarding the cost model, I think the vectorizer's modeling of the
>> conditional stores is good. We could potentially improve it by using
>> profile information if available. But I'm not sure of the quality of the
>> individual TTI implementations other than AArch64. I assume they are
>> adequate.
>>
>>
>>
>> Since the conditional stores remain scalar in the vector loop, their cost
>> is essentially the same as it is in the scalar loop (aside from the
>> scalarization overhead, which we account for). So when we compare the costs
>> of the scalar and vector loops to decide whether to vectorize, we're
>> basically comparing the cost of everything else.
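>>
>> Roughly speaking (this is my shorthand, not the exact cost-model code),
>> the decision comes down to "vectorize if VectorLoopCost(VF) / VF <
>> ScalarLoopCost", and because the scalarized, predicated stores contribute
>> about the same amount per original iteration to both sides, they largely
>> cancel out of that comparison.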
>>
>>
>>
>> -- Matt
>>
>>
>>
>> On Mon, Dec 12, 2016 at 7:03 PM, Michael Kuperstein via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Conceptually speaking, I think we really ought to enable this.
>>
>>
>>
>> Practically, I'm going to test it on our benchmarks (on x86), and see if
>> we have any regressions - this seems like a fairly major change.
>>
>> Re targets - let's see where we stand w.r.t. regressions first. What kind
>> of performance testing have you already run on this? Do you know of
>> specific targets where the cost model is known to be good enough, so that
>> enabling this is clearly beneficial?
>>
>>
>>
>> (+Arnold, who probably knows why this is disabled by default. :-) )
>>
>>
>>
>> Thanks,
>>
>>   Michael
>>
>>
>>
>> On Mon, Dec 12, 2016 at 2:52 PM, Matthew Simpson <mssimpso at codeaurora.org>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I'd like to enable the scalarized conditional stores feature in the loop
>> vectorizer (-enable-cond-stores-vec=true). The feature allows us to
>> vectorize loops containing conditional stores that must be scalarized and
>> predicated in the vectorized loop.
>>
>>
>>
>> Note that this flag does not affect the decision to generate masked
>> vector stores. That is a separate feature and is guarded by a TTI hook.
>> Currently, we give up on loops containing conditional stores that must be
>> scalarized (i.e., conditional stores that can't be represented with masked
>> vector stores). If the feature is enabled, we attempt to vectorize those
>> loops if profitable, while scalarizing and predicating the conditional
>> stores.
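>>
>> As a rough illustration of what "scalarize and predicate" means here (a
>> C-level sketch of the concept with a hypothetical loop and VF = 4, not the
>> IR the vectorizer actually emits):
>>
>>   /* Hand-written analogue of vectorizing "if (a[i] > 0) b[i] = a[i];". */
>>   void cond_store(int *a, int *b, int n) {
>>     int i = 0;
>>     for (; i + 3 < n; i += 4) {
>>       int p[4];
>>       for (int lane = 0; lane < 4; lane++)   /* vector compare */
>>         p[lane] = (a[i + lane] > 0);
>>       for (int lane = 0; lane < 4; lane++)   /* scalarized, predicated stores */
>>         if (p[lane])
>>           b[i + lane] = a[i + lane];
>>     }
>>     for (; i < n; i++)                       /* scalar remainder loop */
>>       if (a[i] > 0)
>>         b[i] = a[i];
>>   }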
>>
>>
>>
>> I think these stores are fairly well modeled in the cost model at this
>> point using the static estimates. They're modeled similarly to the way we
>> model other non-store conditional instructions that must be scalarized and
>> predicated (e.g., instructions that may divide by zero); however, only the
>> conditional stores are currently disabled by default.
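>>
>> For comparison, a hypothetical example of that non-store case, where the
>> instruction has to stay predicated because speculating it for masked-off
>> lanes could trap:
>>
>>   void cond_div(int *a, const int *b, const int *c, int n) {
>>     for (int i = 0; i < n; i++)
>>       if (b[i] != 0)          /* the division must remain under this guard */
>>         a[i] = c[i] / b[i];
>>   }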
>>
>>
>>
>> I'd appreciate any opinions on how/if we can enable this feature. For
>> example, can we enable it for all targets or would a target-by-target
>> opt-in mechanism using a TTI hook be preferable? If you'd like to test the
>> feature on your target, please report any significant regressions and
>> improvements you find.
>>
>>
>>
>> Thanks!
>>
>>
>>
>> -- Matt
>>
>>
>>
>>