<div dir="ltr"><div>I haven't looked into actually implementing revectorization, so we may just want to ignore that possibility for now. <br><br>But I imagined that revectorization could hit the same problem that we're trying to avoid here: if the cost models say that wider vectors are legal and cheaper, but the reality is that perf will suffer when using those wider vectors, then we want to avoid using the wider ops. The user pref/override will be taken into account when deciding if we should go wider.<br><br></div>In either scenario, we're not actually removing or limiting vector widths, right? They're still legal as far as the ISA is concerned. We're just avoiding those ops because the programmer and/or the CPU model says we'll do better with narrower ops.<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 14, 2017 at 10:26 AM, Craig Topper via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The re-vectorization case mentioned by Sanjay seems like a different type of limit than what's being proposed here. For revectorization you want to remove smaller vector widths. This is removing larger vector widths. I don't think we want the -mprefer-vector-width=256 being proposed here to mean we can't use 128-bit vectors along with the 256-bit ones. Maybe this should be called -mlimit-vector-width?<div><br></div><div>It's not clear to me why revectorization would need a preference at all. Shouldn't we be able to decide from the cost models? We go from scalar to vector today based on cost models. Why couldn't we go from vector to wider vector?</div></div><div class="gmail_extra"><br clear="all"><div><div class="m_-5672491955778672750gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><div><div class="h5">

<br><div class="gmail_quote">On Mon, Nov 13, 2017 at 3:54 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000"><span>

    <p><br>

    </p>

    <br>

    <div class="m_-5672491955778672750m_-938083871188661067moz-cite-prefix">On 11/13/2017 05:49 PM, Eric

      Christopher wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"><br>

        <br>

        <div class="gmail_quote">

          <div dir="ltr">On Mon, Nov 13, 2017 at 2:15 PM Craig Topper

            via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">On Sat, Nov 11, 2017 at 8:52

                  PM, Hal Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

                  wrote:<br>

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-">

                        <p><br>

                        </p>

                        <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527moz-cite-prefix">On

                          11/11/2017 09:52 PM, UE US via llvm-dev wrote:<br>

                        </div>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div>If skylake is that bad at AVX2</div>

                          </div>

                        </blockquote>

                        <br>

                      </span> I don't think this says anything negative

                      about AVX2, but AVX-512.</div>

                  </blockquote>

                </div>

              </div>

            </div>

          </blockquote>

          <div><br>

          </div>

          <div>Right. I think where we're at is that AVX/AVX2 is "bad" on
            Haswell/Broadwell and AVX512 is "bad" on Skylake, at least
            in the "random autovectorization spread out" aspect.</div>

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-"><br>

                        <br>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div> it belongs in -mcpu / -march IMO. </div>

                          </div>

                        </blockquote>

                        <br>

                      </span> No. We'd still want to enable the

                      architectural features for vector intrinsics and

                      the like.</div>

                  </blockquote>

                  <div><br>

                  </div>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div>I took this to mean that the feature should be

                    enabled by default for -march=skylake-avx512.</div>

                </div>

              </div>

            </div>

          </blockquote>

          <div><br>

          </div>

          <div><br>

          </div>

          <div>Agreed.</div>

        </div>

      </div>

    </blockquote>

    <br></span>

    Yes. Also, GNOMETOYS clarified to me (off list) that this is what he
    meant.<span class="m_-5672491955778672750HOEnZb"><font color="#888888"><br>

    <br>

     -Hal</font></span><div><div class="m_-5672491955778672750h5"><br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div><br>

          </div>

          <div>-eric</div>

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div><br>

                  </div>

                  <div> </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-"><br>

                        <br>

                        <blockquote type="cite">Based on the current

                          performance data we're seeing, we think we

                          need to ultimately default skylake-avx512 to

                          -mprefer-vector-width=256.</blockquote>

                        <br>

                      </span> Craig, is this for both integer and

                      floating-point code?</div>

                  </blockquote>

                  <div><br>

                  </div>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div>I believe so, but I'll try to get confirmation

                    from the people with more data.</div>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div> </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div bgcolor="#FFFFFF"><span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-HOEnZb"><font color="#888888"><br>

                          <br>

                           -Hal <br>

                        </font></span>

                      <div>

                        <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-h5"> <br>

                          <blockquote type="cite">

                            <div dir="ltr">

                              <div>   Most people will build for the

                                standard x86_64-pc-linux or whatever

                                anyway,  and completely ignore the

                                change. This will mainly affect those

                                who build their own software and

                                optimize for their system, and lots

                                there have probably caught on to this

                                already.  I always thought that's what

                                -march was made for, really. <br>

                              </div>

                            </div>

                            <div class="gmail_extra"><br clear="all">

                              <div>

                                <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527gmail_signature">GNOMETOYS<br>

                                </div>

                              </div>

                              <br>

                              <div class="gmail_quote">On Sat, Nov 11,

                                2017 at 10:25 AM, Sanjay Patel via

                                llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

                                wrote:<br>

                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                  <div dir="ltr">

                                    <div>

                                      <div>Yes - I was thinking of

                                        FeatureFastScalarFSQRT /

                                        FeatureFastVectorFSQRT which are

                                        used by isFsqrtCheap(). These

                                        were added to override the

                                        default x86 sqrt estimate

                                        codegen with:<br>

                                        <a href="https://reviews.llvm.org/D21379" target="_blank">https://reviews.llvm.org/D21379</a><br>

                                        <br>

                                      </div>

                                      But I'm not sure we really need

                                      that kind of hack. Can we adjust

                                      the attribute in clang based on

                                      the target cpu? I.e., if you have

                                      something like:<br>

                                    </div>

                                    $ clang -O2 -march=skylake-avx512

                                    foo.c<br>

                                    <br>

                                    Then you can detect that in the

                                    clang driver and pass

                                    -mprefer-vector-width=256 to clang

                                    codegen as an option? Clang codegen

                                    then adds that function attribute to

                                    everything it outputs. Then, the

                                    vectorizers and/or backend detect

                                    that attribute and adjust their

                                    behavior based on it. <br>

                                  </div>
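<div>As a rough illustration of the flow described above, here is a minimal sketch of how a driver might resolve the preference: a default comes from the CPU named by -march, and an explicit -mprefer-vector-width on the command line overrides it. This is not clang's actual driver code. The function name, the defaults table, and the rule that the last explicit flag wins are all assumptions made for illustration; only the skylake-avx512 default of 256 comes from the proposal in this thread.</div>
<pre>
```python
# Hypothetical per-CPU defaults. Only the skylake-avx512 entry reflects the
# default proposed in this thread; the table itself is an assumption.
CPU_DEFAULT_PREF = {"skylake-avx512": "256"}


def resolve_prefer_vector_width(args):
    """Return the preferred vector width for a command line, or None."""
    cpu = None
    explicit = None
    for arg in args:
        if arg.startswith("-march="):
            cpu = arg.split("=", 1)[1]
        elif arg.startswith("-mprefer-vector-width="):
            # Assumption: the last explicit occurrence wins.
            explicit = arg.split("=", 1)[1]
    if explicit is not None:
        # An explicit user choice always beats the CPU default,
        # regardless of where -march appears on the command line.
        return None if explicit == "none" else explicit
    return CPU_DEFAULT_PREF.get(cpu)


if __name__ == "__main__":
    print(resolve_prefer_vector_width(["-O2", "-march=skylake-avx512", "foo.c"]))
```
</pre>
<div>Because the explicit flag is checked after the whole command line has been scanned, the user's choice does not depend on whether it appears before or after -march.</div>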

                                </blockquote>

                              </div>

                            </div>

                          </blockquote>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <div><br>

                  </div>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div>Do we have a precedent for setting a
                    target-independent flag from a target-specific CPU
                    string in the clang driver? I want to make sure I
                    understand what the processing for such a thing would
                    look like, particularly how to get the order right so
                    the user can override it.<br>

                  </div>

                </div>

              </div>

            </div>

            <div dir="ltr">

              <div class="gmail_extra">

                <div class="gmail_quote">

                  <div> </div>

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div bgcolor="#FFFFFF">

                      <div>

                        <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-h5">

                          <blockquote type="cite">

                            <div class="gmail_extra">

                              <div class="gmail_quote">

                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                  <div dir="ltr"> <br>

                                    So I don't think we should be

                                    messing with any kind of type

                                    legality checking because that stuff

                                    should all be correct already. We're

                                    just choosing a vector size based on

                                    a pref. I think we should even allow

                                    the pref to go bigger than a legal

                                    type. This came up somewhere on

                                    llvm-dev or in a bug recently in the

                                    context of vector reductions.<br>

                                    <br>

                                    <br>

                                  </div>
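<div>To make the distinction between legality and preference concrete, here is a minimal sketch, assuming a vectorizer that already has a list of legal widths and a per-width cost estimate. The function name and the cost numbers are hypothetical, and the case of a preference wider than any legal type is not modeled. The point is only that the preference narrows which legal widths are considered and never changes the legal set itself.</div>
<pre>
```python
def choose_vector_width(legal_widths, cost_by_width, preferred=None):
    """Pick the cheapest legal width, never going wider than the preference."""
    # The preference narrows the candidates; it does not change what is legal.
    candidates = [w for w in legal_widths if preferred is None or preferred >= w]
    if not candidates:
        # Preference is narrower than every legal width:
        # fall back to the narrowest legal one.
        candidates = [min(legal_widths)]
    return min(candidates, key=lambda w: cost_by_width[w])


if __name__ == "__main__":
    legal = [128, 256, 512]
    # Made-up costs where the model claims wider is always cheaper.
    cost = {128: 4.0, 256: 2.0, 512: 1.0}
    print(choose_vector_width(legal, cost))
    print(choose_vector_width(legal, cost, preferred=256))
```
</pre>
<div>With no preference the cost model alone decides; with a preference of 256 the 512-bit option is skipped even though it remains legal and looks cheaper on paper.</div>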

                                  <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527HOEnZb">

                                    <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527h5">

                                      <div class="gmail_extra"><br>

                                        <div class="gmail_quote">On Fri,

                                          Nov 10, 2017 at 6:04 PM, Craig

                                          Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span>

                                          wrote:<br>

                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                            <div dir="ltr">Are you

                                              referring to

                                              the X86TargetLowering::isFsqrtCheap

                                              hook?</div>

                                            <div class="gmail_extra"><br clear="all">

                                              <div>

                                                <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374gmail_signature">~Craig</div>

                                              </div>

                                              <br>

                                              <div class="gmail_quote">On

                                                Fri, Nov 10, 2017 at

                                                7:39 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span>

                                                wrote:<br>

                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                                  <div dir="ltr">We can

                                                    tie a user

                                                    preference /

                                                    override to a CPU

                                                    model. We do

                                                    something like that

                                                    for square root

                                                    estimates already

                                                    (although it does

                                                    use a

                                                    SubtargetFeature

                                                    currently for x86;

                                                    ideally, we'd key

                                                    that off of

                                                    something in the CPU

                                                    scheduler model).

                                                    <div>

                                                      <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374h5"><br>

                                                        <div>

                                                          <div class="gmail_extra"><br>

                                                          <div class="gmail_quote">On

                                                          Thu, Nov 9,

                                                          2017 at 4:21

                                                          PM, Craig

                                                          Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span>

                                                          wrote:<br>

                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                                          <div dir="ltr">I
                                                          agree that a less
                                                          x86-specific
                                                          command line option
                                                          makes sense. I've
                                                          been having internal
                                                          discussions with gcc
                                                          folks, and they're
                                                          evaluating switching
                                                          to something like
                                                          -mprefer-vector-width=128/256/512/none.

                                                          <div><br>

                                                          </div>

                                                          <div>Based on

                                                          the current

                                                          performance

                                                          data we're

                                                          seeing, we

                                                          think we need

                                                          to ultimately

                                                          default

                                                          skylake-avx512

                                                          to

                                                          -mprefer-vector-width=256.

                                                          If we go with

                                                          a target

                                                          independent

                                                          option/implementation,
                                                          is there some way we

                                                          could still

                                                          affect the

                                                          default

                                                          behavior in a

                                                          target

                                                          specific way?</div>

                                                          </div>

                                                          <div class="gmail_extra"><br clear="all">

                                                          <div>

                                                          <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991gmail_signature">~Craig</div>

                                                          </div>

                                                          <br>

                                                          <div class="gmail_quote">On

                                                          Tue, Nov 7,

                                                          2017 at 9:06

                                                          AM, Sanjay

                                                          Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span>

                                                          wrote:<br>

                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                                          <div dir="ltr">

                                                          <div>It's

                                                          clear from the

                                                          Intel docs how

                                                          this has

                                                          evolved, but

                                                          from a

                                                          compiler

                                                          perspective,

                                                          this isn't a

                                                          Skylake

                                                          "feature" :)

                                                          ... nor an

                                                          Intel feature,

                                                          nor an x86

                                                          feature. <br>

                                                          <br>

                                                          It's a generic

                                                          programmer

                                                          hint for any

                                                          target with

                                                          multiple

                                                          potential

                                                          vector

                                                          lengths. <br>

                                                          </div>

                                                          <div><br>

                                                          </div>

                                                          <div>On x86,

                                                          there's

                                                          already a

                                                          potential use

                                                          case for this

                                                          hint with a

                                                          different

                                                          starting

                                                          motivation:

                                                          re-vectorization.

                                                          That's where

                                                          we take C code

                                                          that uses

                                                          128-bit vector

                                                          intrinsics and

                                                          selectively

                                                          widen it to

                                                          256- or

                                                          512-bit vector

                                                          ops based on a

                                                          newer CPU

                                                          target than

                                                          the code was

                                                          originally

                                                          written for.<br>

                                                          <div><br>

                                                          </div>

                                                          <div>I think

                                                          it's just a

                                                          matter of time

                                                          before a

                                                          customer

                                                          requests the

                                                          same ability

                                                          for another

                                                          target (maybe

                                                          they already

                                                          have and I

                                                          don't know

                                                          about it). So

                                                          we should have

                                                          a solution

                                                          that

                                                          recognizes

                                                          that

                                                          possibility. <br>

                                                          </div>

                                                          <div><br>

                                                          </div>

                                                          </div>

                                                          Note that

                                                          having a

                                                          target-independent

                                                          implementation

                                                          in the

                                                          optimizer

                                                          doesn't

                                                          preclude a

                                                          flag alias in

                                                          clang to

                                                          maintain

                                                          compatibility

                                                          with gcc.

                                                          <div>

                                                          <div class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527m_6454106954572217318m_771050129279988374m_4887027107317541871m_-9050519988835790991h5"><br>

                                                          <div><br>

                                                          </div>

                                                          <div class="gmail_extra"><br>

                                                          <div class="gmail_quote">On

                                                          Tue, Nov 7,

                                                          2017 at 2:02

                                                          AM, Tobias

                                                          Grosser via

                                                          llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

                                                          wrote:<br>

                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On

                                                          Fri, Nov 3,

                                                          2017, at

                                                          05:47, Craig

                                                          Topper via

                                                          llvm-dev

                                                          wrote:<br>

                                                          > That's a

                                                          very good

                                                          point about

                                                          the ordering

                                                          of the command

                                                          line options.<br>

                                                          > gcc's

                                                          current

                                                          implementation

                                                          treats

                                                          -mprefer-avx256

                                                          as "prefer

                                                          256 over<br>

                                                          > 512" and

-mprefer-avx128 as "prefer 128 over 256". Which feels weird for<br>

                                                          > other

                                                          reasons, but

                                                          has less of an

                                                          ordering

                                                          ambiguity.<br>

                                                          ><br>

                                                          >

                                                          -mprefer-avx128

                                                          has been in

                                                          gcc for many

                                                          years and

                                                          predates the

                                                          creation<br>

                                                          > of<br>

                                                          > avx512.

                                                          -mprefer-avx256

                                                          was added a

                                                          couple months

                                                          ago.<br>

                                                          ><br>

                                                          > We've had

                                                          an internal

                                                          conversation

                                                          with the

                                                          implementor of<br>

                                                          >

                                                          -mprefer-avx256<br>

                                                          > in gcc

                                                          about making

                                                          -mprefer-avx128

                                                          affect 512-bit

                                                          vectors as

                                                          well. I'll<br>

                                                          > bring up

                                                          the ambiguity

                                                          issue with

                                                          them.<br>

                                                          ><br>

                                                          > Do we

                                                          want to be

                                                          compatible

                                                          with gcc here?<br>

                                                          <br>

                                                          I certainly

                                                          believe we

                                                          would want to

                                                          be compatible

                                                          with gcc (if

                                                          we use<br>

                                                          the same

                                                          names).<br>

                                                          <br>

                                                          Best,<br>

                                                          Tobias<br>

                                                          <br>

                                                          ><br>

                                                          > ~Craig<br>

                                                          ><br>

                                                          > On Thu,

                                                          Nov 2, 2017 at

                                                          7:18 PM, Eric

                                                          Christopher

                                                          <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>><br>

                                                          > wrote:<br>

                                                          ><br>

                                                          > ><br>

                                                          > ><br>

                                                          > > On

                                                          Thu, Nov 2,

                                                          2017 at 7:05

                                                          PM James Y

                                                          Knight via

                                                          llvm-dev <<br>

                                                          > > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>

                                                          wrote:<br>

                                                          > ><br>

                                                          > >>

                                                          On Wed, Nov 1,

                                                          2017 at 7:35

                                                          PM, Craig

                                                          Topper via

                                                          llvm-dev <<br>

                                                          > >>

                                                          <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>

                                                          wrote:<br>

                                                          > >><br>

                                                          >

                                                          >>>

                                                          Hello all,<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>> I

                                                          would like to

                                                          propose adding

                                                          the

                                                          -mprefer-avx256

                                                          and

                                                          -mprefer-avx128<br>

                                                          >

                                                          >>>

                                                          command line

                                                          flags

                                                          supported by

                                                          the latest GCC to

                                                          clang. These

                                                          flags will be<br>

                                                          >

                                                          >>>

                                                          used to limit

                                                          the vector

                                                          register size

                                                          presented by

                                                          TTI to the

                                                          vectorizers.<br>

                                                          >

                                                          >>>

                                                          The backend

                                                          will still be

                                                          able to use

                                                          wider

                                                          registers for

                                                          code written<br>

                                                          >

                                                          >>>

                                                          using the

                                                          intrinsics in

                                                          x86intrin.h.

                                                          And the

                                                          backend will

                                                          still be able

                                                          to<br>

                                                          >

                                                          >>>

                                                          use AVX512VL

                                                          instructions

                                                          and the

                                                          additional

                                                          XMM16-31 and

                                                          YMM16-31<br>

                                                          >

                                                          >>>

                                                          registers.<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>>

                                                          Motivation:<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>>

                                                          -Using 512-bit

                                                          operations on

                                                          some Intel

                                                          CPUs may cause

                                                          a decrease in

                                                          CPU<br>

                                                          >

                                                          >>>

                                                          frequency that

                                                          may offset the

                                                          gains from

                                                          using the

                                                          wider register

                                                          size. See<br>

                                                          >

                                                          >>>

                                                          section 15.26

                                                          of Intel® 64

                                                          and IA-32

                                                          Architectures

                                                          Optimization

                                                          Reference<br>

                                                          >

                                                          >>>

                                                          Manual

                                                          published

                                                          October 2017.<br>

                                                          >

                                                          >>><br>

                                                          > >><br>

                                                          > >>

                                                          I note the doc

                                                          mentions that

                                                          256-bit AVX

                                                          operations

                                                          also have the

                                                          same<br>

                                                          > >>

                                                          issue with

                                                          reducing the

                                                          CPU frequency,

                                                          which is nice

                                                          to see

                                                          documented!<br>

                                                          > >><br>

                                                          > >>

                                                          There are also

                                                          the issues

                                                          discussed here

                                                          <<a href="http://www.agner.org/optimize/blog/read.php?i=165" rel="noreferrer" target="_blank">http://www.agner.org/optimize/blog/read.php?i=165</a>><br>

                                                          > >>

(and elsewhere) related to warm-up time<br>

                                                          > >>

                                                          for the

                                                          256-bit

                                                          execution

                                                          pipeline,

                                                          which is

                                                          another issue

                                                          with using<br>

                                                          > >>

                                                          wide-vector

                                                          ops.<br>

                                                          > >><br>

                                                          > >><br>

                                                          > >>

                                                          -The vector

                                                          ALUs on ports

                                                          0 and 1 of the

                                                          Skylake Server

microarchitecture<br>

                                                          >

                                                          >>>

                                                          are only

                                                          256 bits wide.

                                                          512-bit

                                                          instructions

                                                          using these

                                                          ALUs must use

                                                          both<br>

                                                          >

                                                          >>>

                                                          ports. See

                                                          section 2.1 of

                                                          Intel® 64 and

                                                          IA-32

                                                          Architectures

                                                          Optimization<br>

                                                          >

                                                          >>>

                                                          Reference

                                                          Manual

                                                          published

                                                          October 2017.<br>

                                                          >

                                                          >>><br>

                                                          > >><br>

                                                          > >><br>

                                                          >

                                                          >>> 

                                                          Implementation

                                                          Plan:<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>>

                                                          -Add

                                                          prefer-avx256

                                                          and

                                                          prefer-avx128

                                                          as

                                                          SubtargetFeatures

                                                          in X86.td not<br>

                                                          >

                                                          >>>

                                                          mapped to any

                                                          CPU.<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>>

                                                          -Add

                                                          mprefer-avx256

                                                          and

                                                          mprefer-avx128

                                                          and the

                                                          corresponding<br>

                                                          >

                                                          >>>

                                                          -mno-prefer-avx128/256

                                                          options to

                                                          clang's driver

                                                          Options.td

                                                          file. I

                                                          believe<br>

                                                          >

                                                          >>>

                                                          this will

                                                          allow clang to

                                                          pass these

                                                          straight

                                                          through to the

-target-feature<br>

                                                          >

                                                          >>>

                                                          attribute in

                                                          IR.<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>>

                                                          -Modify

                                                          X86TTIImpl::getRegisterBitWidt<wbr>h

                                                          to only return

                                                          512 if AVX512

                                                          is<br>

                                                          >

                                                          >>>

                                                          enabled and

                                                          prefer-avx256

                                                          and

                                                          prefer-avx128

                                                          are not set.

                                                          Similarly

                                                          return<br>

                                                          >

                                                          >>>

                                                          256 if AVX is

                                                          enabled and

                                                          prefer-avx128

                                                          is not set.<br>
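For illustration only, the width selection described in the last bullet could be sketched roughly as follows. This is not the real X86TTIImpl code: SubtargetSketch and vectorRegisterBitWidth are hypothetical names, and the 128-bit fallback is an assumption, since the plan only spells out the 512 and 256 cases.

```cpp
// Hypothetical stand-in for the subtarget state; not LLVM's X86Subtarget.
struct SubtargetSketch {
  bool HasAVX512;     // AVX512 is enabled
  bool HasAVX;        // AVX is enabled
  bool PreferAVX256;  // the proposed prefer-avx256 feature is set
  bool PreferAVX128;  // the proposed prefer-avx128 feature is set
};

// Width reported to the vectorizers under the proposed rules.
constexpr unsigned vectorRegisterBitWidth(const SubtargetSketch &ST) {
  return (ST.HasAVX512 && !ST.PreferAVX256 && !ST.PreferAVX128)
             ? 512   // AVX512 enabled, neither prefer feature set
             : (ST.HasAVX && !ST.PreferAVX128)
                   ? 256   // AVX enabled, prefer-avx128 not set
                   : 128;  // assumed fallback, not stated in the plan
}
```

Under this sketch, setting both features gives 128, which is one possible reading of the combined case; the plan itself does not say which flag should win.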

                                                          >

                                                          >>><br>

                                                          > >><br>

                                                          > >>

                                                          Instead of

                                                          multiple flags

                                                          that have

                                                          difficult to

                                                          understand

                                                          intersecting<br>

                                                          > >>

                                                          behavior, one

                                                          flag with a

                                                          value would be

                                                          better. E.g.,

                                                          what should<br>

                                                          > >>

"-mprefer-avx256 -mprefer-avx128 -mno-prefer-avx256" do? No matter the<br>

                                                          > >>

                                                          answer, it's

                                                          confusing.

                                                          (Similarly

                                                          with other

                                                          such

                                                          combinations).

                                                          Just a<br>

                                                          > >>

                                                          single arg

                                                          "-mprefer-avx={128/256/512}"

                                                          (with no "no"

                                                          version) seems

                                                          easier<br>

                                                          > >>

                                                          to understand

                                                          to me (keeping

                                                          the same

                                                          behavior as

                                                          you mention:

                                                          asking to<br>

                                                          > >>

                                                          prefer a

                                                          larger width

                                                          than is

                                                          supported by

                                                          your

                                                          architecture

                                                          should be fine<br>

                                                          > >>

                                                          but ignored).<br>
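As a rough sketch of the single-value semantics suggested above: a preference larger than what the architecture supports is accepted but has no effect. effectiveVectorBits is a hypothetical helper, not a clang or LLVM API, and using 0 to mean "no preference given" is an assumption of this sketch.

```cpp
// Hypothetical helper; not part of clang or LLVM.
// MaxSupportedBits: widest vector width the target supports.
// PreferredBits: value from a single "-mprefer-avx=N" style option,
//                or 0 if the option was not given (sketch convention).
constexpr unsigned effectiveVectorBits(unsigned MaxSupportedBits,
                                       unsigned PreferredBits) {
  return (PreferredBits == 0 || PreferredBits > MaxSupportedBits)
             ? MaxSupportedBits  // no preference, or too-large request ignored
             : PreferredBits;    // narrower preference is honored
}
```

With a single value there is no "no" form to interact with, so the only question left is which occurrence wins when the option is repeated; that is left open here.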

                                                          > >><br>

                                                          > >><br>

                                                          > > I

                                                          agree with

                                                          this. It's a

                                                          little more

                                                          plumbing as

                                                          far as

                                                          subtarget<br>

                                                          > >

                                                          features etc

                                                          (represent via

                                                          an optional

                                                          value or just

                                                          various "set

                                                          the avx<br>

                                                          > >

                                                          width"

                                                          features - the

                                                          latter being

                                                          easier, but

                                                          uglier),

                                                          however, it's<br>

                                                          > >

                                                          probably the

                                                          right thing to

                                                          do.<br>

                                                          > ><br>

                                                          > > I

                                                          was looking at

                                                          this myself

                                                          just a couple

                                                          weeks ago and

                                                          think this is

                                                          the<br>

                                                          > >

                                                          right

                                                          direction

                                                          (when and how

                                                          to turn things

                                                          off) - and

                                                          probably makes<br>

                                                          > >

                                                          sense to be a

                                                          default for

                                                          these

                                                          architectures?

                                                          We might end

                                                          up needing to<br>

                                                          > >

                                                          check a couple

                                                          of additional

                                                          TTI places,

                                                          but it sounds

                                                          like you're on

                                                          top<br>

                                                          > > of

                                                          it. :)<br>

                                                          > ><br>

                                                          > > Thanks very much for doing this work.<br>
                                                          > ><br>
                                                          > > -eric<br>

                                                          > ><br>

                                                          > ><br>

                                                          > >><br>

                                                          > >><br>

                                                          > >> There may be some other backend changes needed, but I plan to address<br>
                                                          > >>> those as we find them.<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          > >>> At a later point, consider making -mprefer-avx256 the default for<br>
                                                          > >>> Skylake Server due to the above mentioned performance considerations.<br>

                                                          >

                                                          >>><br>

                                                          > >><br>

                                                          > >><br>

                                                          > >><br>

                                                          > >><br>

                                                          > >><br>

                                                          >

                                                          >>><br>

                                                          > >> Does this sound reasonable?<br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          > >>> *Latest Intel Optimization manual available here:<br>
                                                          > >>> <a href="https://software.intel.com/en-us/articles/intel-sdm#optimization" rel="noreferrer" target="_blank">https://software.intel.com/en-<wbr>us/articles/intel-sdm#optimiza<wbr>tion</a><br>

                                                          >

                                                          >>><br>

                                                          >

                                                          >>><br>

                                                          > >>> -Craig Topper<br>

                                                          >

                                                          >>><br>

                                                          > >>> ______________________________<wbr>_________________<br>
                                                          > >>> LLVM Developers mailing list<br>
                                                          > >>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
                                                          > >>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

                                                          >

                                                          >>><br>


                                                          > >><br>

                                                          > ><br>



                                                          </blockquote>

                                                          </div>

                                                          <br>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </blockquote>

                                                          </div>

                                                          <br>

                                                          </div>

                                                          </blockquote>

                                                          </div>

                                                          <br>

                                                          </div>

                                                        </div>

                                                      </div>

                                                    </div>

                                                  </div>

                                                </blockquote>

                                              </div>

                                              <br>

                                            </div>

                                          </blockquote>

                                        </div>

                                        <br>

                                      </div>

                                    </div>

                                  </div>

                                  <br>


                                  <br>

                                </blockquote>

                              </div>

                              <br>

                            </div>

                            <br>


                          </blockquote>

                          <br>

                        </div>

                      </div>

                      <span class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-">

                        <pre class="m_-5672491955778672750m_-938083871188661067m_-2096253803562932609gmail-m_264012946301939527moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

                      </span></div>

                    <br>


                    <br>

                  </blockquote>

                </div>

              </div>

            </div>


          </blockquote>

        </div>

      </div>

    </blockquote>

    <br>

    <pre class="m_-5672491955778672750m_-938083871188661067moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </div></div></div>

</blockquote></div><br></div></div></div>


<br></blockquote></div><br></div>