<div dir="ltr">clang doesn't support -mtune, but we do have -march; llc calls the equivalent option -mcpu. "skylake" refers to the mobile and desktop version, whose L2 is 4-way associative. "skylake-avx512" refers to the Xeon server version.</div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>
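<div>A rough sketch of how the two names could be told apart when modeling L2 associativity. l2AssociativityFor is a hypothetical helper, not an LLVM API. The CPU names are the usual -march/-mcpu spellings, and the 16-way value for skylake-avx512 is an assumption taken from Intel's published Skylake server figures, not something in r311647.</div>
<pre>
```cpp
#include <optional>
#include <string_view>

// Hypothetical helper (not part of LLVM): maps a -march/-mcpu name to the
// associativity of the L2 data cache, in ways.
std::optional<unsigned> l2AssociativityFor(std::string_view CPU) {
  // Skylake client (mobile/desktop): 4-way L2, as noted in this thread.
  if (CPU == "skylake")
    return 4;

  // Skylake server (Xeon): larger L2 with more ways.
  // 16 is an assumption for illustration, not a value from r311647.
  if (CPU == "skylake-avx512")
    return 16;

  // The value r311647 reports for the other listed Intel cores.
  if (CPU == "penryn" || CPU == "nehalem" || CPU == "westmere" ||
      CPU == "sandybridge" || CPU == "ivybridge" || CPU == "haswell" ||
      CPU == "broadwell")
    return 8;

  // Unknown CPU: report "not available" rather than guessing.
  return std::nullopt;
}
```
</pre>
<div>Returning an empty optional for unknown CPUs mirrors the llvm::Optional return type used by getCacheAssociativity in the patch.</div>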

<br><div class="gmail_quote">On Fri, Aug 25, 2017 at 10:11 AM, Tobias Grosser <span dir="ltr"><<a href="mailto:tobias@grosser.es" target="_blank">tobias@grosser.es</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, Aug 24, 2017, at 23:39, Craig Topper via llvm-commits wrote:<br>

> I believe Skylake client's L2 associativity is only 4 way. While Skylake<br>

> server has a much larger L2 with more associativity.<br>

><br>

> Is this something we should get from CPUID if the user does -mcpu=native?<br>

<br>

I was so glad that -- according to 7-cpu -- all have the same<br>

characteristics. But it seems I indeed overlooked the 4-way associativity<br>

of Skylake.<br>

<br>

I wonder what would be the best way to model this. Are there different<br>

mtune flags for Skylake server and client? GCC distinguishes between<br>

skylake and ‘skylake-avx512’; are these the two variants you are talking about?<br>

<br>

Best,<br>

Tobias<br>

<br>

> ~Craig<br>

><br>

> On Thu, Aug 24, 2017 at 2:46 AM, Tobias Grosser via llvm-commits <<br>

> <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br>

><br>

> > Author: grosser<br>

> > Date: Thu Aug 24 02:46:25 2017<br>

> > New Revision: 311647<br>

> ><br>

> > URL: <a href="http://llvm.org/viewvc/llvm-project?rev=311647&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project?rev=311647&view=rev</a><br>

> > Log:<br>

> > Model cache size and associativity in TargetTransformInfo<br>

> ><br>

> > Summary:<br>

> > We add the precise cache sizes and associativity for the following Intel<br>

> > architectures:<br>

> ><br>

> >   - Penryn<br>

> >   - Nehalem<br>

> >   - Westmere<br>

> >   - Sandy Bridge<br>

> >   - Ivy Bridge<br>

> >   - Haswell<br>

> >   - Broadwell<br>

> >   - Skylake<br>

> >   - Kabylake<br>

> ><br>

> > For several months, Polly has used a performance model for BLAS computations<br>

> > that<br>

> > derives optimal cache and register tile sizes from cache and latency<br>

> > information (based on ideas from "Analytical Modeling Is Enough for<br>

> > High-Performance BLIS", by Tze Meng Low published at TOMS 2016).<br>

> > While bootstrapping this model, these target values have been kept in<br>

> > Polly.<br>

> > However, as our implementation is now rather mature, it seems time to teach<br>

> > LLVM itself about cache sizes.<br>

> ><br>

> > Interestingly, L1 and L2 cache sizes are pretty constant across<br>

> > micro-architectures, hence a set of architecture specific default values<br>

> > seems like a good start. They can be expanded to more target specific<br>

> > values,<br>

> > in case certain newer architectures require different values. For now a set<br>

> > of Intel architectures are provided.<br>

> ><br>

> > Just as a little teaser, for a simple gemm kernel this model allows us to<br>

> > improve performance from 1.2s to 0.27s. For gemm kernels with less optimal<br>

> > memory layouts even larger speedups can be reported.<br>

> ><br>

> > Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn,<br>

> > sebpop, efriedma, asb<br>

> ><br>

> > Reviewed By: fhahn, asb<br>

> ><br>

> > Subscribers: lsaba, asb, pollydev, llvm-commits<br>

> ><br>

> > Differential Revision: <a href="https://reviews.llvm.org/D37051" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D37051</a><br>

> ><br>

> > Modified:<br>

> >     llvm/trunk/include/llvm/<wbr>Analysis/TargetTransformInfo.h<br>

> >     llvm/trunk/include/llvm/<wbr>Analysis/<wbr>TargetTransformInfoImpl.h<br>

> >     llvm/trunk/lib/Analysis/<wbr>TargetTransformInfo.cpp<br>

> >     llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.cpp<br>

> >     llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.h<br>

> ><br>

> > Modified: llvm/trunk/include/llvm/<wbr>Analysis/TargetTransformInfo.h<br>

> > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/include/<wbr>llvm/Analysis/</a><br>

> > TargetTransformInfo.h?rev=<wbr>311647&r1=311646&r2=311647&<wbr>view=diff<br>

> > ==============================<wbr>==============================<br>

> > ==================<br>

> > --- llvm/trunk/include/llvm/<wbr>Analysis/TargetTransformInfo.h (original)<br>

> > +++ llvm/trunk/include/llvm/<wbr>Analysis/TargetTransformInfo.h Thu Aug 24<br>

> > 02:46:25 2017<br>

> > @@ -603,6 +603,22 @@ public:<br>

> >    /// \return The size of a cache line in bytes.<br>

> >    unsigned getCacheLineSize() const;<br>

> ><br>

> > +  /// The possible cache levels<br>

> > +  enum class CacheLevel {<br>

> > +    L1D,   // The L1 data cache<br>

> > +    L2D,   // The L2 data cache<br>

> > +<br>

> > +    // We currently do not model L3 caches, as their sizes differ widely<br>

> > between<br>

> > +    // microarchitectures. Also, we currently do not have a use for L3<br>

> > cache<br>

> > +    // size modeling yet.<br>

> > +  };<br>

> > +<br>

> > +  /// \return The size of the cache level in bytes, if available.<br>

> > +  llvm::Optional<unsigned> getCacheSize(CacheLevel Level) const;<br>

> > +<br>

> > +  /// \return The associativity of the cache level, if available.<br>

> > +  llvm::Optional<unsigned> getCacheAssociativity(<wbr>CacheLevel Level) const;<br>

> > +<br>

> >    /// \return How much before a load we should place the prefetch<br>

> > instruction.<br>

> >    /// This is currently measured in number of instructions.<br>

> >    unsigned getPrefetchDistance() const;<br>

> > @@ -937,6 +953,8 @@ public:<br>

> >    virtual bool shouldConsiderAddressTypePromo<wbr>tion(<br>

> >        const Instruction &I, bool &<wbr>AllowPromotionWithoutCommonHea<wbr>der) = 0;<br>

> >    virtual unsigned getCacheLineSize() = 0;<br>

> > +  virtual llvm::Optional<unsigned> getCacheSize(CacheLevel Level) = 0;<br>

> > +  virtual llvm::Optional<unsigned> getCacheAssociativity(<wbr>CacheLevel<br>

> > Level) = 0;<br>

> >    virtual unsigned getPrefetchDistance() = 0;<br>

> >    virtual unsigned getMinPrefetchStride() = 0;<br>

> >    virtual unsigned getMaxPrefetchIterationsAhead(<wbr>) = 0;<br>

> > @@ -1209,6 +1227,12 @@ public:<br>

> >    unsigned getCacheLineSize() override {<br>

> >      return Impl.getCacheLineSize();<br>

> >    }<br>

> > +  llvm::Optional<unsigned> getCacheSize(CacheLevel Level) override {<br>

> > +    return Impl.getCacheSize(Level);<br>

> > +  }<br>

> > +  llvm::Optional<unsigned> getCacheAssociativity(<wbr>CacheLevel Level)<br>

> > override {<br>

> > +    return Impl.getCacheAssociativity(<wbr>Level);<br>

> > +  }<br>

> >    unsigned getPrefetchDistance() override { return<br>

> > Impl.getPrefetchDistance(); }<br>

> >    unsigned getMinPrefetchStride() override {<br>

> >      return Impl.getMinPrefetchStride();<br>

> ><br>

> > Modified: llvm/trunk/include/llvm/<wbr>Analysis/<wbr>TargetTransformInfoImpl.h<br>

> > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/include/<wbr>llvm/Analysis/</a><br>

> > TargetTransformInfoImpl.h?rev=<wbr>311647&r1=311646&r2=311647&<wbr>view=diff<br>

> > ==============================<wbr>==============================<br>

> > ==================<br>

> > --- llvm/trunk/include/llvm/<wbr>Analysis/<wbr>TargetTransformInfoImpl.h (original)<br>

> > +++ llvm/trunk/include/llvm/<wbr>Analysis/<wbr>TargetTransformInfoImpl.h Thu Aug 24<br>

> > 02:46:25 2017<br>

> > @@ -340,6 +340,29 @@ public:<br>

> ><br>

> >    unsigned getCacheLineSize() { return 0; }<br>

> ><br>

> > +  llvm::Optional<unsigned> getCacheSize(<wbr>TargetTransformInfo::<wbr>CacheLevel<br>

> > Level) {<br>

> > +    switch (Level) {<br>

> > +    case TargetTransformInfo::<wbr>CacheLevel::L1D:<br>

> > +      LLVM_FALLTHROUGH;<br>

> > +    case TargetTransformInfo::<wbr>CacheLevel::L2D:<br>

> > +      return llvm::Optional<unsigned>();<br>

> > +    }<br>

> > +<br>

> > +    llvm_unreachable("Unknown TargetTransformInfo::<wbr>CacheLevel");<br>

> > +  }<br>

> > +<br>

> > +  llvm::Optional<unsigned> getCacheAssociativity(<br>

> > +    TargetTransformInfo::<wbr>CacheLevel Level) {<br>

> > +    switch (Level) {<br>

> > +    case TargetTransformInfo::<wbr>CacheLevel::L1D:<br>

> > +      LLVM_FALLTHROUGH;<br>

> > +    case TargetTransformInfo::<wbr>CacheLevel::L2D:<br>

> > +      return llvm::Optional<unsigned>();<br>

> > +    }<br>

> > +<br>

> > +    llvm_unreachable("Unknown TargetTransformInfo::<wbr>CacheLevel");<br>

> > +  }<br>

> > +<br>

> >    unsigned getPrefetchDistance() { return 0; }<br>

> ><br>

> >    unsigned getMinPrefetchStride() { return 1; }<br>

> ><br>

> > Modified: llvm/trunk/lib/Analysis/<wbr>TargetTransformInfo.cpp<br>

> > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/</a><br>

> > Analysis/TargetTransformInfo.<wbr>cpp?rev=311647&r1=311646&r2=<wbr>311647&view=diff<br>

> > ==============================<wbr>==============================<br>

> > ==================<br>

> > --- llvm/trunk/lib/Analysis/<wbr>TargetTransformInfo.cpp (original)<br>

> > +++ llvm/trunk/lib/Analysis/<wbr>TargetTransformInfo.cpp Thu Aug 24 02:46:25<br>

> > 2017<br>

> > @@ -321,6 +321,16 @@ unsigned TargetTransformInfo::<wbr>getCacheLi<br>

> >    return TTIImpl->getCacheLineSize();<br>

> >  }<br>

> ><br>

> > +llvm::Optional<unsigned> TargetTransformInfo::<wbr>getCacheSize(CacheLevel<br>

> > Level)<br>

> > +  const {<br>

> > +  return TTIImpl->getCacheSize(Level);<br>

> > +}<br>

> > +<br>

> > +llvm::Optional<unsigned> TargetTransformInfo::<wbr>getCacheAssociativity(<br>

> > +  CacheLevel Level) const {<br>

> > +  return TTIImpl-><wbr>getCacheAssociativity(Level);<br>

> > +}<br>

> > +<br>

> >  unsigned TargetTransformInfo::<wbr>getPrefetchDistance() const {<br>

> >    return TTIImpl->getPrefetchDistance()<wbr>;<br>

> >  }<br>

> ><br>

> > Modified: llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.cpp<br>

> > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/Target/</a><br>

> > X86/X86TargetTransformInfo.<wbr>cpp?rev=311647&r1=311646&r2=<wbr>311647&view=diff<br>

> > ==============================<wbr>==============================<br>

> > ==================<br>

> > --- llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.cpp (original)<br>

> > +++ llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.cpp Thu Aug 24<br>

> > 02:46:25 2017<br>

> > @@ -66,6 +66,57 @@ X86TTIImpl::getPopcntSupport(<wbr>unsigned Ty<br>

> >    return ST->hasPOPCNT() ? TTI::PSK_FastHardware : TTI::PSK_Software;<br>

> >  }<br>

> ><br>

> > +llvm::Optional<unsigned> X86TTIImpl::getCacheSize(<br>

> > +  TargetTransformInfo::<wbr>CacheLevel Level) const {<br>

> > +  switch (Level) {<br>

> > +  case TargetTransformInfo::<wbr>CacheLevel::L1D:<br>

> > +    //   - Penry<br>

> > +    //   - Nehalem<br>

> > +    //   - Westmere<br>

> > +    //   - Sandy Bridge<br>

> > +    //   - Ivy Bridge<br>

> > +    //   - Haswell<br>

> > +    //   - Broadwell<br>

> > +    //   - Skylake<br>

> > +    //   - Kabylake<br>

> > +    return 32 * 1024;  //  32 KByte<br>

> > +  case TargetTransformInfo::<wbr>CacheLevel::L2D:<br>

> > +    //   - Penry<br>

> > +    //   - Nehalem<br>

> > +    //   - Westmere<br>

> > +    //   - Sandy Bridge<br>

> > +    //   - Ivy Bridge<br>

> > +    //   - Haswell<br>

> > +    //   - Broadwell<br>

> > +    //   - Skylake<br>

> > +    //   - Kabylake<br>

> > +    return 256 * 1024; // 256 KByte<br>

> > +  }<br>

> > +<br>

> > +  llvm_unreachable("Unknown TargetTransformInfo::<wbr>CacheLevel");<br>

> > +}<br>

> > +<br>

> > +llvm::Optional<unsigned> X86TTIImpl::<wbr>getCacheAssociativity(<br>

> > +  TargetTransformInfo::<wbr>CacheLevel Level) const {<br>

> > +  //   - Penry<br>

> > +  //   - Nehalem<br>

> > +  //   - Westmere<br>

> > +  //   - Sandy Bridge<br>

> > +  //   - Ivy Bridge<br>

> > +  //   - Haswell<br>

> > +  //   - Broadwell<br>

> > +  //   - Skylake<br>

> > +  //   - Kabylake<br>

> > +  switch (Level) {<br>

> > +  case TargetTransformInfo::<wbr>CacheLevel::L1D:<br>

> > +    LLVM_FALLTHROUGH;<br>

> > +  case TargetTransformInfo::<wbr>CacheLevel::L2D:<br>

> > +    return 8;<br>

> > +  }<br>

> > +<br>

> > +  llvm_unreachable("Unknown TargetTransformInfo::<wbr>CacheLevel");<br>

> > +}<br>

> > +<br>

> >  unsigned X86TTIImpl::<wbr>getNumberOfRegisters(bool Vector) {<br>

> >    if (Vector && !ST->hasSSE1())<br>

> >      return 0;<br>

> ><br>

> > Modified: llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.h<br>

> > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/Target/</a><br>

> > X86/X86TargetTransformInfo.h?<wbr>rev=311647&r1=311646&r2=<wbr>311647&view=diff<br>

> > ==============================<wbr>==============================<br>

> > ==================<br>

> > --- llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.h (original)<br>

> > +++ llvm/trunk/lib/Target/X86/<wbr>X86TargetTransformInfo.h Thu Aug 24<br>

> > 02:46:25 2017<br>

> > @@ -47,6 +47,14 @@ public:<br>

> ><br>

> >    /// @}<br>

> ><br>

> > +  /// \name Cache TTI Implementation<br>

> > +  /// @{<br>

> > +  llvm::Optional<unsigned> getCacheSize(<br>

> > +    TargetTransformInfo::<wbr>CacheLevel Level) const;<br>

> > +  llvm::Optional<unsigned> getCacheAssociativity(<br>

> > +    TargetTransformInfo::<wbr>CacheLevel Level) const;<br>

> > +  /// @}<br>

> > +<br>

> >    /// \name Vector TTI Implementations<br>

> >    /// @{<br>

> ><br>

> ><br>

> ><br>

> > ______________________________<wbr>_________________<br>

> > llvm-commits mailing list<br>

> > <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

> > <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>

> ><br>


</blockquote></div><br></div>