[llvm] r311647 - Model cache size and associativity in TargetTransformInfo

Fri Aug 25 10:58:31 PDT 2017

Once https://reviews.llvm.org/D35348 goes in, maybe we can detect it with a
feature bit.

~Craig
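
As a rough sketch of the getCPU()-based alternative mentioned in the quoted
mail below (not code from the patch): the lookup could key on the CPU name
string. The helper name cacheAssociativityFor is hypothetical, std::optional
stands in for llvm::Optional, and the 4-way value for "skylake" is only what
was stated in this thread. No value is given for "skylake-avx512" because the
thread does not establish one.

```cpp
#include <optional>
#include <string_view>

// Stand-in for TargetTransformInfo::CacheLevel from the commit.
enum class CacheLevel { L1D, L2D };

// Hypothetical helper: cache associativity keyed on a CPU name string such as
// the one a subtarget's getCPU() would return. All numbers are assumptions
// taken from the commit (8) or from this thread (4 for Skylake client L2).
std::optional<unsigned> cacheAssociativityFor(std::string_view CPU,
                                              CacheLevel Level) {
  if (Level == CacheLevel::L1D)
    return 8; // value the commit uses for every listed architecture
  if (CPU == "skylake")
    return 4; // client part, as reported in this thread
  if (CPU == "skylake-avx512")
    return std::nullopt; // server part: value not established here
  return 8; // default L2 value used by the commit
}
```

Returning an empty optional for the server part keeps a client from silently
using the client value until someone fills in a verified number.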

On Fri, Aug 25, 2017 at 10:54 AM, Craig Topper <craig.topper at gmail.com>
wrote:

> I believe there is a getCPU() method in the Subtarget. It's inherited from
> MCSubtargetInfo.
>
> ~Craig
>
> On Fri, Aug 25, 2017 at 10:29 AM, Tobias Grosser via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> On Fri, Aug 25, 2017, at 19:20, Craig Topper via llvm-commits wrote:
>> > clang doesn't support -mtune. But we do have -march. llc calls it -mcpu.
>> > "skylake" refers to the mobile and desktop version with 4-way
>> > associativity. "skylake-avx512" refers to the Xeon server version.
>>
>> The X86TargetTransformInfo seems to only refer to instruction set
>> properties. I can check for AVX512 && !Atom. Maybe this will give me
>> skylake-avx512. However, this seems rather cryptic. I should probably
>> look for a better alternative. Any ideas?
>>
>>  // Attempt to lookup cost.
>>   if (ST->hasCDI())
>>     if (const auto *Entry = CostTableLookup(AVX512CDCostTbl, ISD, MTy))
>>       return LT.first * Entry->Cost;
>>
>>   if (ST->hasBWI())
>>     if (const auto *Entry = CostTableLookup(AVX512BWCostTbl, ISD, MTy))
>>       return LT.first * Entry->Cost;
>>
>>   if (ST->hasAVX512())
>>
>> Best,
>> Tobias
>>
>> >
>> > ~Craig
>> >
>> > On Fri, Aug 25, 2017 at 10:11 AM, Tobias Grosser <tobias at grosser.es>
>> > wrote:
>> >
>> > > On Thu, Aug 24, 2017, at 23:39, Craig Topper via llvm-commits wrote:
>> > > > I believe Skylake client's L2 associativity is only 4-way, while
>> > > > Skylake server has a much larger L2 with more associativity.
>> > > >
>> > > > Is this something we should get from CPUID if the user does
>> > > > -mcpu=native?
>> > >
>> > > I was so glad that, according to 7-cpu, all of them have the same
>> > > characteristics. But it seems I indeed overlooked the 4-way
>> > > associativity of Skylake.
>> > >
>> > > I wonder what would be the best way to model this. Are there different
>> > > mtune flags for Skylake server and client? GCC distinguishes between
>> > > 'skylake' and 'skylake-avx512'; are these the two variants you are
>> > > talking about?
>> > >
>> > > Best,
>> > > Tobias
>> > >
>> > > > ~Craig
>> > > >
>> > > > On Thu, Aug 24, 2017 at 2:46 AM, Tobias Grosser via llvm-commits <
>> > > > llvm-commits at lists.llvm.org> wrote:
>> > > >
>> > > > > Author: grosser
>> > > > > Date: Thu Aug 24 02:46:25 2017
>> > > > > New Revision: 311647
>> > > > >
>> > > > > URL: http://llvm.org/viewvc/llvm-project?rev=311647&view=rev
>> > > > > Log:
>> > > > > Model cache size and associativity in TargetTransformInfo
>> > > > >
>> > > > > Summary:
>> > > > > We add the precise cache sizes and associativity for the following
>> > > > > Intel architectures:
>> > > > >
>> > > > >   - Penryn
>> > > > >   - Nehalem
>> > > > >   - Westmere
>> > > > >   - Sandy Bridge
>> > > > >   - Ivy Bridge
>> > > > >   - Haswell
>> > > > >   - Broadwell
>> > > > >   - Skylake
>> > > > >   - Kabylake
>> > > > >
>> > > > > For several months, Polly has used a performance model for BLAS
>> > > > > computations that derives optimal cache and register tile sizes
>> > > > > from cache and latency information (based on ideas from
>> > > > > "Analytical Modeling Is Enough for High-Performance BLIS", by
>> > > > > Tze Meng Low, published at TOMS 2016). While bootstrapping this
>> > > > > model, these target values have been kept in Polly. However, as
>> > > > > our implementation is now rather mature, it seems time to teach
>> > > > > LLVM itself about cache sizes.
>> > > > >
>> > > > > Interestingly, L1 and L2 cache sizes are pretty constant across
>> > > > > micro-architectures, hence a set of architecture-specific default
>> > > > > values seems like a good start. They can be expanded to more
>> > > > > target-specific values in case certain newer architectures require
>> > > > > different values. For now, a set of Intel architectures is
>> > > > > provided.
>> > > > >
>> > > > > Just as a little teaser, for a simple gemm kernel this model
>> > > > > allows us to improve performance from 1.2s to 0.27s. For gemm
>> > > > > kernels with less optimal memory layouts, even larger speedups
>> > > > > can be reported.
>> > > > >
>> > > > > Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman,
>> > > > > fhahn, sebpop, efriedma, asb
>> > > > >
>> > > > > Reviewed By: fhahn, asb
>> > > > >
>> > > > > Subscribers: lsaba, asb, pollydev, llvm-commits
>> > > > >
>> > > > > Differential Revision: https://reviews.llvm.org/D37051
>> > > > >
>> > > > > Modified:
>> > > > >     llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>> > > > >     llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
>> > > > >     llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>> > > > >     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> > > > >     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> > > > >
>> > > > > Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h
>> > > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h?rev=311647&r1=311646&r2=311647&view=diff
>> > > > > ==============================================================================
>> > > > > --- llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h (original)
>> > > > > +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h Thu Aug 24 02:46:25 2017
>> > > > > @@ -603,6 +603,22 @@ public:
>> > > > >    /// \return The size of a cache line in bytes.
>> > > > >    unsigned getCacheLineSize() const;
>> > > > >
>> > > > > +  /// The possible cache levels
>> > > > > +  enum class CacheLevel {
>> > > > > +    L1D,   // The L1 data cache
>> > > > > +    L2D,   // The L2 data cache
>> > > > > +
>> > > > > +    // We currently do not model L3 caches, as their sizes differ widely between
>> > > > > +    // microarchitectures. Also, we currently do not have a use for L3 cache
>> > > > > +    // size modeling yet.
>> > > > > +  };
>> > > > > +
>> > > > > +  /// \return The size of the cache level in bytes, if available.
>> > > > > +  llvm::Optional<unsigned> getCacheSize(CacheLevel Level) const;
>> > > > > +
>> > > > > +  /// \return The associativity of the cache level, if available.
>> > > > > +  llvm::Optional<unsigned> getCacheAssociativity(CacheLevel Level) const;
>> > > > > +
>> > > > >    /// \return How much before a load we should place the prefetch instruction.
>> > > > >    /// This is currently measured in number of instructions.
>> > > > >    unsigned getPrefetchDistance() const;
>> > > > > @@ -937,6 +953,8 @@ public:
>> > > > >    virtual bool shouldConsiderAddressTypePromotion(
>> > > > >        const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
>> > > > >    virtual unsigned getCacheLineSize() = 0;
>> > > > > +  virtual llvm::Optional<unsigned> getCacheSize(CacheLevel Level) = 0;
>> > > > > +  virtual llvm::Optional<unsigned> getCacheAssociativity(CacheLevel Level) = 0;
>> > > > >    virtual unsigned getPrefetchDistance() = 0;
>> > > > >    virtual unsigned getMinPrefetchStride() = 0;
>> > > > >    virtual unsigned getMaxPrefetchIterationsAhead() = 0;
>> > > > > @@ -1209,6 +1227,12 @@ public:
>> > > > >    unsigned getCacheLineSize() override {
>> > > > >      return Impl.getCacheLineSize();
>> > > > >    }
>> > > > > +  llvm::Optional<unsigned> getCacheSize(CacheLevel Level) override {
>> > > > > +    return Impl.getCacheSize(Level);
>> > > > > +  }
>> > > > > +  llvm::Optional<unsigned> getCacheAssociativity(CacheLevel Level) override {
>> > > > > +    return Impl.getCacheAssociativity(Level);
>> > > > > +  }
>> > > > >    unsigned getPrefetchDistance() override { return Impl.getPrefetchDistance(); }
>> > > > >    unsigned getMinPrefetchStride() override {
>> > > > >      return Impl.getMinPrefetchStride();
>> > > > >
>> > > > > Modified: llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h
>> > > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h?rev=311647&r1=311646&r2=311647&view=diff
>> > > > > ==============================================================================
>> > > > > --- llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h (original)
>> > > > > +++ llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h Thu Aug 24 02:46:25 2017
>> > > > > @@ -340,6 +340,29 @@ public:
>> > > > >
>> > > > >    unsigned getCacheLineSize() { return 0; }
>> > > > >
>> > > > > +  llvm::Optional<unsigned> getCacheSize(TargetTransformInfo::CacheLevel Level) {
>> > > > > +    switch (Level) {
>> > > > > +    case TargetTransformInfo::CacheLevel::L1D:
>> > > > > +      LLVM_FALLTHROUGH;
>> > > > > +    case TargetTransformInfo::CacheLevel::L2D:
>> > > > > +      return llvm::Optional<unsigned>();
>> > > > > +    }
>> > > > > +
>> > > > > +    llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
>> > > > > +  }
>> > > > > +
>> > > > > +  llvm::Optional<unsigned> getCacheAssociativity(
>> > > > > +    TargetTransformInfo::CacheLevel Level) {
>> > > > > +    switch (Level) {
>> > > > > +    case TargetTransformInfo::CacheLevel::L1D:
>> > > > > +      LLVM_FALLTHROUGH;
>> > > > > +    case TargetTransformInfo::CacheLevel::L2D:
>> > > > > +      return llvm::Optional<unsigned>();
>> > > > > +    }
>> > > > > +
>> > > > > +    llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
>> > > > > +  }
>> > > > > +
>> > > > >    unsigned getPrefetchDistance() { return 0; }
>> > > > >
>> > > > >    unsigned getMinPrefetchStride() { return 1; }
>> > > > >
>> > > > > Modified: llvm/trunk/lib/Analysis/TargetTransformInfo.cpp
>> > > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/TargetTransformInfo.cpp?rev=311647&r1=311646&r2=311647&view=diff
>> > > > > ==============================================================================
>> > > > > --- llvm/trunk/lib/Analysis/TargetTransformInfo.cpp (original)
>> > > > > +++ llvm/trunk/lib/Analysis/TargetTransformInfo.cpp Thu Aug 24 02:46:25 2017
>> > > > > @@ -321,6 +321,16 @@ unsigned TargetTransformInfo::getCacheLi
>> > > > >    return TTIImpl->getCacheLineSize();
>> > > > >  }
>> > > > >
>> > > > > +llvm::Optional<unsigned> TargetTransformInfo::getCacheSize(CacheLevel Level)
>> > > > > +  const {
>> > > > > +  return TTIImpl->getCacheSize(Level);
>> > > > > +}
>> > > > > +
>> > > > > +llvm::Optional<unsigned> TargetTransformInfo::getCacheAssociativity(
>> > > > > +  CacheLevel Level) const {
>> > > > > +  return TTIImpl->getCacheAssociativity(Level);
>> > > > > +}
>> > > > > +
>> > > > >  unsigned TargetTransformInfo::getPrefetchDistance() const {
>> > > > >    return TTIImpl->getPrefetchDistance();
>> > > > >  }
>> > > > >
>> > > > > Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> > > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=311647&r1=311646&r2=311647&view=diff
>> > > > > ==============================================================================
>> > > > > --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>> > > > > +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Thu Aug 24 02:46:25 2017
>> > > > > @@ -66,6 +66,57 @@ X86TTIImpl::getPopcntSupport(unsigned Ty
>> > > > >    return ST->hasPOPCNT() ? TTI::PSK_FastHardware : TTI::PSK_Software;
>> > > > >  }
>> > > > >
>> > > > > +llvm::Optional<unsigned> X86TTIImpl::getCacheSize(
>> > > > > +  TargetTransformInfo::CacheLevel Level) const {
>> > > > > +  switch (Level) {
>> > > > > +  case TargetTransformInfo::CacheLevel::L1D:
>> > > > > +    //   - Penry
>> > > > > +    //   - Nehalem
>> > > > > +    //   - Westmere
>> > > > > +    //   - Sandy Bridge
>> > > > > +    //   - Ivy Bridge
>> > > > > +    //   - Haswell
>> > > > > +    //   - Broadwell
>> > > > > +    //   - Skylake
>> > > > > +    //   - Kabylake
>> > > > > +    return 32 * 1024;  //  32 KByte
>> > > > > +  case TargetTransformInfo::CacheLevel::L2D:
>> > > > > +    //   - Penry
>> > > > > +    //   - Nehalem
>> > > > > +    //   - Westmere
>> > > > > +    //   - Sandy Bridge
>> > > > > +    //   - Ivy Bridge
>> > > > > +    //   - Haswell
>> > > > > +    //   - Broadwell
>> > > > > +    //   - Skylake
>> > > > > +    //   - Kabylake
>> > > > > +    return 256 * 1024; // 256 KByte
>> > > > > +  }
>> > > > > +
>> > > > > +  llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
>> > > > > +}
>> > > > > +
>> > > > > +llvm::Optional<unsigned> X86TTIImpl::getCacheAssociativity(
>> > > > > +  TargetTransformInfo::CacheLevel Level) const {
>> > > > > +  //   - Penry
>> > > > > +  //   - Nehalem
>> > > > > +  //   - Westmere
>> > > > > +  //   - Sandy Bridge
>> > > > > +  //   - Ivy Bridge
>> > > > > +  //   - Haswell
>> > > > > +  //   - Broadwell
>> > > > > +  //   - Skylake
>> > > > > +  //   - Kabylake
>> > > > > +  switch (Level) {
>> > > > > +  case TargetTransformInfo::CacheLevel::L1D:
>> > > > > +    LLVM_FALLTHROUGH;
>> > > > > +  case TargetTransformInfo::CacheLevel::L2D:
>> > > > > +    return 8;
>> > > > > +  }
>> > > > > +
>> > > > > +  llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
>> > > > > +}
>> > > > > +
>> > > > >  unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {
>> > > > >    if (Vector && !ST->hasSSE1())
>> > > > >      return 0;
>> > > > >
>> > > > > Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> > > > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h?rev=311647&r1=311646&r2=311647&view=diff
>> > > > > ==============================================================================
>> > > > > --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h (original)
>> > > > > +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h Thu Aug 24 02:46:25 2017
>> > > > > @@ -47,6 +47,14 @@ public:
>> > > > >
>> > > > >    /// @}
>> > > > >
>> > > > > +  /// \name Cache TTI Implementation
>> > > > > +  /// @{
>> > > > > +  llvm::Optional<unsigned> getCacheSize(
>> > > > > +    TargetTransformInfo::CacheLevel Level) const;
>> > > > > +  llvm::Optional<unsigned> getCacheAssociativity(
>> > > > > +    TargetTransformInfo::CacheLevel Level) const;
>> > > > > +  /// @}
>> > > > > +
>> > > > >    /// \name Vector TTI Implementations
>> > > > >    /// @{
>> > > > >
>> > > > >
>> > > > >
>> > > > > _______________________________________________
>> > > > > llvm-commits mailing list
>> > > > > llvm-commits at lists.llvm.org
>> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>> > > > >
>>
>
>
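
For readers who want to see how a client might consume the two queries added
by the quoted commit, here is a minimal sketch rather than code from the
patch. CacheInfo is a hypothetical stand-in for the TTI interface,
std::optional stands in for llvm::Optional, and the 64-byte line size is an
assumption (the quoted diff does not show the X86 value). The sizes and the
8-way associativity are the defaults from the commit.

```cpp
#include <optional>

// Stand-in for TargetTransformInfo::CacheLevel from the commit.
enum class CacheLevel { L1D, L2D };

// Hypothetical stand-in for the cache queries exposed through TTI.
struct CacheInfo {
  std::optional<unsigned> getCacheSize(CacheLevel Level) const {
    switch (Level) {
    case CacheLevel::L1D:
      return 32 * 1024; // 32 KByte, as in the commit
    case CacheLevel::L2D:
      return 256 * 1024; // 256 KByte, as in the commit
    }
    return std::nullopt;
  }
  std::optional<unsigned> getCacheAssociativity(CacheLevel) const {
    return 8; // default from the commit for both levels
  }
  unsigned getCacheLineSize() const { return 64; } // assumed line size
};

// Number of sets = size / (ways * line size). The result is empty when any
// input is unknown, so callers must handle the "not modeled" case.
std::optional<unsigned> numCacheSets(const CacheInfo &TTI, CacheLevel Level) {
  std::optional<unsigned> Size = TTI.getCacheSize(Level);
  std::optional<unsigned> Ways = TTI.getCacheAssociativity(Level);
  unsigned Line = TTI.getCacheLineSize();
  if (!Size || !Ways || *Ways == 0 || Line == 0)
    return std::nullopt;
  return *Size / (*Ways * Line);
}
```

With these assumed values the L1 data cache has 64 sets and the L2 cache has
512 sets. A target that models neither value propagates an empty result
instead of a guess.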