[PATCH] Store intrinsic ID by value in Function instead of a string lookup. NFC

Duncan P. N. Exon Smith dexonsmith at apple.com
Mon May 18 16:02:07 PDT 2015


> On 2015-May-18, at 14:53, Pete Cooper <peter_cooper at apple.com> wrote:
> 
> Hi dexonsmith,
> 
> On 64-bit targets, Function has 4-bytes of padding in its struct layout.
> 
> This uses the space for the intrinsic ID. It is set and recalculated whenever the function name is set.  This is similar to the current behavior which clears the function from the intrinsic ID cache when its renamed.
> 
> The intrinsic cache itself is removed as the only purpose was to speedup calls to getIntrinsicID() which now just reading the new field in the struct.
> 
> http://reviews.llvm.org/D9836
> 
> Files:
>  include/llvm/IR/Function.h
>  include/llvm/IR/GlobalValue.h
>  include/llvm/IR/Intrinsics.h
>  include/llvm/IR/Value.h
>  lib/IR/Function.cpp
>  lib/IR/LLVMContextImpl.h
>  lib/IR/Value.cpp
> 
> EMAIL PREFERENCES
>  http://reviews.llvm.org/settings/panel/emailpreferences/
> <D9836.26018.patch>

Previously, the intrinsic ID would only be looked up if
`getIntrinsicID()` was called; now it happens whenever `setName()` is
used.  How sure are you that pre-calculating this answer won't cause a
slowdown for (e.g.) JIT consumers?

Also, previously this was cached per LLVMContext.  So if you looked up the
same intrinsic in different modules (or multiple times on the same
module), you only had to do the "work" once.  Are there any places LLVM is
used where saving the lookup on the LLVMContext is important?  (Is this
the sort of thing a JIT does (multiple modules on the same context)?)

Finally, is it a good idea to specify that `Intrinsic::ID` is backed by
an `unsigned`?  (More below.)

Aside from those questions, I just have nitpicks below.  It seems nice
to factor out the cache, and even for 32-bit targets the memory seems like
a low price to pay.

> Index: include/llvm/IR/Function.h
> ===================================================================
> --- include/llvm/IR/Function.h
> +++ include/llvm/IR/Function.h
> @@ -109,10 +109,6 @@
>    Function(const Function&) = delete;
>    void operator=(const Function&) = delete;
>  
> -  /// Do the actual lookup of an intrinsic ID when the query could not be
> -  /// answered from the cache.
> -  unsigned lookupIntrinsicID() const LLVM_READONLY;
> -

Can this change (making the function local to `Function.cpp`) be done in
a separate commit?

>    /// Function ctor - If the (optional) Module argument is specified, the
>    /// function is automatically inserted into the end of the function list for
>    /// the module.
> @@ -147,13 +143,16 @@
>    /// intrinsic, or if the pointer is null.  This value is always defined to be
>    /// zero to allow easy checking for whether a function is intrinsic or not.
>    /// The particular intrinsic functions which correspond to this value are
> -  /// defined in llvm/Intrinsics.h.  Results are cached in the LLVM context,
> -  /// subsequent requests for the same ID return results much faster from the
> -  /// cache.
> +  /// defined in llvm/Intrinsics.h.
>    ///
> -  unsigned getIntrinsicID() const LLVM_READONLY;
> +  unsigned getIntrinsicID() const LLVM_READONLY { return IntID; }
>    bool isIntrinsic() const { return getName().startswith("llvm."); }
>  
> +  /// \brief Recalculate the ID for this function if it is an Intrinsic defined
> +  /// in llvm/Intrinsics.h.  Sets the intrinsic ID to Intrinsic::not_intrinsic
> +  /// if the name of this function does not match an intrinsic in that header.
> +  void recalculateIntrinsicID();
> +

Should this be `private`?

>    /// getCallingConv()/setCallingConv(CC) - These method get and set the
>    /// calling convention of this function.  The enum values for the known
>    /// calling conventions are defined in CallingConv.h.
> Index: include/llvm/IR/GlobalValue.h
> ===================================================================
> --- include/llvm/IR/GlobalValue.h
> +++ include/llvm/IR/GlobalValue.h
> @@ -28,6 +28,10 @@
>  class PointerType;
>  class Module;
>  
> +namespace Intrinsic {
> +  enum ID : unsigned;
> +};
> +
>  class GlobalValue : public Constant {
>    GlobalValue(const GlobalValue &) = delete;
>  public:
> @@ -66,7 +70,7 @@
>        : Constant(Ty, VTy, Ops, NumOps), Linkage(Linkage),
>          Visibility(DefaultVisibility), UnnamedAddr(0),
>          DllStorageClass(DefaultStorageClass),
> -        ThreadLocal(NotThreadLocal), Parent(nullptr) {
> +        ThreadLocal(NotThreadLocal), IntID((Intrinsic::ID)0U), Parent(nullptr) {
>      setName(Name);
>    }
>  
> @@ -84,7 +88,16 @@
>    // Give subclasses access to what otherwise would be wasted padding.
>    // (19 + 3 + 2 + 1 + 2 + 5) == 32.
>    unsigned SubClassData : 19;
> +
>  protected:
> +  /// \brief The intrinsic ID for this subclass (which must be a Function).
> +  ///
> +  /// This member is defined by this class, but not used for anything.
> +  /// Subclasses can use it to store their intrinsic ID, if they have one.
> +  ///
> +  /// This is stored here to save space in Function on 64-bit hosts.
> +  Intrinsic::ID IntID;
> +
>    static const unsigned GlobalValueSubClassDataBits = 19;
>    unsigned getGlobalValueSubClassData() const {
>      return SubClassData;
> Index: include/llvm/IR/Intrinsics.h
> ===================================================================
> --- include/llvm/IR/Intrinsics.h
> +++ include/llvm/IR/Intrinsics.h
> @@ -32,7 +32,7 @@
>  /// function known by LLVM. The enum values are returned by
>  /// Function::getIntrinsicID().
>  namespace Intrinsic {
> -  enum ID {
> +  enum ID : unsigned {

This seems like a reasonable restriction to me (32-bits).  However, I
believe this means we lose covered `switch` warnings.  Is that a
problem?

Either way, it seems independent of your change -- you can just use an
`unsigned` for `IntID` up above and `assert()` that the intrinsic ID is
less than 32-bits.

>      not_intrinsic = 0,   // Must be zero
>  
>      // Get the intrinsic enums generated from Intrinsics.td
> Index: include/llvm/IR/Value.h
> ===================================================================
> --- include/llvm/IR/Value.h
> +++ include/llvm/IR/Value.h
> @@ -216,6 +216,7 @@
>  
>  private:
>    void destroyValueName();
> +  void setNameImpl(const Twine &Name);
>  
>  public:
>    /// \brief Return a constant reference to the value's name.
> Index: lib/IR/Function.cpp
> ===================================================================
> --- lib/IR/Function.cpp
> +++ lib/IR/Function.cpp
> @@ -266,8 +266,10 @@
>      ParentModule->getFunctionList().push_back(this);
>  
>    // Ensure intrinsics have the right parameter attributes.
> -  if (unsigned IID = getIntrinsicID())
> -    setAttributes(Intrinsic::getAttributes(getContext(), Intrinsic::ID(IID)));
> +  // Note, the IntID field will have been set in Value::setName if this function
> +  // name is a valid intrinsic ID.
> +  if (IntID)
> +    setAttributes(Intrinsic::getAttributes(getContext(), IntID));
>  

`clang-format` would clean up this unnecessary newline ;).

>  }
>  
> @@ -280,10 +282,6 @@
>  
>    // Remove the function from the on-the-side GC table.
>    clearGC();
> -
> -  // Remove the intrinsicID from the Cache.
> -  if (getValueName() && isIntrinsic())
> -    getContext().pImpl->IntrinsicIDCache.erase(this);
>  }
>  
>  void Function::BuildLazyArguments() const {
> @@ -433,41 +431,26 @@
>      setPrologueData(nullptr);
>  }
>  
> -/// getIntrinsicID - This method returns the ID number of the specified
> -/// function, or Intrinsic::not_intrinsic if the function is not an
> -/// intrinsic, or if the pointer is null.  This value is always defined to be
> -/// zero to allow easy checking for whether a function is intrinsic or not.  The
> -/// particular intrinsic functions which correspond to this value are defined in
> -/// llvm/Intrinsics.h.  Results are cached in the LLVM context, subsequent
> -/// requests for the same ID return results much faster from the cache.
> -///
> -unsigned Function::getIntrinsicID() const {
> -  const ValueName *ValName = this->getValueName();
> -  if (!ValName || !isIntrinsic())
> -    return 0;
> -
> -  LLVMContextImpl::IntrinsicIDCacheTy &IntrinsicIDCache =
> -    getContext().pImpl->IntrinsicIDCache;
> -  if (!IntrinsicIDCache.count(this)) {
> -    unsigned Id = lookupIntrinsicID();
> -    IntrinsicIDCache[this]=Id;
> -    return Id;
> -  }
> -  return IntrinsicIDCache[this];
> -}
> -
> -/// This private method does the actual lookup of an intrinsic ID when the query
> -/// could not be answered from the cache.
> -unsigned Function::lookupIntrinsicID() const {
> -  const ValueName *ValName = this->getValueName();
> +/// \brief This does the actual lookup of an intrinsic ID which
> +/// matches the given function name.
> +static Intrinsic::ID lookupIntrinsicID(const ValueName *ValName) {

Can this change (making the function local to `Function.cpp`) be done in
a separate commit?  (Second half.)

>    unsigned Len = ValName->getKeyLength();
>    const char *Name = ValName->getKeyData();
>  
>  #define GET_FUNCTION_RECOGNIZER
>  #include "llvm/IR/Intrinsics.gen"
>  #undef GET_FUNCTION_RECOGNIZER
>  
> -  return 0;
> +  return Intrinsic::not_intrinsic;
> +}
> +
> +void Function::recalculateIntrinsicID() {
> +  const ValueName *ValName = this->getValueName();
> +  if (!ValName || !isIntrinsic()) {
> +    IntID = Intrinsic::not_intrinsic;
> +    return;
> +  }
> +  IntID = lookupIntrinsicID(ValName);
>  }
>  
>  /// Returns a stable mangling for the type specified for use in the name
> Index: lib/IR/LLVMContextImpl.h
> ===================================================================
> --- lib/IR/LLVMContextImpl.h
> +++ lib/IR/LLVMContextImpl.h
> @@ -1000,11 +1000,6 @@
>    /// instructions in different blocks at the same location.
>    DenseMap<std::pair<const char *, unsigned>, unsigned> DiscriminatorTable;
>  
> -  /// IntrinsicIDCache - Cache of intrinsic name (string) to numeric ID mappings
> -  /// requested in this context
> -  typedef DenseMap<const Function*, unsigned> IntrinsicIDCacheTy;
> -  IntrinsicIDCacheTy IntrinsicIDCache;
> -
>    /// \brief Mapping from a function to its prefix data, which is stored as the
>    /// operand of an unparented ReturnInst so that the prefix data has a Use.
>    typedef DenseMap<const Function *, ReturnInst *> PrefixDataMapTy;
> Index: lib/IR/Value.cpp
> ===================================================================
> --- lib/IR/Value.cpp
> +++ lib/IR/Value.cpp
> @@ -166,7 +166,7 @@
>    return getValueName()->getKey();
>  }
>  
> -void Value::setName(const Twine &NewName) {
> +void Value::setNameImpl(const Twine &NewName) {
>    // Fast path for common IRBuilder case of setName("") when there is no name.
>    if (NewName.isTriviallyEmpty() && !hasName())
>      return;
> @@ -187,9 +187,6 @@
>    if (getSymTab(this, ST))
>      return;  // Cannot set a name on this value (e.g. constant).
>  
> -  if (Function *F = dyn_cast<Function>(this))
> -    getContext().pImpl->IntrinsicIDCache.erase(F);
> -
>    if (!ST) { // No symbol table to update?  Just do the change.
>      if (NameRef.empty()) {
>        // Free the name for this value.
> @@ -222,6 +219,12 @@
>    setValueName(ST->createValueName(NameRef, this));
>  }
>  
> +void Value::setName(const Twine &NewName) {
> +  setNameImpl(NewName);
> +  if (Function *F = dyn_cast<Function>(this))
> +    F->recalculateIntrinsicID();
> +}
> +
>  void Value::takeName(Value *V) {
>    ValueSymbolTable *ST = nullptr;
>    // If this value has a name, drop it.
> 





More information about the llvm-commits mailing list