[PATCH] LLVM changes for indirect call target profiling support

Mon Jun 15 21:02:01 PDT 2015

"Betul Buyukkurt" <betulb at codeaurora.org> writes:
> Hi Justin,
>
> Since value profiling changes are very large in size and complexity wise
> required multiple design iterations (i.e. IC-profiling -> value profiling
> -> raw file size reduction), I did not reply to much of the minor
> commentary since the design changes usually resulted in those fixes being
> changed again anyway. So I saved any of those typo, naming, wording, doc
> extending etc related comments addressing for merge time. Today I've
> posted first of my merge intended CL's for review where all of your
> concerns are addressed.

I'm not sure why we need yet another thread for that. In any case, two
things:

1. There isn't much point in checking in the addition of the intrinsic
   separate from the lowering of the intrinsic, since the first patch
   doesn't actually do anything in that case.
2. My comment about precalculating the indices below seems to still need
   discussion.

> Thanks,
> -Betul
>
>> "Betul Buyukkurt" <betulb at codeaurora.org> writes:
>>>>> +'``llvm.instrprof_instrument_target``' Intrinsic
>>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> +
>>>>> +Syntax:
>>>>> +"""""""
>>>>> +
>>>>> +::
>>>>> +
>>>>> +      declare void @llvm.instrprof_instrument_target(i8* <name>, i64
>> <hash>,
>>>>> +                                                     i32
>> <value_kind>,
>>>>> +                                                     i64 <value>, i32
>> <index>)
>>>>> +Overview:
>>>>> +"""""""""
>>>>> +
>>>>> +The '``llvm.instrprof_instrument_target``' intrinsic can be emitted
>> by a
>>>>> +frontend for use with instrumentation based profiling. This will be
>>>>> +lowered by the ``-instrprof`` pass to find out the target values,
>>>>> +instrumented expressions take in a program at runtime.
>>>>> +
>>>>> +Arguments:
>>>>> +""""""""""
>>>>> +
>>>>> +The first argument is a pointer to a global variable containing the
>>>>> +name of the entity being instrumented. ``name`` should generally be
>> the
>>>>> +(mangled) function name for a set of counters.
>>>>> +
>>>>> +The second argument is a hash value that can be used by the consumer
>>>>> +of the profile data to detect changes to the instrumented source. It
>>>>> +is an error if ``hash`` differs between two instances of
>>>>> +``instrprof_instrument_target`` that refer to the same name.
>>>>> +
>>>>> +The third argument represents the kind of the value profiling that is
>> done.
>>>>> +The fourth argument is the value of the expression being profiled.
>> The last
>>>>> +argument is the index of the instrumented expression within ``name``.
>>>>> +It should be >= 0.
>>>>
>>>> These arguments need further explanation.
>>>
>>> Not addressed in the latest patch.
>>
>> Please do explain these.
>>
>>>>> +  data_type ReadDataV3(StringRef K, const unsigned char *D,
>> offset_type N) {
>>>>>      using namespace support;
>>>>> -    // We just treat the data as opaque here. It's simpler to handle
>> in
>>>>> -    // IndexedInstrProfReader.
>>>>> -    unsigned NumEntries = N / sizeof(uint64_t);
>>>>> -    DataBuffer.reserve(NumEntries);
>>>>> -    for (unsigned I = 0; I < NumEntries; ++I)
>>>>> -      DataBuffer.push_back(endian::readNext<uint64_t, little,
>> unaligned>(D));
>>>>> -    return data_type(K, DataBuffer);
>>>>> +    const unsigned char *Start = D;
>>>>> +    if (D - Start + sizeof (uint64_t) > N)
>>>>> +      return data_type();
>>>>> +    uint64_t DataBufferSize = endian::readNext<uint64_t, little,
>> unaligned>(D);
>>>>> +    DataBuffer.clear();
>>>>> +    DataBuffer.reserve(DataBufferSize);
>>>>> +    std::vector<uint64_t> CounterBuffer;
>>>>> +    std::vector<InstrProfValueRecord> ValueBuffer;
>>>>> +    for (unsigned I = 0; I < DataBufferSize; ++I) {
>>>>> +      if (D - Start + sizeof (uint64_t) + sizeof(uint32_t) >= N)
>>>>> +        return data_type();
>>>>> +      uint64_t Hash = endian::readNext<uint64_t, little,
>> unaligned>(D);
>>>>> +      uint32_t CountsSize = endian::readNext<uint32_t, little,
>> unaligned>(D);
>>>>> +      if (D - Start + CountsSize * sizeof(uint64_t) >= N)
>>>>> +        return data_type();
>>>>> +      CounterBuffer.clear();
>>>>> +      for (unsigned I = 0; I < CountsSize; ++I)
>>>>> +        CounterBuffer.push_back(endian::readNext<uint64_t, little,
>> unaligned>(D));
>>>>> +
>>>>> +      DataBuffer.push_back(InstrProfRecord(K, Hash, CounterBuffer));
>> +
>>>>> +      for (uint32_t Kind = instr_value_prof_kind::first;
>>>>> +           Kind < instr_value_prof_kind::last; ++Kind) {
>>>>> +        if (D - Start + 2 * sizeof(uint32_t) > N)
>>>>> +          return data_type();
>>>>> +        uint32_t ValueKind = endian::readNext<uint32_t, little,
>> unaligned>(D);
>>>>> +        if (ValueKind != Kind)
>>>>> +          return data_type();
>>>>> +        uint32_t ValuesSize = endian::readNext<uint32_t, little,
>> unaligned>(D);
>>>>> +        ValueBuffer.clear();
>>>>> +        for (unsigned I = 0; I < ValuesSize; ++I) {
>>>>> +          if (D - Start + sizeof(uint32_t) >= N) return data_type();
>>>>> +          uint32_t NameSize = endian::readNext<uint32_t, little,
>> unaligned>(D);
>>>>> +          if (D - Start + NameSize * sizeof(char)>= N)
>>>>> +            return data_type();
>>>>> +
>>>>> +          std::string Name((const char *)D, NameSize);
>>>>> +          D += NameSize;
>>>>> +
>>>>> +          if (D - Start + 2 * sizeof(uint64_t) >= N)
>>>>> +            return data_type();
>>>>> +          uint64_t Value = endian::readNext<uint64_t, little,
>> unaligned>(D);
>>>>> +          uint64_t NumTaken = endian::readNext<uint64_t, little,
>> unaligned>(D);
>>>>> +
>>>>> +          if (D - Start + sizeof(uint32_t) > N)
>>>>> +            return data_type();
>>>>> +          uint32_t CounterIndex = endian::readNext<uint32_t, little,
>> unaligned>(D);
>>>>> +          ValueBuffer.push_back(
>>>>> +            InstrProfValueRecord(Name, Value, NumTaken,
>> CounterIndex));
>>>>> +        }
>>>>> +        DataBuffer[DataBuffer.size()-1].Values[Kind] = ValueBuffer;
>>>>> +     }
>>>>
>>>> I guess this'll be simpler without having to deal with the kinds, but
>>>> there's also a lot of "D - Start + ..." math in here that I think would
>> be
>>>> easier to understand with a few more named variables.
>>>
>>> Not addressed in the latest patch.
>>
>> I guess this is waiting until we figure out how we're going forward with
>> the issue about the kinds, but I think you can make the math clearer in
>> the meantime regardless.
>>
>>>>> +    }
>>>>> +    return DataBuffer;
>>>>>    }
>>>>>  };
>>>>> +
>>>>>  typedef OnDiskIterableChainedHashTable<InstrProfLookupTrait>
>>>>>      InstrProfReaderIndex;
>>>>> @@ -267,8 +409,6 @@
>>>>>    std::unique_ptr<InstrProfReaderIndex> Index;
>>>>>    /// Iterator over the profile data.
>>>>>    InstrProfReaderIndex::data_iterator RecordIterator;
>>>>
>>>> This name is a bit confusing now - it iterates over arrays of records,
>>>> not individual ones.
>>>
>>> Not addressed in the latest patch.
>>
>> Why not?
>>
>>>>> -  /// Offset into our current data set.
>>>>> -  size_t CurrentOffset;
>>>>>    /// The file format version of the profile data.
>>>>>    uint64_t FormatVersion;
>>>>>    /// The maximal execution count among all functions.
>>>>> @@ -278,7 +418,7 @@
>>>>>    IndexedInstrProfReader &operator=(const IndexedInstrProfReader &) =
>>>>> delete;
>>>>>  public:
>>>>>    IndexedInstrProfReader(std::unique_ptr<MemoryBuffer> DataBuffer)
>>>>> -      : DataBuffer(std::move(DataBuffer)), Index(nullptr),
>>>>> CurrentOffset(0) {}
>>>>> +      : DataBuffer(std::move(DataBuffer)), Index(nullptr) {}
>>>>>    /// Return true if the given buffer is in an indexed instrprof
>>>>> format.
>>>>>    static bool hasFormat(const MemoryBuffer &DataBuffer);
>>>>> @@ -288,9 +428,16 @@
>>>>>    /// Read a single record.
>>>>>    std::error_code readNextRecord(InstrProfRecord &Record) override;
>>>>> -  /// Fill Counts with the profile data for the given function name.
>>>>> +  /// Return profile data for the given function name and hash
>>>>>    std::error_code getFunctionCounts(StringRef FuncName, uint64_t
>> FuncHash,
>>>>> -                                    std::vector<uint64_t> &Counts);
>>>>> +    std::vector<uint64_t> &Counts);
>>>>
>>>> It really does fill "Counts". It doesn't return anything, so the old
>> doc
>>>> comment was better.
>>>
>>> Done.
>>>
>>>> Also the formatting is funny here and for the function below -
>>>> clang-format will fix that for you.
>>>
>>> Not addressed in the latest patch.
>>
>> Why not?
>>
>>>>> +  /// Count the number of instrumented value sites for the function.
>>>>> +  void computeNumValueSiteCounts(InstrProfInstrumentTargetInst *Ins);
>> +
>>>>> +  /// Replace instrprof_instrument_target with a call to runtime
>> library.
>>>>> +  void lowerInstrumentTargetInst(InstrProfInstrumentTargetInst *Ins);
>>>>> +
>>>>>    /// Replace instrprof_increment with an increment of the
>> appropriate value.
>>>>>    void lowerIncrement(InstrProfIncrementInst *Inc);
>>>>> @@ -117,20 +132,39 @@
>>>>>    bool MadeChange = false;
>>>>>    this->M = &M;
>>>>> -  RegionCounters.clear();
>>>>> +  ProfileDataMap.clear();
>>>>>    UsedVars.clear();
>>>>> +  // We did not know how many value sites there would be inside
>>>>> +  // the instrumented function. This is counting the number of
>> instrumented
>>>>> +  // target value sites to enter it as field in the profile data
>> variable.
>>>>> +  for (Function &F : M)
>>>>> +    for (BasicBlock &BB : F)
>>>>> +      for (auto I = BB.begin(), E = BB.end(); I != E;)
>>>>> +        if (auto *Ind = dyn_cast<InstrProfInstrumentTargetInst>(I++))
>>>>> +          computeNumValueSiteCounts(Ind);
>>>>
>>>> The frontend should pass the number of value sites in the function in
>>>> the arguments to the intrinsic, like we do for the number of counters
>> in
>>>> the increment intrinsic. In addition to being simpler, it's more robust
>>>> against things like dead code elimination which could remove the call
>> to
>>>> an intrinsic completely.
>>>
>>> Unfortunately, this is not straightforwardly doable. We do not know the
>>> number of target expressions we'd like to profile prior to inserting the
>>> intrinsics. This is due to the fact that AST level expressions are not
>> one
>>> to one mappable to the generated LLVM IR level expressions. In the case
>> of
>>> indirect calls for instance the number of calls generated per a single
>>> CXXDestructor could be more than one.
>>>
>>> Instead the current implementation assigns an index by incrementing the
>>> NumValueSites counter at each point where a target expression is
>> detected.
>>> The index value is passed along to the InstrProfiling.cpp as an argument
>>> of the intrinsic. Then NumValueSites is determined by taking the max of
>>> all indices recorded for a function + 1.
>>
>> Getting the indices by incrementing NumValueSites as you go leads to a
>> few problems - this is actually how I initially implemented the counter
>> indices, but changed it after finding that it doesn't actually work.
>>
>> The problems occur if a call site is eliminated before we get to the
>> lowering, which can happen today from certain types of dead code
>> elimination and may become more common as we make improvements to how
>> lowering and optimizations interact. Numbering in the backend means that
>> the counter indices and even the number of counters may change depending
>> on optimization and essentially disassociates the results from the PGO
>> hash and the frontend locations. This loses any advantages of doing
>> instrumentation in the frontend, and can easily lead to unintentional
>> pessimization when we try to use the data since it's no longer mapped
>> correctly.
>>
>>>>> +
>>>>>    for (Function &F : M)
>>>>>      for (BasicBlock &BB : F)
>>>>>        for (auto I = BB.begin(), E = BB.end(); I != E;)
>>>>>          if (auto *Inc = dyn_cast<InstrProfIncrementInst>(I++)) {
>>>>>            lowerIncrement(Inc);
>>>>>            MadeChange = true;
>>>>>          }
>>>>> +
>>>>> +  for (Function &F : M)
>>>>> +    for (BasicBlock &BB : F)
>>>>> +      for (auto I = BB.begin(), E = BB.end(); I != E;)
>>>>> +        if (auto *Ind = dyn_cast<InstrProfInstrumentTargetInst>(I++))
>> {
>>>>> +          lowerInstrumentTargetInst(Ind);
>>>>> +          MadeChange = true;
>>>>> +        }
>>>>
>>>> It seems like we could do both of these in one loop, no?
>>>
>>> Not addressed in this patch.
>>
>> Why not?
>>