[PATCH] InstrProf: Calculate a better function hash

Duncan P. N. Exon Smith dexonsmith at apple.com
Wed Mar 26 16:27:17 PDT 2014


On Mar 26, 2014, at 4:02 PM, Chandler Carruth <chandlerc at google.com> wrote:

> 
> On Wed, Mar 26, 2014 at 3:53 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> On Mar 26, 2014, at 3:12 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> > +uint64_t PGOHash::finalize() {
> > +  // Use Working as the hash directly if we never used MD5.
> > +  if (Count <= NumTypesPerWord)
> > +    return Working;
> >
> > Doesn't this need to byteswap to LE in the event of a BE host?
> 
> I don’t think so.
> 
> All the math on Working is 64-bit integer math.  In IRGen with
> -fprofile-instr-generate, it's emitted as an i64.  The various writers
> and readers treat it as a 64-bit value (byte swapping as necessary)
> until -fprofile-instr-use sees it on the other side.
> 
> So, I still don't understand how this works: host = LE, target = BE; the IRGen writes an i64 bigendian hash, the profile reader reads an i64 littleendian hash, boom? I'm probably missing a step…

The hash here (Working) is just like any other number in IRGen.  Its byte-
swapping requirements are no different from, e.g.,
CodeGenPGO::NumRegionCounters.

Here’s the flow (with the current file formats):

  - IRGen emits an i64 as part of the __llvm_profile_data_foo aggregate.

  - CodeGen emits it in the right section, byte swapping if host!=target.

  - The profile runtime dumps it in the raw binary format.  This code is in
    compiler-rt:lib/profile/InstrProfilingFile.c and
    compiler-rt:lib/profile/InstrProfilingBuffer.c.

  - llvm-profdata reads it in, byte swapping if host!=target.  This code is
    in llvm:lib/ProfileData/InstrProfReader.cpp.

  - llvm-profdata writes it out with printf in the text (!) format.  (This
    step *should* write an indexed binary format.  Not there yet.)

  - clang will read it in from text (!) with strtoull.

FYI (on a tangent here), the raw binary format deserves more detailed
documentation in an .rst.  On my todo list.

> We do need to byte swap the inputs and outputs to MD5, since its
> interface uses uint8_t.
> 
> Yea, this part makes sense.
>  
> 
> Does this deserve a comment somewhere?
> 
> Based on my comment above, yea, because I still don't get it. =D
>  




More information about the cfe-commits mailing list