[PATCH] D76255: [SampleFDO] Port MD5 name table support to extbinary format.

Wei Mi via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 20 14:40:13 PDT 2020


wmi marked 3 inline comments as done.
wmi added a comment.

In D76255#1931410 <https://reviews.llvm.org/D76255#1931410>, @wenlei wrote:

> Great to MD5 support added to ext-bin, thanks @wmi. After this change it seems compact-bin is completely superseded by ext-bin, do you plan to deprecate/remove it?


Sorry for the late reply.

I plan to deprecate it, but only after people migrate to the new format. The problem is I don't know who is using the old format. I think I can leave it for longer time before deprecate it.



================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:157
 
-enum SecFlags { SecFlagInValid = 0, SecFlagCompress = (1 << 0) };
+enum SecFlags { SecFlagInValid = 0, SecFlagCompress = (1 << 0), SecFlagMD5Name = (1 << 1) };
 
----------------
wenlei wrote:
> I thought `SecFlags` is a set of general purpose flags that can be applied to all section, at least `SecFlagCompress` falls under that category. For Name vs MD5Name, to some extended it's like section with different content. So would it make sense to have `SecMD5NameTable` in addition to `SecNameTable` for `SecType`, instead of relying on `SecFlags`? All reader/writer code can still be shared between the two.
SecFlags was not designed to be general purpose only. I want to make it more flexible so that every section can define their own flags. I want to set aside some bits for general purpose only flags and some bits for  section specific flags. For section specific flag, every section can reuse the same bit for different purpose. We may need a verification in place to prevent misuse. If it sounds good to you, I will add some comments describing it and put the verification in place.   


================
Comment at: llvm/include/llvm/ProfileData/SampleProfReader.h:637
+  /// converted from uint64_t data.
+  std::unique_ptr<std::vector<std::string>> MD5Names;
+
----------------
davidxl wrote:
> wmi wrote:
> > davidxl wrote:
> > > wmi wrote:
> > > > davidxl wrote:
> > > > > What is the use of MD5Names?  It seems only used in useMD5() method.
> > > > > 
> > > > > Also why does it need to be unique_ptr?
> > > > Profile only contains md5 numbers (type of uint64_t). NameTable contains StringRef to the names (All formats of profile use the same NameTable). When reading profile using md5, to populate the NameTable structure, I need to convert uint64_t number to std::string and then convert it to StringRef, so I need a buffer to save those std::string because StringRef doesn't own any string data. To be a buffer of NameTable, that is the use of MD5Names. 
> > > > 
> > > > I make it a unique_ptr because for formats not using md5, MD5Names is useless, so I think an empty unique_ptr may have less cost than an empty std::vector in that case. 
> > > The type of NameTable is vector<string> not vector<StringRef> right?
> > There is another one defined as vector<StringRef> in the base class SampleProfileReaderBinary.
> Make the comment clearer that MD5Names serves as string buffer that  is referenced by NameTable (vector of StringRef).
> 
> Is the lifetime of this buffer guarenteed to be longer than NameTable?
Yes, they are both in SampleProfileReader object so they have the same lifetime.


================
Comment at: llvm/tools/llvm-profdata/llvm-profdata.cpp:474
+          static_cast<sampleprof::SampleProfileWriterExtBinary *>(&Writer);
+      ExtBinaryWriter->setUseMD5();
+    }
----------------
wenlei wrote:
> The incompatibility between `-use-md5` and format is detected and handled at runtime. Can we do the same for `setUseMD5`? Specifically, make `setUseMD5` part of `SampleProfileWriter`, and error/assert in the base version of it - that would avoid the casting and simplifies a bit in couple places. (Could probably do the same for compression).
> 
I was trying to find some balance here and not to put too much special things in the base class. For SampleProfileReader, if we don't add the interfaces in it, we will have many places that we need the casting, so it has a bunch of special interfaces there. For SampleProfileWriter, we only have a few places needing the special interfaces so adding the casting is still ok. I can change the code and only add casting once here. 

I am just not quite sure which way is better. If you feel adding the interfaces in SampleProfileWriter is better, I will be happy to do it. 

 


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76255/new/

https://reviews.llvm.org/D76255





More information about the llvm-commits mailing list