[PATCH] D12715: Reduce PGO Instrumentation binary and profile data size (Patch-1)

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 8 22:00:49 PDT 2015


silvas added a comment.

Fundamentally, this approach is not a long-term solution to reducing PGO data size. The correct long-term solution for reducing PGO data size requires passing the binary to llvm-profdata. The approach of using name hashes is also not an incremental step towards the long-term solution. So thinking about it a bit more, I'm on the fence about whether a quick solution like this one even makes sense to do. I am very apprehensive of adding complexity to make this approach work, in particular threading information through many layers, as this is likely to get in the way of future work towards the long-term solution.

My takeaways from the RFC thread (maybe I am misunderstanding) is that we are still using the same format just with one string vs. another. We shouldn't need to rev the version and there shouldn't need to be any changes in compiler-rt or llvm. Clang can easily diagnose (as an isolated piece of logic) if the user did the pgo training run with md5 mode enabled but the md5 mode flag was not passed or vice versa (it is easy to tell if input function names are md5 sums or not as a diagnostic heuristic). It is also inconsistent for llvm-profdata to be able to be passed `-function=foo` for an md5 profile; the user should have to pass (truncated) md5 of the function name.


================
Comment at: test/Instrumentation/InstrProfiling/profiling_md5hash_key.ll:5-6
@@ +4,4 @@
+
+ at __llvm_profile_name_foo = hidden constant [16 x i8] c"5cf8c24cdb18bdac"
+; CHECK: @__llvm_profile_name_foo = hidden constant [16 x i8] c"5cf8c24cdb18bdac", section "__DATA,__llvm_prf_names", align 1
+ at __llvm_profile_name_bar = hidden constant [17 x i8] c"e413754a191db537\00"
----------------
This is backward from a standard md5 sum in hex. We should maintain the invariant that the resulting symbol name is a substring of a standard md5 sum. Otherwise it is very difficult to explain how to get the hash for a function.

E.g. we should be able to say "the symbol name is effectively replaced by the first half of its md5 sum in hex":

```
Sean:~ % echo -n 'foo' | md5             
acbd18db4cc2f85cedef654fccc4a4d8
Sean:~ % echo -n 'foo' | md5 | head -c 16
acbd18db4cc2f85c
```

The symbol name string should be `acbd18db4cc2f85c`.


http://reviews.llvm.org/D12715





More information about the llvm-commits mailing list