<p dir="ltr">Interesting that it didn't speed things up as much for you. The only thing I can think of that seems likely is that I was testing this on an i7 machine whereas your machine (I'm assuming mac pro) has a beefier xeon with much more L3 cache. That would mitigate the benefit of this patch since I suspect the main benefit is letting the processor's speculative execution run farther ahead, hiding load latency (if my hypothesis is correct, it must be LLC vs DRAM latency).</p>
<div class="gmail_extra"><br><div class="gmail_quote">On Dec 6, 2016 2:11 PM, "Rafael Avila de Espindola" <<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Thanks for the patch.<br>
<br>
I uploaded a new version of the benchmark to<br>
<br>
<a href="https://drive.google.com/open?id=0B7iRtublysV6RjFBcVJhSjg1eHM" rel="noreferrer" target="_blank">https://drive.google.com/open?<wbr>id=<wbr>0B7iRtublysV6RjFBcVJhSjg1eHM</a><br>
<br>
The problem you had with it using files from my machine should be<br>
fixed. The issue was with linker scripts using absolute paths. It can be<br>
avoided with fakeroot, but I just edited the scripts to make it easier.<br>
<br>
I got a smaller benefit from this patch. The results were<br>
<br>
firefox-O0<br>
master 4.150496547<br>
patch <a href="tel:4.144422417" value="+14144422417">4.144422417</a> 1.00146561556x faster<br>
firefox<br>
master 6.901552864<br>
patch 6.900340356 1.00017571713x faster<br>
firefox-gc<br>
master 7.078655427<br>
patch 7.069354855 1.00131561821x faster<br>
chromium<br>
master 4.107906591<br>
patch 4.094586237 1.0032531624x faster<br>
chromium fast<br>
master 2.007441161<br>
patch 1.999441702 1.00400084633x faster<br>
the gold plugin<br>
master 0.337859787<br>
patch 0.334852395 1.0089812468x faster<br>
clang<br>
master 0.57712437<br>
patch 0.574027448 1.00539507651x faster<br>
llvm-as<br>
master 0.033570448<br>
patch 0.033285747 1.00855324052x faster<br>
the gold plugin fsds<br>
master 0.363855963<br>
patch 0.36106724 1.00772355587x faster<br>
clang fsds<br>
master 0.656084922<br>
patch 0.652941395 1.00481440911x faster<br>
llvm-as fsds<br>
master 0.033544621<br>
patch 0.033198883 1.01041414556x faster<br>
scylla<br>
master 3.112523401<br>
patch 3.103663875 1.00285453785x faster<br>
<br>
Sean Silva via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> writes:<br>
<br>
> Author: silvas<br>
> Date: Wed Nov 30 23:43:48 2016<br>
> New Revision: 288314<br>
><br>
> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=288314&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project?rev=288314&view=rev</a><br>
> Log:<br>
> Add `isRelExprOneOf` helper<br>
><br>
> In various places in LLD's hot loops, we have expressions of the form<br>
> "E == R_FOO || E == R_BAR || ..." (E is a RelExpr).<br>
><br>
> Some of these expressions are quite long, and even though they usually go just<br>
> a very small number of ways and so should be well predicted, they can still<br>
> occupy branch predictor resources harming other parts of the code, or they<br>
> won't be predicted well if they overflow branch predictor resources or if the<br>
> branches are too dense and the branch predictor can't track them all (the<br>
> compiler can in theory avoid this, at a cost in text size). And some of these<br>
> expressions are so large and executed so frequently that even when<br>
> well-predicted they probably still have a nontrivial cost.<br>
><br>
> This speedup should be pretty portable. The cost of these simple bit tests is<br>
> independent of:<br>
><br>
> - the target we are linking for<br>
> - the distribution of RelExpr's for a given link (which can depend on how the<br>
> input files were compiled)<br>
> - what compiler was used to compile LLD (it is just a simple bit test;<br>
> hopefully the compiler gets it right!)<br>
> - adding new target-dependent relocations (e.g. needsPlt doesn't pay any extra<br>
> cost checking R_PPC_PLT_OPD on x86-64 builds)<br>
><br>
> I did some rough measurements on clang-fsds and this patch gives over about 4%<br>
> speedup for a regular -O1 link, about 2.5% for -O3 --gc-sections and over 5%<br>
> for -O0. Sorry, I don't have my current machine set up for doing really<br>
> accurate measurements right now.<br>
><br>
> This also is just a bit cleaner. Thanks for Joerg for suggesting for<br>
> this approach.<br>
><br>
> Differential Revision: <a href="https://reviews.llvm.org/D27156" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D27156</a><br>
><br>
> Modified:<br>
> lld/trunk/ELF/Relocations.cpp<br>
> lld/trunk/ELF/Relocations.h<br>
><br>
> Modified: lld/trunk/ELF/Relocations.cpp<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Relocations.cpp?rev=288314&r1=288313&r2=288314&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/lld/trunk/ELF/<wbr>Relocations.cpp?rev=288314&r1=<wbr>288313&r2=288314&view=diff</a><br>
> ==============================<wbr>==============================<wbr>==================<br>
> --- lld/trunk/ELF/Relocations.cpp (original)<br>
> +++ lld/trunk/ELF/Relocations.cpp Wed Nov 30 23:43:48 2016<br>
> @@ -62,12 +62,10 @@ namespace lld {<br>
> namespace elf {<br>
><br>
> static bool refersToGotEntry(RelExpr Expr) {<br>
> - return Expr == R_GOT || Expr == R_GOT_OFF || Expr == R_MIPS_GOT_LOCAL_PAGE ||<br>
> - Expr == R_MIPS_GOT_OFF || Expr == R_MIPS_GOT_OFF32 ||<br>
> - Expr == R_MIPS_TLSGD || Expr == R_MIPS_TLSLD ||<br>
> - Expr == R_GOT_PAGE_PC || Expr == R_GOT_PC || Expr == R_GOT_FROM_END ||<br>
> - Expr == R_TLSGD || Expr == R_TLSGD_PC || Expr == R_TLSDESC ||<br>
> - Expr == R_TLSDESC_PAGE;<br>
> + return isRelExprOneOf<R_GOT, R_GOT_OFF, R_MIPS_GOT_LOCAL_PAGE, R_MIPS_GOT_OFF,<br>
> + R_MIPS_GOT_OFF32, R_MIPS_TLSGD, R_MIPS_TLSLD,<br>
> + R_GOT_PAGE_PC, R_GOT_PC, R_GOT_FROM_END, R_TLSGD,<br>
> + R_TLSGD_PC, R_TLSDESC, R_TLSDESC_PAGE>(Expr);<br>
> }<br>
><br>
> static bool isPreemptible(const SymbolBody &Body, uint32_t Type) {<br>
> @@ -302,16 +300,16 @@ template <class ELFT> static bool isAbso<br>
> }<br>
><br>
> static bool needsPlt(RelExpr Expr) {<br>
> - return Expr == R_PLT_PC || Expr == R_PPC_PLT_OPD || Expr == R_PLT ||<br>
> - Expr == R_PLT_PAGE_PC || Expr == R_THUNK_PLT_PC;<br>
> + return isRelExprOneOf<R_PLT_PC, R_PPC_PLT_OPD, R_PLT, R_PLT_PAGE_PC,<br>
> + R_THUNK_PLT_PC>(Expr);<br>
> }<br>
><br>
> // True if this expression is of the form Sym - X, where X is a position in the<br>
> // file (PC, or GOT for example).<br>
> static bool isRelExpr(RelExpr Expr) {<br>
> - return Expr == R_PC || Expr == R_GOTREL || Expr == R_GOTREL_FROM_END ||<br>
> - Expr == R_MIPS_GOTREL || Expr == R_PAGE_PC || Expr == R_RELAX_GOT_PC ||<br>
> - Expr == R_THUNK_PC || Expr == R_THUNK_PLT_PC;<br>
> + return isRelExprOneOf<R_PC, R_GOTREL, R_GOTREL_FROM_END, R_MIPS_GOTREL,<br>
> + R_PAGE_PC, R_RELAX_GOT_PC, R_THUNK_PC, R_THUNK_PLT_PC>(<br>
> + Expr);<br>
> }<br>
><br>
> template <class ELFT><br>
> @@ -320,12 +318,11 @@ static bool isStaticLinkTimeConstant(Rel<br>
> InputSectionBase<ELFT> &S,<br>
> typename ELFT::uint RelOff) {<br>
> // These expressions always compute a constant<br>
> - if (E == R_SIZE || E == R_GOT_FROM_END || E == R_GOT_OFF ||<br>
> - E == R_MIPS_GOT_LOCAL_PAGE || E == R_MIPS_GOT_OFF ||<br>
> - E == R_MIPS_GOT_OFF32 || E == R_MIPS_TLSGD || E == R_GOT_PAGE_PC ||<br>
> - E == R_GOT_PC || E == R_PLT_PC || E == R_TLSGD_PC || E == R_TLSGD ||<br>
> - E == R_PPC_PLT_OPD || E == R_TLSDESC_CALL || E == R_TLSDESC_PAGE ||<br>
> - E == R_HINT || E == R_THUNK_PC || E == R_THUNK_PLT_PC)<br>
> + if (isRelExprOneOf<R_SIZE, R_GOT_FROM_END, R_GOT_OFF, R_MIPS_GOT_LOCAL_PAGE,<br>
> + R_MIPS_GOT_OFF, R_MIPS_GOT_OFF32, R_MIPS_TLSGD,<br>
> + R_GOT_PAGE_PC, R_GOT_PC, R_PLT_PC, R_TLSGD_PC, R_TLSGD,<br>
> + R_PPC_PLT_OPD, R_TLSDESC_CALL, R_TLSDESC_PAGE, R_HINT,<br>
> + R_THUNK_PC, R_THUNK_PLT_PC>(E))<br>
> return true;<br>
><br>
> // These never do, except if the entire file is position dependent or if<br>
> @@ -659,12 +656,12 @@ static void scanRelocs(InputSectionBase<<br>
><br>
> // Ignore "hint" and TLS Descriptor call relocation because they are<br>
> // only markers for relaxation.<br>
> - if (Expr == R_HINT || Expr == R_TLSDESC_CALL)<br>
> + if (isRelExprOneOf<R_HINT, R_TLSDESC_CALL>(Expr))<br>
> continue;<br>
><br>
> - if (needsPlt(Expr) || Expr == R_THUNK_ABS || Expr == R_THUNK_PC ||<br>
> - Expr == R_THUNK_PLT_PC || refersToGotEntry(Expr) ||<br>
> - !isPreemptible(Body, Type)) {<br>
> + if (needsPlt(Expr) ||<br>
> + isRelExprOneOf<R_THUNK_ABS, R_THUNK_PC, R_THUNK_PLT_PC>(Expr) ||<br>
> + refersToGotEntry(Expr) || !isPreemptible(Body, Type)) {<br>
> // If the relocation points to something in the file, we can process it.<br>
> bool Constant =<br>
> isStaticLinkTimeConstant<ELFT><wbr>(Expr, Type, Body, C, RI.r_offset);<br>
><br>
> Modified: lld/trunk/ELF/Relocations.h<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Relocations.h?rev=288314&r1=288313&r2=288314&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/lld/trunk/ELF/<wbr>Relocations.h?rev=288314&r1=<wbr>288313&r2=288314&view=diff</a><br>
> ==============================<wbr>==============================<wbr>==================<br>
> --- lld/trunk/ELF/Relocations.h (original)<br>
> +++ lld/trunk/ELF/Relocations.h Wed Nov 30 23:43:48 2016<br>
> @@ -73,6 +73,35 @@ enum RelExpr {<br>
> R_TLSLD_PC,<br>
> };<br>
><br>
> +// Build a bitmask with one bit set for each RelExpr.<br>
> +//<br>
> +// Constexpr function arguments can't be used in static asserts, so we<br>
> +// use template arguments to build the mask.<br>
> +// But function template partial specializations don't exist (needed<br>
> +// for base case of the recursion), so we need a dummy struct.<br>
> +template <RelExpr... Exprs> struct RelExprMaskBuilder {<br>
> + static inline uint64_t build() { return 0; }<br>
> +};<br>
> +<br>
> +// Specialization for recursive case.<br>
> +template <RelExpr Head, RelExpr... Tail><br>
> +struct RelExprMaskBuilder<Head, Tail...> {<br>
> + static inline uint64_t build() {<br>
> + static_assert(0 <= Head && Head < 64,<br>
> + "RelExpr is too large for 64-bit mask!");<br>
> + return (uint64_t(1) << Head) | RelExprMaskBuilder<Tail...>::<wbr>build();<br>
> + }<br>
> +};<br>
> +<br>
> +// Return true if `Expr` is one of `Exprs`.<br>
> +// There are fewer than 64 RelExpr's, so we can represent any set of<br>
> +// RelExpr's as a constant bit mask and test for membership with a<br>
> +// couple cheap bitwise operations.<br>
> +template <RelExpr... Exprs> bool isRelExprOneOf(RelExpr Expr) {<br>
> + assert(0 <= Expr && Expr < 64 && "RelExpr is too large for 64-bit mask!");<br>
> + return (uint64_t(1) << Expr) & RelExprMaskBuilder<Exprs...>::<wbr>build();<br>
> +}<br>
> +<br>
> // Architecture-neutral representation of relocation.<br>
> struct Relocation {<br>
> RelExpr Expr;<br>
><br>
><br>
> ______________________________<wbr>_________________<br>
> llvm-commits mailing list<br>
> <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
</blockquote></div></div>