<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">It’s not a big issue since they’re opt-in passes, naturally, but if we did want to use them, the main problem is that our addressing modes are of the form:</div><div class=""><br class=""></div><div class="">X + sext(Y) * scale</div><div class=""><br class=""></div><div class="">for i64 X and i32 Y and various (power of 2) scales.</div><div class=""><br class=""></div><div class="">A lot of these passes tend to move the sexts around in ways that prevent them from being folded into addressing modes. Another thing I’ve noticed is that SeparateConstOffsetFromGEP ends up turning (X + sext(Y + C)) into (X + sext(Y)) + C. Unfortunately, unless this saves operations due to CSE, we end up with:</div><div class=""><br class=""></div><div class="">tmp = X + sext(Y)</div><div class="">load with base = tmp and offset = C</div><div class=""><br class=""></div><div class="">instead of</div><div class=""><br class=""></div><div class="">tmp = Y + C</div><div class="">load with base = X and offset = tmp</div><div class=""><br class=""></div><div class="">Since 64-bit adds are more expensive than 32-bit adds on our GPU, this ends up being a pessimization.</div><div class=""><br class=""></div><div class="">These are just a few examples I spotted; it’s probably a more general LLVM problem that passes which manipulate GEPs and induction variables tend to be oblivious to addressing modes, or tuned toward a particular sort of addressing mode.</div><div class=""><br class=""></div><div class="">—escha</div><br class=""><div><blockquote type="cite" class=""><div class="">On Aug 24, 2015, at 6:52 PM, Jingyue Wu &lt;<a href="mailto:jingyue@google.com" class="">jingyue@google.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" 
class="">Hi Escha, <div class=""><br class=""></div><div class="">We would certainly love to generalize them, as long as performance doesn't suffer in general. If you have specific use cases that regress because of these optimizations, I am more than happy to take a look. </div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Aug 24, 2015 at 6:43 PM, escha <span dir="ltr" class="">&lt;<a href="mailto:escha@apple.com" target="_blank" class="">escha@apple.com</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div class="h5"><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Aug 24, 2015, at 11:10 AM, Jingyue Wu via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class=""><div class=""><div dir="ltr" class=""><div class="">Hi, </div><div class=""><br class=""></div><div class="">As you may have noticed, since last year, we (Google's CUDA compiler team) have contributed quite a lot to the effort of optimizing LLVM for CUDA programs. I think it's worthwhile to write some docs summarizing that work, for two reasons. </div><div class="">1) Whoever wants to understand or work on these optimizations has some detailed docs instead of just source code to refer to. </div><div class="">2) The docs can serve as an RFC on how to improve these optimizations so that other targets can benefit from them as well. They are currently mostly restricted to the NVPTX backend, but I see a lot of potential to generalize them. 
</div><div class=""><br class=""></div><div class="">So, I started with this overdue <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_d_1momWzKFf4D6h8H3YlfgKQ3qeZy5ayvMRh6yR-2DXn2hUE_edit-3Fusp-3Dsharing&d=BQMFaQ&c=eEvniauFctOgLOKGJOplqw&r=szS1_DDBoKCtS8B5df7mJg&m=TggebUNOWYFU5W3tKpC_z1CkNT9MN05aBwWloSru2NI&s=vmPxp-RDJuf_ZN5X7LNlV10JwuHK5Pt1ljn96IenW-o&e=" target="_blank" class="">design doc</a> on the straight-line scalar optimizations. I will send out more docs on other optimizations later. Please feel free to comment. </div><div class=""><br class=""></div><div class="">Thanks, </div><div class="">Jingyue</div></div></div></blockquote><br class=""></div></div></div><div class="">Out of curiosity, is there any plan to make the NVPTX-originated passes (separateconstantoffsetfromgep, slsr, naryreassociate) more generic? They seem very specialized for NVIDIA GPU addressing modes despite the generic names, and for that reason they tend to pessimize our target more often than not in my tests.</div><div class=""><br class=""></div><div class="">It’d be really nice to have something more generic, and I might look into helping with that sort of thing in the future if it becomes important for us.</div><span class="HOEnZb"><font color="#888888" class=""><div class=""><br class=""></div><div class="">—escha</div></font></span></div></blockquote></div><br class=""></div>

</div></blockquote></div><br class=""></body></html>