<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 21, 2016, at 10:26, Ruiling Song <<a href="mailto:ruiling.song83@gmail.com" class="">ruiling.song83@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class=""><br class="">2016-12-20 22:14 GMT+08:00 Tom Stellard <<a href="mailto:tom@stellard.net" class="">tom@stellard.net</a>>:<br class="">><br class="">> On Tue, Dec 20, 2016 at 11:00:09AM +0800, Ruiling Song wrote:<br class="">> > Hi,<br class="">> ><br class="">> > I am working on a new LLVM target for Intel GPU, which also has same kind<br class="">> > of scalar/vector register classes used in AMDGPU target. Like for a i32<br class="">> > virtual register, it will be held in scalar register if its value is<br class="">> > uniform across a wavefront/warp, otherwise it will be in a vector register.<br class="">> > Does AMDGPU already done this? I read the code, but I didn't figure out how<br class="">> > to do this. Anybody has idea on this?<br class="">> ><br class="">><br class="">> In the AMDGPU backend we select everything we can to scalar<br class="">> instructions, and then after instruction selection, we move<br class="">> non-uniform values to the vector ALU.  This is done by<br class="">> the SIFixSGPRCopiesPass, which relies heavily on<br class="">> SIInstrInfo::moveToVALU().<br class=""><br class="">Hi Tom,<br class=""><br class="">I take a look at the code, it looks like a good idea. It really helps me a lot. Thanks Tom! I have a question for the code, why it only pass copy-like instructions as TopInst to moveToALU()? Is there any special reason to do like this? I thought that iterating through all the MIs and fix regClass if needed would be ok. 
Am I thinking it too simple?<br class=""> <br class="">- Ruiling<br class="">><br class="">> -Tom<br class="">><br class="">> > - Ruiling<br class=""><br class=""><br class=""><br class=""><br class="">-- <br class=""><div dir="ltr" class="">- Ruiling</div><br class="">

</div></blockquote></div><br class=""><div class="">The instruction selector inserts these copies to satisfy register operand constraints, so by finding all users (and users of users) of the illegal copies you find the same instructions that iterating over every MI would. The scalar and vector instruction sets are different, so we’re really replacing the instructions, not just changing their register classes.</div><div class=""><br class=""></div><div class="">I think this process makes sense logically: things move to the vector ALU only as they are forced to. However, I’m not certain it is the best approach. I’ve debated going in the other direction: selecting everything to vector instructions and having an optimization pass move parts to scalar. This is what the AMD compiler does. There are different trade-offs, but one advantage is that you immediately have something resembling a legal program to begin with.</div><div class=""><br class=""></div><div class="">-Matt</div></body></html>