<div dir="ltr">Also, we have several similar instructions in different parts of the architecture (i.e. data computation vs. memory computation, which have some overlapping register sets), and we take a similar approach to solve this issue. We break down GEPs in the IR and use intrinsics to help define the mapping (for example, an IR add node might go to an exu add or to memory addressing, but the pattern is the same). In the DAG we mark these nodes as index operations and then use code in TableGen to map to either, for example, a data add or a memory add.</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 14, 2017 at 11:22 AM, Ryan Taylor <span dir="ltr"><<a href="mailto:ryta1203@gmail.com" target="_blank">ryta1203@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>We support hardware loops. </div><div><br></div><div>Our solution for this is twofold: we do both IR-level and MI-level passes. The IR-level pass does most of the loop recognition and similar GEP transformations, and we use intrinsics to 'outline' the loop. We use the MI-level pass to construct the instruction from the intrinsics and set register classes, etc. </div><div><br></div><div>For us there just isn't a reasonable way to get all the info we need in the backend, and it's best to use the abstracted mem computation (GEP).</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 14, 2017 at 10:44 AM, Krzysztof Parzyszek <span dir="ltr"><<a href="mailto:kparzysz@codeaurora.org" target="_blank">kparzysz@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 6/14/2017 9:27 AM, Ryan Taylor wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

   Is this partly due to hardware loops not being common? I'm curious because we do something very similar for similar reasons.<br>

</blockquote>

<br></span>

The original motivation was to avoid recomputing the parts of the address that are shared between different GEPs. Hexagon has a very large number of complex instructions, and these instructions could span various parts of the address calculation. This makes it prohibitively difficult to extract the common parts after instruction selection.<br>

<br>

It also places the common GEPs as far out in the loop nest as possible (in the outermost region with respect to invariance). In addition to that, it will put a GEP before each load and store that is expected to be fully folded into an addressing mode of the load/store. As of now, it only does that for indexed modes (i.e. base reg + immediate offset), but Hexagon has a bunch more that could be exploited as well.<br>
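<br>
As a rough illustration of the idea only (a toy Python model, not the actual Hexagon pass): address expressions that share the same symbolic part can use one base computation and differ only by an immediate offset. The function name <code>split_common</code>, the tuple layout, and the sample accesses are all hypothetical.<br>

```python
from collections import defaultdict


def split_common(geps):
    """Group address expressions by their shared symbolic part.

    Each entry is (name, symbolic_terms, constant_offset). Entries with the
    same symbolic terms can share one base computation and then differ only
    by an immediate offset, which suits a base + immediate addressing mode.
    """
    groups = defaultdict(list)
    for name, terms, offset in geps:
        # Sort the terms so that a + b and b + a land in the same group.
        groups[tuple(sorted(terms))].append((name, offset))
    return dict(groups)


# Hypothetical accesses in a loop body: a[i*N + j], a[i*N + j + 1], b[j].
# Offsets are in bytes, assuming 4-byte elements.
geps = [
    ("load0", ["a", "i*N", "j"], 0),
    ("load1", ["j", "a", "i*N"], 4),
    ("store0", ["b", "j"], 0),
]

shared = split_common(geps)
for base, users in shared.items():
    print(" + ".join(base), "is used by", users)
```

Here load0 and load1 end up in the same group, so the common part would be computed once and each load would add only its own constant offset. Deciding where to place the shared computation in the loop nest is a separate step that this sketch does not model.<br>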

<br>

We don't do anything specifically related to hardware loops until much later in the codegen. What situations do you have in your target?<div class="m_2251008993623953815HOEnZb"><div class="m_2251008993623953815h5"><br>

<br>

-Krzysztof<br>

<br>

-- <br>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation<br>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>