<div dir="ltr"><div>Hi Philip,</div><div><br></div>Yes, I submitted two issues about IRCE: 49012 and 49014.<div>I don't know whether I am misusing the pass; I don't have a comprehensive understanding of it or its background. I just took some time to dig into the code to find the cause.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 13 May 2021 at 02:45, Philip Reames <<a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 5/11/21 7:41 PM, Jie He wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Yes, but the current deopt lowering
          would generate statepoint IR, which is currently supported only on
          X86-64, as mentioned in LLVM's GC documentation.</div>
</blockquote>
I believe this is supported on at least AArch64 if memory serves.<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>IRCE doesn't rely on a GCed language; I remembered wrong. But
            it isn't smart right now: it can't handle bounds checks as well as
            Java's RCE did.</div>
</div>
</blockquote>
Er, I think you're either misunderstanding or need to clarify your
point. IRCE does exactly the standard pre/main/post loop technique
which was used in C2 back in the day. LoopPred does the widening
transformation. Do you have a particular case in mind you're
thinking of?<br>
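<p>A minimal sketch of the pre/main/post loop technique mentioned above, written in Python rather than LLVM IR. The names checked_sum, split_sum and outcome, and the use of IndexError as a stand-in for a failing range check, are assumptions made for illustration; this is not IRCE's implementation.</p>
<pre>
```python
def checked_sum(a, lo, hi):
    """Reference loop: every iteration performs the range check."""
    total = 0
    for i in range(lo, hi):
        if i < 0 or i >= len(a):
            raise IndexError(i)  # stand-in for a failing range check
        total += a[i]
    return total


def split_sum(a, lo, hi):
    """Same loop split into pre, main and post loops."""
    # [safe_lo, safe_hi) is the part of [lo, hi) where the check cannot fail.
    safe_lo = min(max(lo, 0), hi)
    safe_hi = max(min(hi, len(a)), safe_lo)
    total = 0
    for i in range(lo, safe_lo):  # pre loop: check kept
        if i < 0 or i >= len(a):
            raise IndexError(i)
        total += a[i]
    for i in range(safe_lo, safe_hi):  # main loop: check removed
        total += a[i]
    for i in range(safe_hi, hi):  # post loop: check kept
        if i < 0 or i >= len(a):
            raise IndexError(i)
        total += a[i]
    return total


def outcome(fn, *args):
    """Return ("ok", value) or ("fail", index) so both versions can be compared."""
    try:
        return ("ok", fn(*args))
    except IndexError as exc:
        return ("fail", exc.args[0])
```
</pre>
<p>Only the main loop runs without checks; the pre and post loops keep the original body, so a failing check is still reported at the same index as in the unsplit loop.</p>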
<blockquote type="cite"><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 11 May 2021 at 23:04,
Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>This is incorrect. <br>
</p>
<p>IRCE's current sole known user happens to be a compiler
for a GCed language, but there is no (intentional)
dependence on that fact. It should work on arbitrary IR.
<br>
</p>
<p>Loop predication (the form in IndVars) triggers for
arbitrary IR. The separate pass depends on semantics of
guards which is related to deopt semantics, but *not* GC.
<br>
</p>
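<p>A rough Python sketch of the idea behind loop predication: a check evaluated on every iteration is replaced by one widened, loop-invariant check before the loop. The names guarded_copy and predicated_copy are hypothetical, and falling back to the original loop when the widened check fails is an assumption of this sketch; the guard-based pass described above depends on guard semantics instead.</p>
<pre>
```python
def guarded_copy(src, dst, n):
    """Original loop: a guard is evaluated on every iteration."""
    for i in range(n):
        if i >= len(src) or i >= len(dst):
            return i  # side exit; stands in for leaving the fast path
        dst[i] = src[i]
    return n


def predicated_copy(src, dst, n):
    """The per-iteration guard is widened into one loop-invariant check."""
    if n <= len(src) and n <= len(dst):
        for i in range(n):  # no guard inside the loop
            dst[i] = src[i]
        return n
    return guarded_copy(src, dst, n)  # fallback: run the original loop
```
</pre>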
<p>Philip<br>
</p>
<div>On 5/11/21 7:17 AM, Jie He wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">As far as I know, the current IRCE implementation
                relies on some preconditions. It is intended for language
                runtimes with garbage collection, not for loop
                vectorization.
                <div>The same is true for loop predication, which is
                  also helpful for eliminating condition checks within a
                  loop.</div>
<div><br>
</div>
<div>Jie He</div>
<div>B.R</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 11 May 2021 at
20:50, Jingu Kang via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div>
<p class="MsoNormal">Hi Philip,</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">I have extended your
                        suggestion slightly, as shown below.</p>
<p class="MsoNormal"> </p>
<pre>                        newbound1 = min(n, c)
                        newbound2 = max(n, c)
while (iv < n) {        while (iv < newbound1) {
  A                       A
  if (iv < c)             B
    B                     C
  C                     }
}                       iv = newbound1
                        while (iv < newbound2) {
                          A
                          C
                        }</pre>
<p class="MsoNormal"> </p>
<p class="MsoNormal">I have implemented a simple
                        pass that splits the bound of a loop containing a
                        conditional branch on the IV, as in the example above: <a href="https://reviews.llvm.org/D102234" target="_blank">https://reviews.llvm.org/D102234</a>
                        It is an initial version. If possible, please
                        review it.</p>
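<p class="MsoNormal">The splitting in the example above can be checked with a small Python sketch. A, B and C are recorded as trace entries, iv is assumed to start at 0, and the function names are hypothetical. Note that the sketch bounds the second loop by n rather than by newbound2 = max(n, c), and lets iv carry over from the first loop instead of assigning newbound1; both are assumptions made here so that the two versions agree when c > n or c < 0.</p>
<pre>
```python
def original_loop(n, c):
    """Loop with a branch on the induction variable in its body."""
    trace = []
    iv = 0
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def split_loop(n, c):
    """Same work done by two loops with no branch in either body."""
    trace = []
    iv = 0
    newbound1 = min(n, c)
    while iv < newbound1:  # iv < c holds here, so B always runs
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    while iv < n:  # iv >= c holds here, so B never runs
        trace.append(("A", iv))
        trace.append(("C", iv))
        iv += 1
    return trace
```
</pre>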
<p class="MsoNormal"> </p>
<p class="MsoNormal">Thanks</p>
<p class="MsoNormal">JinGu Kang</p>
<p class="MsoNormal"> </p>
<div style="border-top:none;border-right:none;border-bottom:none;border-left:1.5pt solid blue;padding:0cm 0cm 0cm 4pt">
<div>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0cm 0cm">
<p class="MsoNormal"><b>From:</b> Jingu Kang
<<a href="mailto:Jingu.Kang@arm.com" target="_blank">Jingu.Kang@arm.com</a>>
<br>
<b>Sent:</b> 04 May 2021 12:45<br>
<b>To:</b> Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>;
Jingu Kang <<a href="mailto:Jingu.Kang@arm.com" target="_blank">Jingu.Kang@arm.com</a>><br>
<b>Cc:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<b>Subject:</b> RE: [llvm-dev] Enabling
IRCE pass or Adding something similar in
the pipeline of new pass manager</p>
</div>
</div>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Philip, I appreciate your
kind comments.</p>
<p><span>>In this example, forming the full
pre/main/post loop structure of IRCE is
overkill. Instead, we could simply restrict
the loop bounds in the following manner:</span></p>
<pre>>loop.ph:</pre>
<pre>> ;; Warning: pseudo code, might have edge conditions wrong</pre>
<pre>> %c = icmp sgt %iv, %n</pre>
<pre>> %min = umin(%n, %a)</pre>
<pre>> br i1 %c, label %exit, label %loop.ph</pre>
<pre>> </pre>
<pre>>loop.ph.split:</pre>
<pre>> br label %loop</pre>
<pre>> </pre>
<pre>>loop:</pre>
<pre>> %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph ]</pre>
<pre>> %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv </pre>
<pre>> %val = load i64, i64* %src.arrayidx</pre>
<pre>> %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv </pre>
<pre>> store i64 %val, i64* %dst.arrayidx</pre>
<pre>> %inc = add nuw nsw i64 %iv, 1</pre>
<pre>> %cond = icmp eq i64 %inc, %min</pre>
<pre>> br i1 %cond, label %exit, label %loop</pre>
<pre>> </pre>
<pre>>exit:</pre>
<pre>> ret void</pre>
<pre>>}</pre>
<pre>> </pre>
<pre>>I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting</pre>
<p class="MsoNormal"> </p>
<p class="MsoNormal">I agree with you. If the
                        LLVM community is OK with accepting the above approach
                        as a pass or as part of an existing pass, I would
                        be happy to implement it, because I am aiming
                        to handle this case in upstream LLVM.</p>
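<p class="MsoNormal">As a rough illustration of the restricted-bounds idea quoted above, here is a Python sketch. It assumes the original loop runs iv from 1 up to n and copies an element only when iv < a; the function names are hypothetical.</p>
<pre>
```python
def conditional_copy(src, dst, n, a):
    """Original shape: the whole body is guarded by iv < a."""
    for iv in range(1, n):
        if iv < a:
            dst[iv] = src[iv]
    return dst


def clamped_copy(src, dst, n, a):
    """Restricted bound: stop at min(n, a), so no branch remains in the body."""
    for iv in range(1, min(n, a)):
        dst[iv] = src[iv]
    return dst
```
</pre>
<p class="MsoNormal">This only works because the iterations with iv >= a do nothing observable, so dropping them does not change the result.</p>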
<p class="MsoNormal"> </p>
<p><span>>Another way to frame this special
case might be to recognize the conditional
block can be inverted into an early exit.
(Reasoning: %iv is strictly increasing,
condition is monotonic, path if not taken
has no observable effect) Consider:</span></p>
<pre>>loop.ph:</pre>
<pre>> br label %loop</pre>
<pre>> </pre>
<pre>>loop:</pre>
<pre>> %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]</pre>
<pre>> %cmp = icmp sge i64 %iv, %a</pre>
<pre>> br i1 %cmp, label %exit, label %for.inc</pre>
<pre>> </pre>
<pre>>for.inc:</pre>
<pre>> %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv </pre>
<pre>> %val = load i64, i64* %src.arrayidx</pre>
<pre>> %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv </pre>
<pre>> store i64 %val, i64* %dst.arrayidx</pre>
<pre>> %inc = add nuw nsw i64 %iv, 1</pre>
<pre>> %cond = icmp eq i64 %inc, %n</pre>
<pre>> br i1 %cond, label %exit, label %loop</pre>
<pre>> </pre>
<pre>>exit:</pre>
<pre>> ret void</pre>
<pre>>}</pre>
<p><span>>Once that's done, the multiple exit
vectorization work should vectorize this
loop. Thinking about it, I really like this
variant. </span></p>
<p class="MsoNormal">I have not looked at the
                        multiple-exit vectorization work yet, but it
                        looks like we could treat the inverted condition
                        as the early exit’s condition.</p>
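<p class="MsoNormal">The early-exit framing can be sketched in Python as follows, under the same reasoning as in the quoted text: iv is strictly increasing, so once iv < a fails it fails for every later iteration. The function names are hypothetical, and the loop is assumed to run iv from 1 up to n.</p>
<pre>
```python
def branchy_copy(src, dst, n, a):
    """Original shape: a conditional block inside the loop body."""
    for iv in range(1, n):
        if iv < a:
            dst[iv] = src[iv]
    return dst


def early_exit_copy(src, dst, n, a):
    """Inverted condition used as a second loop exit."""
    for iv in range(1, n):
        if iv >= a:  # inverted condition
            break  # iv only grows, so iv < a cannot hold again
        dst[iv] = src[iv]
    return dst
```
</pre>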
<p><span>>The costing here seems quite off. 
                            I have not looked at how the vectorizer costs
                            predicated loads on hardware without
                            predication, but needing to scalarize a
                            conditional VF-times and form a vector again
                            does not have a cost of 3 million.  This
                            could definitely be improved.</span></p>
<p class="MsoNormal">I agree with you.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Additionally, if possible,
                        I would like to suggest enabling or adding
                        transformations that help
                        vectorization. For example, in the same way as removing
                        a conditional branch inside a loop, we could split
                        a loop whose dependency blocks
                        vectorization into a vectorizable loop and a
                        non-vectorizable one, using transformations
                        like loop distribution. I am not sure why
                        these features have not been enabled by
                        default in the pass manager, but doing so would make more
                        loops vectorizable.</p>
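<p class="MsoNormal">A small Python sketch of what loop distribution does in the situation described above. The function names and the particular statements are made up for illustration, and the sketch assumes the output lists do not alias the inputs.</p>
<pre>
```python
def fused(a, b, n):
    """One loop: the recurrence on a blocks vectorizing the whole body."""
    a = list(a)
    c = [0] * n
    for i in range(1, n):
        a[i] = a[i - 1] + b[i]  # loop-carried dependence on a[i - 1]
        c[i] = 2 * b[i]  # independent across iterations
    return a, c


def distributed(a, b, n):
    """Two loops: only the first one carries the dependence."""
    a = list(a)
    c = [0] * n
    for i in range(1, n):  # stays scalar
        a[i] = a[i - 1] + b[i]
    for i in range(1, n):  # candidate for vectorization
        c[i] = 2 * b[i]
    return a, c
```
</pre>
<p class="MsoNormal">The split is valid here because the second statement neither reads nor writes anything the first statement writes.</p>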
<p class="MsoNormal"> </p>
<p class="MsoNormal">Thanks</p>
<p class="MsoNormal">JinGu Kang</p>
</div>
</div>
</div>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">Best Regards<br>
He Jie 何杰</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature">Best Regards<br>He Jie 何杰</div>