<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>If we do load combining at the IR level, one thing we'll need to
give some thought to is atomicity. Combining two atomic loads
into a wider (legal) atomic load is not a reversible
transformation given our current specification.</p>
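<p>A minimal sketch of the combine in question (the pointer names and
offsets here are purely illustrative):</p>
<pre>  ; two adjacent unordered atomic loads ...
  %lo = load atomic i32, i32* %p unordered, align 8
  %hi = load atomic i32, i32* %p.plus.4 unordered, align 4
  ; ... combined into a single wider atomic load
  %wide = load atomic i64, i64* %p.wide unordered, align 8</pre>
<p>The wider load promises single-copy atomicity for all 8 bytes at
once, a strictly stronger guarantee than the pair of i32 loads gave,
so we can't split it back apart later.</p>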
<p>I've been thinking about a concept I've been tentatively calling
"element-wise atomicity", which would make this a reversible
transform by representing the load as a vector or struct load in
which the atomicity requirement is expressed on the individual
elements, not on the vector/struct itself. (We have no way to spell
that today.)</p>
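<p>Purely as an illustration of the idea (this is *not* valid IR
today; as noted, we have no way to spell it), such a load might look
something like:</p>
<pre>  ; hypothetical syntax: each i32 element is individually
  ; unordered-atomic, but the &lt;2 x i32&gt; value as a whole is not
  %v = load elementwise atomic &lt;2 x i32&gt;, &lt;2 x i32&gt;* %p unordered, align 4</pre>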
<p>Not having load combining be a reversible transformation is
really unfortunate. Consider this small example:</p>
<p>store i32 0, a+4<br>
load i32, a<br>
load i32, a+4</p>
<p>If you happen to visit the load pair first, you really want to be
able to split them again once you discover the store. <br>
</p>
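<p>Spelled out a bit more (again just a sketch, with made-up value
names), the widened form hides the store-to-load forwarding
opportunity:</p>
<pre>  store i32 0, i32* %a.plus.4
  ; after combining the two i32 loads:
  %wide = load i64, i64* %a.wide     ; covers both a and a+4
  ; the half of %wide that covers a+4 is known to be 0, but we can
  ; only take advantage of that if we can split the i64 load back
  ; into the original pair of i32 loads</pre>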
<p>Note that the same theoretical problem exists in
MI/SelectionDAG. We just do so much less memory optimization
there that we get away with mostly not caring. :)</p>
<p>Philip<br>
</p>
<div class="moz-cite-prefix">On 9/24/2019 4:50 PM, Artur Pilipenko
via llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:6ED8F220-12AD-4AE0-BF55-18D3D845515C@azul.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Implementing load combine/widening on the IR level sounds like a
reasonable enhancement to me, although we (Azul) don't have any
plans to do that in the near future.
<div class=""><br class="">
</div>
<div class="">
<div class="">Artur<br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 12 Sep 2019, at 00:58, Paweł Bylica <<a
href="mailto:chfast@gmail.com" class=""
moz-do-not-send="true">chfast@gmail.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div class="">Ok, thanks.</div>
<div class=""><br class="">
</div>
<div class="">Are there any plans to reintroduce it on
the IR level? I'm not confident this is strictly
necessary, but in some cases not having load
widening ends up really bad.</div>
<div class="">Like in the case where vectorizer tries
to do something about it:</div>
<div class=""><a href="https://godbolt.org/z/60RuEw"
class="" moz-do-not-send="true">https://godbolt.org/z/60RuEw</a></div>
<div class=""><a
href="https://bugs.llvm.org/show_bug.cgi?id=42708"
class="" moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=42708</a></div>
<div class=""><br class="">
</div>
<div class="">At the current state I'm forced to use
memset() to express uint64 load from an array of
bytes.</div>
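<div class=""><br class="">
</div>
<div class="">A sketch of what that workaround boils down to in IR
(value names are illustrative): the frontend emits an alloca plus
llvm.memcpy, which instcombine/SROA then fold into a single
unaligned i64 load:</div>
<div class="">
<pre>  %tmp = alloca i64, align 8
  %tmp.i8 = bitcast i64* %tmp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp.i8, i8* %bytes, i64 8, i1 false)
  %v = load i64, i64* %tmp, align 8
  ; ... which the optimizer reduces to roughly:
  %bytes.i64 = bitcast i8* %bytes to i64*
  %v.direct = load i64, i64* %bytes.i64, align 1</pre>
</div>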
<div class=""><br class="">
</div>
<div class="">// P.<br class="">
</div>
</div>
<br class="">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Sep 12, 2019
at 4:22 AM Artur Pilipenko <<a
href="mailto:apilipenko@azul.com" class=""
moz-do-not-send="true">apilipenko@azul.com</a>>
wrote:<br class="">
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;" class="">Hi,
<div class=""><br class="">
</div>
<div class="">Load widening/combine for the
original pattern was implemented in DAG
combiner. See:</div>
<div class=""><a
href="https://github.com/llvm/llvm-project/blob/23bbeb52f392d88bf0d0527392a7a11561ee09c0/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6872"
target="_blank" class=""
moz-do-not-send="true">https://github.com/llvm/llvm-project/blob/23bbeb52f392d88bf0d0527392a7a11561ee09c0/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6872</a></div>
<div class=""><br class="">
</div>
<div class="">Later a similar transform was
introduced fro stores:</div>
<div class=""><a
href="https://github.com/llvm/llvm-project/blob/23bbeb52f392d88bf0d0527392a7a11561ee09c0/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6700"
target="_blank" class=""
moz-do-not-send="true">https://github.com/llvm/llvm-project/blob/23bbeb52f392d88bf0d0527392a7a11561ee09c0/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6700</a></div>
<div class=""><br class="">
</div>
<div class="">There is no load widening on the IR
level as of now.</div>
<div class=""><br class="">
</div>
<div class="">Artur<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On 11 Sep 2019, at 04:47,
Paweł Bylica <<a
href="mailto:chfast@gmail.com"
target="_blank" class=""
moz-do-not-send="true">chfast@gmail.com</a>>
wrote:</div>
<br
class="gmail-m_8191889820642647725Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div class="">Hi,</div>
<div class=""><br class="">
</div>
<div class="">Can I ask what is the
status of load widening. It seems
there is no load widening on IR at
all.</div>
<div class=""><br class="">
</div>
<div class="">// Paweł<br class="">
</div>
</div>
<br class="">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Wed, Oct 5, 2016 at 1:49 PM Artur
Pilipenko via llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" class=""
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:<br class="">
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Philip and I talked about this in
person. Given that load widening in
the presence of atomics is an
irreversible transformation, we agreed
that we don't want to do this early.
For now it can be implemented as a
peephole optimization over machine IR.
MIR is preferred here because the X86
backend does GEP reassociation at the
MIR level, which can make the
information about addresses being
adjacent available.<br class="">
<br class="">
The inline cost of the original pattern
is a valid concern and we might want to
revisit our decision later. But in
order to do widening earlier we need
a way to undo this
transformation.<br class="">
<br class="">
I’m going to try implementing the MIR
optimization, but not in the immediate
future.<br class="">
<br class="">
Artur<br class="">
<br class="">
> On 28 Sep 2016, at 19:32, Artur
Pilipenko <<a
href="mailto:apilipenko@azulsystems.com"
target="_blank" class=""
moz-do-not-send="true">apilipenko@azulsystems.com</a>>
wrote:<br class="">
> <br class="">
> One of the arguments for doing
this earlier is the inline cost
perception of the original pattern.
Reading an i32/i64 by bytes looks much
more expensive than it really is and
can prevent inlining of an interesting
function.<br class="">
> <br class="">
> The concern about inhibiting other
optimizations can be addressed by
careful selection of the pattern we’d
like to match. I limit the
transformation to the case when all the
individual loads have no uses other
than forming a wider load. In this case
it’s less likely to lose information
during this transformation.<br class="">
> <br class="">
> I didn’t think of the atomicity
aspect, though.<br class="">
> <br class="">
> Artur<br class="">
> <br class="">
>> On 28 Sep 2016, at 18:50,
Philip Reames <<a
href="mailto:listmail@philipreames.com"
target="_blank" class=""
moz-do-not-send="true">listmail@philipreames.com</a>>
wrote:<br class="">
>> <br class="">
>> There's a bit of additional
context worth adding here...<br
class="">
>> <br class="">
>> Up until very recently, we
had a form of widening implemented in
GVN. We decided to remove this in
<a
href="https://reviews.llvm.org/D24096"
rel="noreferrer" target="_blank"
class="" moz-do-not-send="true">
https://reviews.llvm.org/D24096</a>
precisely because its placement in the
pass pipeline was inhibiting other
optimizations. There's also a major
problem with doing widening at the IR
level, which is that widening a pair of
atomic loads into a single wider
atomic load cannot be undone. This
creates a major pass-ordering problem
of its own.<br class="">
>> <br class="">
>> At this point, my general
view is that widening transformations
of any kind should be done very late.
Ideally, this is something the backend
would do, but doing it as a CGP-like
fixup pass over the IR is also
reasonable.<br class="">
>> <br class="">
>> With that in mind, I feel
both the current placement of
LoadCombine (within the inliner
iteration) and the proposed
InstCombine rule are undesirable.<br
class="">
>> <br class="">
>> Philip<br class="">
>> <br class="">
>> <br class="">
>> On 09/28/2016 08:22 AM, Artur
Pilipenko wrote:<br class="">
>>> Hi,<br class="">
>>> <br class="">
>>> I'm trying to optimize a
pattern like this into a single i16
load:<br class="">
>>> %1 = bitcast i16* %pData
to i8*<br class="">
>>> %2 = load i8, i8* %1,
align 1<br class="">
>>> %3 = zext i8 %2 to i16<br
class="">
>>> %4 = shl nuw i16 %3, 8<br
class="">
>>> %5 = getelementptr
inbounds i8, i8* %1, i16 1<br class="">
>>> %6 = load i8, i8* %5,
align 1<br class="">
>>> %7 = zext i8 %6 to i16<br
class="">
>>> %8 = shl nuw nsw i16 %7,
0<br class="">
>>> %9 = or i16 %8, %4<br
class="">
>>> <br class="">
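>>> (For reference, on a little-endian
target this whole sequence should be
equivalent to roughly the following,
reusing the value names above:)<br class="">
>>> %w = load i16, i16* %pData, align 1<br class="">
>>> %r = call i16 @llvm.bswap.i16(i16 %w)<br class="">
>>> <br class="">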
>>> I came across the load
combine pass, which is motivated by
virtually the same pattern. Load
combine optimizes the pattern by
combining adjacent loads into one load
and lets the rest of the optimizer
clean up the rest. From what I see on
the initial review for load combine (<a
href="https://reviews.llvm.org/D3580" rel="noreferrer" target="_blank"
class="" moz-do-not-send="true">https://reviews.llvm.org/D3580</a>)
it was not enabled by default because
it caused some performance
regressions. That's not very surprising;
I can see how this type of widening can
obscure some facts for the rest of the
optimizer.<br class="">
>>> <br class="">
>>> I can't find any
backstory for this pass. Why was it
chosen to optimize the pattern in
question in this way? What is the
current status of this pass?<br
class="">
>>> <br class="">
>>> I have an alternative
implementation for it locally. I
implemented an instcombine rule
similar to the one that recognises the
bswap/bitreverse idiom. It relies on
collectBitParts (Local.cpp) to
determine the origin of the bits in a
given 'or' value. If all the bits
happen to be loaded from adjacent
locations, it replaces the 'or' with a
single load, or a load plus a
bswap.<br class="">
>>> <br class="">
>>> If the alternative
approach sounds reasonable, I'll post
my patches for review.<br class="">
>>> <br class="">
>>> Artur<br class="">
>> <br class="">
> <br class="">
<br class="">
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
<br>
</blockquote>
</body>
</html>