<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Comments inlined…
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 16 Dec 2016, at 07:00, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div dir="ltr" class="">On Tue, Dec 13, 2016 at 6:31 AM Artur Pilipenko via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a>> wrote:<br class="">
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: apilipenko<br class="gmail_msg">
Date: Tue Dec 13 08:21:14 2016<br class="gmail_msg">
New Revision: 289538<br class="gmail_msg">
<br class="gmail_msg">
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=289538&view=rev" rel="noreferrer" class="gmail_msg" target="_blank">
http://llvm.org/viewvc/llvm-project?rev=289538&view=rev</a><br class="gmail_msg">
Log:<br class="gmail_msg">
[DAGCombiner] Match load by bytes idiom and fold it into a single load<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Really cool transform, but unfortunately, this patch has a bad bug in it that is a bit tricky to trigger. Consider the test case:<br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">target triple = "x86_64-unknown-linux-gnu"</div>
<div class=""><br class="">
</div>
<div class="">define void @crash(i8* %src1, i8* %src2, i64* %dst) {</div>
<div class="">entry:</div>
<div class=""> %load1 = load i8, i8* %src1, align 1</div>
<div class=""> %conv46 = zext i8 %load1 to i32</div>
<div class=""> %shl47 = shl i32 %conv46, 56</div>
<div class=""> %or55 = or i32 %shl47, 0</div>
<div class=""> %load2 = load i8, i8* %src2, align 1</div>
<div class=""> %conv57 = zext i8 %load2 to i32</div>
<div class=""> %shl58 = shl i32 %conv57, 32</div>
<div class=""> %or59 = or i32 %or55, %shl58</div>
<div class=""> %or74 = or i32 %or59, 0</div>
<div class=""> %conv75 = sext i32 %or74 to i64</div>
<div class=""> store i64 %conv75, i64* %dst, align 8</div>
<div class=""> ret void</div>
<div class="">}</div>
</div>
<div class=""><br class="">
</div>
<div class="">This will crash when it tries to back and end iterator past the beginning of a vector. You need to check for bad shift widths.</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">However, there are several other serious problems with the patch that need to be addressed so I'm going to revert it for now to unblock folks (we're hitting this in real code) and let you post an updated patch for review. Please get the updated
patch re-reviewed, as I think this needs some more detailed review around data structures and algorithms used. See my (very rough) comments below...<br class="">
</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
==============================================================================<br class="gmail_msg">
--- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)<br class="gmail_msg">
+++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue Dec 13 08:21:14 2016<br class="gmail_msg">
@@ -20,6 +20,7 @@<br class="gmail_msg">
#include "llvm/ADT/SetVector.h"<br class="gmail_msg">
#include "llvm/ADT/SmallBitVector.h"<br class="gmail_msg">
#include "llvm/ADT/SmallPtrSet.h"<br class="gmail_msg">
+#include "llvm/ADT/SmallSet.h"<br class="gmail_msg">
#include "llvm/ADT/Statistic.h"<br class="gmail_msg">
#include "llvm/Analysis/AliasAnalysis.h"<br class="gmail_msg">
#include "llvm/CodeGen/MachineFrameInfo.h"<br class="gmail_msg">
@@ -375,6 +376,7 @@ namespace {<br class="gmail_msg">
unsigned PosOpcode, unsigned NegOpcode,<br class="gmail_msg">
const SDLoc &DL);<br class="gmail_msg">
SDNode *MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);<br class="gmail_msg">
+ SDValue MatchLoadCombine(SDNode *N);<br class="gmail_msg">
SDValue ReduceLoadWidth(SDNode *N);<br class="gmail_msg">
SDValue ReduceLoadOpStoreWidth(SDNode *N);<br class="gmail_msg">
SDValue splitMergedValStore(StoreSDNode *ST);<br class="gmail_msg">
@@ -3969,6 +3971,9 @@ SDValue DAGCombiner::visitOR(SDNode *N)<br class="gmail_msg">
if (SDNode *Rot = MatchRotate(N0, N1, SDLoc(N)))<br class="gmail_msg">
return SDValue(Rot, 0);<br class="gmail_msg">
<br class="gmail_msg">
+ if (SDValue Load = MatchLoadCombine(N))<br class="gmail_msg">
+ return Load;<br class="gmail_msg">
+<br class="gmail_msg">
// Simplify the operands using demanded-bits information.<br class="gmail_msg">
if (!VT.isVector() &&<br class="gmail_msg">
SimplifyDemandedBits(SDValue(N, 0)))<br class="gmail_msg">
@@ -4340,6 +4345,277 @@ struct BaseIndexOffset {<br class="gmail_msg">
};<br class="gmail_msg">
} // namespace<br class="gmail_msg">
<br class="gmail_msg">
+namespace {<br class="gmail_msg">
+/// Represents the origin of an individual byte in load combine pattern. The<br class="gmail_msg">
+/// value of the byte is either unknown, zero or comes from memory.<br class="gmail_msg">
+struct ByteProvider {<br class="gmail_msg">
+ enum ProviderTy {<br class="gmail_msg">
+ Unknown,<br class="gmail_msg">
+ ZeroConstant,<br class="gmail_msg">
+ Memory<br class="gmail_msg">
+ };<br class="gmail_msg">
+<br class="gmail_msg">
+ ProviderTy Kind;<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Putting this before a pointer means you'll have almost all padding bits here.. =[</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+ // Load and ByteOffset are set for Memory providers only.<br class="gmail_msg">
+ // Load represents the node which loads the byte from memory.<br class="gmail_msg">
+ // ByteOffset is the offset of the byte in the value produced by the load.<br class="gmail_msg">
+ LoadSDNode *Load;<br class="gmail_msg">
+ unsigned ByteOffset;<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Most of the providers are not loads though? This data structure seems really inefficient for the common case.</div>
<div class=""><br class="">
</div>
<div class="">Also, ByteOffset seems completely redundant. It is always set to the same value as the index into the vector below. Can you eliminate it or avoid it somehow?</div>
</div>
</div>
</div>
</blockquote>
ByteOffset is initially set to the same value as the index into vector, but then the vector can be shuffled by shifts. For example, the chain:</div>
<div>shl (load i32* %p), 16</div>
<div>produces providers:</div>
<div>{ zero, zero, load p[2], p[3] }</div>
<div><br class="">
</div>
<div>In the upcoming change I’m going to handle bswap as a part of combine pattern. It will also shuffle the offsets.</div>
<div><br class="">
</div>
<div>Given that ByteOffset is needed I don’t immediately see who to compress the structure.</div>
<div>
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<div class="">Then, could you use a PointerIntPair so that this whole thing becomes the size of a pointer? That would seem much more efficient.</div>
<div class=""><br class="">
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+/// Recursively traverses the expression collecting the origin of individual<br class="gmail_msg">
+/// bytes of the given value. For all the values except the root of the<br class="gmail_msg">
+/// expression verifies that it doesn't have uses outside of the expression.<br class="gmail_msg">
+const Optional<SmallVector<ByteProvider, 4> ><br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Please use clang-format with the latest LLVM settings. We don't need the space before the closing '>' any more.</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+collectByteProviders(SDValue Op, bool CheckNumberOfUses = false) {<br class="gmail_msg">
+ if (CheckNumberOfUses && !Op.hasOneUse())<br class="gmail_msg">
+ return None;<br class="gmail_msg">
+<br class="gmail_msg">
+ unsigned BitWidth = Op.getScalarValueSizeInBits();<br class="gmail_msg">
+ if (BitWidth % 8 != 0)<br class="gmail_msg">
+ return None;<br class="gmail_msg">
+ unsigned ByteWidth = BitWidth / 8;<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Do you want to bound the size you consider here? Especially as this is in the backend you should know the maximal size you can fuse together... But maybe if you make this more efficient you can handle the full width of any integer.</div>
</div>
</div>
</div>
</blockquote>
I bound the type of the root of the chain in DAGCombiner::MatchLoadCombine, but I allow intermediate parts of the chain to be any width. <br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+<br class="gmail_msg">
+ switch (Op.getOpcode()) {<br class="gmail_msg">
+ case ISD::OR: {<br class="gmail_msg">
+ auto LHS = collectByteProviders(Op->getOperand(0),<br class="gmail_msg">
+ /*CheckNumberOfUses=*/true);<br class="gmail_msg">
+ auto RHS = collectByteProviders(Op->getOperand(1),<br class="gmail_msg">
+ /*CheckNumberOfUses=*/true);<br class="gmail_msg">
+ if (!LHS || !RHS)<br class="gmail_msg">
+ return None;<br class="gmail_msg">
+<br class="gmail_msg">
+ auto OR = [](ByteProvider LHS, ByteProvider RHS) {<br class="gmail_msg">
+ if (LHS == RHS)<br class="gmail_msg">
+ return LHS;<br class="gmail_msg">
+ if (LHS.Kind == ByteProvider::Unknown ||<br class="gmail_msg">
+ RHS.Kind == ByteProvider::Unknown)<br class="gmail_msg">
+ return ByteProvider::getUnknown();<br class="gmail_msg">
+ if (LHS.Kind == ByteProvider::Memory && RHS.Kind == ByteProvider::Memory)<br class="gmail_msg">
+ return ByteProvider::getUnknown(); </blockquote>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+<br class="gmail_msg">
+ if (LHS.Kind == ByteProvider::Memory)<br class="gmail_msg">
+ return LHS;<br class="gmail_msg">
+ else<br class="gmail_msg">
+ return RHS;<br class="gmail_msg">
+ };<br class="gmail_msg">
+<br class="gmail_msg">
+ SmallVector<ByteProvider, 4> Result(ByteWidth);<br class="gmail_msg">
+ for (unsigned i = 0; i < LHS->size(); i++)<br class="gmail_msg">
+ Result[i] = OR(LHS.getValue()[i], RHS.getValue()[i]);<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">This is really inefficient.</div>
<div class=""><br class="">
</div>
<div class="">You build the entire tree for each layer even if you throw it away again.</div>
</div>
</div>
</div>
</blockquote>
<div>Do you mean cases when some of the bytes turn to unknown? Once we get providers for one operand we might know that specific bytes will turn into unknown but we still need to collect providers for other bytes. Currently collectByteProviders does it for
all bytes in the value. One of the option is to provide a mask for collectByteProviders to specify which bytes are actually needed. </div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class="">And because you recurse infinitely upwards, it is quadratic for tall chains.</div>
<div class=""><br class="">
</div>
<div class="">For combines that recurse upward to not be quadratic in nature, they must:</div>
<div class="">- Combine away the nodes they recurse upward through so we don't re-visit them, or</div>
<div class="">- Have a constant factor limit on the depth of recursion</div>
</div>
</div>
</div>
</blockquote>
collectByteProviders doesn’t follow nodes which have more than one use (the first check in this function). I don’t want to combine patterns individual parts of which are used somewhere else. The side effect of it is collectByteProviders never visits the same
node more than once.</div>
<div>
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<div class="">And to make matters worse when you have a load of more than one byte, you build the information for the load redundantly, once for each byte.</div>
</div>
</div>
</div>
</blockquote>
Can you explain? collectByteProviders collects providers for all the bytes in the value. To handle or I collect providers for all bytes for both sides of the operation and then analyze the result.<br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+<br class="gmail_msg">
+ return Result;<br class="gmail_msg">
+ }<br class="gmail_msg">
+ case ISD::SHL: {<br class="gmail_msg">
+ auto ShiftOp = dyn_cast<ConstantSDNode>(Op->getOperand(1));<br class="gmail_msg">
+ if (!ShiftOp)<br class="gmail_msg">
+ return None;<br class="gmail_msg">
+<br class="gmail_msg">
+ uint64_t BitShift = ShiftOp->getZExtValue();<br class="gmail_msg">
+ if (BitShift % 8 != 0)<br class="gmail_msg">
+ return None;<br class="gmail_msg">
+ uint64_t ByteShift = BitShift / 8;<br class="gmail_msg">
+<br class="gmail_msg">
+ auto Original = collectByteProviders(Op->getOperand(0),<br class="gmail_msg">
+ /*CheckNumberOfUses=*/true);<br class="gmail_msg">
+ if (!Original)<br class="gmail_msg">
+ return None;<br class="gmail_msg">
+<br class="gmail_msg">
+ SmallVector<ByteProvider, 4> Result;<br class="gmail_msg">
+ Result.insert(Result.begin(), ByteShift, ByteProvider::getZero());<br class="gmail_msg">
+ Result.insert(Result.end(), Original->begin(),<br class="gmail_msg">
+ std::prev(Original->end(), ByteShift));<br class="gmail_msg">
+ assert(Result.size() == ByteWidth && "sanity");<br class="gmail_msg">
+ return Result;<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">This is again really inefficient.</div>
<div class=""><br class="">
</div>
<div class="">First, you build the full Original set of things, but then throw away the tail. And second you are copying things from small vector to small vector everywhere here.</div>
</div>
</div>
</div>
</blockquote>
The same idea with the mask of required bytes might work here. <br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<div class="">This really feel like it should be linearly filling in the provider for each byte of the bytes that survived to be used, with no copying or other movement. That way you can only recurse through the byte providers you actually care about, etc.
Does that make sense?</div>
</div>
</div>
</div>
</blockquote>
The tradeoff of this approach is that we need to scan the whole tree for each individual byte.</div>
<div>
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""><br class="">
</div>
<div class="">Or maybe some other algorithm. But fundamentally, I think this recursive walk will need to be significantly re-worked.</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+ // Calculate byte providers for the OR we are looking at<br class="gmail_msg">
+ auto Res = collectByteProviders(SDValue(N, 0));<br class="gmail_msg">
+ if (!Res)<br class="gmail_msg">
+ return SDValue();<br class="gmail_msg">
+ auto &Bytes = Res.getValue();<br class="gmail_msg">
+ unsigned ByteWidth = Bytes.size();<br class="gmail_msg">
+ assert(VT.getSizeInBits() == ByteWidth * 8 && "sanity");<br class="gmail_msg">
+<br class="gmail_msg">
+ auto LittleEndianByteAt = [](unsigned BW, unsigned i) { return i; };<br class="gmail_msg">
+ auto BigEndianByteAt = [](unsigned BW, unsigned i) { return BW - i - 1; };<br class="gmail_msg">
+<br class="gmail_msg">
+ Optional<BaseIndexOffset> Base;<br class="gmail_msg">
+ SDValue Chain;<br class="gmail_msg">
+<br class="gmail_msg">
+ SmallSet<LoadSDNode *, 8> Loads;<br class="gmail_msg">
+ LoadSDNode *FirstLoad = nullptr;<br class="gmail_msg">
+<br class="gmail_msg">
+ // Check if all the bytes of the OR we are looking at are loaded from the same<br class="gmail_msg">
+ // base address. Collect bytes offsets from Base address in ByteOffsets.<br class="gmail_msg">
+ SmallVector<int64_t, 4> ByteOffsets(ByteWidth);<br class="gmail_msg">
+ for (unsigned i = 0; i < ByteWidth; i++) {<br class="gmail_msg">
+ // All the bytes must be loaded from memory<br class="gmail_msg">
+ if (Bytes[i].Kind != ByteProvider::Memory)<br class="gmail_msg">
+ return SDValue();<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">This again seems especially wasteful. Above you carefully built a structure tracking which bytes were zeroed and which came from memory, but you don't need all of that. You just need the bytes from memory and you could early exit if they aren't
from memory.</div>
<div class=""><br class="">
</div>
<div class="">Alternatively, you could implement something that does the full load and mask off the things that need to be zero. But that would be a more complex cost model and tradeoff space so it might not be worth it.</div>
</div>
</div>
</div>
</blockquote>
I’m going to use zero byte information to handle partially available values. Now the transform only handles the patterns when all the bytes are known to be loaded from memory. But I’d like to support cases when we know that the higher bytes are zeros and lower
bytes are read from memory. </div>
<div><br class="">
</div>
<div>Artur</div>
<div>
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class="gmail_quote">
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
==============================================================================<br class="gmail_msg">
--- llvm/trunk/test/CodeGen/ARM/load-combine-big-endian.ll (added)<br class="gmail_msg">
+++ llvm/trunk/test/CodeGen/ARM/load-combine-big-endian.ll Tue Dec 13 08:21:14 2016<br class="gmail_msg">
@@ -0,0 +1,234 @@<br class="gmail_msg">
+; RUN: llc < %s -mtriple=armeb-unknown | FileCheck %s<br class="gmail_msg">
+; RUN: llc < %s -mtriple=arm64eb-unknown | FileCheck %s --check-prefix=CHECK64<br class="gmail_msg">
+<br class="gmail_msg">
+; i8* p; // p is 4 byte aligned<br class="gmail_msg">
+; ((i32) p[0] << 24) | ((i32) p[1] << 16) | ((i32) p[2] << 8) | (i32) p[3]<br class="gmail_msg">
+define i32 @load_i32_by_i8_big_endian(i32*) {<br class="gmail_msg">
+; CHECK-LABEL: load_i32_by_i8_big_endian:<br class="gmail_msg">
+; CHECK: ldr r0, [r0]<br class="gmail_msg">
+; CHECK-NEXT: mov pc, lr<br class="gmail_msg">
+<br class="gmail_msg">
+; CHECK64-LABEL: load_i32_by_i8_big_endian:<br class="gmail_msg">
+; CHECK64: ldr w0, [x0]<br class="gmail_msg">
+; CHECK64-NEXT: ret<br class="gmail_msg">
+ %2 = bitcast i32* %0 to i8*<br class="gmail_msg">
+ %3 = load i8, i8* %2, align 4<br class="gmail_msg">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Please *always* use named values in your tests. otherwise making minor updates is incredibly painful as you have to renumber all subsequent values.</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>