[PATCH] D28637: [PPC] Inline expansion of memcmp

Thu Mar 9 11:06:47 PST 2017

syzaara added inline comments.

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:2233
+    Builder.Insert(NewBr);
+    return;
+  }
----------------
efriedma wrote:
> syzaara wrote:
> > efriedma wrote:
> > > Hmm... in isOnlyUsedInZeroEqualityComparison mode, it probably makes sense to perform more loads per block, to reduce the number of branches.
> > I'm not sure what you mean here. We have the branches for doing an early exit so we don't need to continue loading more bytes when we find a difference.
> A couple loads hitting the cache are cheaper than a branch in cases where the branch isn't predictable... but maybe not a big deal either way.
I'm still not sure how this reduces the number of branches. We would still need to compare each of the loads to see which one had the difference. 

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:2269
+  Value *Subtract = Builder.CreateSub(And1, And2);
+  Value *Res = Builder.CreateSExtOrTrunc(Subtract, Builder.getInt32Ty());
+
----------------
efriedma wrote:
> Would `bswap(Src1) > bswap(Src2) ? 1 : -1` be shorter?
On power, having a load followed by a bswap will get converted to a load byte reversed hardware instruction. But I'm not sure if other targets will have a good way to handle bswap within gprs. It could introduce a longer sequence. Also this way, we are adding more branches with the comparison. 

https://reviews.llvm.org/D28637