[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Mon Oct 30 07:26:54 PDT 2017

rengolin added inline comments.

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:11064
+
+  while (true) {
+    for (SDNode::use_iterator UI = Addr.getNode()->use_begin(),
----------------
evgeny777 wrote:
> rengolin wrote:
> > I'm worried about this... there should be a guaranteed exit condition here.
> What about reducing number of iterations to, let's say, 16? This should be more than enough for most of cases. Also the effect of compilation time increase should become limited.
I was hoping for an approach that didn't need O(n^2) even if lower bound. If your increment is 16x, then the loop will be 16x less efficient than previously. Granted, it's doing more, but that's a big impact that I'd rather avoid.

But I have been away from isel for quite a while now, so I'll wait for some current isel developers (@qcolombet @rovka) to shed some light.

================
Comment at: test/CodeGen/ARM/misched-fusion-aes.ll:79
 ; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QC]]
+; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
 ; CHECK: aese.8 [[QD:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
----------------
evgeny777 wrote:
> rengolin wrote:
> > why would these change?
> IR is compiled into a mixture of aesXX and vldXX instructions which are being scheduled differently after this patch is applied. I checked manually that parameters passed to each aesXX instruction are identical (though I can't guarantee that I didn't make any mistake, of course :)
Right, I expected as much. And you can't just ignore those lines because the QD, QE etc. will not match.

Hopefully, simplifying the IR could make the extra AES instructions unnecessary and we end up with a static pattern.

Repository:
  rL LLVM

https://reviews.llvm.org/D39415