[PATCH] D27803: [ARM] GlobalISel: Load i1, i8 and i16 args from stack

Wed Jan 25 03:21:52 PST 2017

rengolin accepted this revision.
rengolin added inline comments.
This revision is now accepted and ready to land.

================
Comment at: lib/Target/ARM/ARMCallLowering.cpp:134-142
+    if (VA.getLocInfo() == CCValAssign::SExt ||
+        VA.getLocInfo() == CCValAssign::ZExt)
+      // If the argument is zero- or sign-extended by the caller, its size
+      // becomes 4 bytes, so that's what we should load.
+      AddDefaultPred(MIRBuilder.buildInstr(ARM::LDRi12)
+                         .addDef(ValVReg)
+                         .addUse(Addr)
----------------
rovka wrote:
> rengolin wrote:
> > rovka wrote:
> > > t.p.northover wrote:
> > > > Is this necessary? Just because the caller stored  4 bytes doesn't mean we have to load 4. You might make some kind of efficiency argument if you expect most arithmetic to be done on i32 (as in C code), but I don't think this really helps there anyway because the G_SEXT/G_ZEXT will still exist in the MIR.
> > > > 
> > > > Also, if this really is necessary the `MMO` is misrecording the size of the memory operation.
> > > Well, actually we're not generating a G_ZEXT or G_SEXT here... Should we? 
> > > 
> > > It looks like the simplest thing to do, but OTOH it's the caller's job to generate the extension, so if we did anything here we'd just be duplicating the caller's work (and introducing a potential source of bugs).
> > > 
> > > I'm not sure what the best course of action is: 
> > > * Keep the size and generate the G_LOAD + G_ZEXT/G_SEXT instructions
> > > * Generate G_LOAD with size 4
> > > * Generate LDRi12 directly as in this patch
> > > 
> > > Are we breaking any assumptions if we choose one of the last 2? Would that be premature optimization?
> > > 
> > > 
> > I believe we should start with generating the right load (B/H/W) and Z/S-extend.
> > 
> > It may duplicate the caller's work (since it was their job), but it's also safer if the caller is not following the ABI.
> > 
> > Why would that matter, well, because we support multiple ABIs, and I don't claim to know all of them by heart.
> > 
> > So, we implement the safe option now, and later on optimise for specific ABIs if we can guarantee it'll be safe there, too.
> I'm not sure I understand why you think it's safer to duplicate the caller's work. Wherever there's duplication, there's the possibility of going out-of-sync between the two sides, and I don't think guarding against the caller not following the ABI is worth that risk. Also, by doing the extension to 32 bits we're already assuming *something* about the ABI. Are you thinking about something in particular about an ABI that would break if we just load those 32 bits? FWIW, the current instruction selection just loads 32 bits for every calling convention supported by ARM (except for ghccc, which just asserts).
I was under the impression that Tim had knowledge of some ABI (Apple) that didn't follow that rule. Re-reading his comment now it seems it was just a point to optimisation, which should not be our worry for now.

If the current selector does this, than **not** changing behaviour is the way to go. Bonus points for following the ABI. :)

https://reviews.llvm.org/D27803