[llvm-dev] Load combine pass

Thu Sep 29 10:56:44 PDT 2016

Hi David,

David Chisnall wrote:
 > Nope, we’re not using the address sanitiser.  Our architecture
 > supports byte-granularity bounds checking in hardware.

I mentioned address sanitizer because, at the time, I thought your
architecture would have to prohibit the same kinds of transforms that
address sanitizer has to prohibit.

However, on second thought, I think I have a counter-example to my
statement above -- I suppose your architecture only checks bounds and
not that the location being loaded from is initialized?

 > Note that even without this, for pure MIPS code without our
 > extensions, load widening generates significantly worse code than when
 > it doesn’t happen.  I’m actually finding it difficult to come up with
 > a microarchitecture where a 16-bit load followed by an 8-bit load from
 > the same cache line would give worse performance than a 32-bit load, a
 > mask and a shift.  In an in-order design, it’s more instructions to do
 > the same work, and therefore slower.  In an out-of-order design, the
 > two loads within the cache line will likely be dispatched
 > simultaneously and you’ll have less pressure on the register rename
 > engine.

That makes sense, but what do you think of Artur's suggestion of
catching only the obvious patterns?  That is, catching only cases like

   i16* ptr = ...
   i32 val = ptr[0] | (ptr[1] << 16);

==> // subject to endianness

   i16* ptr = ...
   i32 val = *(i32*) ptr;

To me that seems like a win (or at least, not a loss) on any
architecture.  However, I will admit that I've only ever worked on x86
so I have a lot of blind spots here.
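
For concreteness, here is a rough C sketch of the two forms in the
pattern above.  The function names (load_narrow, load_wide,
is_little_endian) are made up for illustration and are not taken from
the pass; the memcpy in load_wide is just my way of writing the single
32-bit load without tripping over C's alignment and aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Narrow form: two 16-bit loads combined with a shift and an or. */
static uint32_t load_narrow(const uint16_t *ptr) {
    return (uint32_t)ptr[0] | ((uint32_t)ptr[1] << 16);
}

/* Wide form: one 32-bit load of the same four bytes.  memcpy is used
 * here (an assumption of this sketch) so the C stays well defined even
 * if ptr is only 2-byte aligned. */
static uint32_t load_wide(const uint16_t *ptr) {
    uint32_t val;
    memcpy(&val, ptr, sizeof val);
    return val;
}

/* Hypothetical helper: returns 1 when the host stores the least
 * significant byte first. */
static int is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

The two functions agree only on a little-endian host; on a big-endian
one the halves come out swapped, which is the "subject to endianness"
caveat.  Also, an i16* only promises 2-byte alignment, so I'd guess the
wide load needs the target to tolerate that.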

Thanks!
-- Sanjoy