[PATCH] D124406: [X86] Use indirect addressing for high 2GB of x32 address space

Mon Apr 25 13:19:14 PDT 2022

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:1696
+    if (Subtarget->isTarget64BitILP32() && !isUInt<31>(Val) &&
+        !AM.hasBaseOrIndexReg())
+      return true;
----------------
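A minimal sketch of what the condition above guards against. It assumes the standard x86-64 rule that an absolute `[disp32]` operand, with the default 64-bit address size, has its displacement sign-extended to 64 bits. An x32 pointer, by contrast, is zero-extended. The helper names (`needsIndirectAddressing`, `absoluteAddr64`, `intendedAddr`) are hypothetical and do not appear in X86ISelDAGToDAG.cpp.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the patch's condition:
//   isTarget64BitILP32() && !isUInt<31>(Val) && !AM.hasBaseOrIndexReg()
constexpr bool needsIndirectAddressing(bool isX32, int64_t val,
                                       bool hasBaseOrIndexReg) {
  // Like llvm::isUInt<31>: the value, viewed as unsigned, is below 2^31.
  const bool fitsUInt31 = static_cast<uint64_t>(val) < (uint64_t{1} << 31);
  return isX32 && !fitsUInt31 && !hasBaseOrIndexReg;
}

// Effective address of an absolute [disp32] operand with the default
// 64-bit address size: the displacement is sign-extended to 64 bits.
constexpr uint64_t absoluteAddr64(uint32_t disp32) {
  uint64_t addr = disp32;
  if (addr & 0x80000000u)
    addr |= 0xFFFFFFFF00000000ull; // copy bit 31 into the upper half
  return addr;
}

// An x32 pointer is 32 bits wide and is zero-extended to form the address.
constexpr uint64_t intendedAddr(uint32_t ptr) { return ptr; }
```

For a pointer into the high 2GB such as 0x80000000, `absoluteAddr64` yields 0xFFFFFFFF80000000 instead of 0x80000000. Values below 2^31 come out the same either way, which is why only displacements failing `isUInt<31>` need the indirect form.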
hvdijk wrote:
> efriedma wrote:
> > hvdijk wrote:
> > > efriedma wrote:
> > > > The reasoning here seems strange.  For example, suppose I write `void f(int a) { ((char*)0x80000000)[a] = a; }`.  That has a base register, but sign-extension is wrong.
> > > > 
> > > > I guess you're trying to allow negative pointer offsets here, but I think SelectionDAG is throwing away the distinction you need here.  (At the IR level, it's easy to distinguish between the base of a GEP and the offset.)
> > > That does the right thing; it's okay to allow. That results in `movb %dil, -2147483648(%edi)`, and because `%edi` is part of the address, `%edi - 2147483648` is calculated as a 32-bit value and then zero-extended, so it computes exactly the same address as `%edi + 2147483648` would if that could be encoded directly. x86 is weird.
> > Oh, wait, never mind, I think I see what you mean. The issue isn't the register; it's the prefix indicating 32-bit addressing, and that prefix only gets emitted if there's a register operand.
> > 
> > It seems like you should be able to fix the encoding somehow; the prefix has nothing to do with the operands.  Or maybe there's some reason you can't... but in that case, please add a comment explaining.
> Huh, that does seem to work when I try it, but it's something that GCC doesn't generate, that GNU objdump confusingly disassembles as something involving %eiz, and that llvm-objdump disassembles as if there were no address-size override. I'll work on that; I didn't think it was possible. Thanks for the pointer.
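The arithmetic behind the `movb %dil, -2147483648(%edi)` case discussed above can be sketched as follows. The sketch assumes 32-bit address size in 64-bit mode, selected by the 0x67 address-size override prefix, where base plus displacement is computed modulo 2^32 and the result is zero-extended. `effectiveAddr32` is a hypothetical helper for illustration only.

```cpp
#include <cstdint>

// Hypothetical helper: effective address of [base + disp32] when the
// 0x67 address-size override selects 32-bit addressing in 64-bit mode.
// The sum wraps modulo 2^32 and the 32-bit result is zero-extended.
constexpr uint64_t effectiveAddr32(uint32_t base, int32_t disp) {
  // Converting a negative int32_t to uint32_t is well-defined (mod 2^32).
  const uint32_t sum =
      static_cast<uint32_t>(base + static_cast<uint32_t>(disp));
  return static_cast<uint64_t>(sum); // upper 32 bits are zero
}
```

With `%edi = 5`, adding the displacement -2147483648 gives 0x80000005, the same result as 5 + 0x80000000. Passing a base of 0 models the case with no register at all, where the prefix alone makes the absolute address 0x80000000 come out zero-extended.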
I think %eiz means there is a SIB byte whose index field is 0b100 (no index), even though there is a shorter encoding that doesn't use a SIB byte. It's there to disambiguate two encodings that would otherwise print as the same string.
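For reference, here is a sketch of the ModRM and SIB bytes involved, based on the standard x86 field layouts. In 64-bit mode, ModRM mod=00 with rm=101 and no SIB byte means RIP-relative addressing rather than an absolute address. An absolute disp32 operand is therefore encoded with rm=100 (a SIB byte follows) and a SIB byte whose index field is 0b100 (no index) and whose base field is 0b101 (no base when mod=00). The helpers `modrm`, `sib`, and `encodeMovDilAbs` are hypothetical, and the instruction is an absolute-address variant of the `movb %dil, ...` store discussed above.

```cpp
#include <array>
#include <cstdint>

// ModRM byte: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0.
constexpr uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
  return static_cast<uint8_t>((mod << 6) | ((reg & 7) << 3) | (rm & 7));
}

// SIB byte: scale in bits 7-6, index in bits 5-3, base in bits 2-0.
constexpr uint8_t sib(unsigned scale, unsigned index, unsigned base) {
  return static_cast<uint8_t>((scale << 6) | ((index & 7) << 3) | (base & 7));
}

constexpr unsigned kRmUsesSib = 0b100; // ModRM rm=100: a SIB byte follows
constexpr unsigned kNoIndex = 0b100;   // SIB index=100: no index register
constexpr unsigned kNoBase = 0b101;    // SIB base=101 with mod=00: disp32 only
constexpr unsigned kDil = 7;           // register number of %dil (needs REX)

// Hypothetical encoder for `movb %dil, disp32` (absolute address) in
// 64-bit mode: REX, opcode 0x88 (MOV r/m8, r8), ModRM, SIB, disp32 (LE).
constexpr std::array<uint8_t, 8> encodeMovDilAbs(uint32_t disp) {
  return {{0x40, // bare REX prefix so reg 7 means %dil, not %bh
           0x88, modrm(0, kDil, kRmUsesSib), sib(0, kNoIndex, kNoBase),
           uint8_t(disp), uint8_t(disp >> 8), uint8_t(disp >> 16),
           uint8_t(disp >> 24)}};
}
```

For the displacement 0x80000000 this yields `40 88 3C 25 00 00 00 80`. Placing the 0x67 address-size override in front of the REX prefix (legacy prefixes must precede REX) selects 32-bit addressing for the same operand.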

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124406