[PATCH] D158197: [PowerPC][lld] Account for additional X-Forms -> D-Form/DS-Forms load/stores when relaxing initial-exec to local-exec

Wed Aug 23 21:46:21 PDT 2023

amyk added inline comments.

================
Comment at: lld/ELF/Arch/PPC64.cpp:944
+          error("unrecognized instruction for IE to LE R_PPC64_TLS");
+        finalInstr = (dFormOp | (read32(loc) & 0x03FFFFFC));
+        finalReloc = R_PPC64_TPREL16_LO_DS;
----------------
nemanjai wrote:
> amyk wrote:
> > sfertile wrote:
> > > nemanjai wrote:
> > > > sfertile wrote:
> > > > > nemanjai wrote:
> > > > > > amyk wrote:
> > > > > > > nemanjai wrote:
> > > > > > > > Using `0x03FFFFFC` rather than `0x03FFFFFF` is presumably because the two least significant bits of the DS-Form must not be overwritten. While I think that makes sense, I have a bit of a concern with using a different value there:
> > > > > > > > 1. We always had DS-Forms there (`LD/STD`) and didn't treat them specially
> > > > > > > > 2. Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue
> > > > > > > > 
> > > > > > > > Perhaps it would be appropriate to set the `finalInstr` the same way regardless of how we set the `dFormOp` and just have an assert (or possibly a fatal_error or warning is even better) in the DS-Form section that those bits are unset.
> > > > > > > 
> > > > > > > 
> > > > > > > I'd like to understand this suggestion a bit more.
> > > > > > > 
> > > > > > > Do you mean, we should just do:
> > > > > > > ```
> > > > > > > if (dFormOp == 0) { // Expecting a DS-Form instruction.
> > > > > > >   dFormOp = getPPCDSFormOp(secondaryOp);
> > > > > > >   if (dFormOp == 0)
> > > > > > >     error("unrecognized instruction for IE to LE R_PPC64_TLS");
> > > > > > >         
> > > > > > >     // Check if the last two bits are unset. If set, we `errorOrWarn()`
> > > > > > > 
> > > > > > >     finalReloc = R_PPC64_TPREL16_LO_DS;
> > > > > > > } else
> > > > > > >     finalReloc = R_PPC64_TPREL16_LO;
> > > > > > >   write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
> > > > > > >   relocateNoSym(loc + offset, finalReloc, val);
> > > > > > > }
> > > > > > > ```
> > > > > > > 
> > > > > > > In any case, yes, if I understand correctly, `0x03FFFFFF` grabs bits 6-31 in the X-Form instruction, while `0x03FFFFFC` grabs bits 6-29, leaving the last two bits. 
> > > > > > > 
> > > > > > > Also, to give a bit more context behind why I am treating the DS-Forms differently:
> > > > > > > 
> > > > > > > I discussed the use of use of `0x03FFFFFC` in the DS-Form case with a few others before putting up the patch. 
> > > > > > > This all came about because I was using a different relocation for DS-Form `R_PPC64_TPREL16_LO_DS`.
> > > > > > > I found that when I used the different relocation with the original masking in the DS-Form case:
> > > > > > > ```
> > > > > > >   write32(loc, (dFormOp | (read32(loc) & 0x03FFFFFF)));
> > > > > > >   relocateNoSym(loc + offset, R_PPC64_TPREL16_LO_DS, val);
> > > > > > > ```
> > > > > > > 
> > > > > > > For the following situation:
> > > > > > > ```
> > > > > > > test8:
> > > > > > >   addis 4, 2, l at got@tprel at ha
> > > > > > >   ld 4, l at got@tprel at l(4)
> > > > > > >   stdx 3, 4, l at tls
> > > > > > > 
> > > > > > > .section .tdata,"awT", at progbits
> > > > > > > .globl l
> > > > > > > l:
> > > > > > > .quad 55
> > > > > > > ```
> > > > > > > I'd get `stq` instead of `std` like we would expect.
> > > > > > > ```
> > > > > > > Disassembly of section .text:
> > > > > > > 
> > > > > > > 0000000010010200 <test8>:
> > > > > > > 10010200:      	nop
> > > > > > > 10010204:      	addis 4, 13, 0
> > > > > > > 10010208:      	stq 3, -28672(4)
> > > > > > > ```
> > > > > > > Presumably this is because `std` and `stq` share the same primary opcode, and the other thing that differs are the last two bits (00 for `std` whereas 10 for `stq`), so we found that `0x03FFFFFC` works with `R_PPC64_TPREL16_LO_DS`.
> > > > > > > 
> > > > > > > I don't quite remember the exact discussion that we also had, but I think we were saying before that `LD/STD` perhaps just happened to have worked when we did `0x03FFFFFF` with `R_PPC64_TPREL16_LO` in the past, since the last two bits in these instructions are both 00.
> > > > > > > 
> > > > > > > 
> > > > > > That `stq` vs. `std` issue is a bit alarming. Where do the low bits come from? The variable `l` is aligned at 8 bytes.
> > > > > >Where do the low bits come from?
> > > > > 
> > > > > From the `stdx` instruction. The XO field is 149 so bit 30 is 1, and bit 31 is indeterminate but those are typically zeroed, so we have `2` in the last 2 bits. Using `read32(loc) & 0x03FFFFFF` will leave those bits in, and then relocating with `R_PPC64_TPREL16_LO_DS` we end preserving the non-zero value in the 'XO' bits of the instruction.
> > > > > 
> > > > > I can see why you might want to preserve what we were doing before because it was working for LD/STD but this assumption is wrong - 
> > > > > 
> > > > > > Presumably, having either of those bits set in the incoming instruction is an indication that something has gone wrong with whatever produced the code, so masking out the bits just hides the issue
> > > > > 
> > > > > The stdx instruction is the counter example. Thats why we updated the mask to reflect we are actually working with a DS-form instruction.
> > > > Ah, ok. This makes sense. Just out of curiosity, why aren't we masking out the XO field completely? It seems odd that we are masking out the bottom two bits (one bit from XO and one reserved bit) but we are leaving the rest of the XO field intact.
> > > Yeah that's a good point. I think it was probably working because the relocate function would end up masking out the relocated bits but we should update the mask used here to reflect exactly what bits we want to extract from the existing instruction.
> > Sure. I will update it so that the entire XO field is masked out. This also works.
> I'm sorry Amy, but the purpose of my comment was to common up the masks between the D-Form and DS-Form updated instructions. If I am not mistaken, regardless of what the input instruction was, we want to mask out bits 21-31. I suppose we really want to mask out all the bits that will be replaced by the D/DS field, so presumably it should be bits 16-31.
Ah! Good point. Sorry, I had misunderstood your previous comment and thought it only applied to the DS-Forms.
What you said makes sense, and I've updated it once again.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158197/new/

https://reviews.llvm.org/D158197