[PATCH] D120887: The [2/2] Fix mangle problem when variable used in inline asm （non-rip for ARR[BaseReg+IndexReg+..]）

Sun Mar 6 19:50:32 PST 2022

pengfei added inline comments.

================
Comment at: llvm/lib/MC/MCParser/AsmParser.cpp:5990
+      return false;
+  }
+
----------------
xiangzhangllvm wrote:
> xiangzhangllvm wrote:
> > xiangzhangllvm wrote:
> > > pengfei wrote:
> > > > xiangzhangllvm wrote:
> > > > > pengfei wrote:
> > > > > > xiangzhangllvm wrote:
> > > > > > > efriedma wrote:
> > > > > > > > If I'm understanding correctly, this is trying to detect two adjacent expressions in a specific form, and then transform one of them so they can be concatenated?  This is way too much magic.  If we aren't parsing expressions correctly, please fix the initial parse; don't fudge the expressions afterwards.
> > > > > > > I think you understand a part of it.
> > > > > > > Here the instruction parsing are finished. The Operands info written into AsmStrRewrites[].
> > > > > > > Let me take a example:
> > > > > > > Both follow cases will get same Operands info into AsmStrRewrites[i] (ARR) and AsmStrRewrites[i+1] ([edx+eax*4])
> > > > > > > Case 1: 
> > > > > > > ```
> > > > > > >   mov     eax,ARR[edx+eax*4]
> > > > > > > ```
> > > > > > > Case 2:
> > > > > > > ```
> > > > > > >  mov     eax,ARR ;   call  dword ptr[edx+eax*4]
> > > > > > > ```
> > > > > > > So here code (line 5967-5990) is just to distinguish the 2 case.
> > > > > > > 
> > > > > > > For Case 1, we can not let ARR be a RIP related address, because it already has base reg and index reg. (We can't use 3 regs for a mem address)
> > > > > > > For Case 2, it is no problem let ARR use RIP, the [edx+eax*4] are in another instruction.
> > > > > > I agree with @efriedma and think the use of `:P` is hacky. I wrote an experiment patch which seems more concise D121072.
> > > > > I have explained that efriedma's understand is not correct. There is not **concatenated **them, just distinguish. PLS refer my previous explain.
> > > > > > and then transform one of them **so they can be concatenated**? This is way too much magic. If we ...
> > > > > 
> > > > > D121072 is still try to replace the Global name in IR/MIR, that is not good way to let follow process see it. (for example Mangler), and no reason do it on codegen. Using Modifer ":P", let all process see it, and let exsiting code to handle it, in my eye, is better.
> > > > I think you misunderstood efriedma's point. He meant you should distinguish during parsing instead of distinguishing again.
> > > > Besides, I think you missed the case when ARR inside [], e.g, `[gVar + ecx + ebx]`. I agreed with him the logic is magic and not easy to understand.
> > > > D121072 intends do the mangling at front end. No following process is needed. I see front end do mangling sometime, so I think we can do it there.
> > > > D121072 intends do the mangling at front end. No following process is needed. 
> > > We can't make sure "No following process is needed. "
> > > One problem I can see is a symbol both used for lea and call. Call may @plt (or some others), but lea not. 
> > > 
> > > In fact, D121072 is only the idea of do part print job at codegen. If it can be replaced in MIR why it can be directly repalced in IR ? Just try getSymbol (which will use mangler name)
> > > That is not a good arch for IR/MIR. All the operands should be visible to IR/MIR.
> > > 
> > > 
> > > Besides, I think you missed the case when ARR inside [], e.g, [gVar + ecx + ebx]
> > Let me refine the code to cover this case, 
> > BTW, that is not a good/standard way to get address in this way.
> I did a test in my local.
> I think you mean the line 13 is fail. The fail is directly a “syntax error”， it is not supported by inlineasm before too. (No relation with this patch)
> 
> ```
>   1 // -fasm-blocks -S -emit-llvm
>   2 int ARR[10];
>   3 int ARR2[10];
>   4 int main()
>   5 {
>   6         int x;
>   7         ARR[6] = 7;
>   8     __asm {
>   9         mov      eax, 1
>  10         mov      ecx, 20
>  11         mov eax, DWORD PTR ARR[ecx + eax * 4]     // OK
>  12         mov eax, DWORD PTR [ARR + ecx + eax * 4]  // OK
>  13 //        mov eax, DWORD PTR ARR2[ARR + ecx + eax * 4]  // Err: cannot use more than one symbol in memory operand
>  14         mov DWORD PTR [x], eax
>  15     }
>  16 }
> ```
> One problem I can see is a symbol both used for lea and call. Call may @plt (or some others), but lea not.

I think you mean the problem in D109407? Seems the patch is still arguable. I think we can solve one by one. Since it's already fail, I don't think it's a problem for this one.

> I think you mean the line 13 is fail.

No, I meant line 12. Why is it OK? I didn't find you handle it here. How about ARR is the last one, e.g., `[ecx + eax * 4 + ARR]` ?

My point is you almost do the parsing logic here again. It's redundant and fragile. If there's any update in the parser code, people is not aware they should update here too.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120887/new/

https://reviews.llvm.org/D120887