[LLVMdev] Be Careful with Positionally-Encoded Operands (AArch64, Mips, AMDGPU, etc.)

Thu Mar 13 08:39:48 PDT 2014

On Thu, Mar 13, 2014 at 10:01:32AM -0500, Hal Finkel wrote:
> ----- Original Message -----
> > From: "Tom Stellard" <tom at stellard.net>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> > Sent: Thursday, March 13, 2014 9:46:22 AM
> > Subject: Re: [LLVMdev] Be Careful with Positionally-Encoded Operands (AArch64, Mips, AMDGPU, etc.)
> > 
> > On Thu, Mar 13, 2014 at 03:21:47AM -0500, Hal Finkel wrote:
> > > Hello,
> > > 
> > > Some of the backends seem to be combining positional and named
> > > operands when defining some instructions such that some of the
> > > positional operands overlap with some of the named operands. I
> > > suspect this is not intentional; here's an example:
> > > 
> > > AArch64 has the following instruction definition:
> > > 
> > > SMULHxxx {
> > >   field bits<32> Inst = { 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, Rm{4},
> > >   Rm{3}, Rm{2}, Rm{1}, Rm{0}, 0, Ra{4}, Ra{3}, Ra{2}, Ra{1},
> > >   Ra{0}, Rn{4}, Rn{3}, Rn{2}, Rn{1}, Rn{0}, Rd{4}, Rd{3}, Rd{2},
> > >   Rd{1}, Rd{0} };
> > >   ...
> > >   dag OutOperandList = (outs GPR64:$Rd);
> > >   dag InOperandList = (ins GPR64:$Rn, GPR64:$Rm);
> > >   string AsmString = "smulh	$Rd, $Rn, $Rm";
> > >   list<dag> Pattern = [(set i64:$Rd, (mulhs i64:$Rn, i64:$Rm))];
> > >   ...
> > >   bits<5> Rd = { ?, ?, ?, ?, ? };
> > >   bits<5> Rn = { ?, ?, ?, ?, ? };
> > >   bits<5> Rm = { ?, ?, ?, ?, ? };
> > >   bits<5> Ra = { ?, ?, ?, ?, ? };
> > > ...
> > > }
> > > 
> > > So how do the operands in OutOperandList and InOperandList get
> > > mapped to the variables used to encode the instruction
> > > (Rd,Rn,Rm,Ra)? The first three match by name (as they appear
> > > explicitly in OutOperandList and InOperandList), but what about
> > > Ra? Ra contributes to defining the bits in Inst, and because there
> > > is, by default, no overlap checking, it also gets mapped to the
> > > first operand: GPR64:$Rd. The result, from
> > > AArch64GenMCCodeEmitter.inc is:
> > > 
> > >     case AArch64::SMULHxxx:
> > >     case AArch64::UMULHxxx: {
> > >       // op: Rd
> > >       op = getMachineOpValue(MI, MI.getOperand(0), Fixups, STI);
> > >       Value |= op & UINT64_C(31);
> > >       // op: Rn
> > >       op = getMachineOpValue(MI, MI.getOperand(1), Fixups, STI);
> > >       Value |= (op & UINT64_C(31)) << 5;
> > >       // op: Rm
> > >       op = getMachineOpValue(MI, MI.getOperand(2), Fixups, STI);
> > >       Value |= (op & UINT64_C(31)) << 16;
> > >       // op: Ra
> > >       op = getMachineOpValue(MI, MI.getOperand(0), Fixups, STI);
> > >       Value |= (op & UINT64_C(31)) << 10;
> > >       Value = fixMulHigh(MI, Value, STI);
> > >       break;
> > >     }
> > > 
> > > This may be correct (I have no idea), but even if it is, it seems
> > > like an odd way to get this behavior: depending on the fact that,
> > > after operands are matched by name, the remaining operands are
> > > matched by position such that they might overlap with those
> > > operands matched by name.
> > > 
> > > In any case, I believe that the only in-tree target that still
> > > really needs the positional operand support is PowerPC (because
> > > the named-operand matching scheme still can't deal with the
> > > complex operands that PPC uses for handling memory operands), and
> > > PowerPC needs the behavior that named and positional operands are
> > > disjoint so that we can start transitioning toward using named
> > > mapping, so I've added a new TableGen target bit to disable this
> > > problematic overlapping operand behavior
> > > (noNamedPositionallyEncodedOperands) in r203767.
> > > 
> > > This means that in lib/Target/PowerPC/PPC.td, we have:
> > > 
> > > def PPCInstrInfo : InstrInfo {
> > > ...
> > > 
> > >   let noNamedPositionallyEncodedOperands = 1;
> > > 
> > > ...
> > > }
> > > 
> > > I'd like those maintaining the current backends (especially
> > > AArch64, Mips, AMDGPU, which I know to be problematic in this
> > > regard) to try setting this and: a) fix those definitions that are
> > > problematic or b) explain to me that the current behavior is
> > > useful.
> > >
> > 
> > Hi Hal,
> > 
> > Thanks for working on this, I have run into several problems with the
> > positional encodings and I would be happy to see it go away.
> >  However,
> > I was under the impression that positional encoding was the only way
> > to
> > encode custom operand types that map to multiple machine operands.
> > See for example the MEMxi Operand sub-class in
> > R600/R600Instructions.td.
> 
> Yes, this is the remaining use case for the positional scheme. We need to invent something to replace it (maybe this should be something as simple as mapping '.' -> '_'). Thoughts?
> 

Do you mean something like this:

def MEMxi : Operand<iPTR> {
  let MIOperandInfo = (ops REG:$ptr, i32imm:$index);
}

def Instr {
let InOperandList = (ins MEMxi:$src);

field bits<32> Inst;
field bits<8> src_ptr;   //$src.$ptr
field bits<8> src_index; //$src.$index

let Inst{7-0} = src_ptr;
let Inst{15-8} = src_index;
}

This seems like a good idea to me.

-Tom

>  -Hal
> 
> > 
> > Thanks,
> > Tom
> > 
> > > Thanks again,
> > > Hal
> > > 
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory