[LLVMdev] Symbol folding with MC

Tue Apr 26 10:18:35 PDT 2011

Hello,

On Apr 26, 2011, at 6:30 AM, Borja Ferrer wrote:

> Hello, I have some questions regarding folding operations with symbols during the instruction print stage with MC. At the moment I'm working with global symbols, but I guess that other symbol types should be equivalent.
> 
> My first question is: how can I negate the address of a symbol?
> 
> Consider this piece of code:
> char g_var[80];
> char foo(int a) { return g_var[a]; }
> 
> This gets compiled into something like (in pseudo asm):
> addi a, g_var
> load retreg, a
> 
> but I don't have an add-immediate instruction, so I have to do the following:
> subi a, -g_var // negate g_var addr
> load retreg, a
> 
> One solution I thought of is passing a target flag indicating that a negation is needed when lowering the MachineInstr into an MCInst, and adding an MCExpr to negate the symbol. But I want to know if there's a better way to do this, instead of delaying it to the MCInst lowering stage.
> 

These sorts of constraints are normally enforced prior to lowering to MC. Doing them directly as part of instruction selection as much as possible is good (the ARM target has examples of this for using ADD/SUB immediate instructions). For example, don't express in the target .td file(s) that you have an add-immediate instruction if you actually don't, but do add patterns for the operation using the subtract-immediate instruction. For symbolic immediate references, you're correct that the expression on the operand will include the negation.
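To make the negated operand concrete, here is a small C model of it. The `expr` type, `expr_eval`, `addi` and `subi` are hypothetical stand-ins (this is not the MCExpr API), and the symbol address used below is an arbitrary value chosen for illustration. Because 32-bit unsigned arithmetic wraps, subtracting `-(g_var)` yields the same result as adding `g_var` once the symbol has been resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal model of a symbolic operand expression.
   This is NOT llvm::MCExpr; it only illustrates the idea. */
typedef enum { EXPR_SYM, EXPR_CONST, EXPR_NEG } expr_kind;

typedef struct expr {
    expr_kind kind;
    uint32_t value;          /* constant value, or resolved symbol address */
    const struct expr *sub;  /* operand of EXPR_NEG, NULL otherwise */
} expr;

/* Evaluate once the symbol address is known (e.g. at link time).
   Unsigned arithmetic wraps modulo 2^32, like a 32-bit register. */
static uint32_t expr_eval(const expr *e) {
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_SYM:
    case EXPR_CONST:
        return e->value;
    case EXPR_NEG:
        return 0u - expr_eval(e->sub);
    }
    return 0;
}

/* Semantics of the two hypothetical instructions from the question. */
static uint32_t addi(uint32_t reg, const expr *imm) { return reg + expr_eval(imm); }
static uint32_t subi(uint32_t reg, const expr *imm) { return reg - expr_eval(imm); }
```

With `g_var` resolved to 0x2000, `subi(5, -(g_var))` and `addi(5, g_var)` both produce 0x2005, which is why a subtract-immediate pattern with a negated symbolic operand can stand in for the missing add-immediate.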

MC is designed such that it should always represent legal instructions, and only legal instructions. That includes things like register operands being legal for the instruction, immediates being in range, etc. There's (currently) no verification pass for those constraints, but that's the idea, so waiting 'til after MC lowering to check for and transform the instructions is not advisable, and is likely to break if/when we add such a verification pass.
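There is no such verification pass today, so the following is only a sketch of the kind of range check one would apply before an instruction ever reaches MC. The field widths are hypothetical parameters; a real target describes its encodings in its .td files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits a two's-complement immediate field of `bits` bits (1..63). */
static bool fits_signed_imm(int64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    return v >= min && v <= max;
}

/* True if v fits a zero-extended immediate field of `bits` bits (1..63). */
static bool fits_unsigned_imm(uint64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    return v <= (((uint64_t)1 << bits) - 1);
}
```

An instruction-selection predicate or a late MachineInstr-level pass would consult a check like this and pick a different instruction sequence when it fails, rather than patching the MCInst afterwards.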

If your target has properties that make it impossible to do this at instruction selection time, I would suggest a late machine function pass that will scan for and transform the instructions as necessary. This would all be at the MachineInstr level before lowering to MC.
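In LLVM such a pass would be a MachineFunctionPass running before MC lowering. To keep the sketch self-contained, the `minstr` struct, the opcodes and `expand_addi` below are hypothetical placeholders, not LLVM classes. The pass rewrites every add-immediate into a subtract-immediate: constant immediates are negated directly, while symbolic ones are only marked so that MC lowering later wraps the symbol in a negation expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a MachineInstr. */
typedef enum { OP_ADDI, OP_SUBI, OP_LOAD } opcode;

typedef struct {
    opcode op;
    int reg;             /* register operand */
    uint32_t imm;        /* constant immediate; unused when sym != NULL */
    const char *sym;     /* symbolic immediate such as "g_var", or NULL */
    bool negate_sym;     /* emit the symbol operand as -(sym) when lowering */
} minstr;

/* Late "pass": replace each ADDI (which the target lacks) with SUBI of the
   negated operand. Returns the number of instructions changed. */
static size_t expand_addi(minstr *insts, size_t n) {
    assert(insts != NULL || n == 0);
    size_t changed = 0;
    for (size_t i = 0; i < n; ++i) {
        minstr *mi = &insts[i];
        if (mi->op != OP_ADDI)
            continue;
        mi->op = OP_SUBI;
        if (mi->sym != NULL)
            mi->negate_sym = !mi->negate_sym;  /* -(sym) resolved by assembler/linker */
        else
            mi->imm = 0u - mi->imm;            /* wraps: reg - (-imm) == reg + imm */
        ++changed;
    }
    return changed;
}
```

The `negate_sym` flag plays the role of the target flag from the original question, but it is attached at the MachineInstr level before lowering, so the resulting MCInst is already legal when it is created.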

> The other question is how to fold single and complex operations on symbols. Say we have something like:
> 
> unsigned int g_var[80];
> unsigned int foo() { return (unsigned int)&g_var[0] & 0x1234; }
> 
> Currently this moves the g_var address into a register and then performs the AND operation, but I want this to be done at compile time, so that we get something like:
> 
> move retreg, (g_var & 0x1234)
> 

For many targets this isn't legal, as the object file format used can't represent those sorts of expressions in a relocation. It sounds like your situation is different, though.
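One way to see the limitation: many relocation types compute roughly "symbol plus constant addend" (ELF's S + A, for example). The sketch below uses a hypothetical single-symbol expression type, not MCExpr, to check whether an expression reduces to that shape. `g_var + 4` does, while `g_var & 0x1234` does not and would need a target-specific relocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expression type with a single symbol; NOT llvm::MCExpr. */
typedef enum { E_SYM, E_CONST, E_ADD, E_SUB, E_AND } ekind;

typedef struct rexpr {
    ekind kind;
    int64_t value;                  /* used by E_CONST */
    const struct rexpr *lhs, *rhs;  /* used by binary nodes */
} rexpr;

/* Net coefficient of the symbol in e (+1 for "sym + ...", -1 for "... - sym").
   Clears *ok if a symbol-dependent subexpression appears under &. */
static int sym_coeff(const rexpr *e, bool *ok) {
    assert(e != NULL);
    switch (e->kind) {
    case E_SYM:   return 1;
    case E_CONST: return 0;
    case E_ADD:   return sym_coeff(e->lhs, ok) + sym_coeff(e->rhs, ok);
    case E_SUB:   return sym_coeff(e->lhs, ok) - sym_coeff(e->rhs, ok);
    case E_AND:
        if (sym_coeff(e->lhs, ok) != 0 || sym_coeff(e->rhs, ok) != 0)
            *ok = false;   /* "sym & mask" has no plain S + A form */
        return 0;
    }
    *ok = false;
    return 0;
}

/* True if e reduces to "constant" or "symbol + constant addend". A negated
   symbol (coefficient -1) or a masked one needs target-specific support. */
static bool fits_sym_plus_addend(const rexpr *e) {
    bool ok = true;
    int c = sym_coeff(e, &ok);
    return ok && (c == 0 || c == 1);
}
```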

> Without touching anything else, only additions get folded, but this could be extended to other operations like OR, XOR, shifts, etc. A more complex case would be combining several operations in a single statement. So my question is how to achieve this. One idea I've had is to use a pseudo instruction that takes an operand depending on the instruction to fold, then expand this pseudo instruction into the real move instruction while setting a target flag that identifies the operation to fold, and in the MCInst lowering stage create an MCExpr based on these flags. The problem is that this can't handle more than one operation per statement.

A custom lowering or a target DAG combine would likely be your best bet.
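As a toy model of what such a combine would do: in LLVM the logic would live in the target's custom lowering or its `PerformDAGCombine` hook and work on SelectionDAG nodes, whereas the `node` type and the functions below are hypothetical. Because the fold is recursive, nested operations such as `(g_var | 3) & 0x1234` collapse into a single immediate, which avoids the one-operation-per-statement limit of the pseudo-instruction approach. It assumes the target's assembler and object format can represent the resulting expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical DAG node; NOT llvm::SDNode. */
typedef enum { N_REG, N_SYM, N_CONST, N_AND, N_OR, N_XOR, N_SHL, N_ADD } nkind;

typedef struct node {
    nkind kind;
    const char *name;              /* N_REG / N_SYM */
    unsigned value;                /* N_CONST */
    const struct node *lhs, *rhs;  /* binary operators */
} node;

/* Foldable if every leaf is a symbol or constant (no run-time register). */
static bool is_foldable(const node *n) {
    switch (n->kind) {
    case N_REG:   return false;
    case N_SYM:
    case N_CONST: return true;
    default:      return is_foldable(n->lhs) && is_foldable(n->rhs);
    }
}

static const char *op_str(nkind k) {
    switch (k) {
    case N_AND: return "&";
    case N_OR:  return "|";
    case N_XOR: return "^";
    case N_SHL: return "<<";
    case N_ADD: return "+";
    default:    return "?";
    }
}

/* Append s at out[*len]; the sketch assumes the caller's buffer is big enough. */
static void put(char *out, size_t size, size_t *len, const char *s) {
    size_t n = strlen(s);
    assert(*len + n < size);
    memcpy(out + *len, s, n + 1);
    *len += n;
}

/* Render a foldable subtree as one assembler expression. */
static void render(const node *n, char *out, size_t size, size_t *len) {
    char tmp[16];
    assert(n->kind != N_REG);
    if (n->kind == N_SYM) {
        put(out, size, len, n->name);
    } else if (n->kind == N_CONST) {
        snprintf(tmp, sizeof tmp, "0x%x", n->value);
        put(out, size, len, tmp);
    } else {
        put(out, size, len, "(");
        render(n->lhs, out, size, len);
        put(out, size, len, " ");
        put(out, size, len, op_str(n->kind));
        put(out, size, len, " ");
        render(n->rhs, out, size, len);
        put(out, size, len, ")");
    }
}

/* Toy combine: replace a fully foldable computation with one move of a
   composite immediate. Returns false to keep the original instructions. */
static bool combine_to_move(const node *root, const char *dst,
                            char *out, size_t size) {
    size_t len = 0;
    if (!is_foldable(root))
        return false;
    put(out, size, &len, "move ");
    put(out, size, &len, dst);
    put(out, size, &len, ", ");
    render(root, out, size, &len);
    return true;
}
```

For `g_var & 0x1234` this produces `move retreg, (g_var & 0x1234)`, and for the nested case `move retreg, ((g_var | 0x3) & 0x1234)`; an AND whose operand is a register value is left alone.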

Regards,
  Jim