[LLVMdev] Data/Address registers

Ivan Llopard ivanllopard at gmail.com
Thu Mar 15 01:55:31 PDT 2012


On 14/03/2012 18:38, Jim Grosbach wrote:
> On Mar 14, 2012, at 7:07 AM, Ivan Llopard wrote:
>
>> On 07/03/2012 17:36, Jim Grosbach wrote:
>>> On Mar 7, 2012, at 6:23 AM, Ivan Llopard <ivanllopard at gmail.com> wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> Thanks for your response.
>>>>
>>>> On 06/03/2012 22:54, Jim Grosbach wrote:
>>>>> Hi Ivan,
>>>>> On Mar 3, 2012, at 4:48 AM, Ivan Llopard <ivanllopard at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm facing a problem in llvm while porting it to a new target and I
>>>>>> need some support.
>>>>>> We have two kinds of registers, one for general purposes (i.e. arithmetic,
>>>>>> comparisons, etc.) and the other for memory addressing.
>>>>> OK. Separate register classes should be able to handle this.
>>>>>
>>>>>> Cross copies are not allowed (no data path).
>>>>> You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that.
>>>>>
>>>> Actually, I can't copy them in any way, it's just impossible :-/.
>>> Do you have load/store instructions for each register class? Worst case you could do a push/pop pair on the stack. It's really, really important that there be a way, even a very expensive way, to do this.
>> I'm curious, why is it so important? We are trying hard to avoid this kind of situation.
> Sometimes the allocator, and other bits, will need to do a cross-class copy. It's assumed that a value can be copied between register classes for which that value type is legal. The coalescer will then go through and try hard to get rid of any copies that aren't actually needed.
>
> Specifically, as I understand it, there needs to be a way to copy between any two register classes for which the same ValueType is legal.

Ok, I understand, thanks.

>>>>>> We use clang 3.0 to produce assembler code.
>>>>>> Because both registers have the same size and type (i16), I don't know
>>>>>> what would be the best solution to distinguish them in order to match
>>>>>> the right instructions.
>>>>> The register classes should take care of this.
>>>> I tried, but IMO the matching rule should be context-dependent, i.e. an i16 addition should match a machine addition whose operands are either data registers or address registers depending on how the result is used. Even if I look at the index operands of loads/stores (in the DAG) to match the target's addressing modes, I can't assume that those operations are not also used for something other than basic arithmetic (such as comparisons, which are not supported for address regs). Is it still possible to get rid of this problem with register classes?
>>> It should be, yes. For a contrived example of a simple add-immediate instruction for each:
>>>
>>> def ADD_address_reg: myBaseInstrClass<(outs ADDR_REG:$dst), (ins ADDR_REG:$src, i32imm:$imm), [(set ADDR_REG:$dst, (add ADDR_REG:$src, i32imm:$imm))]>;
>>> def ADD_general_reg: myBaseInstrClass<(outs GPR:$dst), (ins GPR:$src, i32imm:$imm), [(set GPR:$dst, (add GPR:$src, i32imm:$imm))]>;
>>>
>>> Likewise, other operations that can target either register class should have a variant for each. ISel will choose the appropriate one based on the rest of the operands.
>> Thanks for your advice Jim. I did what you said, but it didn't work and I have no clue what is going wrong. I can't figure out where register classes are matched in order to pick the right instructions. I couldn't find any trace of register classes in the instruction selection process.
>> I have these patterns defined so far:
>>
>> def AADDMri {    // Instruction MephInstr AGInstr
>>   dag OutOperandList = (outs AGRegs:$dst);
>>   dag InOperandList = (ins AGRegs:$a, i16imm:$b);
>>   list<dag>  Pattern = [(set AGRegs:$dst, (add AGRegs:$a, imm:$b))];
>> }
>>
>> def DADDri {    // Pattern Pat
>>   dag PatternToMatch = (add LSubRegs:$a, imm:$b);
>>   list<dag>  ResultInstrs = [(asrsat (asextr (sextr iRSubRegs:$a), (XLoadImm imm:$b)), (i16 0))];
>> }
>>
>> where asrsat has LSubRegs as its output operand. Both patterns have the same complexity and they are located at different scopes. For these two patterns, tblgen is producing the following isel opcodes:
>>
>> /*3244*/      /*Scope*/ 20, /*->3265*/
>> /*3245*/        OPC_RecordChild1, // #1 = $b
>> /*3246*/        OPC_MoveChild, 1,
>> /*3248*/        OPC_CheckOpcode, TARGET_VAL(ISD::Constant),
>> /*3251*/        OPC_MoveParent,
>> /*3252*/        OPC_CheckType, MVT::i16,
>> /*3254*/        OPC_EmitConvertToTarget, 1,
>> /*3256*/        OPC_MorphNodeTo, TARGET_VAL(ME::AADDMri), 0,
>>
>> and in the same logic chain of pattern checking, DADDri comes right after AADDMri (with Scope changes in the middle)
>>
>> /*3285*/      OPC_RecordChild0, // #0 = $a
>> /*3286*/      OPC_RecordChild1, // #1 = $b
>> /*3287*/      OPC_Scope, 42, /*->3331*/ // 2 children in Scope
>> /*3289*/        OPC_MoveChild, 1,
>> /*3291*/        OPC_CheckOpcode, TARGET_VAL(ISD::Constant),
>> /*3294*/        OPC_MoveParent,
>> /*3295*/        OPC_CheckType, MVT::i16,
>> /*3297*/        OPC_EmitNode, TARGET_VAL(ME::sextr), 0,
>>                     1/*#VTs*/, MVT::i64, 1/*#Ops*/, 0,  // Results = #2
>> /*3305*/        OPC_EmitConvertToTarget, 1,
> Huh. I would have expected OPC_EmitRegister here. Probably something different in your target causing this. I don't anticipate that it'll cause a problem, though, as there's the CheckType bits to keep things sane.
>
>> /*3307*/        OPC_EmitNodeXForm, 0, 3, // XLoadImm
>> ...
>>
>> AADDMri supersedes DADDri (the same checks are performed). It's worth noting that the result is used by another instruction which has LSubRegs as its source operand, and copy instructions were added by the instruction selector to meet this requirement.
> Hmm.. OK. So it's correctly understanding the class requirements of the instruction, just not doing what we want in order to meet them. I'm suspecting TableGen isn't as ambitious as one would hope in this regard. That is, defining separate instructions w/ the different register classes is a necessary, but not sufficient, condition to getting where you want to go.
>
> ISel is being driven by the ValueType, which is in turn mapped to a register class to use for that value type by default. When instructions need a different register class, regalloc will insert copies to satisfy the constraint. That is, isel is driving the register class selection. I'd thought there was at least some information flowing the other direction, but it looks like I was mistaken.
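
The behaviour described above can be sketched with a small toy model. This is plain Python, not LLVM code: the tables and the `select` helper are hypothetical, the opcode names are borrowed from the contrived example earlier in the thread, and for simplicity the copy insertion is folded into the same loop rather than being a separate step. Candidates are matched on opcode and value type only, so the first entry always wins and a copy appears wherever the chosen class does not fit.

```python
# Toy model only; every name below is hypothetical, none of this is LLVM API.

# value type -> register class used by default for that type
DEFAULT_CLASS = {"i16": "GPR"}

# (generic opcode, value type) -> candidates in table order:
# (machine opcode, result class, classes required for register operands)
PATTERNS = {
    ("add", "i16"): [("ADD_general_reg", "GPR", ["GPR"]),
                     ("ADD_address_reg", "ADDR_REG", ["ADDR_REG"])],
    ("load", "i16"): [("LOAD", "GPR", ["ADDR_REG"])],
}


def select(nodes):
    """nodes: list of (result, opcode, vtype, register operands), defs first.

    Values without a defining node are treated as live-ins that sit in the
    default class for the value type.
    """
    reg_class = {}
    out = []
    for result, opcode, vtype, operands in nodes:
        # First candidate wins: register classes are never consulted here.
        mop, res_cls, op_classes = PATTERNS[(opcode, vtype)][0]
        new_ops = []
        for name, want in zip(operands, op_classes):
            have = reg_class.get(name, DEFAULT_CLASS[vtype])
            if have != want:
                # Class mismatch: satisfy the constraint with a copy.
                tmp = f"{name}.{want}"
                out.append(("COPY", tmp, [name]))
                name = tmp
            new_ops.append(name)
        out.append((mop, result, new_ops))
        reg_class[result] = res_cls
    return out


# a = add b, <imm>; v = load a   (immediates are left out of the model)
example = [("a", "add", "i16", ["b"]), ("v", "load", "i16", ["a"])]
```

With the `example` input, the add is selected in its GPR form and a COPY into ADDR_REG is placed in front of the load, even though an ADDR_REG add is present in the table.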

I wonder whether isel could have additional opcode(s) that also check
register class consistency.

For example:

a = op1 b, c
d = op2 a, f

where op2 is meant to match mop2. It would be nice to have op1 match two
different machine instructions depending on a's register class (mop1a or
mop1b). Because llvm has a bottom-up ISel, when it reaches "a", "d" will
already be selected and "a" will have a well-defined regclass (let's say
A). It would be interesting to be able to choose between mop1a (Aa) and
mop1b (Ab) depending on register class cardinality (taking the lower
one) while satisfying A<=Aa and A<=Ab (inclusion relationship). It would
make the isel more accurate and reduce regclass cross-copies. If they
are still needed, the regalloc will do the job either way. What do you
think?
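
A minimal sketch of that selection rule, as a toy Python model rather than anything that exists in the isel today. Register classes are modelled as sets of register names; the class names AGRegs and LSubRegs are the ones from my target, but their contents, the helper `pick_by_class` and the candidate list are made up for illustration.

```python
# Toy model of the proposed check; all register names are hypothetical.
AGRegs = frozenset({"A0", "A1"})
LSubRegs = frozenset({"R0", "R1", "R2", "R3"})


def pick_by_class(required, candidates):
    """Pick a machine opcode for an operation whose user is already selected.

    required:   the class A that the user imposes on the result.
    candidates: list of (machine opcode, result class) in table order.
    Returns the opcode whose result class includes A and has the fewest
    registers, or None when no candidate fits (copies are then still needed).
    """
    fitting = [(len(cls), index, mop)
               for index, (mop, cls) in enumerate(candidates)
               if required <= cls]          # inclusion: A <= candidate class
    if not fitting:
        return None
    return min(fitting)[2]                  # lowest cardinality, then table order


candidates = [("mop1a", LSubRegs), ("mop1b", AGRegs)]
```

Here `pick_by_class(AGRegs, candidates)` picks mop1b and `pick_by_class(LSubRegs, candidates)` picks mop1a; a requirement that spans both classes fits neither, and the copies stay.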

>> I would really like to know why this is happening. It's as if tblgen is not taking the register class assignments of both instructions into account :-/.
> Well, the differences are taken into account, because we're seeing the copy inserted to handle them. There's just insufficient effort made to avoid the copy entirely.
>
> Now, that's all fine and good, but doesn't directly help you solve your original problem.
>
> The more I think about it, this is effectively a heuristically based problem, as there's no 100% "right" answer. Consider the following contrived example:
> define i16 @foo(i16* %ptr, i16 %a) nounwind ssp {
>    %1 = getelementptr inbounds i16* %ptr, i16 %a
>    %2 = ptrtoint i16* %1 to i16
>    store i16 %2, i16* %ptr, align 4
>    ret i16 %2
> }
>
> The same intermediate value (%1) is being used here both as a generic i16 and as a pointer value. Which register class should be used to compute the value? There will be a cross-class copy instruction either way.
>
> I think you may be stuck having smart custom lowering for all the operations you want to work on whichever register class isn't the default for i16. That and/or have a custom target pass that runs before register allocation to go through and clean things up, changing which instructions are used based on context. Personally, I'd probably go with the latter. Get your target basically working using the (expensive) copies first. Then start building up smarts to make the generated code efficient, not just correct. For example, a simple pattern to look for is to identify loads or stores where the address is coming from a copy of a value computed by a chain of arithmetic instructions and the values defined by those instructions have no other uses outside just computing the address. You can trivially swap those instructions (and the register classes of the operands) with the versions that operate on address registers and get rid of the copy. Honestly, that alone will likely be good enough for most cases.
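
The cleanup pattern above can be modelled roughly as follows. This is a toy Python sketch over a flat instruction list, not a MachineFunctionPass: the tuple encoding, the `fold_address_copies` helper and the DLOADimm/ALOADimm opcodes are assumptions made for illustration. It is also deliberately conservative: it only rewrites a chain that bottoms out in an instruction with no register inputs, and gives up as soon as any value in the chain has another use.

```python
from collections import Counter

# Hypothetical opcode names: data-register opcode -> address-register twin.
ADDR_TWIN = {"DADDri": "AADDMri", "DLOADimm": "ALOADimm"}
MEM_OPS = {"LOAD", "STORE"}  # in this model the address is always srcs[0]


def reg_srcs(srcs):
    """Register operands are strings; immediates are ints."""
    return [s for s in srcs if isinstance(s, str)]


def fold_address_copies(instrs):
    """instrs: list of (opcode, dst, srcs); each name is defined at most once."""
    defs = {dst: i for i, (_, dst, _) in enumerate(instrs) if dst is not None}
    uses = Counter(r for _, _, srcs in instrs for r in reg_srcs(srcs))
    out = list(instrs)
    dead = set()
    for i, (op, dst, srcs) in enumerate(instrs):
        if op not in MEM_OPS:
            continue
        addr = srcs[0]
        ci = defs.get(addr)
        if ci is None or instrs[ci][0] != "COPY" or uses[addr] != 1:
            continue
        # Walk the arithmetic feeding the copy; every value must have an
        # address-register twin and exactly one use.
        chain, work, ok = [], reg_srcs(instrs[ci][2]), True
        while work and ok:
            r = work.pop()
            di = defs.get(r)
            if di is None or instrs[di][0] not in ADDR_TWIN or uses[r] != 1:
                ok = False
            else:
                chain.append(di)
                work.extend(reg_srcs(instrs[di][2]))
        if not ok or not chain:
            continue
        for di in chain:  # swap each instruction for its address-register twin
            o, d, s = instrs[di]
            out[di] = (ADDR_TWIN[o], d, s)
        dead.add(ci)      # the cross-class copy is no longer needed
        out[i] = (op, dst, [instrs[ci][2][0]] + list(srcs[1:]))
    return [ins for k, ins in enumerate(out) if k not in dead]


prog = [
    ("DLOADimm", "t0", [100]),
    ("DADDri", "t1", ["t0", 4]),
    ("COPY", "a1", ["t1"]),
    ("LOAD", "v", ["a1"]),
]
```

On `prog`, both arithmetic instructions are swapped for their address-register forms and the COPY disappears. If `t1` were also fed to a comparison, the list would be returned unchanged and the copy would stay.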

Thanks for the idea!

Regards,
Ivan

> Regards,
>    Jim
>
>
>> Ivan
>>
>>>> I could add a pass before ISel that annotates the code, identifying the registers which are only used for addressing (by doing a simple data-flow analysis). Could that help the instruction selector later?
>>>> I could not find how to get metadata from the DAG to drive matching rules or lowering phases. Is that possible? How is metadata transferred to the DAG, and where should I look for it?
>>>>
>>> Metadata should not be necessary for this. In general, metadata should never be used for anything that's required information, only for optional information. I.e., if it's stripped out of the IR, the backend should still generate correct code.
>>>
>>> -Jim
>>>
>>>> Ivan
>>>>
>>>>>> Moreover, the standard pointer arithmetic is not
>>>>>> enough for us (we need to support modulo operations also).
>>>>>> I thought that I could manually match every arithmetic operation while
>>>>>> matching the addressing mode but it doesn't work because intermediate
>>>>>> results are sometimes reused for other purposes (e.g. comparisons).
>>>>> I suggest getting things working correctly first and then coming back to things like this as an optimization.
>>>>>
>>>>>> Do I need to add another type to clang/llvm ?
>>>>>>
>>>>> Unlikely.
>>>>>
>>>>> Regards,
>>>>>   Jim
>>>>>
>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Ivan
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu           http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




