[llvm-dev] [GlobalISel][MIPS] Legality and instruction combining

Fri Sep 14 09:31:27 PDT 2018

> On 14 Sep 2018, at 06:50, pavram <Petar.Avramovic at rt-rk.com> wrote:
> 
> 
> 
> 
> Hi Daniel, 
> 
> On 13.09.2018. 19:32, Daniel Sanders wrote: 
>> Could you clarify what you mean here? The new legalizer info can define this with: 
>>     getActionDefinitionsBuilder(G_SELECT).clampScalar(1, s32, s32) 
>> so I'm guessing you mean that code to mutate the G_SELECT is currently missing 
> Yes, LegalizerHelper::widenScalar widens only TypeIdx==0, it doesn't do that for TypeIdx==1. Is it intentionally implemented this way? 

No, we just haven't needed to implement the TypeIdx==1 case so far. Up until recently we didn't have a means to test things that weren't in use by a target. We decided it was better for the compiler to fail than to submit untested/untestable code so there's quite a few gaps like this.

>>> b) Is the plan to sometimes let s1 as legal type and ignore it later? 
>> I'm not sure what you mean here 
>> 
> For example lets look at AArch64 G_SELECT: 
>   getActionDefinitionsBuilder(G_SELECT) 
>       .legalFor({{s32, s1}, {s64, s1}, {p0, s1}}) 
>       .clampScalar(0, s32, s64) 
>       .widenScalarToNextPow2(0); 
> 
> In this case LLT of operand 1 (s1) in G_SELECT has size 1, and corresponding register class in selected instruction has size 32 (that is $src1 in AArch64::ANDSWri, it has GPR32 regsiter class). 
> For that reason s1 is practically ignored and silently converted to s32.

Oh, I see what you mean. This is also a viable option but it requires that bits 1-31 don't affect the results of legal operations no matter what value they are. This makes little difference to something like G_ADD, but things like G_LSHR (that one isn't really an issue for s1, but applies for s2-s30) have to ensure they insert leading zeros even if the bits in the container are 1's. Of course, you don't have to declare the difficult operations to be legal and can let them widenScalar up to a type that's easy to deal with.

For Mips, this approach is probably ok so long as G_TRUNC defines bits 1-31 and any other s1-producing operations that are legal maintain those bits to keep them at the value they would be if you had only just done the G_TRUNC. I expect you only want G_TRUNC/G_ANYEXT/G_SEXT/G_ZEXT, G_ICMP/G_FCMP, and G_OR/G_XOR/G_AND in which case it's easy to achieve this.

> We could get similar result if we have: 
> 
>   getActionDefinitionsBuilder(G_SELECT) 
>       .legalFor({{s32, s32}, {s64, s32}, {p0, s32}}) 
>       .clampScalar(0, s32, s64) 
>       .clampScalar(1, s32, s32) 
>       .widenScalarToNextPow2(0); 
> 
> And in this case sizes of LLTs and register classes would be the same. 
> Implementation for G_ICMP on AArch64 is very similar to second described option for G_SELECT. 
> Is there a reason for different implementation of G_SELECT and G_ICMP on AArch64? Are there some general ruses to determine which of the two given options is better in which case?

At the moment, it largely comes down to preference. I don't have any real evidence to back this up but I expect the latter will be the better optimized in the long run since maintaining the excess bits requires instructions and these are explicit in the GMIR. This means we can optimize it and decline to do it whenever it doesn't matter. For example:
	%2(s2) = G_ADD %0(s2), %1(s2)
	%4(s2) = G_ADD %2(s2), %3(s2)
	%5(s2) = G_LSHR %4(s2), i2 1
has to either preserve bits 2-31 between each operation just in case one that cares about it comes along, or define bits 2-31 just before an operation that cares about it even if they're already correct. This costs six and instructions. Whereas:
	%3(s32) = G_AND %0(s2), i32 3
	%4(s32) = G_AND %1(s2), i32 3
	%5(s32) = G_AND %2(s2), i32 3
	%6(s32) = G_ADD %3(s32), %4(s32)
	%7(s32) = G_AND %6(s32), i32 3
	%8(s32) = G_ADD %5(s32), %7(s32)
	%9(s32) = G_AND %8(s32), i32 3
	%10(s32) = G_LSHR %9(s32), i2 1
	%11(s32) = G_AND %9(s32), i32 3
can optimize to:
	%6(s32) = G_ADD %0(s32), %1(s32)
	%8(s32) = G_ADD %2(s32), %6(s32)
	%10(s32) = G_LSHR %8(s32), i2 1
	%11(s32) = G_AND %9(s32), i32 1
which costs only one and instruction.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180914/27912afc/attachment.html>