[PATCH] D62423: [globalisel][legalizer] Attempt to write down the minimal legalization rules

Wed Jun 19 06:13:14 PDT 2019

Petar.Avramovic added inline comments.

================
Comment at: llvm/docs/GlobalISel.rst:422
+
+Minimum Rules For Scalars
+"""""""""""""""""""""""""
----------------
dsanders wrote:
> Petar.Avramovic wrote:
> > dsanders wrote:
> > > Petar.Avramovic wrote:
> > > > Here we are assuming that set of available integer scalars is the same as set of all available floating point scalars. This does not have to be the case. 
> > > > e.g. "-mtriple=mipsel-linux-gnu" "-mtriple=i386-linux-gnu -mattr=+sse2"(http://lists.llvm.org/pipermail/llvm-dev/2017-July/114930.html) have i32, float/double available but i64 is not available.
> > > > G_FPTRUNC is legal for {s32, s64} but G_TRUNC is not (for -mtriple=mipsel-linux-gnu I would say that it is unsupported since there are no register classes that hold less then 32 bits, and would expect artifact combiner to deal with G_TRUNC).
> > > > 
> > > > Here we are assuming that set of available integer scalars is the same as set of all available floating point scalars. This does not have to be the case. 
> > > > e.g. "-mtriple=mipsel-linux-gnu" "-mtriple=i386-linux-gnu -mattr=+sse2"(http://lists.llvm.org/pipermail/llvm-dev/2017-July/114930.html) have i32,
> > > > float/double available but i64 is not available.
> > > 
> > > No, the interpretation of the bits as being integer or floating point is irrelevant to this. An s64 containing a floating point value is still required to be truncatable with G_TRUNC. For example, this is necessary to feed it into an s32 store instruction. Similarly, an s32 containing half of a floating point value is still required to be any-extendable with G_ANYEXT in order to compose it with the other half to construct a full value.
> > > 
> > > > G_FPTRUNC is legal for {s32, s64} but G_TRUNC is not (for -mtriple=mipsel-linux-gnu I would say that it is unsupported since there are no register > classes that hold less then 32 bits, and would expect artifact combiner to deal with G_TRUNC).
> > > 
> > > G_TRUNC must be legal from s64 to s32 for a target to be guaranteed a means to store the s64 to memory (MIPS would implement it using mfc1). You're unlikely to actually use that path since MIPS you also have legal operations that make the guaranteed path redundant (e.g. sdc1)
> > OK, let's add rules defined here to ones that already exist.
> > MIPS will then often have to select sequences of move instructions because Legalizer did not perform desired combine/narrow scalar because type information was not available.
> > 
> > Where could we perform desired legalization/combine ? Assume that we are able to find out what needs to be combined/legalized with narrow scalar.
> > Maybe target custom postLegalizer Legalizer pass or target custom postRegBankSelect Legalizer pass. It would narrow scalar i64 G_LOAD but would not touch f64 G_LOAD.
> > Or to let targets to put together a InstList/ArtifactList during RegBankSelect and forward these lists to legalize (function that would hold  main do/while loop  from Legalizer.cpp) at the end of regBankSelect, this way we don't have to process all instructions just ones that need to be re-legalized. Here we need different LegalizerInfo object than one used in Legalizer pass.
> You seem to be folding the job of RegBankSelect (or a custom pass) into the legalizer.
> 
> Using your G_LOAD example, the availability of ldc1 makes s64 G_LOAD legal because it's possible to handle it any s64 G_LOAD that occurs. It might not be the most efficient thing to do in a given context but efficiency is not the legalizers job. The legalizer is merely tasked with eliminating things that are (or might be) impossible to handle in subsequent passes. As a side note, the artifact combiner doesn't really belong in the legalizer but exists as a concession to prevent the legalizer blowing up the size of the GMIR too much. It's intentionally limited to combining cases that the legalizer often emits around widenScalar/narrowScalar and we don't want the scope of the artifact combiner to increase because it shouldn't really be there at all.
> 
> Binding the s64 G_LOAD to a register bank in the most efficient way is the job of the RegBankAllocator. Note that this isn't purely a matter of assigning a register bank to a vreg, it can also change the instructions.
> You have two main implementation choices in RegBankAllocator and the choice will depend on the cost of the alternatives. The simplest option is to bind it to the only register bank capable of doing the s64 G_LOAD (FPRB on MIPS). This will encourage the RegBankAllocator to bind the data dependent operations to FPRB and emit a copy when it has to use GPRB (or doing so would require fewer copies). This isn't the ideal option though as it forces all 64-bit loads through the FPU and the mfhc1/mfc1/ldc1 costs more than lw/lw.
> The better option is to give the RegBankAllocator a choice on how to bind s64 G_LOAD so that if it chooses GPRB it costs two loads plus any cross bank move cost caused by data dependencies and if it chooses FPRB it costs one load plus any cross bank move cost. This gives it the freedom to make the right decision for the context:
> * If you're just copying data from one memory location to another then (assuming s64 G_STORE is handled similarly) then it will bind to FPRB and use ldc1/sdc1 to copy the data which beats lw/sw/lw/sw.
> * If you're doing floating point operations on the data then it will generally bind to FPRB and use something like ldc1/fadd.d which beats lw/lw/mthc1/mtc1/fadd.d
> * If you're doing bit twiddling on floating point data (e.g. fneg vs xor 0x800000000000000, or fabs vs and 0x7fffffffffffffff) then it will have a choice. It will most likely pick FPRB and ldc1/fneg.d rather than lw/lw/xor 0x80000000 if there's other data-dependent floating point operations nearby but the costs are fairly close so the decision can go with GPRB in the right context
> * If you're doing integer operations on the data then it will generally bind to GPRB and use something like lw/lw/setlt/add/add/add (for a narrowScalar'd s64 G_ADD).
> 
> In summary, trying to legalize i64 G_LOAD and f64 G_LOAD differently is bad because it's not the type of the data that determines what the most efficient instructions are and whichever decision you make you're locking the code generator out of selecting the best code. It's really the operations you perform on the data that are important as in some cases, i64 is best handled with floating point instructions and in others f64 is best handled with integer instructions.
Thanks for the explanation.
Following up comment above, let's assume that we figured out that most efficient solution is to modify instruction (e.g. turn s64 G_LOAD into two s32 G_LOADs). What I meant by legalize is that changing instruction in e.g. `applyMappingImpl' is equivalent to applying some of legalization / artifact combine rules. This way we duplicate some of the code from Legalizer and legalize/combine some instructions manually. If e.g. G_LOAD needs to be turned into lw/lw/G_MERGE there is a G_UNMERGE somewhere waiting, dealing with them is equivalent to narrowScalar for G_LOAD + MERGE/UNMERGE combine.
It looks nicer to collect all instructions that have to be "relegalized" and forward them to legalizer with RegBankLegalizerInfo (this one has narrowScalar for G_LOAD from s64 to s32, the one from Legalizer has G_LOAD legal for both s64 and s32 because of Minimum Rule Set). Is this job for RegBankSelect or Target custom Pass (Legalizer2) ?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62423/new/

https://reviews.llvm.org/D62423