[PATCH] D60441: [X86] Make _Int instructions the preferred instruction for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX.

Tue Apr 9 09:38:50 PDT 2019

craig.topper added a comment.

In D60441#1459597 <https://reviews.llvm.org/D60441#1459597>, @lebedev.ri wrote:

> > Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs.
>
> Is there any documentation, a comment, a mail thread somewhere that explains why this is the way it is?
> I.e. why do those `_Int` variants need to exist? (Are they temporary, or here to stay?)
>
> The mca(?) regression is troubling.

The _Int instructions use the VR128 register class, while the non-_Int versions use the smaller FR32/FR64 register classes. For unary operations like cvtss2sd, the X86 hardware definition reads two source registers: one supplies the input to the operation being performed, and the other only defines the final upper bits of the result. For the _Int versions we model this with two inputs. For the non-_Int versions we model only one of the inputs for the legacy SSE encoding. For the VEX encoding we do model both inputs and set one to IMPLICIT_DEF. We have to do this for VEX since the operands are "tied", so the register allocator must assign a register for the IMPLICIT_DEF.

For the SSE instructions we have a late pass that knows these instructions have a "partialRegUpdate" and will insert a dependency-breaking XOR based on how long it has been since that register was last written. For VEX we try to reassign the operand to the register that was written longest ago, and if that doesn't work we use an XOR to break the dependency.
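To make the two-source behavior concrete, here is a minimal C++ sketch of what the hardware computes for cvtss2sd in its legacy SSE and VEX forms. The `Xmm` byte-array type and the function names are hypothetical; this models the ISA semantics only, not LLVM's MachineInstr operands, and it ignores bits above 127 (which the VEX form zeroes).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 128-bit XMM register as raw bytes (byte 0 = bit 0).
using Xmm = std::array<std::uint8_t, 16>;

// Legacy SSE "cvtss2sd xmm1, xmm2": bits 63:0 of xmm1 receive the converted
// value; bits 127:64 keep whatever xmm1 held before. xmm1 is therefore both a
// source (the pass-through input) and the destination.
Xmm cvtss2sd_sse(const Xmm &dst_in, const Xmm &src) {
  float f;
  std::memcpy(&f, src.data(), sizeof f);  // operation input: low 32 bits of src
  double d = static_cast<double>(f);
  Xmm out = dst_in;                       // upper bits pass through unchanged
  std::memcpy(out.data(), &d, sizeof d);  // low 64 bits: the result
  return out;
}

// VEX "vcvtss2sd xmm1, xmm2, xmm3": bits 127:64 come from xmm2 and the
// converted value from xmm3; xmm1 is write-only.
Xmm vcvtss2sd_vex(const Xmm &upper_src, const Xmm &src) {
  return cvtss2sd_sse(upper_src, src);
}

// Non-_Int (FR64) view: only the scalar in bits 63:0 is ever observed, so the
// pass-through input does not matter to the value, only to the dependency.
double low_f64(const Xmm &x) {
  double d;
  std::memcpy(&d, x.data(), sizeof d);
  return d;
}
```

The pass-through operand never affects the scalar that `low_f64` reads, which is why the non-_Int form can omit it from the value computation; the hardware still waits for that register, which is the false dependency the late pass tries to break.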

For binary instructions like addss, the lower bits are calculated by adding the lower bits of both sources. The upper bits of the output are defined by the upper bits of the first source register. For the _Int versions we use VR128 for both sources and the destination. For the non-_Int versions we use FR32/FR64 for both sources and the destination.
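A similar sketch for addss shows that the two forms describe the same computation and differ only in how much of the register they expose. `V4f`, `addss_int`, and `addss_scalar` are hypothetical names modeling the ISA behavior, not LLVM code.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of an XMM register as four packed floats (lane 0 = bits 31:0).
using V4f = std::array<float, 4>;

// _Int-style view (VR128 in and out): lane 0 is a[0] + b[0]; lanes 1-3 are
// copied from the first source a.
V4f addss_int(const V4f &a, const V4f &b) {
  V4f out = a;
  out[0] = a[0] + b[0];
  return out;
}

// Non-_Int view (FR32 in and out): only the scalar is modeled; the upper
// lanes are treated as don't-care and are never observed.
float addss_scalar(float a, float b) { return a + b; }
```

Both functions agree on lane 0; the _Int form additionally promises what lands in lanes 1-3, which intrinsic users can observe and plain scalar C code cannot.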

If we were to merge them, it would require adding a bunch of COPY_TO_REGCLASS conversions to the isel patterns. I fear it would have weird effects on the coalescer and on how the register allocator calculates spill slot sizes. Pure scalar float code coming from C that doesn't use intrinsics would be pessimized: after isel, each instruction would produce a VR128, which would be copied to FR32 and then copied back to VR128 for the next instruction. The register coalescer would merge those copies so that only VR128 would exist. Any spills would then use a 128-bit spill slot even though we don't care about the upper bits.

I do think we should look into fixing the unary non-_Int instructions to list their pass-through input and assign it to IMPLICIT_DEF, as we do for VEX.

The llvm-mca change does reflect the data the scheduler would see when the user used intrinsics, so it's not exactly a "regression". It's showing an existing difference in modeling between the _Int and non-_Int instructions, even though they use the same encoding and the same hardware.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D60441