[llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Mon Nov 28 06:38:52 PST 2016

For ISel, we can write .ll -> .mir tests that check the EVEX flavor is correctly selected.
For example, ‘VADDPDZ256rm’ and ‘VADDPDYrm’ are two instructions that can be differentiated in machine IR, but are both emitted as ‘VADDPD’ in machine assembly.
I have not put this suggestion to the test, but I believe it should work.

From: Craig Topper [mailto:craig.topper at gmail.com]
Sent: Thursday, November 24, 2016 16:31
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
Cc: Haber, Gadi <gadi.haber at intel.com>; llvm-dev at lists.llvm.org; Rackover, Zvi <zvi.rackover at intel.com>
Subject: Re: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

test/MC/X86 goes through the AsmParser. That's a different path than isel. I'm worried about not being able to see cases where isel is missing a pattern and causes us to still select a VEX instruction. I've fixed many such cases recently and I'm sure there are still more. Since simple tests don't use the larger register set, the encoding is the only way we can tell what isel is doing.

~Craig

On Thu, Nov 24, 2016 at 12:20 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding.

I don't think that keeping the tests compatible justifies an additional “llc” flag. We already check encodings in the test/MC/X86 directory.
Is there any option to report this from llc in non-debug mode? It should be an option that controls the internals of the llc process.

- Elena

From: Haber, Gadi
Sent: Thursday, November 24, 2016 09:28
To: Craig Topper <craig.topper at gmail.com>; Hal Finkel <hfinkel at anl.gov>
Cc: llvm-dev at lists.llvm.org; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
Subject: RE: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Thanx. This makes sense.
Note that there are many tests, mostly under test/CodeGen/X86, that are affected by this optimization and I had to modify them as they include a check of the generated encoding.
If we add such a disabling opt flag, should we now keep two sets of tests? One for the optimization on and one when it is disabled?

Thanx!
Gadi.

From: Craig Topper [mailto:craig.topper at gmail.com]
Sent: Wednesday, November 23, 2016 18:13
To: Haber, Gadi <gadi.haber at intel.com>; Hal Finkel <hfinkel at anl.gov>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding.

On Wed, Nov 23, 2016 at 5:01 AM Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

________________________________
From: "Gadi Haber via llvm-dev" <llvm-dev at lists.llvm.org>
To: llvm-dev at lists.llvm.org
Sent: Wednesday, November 23, 2016 5:50:42 AM
Subject: [llvm-dev] RFC: code size reduction in X86 by replacing EVEX with VEX encoding

Hi All.

This is an RFC for a proposed target-specific X86 optimization that reduces code size by shortening the encoding of AVX-512 instructions when possible.

When the AVX512F instruction set was introduced in X86, it added 32 registers of 512 bits each, ZMM0-ZMM31, as well as 16 additional XMM registers, XMM16-XMM31, and 16 additional YMM registers, YMM16-YMM31.
To encode the new registers 16-31 and the additional instructions, a new encoding prefix called EVEX, which extends the existing VEX encoding, was introduced, as shown below:

The EVEX encoding format:
            EVEX Opcode ModR/M [SIB] [Disp32] / [Disp8*N] [Immediate]
# of bytes: 4    1      1      1      4       / 1         1

The existing VEX encoding format:
            [VEX]   OPCODE ModR/M [SIB] [DISP]   [IMM]
# of bytes: 0,2,3   1      1      0,1   0,1,2,4  0,1

Note that the EVEX prefix requires 4 bytes, whereas the VEX prefix takes at most 3 bytes.
Consequently, for the SKX architecture, many instructions that use only the lower registers XMM0-XMM15 or YMM0-YMM15 can be encoded in either the EVEX or the VEX format. In such cases, using the VEX encoding reduces code size by ~2 bytes per instruction, even when the code is compiled with the AVX512F/AVX512VL features enabled.

For example, “vmovss  %xmm0, 32(%rsp,%rax,4)” has the following 2 possible encodings:

EVEX encoding (8 bytes long):
            62 f1 7e 08 11 44 84 08         vmovss  %xmm0, 32(%rsp,%rax,4)

VEX encoding (6 bytes long):
            c5 fa 11 44 84 20               vmovss  %xmm0, 32(%rsp,%rax,4)
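
The size difference in the example above can be checked with a few lines of Python. This is only a sketch: the byte strings are copied from the two encodings shown, while the helper names and the disp8*N scale of 4 (a 4-byte scalar access) are assumptions made for illustration, and the prefix detection assumes 64-bit mode.

```python
# Byte sequences copied verbatim from the vmovss example above.
EVEX_BYTES = bytes.fromhex("62 f1 7e 08 11 44 84 08")
VEX_BYTES = bytes.fromhex("c5 fa 11 44 84 20")

# Assumed disp8*N scale for this instruction (4-byte scalar access).
DISP8_SCALE = 4


def prefix_length(encoding: bytes) -> int:
    """Return the VEX/EVEX prefix length, identified by the first byte."""
    first = encoding[0]
    if first == 0x62:  # EVEX escape byte
        return 4
    if first == 0xC4:  # 3-byte VEX
        return 3
    if first == 0xC5:  # 2-byte VEX
        return 2
    raise ValueError("encoding does not start with a VEX or EVEX prefix")


def saved_bytes(evex: bytes, vex: bytes) -> int:
    """Bytes saved by emitting the VEX form instead of the EVEX form."""
    return len(evex) - len(vex)


# EVEX stores the displacement compressed (disp8*N); VEX stores it as is.
evex_disp = EVEX_BYTES[-1] * DISP8_SCALE
vex_disp = VEX_BYTES[-1]

print("EVEX length:", len(EVEX_BYTES), "prefix:", prefix_length(EVEX_BYTES))
print("VEX length: ", len(VEX_BYTES), "prefix:", prefix_length(VEX_BYTES))
print("saved bytes:", saved_bytes(EVEX_BYTES, VEX_BYTES))
print("displacements:", evex_disp, vex_disp)
```

Both forms address the same 32-byte displacement; the whole difference in length comes from the prefix.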

See the Bugzilla bugs reported about this proposed optimization:
https://llvm.org/bugs/show_bug.cgi?id=23376
https://llvm.org/bugs/show_bug.cgi?id=29162

The proposed implementation adds a table of all EVEX opcodes that can be encoded via VEX, in a new header file placed under lib/Target/X86.
A new pass is to be added at the pre-emit stage.
It might be better to have TableGen generate the mapping table for you instead of manually making a table yourself. TableGen has a feature that is specifically designed to make mapping tables like this. For examples, grep for InstrMapping in:

lib/Target/Hexagon/Hexagon.td
lib/Target/Mips/MipsDSPInstrFormats.td
lib/Target/Mips/MipsInstrFormats.td
lib/Target/Mips/Mips32r6InstrFormats.td
lib/Target/PowerPC/PPC.td
lib/Target/AMDGPU/SIInstrInfo.td
lib/Target/AMDGPU/R600Instructions.td
lib/Target/SystemZ/SystemZInstrFormats.td
lib/Target/Lanai/LanaiInstrInfo.td

I've used this feature a few times in the PowerPC backend, and it's quite convenient.

 -Hal

No need for special Opt flags, as it is always better to use the reduced VEX encoding when possible.
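
The table-plus-pass idea can be modelled roughly as follows. This is a sketch under stated assumptions, not the proposed LLVM code: the `Instr` class, the `compress` function and the table contents are hypothetical, registers are represented by their hardware numbers, and only the opcode pair VADDPDZ256rm/VADDPDYrm is taken from this thread. It also assumes that masked, broadcast and 512-bit variants are simply absent from the table, so they are never rewritten.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical excerpt of the EVEX -> VEX opcode table (illustrative entries).
EVEX_TO_VEX = {
    "VADDPDZ128rr": "VADDPDrr",
    "VADDPDZ256rr": "VADDPDYrr",
    "VADDPDZ256rm": "VADDPDYrm",
}


@dataclass(frozen=True)
class Instr:
    """Toy stand-in for a machine instruction (hypothetical)."""
    opcode: str
    regs: Tuple[int, ...]  # hardware numbers of the XMM/YMM operands


def compress(instr: Instr) -> Instr:
    """Return the VEX form of instr when one exists and is legal."""
    vex_opcode = EVEX_TO_VEX.get(instr.opcode)
    if vex_opcode is None:
        return instr  # no VEX equivalent in the table
    if any(reg >= 16 for reg in instr.regs):
        return instr  # registers 16-31 can only be encoded with EVEX
    return Instr(vex_opcode, instr.regs)


def run_pass(instrs):
    """Model of the pre-emit pass: rewrite every eligible instruction."""
    return [compress(instr) for instr in instrs]


if __name__ == "__main__":
    block = [
        Instr("VADDPDZ256rm", (0, 1)),   # eligible: low registers only
        Instr("VADDPDZ256rr", (0, 17)),  # kept: uses YMM17
    ]
    for before, after in zip(block, run_pass(block)):
        print(before.opcode, "->", after.opcode)
```

An instruction is rewritten only when both conditions hold: its opcode has a table entry and every register operand is below 16.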

Thank you for any comments or questions that you may have.

Sincerely,

Gadi.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
