[PATCH] D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes.
Vyacheslav Klochkov via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 3 00:15:21 PDT 2016
v_klochkov created this revision.
v_klochkov added reviewers: qcolombet, craig.topper, hfinkel.
v_klochkov added subscribers: DavidKreitzer, RKSimon, spatel, llvm-commits.
Hello,
Please review the change-set that implements the commute transformation for AVX512 FMA opcodes,
which in turn enables better memory folding and better register coalescing.
This is also a fix for https://llvm.org/bugs/show_bug.cgi?id=17229
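A small standalone sketch may help illustrate why commuting FMA operands amounts to selecting a different 132/213/231 form. It is not code from the patch: FMA3Form, evalForm() and commuteForm() are hypothetical names. It assumes the usual operand-order semantics of the three forms (operand 1 is also the destination). It deliberately ignores k-masking and the scalar *_Int forms, where operand 1 also supplies elements that pass through to the result, so swapping operand 1 is not always legal there.

```cpp
#include <cassert>

// The three FMA3 operand orders, with operands numbered 1..3
// (operand 1 is also the destination):
//   132: op1 = op1 * op3 + op2
//   213: op1 = op2 * op1 + op3
//   231: op1 = op2 * op3 + op1
enum class FMA3Form : unsigned { F132 = 0, F213 = 1, F231 = 2 };

// Evaluates a form on scalar operands. A plain multiply-add is enough to
// compare the forms on small integer-valued inputs.
double evalForm(FMA3Form F, double Op1, double Op2, double Op3) {
  switch (F) {
  case FMA3Form::F132: return Op1 * Op3 + Op2;
  case FMA3Form::F213: return Op2 * Op1 + Op3;
  case FMA3Form::F231: return Op2 * Op3 + Op1;
  }
  return 0.0;
}

// Returns the form that computes the same value after swapping the
// operands Idx1 and Idx2 (1-based, Idx1 < Idx2).
FMA3Form commuteForm(FMA3Form F, unsigned Idx1, unsigned Idx2) {
  assert(Idx1 >= 1 && Idx1 < Idx2 && Idx2 <= 3 && "bad operand pair");
  // Rows: swapped pair; columns: input form in the order 132, 213, 231.
  static const FMA3Form Table[3][3] = {
      {FMA3Form::F231, FMA3Form::F213, FMA3Form::F132}, // swap 1 <-> 2
      {FMA3Form::F132, FMA3Form::F231, FMA3Form::F213}, // swap 1 <-> 3
      {FMA3Form::F213, FMA3Form::F132, FMA3Form::F231}, // swap 2 <-> 3
  };
  unsigned Row = (Idx1 == 1) ? (Idx2 == 2 ? 0 : 1) : 2;
  return Table[Row][static_cast<unsigned>(F)];
}
```

For example, swapping operands 2 and 3 of a 213 form gives a 132 form computing the same value. Because the memory forms fold only the third operand, this kind of form switch is what can move a loaded value into a foldable position.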
If this change-set seems too big to reviewers, I can split it into 2 parts:
- 2 new files with the new classes, Utils/X86InstrFMA3Info.[h,cpp];
- fixes to the users of those new classes + updates to the LIT tests.
Even if that happens, this 'Differential Revision' may still be useful, as it shows
both the new classes and how they are used.
1. New classes X86InstrFMA3Info and X86InstrFMA3Group.
=======================================
Currently there are 1584 FMA3 opcodes!
Having a huge switch {case <FMA1>: case <FMA2>: ... case <FMA1584>: }
in the method X86InstrInfo::isFMA3() just seems wrong.
Also, there are several other places where all those opcodes would have to be listed,
for example, in X86InstrInfo::getFMA3OpcodeToCommuteOperands().
Those opcodes would also have to be in MemoryFoldTable3[] and MemoryFoldTable4[] arrays.
Finally, other users of FMA3 opcodes are currently being implemented; they would also need to
list FMA3 opcodes, then classify and group them (register vs. memory forms, k-masked/k-zero-masked, etc.).
The new classes X86InstrFMA3Info and X86InstrFMA3Group:
- List all existing FMA3 opcodes in one place,
- Classify them by adding special attributes (IsIntrinsic, IsKMergeMasked, IsKZeroMasked, etc).
- Collect related FMA3 opcodes in groups of 6 opcodes:
{FMA132r, FMA213r, FMA231r, FMA132m, FMA213m, FMA231m}
or in groups of 3 register-form or 3 memory-form opcodes if a group of 6 cannot be formed;
for example, only memory forms are available for FMAs that load/broadcast one of the operands from memory:
{FMADD132PSZmb, FMADD213PSZmb, FMADD231PSZmb}.
- Provide useful methods like isFMA3(), getFMA3Group(), isIntrinsic(),
isKMasked(), isKMergeMasked(), isKZeroMasked(), get{Reg,Mem}{132,213,231}Opcode(), etc.
It also implements an iterator for walking through all register-form opcodes that have memory-form equivalents.
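The grouping and classification described above can be sketched as a small value class. This is a simplified, hypothetical model, not the X86InstrFMA3Group from the patch: the class name FMA3GroupSketch, the attribute bit values, hasRegForms()/hasMemForms() and isRegOpcodeFromGroup() are assumptions. Only the getter names mirror the ones listed above. An opcode value of 0 is assumed to mean "this form does not exist", which is how a memory-only group such as the *Zmb broadcast FMAs is represented.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, simplified model of one FMA3 opcode group.
class FMA3GroupSketch {
public:
  // Attribute bits (values are illustrative only).
  enum : uint16_t {
    IsIntrinsic = 0x1,
    IsKMergeMasked = 0x2,
    IsKZeroMasked = 0x4,
  };

  // Opcodes are stored in 132, 213, 231 order; 0 means "form absent".
  FMA3GroupSketch(const std::array<unsigned, 3> &RegOps,
                  const std::array<unsigned, 3> &MemOps, uint16_t Attrs)
      : RegOpcodes(RegOps), MemOpcodes(MemOps), Attributes(Attrs) {}

  unsigned getReg132Opcode() const { return RegOpcodes[0]; }
  unsigned getReg213Opcode() const { return RegOpcodes[1]; }
  unsigned getReg231Opcode() const { return RegOpcodes[2]; }
  unsigned getMem132Opcode() const { return MemOpcodes[0]; }
  unsigned getMem213Opcode() const { return MemOpcodes[1]; }
  unsigned getMem231Opcode() const { return MemOpcodes[2]; }

  bool isIntrinsic() const { return (Attributes & IsIntrinsic) != 0; }
  bool isKMergeMasked() const { return (Attributes & IsKMergeMasked) != 0; }
  bool isKZeroMasked() const { return (Attributes & IsKZeroMasked) != 0; }
  bool isKMasked() const {
    return (Attributes & (IsKMergeMasked | IsKZeroMasked)) != 0;
  }

  // A group is either full (6 opcodes) or has only 3 reg or 3 mem forms.
  bool hasRegForms() const { return RegOpcodes[0] != 0; }
  bool hasMemForms() const { return MemOpcodes[0] != 0; }

  // True if Opcode is one of this group's register-form opcodes.
  bool isRegOpcodeFromGroup(unsigned Opcode) const {
    for (unsigned Op : RegOpcodes)
      if (Op != 0 && Op == Opcode)
        return true;
    return false;
  }

private:
  std::array<unsigned, 3> RegOpcodes;
  std::array<unsigned, 3> MemOpcodes;
  uint16_t Attributes;
};
```

Packing the k-mask and intrinsic properties into one attribute word keeps each group small, so a table with one entry per group stays cheap even with 1584 opcodes.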
The class X86InstrFMA3Info is a kind of singleton:
only one object of the class is created per LLVM invocation.
It contains a DenseMap<unsigned, X86InstrFMA3Group *> which maps opcodes to their families,
i.e. groups of 6 opcodes, for example:
FMADD213SSr_Int -> {FMADD132SSr_Int, FMADD213SSr_Int, FMADD231SSr_Int,
FMADD132SSm_Int, FMADD213SSm_Int, FMADD231SSm_Int,
IsIntrinsic};
The map is initialized once, on the first request to the X86InstrFMA3Info class;
after that, a const pointer to a group of FMAs (represented by X86InstrFMA3Group)
can be obtained using any opcode from that group:
static const X86InstrFMA3Group *getFMA3Group(unsigned Opcode);
I used llvm::call_once() to ensure that the method initGroupsOnce() is called only once.
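The lazy, one-time initialization of the opcode-to-group map can be sketched in standard C++. This is not the patch's code. FMA3InfoSketch, FMA3Group, addGroup() and the opcode numbers are hypothetical. std::call_once stands in for llvm::call_once(), and std::unordered_map stands in for DenseMap, so the example compiles without LLVM headers.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for X86InstrFMA3Group: opcodes in 132/213/231
// order for the register and memory forms (0 = form absent).
struct FMA3Group {
  unsigned Reg[3];
  unsigned Mem[3];
  bool IsIntrinsic;
};

class FMA3InfoSketch {
public:
  // Returns the group containing Opcode, or nullptr for non-FMA3 opcodes.
  // The tables are built exactly once, even if callers race on first use.
  static const FMA3Group *getFMA3Group(unsigned Opcode) {
    std::call_once(InitFlag, initGroupsOnce);
    auto It = OpcodeToGroup.find(Opcode);
    return It == OpcodeToGroup.end() ? nullptr : It->second;
  }

  // A table lookup replaces a giant switch over every FMA3 opcode.
  static bool isFMA3(unsigned Opcode) { return getFMA3Group(Opcode) != nullptr; }

private:
  static void initGroupsOnce() {
    // Made-up opcode numbers; a real table would list every FMA3 opcode.
    addGroup({{100, 101, 102}, {103, 104, 105}, false}); // full group of 6
    addGroup({{0, 0, 0}, {200, 201, 202}, false});       // memory-only group
  }

  // Owns the group and maps each of its existing opcodes to it.
  static void addGroup(const FMA3Group &G) {
    Groups.push_back(std::make_unique<FMA3Group>(G));
    const FMA3Group *P = Groups.back().get();
    for (unsigned Op : P->Reg)
      if (Op != 0)
        OpcodeToGroup[Op] = P;
    for (unsigned Op : P->Mem)
      if (Op != 0)
        OpcodeToGroup[Op] = P;
  }

  static std::once_flag InitFlag;
  static std::vector<std::unique_ptr<FMA3Group>> Groups;
  static std::unordered_map<unsigned, const FMA3Group *> OpcodeToGroup;
};

std::once_flag FMA3InfoSketch::InitFlag;
std::vector<std::unique_ptr<FMA3Group>> FMA3InfoSketch::Groups;
std::unordered_map<unsigned, const FMA3Group *> FMA3InfoSketch::OpcodeToGroup;
```

Every opcode of a group maps to the same group object, so a lookup with either a register-form or a memory-form opcode reaches all six related opcodes.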
2. Other changes.
=================
X86InstrAVX512.td:
- Added the isCommutable flag to more opcodes.
Even k-masked and k-zero-masked FMA opcodes are now commutable.
X86InstrInfo.cpp:
- Removed the long list of FMA3 opcodes from MemoryFoldTable3[],
and added reg-to-mem mappings for 3- and 4-operand FMAs using
the new iterators provided by the X86InstrFMA3Info class.
- Removed the isFMA3() method with its huge switch statement.
- Added support for k-masked and k-zero-masked FMAs.
- Changed getFMA3OpcodeToCommuteOperands().
This method is again a member of X86InstrInfo, and its first parameter
is MachineInstr &MI (instead of unsigned Opcode). The reason is that
once we start analyzing the users of FMAs, we need a MachineInstr and
need the method to be a member of X86InstrInfo.
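Generating the FMA3 fold-table entries from the groups, instead of listing them by hand in MemoryFoldTable3[], can be illustrated roughly as follows. The structs, buildFMA3FoldEntries() and the opcode numbers are hypothetical. The patch itself uses the iterators of X86InstrFMA3Info and LLVM's own table types, and it handles 3- and 4-operand FMAs separately, which this sketch does not.

```cpp
#include <array>
#include <vector>

// Hypothetical simplified group: opcodes in 132/213/231 order.
struct FMA3Group {
  std::array<unsigned, 3> Reg; // 0 means the form does not exist
  std::array<unsigned, 3> Mem;
};

// Hypothetical fold-table entry: register-form opcode -> memory-form opcode.
struct FoldEntry {
  unsigned RegOp;
  unsigned MemOp;
};

// Emits one reg->mem entry per form (132, 213, 231) for every group that
// has both register and memory forms; memory-only groups add nothing.
std::vector<FoldEntry> buildFMA3FoldEntries(const std::vector<FMA3Group> &Groups) {
  std::vector<FoldEntry> Table;
  for (const FMA3Group &G : Groups) {
    for (unsigned I = 0; I < 3; ++I) {
      if (G.Reg[I] != 0 && G.Mem[I] != 0)
        Table.push_back({G.Reg[I], G.Mem[I]});
    }
  }
  return Table;
}
```

Because the mapping is derived from the same groups that drive the commute logic, a newly added FMA3 opcode would need to be listed only once.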
avx512-fma.ll, avx512-fma-intrinsics.ll, avx512bwvl-intrinsics.ll:
- Updated 3 LIT tests.
The new code has fewer instructions because of better memory folding and better register coalescing.
Thank you,
Vyacheslav Klochkov (v_klochkov)
https://reviews.llvm.org/D23108
Files:
llvm/lib/Target/X86/Utils/CMakeLists.txt
llvm/lib/Target/X86/Utils/X86InstrFMA3Info.cpp
llvm/lib/Target/X86/Utils/X86InstrFMA3Info.h
llvm/lib/Target/X86/X86InstrAVX512.td
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/lib/Target/X86/X86InstrInfo.h
llvm/test/CodeGen/X86/avx512-fma-intrinsics.ll
llvm/test/CodeGen/X86/avx512-fma.ll
llvm/test/CodeGen/X86/avx512bwvl-intrinsics.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D23108.66619.patch
Type: text/x-patch
Size: 98933 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160803/80efb8ef/attachment.bin>