[PATCH] D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes.

Vyacheslav Klochkov via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 3 00:15:21 PDT 2016


v_klochkov created this revision.
v_klochkov added reviewers: qcolombet, craig.topper, hfinkel.
v_klochkov added subscribers: DavidKreitzer, RKSimon, spatel, llvm-commits.

`Hello,
 
Please review the change-set that implements the commute transformation
(and thus better memory-folding and better register coalescing) for AVX512 FMA opcodes.

This is also a fix for https://llvm.org/bugs/show_bug.cgi?id=17229
 
If this change-set seems too big to reviewers I can split it into 2 parts:
 - 2 new files with the new class Utils/X86InstrFMA3Info.[h,cpp]
 - fix users of those new classes + update of the LIT tests.
Even if that happens, this 'Differential Revision' may be still useful as it shows
both - the new classes and how they are used.


1. New classes X86InstrFMA3Info and X86InstrFMA3Group.
=======================================
Currently there are 1584 FMA3 opcodes!
Having a huge switch {case <FMA1>: case <FMA2>: case <FMA1584>: } 
in the method X86InstrInfo::isFMA3() seems just wrong.

Also, there are several other places where all those opcodes would have to be listed,
for example, in X86InstrInfo::getFMA3OpcodeToCommuteOperands().
Those opcodes would also have to be in MemoryFoldTable3[] and MemoryFoldTable4[] arrays.

Finally, some other users of FMA3 opcodes are being implemented, they also would need to
list FMA3 opcodes as well, classify and group them (register vs memory forms, check k-masked/k-zero-masked, etc).

The new classes X86InstrFMA3Info and X86InstrFMA3Group:
- List all existing FMA3 opcodes in one place,
- Classify them by adding special attributes (IsIntrinsic, IsKMergeMasked, IsKZeroMasked, etc).
- Collect relative FMA3 opcodes in groups of 6 opcodes:
    {FMA132r, FMA213r, FMA231r, FMA132m, FMA213m, FMA231m}
  or in groups of 3 register or 3 memory-form opcodes if group of 6 cannot be formed,
  for example only memory forms are available for FMAs loading/broadcasting one of operands from memory:
    {FMADD132PSZmb, FMADD213PSZmb, FMADD231PSZmb}.
- Provide useful methods like isFMA3(), getFMA3Group(), isIntrinsic(), 
  isKMasked(), isKMergeMasked(), isZeroMasked(), get{Reg,Mem}{132,213,231}Opcode(), etc.
  It also implements an iterator for walking through all register-form opcodes having memory-form equivalents.

The class X86InstrFMA3Info is a sort of singletons. 
Only one object of the class is created per LLVM invocation. 
It contains DenseMap<unsigned, X86InstrFMA3Group *> which maps Opcodes to their families,
i.e. groups of 6 opcodes, for example:
FMADD213SSr_Int -> {FMADD132SSr_Int, FMADD213SSr_Int, FMADD231SSr_Int, 
                    FMADD132SSr_Int, FMADD213SSr_Int, FMADD231SSr_Int,
					IsIntrinsic};
The map is initialized once with the first request to the X86InstrFMA3Info class,
then it is possible to get a const reference to groups of FMAs (represented by X86InstrFMA3Group) 
using an opcode from such group:
  static const X86InstrFMA3Group *getFMA3Group(unsigned Opcode);
I used the method llvm::call_once() to ensure that the method initGroupsOnce() is called only once.
  
2. Other changes.
=================
X86InstrAVX512.td:
  - Added isCommutable flag to more opcodes.
    Even K-masked and K-zero-masked FMAs opcodes are commutable now.

X86InstrInfo.cpp:
  - Removed the long list of FMA3 opcodes from MemoryFoldTable3[],
    and added reg-to-mem mappings for 3 and 4 operand FMAs using
    the new iterators provided by X86InstrFMA3Info class.
  - Removed isFMA3() method having a huge switch-statement.
  - Added support for k-masked and k-zero-masked FMAs.
  - Changed the getFMA3OpcodeToCommuteOperands(). 
    This method is again a member of X86InstrInfo and the 1st operand 
	is MachineInstr &MI (instead of unsigned Opcode). The reason is that
	when we start analyzing the users of FMAs we need a MachineInstr and
	need the method to be a member of X86InstrInfo.

avx512-fma.ll, avx512-fma-intrinsics.ll, avx512bwvl-intrinsics.ll:
  - Updated 3 LIT tests. 
    The new code has less instructions because of better memory-folding and better register coalescing.

Thank you,
Vyacheslav Klochkov (v_klochkov)
`

https://reviews.llvm.org/D23108

Files:
  llvm/lib/Target/X86/Utils/CMakeLists.txt
  llvm/lib/Target/X86/Utils/X86InstrFMA3Info.cpp
  llvm/lib/Target/X86/Utils/X86InstrFMA3Info.h
  llvm/lib/Target/X86/X86InstrAVX512.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrInfo.h
  llvm/test/CodeGen/X86/avx512-fma-intrinsics.ll
  llvm/test/CodeGen/X86/avx512-fma.ll
  llvm/test/CodeGen/X86/avx512bwvl-intrinsics.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D23108.66619.patch
Type: text/x-patch
Size: 98933 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160803/80efb8ef/attachment.bin>


More information about the llvm-commits mailing list