[PATCH] D23909: [X86] Remove DenseMap for storing FMA3 grouping information
Vyacheslav Klochkov via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 30 15:34:29 PDT 2016
v_klochkov added a comment.
`Craig,
Your comment regarding using the base opcode byte was a good surprise for me.
I did not realize that it is possible to use it and that it is always available in TSFlags.
I put FMA3 opcodes into a table.
The 9,a,b - columns are the senior 4 bits of the opcode byte.
The 6,7,8,9,a,b,c,d,e,f - rows are the lower 4 bits of the opcode byte.
For example, fmadd132ps opcode has the base opcode byte = 0x98.
9 a b
6 fmaddsub132ps/pd fmaddsub213ps/pd fmaddsub231ps/pd
7 fmsubadd132ps/pd fmsubadd213ps/pd fmsubadd231ps/pd
8 fmadd132ps/pd fmadd213ps/pd fmadd231ps/pd
9 fmadd132ss/sd fmadd213ss/sd fmadd231ss/sd
a fmsub132ps/pd fmsub213ps/pd fmsub231ps/pd
b fmsub132ss/sd fmsub213ss/sd fmsub231ss/sd
c fnmadd132ps/pd fnmadd213ps/pd fnmadd231ps/pd
d fnmadd132ss/sd fnmadd213ss/sd fnmadd231ss/sd
e fnmsub132ps/pd fnmsub213ps/pd fnmsub231ps/pd
f fnmsub132ss/sd fnmsub213ss/sd fnmsub231ss/sd
It would be possible to have functions something similar to these below:
(I did not check there that the 'Opcode' is FMA3.
// If this implementation is not safe, then we could have the same binary search through static arrays with FMA opcodes.
bool isFMA3(uint8_t Opcode) {
<Do some additional checks here to avoid possible(?) overlapping
with other/future opcodes having the same opcode byte, but different prefixes/attributes.
uint8_t LowB = Opcode && 0xf;
uint8_t HighB = Opcode >> 8;
return HighB >= 0x9 && HighB <= 0xb &&
LowB >= 0x6 /* && LowB <= 0xf*/;
}
bool isFMA3Form132(uint8_t Opcode) { return (Opcode >> 8) == 0x9; }
bool isFMA3Form213(uint8_t Opcode) { return (Opcode >> 8) == 0xa; }
bool isFMA3Form231(uint8_t Opcode) { return (Opcode >> 8) == 0xb; }
bool isFMA3Scalar(uint8_t Opcode) {
uint8_t B = Opcode & 0x0f;
return B == 0x9 || B == 0xb || B == 0xd || B == 0xf;
}
bool isFMA3_FMADD(uint8_t Opcode) {
uint8_t B = Opcode & 0xf;
return B == 0x8 || B == 0x9;
}
bool isFMA3_FMSUB(uint8_t Opcode) {
uint8_t B = Opcode & 0xf;
return B == 0xa || B == 0xb;
}
bool isFMA3_FNMADD(uint8_t Opcode) {
uint8_t B = Opcode & 0xf;
return B == 0xc || B == 0xd;
}
bool isFMA3_FNMSUB(uint8_t Opcode) {
uint8_t B = Opcode & 0xf;
return B == 0xe || B == 0xf;
}
bool isFMA3_FMADDSUB(uint8_t Opcode) { return Opcode & 0xf) == 0x6; }
bool isFMA3_FMSUBADD(uint8_t Opcode) { return Opcode & 0xf) == 0x7; }
With such solution you would not need to add those 2 bits to TSFlags for 132/213/231 forms.
We also would have the methods that can distinguish 132 vs 213 vs 231
and FMA vs FMS vs FNMA vs NMMS vs FMADDSUB vs FMSUBADD.
Unfortunately, I still would have quite long linear search for methods like this:
unsigned getFMA213Opcode(bool IsEVEX, MVT VT) {
// !!! Linear search through about 300-400 opcodes in FMA3RegOpcodes table.
Opc = <iterate through entries of FMA3RegOpcodes table>;
uint8_t BaseOpcode = ...getBaseOpcodeFor(TSFlags);
if (isFMA3_FMADD(BaseOpcode) &&
<check HasEVEX and VT here>)
return Opc;
There are at least 2 ways how to workaround that:
1. Reduce the number of entries in FMA3RegOpcodes by having 1 static array of structures like I mentioned in the previous comment (i.e. describing bigger FMA families/groups, where 1 group would include {VEX}x{Reg,Mem} + {EVEX}x{Reg,Mem,KReg,KMem,KZReg,KZMem,Broadcast,KBroadcast,KZBroadcast,Round,KRound,KZRound}.
2. Do some sort of hashing in my new FMA optimization:
// Just the idea.
unsigned FMAOpc = <iterate through FMA3RegOpcodes>;
uint8_t FMAOpcByte = <extract opcode byte from TSFlags>;
if (isFMA3FMADD(FMAOpcByte))
FMADDOperations[FMADDOperationsIndex++] = <current index in FMA3RegOpcodes>;
else if (isFMA3FMSUB(FMAOpcByte))
FMADDOperations[FMSUBOperationsIndex++] = <current index in FMA3RegOpcodes>;
else if (isFMA3FMSUB(FMAOpcByte))
FMADDOperations[FMSUBOperationsIndex++] = <current index in FMA3RegOpcodes>;
...
Even with (2), I think having bigger FMA3 groups may be useful in future.
For example, it would be convenient if/when need to add support for broadcast+operation folding at Peephole opt:
t1 = vbroadcastss <mem>
t2 = VFMADD213PSZr a, b, t1;
-->
t2 = VFMADD213PSZmb a, b, [broadcast <mem>]
Bigger groups would just easily return VFMADD213PSZmb for VFMADD213PSZr.
Thank you,
Slava
`
https://reviews.llvm.org/D23909
More information about the llvm-commits
mailing list