[PATCH] D23909: [X86] Remove DenseMap for storing FMA3 grouping information

Vyacheslav Klochkov via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 30 15:34:29 PDT 2016

v_klochkov added a comment.


Your comment regarding using the base opcode byte was a good surprise for me.
I did not realize that it is possible to use it and that it is always available in TSFlags.

I put FMA3 opcodes into a table.
The 9,a,b - columns are the senior 4 bits of the opcode byte.
The 6,7,8,9,a,b,c,d,e,f - rows are the lower 4 bits of the opcode byte.
For example, fmadd132ps opcode has the base opcode byte = 0x98.

  			9					a					b
  		6	fmaddsub132ps/pd	fmaddsub213ps/pd	fmaddsub231ps/pd
  		7	fmsubadd132ps/pd	fmsubadd213ps/pd	fmsubadd231ps/pd
  		8	fmadd132ps/pd		fmadd213ps/pd		fmadd231ps/pd
  		9	fmadd132ss/sd		fmadd213ss/sd		fmadd231ss/sd
  		a	fmsub132ps/pd		fmsub213ps/pd		fmsub231ps/pd
  		b	fmsub132ss/sd		fmsub213ss/sd		fmsub231ss/sd
  		c	fnmadd132ps/pd		fnmadd213ps/pd		fnmadd231ps/pd
  		d	fnmadd132ss/sd		fnmadd213ss/sd		fnmadd231ss/sd
  		e	fnmsub132ps/pd		fnmsub213ps/pd		fnmsub231ps/pd
  		f	fnmsub132ss/sd		fnmsub213ss/sd		fnmsub231ss/sd

It would be possible to have functions something similar to these below:
(I did not check there that the 'Opcode' is FMA3.

  		// If this implementation is not safe, then we could have the same binary search through static arrays with FMA opcodes.
  		bool isFMA3(uint8_t Opcode) { 
  		<Do some additional checks here to avoid possible(?) overlapping
  		with other/future opcodes having the same opcode byte, but different prefixes/attributes.
  		uint8_t LowB = Opcode && 0xf;
  		uint8_t HighB = Opcode >> 8;
  		return HighB >= 0x9 && HighB <= 0xb &&
  				LowB >= 0x6 /* && LowB <= 0xf*/;
  		bool isFMA3Form132(uint8_t Opcode) { return (Opcode >> 8) == 0x9; }
  		bool isFMA3Form213(uint8_t Opcode) { return (Opcode >> 8) == 0xa; }
  		bool isFMA3Form231(uint8_t Opcode) { return (Opcode >> 8) == 0xb; }
  		bool isFMA3Scalar(uint8_t Opcode) {
  		uint8_t B = Opcode & 0x0f;
  		return B == 0x9 || B == 0xb || B == 0xd || B == 0xf;
  		bool isFMA3_FMADD(uint8_t Opcode) {
  		uint8_t B = Opcode & 0xf;
  		return B == 0x8 || B == 0x9;
  		bool isFMA3_FMSUB(uint8_t Opcode) {
  		uint8_t B = Opcode & 0xf;
  		return B == 0xa || B == 0xb;
  		bool isFMA3_FNMADD(uint8_t Opcode) {
  		uint8_t B = Opcode & 0xf;
  		return B == 0xc || B == 0xd;
  		bool isFMA3_FNMSUB(uint8_t Opcode) {
  		uint8_t B = Opcode & 0xf;
  		return B == 0xe || B == 0xf;
  		bool isFMA3_FMADDSUB(uint8_t Opcode) { return Opcode & 0xf) == 0x6; }
  		bool isFMA3_FMSUBADD(uint8_t Opcode) { return Opcode & 0xf) == 0x7; }

With such solution you would not need to add those 2 bits to TSFlags for 132/213/231 forms.

We also would have the methods that can distinguish 132 vs 213 vs 231

Unfortunately, I still would have quite long linear search for methods like this:

  		unsigned getFMA213Opcode(bool IsEVEX, MVT VT) {
  		// !!! Linear search through about 300-400 opcodes in FMA3RegOpcodes table.
  		Opc = <iterate through entries of FMA3RegOpcodes table>;
  		uint8_t BaseOpcode = ...getBaseOpcodeFor(TSFlags);
  		if (isFMA3_FMADD(BaseOpcode) &&
  			<check HasEVEX and VT here>)
  		  return Opc;

There are at least 2 ways how to workaround that:

1. Reduce the number of entries in FMA3RegOpcodes by having 1 static array of structures like I mentioned in the previous comment (i.e. describing bigger FMA families/groups, where 1 group would include {VEX}x{Reg,Mem} + {EVEX}x{Reg,Mem,KReg,KMem,KZReg,KZMem,Broadcast,KBroadcast,KZBroadcast,Round,KRound,KZRound}.

2. Do some sort of hashing in my new FMA optimization:

  		// Just the idea.
  		unsigned FMAOpc = <iterate through FMA3RegOpcodes>;
  		uint8_t FMAOpcByte = <extract opcode byte from TSFlags>;
  		if (isFMA3FMADD(FMAOpcByte))
  			FMADDOperations[FMADDOperationsIndex++] = <current index in FMA3RegOpcodes>;
  		else if (isFMA3FMSUB(FMAOpcByte))
  			FMADDOperations[FMSUBOperationsIndex++] = <current index in FMA3RegOpcodes>;
  		else if (isFMA3FMSUB(FMAOpcByte))
  			FMADDOperations[FMSUBOperationsIndex++] = <current index in FMA3RegOpcodes>;

Even with (2), I think having bigger FMA3 groups may be useful in future.
For example, it would be convenient if/when need to add support for broadcast+operation folding at Peephole opt:
	t1 = vbroadcastss <mem>
	t2 = VFMADD213PSZr a, b, t1;
	t2 = VFMADD213PSZmb a, b, [broadcast <mem>]
Bigger groups would just easily return VFMADD213PSZmb for VFMADD213PSZr.

Thank you,


More information about the llvm-commits mailing list