[PATCH] prevent folding a scalar FP load into a packed logical FP instruction (PR22371)
spatel at rotateright.com
Fri Feb 13 08:34:56 PST 2015
Comment at: lib/Target/X86/X86InstrFragmentsSIMD.td:374
@@ +373,3 @@
+def loadf32_128 : PatFrag<(ops node:$ptr),
+ (bitconvert (v4f32 (scalar_to_vector (loadf32 node:$ptr))))>;
+def loadf64_128 : PatFrag<(ops node:$ptr),
> spatel wrote:
> > qcolombet wrote:
> > > This is not valid, is it?
> > >
> > > When this matches we will read 128-bit from the memory, i.e., pass what we do for the load32. Aren’t we?
> > >
> > > Something correct would be load128 -> extract element.
> > > Though I do not think that happens a lot…
> > I don't understand; we only want to match a 32-bit load that has been extended to fit in the 128-bit register, right? Isn't that what loadf32 guarantees? Perhaps this should be a zero-extend rather than scalar_to_vector though?
> Well I may certainly misread the uses of loadf32_128, but does not this is used to fold the load in the related operation, thus we read 128-bit in memory, don't we?
I think I understand now: this is only working because it's difficult to match the more complicated pattern. If we somehow managed to match that pattern, then we'd trigger the same bug all over again.
So the fundamental problem is that we're creating instruction definitions that don't exist in x86 reality. I see 2 potential fixes:
(1) Pass a null / void memory operand pattern to the existing multiclass:
defm PS : sse12_fp_packed<opc, !strconcat(OpcodeStr, "ps"), OpNode, FR32,
f32, f128mem, [how to specify that nothing works here?] , SSEPackedSingle, itins>, PS;
(2) Write a new multiclass that just has a register-register variant; no reg-mem option is possible.
More information about the llvm-commits