[PATCH] D26790: [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times

Thu Nov 17 12:17:07 PST 2016

craig.topper added a comment.

The min/max intrinsic patterns in tablegen try to use sse_load_f32 and sse_load_f64, which use this function to look for either a load that is zero extended from f32/f64 to v4f32/v2f64 or a scalar_to_vector from an f32/f64 to v4f32/v2f64. The intrinsics themselves take a v4f32/v2f64. Ultimately I want to extend this function to allow a regular v4f32/v2f64 load as well. Currently those cases are folded later using the folding tables, but isel should have been able to get it right without the peephole.

Another possible fix is to lower the intrinsics to a scalar max SDNode with inserts and extracts around it, like this:

  (insert_vector_elt src1 (X86max (extract_vector_elt src1, 0), (extract_vector_elt src2, 0)), 0)

Then pattern match it back to the min/max intrinsic instructions. This would be equivalent to how clang emits the FADD/FSUB/FMUL/FDIV intrinsics. We would need to do this for every pattern that currently uses sse_load_f32/f64. This would probably also fix PR31032, so maybe it's worth doing?
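
To make the shape of that insert/extract lowering concrete, here is a rough standalone C++ model of what the expression computes for the v4f32 case. This is only a sketch, not LLVM code: V4F32, extractVectorElt, insertVectorElt, scalarMax and maxSSModel are hypothetical names made up for illustration, and a plain comparison is assumed as a stand-in for the X86max node.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a v4f32 value.
using V4F32 = std::array<float, 4>;

// Model of extract_vector_elt: read one lane.
float extractVectorElt(const V4F32 &v, std::size_t idx) { return v[idx]; }

// Model of insert_vector_elt: return a copy of v with one lane replaced.
V4F32 insertVectorElt(V4F32 v, float elt, std::size_t idx) {
  v[idx] = elt;
  return v;
}

// Assumed stand-in for the scalar X86max node: yields the second
// operand unless the first compares strictly greater.
float scalarMax(float a, float b) { return a > b ? a : b; }

// Model of:
//   (insert_vector_elt src1
//     (X86max (extract_vector_elt src1, 0), (extract_vector_elt src2, 0)), 0)
// Only lane 0 is computed; lanes 1..3 pass through from src1.
V4F32 maxSSModel(const V4F32 &src1, const V4F32 &src2) {
  float lo = scalarMax(extractVectorElt(src1, 0), extractVectorElt(src2, 0));
  return insertVectorElt(src1, lo, 0);
}
```

The point of the model is that only element 0 of src2 is ever read, which is why a scalar load of src2 is enough for these patterns.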

I was also planning to fix AVX-512 to use sse_load_f32/f64 for all the instructions that are equivalent to SSE/AVX instructions that are already using it.

https://reviews.llvm.org/D26790