[llvm] r241191 - [NVPTX] expand extload/truncstore for vectors of floats

Wed Jul 1 14:32:42 PDT 2015

Author: jingyue
Date: Wed Jul  1 16:32:42 2015
New Revision: 241191

URL: http://llvm.org/viewvc/llvm-project?rev=241191&view=rev
Log:
[NVPTX] expand extload/truncstore for vectors of floats

Summary:
According to PTX ISA:

For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.

So, the ISA does not support extending loads or truncating stores for floating-point numbers. The code reflects this by setting the loadext/truncstore actions to Expand for scalar floating-point types, but vectors of floating-point numbers are not handled.

As a result, a load of a vector of floats followed by an fp_extend may be combined by DAGCombiner into an extload, and the extload may be lowered to NVPTXISD::LoadV2 carrying the extension information. However, NVPTXISD::LoadV2 does not perform the extension, and no extending instructions are inserted. In the end, PTX instructions with mismatched types are generated (here an f32 load into 64-bit %fd registers), such as
ld.v2.f32 {%fd3, %fd4}, [%rd2]

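This patch sets the load-extension actions for vectors of floats to Expand, so that DAGCombiner no longer creates extending loads for them and correct code is generated. The truncstore actions for vector types are not changed here and are marked with a FIXME.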
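
As a rough illustration of the mechanism described above, the following standalone C++ sketch models a per-type-pair action table. It is a toy under stated assumptions, not LLVM code: ToyLowering, VT, Action, wouldFoldToExtLoad, makeScalarOnlyLowering and makePatchedLowering are hypothetical names, and the only behaviour borrowed from LLVM is that a type pair with no explicit entry is treated as Legal.

```cpp
#include <map>
#include <utility>

// Hypothetical stand-ins for LLVM's value types and legalize actions.
// These are NOT the real llvm::MVT or TargetLowering types.
enum class VT { f16, f32, f64, v2f16, v2f32, v2f64, v4f16, v4f32, v4f64 };
enum class Action { Legal, Expand };

// Toy action table keyed by (value type, memory type).
class ToyLowering {
  std::map<std::pair<VT, VT>, Action> LoadExtActions;

public:
  void setLoadExtAction(VT ValVT, VT MemVT, Action A) {
    LoadExtActions[{ValVT, MemVT}] = A;
  }

  // A pair without an explicit entry defaults to Legal.
  bool isLoadExtLegal(VT ValVT, VT MemVT) const {
    auto It = LoadExtActions.find({ValVT, MemVT});
    return It == LoadExtActions.end() || It->second == Action::Legal;
  }
};

// Simplified model of the combiner's decision: fold fp_extend(load) into an
// extending load only if the target reports that extload as legal.
bool wouldFoldToExtLoad(const ToyLowering &TL, VT ValVT, VT MemVT) {
  return TL.isLoadExtLegal(ValVT, MemVT);
}

// Table as it was before the patch: only scalar FP extloads are expanded.
ToyLowering makeScalarOnlyLowering() {
  ToyLowering TL;
  TL.setLoadExtAction(VT::f32, VT::f16, Action::Expand);
  TL.setLoadExtAction(VT::f64, VT::f16, Action::Expand);
  TL.setLoadExtAction(VT::f64, VT::f32, Action::Expand);
  return TL;
}

// Table after the patch: the v2 and v4 FP extloads are expanded as well.
ToyLowering makePatchedLowering() {
  ToyLowering TL = makeScalarOnlyLowering();
  TL.setLoadExtAction(VT::v2f32, VT::v2f16, Action::Expand);
  TL.setLoadExtAction(VT::v2f64, VT::v2f16, Action::Expand);
  TL.setLoadExtAction(VT::v2f64, VT::v2f32, Action::Expand);
  TL.setLoadExtAction(VT::v4f32, VT::v4f16, Action::Expand);
  TL.setLoadExtAction(VT::v4f64, VT::v4f16, Action::Expand);
  TL.setLoadExtAction(VT::v4f64, VT::v4f32, Action::Expand);
  return TL;
}
```

In this model, wouldFoldToExtLoad(makeScalarOnlyLowering(), VT::v4f64, VT::v4f32) is true, which corresponds to the miscompile, while the same query against makePatchedLowering() is false, so the load and the fp_extend stay separate.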

Patch by Gang Hu.

Test Plan: Test case attached.

Reviewers: jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D10876

Added:
    llvm/trunk/test/CodeGen/NVPTX/extloadv.ll
Modified:
    llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp

Modified: llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp?rev=241191&r1=241190&r2=241191&view=diff
==============================================================================

--- llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp Wed Jul  1 16:32:42 2015
@@ -206,7 +206,14 @@ NVPTXTargetLowering::NVPTXTargetLowering
   setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f16, Expand);
   setLoadExtAction(ISD::EXTLOAD, MVT::f64, MVT::f32, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v2f32, MVT::v2f16, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2f16, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v2f64, MVT::v2f32, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v4f32, MVT::v4f16, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4f16, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::v4f64, MVT::v4f32, Expand);
   // Turn FP truncstore into trunc + store.
+  // FIXME: vector types should also be expanded
   setTruncStoreAction(MVT::f32, MVT::f16, Expand);
   setTruncStoreAction(MVT::f64, MVT::f16, Expand);
   setTruncStoreAction(MVT::f64, MVT::f32, Expand);

Added: llvm/trunk/test/CodeGen/NVPTX/extloadv.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/NVPTX/extloadv.ll?rev=241191&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/NVPTX/extloadv.ll (added)
+++ llvm/trunk/test/CodeGen/NVPTX/extloadv.ll Wed Jul  1 16:32:42 2015
@@ -0,0 +1,15 @@
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 | FileCheck %s
+
+define void @foo(float* nocapture readonly %x_value, double* nocapture %output) #0 {
+  %1 = bitcast float* %x_value to <4 x float>*
+  %2 = load <4 x float>, <4 x float>* %1, align 16
+  %3 = fpext <4 x float> %2 to <4 x double>
+; CHECK-NOT: ld.v2.f32 {%fd{{[0-9]+}}, %fd{{[0-9]+}}}, [%rd{{[0-9]+}}];
+; CHECK:  cvt.f64.f32
+; CHECK:  cvt.f64.f32
+; CHECK:  cvt.f64.f32
+; CHECK:  cvt.f64.f32
+  %4 = bitcast double* %output to <4 x double>*
+  store <4 x double> %3, <4 x double>* %4
+  ret void
+}