[llvm] r211939 - [NVPTX] Fix handling of ldg/ldu intrinsics.

Jingyue Wu jingyue at google.com
Wed Aug 27 19:21:18 PDT 2014


Hi Justin,

I sent out a patch (http://reviews.llvm.org/D5093) that makes the alignment
an argument to ldu/ldg intrinsics. PTAL.

Thanks,
Jingyue
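
For readers following the thread, here is a minimal sketch of the kind of
check an explicit alignment argument would make possible. It is not code from
D5093 or from the NVPTX backend; maxVectorWidth and its parameters are
hypothetical names used only for illustration. It assumes the PTX rule that a
load address must be naturally aligned to the full access size, so a v4.f32
load needs 16-byte alignment.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper (not part of LLVM): given the element size in bytes,
// the number of elements requested, and the known alignment in bytes,
// return the widest PTX vector width (4, 2, or 1 for scalar) whose total
// access size the alignment can satisfy.
unsigned maxVectorWidth(unsigned EltBytes, unsigned NumElts,
                        unsigned AlignBytes) {
  assert(EltBytes != 0 && "element size must be non-zero");

  // Assumption: an alignment of 0 means "unknown", so be conservative
  // and fall back to scalar accesses.
  if (AlignBytes == 0)
    return 1;

  // Try the widest vector form first, then the narrower one.
  for (unsigned Width : {4u, 2u}) {
    bool FitsElementCount = Width <= NumElts && NumElts % Width == 0;
    bool FitsAlignment = AlignBytes % (EltBytes * Width) == 0;
    if (FitsElementCount && FitsAlignment)
      return Width;
  }
  return 1;
}
```

With the values from the quoted message below (four floats, alignment 4),
this sketch returns 1, i.e. scalar loads rather than ld.global.nc.v4.f32.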

On Mon Aug 25 2014 at 9:30:57 AM Jingyue Wu <jingyue at google.com> wrote:

> I agree. That would avoid the problem of metadata being removed by
> optimizations.
>
> Does the current NVPTX backend use or check the alignment at all? I tried,
> for example, using llvm.nvvm.ldg.global.f.v4f32.p0v4f32 with alignment 4,
> and the emitted PTX still uses ld.global.nc.v4.f32 even though the
> alignment is not 4 * sizeof(float).
>
> Jingyue
>
> On Fri Aug 22 2014 at 8:54:47 AM Justin Holewinski <jholewinski at nvidia.com>
> wrote:
>
>>  Hi Jingyue,
>>
>>
>>  I agree that this is fragile and needs to be reworked.  "Officially",
>> the ldu/ldg intrinsics are not meant to be used by front-ends (they do not
>> exist in the NVVM IR spec).  In libnvvm, they are generated automatically
>> through IR transformations just before PTX code generation (and after all
>> of the 'opt'-level transformations).  However, for upstream NVPTX, it makes
>> a lot of sense to expose these intrinsics so users can generate them,
>> especially since the ldg/ldu transformation passes are not upstream.  I
>> believe the proper way to handle these would be to make the alignment an
>> explicit argument, similar to llvm.memset/llvm.memcpy/llvm.memmove.  Do
>> you agree?
>>
>>
>>
>>  ------------------------------
>> *From:* Jingyue Wu <jingyue at google.com>
>> *Sent:* Thursday, August 21, 2014 6:59 PM
>> *To:* Justin Holewinski
>> *Cc:* llvm-commits at cs.uiuc.edu; Eli Bendersky; Mark Heffernan; Cong Hou
>> *Subject:* Re: [llvm] r211939 - [NVPTX] Fix handling of ldg/ldu
>> intrinsics.
>>
>>  Hi Justin,
>>
>> The assertion on NVPTXISelLowering.cpp:3305
>>
>>  assert(I.hasMetadataOtherThanDebugLoc() && "Must have alignment
>> metadata");
>>
>>  looks a little fragile. !align is extended metadata, and the
>> optimization passes (e.g., SimplifyCFG) are not guaranteed to preserve it.
>>
>>  I attached a reduced test case (hoist.ll) to demonstrate this issue. If
>> you run "opt -simplifycfg hoist.ll -S", you will see the !align metadata is
>> removed by HoistThenElseCodeToIf. Further running llc on the simplified
>> bitcode will hit the assertion error.
>>
>>  One way to fix this issue is, of course, to have HoistThenElseCodeToIf
>> preserve the align metadata. But before I do that, I am curious whether we
>> should assume the align metadata always exists in the first place. Can you
>> shed some light on this, Justin?
>>
>>  Thanks much,
>> Jingyue
>>
>> On Fri Jun 27 2014 at 12:07:55 PM Justin Holewinski <
>> jholewinski at nvidia.com> wrote:
>>
>>> Author: jholewinski
>>> Date: Fri Jun 27 13:35:51 2014
>>> New Revision: 211939
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=211939&view=rev
>>> Log:
>>> [NVPTX] Fix handling of ldg/ldu intrinsics.
>>>
>>> The address space of the pointer must be global (1) for these
>>> intrinsics.  There must also be alignment metadata attached to the
>>> intrinsic calls, e.g.
>>>
>>> %val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32
>>> addrspace(1)* %ptr), !align !0
>>>
>>> !0 = metadata !{i32 4}
>>>
>>> Added:
>>>     llvm/trunk/test/CodeGen/NVPTX/ldu-ldg.ll
>>> Modified:
>>>     llvm/trunk/include/llvm/IR/IntrinsicsNVVM.td
>>>     llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
>>>     llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
>>>     llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp
>>>     llvm/trunk/lib/Target/NVPTX/NVPTXIntrinsics.td
>>>     llvm/trunk/test/CodeGen/NVPTX/ldu-i8.ll
>>>     llvm/trunk/test/CodeGen/NVPTX/ldu-reg-plus-offset.ll
>>>
>>> Modified: llvm/trunk/include/llvm/IR/IntrinsicsNVVM.td
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/
>>> IR/IntrinsicsNVVM.td?rev=211939&r1=211938&r2=211939&view=diff
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/include/llvm/IR/IntrinsicsNVVM.td (original)
>>> +++ llvm/trunk/include/llvm/IR/IntrinsicsNVVM.td Fri Jun 27 13:35:51
>>> 2014
>>> @@ -796,26 +796,25 @@ def llvm_anyi64ptr_ty     : LLVMAnyPoint
>>>
>>>
>>>  // Generated within nvvm. Use for ldu on sm_20 or later
>>> -// @TODO: Revisit this, Changed LLVMAnyPointerType to LLVMPointerType
>>>  def int_nvvm_ldu_global_i : Intrinsic<[llvm_anyint_ty],
>>> -  [LLVMPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>> +  [LLVMAnyPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>>    "llvm.nvvm.ldu.global.i">;
>>>  def int_nvvm_ldu_global_f : Intrinsic<[llvm_anyfloat_ty],
>>> -  [LLVMPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>> +  [LLVMAnyPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>>    "llvm.nvvm.ldu.global.f">;
>>>  def int_nvvm_ldu_global_p : Intrinsic<[llvm_anyptr_ty],
>>> -  [LLVMPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>> +  [LLVMAnyPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>>    "llvm.nvvm.ldu.global.p">;
>>>
>>>  // Generated within nvvm. Use for ldg on sm_35 or later
>>>  def int_nvvm_ldg_global_i : Intrinsic<[llvm_anyint_ty],
>>> -  [LLVMPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>> +  [LLVMAnyPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>>    "llvm.nvvm.ldg.global.i">;
>>>  def int_nvvm_ldg_global_f : Intrinsic<[llvm_anyfloat_ty],
>>> -  [LLVMPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>> +  [LLVMAnyPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>>    "llvm.nvvm.ldg.global.f">;
>>>  def int_nvvm_ldg_global_p : Intrinsic<[llvm_anyptr_ty],
>>> -  [LLVMPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>> +  [LLVMAnyPointerType<LLVMMatchType<0>>], [IntrReadMem, NoCapture<0>],
>>>    "llvm.nvvm.ldg.global.p">;
>>>
>>>  // Use for generic pointers
>>>
>>> Modified: llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/NV
>>> PTX/NVPTXISelDAGToDAG.cpp?rev=211939&r1=211938&r2=211939&view=diff
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp (original)
>>> +++ llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp Fri Jun 27
>>> 13:35:51 2014
>>> @@ -141,7 +141,7 @@ SDNode *NVPTXDAGToDAGISel::Select(SDNode
>>>    case NVPTXISD::LDGV4:
>>>    case NVPTXISD::LDUV2:
>>>    case NVPTXISD::LDUV4:
>>> -    ResNode = SelectLDGLDUVector(N);
>>> +    ResNode = SelectLDGLDU(N);
>>>      break;
>>>    case NVPTXISD::StoreV2:
>>>    case NVPTXISD::StoreV4:
>>> @@ -167,6 +167,9 @@ SDNode *NVPTXDAGToDAGISel::Select(SDNode
>>>    case ISD::INTRINSIC_WO_CHAIN:
>>>      ResNode = SelectIntrinsicNoChain(N);
>>>      break;
>>> +  case ISD::INTRINSIC_W_CHAIN:
>>> +    ResNode = SelectIntrinsicChain(N);
>>> +    break;
>>>    case NVPTXISD::Tex1DFloatI32:
>>>    case NVPTXISD::Tex1DFloatFloat:
>>>    case NVPTXISD::Tex1DFloatFloatLevel:
>>> @@ -273,6 +276,21 @@ SDNode *NVPTXDAGToDAGISel::Select(SDNode
>>>    return SelectCode(N);
>>>  }
>>>
>>> +SDNode *NVPTXDAGToDAGISel::SelectIntrinsicChain(SDNode *N) {
>>> +  unsigned IID = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()
>>> ;
>>> +  switch (IID) {
>>> +  default:
>>> +    return NULL;
>>> +  case Intrinsic::nvvm_ldg_global_f:
>>> +  case Intrinsic::nvvm_ldg_global_i:
>>> +  case Intrinsic::nvvm_ldg_global_p:
>>> +  case Intrinsic::nvvm_ldu_global_f:
>>> +  case Intrinsic::nvvm_ldu_global_i:
>>> +  case Intrinsic::nvvm_ldu_global_p:
>>> +    return SelectLDGLDU(N);
>>> +  }
>>> +}
>>> +
>>>  static unsigned int getCodeAddrSpace(MemSDNode *N,
>>>                                       const NVPTXSubtarget &Subtarget) {
>>>    const Value *Src = N->getMemOperand()->getValue();
>>> @@ -990,22 +1008,101 @@ SDNode *NVPTXDAGToDAGISel::SelectLoadVec
>>>    return LD;
>>>  }
>>>
>>> -SDNode *NVPTXDAGToDAGISel::SelectLDGLDUVector(SDNode *N) {
>>> +SDNode *NVPTXDAGToDAGISel::SelectLDGLDU(SDNode *N) {
>>>
>>>    SDValue Chain = N->getOperand(0);
>>> -  SDValue Op1 = N->getOperand(1);
>>> +  SDValue Op1;
>>> +  MemSDNode *Mem;
>>> +  bool IsLDG = true;
>>> +
>>> +  // If this is an LDG intrinsic, the address is the third operand. Its
>>> its an
>>> +  // LDG/LDU SD node (from custom vector handling), then its the second
>>> operand
>>> +  if (N->getOpcode() == ISD::INTRINSIC_W_CHAIN) {
>>> +    Op1 = N->getOperand(2);
>>> +    Mem = cast<MemIntrinsicSDNode>(N);
>>> +    unsigned IID = cast<ConstantSDNode>(N->getOpe
>>> rand(1))->getZExtValue();
>>> +    switch (IID) {
>>> +    default:
>>> +      return NULL;
>>> +    case Intrinsic::nvvm_ldg_global_f:
>>> +    case Intrinsic::nvvm_ldg_global_i:
>>> +    case Intrinsic::nvvm_ldg_global_p:
>>> +      IsLDG = true;
>>> +      break;
>>> +    case Intrinsic::nvvm_ldu_global_f:
>>> +    case Intrinsic::nvvm_ldu_global_i:
>>> +    case Intrinsic::nvvm_ldu_global_p:
>>> +      IsLDG = false;
>>> +      break;
>>> +    }
>>> +  } else {
>>> +    Op1 = N->getOperand(1);
>>> +    Mem = cast<MemSDNode>(N);
>>> +  }
>>> +
>>>    unsigned Opcode;
>>>    SDLoc DL(N);
>>>    SDNode *LD;
>>> -  MemSDNode *Mem = cast<MemSDNode>(N);
>>>    SDValue Base, Offset, Addr;
>>>
>>> -  EVT EltVT = Mem->getMemoryVT().getVectorElementType();
>>> +  EVT EltVT = Mem->getMemoryVT();
>>> +  if (EltVT.isVector()) {
>>> +    EltVT = EltVT.getVectorElementType();
>>> +  }
>>>
>>>    if (SelectDirectAddr(Op1, Addr)) {
>>>      switch (N->getOpcode()) {
>>>      default:
>>>        return nullptr;
>>> +    case ISD::INTRINSIC_W_CHAIN:
>>> +      if (IsLDG) {
>>> +        switch (EltVT.getSimpleVT().SimpleTy) {
>>> +        default:
>>> +          return nullptr;
>>> +        case MVT::i8:
>>> +          Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i8avar;
>>> +          break;
>>> +        case MVT::i16:
>>> +          Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i16avar;
>>> +          break;
>>> +        case MVT::i32:
>>> +          Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i32avar;
>>> +          break;
>>> +        case MVT::i64:
>>> +          Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i64avar;
>>> +          break;
>>> +        case MVT::f32:
>>> +          Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f32avar;
>>> +          break;
>>> +        case MVT::f64:
>>> +          Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f64avar;
>>> +          break;
>>> +        }
>>> +      } else {
>>> +        switch (EltVT.getSimpleVT().SimpleTy) {
>>> +        default:
>>> +          return nullptr;
>>> +        case MVT::i8:
>>> +          Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i8avar;
>>> +          break;
>>> +        case MVT::i16:
>>> +          Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i16avar;
>>> +          break;
>>> +        case MVT::i32:
>>> +          Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i32avar;
>>> +          break;
>>> +        case MVT::i64:
>>> +          Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i64avar;
>>> +          break;
>>> +        case MVT::f32:
>>> +          Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f32avar;
>>> +          break;
>>> +        case MVT::f64:
>>> +          Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f64avar;
>>> +          break;
>>> +        }
>>> +      }
>>> +      break;
>>>      case NVPTXISD::LDGV2:
>>>        switch (EltVT.getSimpleVT().SimpleTy) {
>>>        default:
>>> @@ -1101,6 +1198,55 @@ SDNode *NVPTXDAGToDAGISel::SelectLDGLDUV
>>>        switch (N->getOpcode()) {
>>>        default:
>>>          return nullptr;
>>> +      case ISD::INTRINSIC_W_CHAIN:
>>> +        if (IsLDG) {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i8ari64;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i16ari64;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i32ari64;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i64ari64;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f32ari64;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f64ari64;
>>> +            break;
>>> +          }
>>> +        } else {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i8ari64;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i16ari64;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i32ari64;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i64ari64;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f32ari64;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f64ari64;
>>> +            break;
>>> +          }
>>> +        }
>>> +        break;
>>>        case NVPTXISD::LDGV2:
>>>          switch (EltVT.getSimpleVT().SimpleTy) {
>>>          default:
>>> @@ -1190,6 +1336,55 @@ SDNode *NVPTXDAGToDAGISel::SelectLDGLDUV
>>>        switch (N->getOpcode()) {
>>>        default:
>>>          return nullptr;
>>> +      case ISD::INTRINSIC_W_CHAIN:
>>> +        if (IsLDG) {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i8ari;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i16ari;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i32ari;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i64ari;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f32ari;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f64ari;
>>> +            break;
>>> +          }
>>> +        } else {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i8ari;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i16ari;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i32ari;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i64ari;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f32ari;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f64ari;
>>> +            break;
>>> +          }
>>> +        }
>>> +        break;
>>>        case NVPTXISD::LDGV2:
>>>          switch (EltVT.getSimpleVT().SimpleTy) {
>>>          default:
>>> @@ -1285,6 +1480,55 @@ SDNode *NVPTXDAGToDAGISel::SelectLDGLDUV
>>>        switch (N->getOpcode()) {
>>>        default:
>>>          return nullptr;
>>> +      case ISD::INTRINSIC_W_CHAIN:
>>> +        if (IsLDG) {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i8areg64;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i16areg64;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i32areg64;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i64areg64;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f32areg64;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f64areg64;
>>> +            break;
>>> +          }
>>> +        } else {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i8areg64;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i16areg64;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i32areg64;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i64areg64;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f32areg64;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f64areg64;
>>> +            break;
>>> +          }
>>> +        }
>>> +        break;
>>>        case NVPTXISD::LDGV2:
>>>          switch (EltVT.getSimpleVT().SimpleTy) {
>>>          default:
>>> @@ -1374,6 +1618,55 @@ SDNode *NVPTXDAGToDAGISel::SelectLDGLDUV
>>>        switch (N->getOpcode()) {
>>>        default:
>>>          return nullptr;
>>> +      case ISD::INTRINSIC_W_CHAIN:
>>> +        if (IsLDG) {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i8areg;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i16areg;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i32areg;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_i64areg;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f32areg;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDG_GLOBAL_f64areg;
>>> +            break;
>>> +          }
>>> +        } else {
>>> +          switch (EltVT.getSimpleVT().SimpleTy) {
>>> +          default:
>>> +            return nullptr;
>>> +          case MVT::i8:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i8areg;
>>> +            break;
>>> +          case MVT::i16:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i16areg;
>>> +            break;
>>> +          case MVT::i32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i32areg;
>>> +            break;
>>> +          case MVT::i64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_i64areg;
>>> +            break;
>>> +          case MVT::f32:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f32areg;
>>> +            break;
>>> +          case MVT::f64:
>>> +            Opcode = NVPTX::INT_PTX_LDU_GLOBAL_f64areg;
>>> +            break;
>>> +          }
>>> +        }
>>> +        break;
>>>        case NVPTXISD::LDGV2:
>>>          switch (EltVT.getSimpleVT().SimpleTy) {
>>>          default:
>>> @@ -1466,7 +1759,7 @@ SDNode *NVPTXDAGToDAGISel::SelectLDGLDUV
>>>    }
>>>
>>>    MachineSDNode::mmo_iterator MemRefs0 = MF->allocateMemRefsArray(1);
>>> -  MemRefs0[0] = cast<MemSDNode>(N)->getMemOperand();
>>> +  MemRefs0[0] = Mem->getMemOperand();
>>>    cast<MachineSDNode>(LD)->setMemRefs(MemRefs0, MemRefs0 + 1);
>>>
>>>    return LD;
>>>
>>> Modified: llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/NV
>>> PTX/NVPTXISelDAGToDAG.h?rev=211939&r1=211938&r2=211939&view=diff
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.h (original)
>>> +++ llvm/trunk/lib/Target/NVPTX/NVPTXISelDAGToDAG.h Fri Jun 27 13:35:51
>>> 2014
>>> @@ -59,10 +59,11 @@ private:
>>>
>>>    SDNode *Select(SDNode *N) override;
>>>    SDNode *SelectIntrinsicNoChain(SDNode *N);
>>> +  SDNode *SelectIntrinsicChain(SDNode *N);
>>>    SDNode *SelectTexSurfHandle(SDNode *N);
>>>    SDNode *SelectLoad(SDNode *N);
>>>    SDNode *SelectLoadVector(SDNode *N);
>>> -  SDNode *SelectLDGLDUVector(SDNode *N);
>>> +  SDNode *SelectLDGLDU(SDNode *N);
>>>    SDNode *SelectStore(SDNode *N);
>>>    SDNode *SelectStoreVector(SDNode *N);
>>>    SDNode *SelectLoadParam(SDNode *N);
>>>
>>> Modified: llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/NV
>>> PTX/NVPTXISelLowering.cpp?rev=211939&r1=211938&r2=211939&view=diff
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp (original)
>>> +++ llvm/trunk/lib/Target/NVPTX/NVPTXISelLowering.cpp Fri Jun 27
>>> 13:35:51 2014
>>> @@ -2363,22 +2363,62 @@ bool NVPTXTargetLowering::getTgtMemIntri
>>>
>>>    case Intrinsic::nvvm_ldu_global_i:
>>>    case Intrinsic::nvvm_ldu_global_f:
>>> -  case Intrinsic::nvvm_ldu_global_p:
>>> +  case Intrinsic::nvvm_ldu_global_p: {
>>>
>>>      Info.opc = ISD::INTRINSIC_W_CHAIN;
>>>      if (Intrinsic == Intrinsic::nvvm_ldu_global_i)
>>>        Info.memVT = getValueType(I.getType());
>>> -    else if (Intrinsic == Intrinsic::nvvm_ldu_global_p)
>>> +    else if(Intrinsic == Intrinsic::nvvm_ldu_global_p)
>>> +      Info.memVT = getPointerTy();
>>> +    else
>>>        Info.memVT = getValueType(I.getType());
>>> +    Info.ptrVal = I.getArgOperand(0);
>>> +    Info.offset = 0;
>>> +    Info.vol = 0;
>>> +    Info.readMem = true;
>>> +    Info.writeMem = false;
>>> +
>>> +    // alignment is available as metadata.
>>> +    // Grab it and set the alignment.
>>> +    assert(I.hasMetadataOtherThanDebugLoc() && "Must have alignment
>>> metadata");
>>> +    MDNode *AlignMD = I.getMetadata("align");
>>> +    assert(AlignMD && "Must have a non-null MDNode");
>>> +    assert(AlignMD->getNumOperands() == 1 && "Must have a single
>>> operand");
>>> +    Value *Align = AlignMD->getOperand(0);
>>> +    int64_t Alignment = cast<ConstantInt>(Align)->getZExtValue();
>>> +    Info.align = Alignment;
>>> +
>>> +    return true;
>>> +  }
>>> +  case Intrinsic::nvvm_ldg_global_i:
>>> +  case Intrinsic::nvvm_ldg_global_f:
>>> +  case Intrinsic::nvvm_ldg_global_p: {
>>> +
>>> +    Info.opc = ISD::INTRINSIC_W_CHAIN;
>>> +    if (Intrinsic == Intrinsic::nvvm_ldg_global_i)
>>> +      Info.memVT = getValueType(I.getType());
>>> +    else if(Intrinsic == Intrinsic::nvvm_ldg_global_p)
>>> +      Info.memVT = getPointerTy();
>>>      else
>>> -      Info.memVT = MVT::f32;
>>> +      Info.memVT = getValueType(I.getType());
>>>      Info.ptrVal = I.getArgOperand(0);
>>>      Info.offset = 0;
>>>      Info.vol = 0;
>>>      Info.readMem = true;
>>>      Info.writeMem = false;
>>> -    Info.align = 0;
>>> +
>>> +    // alignment is available as metadata.
>>> +    // Grab it and set the alignment.
>>> +    assert(I.hasMetadataOtherThanDebugLoc() && "Must have alignment
>>> metadata");
>>> +    MDNode *AlignMD = I.getMetadata("align");
>>> +    assert(AlignMD && "Must have a non-null MDNode");
>>> +    assert(AlignMD->getNumOperands() == 1 && "Must have a single
>>> operand");
>>> +    Value *Align = AlignMD->getOperand(0);
>>> +    int64_t Alignment = cast<ConstantInt>(Align)->getZExtValue();
>>> +    Info.align = Alignment;
>>> +
>>>      return true;
>>> +  }
>>>
>>>    case Intrinsic::nvvm_tex_1d_v4f32_i32:
>>>    case Intrinsic::nvvm_tex_1d_v4f32_f32:
>>>
>>> Modified: llvm/trunk/lib/Target/NVPTX/NVPTXIntrinsics.td
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/NV
>>> PTX/NVPTXIntrinsics.td?rev=211939&r1=211938&r2=211939&view=diff
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/lib/Target/NVPTX/NVPTXIntrinsics.td (original)
>>> +++ llvm/trunk/lib/Target/NVPTX/NVPTXIntrinsics.td Fri Jun 27 13:35:51
>>> 2014
>>> @@ -1374,67 +1374,33 @@ def INT_PTX_SREG_WARPSIZE : F_SREG<"mov.
>>>  // Support for ldu on sm_20 or later
>>>  //-----------------------------------
>>>
>>> -def ldu_i8 : PatFrag<(ops node:$ptr), (int_nvvm_ldu_global_i
>>> node:$ptr), [{
>>> -  MemIntrinsicSDNode *M = cast<MemIntrinsicSDNode>(N);
>>> -  return M->getMemoryVT() == MVT::i8;
>>> -}]>;
>>> -
>>>  // Scalar
>>> -// @TODO: Revisit this, Changed imemAny to imem
>>> -multiclass LDU_G<string TyStr, NVPTXRegClass regclass, Intrinsic IntOp>
>>> {
>>> -  def areg: NVPTXInst<(outs regclass:$result), (ins Int32Regs:$src),
>>> -               !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int32Regs:$src))]>,
>>> Requires<[hasLDU]>;
>>> -  def areg64: NVPTXInst<(outs regclass:$result), (ins Int64Regs:$src),
>>> -               !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int64Regs:$src))]>,
>>> Requires<[hasLDU]>;
>>> - def avar:  NVPTXInst<(outs regclass:$result), (ins imem:$src),
>>> -               !strconcat("ldu.global.", TyStr),
>>> -                [(set regclass:$result, (IntOp (Wrapper
>>> tglobaladdr:$src)))]>,
>>> -                Requires<[hasLDU]>;
>>> - def ari :  NVPTXInst<(outs regclass:$result), (ins MEMri:$src),
>>> -               !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri:$src))]>,
>>> Requires<[hasLDU]>;
>>> - def ari64 :  NVPTXInst<(outs regclass:$result), (ins MEMri64:$src),
>>> -               !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri64:$src))]>,
>>> Requires<[hasLDU]>;
>>> -}
>>> -
>>> -multiclass LDU_G_NOINTRIN<string TyStr, NVPTXRegClass regclass, PatFrag
>>> IntOp> {
>>> +multiclass LDU_G<string TyStr, NVPTXRegClass regclass> {
>>>    def areg: NVPTXInst<(outs regclass:$result), (ins Int32Regs:$src),
>>>                 !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int32Regs:$src))]>,
>>> Requires<[hasLDU]>;
>>> +                      []>, Requires<[hasLDU]>;
>>>    def areg64: NVPTXInst<(outs regclass:$result), (ins Int64Regs:$src),
>>>                 !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int64Regs:$src))]>,
>>> Requires<[hasLDU]>;
>>> - def avar:  NVPTXInst<(outs regclass:$result), (ins imem:$src),
>>> +                        []>, Requires<[hasLDU]>;
>>> + def avar:  NVPTXInst<(outs regclass:$result), (ins imemAny:$src),
>>>                 !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp (Wrapper tglobaladdr:$src)))]>,
>>> -         Requires<[hasLDU]>;
>>> +                      []>, Requires<[hasLDU]>;
>>>   def ari :  NVPTXInst<(outs regclass:$result), (ins MEMri:$src),
>>>                 !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri:$src))]>,
>>> Requires<[hasLDU]>;
>>> +                      []>, Requires<[hasLDU]>;
>>>   def ari64 :  NVPTXInst<(outs regclass:$result), (ins MEMri64:$src),
>>>                 !strconcat("ldu.global.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri64:$src))]>,
>>> Requires<[hasLDU]>;
>>> +                        []>, Requires<[hasLDU]>;
>>>  }
>>>
>>> -defm INT_PTX_LDU_GLOBAL_i8  : LDU_G_NOINTRIN<"u8 \t$result, [$src];",
>>> Int16Regs,
>>> -                                             ldu_i8>;
>>> -defm INT_PTX_LDU_GLOBAL_i16 : LDU_G<"u16 \t$result, [$src];", Int16Regs,
>>> -int_nvvm_ldu_global_i>;
>>> -defm INT_PTX_LDU_GLOBAL_i32 : LDU_G<"u32 \t$result, [$src];", Int32Regs,
>>> -int_nvvm_ldu_global_i>;
>>> -defm INT_PTX_LDU_GLOBAL_i64 : LDU_G<"u64 \t$result, [$src];", Int64Regs,
>>> -int_nvvm_ldu_global_i>;
>>> -defm INT_PTX_LDU_GLOBAL_f32 : LDU_G<"f32 \t$result, [$src];",
>>> Float32Regs,
>>> -int_nvvm_ldu_global_f>;
>>> -defm INT_PTX_LDU_GLOBAL_f64 : LDU_G<"f64 \t$result, [$src];",
>>> Float64Regs,
>>> -int_nvvm_ldu_global_f>;
>>> -defm INT_PTX_LDU_GLOBAL_p32 : LDU_G<"u32 \t$result, [$src];", Int32Regs,
>>> -int_nvvm_ldu_global_p>;
>>> -defm INT_PTX_LDU_GLOBAL_p64 : LDU_G<"u64 \t$result, [$src];", Int64Regs,
>>> -int_nvvm_ldu_global_p>;
>>> +defm INT_PTX_LDU_GLOBAL_i8  : LDU_G<"u8 \t$result, [$src];", Int16Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_i16 : LDU_G<"u16 \t$result, [$src];",
>>> Int16Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_i32 : LDU_G<"u32 \t$result, [$src];",
>>> Int32Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_i64 : LDU_G<"u64 \t$result, [$src];",
>>> Int64Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_f32 : LDU_G<"f32 \t$result, [$src];",
>>> Float32Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_f64 : LDU_G<"f64 \t$result, [$src];",
>>> Float64Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_p32 : LDU_G<"u32 \t$result, [$src];",
>>> Int32Regs>;
>>> +defm INT_PTX_LDU_GLOBAL_p64 : LDU_G<"u64 \t$result, [$src];",
>>> Int64Regs>;
>>>
>>>  // vector
>>>
>>> @@ -1504,65 +1470,40 @@ defm INT_PTX_LDU_G_v4f32_ELE
>>>  // Support for ldg on sm_35 or later
>>>  //-----------------------------------
>>>
>>> -def ldg_i8 : PatFrag<(ops node:$ptr), (int_nvvm_ldg_global_i
>>> node:$ptr), [{
>>> -  MemIntrinsicSDNode *M = cast<MemIntrinsicSDNode>(N);
>>> -  return M->getMemoryVT() == MVT::i8;
>>> -}]>;
>>> -
>>> -multiclass LDG_G<string TyStr, NVPTXRegClass regclass, Intrinsic IntOp>
>>> {
>>> -  def areg: NVPTXInst<(outs regclass:$result), (ins Int32Regs:$src),
>>> -               !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int32Regs:$src))]>,
>>> Requires<[hasLDG]>;
>>> -  def areg64: NVPTXInst<(outs regclass:$result), (ins Int64Regs:$src),
>>> -               !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int64Regs:$src))]>,
>>> Requires<[hasLDG]>;
>>> - def avar:  NVPTXInst<(outs regclass:$result), (ins imem:$src),
>>> -               !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp (Wrapper tglobaladdr:$src)))]>,
>>> -         Requires<[hasLDG]>;
>>> - def ari :  NVPTXInst<(outs regclass:$result), (ins MEMri:$src),
>>> -               !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri:$src))]>,
>>> Requires<[hasLDG]>;
>>> - def ari64 :  NVPTXInst<(outs regclass:$result), (ins MEMri64:$src),
>>> -               !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri64:$src))]>,
>>> Requires<[hasLDG]>;
>>> -}
>>> -
>>> -multiclass LDG_G_NOINTRIN<string TyStr, NVPTXRegClass regclass, PatFrag
>>> IntOp> {
>>> +multiclass LDG_G<string TyStr, NVPTXRegClass regclass> {
>>>    def areg: NVPTXInst<(outs regclass:$result), (ins Int32Regs:$src),
>>>                 !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int32Regs:$src))]>,
>>> Requires<[hasLDG]>;
>>> +                      []>, Requires<[hasLDG]>;
>>>    def areg64: NVPTXInst<(outs regclass:$result), (ins Int64Regs:$src),
>>>                 !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp Int64Regs:$src))]>,
>>> Requires<[hasLDG]>;
>>> - def avar:  NVPTXInst<(outs regclass:$result), (ins imem:$src),
>>> +                        []>, Requires<[hasLDG]>;
>>> + def avar:  NVPTXInst<(outs regclass:$result), (ins imemAny:$src),
>>>                 !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp (Wrapper tglobaladdr:$src)))]>,
>>> -        Requires<[hasLDG]>;
>>> +                      []>, Requires<[hasLDG]>;
>>>   def ari :  NVPTXInst<(outs regclass:$result), (ins MEMri:$src),
>>>                 !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri:$src))]>, Requires<[hasLDG]>;
>>> +                      []>, Requires<[hasLDG]>;
>>>   def ari64 :  NVPTXInst<(outs regclass:$result), (ins MEMri64:$src),
>>>                 !strconcat("ld.global.nc.", TyStr),
>>> -         [(set regclass:$result, (IntOp ADDRri64:$src))]>, Requires<[hasLDG]>;
>>> +                        []>, Requires<[hasLDG]>;
>>>  }
>>>
>>>  defm INT_PTX_LDG_GLOBAL_i8
>>> -  : LDG_G_NOINTRIN<"u8 \t$result, [$src];",  Int16Regs, ldg_i8>;
>>> +  : LDG_G<"u8 \t$result, [$src];", Int16Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_i16
>>> -  : LDG_G<"u16 \t$result, [$src];", Int16Regs,   int_nvvm_ldg_global_i>;
>>> +  : LDG_G<"u16 \t$result, [$src];", Int16Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_i32
>>> -  : LDG_G<"u32 \t$result, [$src];", Int32Regs,   int_nvvm_ldg_global_i>;
>>> +  : LDG_G<"u32 \t$result, [$src];", Int32Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_i64
>>> -  : LDG_G<"u64 \t$result, [$src];", Int64Regs,   int_nvvm_ldg_global_i>;
>>> +  : LDG_G<"u64 \t$result, [$src];", Int64Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_f32
>>> -  : LDG_G<"f32 \t$result, [$src];", Float32Regs, int_nvvm_ldg_global_f>;
>>> +  : LDG_G<"f32 \t$result, [$src];", Float32Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_f64
>>> -  : LDG_G<"f64 \t$result, [$src];", Float64Regs, int_nvvm_ldg_global_f>;
>>> +  : LDG_G<"f64 \t$result, [$src];", Float64Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_p32
>>> -  : LDG_G<"u32 \t$result, [$src];", Int32Regs,   int_nvvm_ldg_global_p>;
>>> +  : LDG_G<"u32 \t$result, [$src];", Int32Regs>;
>>>  defm INT_PTX_LDG_GLOBAL_p64
>>> -  : LDG_G<"u64 \t$result, [$src];", Int64Regs,   int_nvvm_ldg_global_p>;
>>> +  : LDG_G<"u64 \t$result, [$src];", Int64Regs>;
>>>
>>>  // vector
>>>
>>>
>>> Modified: llvm/trunk/test/CodeGen/NVPTX/ldu-i8.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/NVPTX/ldu-i8.ll?rev=211939&r1=211938&r2=211939&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/NVPTX/ldu-i8.ll (original)
>>> +++ llvm/trunk/test/CodeGen/NVPTX/ldu-i8.ll Fri Jun 27 13:35:51 2014
>>> @@ -2,13 +2,15 @@
>>>
>>>  target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
>>>
>>> -declare i8 @llvm.nvvm.ldu.global.i.i8(i8*)
>>> +declare i8 @llvm.nvvm.ldu.global.i.i8.p0i8(i8*)
>>>
>>>  define i8 @foo(i8* %a) {
>>>  ; Ensure we properly truncate off the high-order 24 bits
>>>  ; CHECK:        ldu.global.u8
>>>  ; CHECK:        cvt.u32.u16
>>>  ; CHECK:        and.b32         %r{{[0-9]+}}, %r{{[0-9]+}}, 255
>>> -  %val = tail call i8 @llvm.nvvm.ldu.global.i.i8(i8* %a)
>>> +  %val = tail call i8 @llvm.nvvm.ldu.global.i.i8.p0i8(i8* %a), !align !0
>>>    ret i8 %val
>>>  }
>>> +
>>> +!0 = metadata !{i32 4}
>>>
>>> Added: llvm/trunk/test/CodeGen/NVPTX/ldu-ldg.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/NVPTX/ldu-ldg.ll?rev=211939&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/NVPTX/ldu-ldg.ll (added)
>>> +++ llvm/trunk/test/CodeGen/NVPTX/ldu-ldg.ll Fri Jun 27 13:35:51 2014
>>> @@ -0,0 +1,40 @@
>>> +; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s
>>> +
>>> +
>>> +declare i8 @llvm.nvvm.ldu.global.i.i8.p1i8(i8 addrspace(1)* %ptr)
>>> +declare i32 @llvm.nvvm.ldu.global.i.i32.p1i32(i32 addrspace(1)* %ptr)
>>> +declare i8 @llvm.nvvm.ldg.global.i.i8.p1i8(i8 addrspace(1)* %ptr)
>>> +declare i32 @llvm.nvvm.ldg.global.i.i32.p1i32(i32 addrspace(1)* %ptr)
>>> +
>>> +
>>> +; CHECK: func0
>>> +define i8 @func0(i8 addrspace(1)* %ptr) {
>>> +; ldu.global.u8
>>> +  %val = tail call i8 @llvm.nvvm.ldu.global.i.i8.p1i8(i8 addrspace(1)* %ptr), !align !0
>>> +  ret i8 %val
>>> +}
>>> +
>>> +; CHECK: func1
>>> +define i32 @func1(i32 addrspace(1)* %ptr) {
>>> +; ldu.global.u32
>>> +  %val = tail call i32 @llvm.nvvm.ldu.global.i.i32.p1i32(i32 addrspace(1)* %ptr), !align !0
>>> +  ret i32 %val
>>> +}
>>> +
>>> +; CHECK: func2
>>> +define i8 @func2(i8 addrspace(1)* %ptr) {
>>> +; ld.global.nc.u8
>>> +  %val = tail call i8 @llvm.nvvm.ldg.global.i.i8.p1i8(i8 addrspace(1)* %ptr), !align !0
>>> +  ret i8 %val
>>> +}
>>> +
>>> +; CHECK: func3
>>> +define i32 @func3(i32 addrspace(1)* %ptr) {
>>> +; ld.global.nc.u32
>>> +  %val = tail call i32 @llvm.nvvm.ldg.global.i.i32.p1i32(i32 addrspace(1)* %ptr), !align !0
>>> +  ret i32 %val
>>> +}
>>> +
>>> +
>>> +
>>> +!0 = metadata !{i32 4}
>>>
>>> Modified: llvm/trunk/test/CodeGen/NVPTX/ldu-reg-plus-offset.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/NVPTX/ldu-reg-plus-offset.ll?rev=211939&r1=211938&r2=211939&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/NVPTX/ldu-reg-plus-offset.ll (original)
>>> +++ llvm/trunk/test/CodeGen/NVPTX/ldu-reg-plus-offset.ll Fri Jun 27 13:35:51 2014
>>> @@ -7,9 +7,9 @@ define void @reg_plus_offset(i32* %a) {
>>>  ; CHECK:        ldu.global.u32  %r{{[0-9]+}}, [%r{{[0-9]+}}+32];
>>>  ; CHECK:        ldu.global.u32  %r{{[0-9]+}}, [%r{{[0-9]+}}+36];
>>>    %p2 = getelementptr i32* %a, i32 8
>>> -  %t1 = call i32 @llvm.nvvm.ldu.global.i.i32(i32* %p2), !align !1
>>> +  %t1 = call i32 @llvm.nvvm.ldu.global.i.i32.p0i32(i32* %p2), !align !1
>>>    %p3 = getelementptr i32* %a, i32 9
>>> -  %t2 = call i32 @llvm.nvvm.ldu.global.i.i32(i32* %p3), !align !1
>>> +  %t2 = call i32 @llvm.nvvm.ldu.global.i.i32.p0i32(i32* %p3), !align !1
>>>    %t3 = mul i32 %t1, %t2
>>>    store i32 %t3, i32* %a
>>>    ret void
>>> @@ -17,5 +17,5 @@ define void @reg_plus_offset(i32* %a) {
>>>
>>>  !1 = metadata !{ i32 4 }
>>>
>>> -declare i32 @llvm.nvvm.ldu.global.i.i32(i32*)
>>> +declare i32 @llvm.nvvm.ldu.global.i.i32.p0i32(i32*)
>>>  declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
>>>
>>>
>>
>