[PATCH] D16137: AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation.

Sun Jan 17 04:44:17 PST 2016

igorb added a comment.

In http://reviews.llvm.org/D16137#327385, @mbodart wrote:

> I'm not sure I understand why this is just now becoming an issue.
>
> Is the need for an X86-specific override of LowerOperationWrapper driven by an existing problem
>  with SINT_TO_FP lowering, or by a problem that is only exposed when adding the new masked load intrinsics?


The problem is exposed only when adding the new masked load intrinsics. In the previous implementation only one result value was taken (the chain was dropped).

> Is the chain being dropped for both SINT_TO_FP and masked loads, or just one of them.


Only for SINT_TO_FP (or any other similar node).

> What are the safety consequences of dropping the chain, wrt losing an ordering dependence?


In the SINT_TO_FP case the store->load chain is preserved; as I understand it, the output chain of the LOAD node can be dropped.

> And why is I64 write mask legalization fine for most masked intrinsics, but not the masked loads?
>  Is it because none of the other existing I64-masked intrinsics produce an additional chain result?


Yes, operand type legalization of the masked load intrinsic (packed bytes) is the first case that produces an additional chain result.
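
To make the difference concrete, here is a minimal toy model of the two replacement strategies. It is not LLVM code: ToyResult, replaceFirstOnly, replaceAll and countStale are hypothetical names invented for this sketch, and the node ids are arbitrary. It only illustrates why taking a single value is harmless for a node with one result but leaves a stale chain user for a node that produces a value and a chain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one result of a DAG node (hypothetical, not an LLVM type).
struct ToyResult {
    std::string type;  // e.g. "v64i8" for the value, "ch" for the chain
    int node;          // id of the node that users of this result point at
};

// Models a wrapper that takes only the first lowered value:
// result 0 is redirected, every other result keeps its old producer.
std::vector<ToyResult> replaceFirstOnly(std::vector<ToyResult> users,
                                        const std::vector<ToyResult> &lowered) {
    assert(!lowered.empty());
    if (!users.empty())
        users[0] = lowered[0];
    return users;
}

// Models a wrapper that redirects every result the lowered node provides.
std::vector<ToyResult> replaceAll(std::vector<ToyResult> users,
                                  const std::vector<ToyResult> &lowered) {
    for (std::size_t i = 0; i < users.size() && i < lowered.size(); ++i)
        users[i] = lowered[i];
    return users;
}

// Counts results that still point at the node being replaced.
std::size_t countStale(const std::vector<ToyResult> &users, int oldNode) {
    std::size_t stale = 0;
    for (const ToyResult &r : users)
        if (r.node == oldNode)
            ++stale;
    return stale;
}
```

In this model, a node with results {value, chain} that is replaced via replaceFirstOnly still has one result pointing at the old node, while replaceAll leaves none. A node with a single value result has nothing stale after either strategy.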

> A concrete example or two, showing the DAG snippets during legalization , would be helpful.


   SINT_TO_FP DAG snippet
  
   t6: f64 = sint_to_fp t5
      t5: i64 = build_pair t2, t4
        t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
          t0: ch = EntryToken
        t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
  
  Transformed to
  
    t13: f64,ch = X86ISD::FILD<LD8[FixedStack0]> t11, FrameIndex:i32<0>, ValueType:ch:i64
      t11: ch = store<ST8[FixedStack0](align=4)> t0, t5, FrameIndex:i32<0>, undef:i32
        t0: ch = EntryToken
        t5: i64 = build_pair t2, t4
          t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
          t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1                 
    --------------------------------------------------------
  masked loads snippet
  
    t12: v64i8,ch = llvm.x86.avx512.mask.loadu.b.512<LD64[%x0](align=1)> t0, TargetConstant:i32<4681>, t3, t5, t17
      t0: ch = EntryToken
      t3: i32,ch = load<LD4[FixedStack-1](align=16)> t0, FrameIndex:i32<-1>, undef:i32
      t5: v64i8,ch = CopyFromReg t0, Register:v64i8 %vreg0
      t17: i64,ch = load<LD8[FixedStack-2](align=4)> t0, FrameIndex:i32<-2>, undef:i32
      
  Transformed to
      
    t30: v64i8,ch = masked_load<LD64[%x0](align=1)> t0, t3, t29, t5
      t0: ch = EntryToken
      t3: i32,ch = load<LD4[FixedStack-1](align=16)> t0, FrameIndex:i32<-1>, undef:i32
      t29: v64i1 = concat_vectors t27, t28
        t27: v32i1 = bitcast t24
          t24: i32 = extract_element t17, Constant:i32<0>
            t17: i64,ch = load<LD8[FixedStack-2](align=4)> t0, FrameIndex:i32<-2>, undef:i32
        t28: v32i1 = bitcast t26
          t26: i32 = extract_element t17, Constant:i32<1>
      t5: v64i8,ch = CopyFromReg t0, Register:v64i8 %vreg0 
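
The mask handling in the transformed masked load (two extract_element nodes, two bitcasts to v32i1, one concat_vectors to v64i1) can be sketched in plain C++ as follows. This is only an illustration of the bit arithmetic, not the lowering code itself: the function names are hypothetical, std::bitset stands in for the i1 vectors, and it assumes that element 0 is the low 32 bits of the i64 mask.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 32-bit half of a 64-bit mask, mirroring extract_element.
// Assumption: index 0 is the low half, index 1 the high half.
std::uint32_t extractHalf(std::uint64_t mask, unsigned index) {
    assert(index < 2);
    return static_cast<std::uint32_t>(mask >> (32 * index));
}

// Reinterpret a 32-bit integer as 32 one-bit lanes (the bitcast to v32i1).
std::bitset<32> toLanes(std::uint32_t half) {
    return std::bitset<32>(half);
}

// Concatenate two 32-lane vectors into 64 lanes, low half first
// (the concat_vectors producing v64i1).
std::bitset<64> concatLanes(const std::bitset<32> &lo,
                            const std::bitset<32> &hi) {
    std::bitset<64> out;
    for (std::size_t i = 0; i < 32; ++i) {
        out[i] = lo[i];
        out[i + 32] = hi[i];
    }
    return out;
}

// Full path: split the i64 mask, convert each half, and rebuild 64 lanes.
std::bitset<64> splitAndRebuild(std::uint64_t mask) {
    return concatLanes(toLanes(extractHalf(mask, 0)),
                       toLanes(extractHalf(mask, 1)));
}
```

Under that assumption, splitAndRebuild returns the same 64 lanes as constructing std::bitset<64> directly from the mask, so no mask bits are lost by splitting the i64 operand.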


================
Comment at: lib/Target/X86/X86InstrAVX512.td:2753
@@ -2752,11 +2752,3 @@
                                  HasAVX512>, XS, VEX_W, EVEX_CD8<64, CD8VF>;
 
 def: Pat<(int_x86_avx512_mask_storeu_d_512 addr:$ptr, (v16i32 VR512:$src),
----------------
mbodart wrote:
> Can you please explain why these patterns are being deleted?
These intrinsics are handled by the DAG legalization pass (the lowerINTRINSIC_W_CHAIN() function in X86ISelLowering.cpp).


Repository:
  rL LLVM

http://reviews.llvm.org/D16137