[PATCH] D11218: AVX512 : Integer Truncate with/without saturation support

Thu Jul 16 03:14:06 PDT 2015

ab added inline comments.

================
Comment at: include/llvm/IR/IntrinsicsX86.td:5294
@@ +5293,3 @@
+let TargetPrefix = "x86" in {
+  def int_x86_avx512_mask_pmov_qb_128 :
+          GCCBuiltin<"__builtin_ia32_pmovqb128_mask">,
----------------
delena wrote:
> hfinkel wrote:
> > delena wrote:
> > > ab wrote:
> > > > So, this file is starting to get pretty unwieldy.  Why not use multiclasses?
> > > Ahmed, we don't want to change the format of this file now. We need to add hundreds of intrinsics, and we generate these lines. Multiclasses would have to be written by hand, which would significantly slow down the process.
> > > 
> > Could you contribute the code that you use to generate the intrinsic definitions?
> > 
> > I'm not entirely sure that a multiclass would help here (as I recall, TableGen does not have a 'map' data structure, and that's probably what you want here). However, to the extent possible, we should have human-maintainable code. TableGen alone might not be the best tool for this right now, but if so, we should have a detailed understanding of why.
> > 
> We can't contribute the generator. It is an informal, semiautomatic process based on parsing intrinsics headers. But it significantly simplifies our life when we need to add this many intrinsics.
At least some of it looks easily multiclass-able, no?  For instance, the _mem_ variants, or the saturation kind.  That's already a factor of 6 (2 memory forms x 3 saturation kinds).  We can also have a multiclass for the size variants.  So, even without being smart with types, you can do:

```
multiclass int_avx512_mask_pmov_base<string SatKind, string Size,
                                     LLVMType InVT, LLVMType OutVT, LLVMType ScVT> {
  def int_x86_avx512_mask_pmov#SatKind#_#NAME#_#Size :
          GCCBuiltin<"__builtin_ia32_pmov"#SatKind#NAME#Size#"_mask">,
          Intrinsic<[OutVT],
                    [InVT, OutVT, ScVT],
                    [IntrNoMem]>;
  def int_x86_avx512_mask_pmov#SatKind#_#NAME#_mem_#Size :
          GCCBuiltin<"__builtin_ia32_pmov"#SatKind#NAME#Size#"mem_mask">,
          Intrinsic<[],
                    [llvm_ptr_ty, InVT, ScVT],
                    [IntrReadWriteArgMem]>;
}

multiclass int_x86_avx512_mask_pmov_base_sized<string Size, LLVMType InVT, LLVMType OutVT,
                                               LLVMType ScVT> {
  defm NAME#"" : int_avx512_mask_pmov_base<  "", Size, InVT, OutVT, ScVT>;
  defm NAME#"" : int_avx512_mask_pmov_base< "s", Size, InVT, OutVT, ScVT>;
  defm NAME#"" : int_avx512_mask_pmov_base<"us", Size, InVT, OutVT, ScVT>;
}

multiclass int_x86_avx512_mask_pmov_all<LLVMType ScVT128, LLVMType InVT128, LLVMType OutVT128,
                                        LLVMType ScVT256, LLVMType InVT256, LLVMType OutVT256,
                                        LLVMType ScVT512, LLVMType InVT512, LLVMType OutVT512> {
  defm NAME : int_x86_avx512_mask_pmov_base_sized<"128", InVT128, OutVT128, ScVT128>;
  defm NAME : int_x86_avx512_mask_pmov_base_sized<"256", InVT256, OutVT256, ScVT256>;
  defm NAME : int_x86_avx512_mask_pmov_base_sized<"512", InVT512, OutVT512, ScVT512>;
}

let TargetPrefix = "x86" in {
  defm qb : int_x86_avx512_mask_pmov_all<llvm_i8_ty,  llvm_v2i64_ty,  llvm_v16i8_ty,
                                         llvm_i8_ty,  llvm_v4i64_ty,  llvm_v16i8_ty,
                                         llvm_i8_ty,  llvm_v8i64_ty,  llvm_v16i8_ty>;
  defm qw : int_x86_avx512_mask_pmov_all<llvm_i8_ty,  llvm_v2i64_ty,  llvm_v8i16_ty,
                                         llvm_i8_ty,  llvm_v4i64_ty,  llvm_v8i16_ty,
                                         llvm_i8_ty,  llvm_v8i64_ty,  llvm_v8i16_ty>;
  defm qd : int_x86_avx512_mask_pmov_all<llvm_i8_ty,  llvm_v2i64_ty,  llvm_v4i32_ty,
                                         llvm_i8_ty,  llvm_v4i64_ty,  llvm_v4i32_ty,
                                         llvm_i8_ty,  llvm_v8i64_ty,  llvm_v8i32_ty>;
  defm db : int_x86_avx512_mask_pmov_all<llvm_i8_ty,  llvm_v4i32_ty,  llvm_v16i8_ty,
                                         llvm_i8_ty,  llvm_v8i32_ty,  llvm_v16i8_ty,
                                         llvm_i16_ty, llvm_v16i32_ty, llvm_v16i8_ty>;
  defm dw : int_x86_avx512_mask_pmov_all<llvm_i8_ty,  llvm_v4i32_ty,  llvm_v8i16_ty,
                                         llvm_i8_ty,  llvm_v8i32_ty,  llvm_v8i16_ty,
                                         llvm_i16_ty, llvm_v16i32_ty, llvm_v16i16_ty>;
  defm wb : int_x86_avx512_mask_pmov_all<llvm_i8_ty,  llvm_v8i16_ty,  llvm_v16i8_ty,
                                         llvm_i16_ty, llvm_v16i16_ty, llvm_v16i8_ty,
                                         llvm_i32_ty, llvm_v32i16_ty, llvm_v32i8_ty>;
}

```

I think this makes the differences between the variants more obvious, though it's not ideal either.  I also think we can handle the types by faking maps (using !cast<>(##)) and passing just the scalar types.
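
To make the "faking maps" idea a bit more concrete, here is a rough, untested sketch.  The helper names (PMovVecTy, PMovMaskTy) are made up for illustration and are not in the patch; the only assumption is that the usual llvm_v<N>i<B>_ty and llvm_i<B>_ty records from Intrinsics.td are visible where this is defined:

```
// Hypothetical helpers, not part of the patch: build the name of an existing
// type record from its parts, then look the record up with !cast.
class PMovVecTy<int NumElts, int EltBits> {
  // e.g. PMovVecTy<2, 64>.VT resolves to llvm_v2i64_ty
  LLVMType VT = !cast<LLVMType>("llvm_v" # NumElts # "i" # EltBits # "_ty");
}

class PMovMaskTy<int MaskBits> {
  // e.g. PMovMaskTy<8>.VT resolves to llvm_i8_ty
  LLVMType VT = !cast<LLVMType>("llvm_i" # MaskBits # "_ty");
}
```

With something like this, the 128-bit qb row would pass PMovMaskTy<8>.VT, PMovVecTy<2, 64>.VT and PMovVecTy<16, 8>.VT instead of spelling out the three records.  The output element count still has to be given explicitly, because the narrow results are widened to a full 128-bit vector.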

I admit trunc/ext is one of the worst examples, though (but only because of the widening of the smaller types, I think).  For instance, a few hundred lines above, the arithmetic instructions seem even easier to factor.

Repository:
  rL LLVM

http://reviews.llvm.org/D11218