[PATCH] D153993: [Headers][doc] Add load/store/cmp/cvt intrinsic descriptions to avx2intrin.h

Fri Jun 30 08:15:31 PDT 2023

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM.

================
Comment at: clang/lib/Headers/avx2intrin.h:3474
+///   IF __M[j+31] == 1
+///     result[j+31:j] := Load32(__X+(i*4))
+///   ELSE
----------------
probinson wrote:
> pengfei wrote:
> > probinson wrote:
> > > pengfei wrote:
> > > > probinson wrote:
> > > > > pengfei wrote:
> > > > > > A more intrinsic guide format is `MEM[__X+j:j]`
> > > > > LoadXX is the syntax in the gather intrinsics, e.g. _mm_mask_i32gather_pd. I'd prefer to be consistent.
> > > > I think the problem here is the measurement is easily confusing.
> > > > From C point of view, `__X` is a `int` pointer, so we should `+ i` rather than `i * 4`
> > > > From the other part of the code, we are measuring in bits, but here `i * 4` is a byte offset.
> > > Well, the pseudo-code is clearly not C. If you look at the gather code, it computes a byte address using an offset multiplied by an explicit scale factor. I am doing exactly the same here.
> > > 
> > > The syntax `MEM[__X+j:j]` is mixing a byte address with a bit offset, which I think is more confusing. To be fully consistent, using `[]` with bit offsets only, it should be
> > > ```
> > > k := __X*8 + i*32
> > > result[j+31:j] := MEM[k+31:k]
> > > ```
> > > which I think obscures more than it explains.
> > Yeah, it's not C code here. But we are easy to fall into C concepts, e.g., why assuming __X is measuring in bytes?
> > That's why I think it's clear to make both in bits.
> > I made a mistake here, I wanted to propose `MEM[__X+j+31: __X+j]`. It matches with [[ https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=4057,4058,4059,4053,665,3890,5959,5910,3870,4280&text=_mm256_maskload_epi32 | Intrinsic Guide ]].
> > 
> We assume `__X` is in bytes because that's how addresses work on X86. Adding a bit offset to a byte address makes no sense. I see that is how existing Intel documentation works, which does not make it correct.
> 
> To "make both in bits" means multiplying `__X` by 8, as in the example in my previous comment. Or coming up with a different syntax that makes the difference clear.
> `MEM(__X)[j+31:j]` or even `MEM[__X][j+31:j]` would be preferable.
My intention is to match with Intrinsic Guide as possible. Multiplying by 8 cannot achive it, but I cannot deny `__X` in bytes makes sense.
So I'm fine to use a different syntax.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153993/new/

https://reviews.llvm.org/D153993