[PATCH] D148021: [Headers][doc] Add FMA intrinsic descriptions

Sun Apr 16 19:31:01 PDT 2023

pengfei accepted this revision.
pengfei added inline comments.
This revision is now accepted and ready to land.

================
Comment at: clang/lib/Headers/fmaintrin.h:22
+/// Computes a multiply-add of 128-bit vectors of [4 x float].
+///    For each element, computes <c> (__A * __B) + __C </c>.
+///
----------------
probinson wrote:
> pengfei wrote:
> > We are using a special format to describute the function in a pseudo code to share it with the intrinsic guide, e.g.,
> > https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/avx512fintrin.h#L9604-L9610
> > https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_i32logather_pd&ig_expand=4077
> > 
> > There's no strong requirement to follow it, but it would be better to adopt a uniform format.
> Is a FOR loop with computed bit offsets really clearer than "For each element" ? Is it valuable to repeat information that can be found in the instruction reference? 
> I can accept answers of "yes" and "yes" because I am not someone who ever deals with vector data, but I would be a little surprised by those answers.
> 
We have internal tools to verify them according to these offsets (but of course not for 100% intrinsics), which means we can trust such information in most time.

The suggestion is based on:
1. Correctness. It's easy to be errorprone for tedious documentations and we don't have other verifications but eyes;
2. Unity. Using the same format is not only better for clean code but also distinct for future developers to follow;

As I said before, I'm not forcing this way. You decide it.

================
Comment at: clang/lib/Headers/fmaintrin.h:26
+///
+/// This intrinsic corresponds to the <c> VFMADD213PS </c> instruction.
+///
----------------
probinson wrote:
> pengfei wrote:
> > It would be odd to user given this is just 1/3 instructions the intrinsic may generate, but I don't have a good idea here.
> I listed the 213 version because that's the one that multiplies the first two operands, and the intrinsic multiplies the first two operands. So it's the instruction that most closely corresponds to the intrinsic.
> We don't guarantee that the "corresponding" instruction is what is actually generated, in general. I know this point has come up before regarding intrinsic descriptions. My thinking is that the "corresponding instruction" gives the reader a place to look in the instruction reference manual, so listing only one is again okay.
Note, intrinsics are force inlined. It's rare user will wrap it in a simple function with the same arguments order. In reality, each version may be generated. https://godbolt.org/z/q7j8Wxrnb

However, I'm not against this approach if we don't have any better way to describe it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148021/new/

https://reviews.llvm.org/D148021