[LLVMdev] Adding masked vector load and store intrinsics
Demikhovsky, Elena
elena.demikhovsky at intel.com
Tue Oct 28 01:43:01 PDT 2014
Yes, David is right. We should cover all instructions that can trap on NaN.
I just counted all FP instructions, including conversions: fadd, fsub, ..., fptrunc, fpext, ..., sitofp, fcmp, fma (~13) + gather/scatter (2) + load/store (2).
I'm not sure about integer divide and remainder, because we don't have a solution in the Intel Architecture today. On the other hand, a library may support these operations in masked vector form and do it faster than a scalar sequence.
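To make the "scalar sequence" concrete, here is a minimal sketch in C of what a masked integer divide does lane by lane. The function name, the byte-per-lane mask layout, and the pass-through operand are assumptions made for illustration; they are not an existing LLVM intrinsic or Intel library interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM or Intel API): emulates a masked
 * vector signed divide with a scalar loop. A lane is divided only when
 * its mask byte is nonzero; otherwise the pass-through value is copied,
 * so a zero divisor in an inactive lane is never evaluated. */
void masked_sdiv_i32(int32_t *dst, const int32_t *num, const int32_t *den,
                     const uint8_t *mask, const int32_t *passthru, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (mask[i]) {
            dst[i] = num[i] / den[i];   /* active lane: perform the divide */
        } else {
            dst[i] = passthru[i];       /* inactive lane: no divide at all */
        }
    }
}
```

A library routine could keep the same contract while replacing the per-lane branch with something faster, which is the case mentioned above.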
- Elena
-----Original Message-----
From: dag at cray.com [mailto:dag at cray.com]
Sent: Monday, October 27, 2014 19:39
To: Adam Nemet
Cc: Demikhovsky, Elena; Hal Finkel; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
Adam Nemet <anemet at apple.com> writes:
> Can you please elaborate on the list. I don’t see how 20 intrinsics
> would cover “All FP”. But do you really have to do all FP or only
> instructions that can trap with LLVM (e.g. division by zero)?
We need intrinsics for all the FP operations. Any operand that is a signaling NaN, or for some operations even a quiet NaN, will trap. LLVM needs masking to protect itself from that when vectorizing certain kinds of loops.
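As a rough illustration of such a loop, consider a guarded add where the operands of the guarded-off iterations may hold arbitrary bit patterns, NaNs included. The sketch below, in C, processes one vector-width chunk in the two phases a vectorizer would use: build the mask, then apply the operation only to active lanes. The names `VF`, `masked_fadd_chunk`, and `valid` are hypothetical, and the scalar loops only stand in for masked vector instructions.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vector width for this sketch */

/* Hypothetical emulation of a masked FP add over one chunk of VF lanes.
 * Source loop being modeled:
 *     for (i = 0; i < n; ++i)
 *         if (valid[i]) out[i] = a[i] + b[i];
 * Returns the number of lanes on which the add was actually performed. */
size_t masked_fadd_chunk(float *out, const float *a, const float *b,
                         const unsigned char *valid)
{
    unsigned char mask[VF];
    size_t active = 0;

    /* Phase 1: evaluate the loop condition for every lane. */
    for (size_t i = 0; i < VF; ++i)
        mask[i] = (valid[i] != 0);

    /* Phase 2: masked operation. Inactive lanes are neither read as FP
     * operands nor written, so whatever they contain cannot trap. */
    for (size_t i = 0; i < VF; ++i) {
        if (mask[i]) {
            out[i] = a[i] + b[i];
            ++active;
        }
    }
    return active;
}
```

An unmasked vector add would instead operate on all `VF` lanes and select the results afterwards, which is where the unwanted trap could come from when FP exceptions are enabled.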
-David