[PATCH] [AVX-512] - Add FMA instruction with Rounding mode
resistor at mac.com
Thu Jan 15 13:10:42 PST 2015
> On Jan 15, 2015, at 2:59 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> On Jan 15, 2015, at 12:40 AM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
>> On Thu, Jan 15, 2015 at 8:42 AM, Elena Demikhovsky
>> <elena.demikhovsky at intel.com> wrote:
>>> Hi Adam,
>>> We'd like to submit this code and proceed.
>>> The goal is to allow setting the rounding mode in intrinsics. The operations in question are FP arithmetic (FADD, FMUL, FSUB, FDIV, FMA) on 512-bit vectors only,
>>> and FP conversions. Nothing more, so there is no interaction with masks.
>>> I see 2 options right now.
>>> 1. Define an additional X86 node type for each intrinsic
>>> 2. Wrap the existing node with ROUNDMODE
>>> The patch implements option 2, which we believe is safe: if ROUNDMODE and FADD become separated, compilation fails with "cannot select".
>>> Do you still have any concerns?
>> FWIW I totally share Adam's concerns, but I'll admit I don't have a
>> better alternative to creating all the specific nodes, and that
>> doesn't seem perfect either.
>> How about - and this is totally handwavy - adding a single new node,
>> having all of the operation, rounding mode, and mask as operands? I
>> don't know if there's precedent, but since you say we're late in the
>> SelectionDAG, your only problem would then be to match these,
>> correct? That might involve some tweaks in tablegen to be able to
>> specify an FP op as an operand, but that doesn't sound too hard. In
>> the SelectionDAG proper, the operation "operand" would simply be a
>> ConstantSDNode, holding the ISD opcode.
>> Say something like:
>> (AVX512OpWithRounding fadd, op1, op2, <rounding mode, ...>)
>> I'm guessing you won't be able to reuse the existing "fadd"/... in
>> tablegen, but adding a few new defs, with a 1-to-1 mapping with the
>> ISD opcode, should do the trick.
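For readers following the thread, the single-node shape proposed above can be modelled outside LLVM. This is only a toy sketch: FPOp, Rounding, OpWithRounding and selectMnemonic are hypothetical names, not LLVM or patch identifiers, and the mask operand is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the ISD opcode and the rounding-mode immediate.
enum class FPOp { FAdd, FSub, FMul, FDiv, FMA };
enum class Rounding { Nearest, Down, Up, TowardZero };

// One node kind for every rounded operation: the wrapped operation travels
// as a constant operand instead of being a separate node.
struct OpWithRounding {
    FPOp op;                    // plays the role of the constant opcode operand
    std::vector<int> operands;  // ids of the value operands (op1, op2, ...)
    Rounding mode;
};

// Number of value operands each operation expects.
std::size_t expectedOperands(FPOp op) { return op == FPOp::FMA ? 3 : 2; }

// Toy "selection": builds a label from the opcode and the rounding mode,
// or returns an empty string when the operand count is wrong.
std::string selectMnemonic(const OpWithRounding &node) {
    static const char *const opNames[] = {"add", "sub", "mul", "div", "fma"};
    static const char *const modeNames[] = {"rn", "rd", "ru", "rz"};
    const auto opIdx = static_cast<std::size_t>(node.op);
    const auto modeIdx = static_cast<std::size_t>(node.mode);
    assert(opIdx < 5 && modeIdx < 4);
    if (node.operands.size() != expectedOperands(node.op)) return "";
    return std::string(opNames[opIdx]) + "." + modeNames[modeIdx];
}
```

Because the operation and its rounding mode live in one node, they cannot be separated by a later transformation, but every matcher has to dispatch on the opcode operand.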
> Naive question: why not match the intrinsics directly, without an intermediate ISD node?
> I assumed that was the way to go when there is a 1-to-1 mapping between an intrinsic and a specific instruction.
I have the same question. This seems way over-engineered to me.
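
To make the 1-to-1 alternative concrete, here is a toy sketch in which each intrinsic maps straight to one instruction, with no intermediate node. The Intrinsic enum and matchDirect are hypothetical names used only for illustration, and the mnemonics are just labels for the table entries.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical intrinsic ids; Unmapped has no pattern on purpose.
enum class Intrinsic { AddRound512, MulRound512, FmaRound512, Unmapped };

// One table entry per intrinsic: the rounding mode stays an ordinary
// immediate operand of the intrinsic, so no wrapper node is needed.
const std::map<Intrinsic, std::string> &directPatterns() {
    static const std::map<Intrinsic, std::string> table = {
        {Intrinsic::AddRound512, "vaddps"},
        {Intrinsic::MulRound512, "vmulps"},
        {Intrinsic::FmaRound512, "vfmadd213ps"},
    };
    return table;
}

// Returns the instruction label for an intrinsic, or an empty string when
// no pattern exists (the analogue of a "cannot select" failure).
std::string matchDirect(Intrinsic id) {
    const auto &table = directPatterns();
    const auto it = table.find(id);
    if (it == table.end()) return "";
    assert(!it->second.empty());  // every table entry must name an instruction
    return it->second;
}
```

The cost is one table entry per intrinsic, which is the same count as option 1 in the quoted message but without adding new node types.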