[LLVMdev] Matching addsub

Tue Oct 30 09:38:02 PDT 2012

I'd like to resurrect this old thread. Now that we have the TargetTransformInfo infrastructure, and other improvements are being made to our vectorization capabilities, I'd like to tackle this question. In terms of representing asymmetric vector operations, we seem to have several options:

 1. Introduce one or more intrinsics, such as I proposed, and pattern match (or expand) as appropriate. TargetTransformInfo will inform the vectorizer of costs as it will with other intrinsics.
 2. Keep only target intrinsics, but extend TargetTransformInfo to inform the vectorizers regarding how to form the relevant target intrinsics.
 3. Use the DAG combiner to form target-specific ISD nodes for these instructions (as Duncan pointed out, this is how he handles the FHADD instruction). TargetTransformInfo will indicate whether this is likely to happen(?)

Opinions?

Thanks again,
Hal

----- Original Message -----
> From: "Hal Finkel" <hfinkel at anl.gov>
> To: "Dan Gohman" <gohman at apple.com>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Tuesday, October 18, 2011 2:49:21 PM
> Subject: Re: [LLVMdev] Matching addsub
> 
> On Tue, 2011-10-18 at 10:51 -0700, Dan Gohman wrote:
> > On Oct 17, 2011, at 6:40 PM, Hal Finkel wrote:
> > 
> > > On Mon, 2011-10-17 at 17:33 -0700, Dan Gohman wrote:
> > >> On Oct 17, 2011, at 3:40 PM, Hal Finkel wrote:
> > >> 
> > >>> How should I go about matching floating-point addsub-like
> > >>> vector
> > >>> instructions? My first inclination is to write something which
> > >>> matches
> > >>> build_vector 1.0, -1.0, and then use that in combination with a
> > >>> match on
> > >>> fadd, but that does not seem to work. I think this is because
> > >>> BUILD_VECTOR cannot ever be "Legal", and so it is always turned
> > >>> into a
> > >>> constant load before instruction selection.
> > >> 
> > >> Trying to keep a vector producer this naive about the target
> > >> instruction
> > >> set is awkward. If the vector producer doesn't know whether the
> > >> target
> > >> has an addsub or subadd instruction, how is it to know the best
> > >> way to
> > >> expand complex multiplication? It's likely to get suboptimal
> > >> code in many
> > >> cases.
> > >> 
> > >> On the other hand, a producer that knows that the target has
> > >> certain
> > >> instructions could pretty easily just use appropriate intrinsics
> > >> that can
> > >> be mapped directly to the desired instructions. This way,
> > >> there's no need
> > >> for it to play pictionary with the backend, drawing out its
> > >> desired
> > >> semantics in terms of primitive operations and expecting codegen
> > >> to
> > >> rediscover what was meant by pattern-matching.
> > >> 
> > > 
> > > In general, I agree with you. It is always questionable how far
> > > to
> > > attempt to go with idiom recognition. However, some kind of
> > > combination
> > > add/subtract is a common vector operation, and so I seem to be
> > > left with
> > > three choices:
> > > 
> > > 1. Decompose the operation and then attempt to recognize the
> > > decomposition.
> > > 2. Add an additional LLVM instruction.
> > > 3. Add a number of target-specific special cases into the
> > > higher-level
> > > code.
> > > 
> > > I am not sure which is better, but I'd prefer to stay away from
> > > choice
> > > (3) as much as practical in favor of one of the first two
> > > options. Would
> > > you support adding some kind of vector_faddsub LLVM instruction?
> > 
> > I assume you mean an operator that subtracts even elements and adds
> > odd
> > elements; correct me if I'm wrong.  I agree that this operation is
> > fairly
> > common; I think a general intrinsic to do this would be useful for
> > people
> > working on such targets.
> 
> Yes, that is essentially what I mean.
> 
> > 
> > > 
> > > Also, there is a precedent, in some sense, for choices (1) and
> > > (2) in
> > > how fneg %a is serialized (as fsub -0.0, %a). Perhaps we could
> > > recognize
> > > fadd (fmul <-1.0, 1.0>), %b and turn it into something else for
> > > instruction selection in a similar way.
> > 
> > The problem with "fadd(fmul <-1.0,1.0>, %a), %b" is that it
> > requires a
> > separate fmul instruction at the LLVM IR level, and you never want
> > to
> > actually execute an fmul.  If optimization obscures the pattern,
> > you end
> > up with an unwanted fmul.
> 
> I certainly agree that this is less than ideal. Adding a new
> intrinsic
> would be better.
> 
> I would like to add two, one would be something that adds the odd
> elements and subtracts the even elements, maybe:
> @llvm.addsub.*
> And another is one which negates only the even elements, maybe
> @llvm.altneg.* is a good name? (actually, only this one is really
> necessary, but would we like both for clarity or some other reason?)
> 
> In combination with permutations, this should allow the encoding of
> the
> full range of asymmetric-fma-type operations (some of which can be
> directly represented on some platforms), while remaining fairly
> generic
> any easy to emulate.
> 
> What do you think?
> 
> > 
> > The fneg case is simpler because fsub with a constant is just one
> > instruction
> > in LLVM IR.  FWIW, the lack of a proper way to do fneg also happens
> > to
> > be a bug.
> 
> Is this something, then, that we would like to fix? Since fneg is
> already a selection node, it would seem not to be a huge change to
> make
> it an instruction. Alternatively, we could just alter the parser and
> printers to support reading and writing 'fneg'.
> 
>  -Hal
> 
> > 
> > Dan
> > 
> 
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory