[llvm-commits] [RFC] Additions to TargetTransform for operation legality

Fri Oct 12 16:11:01 PDT 2012

----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-commits at cs.uiuc.edu LLVM" <llvm-commits at cs.uiuc.edu>, "Jakob Stoklund Olesen" <stoklund at 2pi.dk>
> Sent: Friday, October 12, 2012 5:37:13 PM
> Subject: Re: [RFC] Additions to TargetTransform for operation legality
> 
> 
> On Oct 12, 2012, at 2:31 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > ----- Original Message -----
> >> From: "Nadav Rotem" <nrotem at apple.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: "llvm-commits at cs.uiuc.edu LLVM" <llvm-commits at cs.uiuc.edu>,
> >> "Jakob Stoklund Olesen" <stoklund at 2pi.dk>
> >> Sent: Friday, October 12, 2012 12:58:00 PM
> >> Subject: Re: [RFC] Additions to TargetTransform for operation
> >> legality
> >> 
> >> Hi Hal!
> >> 
> >> Thanks for working on this.
> >> 
> >> On Oct 12, 2012, at 10:05 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> 
> >>> Nadav, et al.,
> >>> 
> >>> I'd like to start using the new TargetTransform interface in
> >>> BBVectorize. The first step is to enable BBVectorize to understand
> >>> what operations are likely to be efficiently supported by the
> >>> target.
> >> 
> >> Great. I agree.
> >> 
> >>> While that's a difficult question to answer generally, I think a
> >>> reasonable proxy will be: operations that won't be expanded during
> >>> legalization are probably supported efficiently.
> >> 
> >> I agree that checking if the operation is non-expandable is a good
> >> idea. You assume that operations which are custom lowered are
> >> efficient, which is a reasonable assumption to make.  I think that
> >> we can improve the accuracy of this question if we add
> >> target-specific white-lists and black-lists that the different
> >> targets can implement. Also, I think that we will need to add
> >> target-hooks.
> > 
> > How are we planning to do this? Are we going to allow individual
> > targets to subclass ScalarTargetTransformInfo, etc. or are we
> > going to provide some other mechanism for customization?
> 
> We need to allow different targets to subclass the
> VectorTargetTransformInfo class. The VTTI and STTI classes are
> already created inside the different targets, so this will be easy
> to do. We just need to define a good interface.
> 
> > 
> >> But we need to start simple, so I am happy with your
> >> approach.
> > 
> > Okay, thanks! This was my thought as well: this is the simplest
> > thing to do.
> > 
> >> 
> >>> I've attached a patch which implements two functions:
> >>> allowsUnalignedMemoryAccesses (which is straightforward), and
> >>> isPromotedOperationLegalOrCustom, and I'd like some feedback on
> >>> whether this seems like the right way to go.
> >> 
> >> You should use the VectorTargetTransformInfo. We don't want to
> >> create a single mega TargetLowering-like interface. If the main
> >> users of this information are the BB-vectorizer and the
> >> Loop-Vectorizer, then we should place this info in
> >> VectorTargetTransform.
> > 
> > Okay.
> > 
> >> 
> >> Bob Wilson suggested that this interface should return the 'cost'
> >> of the instruction. This will allow us to compare the costs of the
> >> function before and after vectorization. We may want to vectorize
> >> the function even if we find that one instruction is illegal
> >> (unlike gcc, which aborts).
> > 
> > This is a good idea.
> > 
> >> It all depends on the ratio of good to bad instructions, or the
> >> cost. I think that the cost needs to be calculated (also) according
> >> to the TypeLegalization decision. So, for example, if we scalarize
> >> a type, we can give it a cost of 8, which reflects (insert + op +
> >> extract). If we split a type, we can give it the cost 2. We may
> >> need an iterative routine to query getTypeConversion() because type
> >> legalization is done in multiple phases.
> > 
> > I agree, having costs is certainly our end goal. Making an
> > "instrumented" version of getTypeConversion will certainly assist
> > in making a better guess. Maybe we should just modify
> > getTypeConversion to record more about what it is doing.
>  
> I think that the getTypeConversion return value gives us enough
> information. Basically, it describes the action that needs to
> happen. We can put a cost on each action. For example, split can
> multiply the cost by 2. But maybe there are other ways of doing
> this.

Makes sense.
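A minimal sketch of the per-action cost idea, assuming a toy target on which only scalars and 128-bit vectors are legal. The enum, the VecType struct, ToyTarget and legalizationCost are hypothetical and far simpler than the real type-legalization actions; the weights are the example figures from the discussion (a factor of 2 per split, a flat 8 for a scalarized type). The loop re-queries the action after every split, since legalization can take several steps.

```cpp
#include <cassert>

// Hypothetical one-step legalization actions (much simpler than LLVM's).
enum class LegalizeAction { Legal, Split, Scalarize };

// Hypothetical type descriptor: NumElts == 1 means a scalar.
struct VecType {
  unsigned NumElts;
  unsigned EltBits;
};

// Hypothetical target: scalars and 128-bit vectors are legal, wider vectors
// with an even element count are split in half, everything else is scalarized.
struct ToyTarget {
  unsigned LegalVectorBits = 128;

  LegalizeAction getAction(VecType T) const {
    unsigned Bits = T.NumElts * T.EltBits;
    if (T.NumElts == 1 || Bits == LegalVectorBits)
      return LegalizeAction::Legal;
    if (Bits > LegalVectorBits && T.NumElts % 2 == 0)
      return LegalizeAction::Split;
    return LegalizeAction::Scalarize;
  }
};

// Example weights taken from the discussion, not measured numbers.
constexpr double SplitFactor = 2.0;
constexpr double ScalarizeCost = 8.0;

// Query the action repeatedly, because legalization may need several steps.
double legalizationCost(const ToyTarget &TT, VecType T, double BaseCost = 1.0) {
  assert(T.NumElts > 0 && "empty vector type");
  double Factor = 1.0;
  for (;;) {
    switch (TT.getAction(T)) {
    case LegalizeAction::Legal:
      return Factor * BaseCost;
    case LegalizeAction::Split:
      Factor *= SplitFactor; // each half is costed separately
      T.NumElts /= 2;
      break;
    case LegalizeAction::Scalarize:
      return Factor * ScalarizeCost; // flat cost per scalarized piece
    }
  }
}
```

With these weights, a v16i32 operation costs 4.0 (two splits down to v4i32), while v2i32 is scalarized and costs 8.0.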

> 
> Generally, I prefer to start with something simple, like white-list &
> black-list of costs of known instructions, and improve this as we
> go.

Okay, sounds good. We can make a function to return the cost of an operation. It will default to 1.0, and each target can customize it as necessary. Is that what you're thinking?
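A minimal sketch of that default-plus-override arrangement, assuming opcodes and types are identified by plain strings for brevity. ToyVectorTTI and ToyTargetTTI are hypothetical stand-ins, not the real VectorTargetTransformInfo; the override table plays the role of the per-target white-list/black-list, and the two numbers in it are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical base interface: every operation costs 1.0 by default.
class ToyVectorTTI {
public:
  virtual ~ToyVectorTTI() = default;
  virtual double getOperationCost(const std::string &Opcode,
                                  const std::string &Type) const {
    assert(!Opcode.empty() && "missing opcode");
    (void)Type; // the default ignores the type
    return 1.0;
  }
};

// Hypothetical target subclass: overrides costs for known (opcode, type)
// pairs and falls back to the default for everything else.
class ToyTargetTTI : public ToyVectorTTI {
  std::map<std::pair<std::string, std::string>, double> Overrides;

public:
  ToyTargetTTI() {
    Overrides[{"fdiv", "v4f32"}] = 4.0; // made-up: slow vector divide
    Overrides[{"mul", "v2i64"}] = 6.0;  // made-up: no native 64-bit multiply
  }

  double getOperationCost(const std::string &Opcode,
                          const std::string &Type) const override {
    auto It = Overrides.find({Opcode, Type});
    if (It != Overrides.end())
      return It->second;
    return ToyVectorTTI::getOperationCost(Opcode, Type);
  }
};
```

Clients only ever see the base-class reference, so a target that has nothing to say about costs simply does not subclass.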

> 
> >> 
> >>> I've not really tested this function, but I'd like to know if you
> >>> think this is the right kind of interface for this, and if the
> >>> approach is about right. Quickly, isPromotedOperationLegalOrCustom
> >>> first promotes the type as would be required during type
> >>> legalization, converts the IR instruction opcode or intrinsic
> >>> identifier into an ISD opcode, promotes the type based on
> >>> operation-specific promotion, and then checks whether the
> >>> resulting operation/type combination is marked Legal or Custom.
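The four steps above can be sketched roughly as follows. Everything here is a hypothetical, string-based simplification (NodeKind, OpAction, ToyTarget and the tiny lookup tables are not LLVM's ISD opcodes or TargetLowering API); it only shows the order of the checks. The single name-to-node table is also the kind of shared intrinsic-to-ISD mapping suggested later in the thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins; not LLVM's real enums.
enum class NodeKind { Add, FMul, FSqrt, Unknown };
enum class OpAction { Legal, Custom, Promote, Expand };

// IR opcode or intrinsic name -> DAG node kind, kept in one table.
NodeKind toNodeKind(const std::string &IRName) {
  static const std::map<std::string, NodeKind> Table = {
      {"add", NodeKind::Add},
      {"fmul", NodeKind::FMul},
      {"llvm.sqrt", NodeKind::FSqrt},
  };
  auto It = Table.find(IRName);
  return It == Table.end() ? NodeKind::Unknown : It->second;
}

// Hypothetical target description.
struct ToyTarget {
  using Key = std::pair<NodeKind, std::string>;
  std::map<Key, OpAction> Actions;      // absent entries mean Expand
  std::map<Key, std::string> PromoteTo; // operation-specific promotion

  // Type-legalization promotion: small integers are widened to i32.
  std::string promoteType(const std::string &Ty) const {
    return (Ty == "i8" || Ty == "i16") ? "i32" : Ty;
  }

  OpAction getAction(NodeKind K, const std::string &Ty) const {
    auto It = Actions.find({K, Ty});
    return It == Actions.end() ? OpAction::Expand : It->second;
  }
};

bool isPromotedOperationLegalOrCustom(const ToyTarget &TT,
                                      const std::string &IRName,
                                      const std::string &Ty) {
  // 1. Promote the type as type legalization would.
  std::string T = TT.promoteType(Ty);
  // 2. Map the IR opcode or intrinsic to a node kind.
  NodeKind K = toNodeKind(IRName);
  if (K == NodeKind::Unknown)
    return false;
  // 3. Apply operation-specific promotion if the target requests it.
  OpAction A = TT.getAction(K, T);
  if (A == OpAction::Promote) {
    auto It = TT.PromoteTo.find({K, T});
    assert(It != TT.PromoteTo.end() && "Promote without a target type");
    T = It->second;
    A = TT.getAction(K, T);
  }
  // 4. Legal or Custom is taken as "probably efficient".
  return A == OpAction::Legal || A == OpAction::Custom;
}
```

For example, with sqrt on f32 marked Promote to f64 and f64 marked Legal, the query for llvm.sqrt on f32 returns true, while an opcode missing from the table returns false.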
> >> 
> >> I prefer that we don't reimplement the DAG builder, but at the
> >> moment I don't have a better idea.
> > 
> > I agree. I also don't really want to redo the DAG builder; and I
> > don't want to maintain two parallel codes, one which builds the
> > DAG, and one which models building the DAG, but we may not have
> > any other choice if the DAG builder is too slow to use in a
> > speculative sense (is it?).
> > 
> > One thing that we could do is, for some predefined set of
> > instruction sequences based on a target's legal types, when the
> > compiler is built, compile and record the cost of the sequences
> > (the scheduler will give us a cycle-count total). Then we could
> > use those costs to feed the model in the analysis pass.
> > 
> Let's start with something simple.
> 
> I *think* that the loop vectorizer and the BB vectorizer need the
> same things, but just to be sure, let's talk about the information
> that we need here. I think that we don't care about the speed of
> FMUL_v4i32 vs FNEG_v4i32. What we do care about is the speed of
> FMUL_i32 vs FMUL_v4i32 (or vs
> extract-extract-fmul-fmul-insert-insert). We care about the
> relative ratio between the scalar version and the vector version.
> The vector version may be a sequence of instructions which may
> contain inserts, extracts and the scalar operation. Do you need
> something else for the BB vectorizer?

I want to know whether forming an IR-level vector operation will be cheaper than keeping the existing scalar operations. Were no other optimizations done, vectorizing followed by scalarizing would be a fancy no-op. In practice, less optimization can be done on the scalarized code (because it is expanded so late), so vectorizing and then scalarizing leads to worse code. So I want to make sure that, when I form a vector instruction, it corresponds to something with very little relevant "internal structure" (hopefully only one, or maybe two, assembly instructions).

Beyond that (as a second step), I'll want cost information so that different vector instructions can be compared.
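A toy model of that comparison, assuming unit costs for every instruction (the ToyCosts fields and the helper functions are hypothetical). When the vector operation is not natively supported, it is modeled as scalarized: for each lane, extract each operand, perform the scalar operation, and insert the result, in the spirit of the extract-extract-fmul-fmul-insert-insert sequence mentioned above.

```cpp
#include <cassert>

// Hypothetical unit costs; a real target hook would supply these.
struct ToyCosts {
  double ScalarOp = 1.0; // one scalar instruction, e.g. fmul float
  double VectorOp = 1.0; // one native vector instruction
  double Extract = 1.0;  // extractelement
  double Insert = 1.0;   // insertelement
};

// Cost of keeping N independent scalar operations.
double scalarCost(const ToyCosts &C, unsigned N) { return N * C.ScalarOp; }

// Cost of the N-wide vector form. If it is not natively supported, model it
// as scalarized: per lane, extract each operand, do the op, insert the result.
double vectorCost(const ToyCosts &C, unsigned N, unsigned NumOperands,
                  bool NativelySupported) {
  if (NativelySupported)
    return C.VectorOp;
  return N * (NumOperands * C.Extract + C.ScalarOp + C.Insert);
}

// Form the vector operation only when it is strictly cheaper.
bool isProfitable(const ToyCosts &C, unsigned N, unsigned NumOperands,
                  bool NativelySupported) {
  assert(N > 1 && "a single lane is not a vector");
  return vectorCost(C, N, NumOperands, NativelySupported) < scalarCost(C, N);
}
```

With unit costs, a natively supported 4-wide operation costs 1 against 4 for the scalar code, while the scalarized form of a 4-wide binary operation costs 16, so a vectorize-then-scalarize outcome is never chosen.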

> 
> 
> >> Can you refactor
> >> SelectionDAGBuilder::visitIntrinsicCall before doing this?
> > 
> > Yes, good idea.
> > 
> >> Maybe we can share some code there. Maybe we can create a table
> >> mapping intrinsics to ISD nodes.
> >> 
> >> I think that the interface of "isPromotedOperationLegalOrCustom"
> >> should take a single parameter: a pointer to the IR instruction.
> > 
> > This would certainly make things easier, but I often want to
> > evaluate the cost of some operation that I've not actually yet
> > created. My understanding is that speculatively creating
> > instructions won't be acceptable. Knowing the cost of the existing
> > instruction stream is helpful, but if I can't compare it accurately
> > to a set of unmaterialized potential streams, then that is not
> > useful.
> 
> 
> > Maybe if we made it possible to construct Instruction objects
> > directly on the stack, disconnected from use tracking, etc., that
> > would make it cheap enough to use them speculatively to check
> > costs. What do you think?
> 
> I think that it will be hard to maintain because people will be
> surprised that things don't behave as they are used to. Maybe we
> can pass an opcode and a list of arguments?

That's not a bad idea; but we'll need to think about how to handle, for example, input shuffles. When the BB vectorizer forms a new vector instruction, it often needs to construct new shuffles for the input arguments.
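One possible shape for the opcode-plus-arguments interface, sketched under the assumption that types are plain strings and that each input needing a new shuffle adds a fixed cost. OpDesc, OperandDesc and ToyCostModel are hypothetical. Keying the cost on both the result type and the source operand type is what lets a conversion such as zext_v8i8_to_v8i32 be costed separately from other zexts.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical description of an operand of a not-yet-created instruction.
struct OperandDesc {
  std::string Type;
  bool NeedsShuffle = false; // input must be assembled with a new shuffle
};

// Hypothetical description of an unmaterialized operation.
struct OpDesc {
  std::string Opcode;
  std::string ResultType;
  std::vector<OperandDesc> Operands;
};

class ToyCostModel {
  // Keyed on (opcode, result type, first operand type).
  std::map<std::tuple<std::string, std::string, std::string>, double> Table;
  double ShuffleCost;

public:
  explicit ToyCostModel(double Shuffle = 1.0) : ShuffleCost(Shuffle) {}

  void setCost(const std::string &Op, const std::string &Res,
               const std::string &Src, double Cost) {
    Table[std::make_tuple(Op, Res, Src)] = Cost;
  }

  double getCost(const OpDesc &D) const {
    assert(!D.Operands.empty() && "expected at least one operand");
    auto It = Table.find(
        std::make_tuple(D.Opcode, D.ResultType, D.Operands[0].Type));
    double Cost = It == Table.end() ? 1.0 : It->second; // default 1.0
    for (const OperandDesc &O : D.Operands)
      if (O.NeedsShuffle)
        Cost += ShuffleCost; // pay for each input shuffle we would build
    return Cost;
  }
};
```

Nothing is materialized as an Instruction here; the vectorizer would fill in an OpDesc for each candidate and compare the totals.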

Thanks again,
Hal

> 
> > 
> >> The legalization code may want to walk up the IR and look around.
> >> For example, for shuffles, it may want to check if one of the
> >> operands is undef.
> > 
> > Agreed.
> > 
> >> Also, you don't provide enough information to make a good
> >> decision. I want to be able to estimate the cost of
> >> zext_v8i8_to_v8i32.
> > 
> 
> 
> Thanks!
> 
> 
> > Yes, there is a problem with all of the conversion instructions
> > with these functions. For those, we need to check both the input
> > and result types.
> > 
> > Thanks again,
> > Hal
> > 
> >> 
> >> Thanks,
> >> Nadav
> >> 
> >>> Thanks in advance,
> >>> Hal
> >>> 
> >>> --
> >>> Hal Finkel
> >>> Postdoctoral Appointee
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>> <ttrans_oplegal.patch>
> >> 
> >> 
> > 
> 
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory