[llvm-commits] [RFC] Additions to TargetTransform for operation legality

Fri Oct 12 15:37:13 PDT 2012

On Oct 12, 2012, at 2:31 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> From: "Nadav Rotem" <nrotem at apple.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "llvm-commits at cs.uiuc.edu LLVM" <llvm-commits at cs.uiuc.edu>, "Jakob Stoklund Olesen" <stoklund at 2pi.dk>
>> Sent: Friday, October 12, 2012 12:58:00 PM
>> Subject: Re: [RFC] Additions to TargetTransform for operation legality
>> 
>> Hi Hal!
>> 
>> Thanks for working on this.
>> 
>> On Oct 12, 2012, at 10:05 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>>> Nadav, et al.,
>>> 
>>> I'd like to start using the new TargetTransform interface in
>>> BBVectorize. The first step is to enable BBVectorize to understand
>>> what operations are likely to be efficiently supported by the
>>> target.
>> 
>> Great. I agree.
>> 
>>> While that's a difficult question to answer generally, I think a
>>> reasonable proxy will be: operations that won't be expanded during
>>> legalization are probably supported efficiently.
>> 
>> I agree that checking if the operation is non-expandable is a good
>> idea. You assume that operations which are custom lowered are
>> efficient, which is a reasonable assumption to make.  I think that
>> we can improve the accuracy of this question if we add
>> target-specific white-lists and black-lists that the different
>> targets can implement. Also, I think that we will need to add
>> target-hooks.
> 
> How are we planning to do this? Are we going to allow individual targets to subclass ScalarTargetTransformInfo, etc. or are we going to provide some other mechanism for customization?

We need to allow different targets to subclass the VectorTargetTransformInfo class.  The VTTI and STTI classes are already created inside the different targets, so this will be easy to do.  We just need to define a good interface.

> 
>> But we need to start simple, so I am happy with your
>> approach.
> 
> Okay, thanks! This was my thought as well: this is the simplest thing to do.
> 
>> 
>>> I've attached a patch which implements two functions:
>>> allowsUnalignedMemoryAccesses (which is straightforward), and
>>> isPromotedOperationLegalOrCustom, and I'd like some feedback on
>>> whether this seems like the right way to go.
>> 
>> You should use the VectorTargetTransformInfo. We don't want to create
>> a single mega TargetLowering-like interface. If the main users of
>> this information are the BB-vectorizer and the Loop-Vectorizer, then
>> we should place this info in VectorTargetTransform.
> 
> Okay.
> 
>> 
>> Bob Wilson suggested that this interface will return the 'cost' of
>> the instruction. This will allow us to compare the costs of the
>> vectorized function before and after vectorization.  We may want to
>> vectorize the function even if we found that one instruction is
>> illegal (unlike gcc which aborts).
> 
> This is a good idea.
> 
>> It all depends on the ratio of
>> good to bad instructions - or the cost.  I think that the cost needs
>> to be calculated (also) according to the TypeLegalization decision.
>> So, for example, if we scalarize a type, we can give it a cost of 8
>> which reflects (insert + op  + extract).  If we split a type we can
>> give it the cost 2.  We may need an iterative routine to query
>> getTypeConversion() because type legalization is done in multiple
>> phases.
> 
> I agree, having costs is certainly our end goal. Making an "instrumented" version of getTypeConversion will certainly assist in making a better guess. Maybe we should just modify getTypeConversion to record more about what it is doing.

I think that the getTypeConversion return value gives us enough information.  Basically, it describes the action that needs to happen. We can put a cost on each action. For example, split can multiply the cost by 2. But maybe there are other ways of doing this.

Generally, I prefer to start with something simple, like white-list & black-list of costs of known instructions, and improve this as we go.
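
To make the per-action cost idea concrete, here is a rough sketch. It is not the LLVM interface: the Action enum, the toy nextAction() legalizer, the 128-bit register assumption, and the cost factors are all made up for illustration. Only the "split multiplies the cost by 2" rule comes from the discussion above. The loop mirrors the point that legalization may take several steps before a type becomes legal.

```cpp
#include <cassert>

// Hypothetical stand-ins for the actions a type legalizer can report.
// These are NOT the LLVM enums; they only mirror the idea.
enum class Action { Legal, Promote, Split, Scalarize };

// Toy vector type: NumElts elements of EltBits bits each.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Toy target (assumed): 128-bit vector registers, no 64-bit lanes,
// and no vector elements narrower than 8 bits.
Action nextAction(VecTy Ty) {
  if (Ty.NumElts == 1)
    return Action::Legal;
  if (Ty.EltBits < 8)
    return Action::Promote;
  if (Ty.EltBits == 64)
    return Action::Scalarize;
  if (Ty.NumElts * Ty.EltBits > 128)
    return Action::Split;
  return Action::Legal;
}

// Query the legalizer repeatedly until the type is legal, scaling the cost
// of one operation at each step. Assumes NumElts is a power of two.
unsigned legalizedOpCost(VecTy Ty) {
  assert(Ty.NumElts > 0 && "empty vector type");
  unsigned Cost = 1; // assumed cost of one operation on a legal type
  for (;;) {
    switch (nextAction(Ty)) {
    case Action::Legal:
      return Cost;
    case Action::Promote:
      Ty.EltBits = 8; // widen the elements; assumed free in this toy model
      break;
    case Action::Split:
      Cost *= 2; // two operations on half-width vectors
      Ty.NumElts /= 2;
      break;
    case Action::Scalarize:
      Cost *= 3 * Ty.NumElts; // assumed: insert + op + extract per element
      Ty.NumElts = 1;
      break;
    }
  }
}
```

With these assumed factors, a v16i32 operation is split twice and gets cost 4, while a v4i32 operation is already legal and keeps cost 1.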

>> 
>>> I've not really tested this function, but I'd like to know if you
>>> think this is the right kind of interface for this, and if the
>>> approach is about right. Quickly, isPromotedOperationLegalOrCustom
>>> first promotes the type as would be required during type
>>> legalization, converts the IR instruction opcode or intrinsic
>>> identifier into an ISD opcode, promotes the type based on
>>> operation-specific promotion, and then checks whether the
>>> resulting operation/type combination is marked Legal or Custom.
>> 
>> I prefer that we don't reimplement the DAG builder, but at the moment
>> I don't have a better idea.
> 
> I agree. I also don't really want to redo the DAG builder; and I don't want to maintain two parallel codes, one which builds the DAG, and one which models building the DAG, but we may not have any other choice if the DAG builder is too slow to use in a speculative sense (is it?).
> 
> One thing that we could do is, for some predefined set of instruction sequences based on a target's legal types, when the compiler is built, compile and record the cost of the sequences (the scheduler will give us a cycle-count total). Then we could use those costs to feed the model in the analysis pass.
> 
Let's start with something simple.

I *think* that the loop vectorizer and the BB vectorizer need the same things, but just to be sure, let's talk about the information that we need here. I think that we don't care about the speed of FMUL_v4i32 vs. FNEG_v4i32. What we do care about is the speed of FMUL_i32 vs. FMUL_v4i32, or extract-extract-fmul-fmul-insert-insert. We care about the relative ratio between the scalar version and the vector version. The vector version may be a sequence of instructions which may contain inserts, extracts and the scalar operation. Do you need something else for the BB vectorizer?
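
As a rough illustration of the scalar-versus-vector comparison, the sketch below adds up the cost of each form. The UnitCosts struct, the function names, and the unit costs used in the example are assumptions, not anything a target reports today.

```cpp
#include <cassert>

// Hypothetical per-target unit costs; the numbers a real target would
// report are not specified in this thread.
struct UnitCosts {
  unsigned ScalarOp;
  unsigned VectorOp;
  unsigned Insert;
  unsigned Extract;
};

// Cost of leaving Lanes independent scalar operations alone.
unsigned scalarCost(const UnitCosts &C, unsigned Lanes) {
  return Lanes * C.ScalarOp;
}

// Cost of the vector form: one vector op, plus an insert per lane for each
// operand that must be packed from scalars, plus an extract per lane if the
// result is consumed by scalar code.
unsigned vectorCost(const UnitCosts &C, unsigned Lanes,
                    unsigned OperandsToPack, bool UnpackResult) {
  assert(Lanes > 0 && "need at least one lane");
  unsigned Cost = C.VectorOp;
  Cost += OperandsToPack * Lanes * C.Insert;
  if (UnpackResult)
    Cost += Lanes * C.Extract;
  return Cost;
}

// Only the relative order of the two totals matters for the decision.
bool isVectorFormCheaper(const UnitCosts &C, unsigned Lanes,
                         unsigned OperandsToPack, bool UnpackResult) {
  return vectorCost(C, Lanes, OperandsToPack, UnpackResult) <
         scalarCost(C, Lanes);
}
```

With every unit cost set to 1 and four lanes, the vector form wins when its operands are already vectors (1 against 4) and loses when both operands must be packed and the result unpacked (13 against 4).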

>> Can you refactor
>> SelectionDAGBuilder::visitIntrinsicCall before doing this?
> 
> Yes, good idea.
> 
>> Maybe
>> we can share some code there. Maybe we can create a table with
>> mapping from intrinsics to ISDNodes.
>> 
>> I think that the interface of "isPromotedOperationLegalOrCustom"
>> should be a single parameter - a pointer to the IR.
> 
> This would certainly make things easier, but I often want to evaluate the cost of some operation that I've not actually yet created. My understanding is that speculatively creating instructions won't be acceptable. Knowing the cost of the existing instruction stream is helpful, but if I can't compare it accurately to a set of unmaterialized potential streams, then that is not useful.

> Maybe if we made it possible to construct Instruction objects directly on the stack, disconnected from use tracking, etc. that would make it cheap enough to use them speculatively to check costs. What do you think?

I think that it will be hard to maintain because people will be surprised that things don't behave as they are used to.  Maybe we can pass an opcode and a list of arguments?
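
One possible shape for such a query, sketched under assumptions: Opcode, TypeDesc, OpQuery, and estimateCost() are hypothetical names, and the register-counting estimate is only a placeholder for a real cost model. The query carries both the argument types and the result type, so a conversion such as zext from v8i8 to v8i32 can be described without creating an instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative subset of opcodes; not the LLVM opcode list.
enum class Opcode { FMul, FNeg, ZExt };

// Toy type description: NumElts elements of EltBits bits each.
struct TypeDesc {
  unsigned NumElts;
  unsigned EltBits;
};

// A query that describes an operation without materializing an Instruction.
struct OpQuery {
  Opcode Op;
  std::vector<TypeDesc> ArgTys;
  TypeDesc ResultTy;
};

// Registers needed to hold a value, assuming 128-bit vector registers.
unsigned regsFor(TypeDesc Ty) {
  unsigned Bits = Ty.NumElts * Ty.EltBits;
  return std::max(1u, (Bits + 127) / 128);
}

// Toy estimate: the widest of the argument and result types decides how many
// register-sized pieces the operation needs. Q.Op is ignored here; a real
// model would dispatch on it.
unsigned estimateCost(const OpQuery &Q) {
  assert(!Q.ArgTys.empty() && "operation needs at least one argument");
  unsigned Cost = regsFor(Q.ResultTy);
  for (TypeDesc Ty : Q.ArgTys)
    Cost = std::max(Cost, regsFor(Ty));
  return Cost;
}
```

Under these assumptions, an fmul on two four-element, 32-bit vectors costs 1, while the zext from v8i8 to v8i32 costs 2 because its result needs two registers.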

> 
>> The legalization code may want to walk up the IR and look around. For
>> example, for shuffles, it may want to check if one of the operands
>> is undef.
> 
> Agreed.
> 
>> Also, you don't provide enough information to make a good decision. I
>> want to be able to estimate the cost of zext_v8i8_to_v8i32.
> 

Thanks!

> Yes, there is a problem with all of the conversion instructions with these functions. For those, we need to check both the input and result types.
> 
> Thanks again,
> Hal
> 
>> 
>> Thanks,
>> Nadav
>> 
>>> Thanks in advance,
>>> Hal
>>> 
>>> --
>>> Hal Finkel
>>> Postdoctoral Appointee
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> <ttrans_oplegal.patch>
>> 
>> 
> 
> -- 
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory