[LLVMdev] RFC: AVX Pattern Specification [LONG]

Fri May 1 11:46:10 PDT 2009

On Apr 30, 2009, at 3:59 PM, David Greene wrote:

> Here's the big RFC.
>
> As I've gone through and designed patterns for AVX, I quickly  
> realized that the
> existing SSE pattern specification, while functional, is less than  
> ideal in
> terms of maintenance.  In particular, a number of nearly-identical  
> patterns
> are specified all over for nearly-identical instructions.  For  
> example:

Right.  A lot of the X86 backend was written before the current set of  
tblgen features was available. In particular, multiclasses only got  
retrofitted in later to avoid some of the duplicated code.  Where they  
are used, they aren't used as well as they could be.

> Moreover, the various SSE levels were implemented at different times  
> and do
> things subtly differently.  For example:
>
> Note the use of xor vs. vnot and the different placement of the bc*  
> fragments
> and use of type specifiers.  I wonder if we even match both of these.

Right, we have the same problem within the GPR operations.  Ideally  
we'd have a multiclass for most arithmetic operations that expands out  
into 8/16/32/64-bit operations, perhaps even handling reg/reg,  
reg/imm, and reg/mem versions all at the same time.  Similar things should  
probably be done for SSE1/2, since SSE2 adds double versions of all the  
float operations that SSE1 has.
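
A minimal sketch of the kind of multiclass meant here, assuming modern
tblgen syntax.  SketchInst and SketchArith are hypothetical names, not
classes from the X86 backend, and the opcode bytes are placeholders
rather than the real per-width encodings:

```tablegen
// Hypothetical stand-in for the real X86 instruction classes.
class SketchInst<bits<8> Opc, string Mnemonic, int Width> {
  bits<8> Opcode = Opc;
  string AsmString = Mnemonic;
  int OperandBits = Width;
}

// Written once: expands into one record per operand width.
multiclass SketchArith<bits<8> Opc, string Mnemonic> {
  def r8  : SketchInst<Opc, !strconcat(Mnemonic, "b"), 8>;
  def r16 : SketchInst<Opc, !strconcat(Mnemonic, "w"), 16>;
  def r32 : SketchInst<Opc, !strconcat(Mnemonic, "l"), 32>;
  def r64 : SketchInst<Opc, !strconcat(Mnemonic, "q"), 64>;
}

// Placeholder opcode values, for illustration only.
defm ADD : SketchArith<0x01, "add">;
defm SUB : SketchArith<0x29, "sub">;
```

Each defm yields four records (ADDr8, ADDr16, ADDr32, ADDr64, and
likewise for SUB).  Covering reg/reg, reg/imm, and reg/mem would mean
more defs inside the same multiclass, still written only once.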

> For AVX we would need a different set of format classes because  
> while AVX
> could reuse the existing XS class (it's recoded as part of the VEX  
> prefix so
> we still need the information XS provides), "Requires<[HasSSE1]>" is
> certainly inappropriate.  Initially I started factoring things out  
> to separate
> XS and other prefix classes from Requires<> but that didn't solve  
> the pattern
> problems I mentioned above.

Right, a lot of these problems can be solved by some nice refactoring  
stuff.  I'm also hoping that some of the complexity in defining  
shuffle matching code can be helped by making the definition of the  
shuffle patterns more declarative within the td file.  It would be  
really nice to say that "this shuffle does a 1,0,3,2 shuffle and has  
cost 42" and have tblgen generate all the required matching code.
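
As a rough illustration of what such a declarative description could
look like (SketchShuffle is a hypothetical class; no tblgen backend
that consumes it exists today, that part would have to be written):

```tablegen
// Hypothetical record type: a shuffle described only by its lane mask
// and a cost, with no hand-written matching code.
class SketchShuffle<list<int> MaskElts, int CostValue> {
  list<int> Mask = MaskElts;
  int Cost = CostValue;
}

// "This shuffle does a 1,0,3,2 shuffle and has cost 42."
def SwapPairs : SketchShuffle<[1, 0, 3, 2], 42>;
```

A backend would then walk every SketchShuffle record and emit the
matcher for its mask, instead of that logic being written by hand for
each shuffle.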

> All of this complication gets multiplied with AVX because AVX recodes  
> all of
> the legacy SSE instructions using VEX to provide three-address  
> forms.  So if
> we were to follow the existing scheme, we would duplicate *all* of
> X86InstrSSE.td and edit patterns to match three-address modes and  
> then add the
> 256-bit patterns on top of that,  effectively duplicating  
> X86InstrSSE.td a
> second time.
>
> This is not scalable.

I agree, I think it is unfortunate that AVX decided to do this at an  
architectural level :).

> So what I've done is a little experiment to see if I can unify all  
> SSE and AVX
> SIMD instructions under one framework.  I'll leave MMX and 3dNow  
> alone since
> they're oddballs and hardly anyone uses them.

Ok.  I agree that the similarity being factored here is across  
SSE1/2/AVX.

> Essentially I've created a set of base pattern classes that are very  
> generic.
> These contain the basic asm string templates and dag patterns we  
> want to
> match.  These classes are parameterized by things like register class,
> operand type, ModRM format and "memory access operation."  I've also  
> created
> patterns that take a fully specified asm string and/or dag pattern  
> to provide
> flexibility for "oddball" instructions.

Ok.

> The point of all of this is to write patterns and asm strings *once*  
> for each
> kind of instruction (binary arithmetic, convert, shuffle, etc.) and  
> then use
> multiclasses to generate all of the concrete patterns for SSE and AVX.

Very nice.

> So for example, an ADD would be specified like this:
>
> // Arithmetic ops with intrinsics and scalar equivalents
> defm ADD :
> sse1_sse2_avx_binary_scalar_xs_xd_vector_tb_ostb_node_intrinsic_rm_rrm<
>   0x58,   // Opcode
>   "add",  // asm base opcode name
>   fadd,   // SDNode name
>   "add",  // Intrinsic base name (we pre-concat int_x86_sse*/avx and
>           // post-concat ps/pd, etc.)
>   1       // Commutative
> >;
>
> Now the multiclass name is rather unwieldy, I know.  That can be  
> changed so
> don't worry too much about it.  I'm more concerned about the overall  
> scheme
> and that it makes sense to you all.

This does look very nice.

> I have a Perl script that auto-generates the necessary multiclass  
> combinations
> as well as the needed base classes depending on what's in the  
> top-level .td
> file.  For now, I've named that top-level file X86InstrSIMD.td.
>
> The Perl script would only need to be run when X86InstrSIMD.td  
> changes.  Thus
> its use would be similar to how we use autoconf today.  We only run  
> autoconf /
> automake when we update the .ac files, not as part of the build  
> process.

While I agree that we want to refactor this, I really don't think that  
we should autogenerate .td files from perl.  This has a number of  
significant logistical problems.  What is it that perl gives you that  
we can't enhance tblgen to do directly?

> Initially, X86InstrSIMD.td would define only AVX instructions so it  
> would not
> impact existing SSE clients.  My intent is that X86InstrSIMD.td  
> essentially
> become the canonical description of all SSE and AVX instructions and
> X86InstrSSE.td would go away completely.

Instead of slowly building it up and then cutting over, I'd prefer to  
incrementally move patterns into it, removing them from the other .td  
files at the same time.  This should be a nice clean and continuous  
refactoring that makes the code monotonically better (smaller).

> The pros of the scheme:
>
> * Unify all "important" x86 SIMD instructions into one framework and  
> provide
>  consistency

Yay!

> * Specify patterns and asm strings *once* per instruction type /  
> family
>  rather than the current scheme of multiple patterns for essentially  
> the
>  same instruction

Yay!

> * Bugfixes / optimizations / new patterns instantly apply to all SSE  
> levels
>  and AVX

Yay!

> The cons:
> * Transition from X86InstrSSE.td
> * A more complex class hierarchy

I'm not worried about these.

> * A class-generating tool / indirection

I really don't like this :).  If there is something higher level that  
you need, I think it would be very interesting to carefully consider  
what the root problem is and whether there is a good solution that we  
can directly implement in tblgen.  It is pretty clear that we can  
*improve* the current situation with no tblgen enhancements, but I  
agree that AVX is a nice forcing function that will greatly benefit  
from a *much improved* target description.

-Chris