[LLVMdev] RFC: More AVX Experience

David Greene dag at cray.com
Fri May 15 11:09:15 PDT 2009


Ok, so I've been chugging away at AVX and added some new
features in TableGen to facilitate writing generic patterns.

Here's an example:

//===----------------------------------------------------------------------===//
// Dummy defs for writing generic patterns
//===----------------------------------------------------------------------===//

def SRCREGCLASS;
def DSTREGCLASS;
def MEMCLASS;
def SRC1CLASS;
def SRC2CLASS;
def ADDRCLASS;
def INTRINSIC;
def TYPE;
def INTTYPE;
def MEMOP;

// TYPE        - The data type (f32 for SS, f64 for SD, etc.)
// SRCREGCLASS - The source register class (VR128, FR32, etc.)
// DSTREGCLASS - The destination register class
// MEMCLASS    - The memory class (f32mem, f64mem, etc.)
// SRC1CLASS   - The first source object class (register or memory, depending)
// SRC2CLASS   - The second source object class (register or memory, depending)
// DSTCLASS    - The destination object class (register or memory, depending)
// ADDRCLASS   - Either 'addr' or REGCLASS, depending
// MEMOP       - Either 'memop' or 'srcvalue,' depending

// Scalar
defm FsANDN : sse1_sse2_avx_binary_scalar_xs_xd_node_pattern_rm_rrm<
              0x55, 
              "andn", 
  [[(set DSTREGCLASS:$dst,
     (INTTYPE (and (not (INTTYPE (bitconvert (TYPE SRCREGCLASS:$src1)))),
                   (INTTYPE (MEMOP ADDRCLASS:$src2)))))]]>;

// Vector
defm ANDN : sse1_sse2_avx_binary_vector_tb_ostb_node_pattern_rm_rrm<
            0x55, 
            "andn", 
  [[(set DSTREGCLASS:$dst,
     (INTTYPE (and (vnot (INTTYPE (bitconvert (TYPE SRCREGCLASS:$src1)))),
                   (INTTYPE (MEMOP ADDRCLASS:$src2)))))]]>;

The "not" vs. "vnot" is unfortunate.  I could add another class argument that
says "instantiate with members of this list of operators" but see below about
arguments and the combinatorial explosion problem.  That and the fact that we
have no "foreach for subclass specification" makes this difficult to do.

(Thinking about this some more, a "cross product" operator [list x list] -> 
[list] could work.)
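
To make the [list x list] -> [list] idea concrete, here is a rough sketch of
the semantics I have in mind, written in Python rather than TableGen since
the operator doesn't exist yet.  The function name "cross" and the choice of
input lists are placeholders for illustration only.

```python
from itertools import product


def cross(left, right):
    """Pair every element of `left` with every element of `right`.

    The result is a flat list of 2-tuples in left-major order, which is
    one plausible reading of a [list x list] -> [list] operator.
    """
    return list(product(left, right))


# Hypothetical inputs: an optional "V" prefix crossed with packed suffixes.
prefixes = ["", "V"]
suffixes = ["PS", "PD"]

pairs = cross(prefixes, suffixes)
mnemonic_suffixes = [p + s for p, s in pairs]
print(mnemonic_suffixes)
```

A class could then be instantiated once per element of the resulting list
instead of taking one more explicit argument per axis.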

In any case, the lower classes take care of substituting the appropriate 
symbols based on the specific instruction generated ([V]PS, [V]PD, etc.).
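
As a rough model of that substitution step (not the actual TableGen
implementation), the sketch below treats the scalar ANDN pattern as a nested
tuple and swaps each dummy def for a concrete symbol.  The bindings are an
assumed example for a scalar single-precision register-memory form; in
particular "i32" for INTTYPE and FR32 for both register classes are guesses
made for illustration.

```python
def substitute(pattern, bindings):
    """Recursively replace dummy symbols in a nested-tuple pattern.

    Leaves are strings such as "set" or "SRCREGCLASS:$src1".  Only the part
    before the colon is looked up, so operand names like $src1 are kept.
    """
    if isinstance(pattern, tuple):
        return tuple(substitute(p, bindings) for p in pattern)
    head, sep, name = pattern.partition(":")
    return bindings.get(head, head) + sep + name


# The generic scalar pattern from the FsANDN example, as nested tuples.
GENERIC_ANDN = (
    "set", "DSTREGCLASS:$dst",
    ("INTTYPE",
     ("and",
      ("not", ("INTTYPE", ("bitconvert", ("TYPE", "SRCREGCLASS:$src1")))),
      ("INTTYPE", ("MEMOP", "ADDRCLASS:$src2")))),
)

# Assumed bindings for one concrete variant (illustrative values only).
SS_RM_BINDINGS = {
    "TYPE": "f32",
    "INTTYPE": "i32",
    "SRCREGCLASS": "FR32",
    "DSTREGCLASS": "FR32",
    "MEMOP": "memop",
    "ADDRCLASS": "addr",
}

concrete = substitute(GENERIC_ANDN, SS_RM_BINDINGS)
print(concrete)
```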

I still don't know how to capture the hierarchy under 
sse1_sse2_avx_binary_scalar_xs_xd_node_pattern_rm_rrm and other such 
higher-level classes.  Right now it's generated by a Perl script but Chris
isn't enamored of that solution.  I think it can be better as well.

One thought I had was to implement a "copy arguments" feature in TableGen so
we could do something like this:

defm FsANDN : sse1_binary_scalar_xs_node_pattern_rm<
              0x55, 
              "andn", 
  [[(set DSTREGCLASS:$dst,
     (INTTYPE (and (not (INTTYPE (bitconvert (TYPE SRCREGCLASS:$src1)))),
                   (INTTYPE (MEMOP ADDRCLASS:$src2)))))]]>,
           sse2_binary_scalar_xd_node_pattern_rm<''>,
           avx_binary_scalar_xs_node_pattern_rm<''>,
           avx_binary_scalar_xd_node_pattern_rm<''>;

where "''" (two apostrophes) is a mnemonic for the "ditto" mark used in 
English (and other languages?).

This way we could define fewer base classes because we wouldn't have to
define intermediate base classes that just serve to aggregate other classes
in order to get us down to one class and thus one argument specification.
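
Here is a small sketch of how the ditto expansion might behave, modeled in
Python because the feature is only a proposal.  The DITTO marker, the
expand_ditto function and the (name, args) representation are all made up
for illustration; the rule assumed here is "a ditto copies the argument list
of the nearest preceding parent class".

```python
DITTO = "''"


def expand_ditto(parents):
    """Expand ditto marks in a list of (class_name, args) pairs.

    `args` is either a list of argument strings or the DITTO marker, in
    which case the previous parent's arguments are copied.
    """
    expanded = []
    previous = None
    for name, args in parents:
        if args == DITTO:
            if previous is None:
                raise ValueError("ditto used with no preceding arguments")
            args = previous
        expanded.append((name, list(args)))
        previous = args
    return expanded


# The FsANDN example; "<pattern>" stands in for the full dag pattern.
fs_andn_parents = [
    ("sse1_binary_scalar_xs_node_pattern_rm", ["0x55", '"andn"', "<pattern>"]),
    ("sse2_binary_scalar_xd_node_pattern_rm", DITTO),
    ("avx_binary_scalar_xs_node_pattern_rm", DITTO),
    ("avx_binary_scalar_xd_node_pattern_rm", DITTO),
]

for name, args in expand_ditto(fs_andn_parents):
    print(name, args)
```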

But there would still be a lot of classes to manually define.  Here's an 
incomplete list:

sse1_unary_scalar_xs_node_rm;           // For generic unary
sse1_unary_scalar_xs_node_pattern_rm;   // To use custom patterns
sse1_unary_scalar_xd_node_intrinsic_rm;                  // With an intrinsic
sse1_unary_scalar_xd_node_pattern_intrinsic_ipattern_rm; // Custom patterns

sse1_binary_scalar_xs_node_rm;  // Binary

plus the rest of the sse1 "xs rm" classes, the mr encodings, all the binary 
operations, all the sse2 classes (which look like the sse1 classes except they 
use "xd"), all the vector classes, all the AVX classes, LRBni, etc.  We still 
have a combinatorial explosion problem.

Of course, we only have to define the ones we actually use and that cuts
down significantly on the numbers, but it's still large.
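
To get a feel for the numbers, the sketch below enumerates class names for
the scalar case only.  The axes and their values are assumptions inferred
from the names used in this message (the real set has more axes: vector
forms, more pattern kinds, LRBni and so on), so the count is a lower bound
on an illustrative subset, not a measurement.

```python
from itertools import product

# Assumed axes, read off the class names above (illustrative, incomplete).
PREFIXES_BY_ISA = {"sse1": ["xs"], "sse2": ["xd"], "avx": ["xs", "xd"]}
ARITIES = ["unary", "binary"]
KINDS = ["node", "node_pattern", "node_intrinsic"]
ENCODINGS = ["rm", "mr"]


def scalar_class_names():
    """Build every scalar class name implied by the assumed axes."""
    names = []
    for isa, prefixes in PREFIXES_BY_ISA.items():
        for prefix, arity, kind, enc in product(
            prefixes, ARITIES, KINDS, ENCODINGS
        ):
            names.append(f"{isa}_{arity}_scalar_{prefix}_{kind}_{enc}")
    return names


all_names = scalar_class_names()

# Only the classes an instruction actually references need to be written.
used = {
    "sse1_binary_scalar_xs_node_pattern_rm",
    "sse2_binary_scalar_xd_node_pattern_rm",
    "avx_binary_scalar_xs_node_pattern_rm",
    "avx_binary_scalar_xd_node_pattern_rm",
}
needed = [n for n in all_names if n in used]

print(len(all_names), "possible,", len(needed), "needed for FsANDN")
```

Even this cut-down set of axes gives 48 names, and every new axis multiplies
that figure.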

So I'm still looking for a complete solution.  Ideas welcome.

                             -Dave


