[LLVMdev] Regular Expressions

Tue Jun 16 17:35:51 PDT 2009

On Monday 15 June 2009 14:35, Chris Lattner wrote:

> > I suppose you could argue that additional parameters specifying
> > the source and dest types could be passed, but why bother when
> > it is already encoded in the mnemonic?  That would just be
> > adding error-prone redundancy.
>
> Why not synthesize the opcode string from the information passed down?

That's actually how I started out.  The problem is that it leads to a less 
intuitive specification.  I'll see if I can illustrate.

Say we have a wrapper class like this:

class X86ValueType {
    ValueType VT;
    RegisterClass RegClass;
    string suffix;
}

Now we instantiate some concrete types:

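// Concrete records are defs, not classes, so !cast<X86ValueType>("...") can
// find them.  Names match the strings synthesized below ("X86v" # VL # type).
def X86f32 : X86ValueType {
  let VT = f32;
  let RegClass = FR32;
  let suffix = "ss";
}

def X86v4f32 : X86ValueType {
  let VT = v4f32;
  let RegClass = VR128;
  let suffix = "ps";
}

def X86v8f32 : X86ValueType {
  let VT = v8f32;
  let RegClass = VR256;
  let suffix = "ps";
}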

Ok, you get the picture.  Now let's look at how we would write instruction 
patterns:

defm BLENDPS  : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, 
                  i32i8imm, "blend", "blend", "f32">;
defm BLENDPD  : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0D, 
                  i32i8imm, "blend", "blend", "f64">;

We have to send types as strings because we need to be able to munge them
for SSE vs. AVX.  Passing multiple types (v4f32, v8f32, etc.) is not
practical; it just obfuscates the top-level specification.

Now we need some guts:

multiclass sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<
  bits<8> opc,
  PatFrag ImmClass,
  string OpcodeString,
  string Intrinsic,
  string BaseType,
  list<list<dag>> ipatterns = [],
  string asm = ""
> {
  def rr_Int : ... 
  def rm_Int : ...
  def V#NAME#_128rrr_Int : fp_binary_vector_irrr<
    opc,
    // I'm not even sure this field reference will work
    !strconcat(OpcodeString,
               !cast<X86ValueType>(!strconcat("X86v??",
                                              BaseType)).suffix),
    !strconcat(Intrinsic,
               !strconcat("_",
                          !cast<X86ValueType>(!strconcat("X86v??",
                                                         BaseType)).suffix)),
    !cast<X86ValueType>(!strconcat("X86v??", BaseType)).VT,
    !cast<X86ValueType>(!strconcat("X86v??", BaseType)).RegClass, // Src
    !cast<X86ValueType>(!strconcat("X86v??", BaseType)).RegClass, // Dst
    [and probably some other stuff],
    ipatterns,
    asm
  > {
    let Prefix = TA;
    let HasOpSize = 1;
    let HasVEX = 1;
  }
  def V#NAME#_128rrm_Int : ...
  def V#NAME#_256rrr_Int : ... {
    let Prefix = TA;
    let HasOpSize = 1;
    let HasVEX = 1;
    let HasVEX_L = 1;
  }
  def V#NAME#_256rrm_Int : ...
}

Ok, that's the first level and right here we have a problem.  How do we figure 
out the vector length ("??" in the strings)?  We could pass it at the top 
level:

defm BLENDPS  : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, 
                  i32i8imm, "blend", "blend", "f32", 4>;

Now 4 is not the right vector length for VEX.L-encoded AVX, so we'll have to 
munge it:

multiclass sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<
  bits<8> opc,
  PatFrag ImmClass,
  string OpcodeString,
  string Intrinsic,
  string BaseType,
  int BaseVL,
  list<list<dag>> ipatterns = [],
  string asm = ""
> {
  [...]
  def V#NAME#_256rrr_Int : fp_binary_vector_irrr<
    opc,
    // Not even sure this field reference will work
    !strconcat(OpcodeString,
               !cast<X86ValueType>(!strconcat(
                   // No support for cast to string yet, but not hard to add
                   // Need integer multiply support as well
                                   !strconcat("X86v", !cast<string>(2*BaseVL)),
                                              BaseType)).suffix),
    [...]
  > {
    let Prefix = TA;
    let HasOpSize = 1;
    let HasVEX = 1;
    let HasVEX_L = 1;
  }
  [...]
}
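
To make the munging above concrete outside of TableGen, here is a minimal
Python sketch of the same lookup.  It is an illustration only, not TableGen
and not anything in the tree: the RECORDS table, lookup() and opcode_string()
are hypothetical names, and the record names simply follow the strings built
by the !strconcat calls ("X86v" + vector length + base type).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class X86ValueType:
    vt: str         # value type name, e.g. "v4f32"
    reg_class: str  # register class name, e.g. "VR128"
    suffix: str     # mnemonic suffix, e.g. "ps"


# Hypothetical stand-in for the TableGen records, keyed by record name.
RECORDS = {
    "X86v4f32": X86ValueType("v4f32", "VR128", "ps"),
    "X86v8f32": X86ValueType("v8f32", "VR256", "ps"),
}


def lookup(base_type: str, base_vl: int, vex_l: bool) -> X86ValueType:
    """Synthesize the record name and fetch it, doubling the length for VEX.L."""
    vl = 2 * base_vl if vex_l else base_vl
    return RECORDS["X86v" + str(vl) + base_type]


def opcode_string(opcode: str, base_type: str, base_vl: int, vex_l: bool) -> str:
    """Append the type suffix to the bare opcode, as the multiclass would."""
    return opcode + lookup(base_type, base_vl, vex_l).suffix
```

Note that the caller still has to know that 4 is the 128-bit length and that
the helper doubles it for the 256-bit forms, which is exactly the hidden
knowledge the top-level specification would depend on.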

All right, we can make it work, but is it really worth the pain?  I'm really 
concerned about making the top-level instruction specification .td as readable 
as possible, since that's what people will be primarily editing.

So which is more intuitive and less error-prone?  

defm BLENDPS  : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, 
                  i32i8imm, "blend", "blend", "f32", 4>;

or

defm BLENDPS  : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, 
                  i32i8imm, "blendps", "blendps">;

(We can probably drop the Intrinsic string specification since in most cases
we can generate it from the opcode string).

Before answering, consider that the developer must understand how top-level 
parameters like type strings and vector lengths will be munged by the guts 
classes.  She needs to know this to know what values to pass for types and 
vector lengths.

The key problem is the differing type requirements for AVX vs. SSE.  Some
parameter always has to be munged at some point.  Those parameters have to
be strings, since that's the only type in TableGen flexible enough to be
munged.  We could add lots of operators to munge other types, but that
would be needlessly complex and expensive to implement.

Opcode strings are naturally strings.  They contain all of the typing 
information we need.  I think it is better to pass fewer arguments and
let TableGen do the hard work of figuring out the type requirements.

If you buy that argument, we have to have some way to match string patterns.
Regular expressions are a natural solution.
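
As a rough sketch of what matching on the opcode string could recover, here
is the idea in Python rather than TableGen.  Everything in it is an assumption
made for illustration: the decode() helper, the MNEMONIC pattern, and the
restriction to the four floating-point suffixes (ss, sd, ps, pd).  Mnemonics
that do not follow that naming scheme are simply rejected.

```python
import re

# Suffix -> (element type, is packed/vector).  ss/sd are scalar single/double,
# ps/pd are packed single/double.
SUFFIXES = {
    "ss": ("f32", False),
    "sd": ("f64", False),
    "ps": ("f32", True),
    "pd": ("f64", True),
}

# Illustrative pattern: a stem followed by one of the four suffixes.
MNEMONIC = re.compile(r"^(?P<stem>[a-z0-9]+?)(?P<suffix>ss|sd|ps|pd)$")


def decode(mnemonic: str, vector_bits: int = 128):
    """Return (stem, value type name) derived from the mnemonic alone."""
    m = MNEMONIC.match(mnemonic)
    if m is None:
        raise ValueError("no recognized type suffix in " + repr(mnemonic))
    elem, is_vector = SUFFIXES[m.group("suffix")]
    if not is_vector:
        return m.group("stem"), elem
    # Lane count is the register width divided by the element width.
    lanes = vector_bits // int(elem[1:])
    return m.group("stem"), "v" + str(lanes) + elem
```

With this, "blendps" alone yields v4f32 for the 128-bit form and v8f32 when
the guts ask for 256 bits, so the top-level defm needs neither a type string
nor a vector length.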

                               -Dave