[llvm-commits] [llvm] r78492 - /llvm/trunk/utils/TableGen/AsmMatcherEmitter.cpp

Sat Aug 8 16:43:39 PDT 2009

On Aug 8, 2009, at 1:55 PM, Daniel Dunbar wrote:
>> +typedef std::pair<std::string, std::string> StringPair;
>
> Not that it matters, but at least in the context of this functionality
> I think this can use a StringRef; it always deals with substrings of
> the existing strings, right?

Sure, but it is actually more natural to express it without substrings.
The code just uses pointers to the entries in the original array; it
doesn't do string slicing and dicing.

>> +  for (unsigned i = 0, e = Matches.size(); i != e; ++i)
>> +    MatchesByLetter[Matches[i]->first[CharNo]].push_back(Matches[i]);
>> +
>> +
>> +  // If we have exactly one bucket to match, see how many characters are common
>> +  // across the whole set and match all of them at once.
>> +  // length, just verify the rest of it with one if.
>
> Edito.

Fixed.
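
For anyone following along, here is a rough standalone sketch of the bucketing
step and the "one bucket, so skip the common run" check discussed above. It is
not the committed code: bucketByLetter, firstDifference and MatchList are
names made up for illustration, and it assumes CharNo is in range for every
candidate string.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> StringPair;
// Hypothetical alias: pointers into the original array, no substrings made.
typedef std::vector<const StringPair *> MatchList;

// Bucket the candidates by the character found at position CharNo.
std::map<char, MatchList> bucketByLetter(const MatchList &Matches,
                                         std::size_t CharNo) {
  std::map<char, MatchList> MatchesByLetter;
  for (std::size_t i = 0, e = Matches.size(); i != e; ++i) {
    assert(CharNo < Matches[i]->first.size() && "CharNo out of range");
    MatchesByLetter[Matches[i]->first[CharNo]].push_back(Matches[i]);
  }
  return MatchesByLetter;
}

// Return the first index >= CharNo at which the candidates disagree, or the
// length of the shortest candidate if they never do. Everything in
// [CharNo, result) is common and could be verified with a single compare.
std::size_t firstDifference(const MatchList &Matches, std::size_t CharNo) {
  assert(!Matches.empty() && "need at least one candidate");
  const std::string &First = Matches[0]->first;
  assert(CharNo <= First.size() && "CharNo out of range");
  std::size_t End = First.size();
  for (std::size_t i = 1, e = Matches.size(); i != e; ++i) {
    const std::string &S = Matches[i]->first;
    std::size_t Pos = CharNo;
    while (Pos < End && Pos < S.size() && S[Pos] == First[Pos])
      ++Pos;
    End = Pos;
  }
  return End;
}
```

With "st(0)", "st(1)" and "stack" as input, position 0 yields a single bucket
and the candidates first diverge at index 2, so "st" can be checked in one go
before switching on the third character.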

> I think another similar simple optimization which can be done is to
> match common suffixes. See the code that gets generated for "st(0)",
> "st(1)", etc.

Ah, that could be interesting, match from both ends! :)
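
Something like the following could find the shared tail. This is only a
sketch of the idea, not anything in the tree; commonSuffixLength is a
hypothetical helper name, and a real version would also have to make sure the
suffix does not overlap whatever prefix has already been matched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the number of trailing characters shared by every string in Strs.
std::size_t commonSuffixLength(const std::vector<std::string> &Strs) {
  assert(!Strs.empty() && "need at least one string");
  const std::string &First = Strs[0];
  std::size_t Len = First.size();
  for (std::size_t i = 1, e = Strs.size(); i != e; ++i) {
    const std::string &S = Strs[i];
    std::size_t N = 0;
    // Walk backwards from the end of both strings while they agree.
    while (N < Len && N < S.size() &&
           S[S.size() - 1 - N] == First[First.size() - 1 - N])
      ++N;
    Len = N;
  }
  return Len;
}
```

For the "st(0)", "st(1)", ... family this reports a suffix of length 1 (the
closing paren), leaving only the digit to switch on.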

>> +/// EmitStringMatcher - Given a list of strings and code to execute when they
>> +/// match, output a simple switch tree to classify the input string.  If a
>> +/// match is found, the code in Vals[i].second is executed.  This code should do
>> +/// a return to avoid falling through.  If nothing matches, execution falls
>> +/// through.  StrVariableName is the name of the variable to test.
>
> IMHO, we should just implement this as a (string -> unsigned) matcher.
> That's very frequently the use case, and when it isn't you aren't
> necessarily worse off by using it as (string -> unsigned -> my generic
> code), and you end up with more readable code (instead of intertwining
> the generic actions with the matching code).
>
> This lets the matcher implement assorted fun optimizations, like:
>
> 1. { "0" -> a + b*0, "1" -> a + b*1, ... "9" -> a + b*9 } to { "[0-9]"
> -> a + (char - '0') * b }.
>
> 2. { "foo" -> 1, "bar" -> 1, "baz" -> 1, etc -> 0 } into a hash match.

I'd rather just make it generic and implement those optimizations in  
the code generator as we discussed on IRC, but I really don't care  
that much.  I don't see a big reason to implement these in tblgen.  If  
you're going to implement them, it would make more sense to do them in  
one place that is used for all switches.
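
To make the first rewrite quoted above concrete, wherever it ends up living,
here is a small sketch of recognizing a digit table of the form a + b*digit
and the folded matcher it could become. Both function names are invented for
this example, and it assumes the table is a plain (string -> unsigned) map.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: if Table maps exactly "0".."9" to A + B*digit,
// store A and B and return true. Arithmetic is modulo 2^N, as for unsigned.
bool findAffineDigitMap(const std::map<std::string, unsigned> &Table,
                        unsigned &A, unsigned &B) {
  if (Table.size() != 10)
    return false;
  unsigned Vals[10];
  for (unsigned D = 0; D != 10; ++D) {
    std::map<std::string, unsigned>::const_iterator It =
        Table.find(std::string(1, char('0' + D)));
    if (It == Table.end())
      return false;
    Vals[D] = It->second;
  }
  unsigned Base = Vals[0], Step = Vals[1] - Vals[0];
  for (unsigned D = 2; D != 10; ++D)
    if (Vals[D] != Base + Step * D)
      return false;
  A = Base;
  B = Step;
  return true;
}

// The folded form: one range check instead of a ten-way switch.
bool matchDigit(const std::string &Str, unsigned A, unsigned B,
                unsigned &Result) {
  if (Str.size() != 1 || Str[0] < '0' || Str[0] > '9')
    return false;
  Result = A + unsigned(Str[0] - '0') * B;
  return true;
}
```

Nothing here is specific to tblgen, which is part of why doing it once in a
shared place seems preferable.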

-Chris