[llvm-commits] [llvm] r78492 - /llvm/trunk/utils/TableGen/AsmMatcherEmitter.cpp
Evan Cheng
evan.cheng at apple.com
Sat Aug 8 16:15:40 PDT 2009
No. But can we give dag nodes string representation and use that for
matching?
Evan
On Aug 8, 2009, at 3:48 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Aug 8, 2009, at 3:19 PM, Evan Cheng wrote:
>
>> Nice. Can we use it to fix up dagisel?
>
> Does dagisel match strings anywhere?
>
> -Chris
>
>>
>> Evan
>>
>> On Aug 8, 2009, at 1:55 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>>
>>> On Sat, Aug 8, 2009 at 1:02 PM, Chris Lattner<sabre at nondot.org>
>>> wrote:
>>>> Author: lattner
>>>> Date: Sat Aug 8 15:02:57 2009
>>>> New Revision: 78492
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=78492&view=rev
>>>> Log:
>>>> add a little function to do arbitrary string pattern matching in a
>>>> much more efficient way than a sequence of if's. Switch
>>>> MatchRegisterName
>>>> to use it. It would be nice if someone could factor this out to a
>>>> shared
>>>> place in tblgen :)
>>>
>>> Cool!
>>>
>>>> +typedef std::pair<std::string, std::string> StringPair;
>>>
>>> Not that it matters, but at least in the context of this
>>> functionality
>>> I think this can use a StringRef; it always deals with substrings of
>>> the existing strings, right?
>>>
>>>> +/// EmitStringMatcherForChar - Given a set of strings that are
>>>> known to be the
>>>> +/// same length and whose characters leading up to CharNo are the
>>>> same, emit
>>>> +/// code to verify that CharNo and later are the same.
>>>> +static void EmitStringMatcherForChar(const std::string
>>>> &StrVariableName,
>>>> + const std::vector<const
>>>> StringPair*> &Matches,
>>>> + unsigned CharNo, unsigned
>>>> IndentCount,
>>>> + raw_ostream &OS) {
>>>> + assert(!Matches.empty() && "Must have at least one string to
>>>> match!");
>>>> + std::string Indent(IndentCount*2+4, ' ');
>>>> +
>>>> + // If we have verified that the entire string matches, we're
>>>> done: output the
>>>> + // matching code.
>>>> + if (CharNo == Matches[0]->first.size()) {
>>>> + assert(Matches.size() == 1 && "Had duplicate keys to match
>>>> on");
>>>> +
>>>> + // FIXME: If Matches[0].first has embeded \n, this will be
>>>> bad.
>>>> + OS << Indent << Matches[0]->second << "\t // \"" << Matches
>>>> [0]-
>>>>> first
>>>> + << "\"\n";
>>>> + return;
>>>> + }
>>>> +
>>>> + // Bucket the matches by the character we are comparing.
>>>> + std::map<char, std::vector<const StringPair*> > MatchesByLetter;
>>>> +
>>>> + for (unsigned i = 0, e = Matches.size(); i != e; ++i)
>>>> + MatchesByLetter[Matches[i]->first[CharNo]].push_back(Matches
>>>> [i]);
>>>> +
>>>> +
>>>> + // If we have exactly one bucket to match, see how many
>>>> characters are common
>>>> + // across the whole set and match all of them at once.
>>>> + // length, just verify the rest of it with one if.
>>>
>>> Edito.
>>>
>>> I think another similar simple optimization which can be done is to
>>> match common suffixes. See the code that gets generated for "st(0)",
>>> "st(1)", etc.
>>>
>>>> + if (MatchesByLetter.size() == 1) {
>>>> + unsigned FirstNonCommonLetter = FindFirstNonCommonLetter
>>>> (Matches);
>>>> + unsigned NumChars = FirstNonCommonLetter-CharNo;
>>>> +
>>>> + if (NumChars == 1) {
>>>> + // Do the comparison with if (Str[1] == 'f')
>>>> + // FIXME: Need to escape general characters.
>>>> + OS << Indent << "if (" << StrVariableName << "[" << CharNo
>>>> << "] == '"
>>>> + << Matches[0]->first[CharNo] << "') {\n";
>>>> + } else {
>>>> + // Do the comparison with if (Str.substr(1,3) == "foo").
>>>> + OS << Indent << "if (" << StrVariableName << ".substr(" <<
>>>> CharNo << ","
>>>> + << NumChars << ") == \"";
>>>> +
>>>> + // FIXME: Need to escape general strings.
>>>> + OS << Matches[0]->first.substr(CharNo, NumChars) << "\")
>>>> {\n";
>>>> + }
>>>> +
>>>> + EmitStringMatcherForChar(StrVariableName, Matches,
>>>> FirstNonCommonLetter,
>>>> + IndentCount+1, OS);
>>>> + OS << Indent << "}\n";
>>>> + return;
>>>> + }
>>>> +
>>>> + // Otherwise, we have multiple possible things, emit a switch on
>>>> the
>>>> + // character.
>>>> + OS << Indent << "switch (" << StrVariableName << "[" << CharNo
>>>> << "]) {\n";
>>>> + OS << Indent << "default: break;\n";
>>>> +
>>>> + for (std::map<char, std::vector<const StringPair*> >::iterator
>>>> LI =
>>>> + MatchesByLetter.begin(), E = MatchesByLetter.end(); LI !=
>>>> E; ++LI) {
>>>> + // TODO: escape hard stuff (like \n) if we ever care about it.
>>>> + OS << Indent << "case '" << LI->first << "':\t // "
>>>> + << LI->second.size() << " strings to match.\n";
>>>> + EmitStringMatcherForChar(StrVariableName, LI->second, CharNo
>>>> +1,
>>>> + IndentCount+1, OS);
>>>> + OS << Indent << " break;\n";
>>>> + }
>>>> +
>>>> + OS << Indent << "}\n";
>>>> +
>>>> +}
>>>> +
>>>> +
>>>> +/// EmitStringMatcher - Given a list of strings and code to
>>>> execute when they
>>>> +/// match, output a simple switch tree to classify the input
>>>> string. If a
>>>> +/// match is found, the code in Vals[i].second is executed. This
>>>> code should do
>>>> +/// a return to avoid falling through. If nothing matches,
>>>> execution falls
>>>> +/// through. StrVariableName is the name of teh variable to test.
>>>
>>> teh -> the.
>>>
>>> IMHO, we should just implement this as a (string -> unsigned)
>>> matcher.
>>> Thats very frequently the use case, and when it isn't you aren't
>>> necessarily worse off by using it as (string -> unsigned -> my
>>> generic
>>> code), and you end up with more readable code (instead of
>>> intertwining
>>> the generic actions with the matching code).
>>>
>>> This lets the matcher implement assorted fun optimizations, like:
>>>
>>> 1. { "0" -> a + b*0, "1" -> n + b*1, ... "9" -> n + b*9} to
>>> { "[0-9]"
>>> -> a + (char - '0') * b}.
>>>
>>> 2. { "foo" -> 1, "bar" -> 1, "baz" -> 1, etc -> 0 } into a hash
>>> match.
>>>
>>>> +static void EmitStringMatcher(const std::string &StrVariableName,
>>>> + const std::vector<StringPair>
>>>> &Matches,
>>>> + raw_ostream &OS) {
>>>> + // First level categorization: group strings by length.
>>>> + std::map<unsigned, std::vector<const StringPair*> >
>>>> MatchesByLength;
>>>> +
>>>> + for (unsigned i = 0, e = Matches.size(); i != e; ++i)
>>>> + MatchesByLength[Matches[i].first.size()].push_back(&Matches
>>>> [i]);
>>>> +
>>>> + // Output a switch statement on length and categorize the
>>>> elements within each
>>>> + // bin.
>>>> + OS << " switch (" << StrVariableName << ".size()) {\n";
>>>> + OS << " default: break;\n";
>>>> +
>>>> +
>>>> + for (std::map<unsigned, std::vector<const StringPair*>
>>>>> ::iterator LI =
>>>> + MatchesByLength.begin(), E = MatchesByLength.end(); LI !=
>>>> E; ++LI) {
>>>> + OS << " case " << LI->first << ":\t // " << LI->second.size()
>>>> + << " strings to match.\n";
>>>> + EmitStringMatcherForChar(StrVariableName, LI->second, 0, 0,
>>>> OS);
>>>> + OS << " break;\n";
>>>> + }
>>>> +
>>>> +
>>>> + OS << " }\n";
>>>> +}
>>>> +
>>>> +
>>>> +
>>>> /// EmitMatchRegisterName - Emit the function to match a string to
>>>> the target
>>>> /// specific register enum.
>>>> static void EmitMatchRegisterName(CodeGenTarget &Target, Record
>>>> *AsmParser,
>>>> @@ -740,16 +871,20 @@
>>>> << AsmParser->getValueAsString("AsmParserClassName")
>>>> << "::MatchRegisterName(const StringRef &Name, unsigned
>>>> &RegNo) {\n";
>>>>
>>>> + std::vector<StringPair> Matches;
>>>> +
>>>> // FIXME: TableGen should have a fast string matcher generator.
>>>
>>> - FIXME? :)
>>>
>>>> for (unsigned i = 0, e = Registers.size(); i != e; ++i) {
>>>> const CodeGenRegister &Reg = Registers[i];
>>>> if (Reg.TheDef->getValueAsString("AsmName").empty())
>>>> continue;
>>>>
>>>> - OS << " if (Name == \""
>>>> - << Reg.TheDef->getValueAsString("AsmName") << "\")\n"
>>>> - << " return RegNo=" << i + 1 << ", false;\n";
>>>> + Matches.push_back(StringPair(Reg.TheDef->getValueAsString
>>>> ("AsmName"),
>>>> + "RegNo=" + utostr(i + 1) + ";
>>>> return false;"));
>>>> }
>>>> +
>>>> + EmitStringMatcher("Name", Matches, OS);
>>>> +
>>>> OS << " return true;\n";
>>>> OS << "}\n\n";
>>>> }
>>>
>>> Very cool, thanks.
>>>
>>> - Daniel
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list