[llvm-commits] Deterministic finite automaton based packetizer for VLIW architectures

Tue Nov 29 07:34:24 PST 2011

Thanks for the detailed feedback on the functionality, Jakob.

 > How big do these tables get for Hexagon? How many resources? How many 
DFA states?

For Hexagon, we get 41 states, 454 valid transitions, and 8 resources.

 > This file needs more concrete, less abstract documentation.

Sure, I'll add some more documentation to the file.

 > I also think you should move some of the methods to a .cpp file. If 
you can avoid the
 > #includes, that would be great.

Yeah, I went back and forth between a separate cpp file versus a header 
file. I went with the header file since the methods were quite small. 
I'll split them out into two files.

 > You are building CachedTable on demand. Is that really necessary? How 
much of the table is built in
 > a typical run? Would it be better/faster to just build the whole 
thing up front?

That was an interesting decision. So the reason why I implemented 
CachedTable is that currently the DFA generator constructs one common 
transition table for all versions (subtargets) of Hexagon. As a result, 
when we compile for a particular Hexagon subtarget, I noticed that only 
part of the transition table is typically used. I have plans to augment 
the DFA generator to create separate tables for each Hexagon subtarget 
and once I do, I will change this to load up the entire transition 
table. But for now CachedTable is useful.

 > You have 4 table lookups per DFA transition by my count. It seems 
that 1 is enough. Can the API be improved to allow that?

 > Could you get rid of the DFAStateEntryTable by renumbering your states?

 > s -> DFAStateEntryTable[s]

 > This would preclude the sparse transition matrix representation, of 
course.

If I understand correctly, these two questions are related. Let me 
answer them together. I can change the code so that there is one lookup 
per transition. That was my original design. However, most of the table 
entries were invalid transitions. So, while running TableGen, the I/O 
required to emit the table became a bottleneck. It significantly slowed 
down the Hexagon backend build time. Therefore I moved to a sparse 
matrix representation.

 > +      if (!isValidTransition(*SI, j)) {

 > +        continue;

 > +      }

 > +      else {

 >

 > Don't use else after continue;

Will do.

 > +  OS << "const unsigned int " << TargetName << "DFAStateEntryTable[] 
= {\n";

 > +  for (unsigned i = 0; i < states.size(); ++i) {

 > +    // Multiply i by 2 since each entry in DFAStateInputTable is a 
set of

 > +    // two numbers

 > +    OS << StateEntry[i] * 2 << ", ";

 > +  }

 >

 > How about using "const unsigned DFAStateInputTable[][2]" then?

Sure, I'll change that.

 > This string conversion is a bit strange. I think this is actually a 
case where

 > std::map<std::set<unsigned>, ...> makes sense. Other TableGen 
backends do that.

 >

I don't like the string conversion either; thanks for the suggestion. I 
have to think a little more on whether this will work and I'll try to 
change it.

 > Would a BitVector be better than set<unsigned>? You can still iterate.
 >

 > Would it be reasonable to limit the number of resources to 64, and 
just use a

 > uint64_t bit mask instead?

The bits here do not represent the resources. They represent the 
different ways that the current set of instructions in a packet can 
consume resources. If I restrict it to 64 bits, then it will become 
quite limiting. For instance, that will not work for any VLIW target 
that has 6 functional units and an instruction that can be scheduled on 
any of those 6 units.

I will post a new patch with all the changes we discussed.

-Anshu

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111129/5975c03e/attachment.html>