[PATCH] [clang] Optimized ASTNodeKind::isBaseOf

Samuel Benzaquen sbenza at google.com
Thu Oct 2 07:55:16 PDT 2014


I have been looking into different ways to do this more efficiently.
There are two reasons the current implementation is slow: the while() loop and loading cold memory for the AllKindInfo array.

This CL is one of the solutions I looked into. It removes the while() loop, but makes the caching behavior much worse.
At the very least, ParentDistances should be an array of uint8_t to reduce the cache pressure here.

Other solutions I was thinking of require no external memory. That is, you can tell whether two nodes are related (and the distance between them) from the node ids themselves.
I haven't benchmarked them to determine which one is better. I was holding off until isBaseOf() is the largest remaining improvement I can make, to get a better comparison.

A list of solutions:
 1 Caching the solution (something like this CL).
   It requires a half matrix with 1 bit per pair. ~290 types so ~5k of memory.
   Distance can be calculated as the difference of the two nodes' depths. Depths can be stored in the nodes or externally. External is around ~1k.
   (The current CL creates a matrix of ~330k!)

 2 Something similar to dyn_cast<> where each node knows the id range of its descendants.
   The 'last' id can be saved in a separate array or inside the node.
   The calculation becomes two comparisons.
   Distance can be calculated as the difference of the two nodes' depths. Depths can also be stored in the nodes.
   If everything is stored in the nodes, it requires no external memory and no loops.

 3 Each node gets a unique sequential ID.
   Then the real node id is: NI(X) = (NI(Parent(X)) << 8) + UID(X)
   These Ids can be easily generated at compile time.
   It requires a loop to find the answer, but no external memory.
   Distance is calculated in the loop.
 
 4 Each node gets a unique prime.
   Then the real node id is: NI(X) = NI(Parent(X)) * Prime(X)
   The operation becomes a simple modulo operation. No external memory.
   Distance comes from somewhere else.
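
To make (2) concrete, here is a rough sketch of the range check. It assumes the kinds are numbered in pre-order, so each subtree is a contiguous id range. The toy hierarchy, KindId, KindRange, Ranges and distanceTo are all made up for illustration and are not what is in clang today. This variant keeps 'last' and the depth in a small separate array; they could just as well live in the nodes.

```cpp
#include <cstdint>

// Hypothetical toy hierarchy, numbered in pre-order so that every subtree
// occupies a contiguous range of ids:
//   Decl(0) -> NamedDecl(1) -> ValueDecl(2) -> VarDecl(3)
//                           -> TypeDecl(4)
//   Stmt(5) -> Expr(6)
enum KindId : uint8_t {
  Decl, NamedDecl, ValueDecl, VarDecl, TypeDecl, Stmt, Expr, NumKinds
};

struct KindRange {
  uint8_t Last;  // Id of the last descendant in pre-order (self if a leaf).
  uint8_t Depth; // Distance from the root of this node's hierarchy.
};

// Assumption: this table would be generated next to the enum.
constexpr KindRange Ranges[NumKinds] = {
    {TypeDecl, 0}, // Decl
    {TypeDecl, 1}, // NamedDecl
    {VarDecl, 2},  // ValueDecl
    {VarDecl, 3},  // VarDecl
    {TypeDecl, 2}, // TypeDecl
    {Expr, 0},     // Stmt
    {Expr, 1},     // Expr
};

// Two comparisons, no loop: Derived must fall in [Base, Last(Base)].
constexpr bool isBaseOf(KindId Base, KindId Derived) {
  return Base <= Derived && Derived <= Ranges[Base].Last;
}

// Distance is the difference of depths, or -1 if the kinds are unrelated.
constexpr int distanceTo(KindId Base, KindId Derived) {
  return isBaseOf(Base, Derived)
             ? Ranges[Derived].Depth - Ranges[Base].Depth
             : -1;
}
```

With this numbering, isBaseOf(Decl, VarDecl) holds with distance 3, while isBaseOf(ValueDecl, TypeDecl) fails because TypeDecl falls past ValueDecl's last descendant.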


I do not like (1) because we can solve this without the external memory.
(2) is fine, but was not easy to implement right now. Type class enums do not have firstXXX/lastXXX definitions.
I implemented (3) and it is not complicated. It still has the loop, though.
(4) is cute, but it might require some work to fit all the ids in a uint64. There are some deeply nested hierarchies.
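
For comparison, a minimal sketch of how the ids for (3) and (4) could be built and queried at compile time. This is not the code in the CL; makeId, shiftedDistance, primeIsBaseOf and the toy ids are hypothetical names. It assumes C++14 constexpr, UIDs in 1..255, and that every node in (4) gets its own prime. With 8 bits per level, a uint64_t holds at most 8 levels in (3).

```cpp
#include <cstdint>

// --- Option (3): id = (parent id << 8) + UID -------------------------------
// Assumption: UIDs are nonzero, so a root's id is never 0.
constexpr uint64_t makeId(uint64_t ParentId, uint8_t Uid) {
  return (ParentId << 8) + Uid;
}

constexpr uint64_t DeclId = makeId(0, 1);
constexpr uint64_t NamedDeclId = makeId(DeclId, 2);
constexpr uint64_t ValueDeclId = makeId(NamedDeclId, 3);
constexpr uint64_t TypeDeclId = makeId(NamedDeclId, 4);
constexpr uint64_t StmtId = makeId(0, 5);

// Strips one level (8 bits) per iteration until Base is found or the id
// runs out. Returns the distance, or -1 if Base is not an ancestor.
constexpr int shiftedDistance(uint64_t Base, uint64_t Derived) {
  int Steps = 0;
  for (; Derived != 0; Derived >>= 8, ++Steps)
    if (Derived == Base)
      return Steps;
  return -1;
}

// --- Option (4): id = parent id * unique prime ------------------------------
constexpr uint64_t DeclP = 2;
constexpr uint64_t NamedDeclP = DeclP * 3;
constexpr uint64_t ValueDeclP = NamedDeclP * 5;
constexpr uint64_t TypeDeclP = NamedDeclP * 7;
constexpr uint64_t StmtP = 11;

// Base is an ancestor of (or equal to) Derived exactly when Base's id
// divides Derived's id. No distance information is available here.
constexpr bool primeIsBaseOf(uint64_t Base, uint64_t Derived) {
  return Derived % Base == 0;
}
```

The (3) ids grow by 8 bits per level and the (4) ids grow by a factor of one prime per level, which is why deep hierarchies are the limiting factor for both when packing into 64 bits.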

http://reviews.llvm.org/D5577





