<div dir="ltr"><div dir="ltr">On Fri, Apr 9, 2021 at 12:01 PM antlists via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Could you do a dynamic hash instead? In other words grow the hash <br>

instead of starting over - what's going to be the cost of retrying as <br>

opposed to growing?<br></blockquote><div><br></div><div>At a really high level, I believe starting the insertion over has the same algorithmic complexity as rehashing with the partially filled table, it's all O(n). Eventually you build a table of size n, and along the way you have to build a table of size n/2, n/4, n/8..., and if you sum up all the work to build those tables, you get 2n, or O(n).</div><div><br></div><div>At a lower level, for common inputs, rehashing could be faster by a constant factor. Suppose there are n input symbols and m unique output symbols, and m is significantly smaller than n. Rehashing only takes O(m) time, but reinserting takes O(n) time. The overall work is still O(n). However, rehashing requires a much fancier concurrent data structure, probably with higher constant factor overhead. You may gain as much as you lose. And you open yourself up to concurrency bugs. Even if you take a tested concurrent data structure off the shelf, they can be tricky to use correctly.</div><div><br></div><div>So, at least for PDB global type hashing, I felt it was better to implement a less general data structure that only does bulk inserts followed by lookups. I also knew how many insertions I was going to do, so I could also conservatively overallocate a big hash table and sidestep rehashing. We can't quite do the same for symbols because symbol resolution can discover new inputs.</div></div></div>