<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 3, 2016 at 1:18 PM, Mehdi Amini <span dir="ltr"><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On Jan 26, 2016, at 3:59 PM, Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:</div><br><div><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 26, 2016 at 3:35 PM, Mehdi Amini <span dir="ltr"><<a href="mailto:mehdi.amini@apple.com" target="_blank">mehdi.amini@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">The hash should (almost?) never change.<div>I thought about this solution, and I plan on evaluating it as well (it applies to constants as well). </div><div>However, there is a memory vs. speed tradeoff in doing this.</div><div>For instance, when doing LTO, we load a lot of metadata when loading and linking individual Modules, which puts a lot of stress on the hash tables (both uniquing and growing). 
But during optimization and CodeGen it shouldn’t, and I’m not sure we want to pay the price of the memory overhead then.</div><div><br></div></div></blockquote><div>This naturally raises the question of whether you really need hash tables to get what you want, or whether some other, more memory-efficient data structure would be better for uniquing :)</div></div></div></div></div></blockquote><div><br></div></span><div>I think this is a great point.</div><span class=""><br><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>While DenseMap is pretty good, you do end up with a lot of empty buckets (or pay a high probing price).<br></div><div><br></div><div>Whereas, depending on the situation and structure, you can often pay a much lower cost (and have better behavior on hash misses). Obviously, in most cases, DenseMap should be the choice because it's consistent and who cares, but in specific situations, if you are placing huge stress on it during, say, uniquing, there are often better ways both in time and space (various compressed tries, ternary search trees, etc).</div></div></div></div></div></blockquote><div><br></div></span><div>Debug info metadata puts a lot of stress both on CPU time, as it shows up in the profile quite frequently, and on memory (even if it is far better now than in the past, following all the work driven by Duncan last year).</div><div>An LTO build of llvm-tblgen consumes at peak 130MB without debug info, and 740MB with debug info. 
Since you’re familiar with GCC are you aware of the </div><div><br></div></div></div></blockquote><div><br></div><div><br></div><div>A sentence got lost here :)</div><div><br></div><div>BTW, do you know the hash miss rate on the hash tables involved in uniquing?<br></div><div><br></div><div>(I.e., how many total lookups there are, and how many times the thing is already in the hash table.)</div><div><br>If the hash hit rate is very high, more memory-efficient hash tables (sparse_hash_map, etc.) are viable.</div><div>If the hash hit rate is not that high, it's likely other structures will be both faster and more efficient.</div><div><br></div><div><br></div></div></div></div>
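<div>To make the hit-rate question concrete, here is a rough sketch of how one might count total lookups versus hits on a uniquing table. It is not LLVM code: the class name CountingUniquer and its methods are hypothetical, and std::unordered_set over strings stands in for whatever table and key type are really used.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical uniquing table that counts lookups and hits.
// std::unordered_set is only a stand-in for the real table.
class CountingUniquer {
public:
  // Returns the canonical copy of Key, inserting it on a miss.
  const std::string &getOrInsert(const std::string &Key) {
    ++Lookups;
    auto Result = Table.insert(Key);
    if (!Result.second)
      ++Hits; // Key was already in the table.
    return *Result.first;
  }

  std::size_t lookups() const { return Lookups; }
  std::size_t hits() const { return Hits; }

  // Fraction of lookups that found an existing entry (0 if no lookups).
  double hitRate() const {
    assert(Hits <= Lookups && "cannot have more hits than lookups");
    return Lookups == 0 ? 0.0 : static_cast<double>(Hits) / Lookups;
  }

private:
  std::unordered_set<std::string> Table;
  std::size_t Lookups = 0;
  std::size_t Hits = 0;
};
```

<div>Running the real workload through counters like these would give the hit rate, which is the number that decides between a more compact hash table and a different structure altogether.</div>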