<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 23, 2015 at 2:34 PM, Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 23 April 2015 at 17:22, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

><br>

><br>

> On Thu, Apr 23, 2015 at 12:12 PM, Rafael Espíndola<br>

> <<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>> wrote:<br>

>><br>

>> On 23 April 2015 at 14:17, Rui Ueyama <<a href="mailto:ruiu@google.com">ruiu@google.com</a>> wrote:<br>

>> > I think the patch for LLVM looks okay, but not sure for the other one.<br>

>> ><br>

>> > Does your patch make the linker unable to handle archive files<br>

>> > containing unaligned objects, or does it just make it slower? If you cross-link<br>

>> > an<br>

>> > executable for machines generous for unaligned accesses, say x86, on<br>

>> > not-so-generous machines, PowerPC for example, does it link fine?<br>

>><br>

>> No difference on x86 (we avoid the copy).<br>

><br>

><br>

> This has the potential to radically change LLD's physical/virtual memory<br>

> usage characteristics depending on LLVM_IS_UNALIGNED_ACCESS_FAST (LIUAF)<br>

> along with total memory traffic profile and disk access patterns. For<br>

> example, this patch causes the entire file to be faulted in and read up<br>

> front on !LIUAF whereas the file might be faulted and touched on disk<br>

> sparsely and/or in a random order when LIUAF. Realistically most<br>

> benchmarking and optimization work is going to happen on x86 and so<br>

> performance on !LIUAF is likely to "bit rot" (we currently don't have any<br>

> type of performance CI to avoid this; this is on my TODO list).<br>

><br>

> Have you tried copying the buffers on x86? Also, if you make sure that the<br>

> incoming archives are aligned so you can avoid the copy on ppc, how much<br>

> faster does it get? I.e. does (time saved from your patch on ppc)  == (time<br>

> copying buffers with your patch on ppc) + (time saved if we use aligned<br>

> archives and avoid the copy with your patch (for testing purposes))?<br>

><br>

> Can you dig in a bit deeper and figure out where this speedup is coming<br>

> from? As it stands right now, this patch seems like a very opportunistic<br>

> "seems to work on my machine" speedup.<br>

<br>

</div></div>At this point I don't think it is worth it. We don't support powerpc, which<br>

is why I had to do a cross linking to benchmark it.<br></blockquote><div><br></div><div>Could you at least test if eagerly faulting in/loading archives speeds up x86?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

The main issue is deleting a bunch of complicated dead code (on x86)<br></blockquote><div><br></div><div>Ok, that I agree with. However it seems like the ownership issue would be completely sidestepped by just using alignment 2 everywhere. It might not be any slower.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

that just slows down other architectures.<br>

<br>

For what it is worth, gold copies data when the buffer is not<br>

sufficiently aligned, so this is known to work.<br>
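
The copy-when-unaligned fallback mentioned here can be sketched roughly as follows. This is only an illustration, not gold's or LLD's actual code: `AlignedView`, `isAligned`, and `viewAligned` are hypothetical names, and the sketch assumes the required alignment is no larger than `alignof(std::uint64_t)`.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical helper type: a view of some bytes that is guaranteed to be
// aligned. It either aliases the original buffer or owns an aligned copy.
struct AlignedView {
    const unsigned char *data;               // original bytes, or the copy
    std::unique_ptr<std::uint64_t[]> owned;  // non-null only if a copy was made
};

// True if p is a multiple of align (align must be non-zero).
inline bool isAligned(const void *p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Returns the buffer unchanged when it is already aligned (the cheap path);
// otherwise copies it into storage aligned for std::uint64_t.
inline AlignedView viewAligned(const unsigned char *buf, std::size_t size,
                               std::size_t align) {
    // Assumption of this sketch: the copy is only aligned for uint64_t.
    assert(align != 0 && align <= alignof(std::uint64_t));
    if (isAligned(buf, align))
        return {buf, nullptr};

    // Round the byte count up to whole 64-bit words (at least one).
    std::size_t words = (size + sizeof(std::uint64_t) - 1) / sizeof(std::uint64_t);
    std::unique_ptr<std::uint64_t[]> copy(new std::uint64_t[words ? words : 1]);
    std::memcpy(copy.get(), buf, size);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(copy.get());
    return {p, std::move(copy)};
}
```

With a helper like this, the caller reads through `data` in both cases, so a copy is made only when a buffer actually turns out to be misaligned, and its lifetime is tied to the returned view.<br>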

<br>

Cheers,<br>

Rafael<br>

</blockquote></div><br></div></div>