<div dir="ltr"><div>Does anyone have any insight into this problem? Is there a way to reduce the excessive spill/fill in this kind of scenario?</div><div>Thanks,</div><div>Jason</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 6, 2016 at 10:44 AM, Jason <span dir="ltr">&lt;<a href="mailto:thesurprises@gmail.com" target="_blank">thesurprises@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span style="font-size:12.8px">Hi, I am using MCJIT in LLVM 3.6 to JIT kernels to x86 AVX2. I've noticed some inefficient use of the stack around constant vectors. In one example, I have code that computes a series of constant vectors at compile time. Each vector has a single use. In the final asm, every constant vector is loaded and immediately spilled to the stack at the top of the function, and each use then reads its operand from a stack slot:</span><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Lots of these at the top of the function:</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><div><span style="white-space:pre-wrap"> </span>movabsq<span style="white-space:pre-wrap"> </span>$.LCPI0_212, %rbx</div><div><span style="white-space:pre-wrap"> </span>vmovaps<span style="white-space:pre-wrap"> </span>(%rbx), %ymm0</div><div><span style="white-space:pre-wrap"> </span>vmovaps<span style="white-space:pre-wrap"> </span>%ymm0, 2816(%rsp) # 32-byte Spill<br></div></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Later on, each use references the stack slot through the stack pointer:</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><span style="white-space:pre-wrap"> </span>vpaddd<span style="white-space:pre-wrap"> </span>2816(%rsp), %ymm4, %ymm1 # 32-byte Folded Reload</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">It seems the spill to the stack is 
unnecessary. In one particularly bad kernel, I have 128 8-wide constant vectors, so 4 KB of stack (128 × 32 bytes) is used just for these constants. I think a better approach would be to load each constant vector's address as needed and use it directly as the memory operand:</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><span style="white-space:pre-wrap"> </span>movabsq<span style="white-space:pre-wrap"> </span>$.LCPI0_212, %rbx</div><div style="font-size:12.8px"><span style="white-space:pre-wrap"> </span>vpaddd<span style="white-space:pre-wrap"> (</span>%rbx), %ymm4, %ymm1</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Thanks,</div><div style="font-size:12.8px">Jason</div></div>
</blockquote></div><br></div>