[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Justin Holewinski
justin.holewinski at gmail.com
Sat Nov 10 06:37:09 PST 2012
Perhaps "compatibility" is the wrong term to use here. For now, I would
like to "match" what the vendor compiler does. I don't think using
preferred alignment would hurt anything in terms of correctness, but I need
to go through the entire back-end to see what effects it could have on
performance (e.g. adding extra padding increases local memory usage). It
could be a complete non-issue for all I know right now.
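
To make the padding concern concrete, here is a minimal sketch of how a larger per-global alignment can grow total memory usage. It is not the NVPTX back-end's layout logic: the global names and sizes, the "ABI alignment equals size" rule, and the preferred alignment of 8 are all assumptions chosen for illustration.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(globals_, alignment_of):
    """Place globals back to back in order; return (offsets, total_size)."""
    offsets = {}
    offset = 0
    for name, size in globals_:
        offset = align_to(offset, alignment_of(size))
        offsets[name] = offset
        offset += size
    return offsets, offset


# Hypothetical globals as (name, size in bytes); not taken from any real module.
GLOBALS = [("a", 4), ("b", 4), ("c", 4), ("d", 8)]

# Assumption: ABI alignment equals the object's size.
abi_offsets, abi_total = layout(GLOBALS, lambda size: size)

# Assumption: preferred alignment is at least 8 for every global.
pref_offsets, pref_total = layout(GLOBALS, lambda size: max(size, 8))

print("ABI layout:      ", abi_offsets, "total", abi_total)
print("preferred layout:", pref_offsets, "total", pref_total)
```

Under these assumptions the preferred-alignment layout inserts padding after each 4-byte global, so the same four globals occupy more bytes. Whether that matters in practice is exactly what the audit would need to establish.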
On Sat, Nov 10, 2012 at 1:37 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Justin,
>
>
> On 09/11/12 22:49, Justin Holewinski wrote:
>
>> Test cases exist under test/CodeGen/NVPTX (name changed in May).
>>
>
> I've deleted the empty PTX directory.
>
>
>> Now that I'm back at NVIDIA, I'm going to be running through the bugzilla
>> issues (thanks Dmitry for the reports!). I have practically the exact same
>> patch here in my queue. :)
>>
>> In this case, I would prefer ABI alignment for compatibility with the
>> vendor compiler.
>>
>
> I don't really understand this argument. If the vendor compiler is aligning
> to 4 (say) then some globals will have address a multiple of 4, some will
> have address a multiple of 8, some will have address a multiple of 16 etc,
> depending on the accidents of just where in memory they happen to be placed.
> For example, if you have two 4 byte globals that follow each other in
> memory, and that are 4 byte aligned, then if the first one has address a
> multiple of 4 then the second will have address a multiple of 8. In short
> lots of variables will be 8 byte aligned by accident. If LLVM gives them all
> an alignment of 8, what does that change? OK, I will now admit that there
> is an effect if assumptions are being made about globals being placed next
> to each other: if you declare two globals A and B immediately after each
> other in the IR then the LLVM semantics doesn't guarantee that they will be
> laid out one immediately after the other in memory. But that's how it
> happens in practice so maybe people are (wrongly) relying on that. Bumping
> up the alignment to a multiple of 8 may add extra padding between A and B,
> causing B to not be at the position that such naughty people are expecting.
>
>
>> It should work either way, but I do need to audit the codebase and
>> tie up any issues here.
>>
>
> The IR optimizers already bump the alignment of some globals up to the
> preferred alignment, check out enforceKnownAlignment in Local.cpp (it ends
> up being called from instcombine).
>
> Ciao, Duncan.
>
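
The "aligned by accident" point in the quoted text can be checked with a few lines of arithmetic. This is only a sketch: the base addresses and the count of eight globals are arbitrary assumptions, and it models nothing more than consecutive 4-byte objects packed with 4-byte alignment.

```python
def addresses(base, count, size=4):
    """Addresses of `count` objects of `size` bytes packed back to back."""
    return [base + i * size for i in range(count)]


def count_aligned(addrs, alignment):
    """How many of the addresses are a multiple of `alignment`."""
    return sum(1 for addr in addrs if addr % alignment == 0)


# Hypothetical base addresses; both are 4-byte aligned, only one is 8-byte aligned.
for base in (0x1000, 0x1004):
    addrs = addresses(base, count=8)
    print(hex(base),
          "4-aligned:", count_aligned(addrs, 4),
          "8-aligned:", count_aligned(addrs, 8))
```

With 4-byte objects packed this way, every other address is a multiple of 8 regardless of which 4-byte-aligned base is chosen, which is the sense in which many variables end up 8-byte aligned without anyone asking for it.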
--
Thanks,
Justin Holewinski