            Bug ID: 44395
           Summary: Invalid code generation for win32/32bit windows target
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: LLVM Codegen
          Assignee: unassignedclangbugs at nondot.org
          Reporter: zamazan4ik at tut.by
                CC: llvm-bugs at lists.llvm.org, neeilans at live.com,
                    richard-llvm at metafoo.co.uk

clang (trunk) with -m32 for a Windows (win32) target generates wrong code for
the following example ( https://godbolt.org/z/RKzU5M ):

#include <vector>
#include <functional>
#include <emmintrin.h>

struct Id { unsigned handle; };
struct QueryView { __m128 val; unsigned i; };

extern void perform_query(std::function<void(QueryView &)>);

// Passes Id and a __m128 by value; on win32 both go into the inalloca block.
template<typename BT> void foo ( QueryView &qv, const BT & block) {
        block( Id{qv.i}, _mm_setzero_ps() );
}

template<typename BT> void bar ( const BT & block) {
        perform_query([&](QueryView& qv) { block(qv.val); });
}

void external_call ( QueryView & qv ) {
        foo(qv, [&](Id id, const __m128 two)->void {
                bar([&](__m128 & arg)->void { arg = two; });
        });
}

The body of the function captured by std::function will (reasonably) perform an
aligned read of its __m128 argument, while unaligned_stack_store stores the
__m128 register to the stack unaligned, packed tightly after the Id structure.
This results in a crash at runtime (unaligned load).
LLVM IR (relevant part):

  %3 = alloca inalloca <{ %struct.Id, <4 x float> }>, align 4

Disassembly (relevant part):
        push    eax
        sub     esp, 16
        mov     eax, esp
        xorps   xmm0, xmm0
        mov     dword ptr [eax], ecx
        movups  xmmword ptr [eax + 4], xmm0 #unaligned!
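
The offset arithmetic behind the misaligned store can be sketched roughly as
follows. This is only an illustration, not code from clang or from the proposed
patch: the helper names (packed_vector_offset, padded_vector_offset,
is_aligned, count_aligned_bases) and the base addresses are made up for the
example, and it assumes sizeof(Id) == 4 and a 16-byte alignment requirement for
aligned SSE loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes: Id holds one 32-bit unsigned, and an aligned SSE load
// expects a 16-byte-aligned address.
constexpr std::size_t kIdSize = 4;
constexpr std::size_t kVectorAlign = 16;

// Offset of the <4 x float> member inside the packed argument block
// <{ %struct.Id, <4 x float> }>: it follows Id with no padding.
constexpr std::size_t packed_vector_offset() { return kIdSize; }

// Offset the vector would get if padding were inserted to honor its alignment.
constexpr std::size_t padded_vector_offset() {
    return (kIdSize + kVectorAlign - 1) / kVectorAlign * kVectorAlign;
}

// True when addr is a multiple of alignment.
constexpr bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    return addr % alignment == 0;
}

// Walks every 4-byte-aligned base address in one 16-byte window (hypothetical
// addresses) and counts how many put the packed vector on a 16-byte boundary.
inline int count_aligned_bases() {
    int count = 0;
    for (std::uintptr_t base = 0x1000; base < 0x1010; base += 4) {
        if (is_aligned(base + packed_vector_offset(), kVectorAlign)) ++count;
    }
    assert(count <= 4);  // only four candidate bases are visited
    return count;
}
```

With "align 4" on the alloca, only one of the four possible base positions in
each 16-byte window happens to place the vector at offset 4 on a 16-byte
boundary, so an aligned read in the callee is not safe in general.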

Possible fix: https://reviews.llvm.org/D71915

