[LLVMdev] crash with large structure values on the stack

Wed Nov 26 09:37:55 PST 2014

While it would be good to get this bug fixed, I want to point out that
frontends are generally discouraged from emitting large aggregate loads and
stores. It's considered "more canonical" for the frontend to use
@llvm.memcpy to move this stuff around, and then load the individual
elements as needed. See past discussions with "daedal nix" about optimizer
problems in this area.

On Wed, Nov 26, 2014 at 9:10 AM, Dale Martin <Dale.Martin at mathworks.com>
wrote:

>  Hello,
>
> This example input crashes if you run it through llc on x86.
>
> [begin example]
>  ; ModuleID = 'test'
>
>  %struct_2 = type { [90000 x %struct_1] }
> %struct_1 = type { i8 }
>
>  define void @testFcn(%struct_2 %in1) {
> testFcn_entry:
>   %in1_ = alloca %struct_2
>   store %struct_2 %in1, %struct_2* %in1_, align 8
>   %localStruct_ = alloca %struct_2
>   store %struct_2 %in1, %struct_2* %localStruct_, align 8
>   br label %exit
>
>  exit:                                             ; preds =
> %testFcn_entry
>   ret void
> }
>  [end example]
>
>  It looks like at some stage of the backend compiler flow there is a
> "merge_values" instruction generated where the number of inputs exceeds
> 16k, but the number of inputs is stored in an unsigned short.  When this
> instruction is being translated into x86 machine code, then there is an out
> of bounds access:
>  ~> llc bug-simple.bc
> llc:
> /local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649:
> llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo
> < NumValues && "Illegal result number!"' failed.
>
>  Probably the truncation of NumOperands should be caught directly with an
> assertion in the SDNode constructor.
>
> One other interesting aspect of it is that if you make the struct_2 type a
> smaller matrix, like:
>  %struct_2 = type { [65534 x %struct_1] }
>
>  Then you don't get a crash, instead it takes about 20+ minutes to
> process and you get a lot of movb instructions out - like 200k or so.
> Callgrind says that 33% of the time consumed is directly in
> llvm::SUnit::ComputeHeight(). Clearly something has gone badly non-linear
> for this case.
>
>  I looked at what clang is doing for a similar construct and I see it
> generates a memcpy intrinsic instead of the direct load/store. I'm now
> doing that in my own front-end, but it seems like this is intended to be
> supported so I thought I'd report it.  It seems like turning this construct
> into memcpy as an optimization or part of the back-end lowering might be a
> better long-term approach.
>
>  I filed this as bug #21671 in bugzilla.
>
>  Let me know if you'd like more information or if I can help come up with
> a fix.  (I'm not familiar at all with the backend so I'd need some guidance
> to be of much help.)
>
> Thanks,
>   Dale Martin
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141126/bb2404d6/attachment.html>