[LLVMdev] A bug in LLVM-GCC 4.2 with inlining __exchange_and_add

Sun Feb 3 02:07:39 PST 2013

Hi,
I have encountered what appears to be a serious, reproducible bug in LLVM-GCC 4.2. It can be reproduced by compiling the following C++ file, which uses Boost:

#include "boost/statechart/event.hpp"

using namespace std;

class EvActivate : public boost::statechart::event< EvActivate >
{
public:
    EvActivate(){}

private:
};

extern "C" const void* activate()
{
    return (EvActivate()).intrusive_from_this().get();
}

The problem is that the generated assembly looks like this:

_activate:
00000000      b5f0  push    {r4, r5, r6, r7, lr}
00000002      af03  add     r7, sp, #12
00000004  e92d0d00  stmdb   sp!, {r8, sl, fp}
00000008  ed2d8b10  vstmdb  sp!, {d8-d15}
0000000c      b094  sub     sp, #80
0000000e  f2405088  movw    r0, :lower16:__ZN5boost10statechart6detail9id_holderI10EvActivateE11idProvider_E-0x24+0xfffffffc
00000012      2300  movs    r3, #0
00000014  f2c00000  movt    r0, :upper16:__ZN5boost10statechart6detail9id_holderI10EvActivateE11idProvider_E-0x24+0xfffffffc
00000018  f2407140  movw    r1, :lower16:0x770-0x2c+0xfffffffc
0000001c  f2c00100  movt    r1, :upper16:0x770-0x2c+0xfffffffc
00000020  f24052c8  movw    r2, :lower16:__ZTV10EvActivate-0x34+0xfffffffc
00000024      4478  add     r0, pc
00000026  f2c00200  movt    r2, :upper16:__ZTV10EvActivate-0x34+0xfffffffc
0000002a      9304  str     r3, [sp, #16]
0000002c      4479  add     r1, pc
0000002e      9005  str     r0, [sp, #20]
00000030      a803  add     r0, sp, #12
00000032      9006  str     r0, [sp, #24]
00000034      447a  add     r2, pc
00000036  f8ddc018  ldr.w   ip, [sp, #24]
0000003a      3004  adds    r0, #4
0000003c      9001  str     r0, [sp, #4]
0000003e      6808  ldr     r0, [r1, #0]
00000040  f1020108  add.w   r1, r2, #8      @ 0x8
00000044  f8cc1000  str.w   r1, [ip]
00000048  f3bf8f5a  dmb     ishst
0000004c      9901  ldr     r1, [sp, #4]
0000004e  e8512f00  ldrex   r2, [r1]
00000052      9200  str     r2, [sp, #0]
00000054      441a  add     r2, r3
00000056  e8412c00  strex   ip, r2, [r1]
0000005a  f1bc0f00  cmp.w   ip, #0          @ 0x0
0000005e      d1f6  bne.n   0x4e
...

What happens in the code between 0x4e and 0x5e is an atomic update of a variable by the inlined __exchange_and_add. The problem is that the value read by ldrex is spilled to the stack by the compiler (the str at 0x52) for later use. Because the atomically accessed variable is also on the stack and sits very close to this compiler-generated spill slot, the write hits the same ERG (Exclusives Reservation Granule). On Apple's A6X devices this reproduced consistently: the code entered an infinite loop, because the str instruction at 0x52 caused the strex at 0x56 to always fail and always write 1 to its status register, and the following branch started the sequence all over again.
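
To make the proximity concrete: in the listing, r1 holds sp+16 at the ldrex (add r0, sp, #12 followed by adds r0, #4), while the spill goes to [sp, #0], so the two locations are 16 bytes apart. The following is only a rough sketch of the "same granule" test. The function name, the 64-byte granule size and the addresses used in the comment are assumptions for illustration; the real ERG size is implementation defined and I have not measured it on the A6X.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when two addresses fall into the same
// block of `granule_bytes` bytes. The granule size is an ASSUMPTION passed in
// by the caller; it is not a value taken from any particular CPU.
bool same_granule(std::uintptr_t a, std::uintptr_t b, std::uintptr_t granule_bytes)
{
    return (a / granule_bytes) == (b / granule_bytes);
}

// Example with made-up numbers: for a stack pointer of 0x1000 and a 64-byte
// granule, the spill slot (sp + 0) and the counter (sp + 16) share a granule,
// so a plain store to the spill slot lands inside the monitored region.
```

With those made-up numbers, same_granule(0x1000, 0x1010, 64) is true, whereas a slot 64 bytes away would fall into the next granule.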
Generating such code violates the ARM recommendation:

"For these reasons ARM recommends that:
- the Load-Exclusive and Store-Exclusive are no more than 128 bytes apart
- no explicit cache maintenance operations or data accesses are performed between the Load-Exclusive and the Store-Exclusive."
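
For reference, the shape of the loop in source terms is roughly the following. This is only a sketch of exchange-and-add semantics written with std::atomic (C++11, so not something LLVM-GCC 4.2 itself accepts), and the name exchange_and_add_sketch is hypothetical; it is not the libstdc++ implementation. compare_exchange_weak is allowed to fail spuriously, which corresponds to the strex failing and the loop retrying. Nothing but the addition sits between the load and the conditional store here, but whether the generated code keeps it that way is up to the code generator.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for exchange-and-add: atomically adds `delta` to
// `counter` and returns the value the counter held before the addition.
int exchange_and_add_sketch(std::atomic<int>& counter, int delta)
{
    int observed = counter.load(std::memory_order_relaxed);
    // On failure (including spurious failure), compare_exchange_weak reloads
    // `observed` with the current value, and the loop tries again.
    while (!counter.compare_exchange_weak(observed, observed + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // retry with the refreshed `observed`
    }
    return observed;
}
```

Called on a counter holding 5 with delta 3, it returns 5 and leaves the counter at 8.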
I've encountered this issue in real code and would be glad to get feedback on it. Please let me know if I need to file a bug somewhere to get it resolved. I've found that clang does not have this problem.

Moshe
