[LLVMdev] A bug in LLVM-GCC 4.2 with inlining __exchange_and_add
Moshe Kravchik
mkravchik at hotmail.com
Sun Feb 3 02:07:39 PST 2013
Hi,
I have encountered an issue which seems to be a serious, reproducible bug in LLVM-GCC 4.2. It can be reproduced by compiling the following C++ file that uses boost:
#include "boost/statechart/event.hpp"
using namespace std;
class EvActivate : public boost::statechart::event< EvActivate >
{
public:
    EvActivate(){}
private:
};
extern "C" const void* activate()
{
    return (EvActivate()).intrusive_from_this().get();
}
The problem is that the generated assembler looks like:
_activate:
00000000 b5f0     push {r4, r5, r6, r7, lr}
00000002 af03     add r7, sp, #12
00000004 e92d0d00 stmdb sp!, {r8, sl, fp}
00000008 ed2d8b10 vstmdb sp!, {d8-d15}
0000000c b094     sub sp, #80
0000000e f2405088 movw r0, :lower16:__ZN5boost10statechart6detail9id_holderI10EvActivateE11idProvider_E-0x24+0xfffffffc
00000012 2300     movs r3, #0
00000014 f2c00000 movt r0, :upper16:__ZN5boost10statechart6detail9id_holderI10EvActivateE11idProvider_E-0x24+0xfffffffc
00000018 f2407140 movw r1, :lower16:0x770-0x2c+0xfffffffc
0000001c f2c00100 movt r1, :upper16:0x770-0x2c+0xfffffffc
00000020 f24052c8 movw r2, :lower16:__ZTV10EvActivate-0x34+0xfffffffc
00000024 4478     add r0, pc
00000026 f2c00200 movt r2, :upper16:__ZTV10EvActivate-0x34+0xfffffffc
0000002a 9304     str r3, [sp, #16]
0000002c 4479     add r1, pc
0000002e 9005     str r0, [sp, #20]
00000030 a803     add r0, sp, #12
00000032 9006     str r0, [sp, #24]
00000034 447a     add r2, pc
00000036 f8ddc018 ldr.w ip, [sp, #24]
0000003a 3004     adds r0, #4
0000003c 9001     str r0, [sp, #4]
0000003e 6808     ldr r0, [r1, #0]
00000040 f1020108 add.w r1, r2, #8 @ 0x8
00000044 f8cc1000 str.w r1, [ip]
00000048 f3bf8f5a dmb ishst
0000004c 9901     ldr r1, [sp, #4]
0000004e e8512f00 ldrex r2, [r1]
00000052 9200     str r2, [sp, #0]
00000054 441a     add r2, r3
00000056 e8412c00 strex ip, r2, [r1]
0000005a f1bc0f00 cmp.w ip, #0 @ 0x0
0000005e d1f6     bne.n 0x4e
...
What happens in the code between 0x4e and 0x5e is an atomic update of a variable by the inlined __exchange_and_add. The problem is that the inlining optimization stores the value read by ldrex on the stack for further use (the str at 0x52). However, the atomically accessed variable is also on the stack and resides very close to this compiler-generated intermediate storage, so the write falls within the same Exclusives Reservation Granule (ERG). On Apple's A6X devices this reproduced consistently: the code entered an infinite loop, because the str instruction at 0x52 caused the strex at 0x56 to always fail and return 1, and the following branch started the sequence all over again.
Generating such code violates the ARM recommendation:
"For these reasons ARM recommends that:
  - the Load-Exclusive and Store-Exclusive are no more than 128 bytes apart
  - no explicit cache maintenance operations or data accesses are performed between the Load-Exclusive and the Store-Exclusive."
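To make the shape of the loop concrete, here is a minimal sketch of an exchange-and-add operation. It is not the libstdc++ implementation: the names exchange_and_add_sketch and demo_stack_refcount are hypothetical, and the sketch assumes a GCC-compatible compiler that provides the __sync_fetch_and_add builtin. The comment shows the retry loop such a builtin is typically expanded to on ARMv7, with the loaded value kept in a register rather than spilled to the stack.

```cpp
#include <cassert>

// Hypothetical stand-in for the inlined __exchange_and_add: atomically adds
// `delta` to `*counter` and returns the value it held before the addition.
// __sync_fetch_and_add is a GCC/clang builtin; on ARMv7 it is typically
// expanded into a retry loop of this shape:
//   retry: ldrex  old, [counter]
//          add    tmp, old, delta
//          strex  failed, tmp, [counter]
//          cmp    failed, #0
//          bne    retry
// Following the ARM recommendation, `old` should stay in a register inside
// that loop, with no store to memory between ldrex and strex.
inline int exchange_and_add_sketch(volatile int* counter, int delta) {
    return __sync_fetch_and_add(counter, delta);
}

// Hypothetical usage: a counter that lives on the stack, as the atomically
// accessed variable does in the report.
inline int demo_stack_refcount() {
    volatile int refcount = 0;
    int before_first = exchange_and_add_sketch(&refcount, 1);   // 0 -> 1
    int before_second = exchange_and_add_sketch(&refcount, 1);  // 1 -> 2
    assert(before_first == 0 && before_second == 1);
    return refcount;
}
```

Whether a given compiler keeps the loaded value in a register depends on its code generation; the sketch only illustrates the intended semantics and loop structure.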
I've encountered this issue in real code and would be glad to get feedback on it. Please let me know if I need to submit a bug somewhere to get it resolved. I've found that clang does not have this problem.
Moshe