[LLVMdev] Question on Fence Instruction
Duan, Yue Lu
duan11 at illinois.edu
Wed Oct 17 08:28:42 PDT 2012
The paper is "A Case for an SC-Preserving Compiler" from PLDI 2011. I followed their "naive SC-preserving compilation", which prevents the compiler from reordering any potentially shared load/store instructions. The paper reports that the resulting code, running on an x86 machine (an SC-preserving binary on non-SC hardware), is 22% slower than normally optimized code running on the same machine (a non-SC binary on non-SC hardware). The experiment measures how much performance is lost when the compiler transformations that reorder shared load/store instructions are disabled. The fences are removed from the assembly code because they are so costly that, if they were left in, the performance lost to the compilation restriction alone could not be measured independently.
My results show that this reordering restriction in compilation leads to only a 4% slowdown, far less than the paper reports. One possible reason is that the compiler does not respect the SC fences, so unexpected reordering still happens and yields better performance. Another is that their implementation differs from mine. I am not sure.
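To make the distinction concrete, here is a minimal C++ sketch (not the LLVM pass used in the experiment) of a fence that constrains only the compiler versus one that also constrains the hardware. The variable and function names are hypothetical. `std::atomic_signal_fence` typically emits no machine instruction, which is roughly the effect of inserting IR fences and then stripping them from the assembly; `std::atomic_thread_fence` with `seq_cst` ordering usually lowers to a real fence instruction on x86.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variables, for illustration only.
std::atomic<int> shared_data{0};
std::atomic<int> ready_flag{0};

// Compiler-only ordering: the signal fence keeps the compiler from moving
// the two stores across it, but typically emits no machine instruction.
void publish_compiler_only(int value) {
    shared_data.store(value, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_seq_cst);
    ready_flag.store(1, std::memory_order_relaxed);
}

// Compiler and hardware ordering: the thread fence also orders the stores
// as seen by other threads, at the cost of a real fence instruction.
void publish_with_hardware_fence(int value) {
    shared_data.store(value, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    ready_flag.store(1, std::memory_order_relaxed);
}

// Single-threaded sanity check: both variants leave the same final state.
// The difference is only observable from another thread.
void self_check() {
    publish_compiler_only(42);
    assert(shared_data.load() == 42 && ready_flag.load() == 1);
    ready_flag.store(0);
    publish_with_hardware_fence(7);
    assert(shared_data.load() == 7 && ready_flag.load() == 1);
}
```

Comparing the assembly generated for the two functions (for example with `-O2 -S`) is one way to check whether the optimizer kept the stores in program order in both cases.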
From: John Criswell [criswell at illinois.edu]
Sent: Wednesday, October 17, 2012 9:45 AM
To: Duan, Yue Lu
Cc: "陳韋任 (Wei-Ren Chen)"; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Question on Fence Instruction
On 10/17/12 9:21 AM, Duan, Yue Lu wrote:
> Thank you very much for the quick reply. I was trying to confirm that what I did was correct. I ran a test that implements a simple form of SC-preserving compilation: insert fences for every load/store instruction before any optimizations, apply the standard optimizations, and then remove the fences after assembly code generation. It turned out that this SC-preserving compilation caused only a ~4% slowdown on average across 18 benchmarks on an Intel Xeon machine. The result surprised me a lot, because a recent PLDI paper (which also uses LLVM) reported that this naive way of compiling can cause a 20% slowdown, so I posted this question. I will try to examine whether the generated binary code really respects the SC fences.
Perhaps I'm misunderstanding something, but why are you removing the
fences before code generation? I would think that removing the fences
would permit the hardware to re-order loads and stores in a way that
violates sequential consistency. In other words, while you've ensured
that the compiler doesn't do anything to violate sc, you're letting the
hardware violate sc.
Are you compiling for a machine that is sequentially consistent by default?
Also, to what PLDI paper are you referring?
-- John T.
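The hardware reordering in question can be illustrated with the classic store-buffering litmus test. The following is a rough C++ sketch with hypothetical names, not code from the experiment. Each thread stores to one variable and then loads the other. Under sequential consistency at least one thread must read 1. With the `seq_cst` fences in place the C++ memory model forbids both threads reading 0; without them, x86 hardware may let each load pass the earlier store, so that outcome becomes possible (though it may be rare in a loop this simple).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test (illustrative sketch).
// Returns how many runs ended with both threads reading 0, an outcome
// that sequential consistency forbids.
int count_both_zero(bool use_fence, int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();  // join makes r1 and r2 safe to read below
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(true, n)` should always return 0. A nonzero result from `count_both_zero(false, n)` would show the hardware (or the compiler) reordering the accesses, although a zero result there proves nothing.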
> From: 陳韋任 (Wei-Ren Chen) [chenwj at iis.sinica.edu.tw]
> Sent: Wednesday, October 17, 2012 9:00 AM
> To: Duan, Yue Lu
> Cc: llvmdev at cs.illinois.edu
> Subject: Re: [LLVMdev] Question on Fence Instruction
> On Tue, Oct 16, 2012 at 01:44:57PM +0000, Duan, Yue Lu wrote:
>> I have a question about the latest released LLVM, which supports the Fence
>> instruction in its IR. If I intentionally place a sequentially consistent Fence
>> instruction somewhere in the code, will the transformation passes that are
>> applied later respect the Fence and not perform any reordering across it?
> In theory, all optimization passes should respect SC fences. If you find a
> counterexample, I think it's a bug.
> Wei-Ren Chen (陳韋任)
> Computer Systems Lab, Institute of Information Science,
> Academia Sinica, Taiwan (R.O.C.)
> Tel:886-2-2788-3799 #1667
> Homepage: http://people.cs.nctu.edu.tw/~chenwj