[LLVMdev] [x86] Prefetch intrinsics and prefetchw

Joshua Magee joshua_magee at playstation.sony.com
Thu Jul 30 12:46:05 PDT 2015


Hi,

I am looking at how the PREFETCHW instruction is matched to the IR prefetch intrinsic (and __builtin_prefetch).

Consider this C program:
char foo[100];
int bar(void) {
    __builtin_prefetch(foo, 0, 0);
    __builtin_prefetch(foo, 0, 1);
    __builtin_prefetch(foo, 0, 2);
    __builtin_prefetch(foo, 0, 3);

    __builtin_prefetch(foo, 1, 0);
    __builtin_prefetch(foo, 1, 1);
    __builtin_prefetch(foo, 1, 2);
    __builtin_prefetch(foo, 1, 3);

    *foo = 1;

    return foo[0];
}

The generated IR for the prefetches follows this pattern:

  tail call void @llvm.prefetch(i8* %0, i32 0, i32 0, i32 1)
  tail call void @llvm.prefetch(i8* %1, i32 0, i32 1, i32 1)
  tail call void @llvm.prefetch(i8* %2, i32 0, i32 2, i32 1)
  tail call void @llvm.prefetch(i8* %3, i32 0, i32 3, i32 1)
  tail call void @llvm.prefetch(i8* %4, i32 1, i32 0, i32 1)
  tail call void @llvm.prefetch(i8* %5, i32 1, i32 1, i32 1)
  tail call void @llvm.prefetch(i8* %6, i32 1, i32 2, i32 1)
  tail call void @llvm.prefetch(i8* %7, i32 1, i32 3, i32 1)

The generated x86_64 code for the first 4 calls, where the read/write parameter
is 0 (read), is exactly as expected:
(Generated with clang -O2 -S -march=btver2 test.c)
	prefetchnta	foo(%rip)
	prefetcht2	foo(%rip)
	prefetcht1	foo(%rip)
	prefetcht0	foo(%rip)

The question is what should be expected when the r/w parameter is 1 (write).
Currently the backend generates:
	prefetchnta	foo(%rip)
	prefetcht2	foo(%rip)
	prefetcht1	foo(%rip)
	prefetchw	foo(%rip)

However, a different possibility would be for the r/w parameter to take
precedence over the locality parameter to generate:
	prefetchw	foo(%rip)
	prefetchw	foo(%rip)
	prefetchw	foo(%rip)
	prefetchw	foo(%rip)

The PREFETCHW instruction prefetches the cache line into the L1 cache and sets
the cache-line state to modified.  Since there is no PREFETCHW variant that
targets the higher-level caches, it is debatable which prefetch instruction
should be generated when a write prefetch is requested with a locality < 3.
One opinion is that the rw parameter takes precedence over locality, so
prefetch(a, 1, 1, 1) should generate prefetchw and not prefetcht2.  FWIW, this
is what GCC appears to do (write trumps locality).
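
To make the two options concrete, here is a minimal sketch in C of both
selection policies.  It only models the assembly output shown above for
-march=btver2; it is not the backend's actual instruction selection code, and
the function names (select_by_locality, select_current, select_write_first,
check_policies) are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pick a mnemonic from the locality argument alone,
 * matching the output observed for the read prefetches (rw == 0). */
static const char *select_by_locality(int locality) {
    switch (locality) {
    case 0:  return "prefetchnta";
    case 1:  return "prefetcht2";
    case 2:  return "prefetcht1";
    default: return "prefetcht0";   /* locality == 3 */
    }
}

/* Policy A: the behavior observed today.  PREFETCHW is used only when
 * a write prefetch is requested with locality 3. */
static const char *select_current(int rw, int locality) {
    if (rw == 1 && locality == 3)
        return "prefetchw";
    return select_by_locality(locality);
}

/* Policy B: the alternative under discussion.  A write prefetch always
 * becomes PREFETCHW, regardless of locality. */
static const char *select_write_first(int rw, int locality) {
    if (rw == 1)
        return "prefetchw";
    return select_by_locality(locality);
}

/* Checks both policies against the listings in this mail. */
static void check_policies(void) {
    static const char *const reads[4] = {
        "prefetchnta", "prefetcht2", "prefetcht1", "prefetcht0"
    };
    static const char *const writes_now[4] = {
        "prefetchnta", "prefetcht2", "prefetcht1", "prefetchw"
    };
    for (int locality = 0; locality < 4; ++locality) {
        /* Reads are identical under both policies. */
        assert(strcmp(select_current(0, locality), reads[locality]) == 0);
        assert(strcmp(select_write_first(0, locality), reads[locality]) == 0);
        /* Writes differ only for locality < 3. */
        assert(strcmp(select_current(1, locality), writes_now[locality]) == 0);
        assert(strcmp(select_write_first(1, locality), "prefetchw") == 0);
    }
}
```

The two policies agree on every read prefetch and on the (rw=1, locality=3)
case; they differ only for write prefetches with locality 0, 1, or 2, which is
exactly the case in question.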

Not sure if there is a right/wrong here; what is the preferred behavior?

Thanks,
 - Josh



