[PATCH] D12154: [x86] invert logic for attribute 'FeatureFastUAMem'

Wed Aug 19 10:21:07 PDT 2015

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86.td:489
@@ -486,1 +488,3 @@
+                               FeatureFSGSBase, FeatureSlowUAMem]>;
 
+def : Proc<"geode",           [FeatureSlowUAMem, Feature3DNowA]>;
----------------
spatel wrote:
> RKSimon wrote:
> > You can drop FeatureSlowUAMem for BD targets - the AMD 15h SOG confirms that unaligned performance should be the same for aligned addresses and only +1cy for unaligned. It might be more complex for cache-line crossing but most targets will suffer there, not just BD.
> Thanks, Simon. Can we make the same argument for AMD 16H? I was planning to fix these up in the next patch and add test cases since that would be a functional change (FIXME at line 445).
Yes, I'm happy for any changes to be made in a followup patch.

On Jaguar (16h), unaligned loads/stores are definitely as fast as the aligned forms when the address is aligned, and cost +1cy when the address is unaligned.

IIRC, on Bobcat you could do fast unaligned loads (as long as the SSE unaligned flag was set), but I think there was something about stores that you had to be careful with. This is probably the same for all AMD 10h/12h families.


http://reviews.llvm.org/D12154