[PATCH] D19401: MachineScheduler: Fully compare top/bottom candidates
Tom Stellard via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 22 06:47:07 PDT 2016
tstellarAMD added a comment.
I would like to take a quick look at the performance stats with these updates before this is committed. This shouldn't take me too long.
================
Comment at: test/CodeGen/AMDGPU/shl_add_constant.ll:77-78
@@ -76,4 +76,4 @@
; SI: s_add_i32 [[TMP:s[0-9]+]], [[Y]], [[SHL3]]
-; SI: s_addk_i32 [[TMP]], 0x3d8
-; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[TMP]]
+; SI: s_add_i32 [[TMP2:s[0-9]+]], [[TMP]], 0x3d8
+; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[TMP2]]
; SI: buffer_store_dword [[VRESULT]]
----------------
MatzeB wrote:
> tstellarAMD wrote:
> > This is a regression.
> Are you sure this is actually a regression? The different ordering in the function also gives me:
> -; NumSgprs: 8
> +; NumSgprs: 6
> when comparing the previous and new version!
Using 0-48 SGPRs allows for maximum occupancy, so saving 2 SGPRs doesn't impact performance here.
I think this is probably uncovering a bug somewhere else in the backend.
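
To make the occupancy argument concrete, here is a minimal sketch of the comparison. It assumes only what is stated above, namely that any SGPR count in the 0-48 range still allows maximum occupancy. The constant and function names are hypothetical and are not part of the backend.

```python
# Assumption from the review comment: any SGPR count in the 0-48 range
# still allows maximum occupancy. Names below are hypothetical.
MAX_OCCUPANCY_SGPR_LIMIT = 48


def sgprs_limit_occupancy(num_sgprs: int) -> bool:
    """Return True if the SGPR count alone would reduce occupancy."""
    if num_sgprs < 0:
        raise ValueError("SGPR count cannot be negative")
    return num_sgprs > MAX_OCCUPANCY_SGPR_LIMIT


# NumSgprs reported before and after the patch for this test function.
old_count, new_count = 8, 6

# Both counts fall on the same side of the limit, so under this assumption
# the 2-SGPR saving does not change occupancy.
same_bucket = sgprs_limit_occupancy(old_count) == sgprs_limit_occupancy(new_count)
```

Under this assumption both 8 and 6 SGPRs fall below the limit, so the reduction in NumSgprs by itself does not offset the extra s_add_i32.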
Repository:
rL LLVM
http://reviews.llvm.org/D19401