[PATCH] D19401: MachineScheduler: Fully compare top/bottom candidates

Wed Jun 22 06:47:07 PDT 2016

tstellarAMD added a comment.

I would like to take a quick look at the performance stats with these updates before this is committed. This shouldn't take me too long.


================
Comment at: test/CodeGen/AMDGPU/shl_add_constant.ll:77-78
@@ -76,4 +76,4 @@
 ; SI: s_add_i32 [[TMP:s[0-9]+]], [[Y]], [[SHL3]]
-; SI: s_addk_i32 [[TMP]], 0x3d8
-; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[TMP]]
+; SI: s_add_i32 [[TMP2:s[0-9]+]], [[TMP]], 0x3d8
+; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[TMP2]]
 ; SI: buffer_store_dword [[VRESULT]]
----------------
MatzeB wrote:
> tstellarAMD wrote:
> > This is a regression.
> Are you sure this is actually a regression? The different ordering in the function also gives me:
> -; NumSgprs: 8
> +; NumSgprs: 6
> when comparing the previous and new versions!
Using 0-48 SGPRs allows for maximum occupancy, so saving 2 SGPRs doesn't impact performance here.

I think this is probably uncovering a bug somewhere else in the backend.


Repository:
  rL LLVM

http://reviews.llvm.org/D19401