# [LLVMbugs] [Bug 2485] New: Do all 4-element SSE shuffles in a maximum of two shuffle instructions

bugzilla-daemon at cs.uiuc.edu bugzilla-daemon at cs.uiuc.edu
Sun Jun 22 01:09:04 PDT 2008

```
http://llvm.org/bugs/show_bug.cgi?id=2485

Summary: Do all 4-element SSE shuffles in a maximum of two
shuffle instructions
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: sharparrow1 at yahoo.com
CC: llvmbugs at cs.uiuc.edu

Currently, shuffles of <4 x i32> and <4 x float> SSE/SSE2 vectors fall back,
in the general case, to three shuffle operations.  This is unnecessary; every
such shuffle can be done with at most two shufps instructions:

Suppose the elements all come from one vector. Then the result can be
calculated with a single shufps.  (This logic already exists in the code.)

Suppose at most two distinct elements come from each source vector.  Since
shufps takes its low two result elements from its first operand and its high
two from its second, one shufps can build an intermediate vector holding every
element needed in the final result.  A second shufps, using the intermediate
as both operands, then rearranges it into the final result.  (This logic
already exists, but it doesn't actually catch all the relevant cases.)

Otherwise, we must have three elements from one vector, call it X, and one
element from the other, call it Y.  First, use a shufps to build an
intermediate vector containing the one element from Y and the element from X
that shares its half of the final destination (their positions within the
intermediate don't matter).  Then, use a second shufps to build the final
vector, taking the half that contains the element from Y from the
intermediate, and the other half directly from X.

This might be something to stick into X86/README.txt, but it seems simple
enough to implement that it really should just be done.

```
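The three cases in the report can be checked with a small model. The following is only an illustrative Python sketch, not LLVM code: `shufps` and `shuffle_two_shufps` are hypothetical helper names, the mask uses indices 0-3 for the first source and 4-7 for the second, and the model ignores any register copies a real two-operand shufps might require.

```python
def shufps(a, b, sel):
    """Model of SSE shufps: low two lanes come from a, high two from b."""
    return [a[sel[0]], a[sel[1]], b[sel[2]], b[sel[3]]]


def shuffle_two_shufps(x, y, mask):
    """Shuffle x and y by mask (indices 0-3 pick from x, 4-7 from y).

    Returns (result, number_of_shufps_used). Hypothetical helper that
    follows the three cases described in the bug report.
    """
    from_x = [m for m in mask if m < 4]
    from_y = [m - 4 for m in mask if m >= 4]

    # Case 1: every element comes from a single source vector.
    if not from_y:
        return shufps(x, x, mask), 1
    if not from_x:
        return shufps(y, y, [m - 4 for m in mask]), 1

    dx, dy = sorted(set(from_x)), sorted(set(from_y))

    # Case 2: at most two distinct elements from each source.
    if len(dx) <= 2 and len(dy) <= 2:
        # Intermediate holds every needed element: x's in lanes 0-1, y's in 2-3.
        t = shufps(x, y, [dx[0], dx[-1], dy[0], dy[-1]])
        pos = {dx[0]: 0, dx[-1]: 1, 4 + dy[0]: 2, 4 + dy[-1]: 3}
        return shufps(t, t, [pos[m] for m in mask]), 2

    # Case 3: three elements from one vector (major), one from the other (minor).
    lone_is_y = len(from_y) == 1
    major, minor = (x, y) if lone_is_y else (y, x)
    # Per result lane: (does it come from minor?, index within its source).
    lanes = [((m >= 4) == lone_is_y, m % 4) for m in mask]
    k = next(i for i, (is_minor, _) in enumerate(lanes) if is_minor)
    partner = k ^ 1  # the other lane in the same half of the result

    # Intermediate: lone element in lane 0, its half-mate from major in lane 2.
    t = shufps(minor, major,
               [lanes[k][1], lanes[k][1], lanes[partner][1], lanes[partner][1]])
    half = [0, 0]
    half[k & 1] = 0        # where the lone element sits in t
    half[partner & 1] = 2  # where its half-mate sits in t

    if k < 2:
        # Mixed half is the low half: take it from t, high half from major.
        return shufps(t, major,
                      [half[0], half[1], lanes[2][1], lanes[3][1]]), 2
    # Mixed half is the high half: low half from major, high half from t.
    return shufps(major, t,
                  [lanes[0][1], lanes[1][1], half[0], half[1]]), 2
```

For example, `shuffle_two_shufps(list("abcd"), list("efgh"), [0, 1, 2, 4])` falls into the third case and returns `(['a', 'b', 'c', 'e'], 2)`. Iterating over all 4096 possible masks is a quick way to confirm that no mask needs a third shufps under this model.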