[LLVMbugs] [Bug 20006] New: uchar4 vector element-by-element LUT handled worse than 3.4.

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Wed Jun 11 13:05:13 PDT 2014


            Bug ID: 20006
           Summary: uchar4 vector element-by-element LUT handled worse
                    than 3.4.
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: -New Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: simon.hosie at arm.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 12640
  --> http://llvm.org/bugs/attachment.cgi?id=12640&action=edit
Function demonstrating regression.

Given a simple LUT loop which operates independently on each element of a
uchar4, trunk presses ahead with a bunch of inserts and extracts through a
temporary vector, along with a flurry of type conversion operations.
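
The attachment itself is not reproduced in this message. As a rough
illustration only, a loop of the kind described might look like the sketch
below. The function name lut_uchar4, the parameter names, and the use of the
GCC/Clang vector_size attribute to define uchar4 are all assumptions, not
taken from the attached test case.

```c
#include <stddef.h>

/* Hypothetical stand-in for the attached test case (not the original code).
 * uchar4 is declared with the GCC/Clang vector_size extension. */
typedef unsigned char uchar4 __attribute__((vector_size(4)));

/* Apply a 256-entry lookup table to each of the four lanes independently. */
void lut_uchar4(uchar4 *dst, const uchar4 *src,
                const unsigned char *lut, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uchar4 in = src[i];
        /* Each output lane depends only on the matching input lane. */
        uchar4 out = { lut[in[0]], lut[in[1]], lut[in[2]], lut[in[3]] };
        dst[i] = out;
    }
}
```

Because the lanes never interact, a compiler is free to treat the four
lookups as plain byte loads and stores rather than as vector operations.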

In contrast, Clang 3.4 appears to abandon the vector pretense at the outset and
takes the scalar values directly from the source pointer -- completing the work
in half the time.

GCC 4.8 appears to behave like Clang 3.4, but additionally packs the output
into a single 32-bit scalar register before writing.
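
At source level, the scalar strategy described for Clang 3.4 and GCC 4.8 is
roughly equivalent to the sketch below: read the four bytes straight from the
source pointer, look each one up, and (as GCC is reported to do) pack the
results into one 32-bit value before storing. This is a hand-written
approximation, not actual compiler output. The name lut_scalar_packed is
hypothetical, and the shift order assumes a little-endian target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar equivalent; n counts 4-byte groups, not bytes. */
void lut_scalar_packed(unsigned char *dst, const unsigned char *src,
                       const unsigned char *lut, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const unsigned char *p = src + 4 * i;
        /* Four independent byte lookups, no vector temporaries.
         * Shift order matches memory order on little-endian only. */
        uint32_t packed = (uint32_t)lut[p[0]]
                        | (uint32_t)lut[p[1]] << 8
                        | (uint32_t)lut[p[2]] << 16
                        | (uint32_t)lut[p[3]] << 24;
        /* Single 32-bit store; memcpy avoids alignment and aliasing issues. */
        memcpy(dst + 4 * i, &packed, sizeof packed);
    }
}
```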

This affects both amd64 and ARM.  To reproduce, compile with -Ofast on amd64;
on ARM, also pass -mfpu=neon to ensure that SIMD operations are available.
