[llvm] [DAGCombiner] Fix subvector extraction index for big-endian STLF (PR #180795)

Wed Feb 11 09:34:14 PST 2026

================
@@ -20831,8 +20831,16 @@ SDValue DAGCombiner::ForwardStoreValueToDirectLoad(LoadSDNode *LD) {
         if (!TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, InterVT))
           break;
 
----------------
Michael-Chen-NJU wrote:

> We entered an infinite loop in one of our downstream testcases and needed to add this:
> 
> ```
>          if (!TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, InterVT))
>            break;
>  
> +        // Avoid infinite loop: Don't transform loads from fixed stack objects,
> +        // as legalization expands extract_subvector to such loads.
> +        SDValue LDBase = LD->getBasePtr();
> +        if (LDBase.getOpcode() == ISD::ADD)
> +          LDBase = LDBase.getOperand(0);
> +        if (LDBase.getOpcode() == ISD::FrameIndex)
> +          break;
> +
>          // In case of big-endian the offset is normalized to zero, denoting
>          // the last bit. For big-endian we need to transform the extraction
>          // to the last sub-vector.
>          unsigned ExtIdx = 0;
> ```

Hi @KennethHilmersson,

Thanks for the feedback! I've been testing the `FrameIndex` check and noticed it's a bit of a double-edged sword. While it effectively prevents the infinite loop you mentioned, it also blocks some highly beneficial STLF optimizations on X86 (e.g., in `shuffle_chained_v16bf16`), where we were previously able to eliminate stack spills/reloads entirely.
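
To make the trade-off concrete, here is a minimal standalone sketch of what the proposed guard matches. It is only a toy model: `Node`, `Opcode`, and `isBlockedByFrameIndexGuard` are hypothetical names invented for illustration, not SelectionDAG types or APIs. The logic mirrors the quoted snippet (look through a single `ADD`, then test for a frame index), under the assumption that the base pointer has one of these simple shapes.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for SelectionDAG opcodes (hypothetical, not LLVM's ISD enum).
enum class Opcode { FrameIndex, Add, Constant, Register };

// Toy DAG node: an opcode plus its operands (hypothetical type).
struct Node {
  Opcode Op;
  std::vector<const Node *> Operands;
};

// Mirrors the proposed guard: peel at most one ADD by taking its first
// operand, then report whether the resulting base is a frame index.
bool isBlockedByFrameIndexGuard(const Node *Base) {
  assert(Base && "base pointer node must be non-null");
  if (Base->Op == Opcode::Add && !Base->Operands.empty())
    Base = Base->Operands[0];
  return Base->Op == Opcode::FrameIndex;
}
```

In this model the guard fires for every load whose base is `FI` or `FI + offset`, regardless of whether the load was produced by legalization or is an ordinary spill reload, which is why it also disables the X86 cases above.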

To find a more targeted way to break the loop without sacrificing these optimizations, could you share the specific testcase (or a reduced version) that triggers the infinite loop on your target? Perhaps we can refine the check. Or do you think this performance regression on X86 is an acceptable trade-off for guarding against infinite loops across all targets?

https://github.com/llvm/llvm-project/pull/180795