[llvm] [AArch64] Add MATCH loops to LoopIdiomVectorizePass (PR #101976)

Wed Dec 11 06:52:24 PST 2024

david-arm wrote:

> I think I'm missing something. We should never be reading beyond the end of the search or needle arrays (if we do, that's definitely a bug), so I don't think we would be prone to segfaulting. The idea was to use llvm.get.active.lane.mask to create a valid predicate for the loads, shouldn't that work?

Doesn't your predicate only ensure that you never load more data than the range of the loop? It doesn't protect you against seg faults. It is possible to use masks to avoid seg faults, but you'd have to split your loops up page by page, which is pretty complicated. Consider one of the example C codes in your test:

```
;   char* find_first_of(char *first, char *last, char *s_first, char *s_last) {
;     for (; first != last; ++first)
;       for (char *it = s_first; it != s_last; ++it)
;         if (*first == *it)
;           return first;
;     return last;
;   }
```

The scalar loop may find a match `if (*first == *it)` on the first iteration of your inner loop. Suppose the pointer `it` starts at the last byte of a page and `(s_last - s_first) = 16`. In the scalar version you'll exit the loop before crossing the page boundary, so you won't seg fault, but in the vector version you'll load all 16 bytes, which crosses a page boundary. If the second page isn't mapped in memory, the vector version will seg fault. This is the fundamental problem with vectorising loops that contain early exits.
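
To make the scenario concrete, here is a rough sketch of how it could be reproduced on a POSIX system. It assumes `mmap`/`mprotect` are available, and `scalar_search_stops_before_guard_page` is a hypothetical name I made up for illustration, not something from the patch. It puts the first needle byte on the last byte of a readable page, makes the following page inaccessible, and runs the scalar loop from the test. The 16-byte vector load itself is deliberately not performed, since that is the access that would fault.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Scalar loop from the test, unchanged. */
char *find_first_of(char *first, char *last, char *s_first, char *s_last) {
  for (; first != last; ++first)
    for (char *it = s_first; it != s_last; ++it)
      if (*first == *it)
        return first;
  return last;
}

/* Hypothetical helper (not part of the patch): returns 1 if the scalar
 * loop finds the match without touching the inaccessible page. */
int scalar_search_stops_before_guard_page(void) {
  long page = sysconf(_SC_PAGESIZE);

  /* Map two adjacent pages, then revoke all access to the second one. */
  char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(base != MAP_FAILED);
  int rc = mprotect(base + page, (size_t)page, PROT_NONE);
  assert(rc == 0);
  (void)rc;

  /* The needle array starts on the last readable byte, and
   * (s_last - s_first) = 16, so bytes 1..15 lie in the guard page. */
  char *s_first = base + page - 1;
  char *s_last = s_first + 16;
  *s_first = 'x';

  char haystack[] = "xyz";

  /* The very first comparison matches, so the scalar loop returns before
   * `it` advances into the guard page. A vector version that loads all
   * 16 needle bytes up front would read base[page] and fault. */
  char *hit = find_first_of(haystack, haystack + 3, s_first, s_last);
  int ok = (hit == haystack);

  munmap(base, 2 * (size_t)page);
  return ok;
}
```

Calling `scalar_search_stops_before_guard_page()` should return 1 for the scalar code. An active lane mask computed from `s_last - s_first` would still have all 16 lanes enabled here, so it wouldn't stop the vector load from touching the second page.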

https://github.com/llvm/llvm-project/pull/101976