[llvm] [LoopIdiomRecognizer] Implement CRC recognition (PR #79295)

Thu Mar 21 10:53:39 PDT 2024

joe-img wrote:

> > So the return value is the quotient? Doesn't that complicate the expression of CRC? What we need for CRC is surely something like premstep instead?
> 
> Yes, you're right. Actually, maybe we need both? In coremark after we emit crc8 intrinsic, after inlining they combine to a larger crc (32 I think). To recognize that, we need to see that after doing 8-bit crc we continue with the quotient that we got, right? My memory is vague on this, so please correct me if I am wrong.
> 
> Sorry for all the noise. Looks like we only need the remainder after all. The identity is something like:
> 
> ```
> prem(((prem(x, d, k_1) << s) || y), d, k_2) = prem(((x << s) || y), d, k_1 + k_2)
> ```

I'm a bit confused with all those arguments to prem. What do they represent?

Also, what is considered a 'step' in polynomial division?

In CoreMark, the helper function `ee_u16 crcu8(ee_u8 data, ee_u16 crc )` is used, but this is still crc16, as the generator polynomial and remainder are 16 bits wide. I think it's called 'crcu8' because it XORs in 8 bits of data.

If we're using a prem intrinsic, then surely it makes most sense for it to be something like `prem(divisor, dividend)`, where it returns the remainder after polynomial division? With this, I see two issues:

Firstly, for crcn, prem would have to take a divisor of bitsize 2n. CRC implementations avoid this by using a linear feedback shift register.

Secondly, I think this complicates the expression of crc. For example, I'm pretty sure that coremark's crcu8 will be represented like:

`(crc << 8) ^ prem(((((crc & 0xff00) >> 8) ^ data) << 16), poly | (1 << 16))`

Although a CRC is the remainder of polynomial division across some stream of data, I don't think the way it's implemented matches up very well with a polynomial remainder intrinsic.
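
To make the comparison concrete, here is a minimal sketch in C, assuming an MSB-first (non-reflected) CRC-16, which is the shape the expression above is written for. It is not CoreMark's actual `crcu8`. The names `prem`, `crc16_update_lfsr` and `crc16_update_prem` are hypothetical, and `prem(dividend, divisor)` is only a software stand-in for what such an intrinsic might compute.

```c
#include <stdint.h>

/* Hypothetical stand-in for a polynomial-remainder intrinsic: remainder of
   GF(2) polynomial division of `dividend` by `divisor` (divisor != 0). */
static uint32_t prem(uint32_t dividend, uint32_t divisor)
{
    /* Degree of the divisor = index of its highest set bit. */
    int deg = 0;
    for (uint32_t t = divisor >> 1; t != 0; t >>= 1)
        deg++;

    /* Schoolbook long division: cancel set bits from the top down. */
    for (int bit = 31; bit >= deg; bit--)
        if (dividend & ((uint32_t)1 << bit))
            dividend ^= divisor << (bit - deg);
    return dividend;
}

/* Shift-register form: XOR in 8 bits of data, then take 8 single-bit steps.
   Assumes an MSB-first CRC-16; `poly` omits the implicit x^16 term. */
static uint16_t crc16_update_lfsr(uint8_t data, uint16_t crc, uint16_t poly)
{
    crc ^= (uint16_t)(data << 8);
    for (int i = 0; i < 8; i++)
        crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ poly)
                             : (uint16_t)(crc << 1);
    return crc;
}

/* The same update written with the remainder helper. The divisor needs the
   explicit x^16 term, so it is 17 bits wide. */
static uint16_t crc16_update_prem(uint8_t data, uint16_t crc, uint16_t poly)
{
    uint32_t top = (uint32_t)(((crc & 0xff00) >> 8) ^ data);
    uint32_t rem = prem(top << 16, (uint32_t)poly | ((uint32_t)1 << 16));
    return (uint16_t)((crc << 8) ^ rem); /* truncated back to 16 bits */
}
```

Feeding both helpers the same bytes with `poly = 0x1021` and an initial value of 0 should give identical results. The remainder form needs a 24-bit dividend and a 17-bit divisor even though the CRC state is only 16 bits wide, which is the width issue mentioned above.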

> We can expand back into a loop (or unrolled loop) like expandMemCpyAsLoop

Is this necessary? If in LoopIdiomRecognize we're recognizing the loop, can we just avoid emitting the intrinsic using the same logic as would be in `PreISelIntrinsicLowering` to decide whether to lower to a loop?

https://github.com/llvm/llvm-project/pull/79295