[llvm] [RISCV] Construct constants via instructions if materialization is costly (PR #86926)

Sun Mar 31 20:04:20 PDT 2024

wangpc-pp wrote:

> > > > > For the vector case, I think it ends up cheaper if you always start by bitcasting the operand to an i8 vector, and doing an i8 popcount. That makes the constants involved significantly cheaper. (The width you use to do the arithmetic doesn't matter for correctness, but doing it in i8 means you need fewer vsetvli, I think). Then you can bitcast back to the wider type to do the multiply/shift.
> > > > 
> > > > 
> > > > I don't know if this is beneficial. Using a smaller EEW simplifies the materialization, but I think we may execute more uops, which may result in worse performance.
> > > 
> > > 
> > > Hopefully simple bitwise operations like shifts, ands, and adds don't get split based on element width.
> > 
> > 
> > Yeah, most instructions will be split into chunks of datapath length in most implementations, but I do know an implementation that doesn't.
> 
> Can you share what implementation that is?

Just checked, and I was wrong: that implementation doesn't split shift/and/add based on EEW. (It's a small core targeting low-end IoT scenarios, and its VLEN is really small.) Sorry about that.
Besides, if the assumption that simple arithmetic doesn't get split based on EEW holds, maybe we can apply similar optimizations to more operations?
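
For reference, here is a rough scalar C++ sketch of the i8-popcount idea from the first quoted comment. It is only an illustration, not code from this PR: the function names are hypothetical, and one `uint64_t` stands in for a single i64 vector element viewed as eight i8 lanes. The point it tries to show is that the masks needed in i8 lanes (`0x55`, `0x33`, `0x0F`) are small, while the i64 version needs full 64-bit splat constants, and that both give the same result.

```cpp
#include <cstdint>
#include <cstring>

// Popcount of a single i8 lane. Every mask fits in 8 bits, so it is
// cheap to materialize as an immediate.
static inline uint8_t popcount_i8_lane(uint8_t b) {
  b = static_cast<uint8_t>(b - ((b >> 1) & 0x55));          // 2-bit sums
  b = static_cast<uint8_t>((b & 0x33) + ((b >> 2) & 0x33)); // 4-bit sums
  b = static_cast<uint8_t>((b + (b >> 4)) & 0x0F);          // 8-bit sum
  return b;
}

// Hypothetical model of one i64 element: "bitcast" to eight i8 lanes,
// popcount each lane, "bitcast" back, then sum the bytes with a
// multiply and a shift.
uint64_t popcount_via_i8(uint64_t x) {
  uint8_t lanes[8];
  std::memcpy(lanes, &x, sizeof x);
  for (uint8_t &lane : lanes)
    lane = popcount_i8_lane(lane);
  uint64_t counts;
  std::memcpy(&counts, lanes, sizeof counts);
  // Each byte holds at most 8, so the byte sum (at most 64) ends up in
  // the top byte without overflow.
  return (counts * 0x0101010101010101ULL) >> 56;
}

// The same algorithm done entirely at i64 width, for comparison. The
// masks are now 64-bit constants, which are costlier to materialize.
uint64_t popcount_i64(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return (x * 0x0101010101010101ULL) >> 56;
}
```

Whether the i8 variant is actually faster on a given core still depends on the question above, i.e. whether shift/and/add get split based on EEW.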

https://github.com/llvm/llvm-project/pull/86926