I am wondering how the LLVM PTX backend determines the limits on GPU thread, block, and grid sizes (e.g., a block can have at most 1024 threads). Can anyone point me to the relevant code? I need this information in the optimizer; how can I get it there?

Thanks,
Xin