
GPU databases are limited primarily by memory constraints on the card (e.g. ~32GB per card for a GV100 or whatever) and by interconnect latency/bandwidth, not by raw parallel scan speed. If scan speed were all that mattered, we'd have had GPU-like parallel database hardware decades ago. You can crunch rows quickly, but only as long as the data fits in card memory. Once your working set exceeds that RAM and you have to page data in from the host over PCIe or some other link, the numbers and the utilization start looking much worse. "Every benchmark looks amazing when your working set fits entirely in cache."
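
To make the cliff concrete, here's a rough back-of-envelope sketch (a toy illustration only, assuming approximate figures of ~900 GB/s for GV100 HBM2, ~16 GB/s of practical PCIe 3.0 x16 throughput, and made-up working-set sizes):

  # Rough back-of-envelope: scan time when the working set fits in GPU
  # memory vs. when the overflow has to stream across the PCIe link.
  # Bandwidth figures are approximate; working-set sizes are hypothetical.
  GPU_HBM_BANDWIDTH_GBPS = 900.0   # ~GV100 HBM2
  PCIE3_X16_BANDWIDTH_GBPS = 16.0  # practical PCIe 3.0 x16 throughput
  GPU_MEMORY_GB = 32.0             # GV100 card memory

  def scan_seconds(working_set_gb: float) -> float:
      """Time for one full scan, ignoring compute and transfer overlap."""
      if working_set_gb <= GPU_MEMORY_GB:
          return working_set_gb / GPU_HBM_BANDWIDTH_GBPS
      # Everything beyond card memory has to come across PCIe.
      resident = GPU_MEMORY_GB / GPU_HBM_BANDWIDTH_GBPS
      paged = (working_set_gb - GPU_MEMORY_GB) / PCIE3_X16_BANDWIDTH_GBPS
      return resident + paged

  for gb in (16, 32, 128, 1024):
      print(f"{gb:>5} GB working set: ~{scan_seconds(gb):.2f} s per full scan")

The point of the toy numbers: the moment the working set spills past card memory, the effective scan rate drops from HBM speed to link speed, roughly a 50x hit with these figures.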

But even more than that: for the price of a single high-end Tesla (approx. 10k USD) you can build a high-end COTS x86 machine with a shitload of RAM and NVMe and install ClickHouse on it. That machine will scale to trillions of rows with ease and millisecond response times, whether or not everything fits in memory. It will cost less to buy and less to run, scale out more easily, and get better utilization out of the hardware.
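
As a rough sketch of what that looks like in practice (the host, table, and columns below are hypothetical; this just issues an ordinary ClickHouse aggregation through the clickhouse-driver Python client):

  # Minimal sketch: a columnar aggregation over a hypothetical table of
  # events. Requires the clickhouse-driver package and a running server.
  from clickhouse_driver import Client

  client = Client(host="localhost")

  # ClickHouse reads only the referenced columns, so a scan like this can
  # cover billions of rows even when the table doesn't fit in RAM.
  rows = client.execute(
      """
      SELECT toDate(event_time) AS day, count() AS events
      FROM events
      WHERE event_time >= now() - INTERVAL 30 DAY
      GROUP BY day
      ORDER BY day
      """
  )
  for day, events in rows:
      print(day, events)

The columnar layout is what lets the working set live on NVMe instead of having to fit in RAM, which is exactly the part a GPU card can't match once you blow past its 32GB.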

I'd wager that unless you have infinite money to dump on Nvidia or exceedingly specific requirements, any GPU database will get smoked by a comparable columnar OLAP store on cost, energy, and scalability.




> GPU-like parallel database hardware decades ago

https://en.wikipedia.org/wiki/Netezza#Technology (FPGA-accelerated data warehouse appliances, i.e. purpose-built parallel database hardware that has been shipping since the early 2000s)

Furthermore, that line of reasoning is fallacious. The non-existence of a thing doesn't prove that the idea is bad.



