Well, it's early days yet (I'm just finishing up my coursework and haven't gotten far into my research), but my proposal is to build generic collection data structures for general-purpose GPU programming (and related architectures). Single-threaded performance is hitting some pretty serious walls, so parallel execution seems to be the best way forward, but memory layout can make an orders-of-magnitude difference in performance and is hard to get right. My goal is to build something analogous to the STL to make it easier for the average programmer to write code for GPUs.