Quite often, I hear complaints that coding for GPUs is difficult. In response to such comments, I believe the discussion needs to be framed somewhat differently to put things in the right perspective.
First of all, squeezing the last drop of performance out of modern CPUs is hard, too. Here’s a nice article on cache effects by Igor Ostrovsky that explains some of the phenomena one needs to take into account and the surprising things that can happen.
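To make the "surprising things" concrete, here is a minimal sketch in the spirit of those cache experiments (the array size, stride, and timing setup are my own arbitrary choices, not taken from the article): it times updating every element of a large array against updating only every 16th element. Because the cost is dominated by pulling whole 64-byte cache lines from memory rather than by the arithmetic, the two loops often finish in roughly the same time.

```
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const size_t n = 64 * 1024 * 1024;   // 64M ints, far larger than any cache
    std::vector<int> data(n, 1);

    // Time one pass over the array, touching every `stride`-th element.
    auto time_loop = [&](size_t stride) {
        auto start = std::chrono::steady_clock::now();
        for (size_t i = 0; i < n; i += stride)
            data[i] *= 3;                 // one multiply per visited element
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(end - start).count();
    };

    // With 4-byte ints and 64-byte cache lines, a stride of 16 still touches
    // every cache line, so the memory traffic is the same in both runs.
    std::printf("stride  1: %.1f ms\n", time_loop(1));
    std::printf("stride 16: %.1f ms\n", time_loop(16));
    std::printf("checksum: %d\n", data[0]);  // keep the work observable
}
```

If you expect runtime to scale with the number of multiplications, results like this look baffling; that is exactly the kind of effect you have to understand to get the last bit of CPU performance.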
It just appears to me that on the CPU, fewer people care about good performance, whereas on the GPU, you admit that you do care simply by your choice of architecture. Not caring about CPU performance is not entirely unreasonable: you are somewhat likely to get 'average' performance even without detailed analysis. On the GPU, on the other hand, carelessly written code is far less likely to perform well.
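To illustrate one thing "carelessly written" can mean on a GPU, here is a hedged CUDA sketch (the kernel names, sizes, and stride are hypothetical choices of mine, not drawn from any particular codebase): two kernels do identical per-element work, but in one of them consecutive threads of a warp touch consecutive addresses, so the accesses coalesce into a few wide memory transactions, while in the other the index is permuted so that neighbouring threads hit addresses far apart. On typical hardware the scattered version runs several times slower, even though the arithmetic is the same.

```
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced: consecutive threads in a warp touch consecutive floats,
// so each warp's accesses combine into a few wide memory transactions.
__global__ void scale_coalesced(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// "Careless": the index is permuted so neighbouring threads touch
// addresses n/stride elements apart, defeating coalescing entirely.
__global__ void scale_strided(float* x, int n, int stride) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        int i = (tid % stride) * (n / stride) + tid / stride;
        x[i] *= 2.0f;
    }
}

int main() {
    const int n = 1 << 24;                      // 16M floats (64 MB)
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    dim3 block(256), grid((n + block.x - 1) / block.x);
    scale_coalesced<<<grid, block>>>(d, n);     // typically bandwidth bound
    scale_strided<<<grid, block>>>(d, n, 32);   // typically several times slower
    cudaDeviceSynchronize();

    std::printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d);
    return 0;
}
```

The point is not the specific numbers, but that a small, easy-to-miss detail of the access pattern decides whether the hardware is used well at all.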
So, in summary, my belief is that both CPUs and GPUs can be equally difficult to understand; it's just that the potential payoff of caring about performance is much greater on the GPU than on the CPU.