That is, you should not optimize prematurely by converting all of your sequential code into CUDA kernels. Porting a program to CUDA takes a lot of effort, and you may waste a considerable amount of valuable time on something that has negligible impact.
Benchmark first, then strategically optimize the portions where the program spends the most time. Depending on the algorithm, you may find that the program spends 90% or more of its time in a few select routines. If those routines are well suited to CUDA's SPMD (Single Program, Multiple Data) programming model, you might reap huge benefits.
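To make the "benchmark first" advice concrete, here is a minimal host-side C++ sketch. It assumes coarse wall-clock timing is good enough to spot the hot routines; a real profiler will give finer detail. The names `Profile` and `amdahl_speedup` are hypothetical helpers invented for this illustration, not part of CUDA or any library. The second helper applies Amdahl's law to estimate the best overall speedup you could expect if only the measured fraction of the run time were accelerated.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock seconds per named routine.
class Profile {
public:
    // Runs `routine` once and adds its elapsed time to the entry for `name`.
    template <typename F>
    void time(const std::string& name, F&& routine) {
        const auto start = std::chrono::steady_clock::now();
        routine();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        seconds_[name] += elapsed.count();
    }

    // Total seconds recorded across all routines.
    double total() const {
        double sum = 0.0;
        for (const auto& entry : seconds_) sum += entry.second;
        return sum;
    }

    // Share of the total time spent in `name` (0.0 if unknown or nothing timed).
    double fraction(const std::string& name) const {
        const auto it = seconds_.find(name);
        const double sum = total();
        if (it == seconds_.end() || sum <= 0.0) return 0.0;
        return it->second / sum;
    }

    // Prints one line per routine with its seconds and share of the total.
    void report() const {
        for (const auto& entry : seconds_) {
            std::printf("%-20s %10.6f s  %5.1f%%\n", entry.first.c_str(),
                        entry.second, 100.0 * fraction(entry.first));
        }
    }

private:
    std::map<std::string, double> seconds_;
};

// Amdahl's law: overall speedup when a `fraction` of the run time
// is made `factor` times faster and the rest is left unchanged.
double amdahl_speedup(double fraction, double factor) {
    assert(fraction >= 0.0 && fraction <= 1.0 && factor > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

In use, you would wrap each candidate routine in a call such as `profile.time("matmul", [&] { matmul(a, b, c); });` (where `matmul` stands in for your own function), call `profile.report()`, and feed the largest fraction into `amdahl_speedup`. A routine that accounts for 90% of the run time and becomes 10 times faster yields roughly a 5.3x overall speedup, while one that accounts for 5% yields barely 1.05x no matter how fast the kernel is.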