Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization
Scientific applications are ideal candidates for the “heterogeneous computing” paradigm, in which parts of a computation are “offloaded” to available accelerator hardware such as GPUs. However, when such applications are written in dynamic languages such as Python or R, as they increasingly are, things become less straightforward. The same flexibility that makes these lan- guages so appealing to programmers also significantly complicates the problem of automatically and transparently partitioning a program’s execution between a CPU and available accelerator hardware without having to rely on programmer annotations. A common way of handling the features of dynamic languages is by introducing speculation in conjunction with guards to ascertain the validity of assumptions made in the speculative computation. Unfortunately a single guard violation during the execution of “offloaded” code may result in a huge performance penalty and necessitate the complete re-execution of the offloaded computation. In the case of dynamic languages, this problem is compounded by the fact that a full compiler analysis is not always possible ahead of time.
This paper presents MegaGuards, a new approach for speculatively executing dynamic languages on heterogeneous platforms in a fully automatic and transparent manner. Our method translates each target loop into a single static region devoid of any dynamic features. The dynamic parts are instead handled by a construct that we call a mega guard which checks all of the speculative assumptions ahead of its corresponding static region. Notably, the advantage of MegaGuards is not limited to heterogeneous computing; because it removes guards from compute-intensive loops, the approach also improves sequential performance.
We have implemented MegaGuards along with an automatic loop parallelization backend in ZipPy, a Python Virtual Machine. The results of a careful and detailed evaluation reveal very significant speedups of more than 34x on average with a maximum speedup of up to 175x when compared to the original ZipPy performance as a baseline. These results demonstrate the potential for applying heterogeneous computing to dynamic languages.