Loopy lets you easily generate the tedious, complicated code that is necessary to get good performance out of GPUs and multi-core CPUs.

Loopy’s core idea is that a computation should be described simply and then transformed into a version that gets high performance. This transformation takes place under user control, from within Python.

It can capture the following types of optimizations:

  • Vector and multi-core parallelism in the OpenCL/CUDA model
  • Data layout transformations (structure of arrays to array of structures)
  • Loop unrolling
  • Loop tiling with efficient handling of boundary cases
  • Prefetching/copy optimizations
  • Instruction level parallelism
  • and many more
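
To make one of these concrete, here is a minimal plain-Python sketch of what "loop tiling with efficient handling of boundary cases" means. It does not use Loopy's API and is not code that Loopy generates; the function names `scale_simple` and `scale_tiled` and the `tile` parameter are made up for illustration. The point is only that the tiled version computes the same result as the simple one, while keeping the bounds handling out of the inner loop.

```python
def scale_simple(a, factor):
    """Simple description: one flat loop over every element."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile=4):
    """Same computation, with the loop split into tiles of size `tile`."""
    n = len(a)
    out = [0.0] * n
    n_full = n // tile  # number of tiles that fit completely

    # Full tiles: the inner loop has a fixed trip count and needs no bounds check.
    for i_outer in range(n_full):
        base = i_outer * tile
        for i_inner in range(tile):
            out[base + i_inner] = factor * a[base + i_inner]

    # Boundary case: the leftover elements that do not fill a whole tile.
    for i in range(n_full * tile, n):
        out[i] = factor * a[i]

    return out


data = [float(i) for i in range(10)]  # 10 is not a multiple of the tile size
assert scale_tiled(data, 2.0, tile=4) == scale_simple(data, 2.0)
```

Writing and maintaining the second form by hand for every loop is the kind of tedium the list above refers to; the idea is to state the first form and derive the second by transformation.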

Loopy targets array-type computations, such as the following:

  • dense linear algebra,
  • convolutions,
  • n-body interactions,
  • PDE solvers, such as finite element, finite difference, and Fast-Multipole-type computations

It is not (and does not want to be) a general-purpose programming language.


See the Loopy Documentation.


Having trouble with Loopy? Maybe the nice people on the Loopy mailing list can help.


See also the Installation section of the Documentation.

Download Loopy here

(Note that there is an extra period in Loopy’s name on the Python package index, compared to its module name.)

A link to a prebuilt binary is also available from the front page of Loopy’s documentation.

Its git repository is available on


See conda-forge for prebuilt packages of islpy and PyOpenCL.

Loopy is licensed under the liberal [MIT license](http://en.wikipedia.org/wiki/MIT_License) and is free for commercial, academic, and private use.