Generating an OpenCL Kernel using Textual Templating

A more powerful, though more heavyweight, alternative is to use a templating engine of the kind used to generate web pages.

This offers great flexibility in code generation, including full flow control (loops and conditionals evaluated at generation time), which enables techniques such as loop unrolling.
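
To make the loop-unrolling idea concrete, here is a minimal sketch that does this kind of generation by hand with plain Python string handling, without any templating engine. The helper name `unrolled_body` and its parameters are hypothetical and chosen only for illustration; a templating engine lets the same loop and condition be written inside the template text itself.

```python
def unrolled_body(operation, n_unroll, stride="gsize"):
    """Return C source lines that repeat `operation` n_unroll times.

    Hypothetical helper for illustration only: between consecutive
    copies, the index `i` is advanced by `stride`.
    """
    lines = []
    for i_unroll in range(n_unroll):
        lines.append(f"{operation};")
        # No increment after the last copy; the enclosing C loop does that.
        if i_unroll + 1 < n_unroll:
            lines.append(f"i += {stride};")
    return lines


body = unrolled_body("y[i] = a*x[i]", n_unroll=3)
print("\n".join(body))
```

The Python `for` and `if` run at generation time, so the emitted C code contains only the repeated statements and no trace of the generator's control flow.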

In the example below, we use a templating engine called 'Mako':

In [1]:
from mako.template import Template
In [2]:
tpl = Template(r"""
    __kernel void ${name}(${arguments})
    {
      int lid = get_local_id(0);
      int gsize = get_global_size(0);
      int work_group_start = get_local_size(0)*get_group_id(0);
      long i;

      for (i = work_group_start + lid; i < n; i += gsize)
      {
        %for i_unroll in range(n_unroll):
            ${operation};
            %if i_unroll + 1 < n_unroll:
                i += gsize;
            %endif
        %endfor
      }
    }
""", strict_undefined=True)
In [3]:
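print(tpl.render(
    name="scale",
    # Buffers must be __global; n is passed in because the loop uses it.
    arguments="__global float *y, float a, __global const float *x, long n",
    # Guard each unrolled copy: i may have moved past n after "i += gsize".
    operation="if (i < n) y[i] = a*x[i]",
    n_unroll=2,
))
    __kernel void scale(__global float *y, float a, __global const float *x, long n)
    {
      int lid = get_local_id(0);
      int gsize = get_global_size(0);
      int work_group_start = get_local_size(0)*get_group_id(0);
      long i;

      for (i = work_group_start + lid; i < n; i += gsize)
      {
            if (i < n) y[i] = a*x[i];
                i += gsize;
            if (i < n) y[i] = a*x[i];
      }
    }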
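
As a sanity check on the index pattern of the unrolled loop, the sketch below simulates it on the host in plain Python. The function `visited_indices` is a hypothetical helper, not part of Mako or any OpenCL binding, and it assumes the operation is only applied when `i < n`, since the extra `i += gsize` inside the loop body can move `i` past the end of the array.

```python
def visited_indices(n, global_size, local_size, n_unroll):
    """Simulate which indices the generated kernel touches.

    Hypothetical host-side model: returns the indices visited by all
    work-items, assuming the operation is skipped whenever i >= n.
    """
    visited = []
    for gid in range(global_size):
        lid = gid % local_size                         # get_local_id(0)
        work_group_start = local_size * (gid // local_size)
        i = work_group_start + lid
        while i < n:                                   # C loop condition
            for i_unroll in range(n_unroll):
                if i < n:                              # bounds guard (assumed)
                    visited.append(i)
                if i_unroll + 1 < n_unroll:
                    i += global_size                   # increment between copies
            i += global_size                           # C loop increment
    return visited


idx = visited_indices(n=37, global_size=8, local_size=4, n_unroll=2)
print(sorted(idx) == list(range(37)))
```

With these sizes, every index from 0 to 36 is visited exactly once, even though 37 is not a multiple of the global size times the unroll factor.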
