10.2.8. Multiple OpenCL devices

While most systems have only one OpenCL-capable GPU installed, darktable can also make use of multiple devices in parallel. A configuration parameter lets you optimize GPU priorities in that case.

It is important to understand how darktable uses OpenCL devices. Each processing sequence of an image, which converts an input to the final output using a certain history stack, is run in a so-called pixelpipe. There are four different types of pixelpipe in darktable. One type is responsible for processing the center image view (or full view) in darkroom mode; another processes the preview image (the navigation window at the top left in darkroom mode). Only one pixelpipe of each of these two types can exist at a time, with the full and the preview pixelpipes running in parallel. In addition, there can be multiple parallel pixelpipes doing file exports and multiple parallel pixelpipes generating thumbnails. If an OpenCL device is available, darktable dynamically allocates it to one specific pixelpipe for one run and releases it afterwards.

The computational demand depends a lot on the pixelpipe type. The preview image and thumbnails have a low resolution and can be processed quickly; the center image view is more demanding, and a file export more demanding still. If you have a reasonably fast GPU and want low latency during interactive work, it is therefore important that your GPU is allocated to the more demanding center image (full) pixelpipe, while the smaller preview image is processed in parallel by the CPU. For this reason, older versions of darktable did not allow the preview pixelpipe to grab any OpenCL device.

Starting with darktable 1.2 there is a more flexible scheme to allocate and prioritize your OpenCL device(s). The configuration parameter opencl_device_priority holds a string with the following structure:

a,b,c.../k,l,m.../o,p,q.../x,y,z...

Each letter represents one specific OpenCL device. There are four fields in the parameter string separated by a slash, each representing one type of pixelpipe. a,b,c... defines the devices that are allowed to process the center image (full) pixelpipe. Likewise devices k,l,m... can process the preview pixelpipe, devices o,p,q... the export pixelpipes and finally devices x,y,z... the thumbnail pixelpipes. An empty field means that no OpenCL device may serve this type of pixelpipe.
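The field layout described above can be illustrated with a short Python sketch. This is not darktable's own parser; the function name split_priority and the labels in PIPE_TYPES are made up for this illustration, and the code only assumes what the text states: four slash-separated fields, each holding a comma-separated list that may be empty.

```python
# Illustrative labels for the four fields, in the order given in the text.
PIPE_TYPES = ("full", "preview", "export", "thumbnail")

def split_priority(priority):
    """Split an opencl_device_priority string into one entry list per pixelpipe type."""
    fields = priority.split("/")
    if len(fields) != len(PIPE_TYPES):
        raise ValueError("expected 4 slash-separated fields, got %d" % len(fields))
    result = {}
    for pipe, field in zip(PIPE_TYPES, fields):
        # An empty field yields an empty list: no OpenCL device for this pixelpipe type.
        result[pipe] = [entry.strip() for entry in field.split(",") if entry.strip()]
    return result
```

Called with the string 0,1/1/0/ this returns the entries 0 and 1 for the full pixelpipe, 1 for preview, 0 for export, and an empty list for thumbnails.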

darktable has an internal numbering system, where the first available OpenCL device will receive number 0. All further devices are numbered consecutively. This number together with the device name is displayed when you start darktable with darktable -d opencl. You can specify a device either by number or by name (upper/lower case and whitespace do not matter). If you have more than one device – all with the same name – you need to use the device numbers in order to differentiate them.

A device specifier can be preceded by an exclamation mark !, in which case the device is excluded from processing this pixelpipe. You can also give an asterisk * as a wildcard, representing all devices not mentioned explicitly before in that group.
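To make the rules for numbers, names, exclusions and the wildcard more concrete, here is a minimal Python sketch that expands a single field into an ordered list of device numbers. It is an interpretation of the description above, not darktable's actual implementation. The names canonical and resolve_field are hypothetical, device_names stands for the list of names as reported in the order of their internal numbers, and the sketch assumes that an excluded device counts as "mentioned" and is therefore skipped by a later asterisk.

```python
def canonical(name):
    """Lower-case a device name and drop all whitespace before comparing."""
    return "".join(name.split()).lower()

def resolve_field(field, device_names):
    """Return the ordered device numbers that one field of the string allows.

    device_names[i] is assumed to be the name of the device with number i.
    """
    allowed = []
    mentioned = set()
    for raw in field.split(","):
        entry = raw.strip()
        if not entry:
            continue
        if entry == "*":
            # Wildcard: every device not mentioned earlier in this field.
            for number in range(len(device_names)):
                if number not in mentioned:
                    mentioned.add(number)
                    allowed.append(number)
            continue
        excluded = entry.startswith("!")
        spec = entry[1:].strip() if excluded else entry
        if spec.isdigit():
            # Entry given as a device number; unknown numbers are ignored.
            matches = [int(spec)] if int(spec) < len(device_names) else []
        else:
            # Entry given as a name; identical names all match, which is
            # why numbers are needed to tell such devices apart.
            matches = [n for n, name in enumerate(device_names)
                       if canonical(name) == canonical(spec)]
        for number in matches:
            if number in mentioned:
                continue
            mentioned.add(number)
            if not excluded:
                allowed.append(number)
    return allowed
```

With two hypothetical devices named 'Device A' (number 0) and 'Device B' (number 1), the field !devicea,* resolves to device 1 only, while 1,* resolves to device 1 followed by device 0.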

Sequence order within a group matters. darktable will read the list from left to right and whenever it tries to allocate an OpenCL device to a pixelpipe it will scan the devices in that order, taking the first free device it finds.
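The left-to-right scan can be sketched in a few lines of Python. This only illustrates the selection rule as described here; pick_device, priority_list and busy are hypothetical names, and the real allocation inside darktable involves more than this.

```python
def pick_device(priority_list, busy):
    """Return the first device number in priority_list that is not busy, or None."""
    for number in priority_list:  # scan in the order given in the group
        if number not in busy:
            return number
    return None  # no allowed device is free
```

For a group that lists device 1 before device 0, the sketch returns 1 when both are free and 0 when device 1 is already in use by another pixelpipe. It returns None when the list is empty or every listed device is busy.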

darktable's default setting for opencl_device_priority is:

*/!0,*/*/*

Any detected OpenCL device is allowed to process the center image view. The first OpenCL device (0) is not allowed to process the preview pixelpipe. As a consequence, if your system has only one GPU, the preview pixelpipe will always be processed on the CPU, keeping your single GPU exclusively for the more demanding center image view. This is reasonable and identical to the old behavior. No restrictions apply to export and thumbnail pixelpipes.

The default is a good choice if you have only one device. If you have several devices, it forms a reasonable starting point. However, as your devices might differ considerably in processing power, it is worth giving some thought to optimizing your priority list.

Here is an example. Let's assume we have a system with two devices, a fast Radeon HD7950 and an older and slower GeForce GTS450. darktable (started with darktable -d opencl) will report the following devices:

[opencl_init] successfully initialized.
[opencl_init] here are the internal numbers and names of 
                          OpenCL devices available to darktable:
[opencl_init]           0       'GeForce GTS 450'
[opencl_init]           1       'Tahiti'
[opencl_init] FINALLY: opencl is AVAILABLE on this system.

So the GeForce GTS 450 is detected as the first device and the Radeon HD7950 ('Tahiti') as the second one. This order will normally not change unless the hardware or driver configuration is modified. Still, to be on the safe side, it is better to use device names rather than numbers.

As the GTS450 is slower than the HD7950, an optimized opencl_device_priority could look like:

!GeForce GTS450,*/!Tahiti,*/Tahiti,*/Tahiti,*

The GTS450 is explicitly excluded from processing the center image pixelpipe; this is reserved for all other devices (i.e. the HD7950/Tahiti). The opposite holds for the preview pixelpipe: here the Tahiti is excluded, so that only the GTS450 is allowed to do the work.

For file export and thumbnail generation we want all hands on deck. However, darktable should first check whether the Tahiti device is free, because it is faster. If it is not, all other devices (in fact only the GTS450) are checked.