darktable and OpenCL (updated)

posted on Fri 2 March 2012

Many readers will have already heard about GPU processing and the fact that darktable can make use of OpenCL to improve performance. As we still lack detailed documentation on that topic, here are a few explanations and howtos.

The Background

Processing high resolution images is among the more demanding tasks in modern computing. Both in terms of memory requirements and in terms of CPU power, getting the best out of a typical 15, 20 or 25 Megapixel image can quickly bring your computer to its limits.

darktable’s requirements are no exception. Our decision not to compromise processing quality has led to all calculations being done on 4 × 32bit floating point numbers. This is slower than “ordinary” 8 or 16bit integer arithmetic, but eliminates all problems of tonal breaks or loss of information.
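To get a feeling for what 4 × 32bit floats mean in terms of memory, here is a rough back-of-the-envelope calculation in Python. It only counts one uncompressed buffer; the function name is made up for this illustration, and a real pixelpipe holds more than one buffer at a time.

```python
def buffer_size_mb(megapixels, channels=4, bytes_per_channel=4):
    """Size in MB (1 MB = 1024 * 1024 bytes) of one uncompressed image buffer."""
    pixels = megapixels * 1_000_000
    return pixels * channels * bytes_per_channel / (1024 * 1024)


for mpx in (15, 20, 25):
    float_mb = buffer_size_mb(mpx)                       # 4 x 32bit float
    int8_mb = buffer_size_mb(mpx, bytes_per_channel=1)   # 4 x 8bit integer, for comparison
    print(f"{mpx} MPx: {float_mb:.0f} MB as float, {int8_mb:.0f} MB as 8bit integer")
```

A single 20 Megapixel buffer comes out at roughly 305 MB, four times the size of the same image stored as 8bit integers.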

A lot of hand optimization has been invested to make darktable as fast as possible. If you run a current version of darktable on a modern computer, you might not even notice any “slowness”. However, there are conditions and certain modules where you feel (or hear from the howling of your CPU fan) how much your poor multi-core processor has to struggle.

That’s where OpenCL comes in. OpenCL allows us to take advantage of the enormous power of modern graphics cards. It is gamers’ demand for ever more detailed 3D worlds in modern first-person shooters that has driven GPU development. ATI, NVIDIA and Co had to put enormous floating point processing power into their GPUs to meet these demands. The result is modern graphics cards with highly parallelized GPUs that quickly calculate surfaces and textures at high frame rates.

You are not a gamer and you don’t take advantage of that power? Well, then you should at least use it in darktable!

For highly parallel floating point calculations, modern GPUs are much faster than CPUs. That is especially true when you want to do the same few processing steps over millions of items. Typical use case: processing of megapixel images.

How OpenCL works

As you can imagine, hardware architectures of GPUs can vary significantly. There are different manufacturers, and even different generations of GPUs from the same manufacturer may differ considerably. At the same time GPU manufacturers are normally not willing to disclose many hardware details of their products to the public. One of the well-known consequences is the need to use proprietary drivers under Linux if you want to take full advantage of your graphics card.

Fortunately an industry consortium led by The Khronos Group has developed an open, standardized interface called OpenCL. It eases the use of your GPU as a numerical processing device. OpenCL offers a C99-like programming language with a strong focus on parallel computing. An application that wants to use OpenCL needs to bring along suitable OpenCL source code, which it then hands over to a hardware specific OpenCL compiler at run-time. This way the application can use OpenCL on different GPU architectures (even at the same time). All “hardware secrets” are hidden in this compiler and are normally not visible to the user (or the application). The compiled OpenCL code is loaded onto your GPU and – with certain API calls – it is ready to do calculations for you.

How to activate OpenCL in darktable

Using OpenCL in darktable requires that your PC is equipped with a suitable graphics card and that it has the required libraries in place. In particular, modern graphics cards from NVIDIA and ATI come with full OpenCL support. The OpenCL compiler is normally shipped as part of the proprietary graphics driver; it is reachable as a dynamic library called “libOpenCL.so”. This library must be in a folder where it is found by your system’s dynamic linker.

When darktable starts, it will first try to find and load libOpenCL.so and – on success – check if the available graphics card comes with OpenCL support. A sufficient amount of graphics memory (1GB+) needs to be available to take advantage of the GPU. If that is OK, darktable tries to set up its OpenCL environment: a processing context needs to be initialized, a calculation pipeline started, OpenCL source code files (extension .cl) read and compiled, and the included routines (called OpenCL kernels) prepared for DT’s modules. Once all that is done, the preparation is finished.
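If you want to check the very first step yourself, i.e. whether the dynamic linker can find an OpenCL library at all, a small Python sketch like the following may help. This is not darktable's own code; it only mimics the "find and load" step with Python's ctypes module, and the function name is made up for this example.

```python
import ctypes
import ctypes.util


def find_opencl_library(name="OpenCL"):
    """Try to locate and load a shared library the way the dynamic linker would.

    Returns the name that could be loaded, or None if loading failed.
    """
    candidates = []
    found = ctypes.util.find_library(name)   # system lookup, e.g. via the ldconfig cache
    if found:
        candidates.append(found)
    candidates.append(f"lib{name}.so")        # plain name, resolved by the dynamic linker
    for candidate in candidates:
        try:
            ctypes.CDLL(candidate)
            return candidate
        except OSError:
            continue
    return None


loaded = find_opencl_library()
if loaded:
    print("OpenCL library loaded:", loaded)
else:
    print("no OpenCL library found by the dynamic linker")
```

A successful load only tells you that the library is reachable. It says nothing about whether a usable device or a matching driver is present.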

As we still regard darktable’s OpenCL support as experimental, we additionally require the user to explicitly activate OpenCL. Go into the preferences dialog and look for core options. There you find a checkbox that says: “activate opencl support (experimental)”. Check that box and from then on OpenCL is used by darktable.

You can at any time switch it off and on again. Depending on the type of modules you are using, you will notice the effect as a general speed-up during interactive work and during export. Not all modules can take advantage of OpenCL at the moment and not all modules are demanding enough to make a noticeable difference. In order to feel a real difference, take modules like “shadows & highlights”, “sharpen”, “lowpass”, “highpass” or as an extreme case “equalizer”.

Let’s have a look at an example. I took an image of 20 MPx and processed it with a typical history stack for my way of working. This covers modules equalizer, tone curve, highpass and sharpen.

My computer is equipped with an i7-2600 CPU and an NVIDIA GeForce GTS 450 graphics card with 1GB memory. Core memory is 16GB.

For a single run of my pixelpipe in interactive mode (so called “full” pipeline), I get the following figures:

OpenCL not activated	0.76 seconds
OpenCL activated	0.11 seconds

This would be the typical delay, if you change a parameter or if you pan or zoom into the image.

With the same image and the same settings, I profiled the export pixelpipe when generating a JPEG file with full resolution. Here are the results:

OpenCL not activated	25.2 seconds
OpenCL activated	6.5 seconds
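Expressed as speed-up factors, the figures from the two measurements above work out as follows (plain arithmetic on the numbers given, nothing measured anew):

```python
# (seconds without OpenCL, seconds with OpenCL), taken from the tables above
timings = {
    "interactive (full pipeline)": (0.76, 0.11),
    "export (full resolution JPEG)": (25.2, 6.5),
}


def speedup(cpu_seconds, gpu_seconds):
    """Factor by which the OpenCL run is faster than the CPU-only run."""
    return cpu_seconds / gpu_seconds


for name, (cpu, gpu) in timings.items():
    print(f"{name}: {speedup(cpu, gpu):.1f}x faster with OpenCL")
```

That is a factor of about 6.9 for interactive work and about 3.9 for the export.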

If you are interested in more profiling figures, you can call darktable with command line parameters -d opencl -d perf. After each run of the pixelpipe you will get a detailed allocation of processing time to each module plus an even more fine grained profile for all used OpenCL kernels.

Besides the speed-up you should not see any difference in the results between CPU and GPU processing. Except for rounding errors, the results are designed to be identical. If, for some reason, darktable fails to properly finish a GPU calculation, it will normally notice and automatically (and transparently) fall back to CPU processing.

Possible Problems and Solutions

If severe OpenCL errors occur at run-time, or the setup of our OpenCL environment fails during initialization, OpenCL will be automatically deactivated. You will notice this when you open the preferences dialog and the activation checkbox has been reset to “off”.

There can be various reasons why OpenCL failed. We depend on hardware requirements and on the presence of certain drivers and libraries. In addition all these have to fit in terms of make, model and revision number. If anything does not fit, e.g. your graphics driver (loaded as a kernel module) does not match the version of your libOpenCL.so, OpenCL support is likely to fail and the CPU takes over.

In that case, the best thing to do is start darktable from a console with

darktable -d opencl

This will give additional debugging output about the initialization and use of OpenCL. First see if you find a line that starts with “[opencl_init] FINALLY …” This should tell you, if OpenCL support is available for you or not. If initialization failed, look at the messages above for anything that reads like “could not be detected” or “could not be created”. Check if there is a hint about where it failed.
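If you have saved the console output to a file, the search described above can be done by a few lines of Python. This is only a sketch: the function name is made up, and the keywords it looks for ("could not", "discarding") are assumptions based on typical messages, so do read the full log if in doubt.

```python
def opencl_status(log_text):
    """Return (available, problems) extracted from `darktable -d opencl` output.

    available is True or False, or None if no FINALLY line was found.
    problems lists the [opencl_init] lines that hint at a failure.
    """
    available = None
    problems = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("[opencl_init]"):
            continue
        if "FINALLY" in line:
            available = "NOT AVAILABLE" not in line
        elif "could not" in line or "discarding" in line:
            problems.append(line)
    return available, problems


# example output of a failed initialization
sample = """\
[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
"""

available, problems = opencl_status(sample)
print("available:", available)
for problem in problems:
    print("hint:", problem)
```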

Here are a few cases observed in the past:

DT might tell you that no OpenCL aware graphics card is detected or that the available memory on your GPU is too low and the device is discarded. In that case you might need to buy a new card, if you really want OpenCL support.

DT might also tell you that a context could not be created. This often indicates a version mismatch between the (loaded) graphics driver and libOpenCL. Check if you have left-over kernel modules or graphics libraries of an older install and take appropriate action. If in doubt, make a clean reinstall of your graphics driver. Sometimes, immediately after a driver update, the loaded kernel driver does not match the newly installed libraries: reboot your system in that case.

DT might crash in very rare cases directly during startup. This can happen if your OpenCL setup is completely broken or if the driver/library contains a severe bug. If you can’t fix it, you can still use darktable with option --disable-opencl, which will skip the entire OpenCL initialization step.

DT might on some systems fail to compile its OpenCL source files at run-time. In that case you will get a number of error messages looking like typical compiler errors. This could indicate an incompatibility between your OpenCL implementation and our interpretation of the standard. In that case visit us at darktable-devel@sourceforge.net and report the problem. Chances are good that we can help you. Please also report in case you see significant differences between CPU and GPU processing of an image!

There also exist a few on-CPU implementations of OpenCL. These come as drivers provided by INTEL or AMD. We observed that they do not give us any speed gain versus our hand-optimized CPU code. Therefore we simply discard these devices.

Summary

Although OpenCL support in darktable is still experimental and incomplete, it is already very usable. Give it a try and see what it can do for you!

[Update]

Here are a few more words about optimization of your OpenCL setup once it’s running. As a general rule, darktable tries to catch all OpenCL runtime errors and take appropriate action. Therefore OpenCL should normally not cause darktable to crash or give garbled output. Instead, in case of errors, DT will notice and reprocess everything again on CPU; an additional step which could slow down processing significantly for you! Therefore it is worth investing some effort to avoid those errors.

The most limiting resource for OpenCL is GPU memory. Modern graphics cards might be equipped with 1GB or even 2GB of RAM, but this is little compared to core memory and not very much if we want to export a high resolution image. A further problem with GPU memory is that we do not know how much is really free. At startup we read from each OpenCL device the amount of available memory, but we cannot take all of it. There is some (unknown) amount which the GPU driver needs for its overhead and for X11 video tasks. Trying to allocate more memory for our purposes than is available at the time will cause allocation failures and the pixelpipe to abort.

darktable’s escape route out of this limitation is “tiling”. Images that are too big are processed in smaller parts (rectangular tiles) one after the other and then combined again. This happens on a per-module basis, i.e. for each module that we want to process, a decision is taken if and how many tiles we will need.
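The basic idea can be illustrated with a strongly simplified estimate. The sketch below is not darktable's actual tiling code: it assumes that a module needs exactly one input and one output buffer of 4 × 32bit floats, ignores overlap between tiles, and uses a made-up function name and made-up image dimensions.

```python
import math


def tiles_needed(width, height, gpu_memory_mb, headroom_mb=300,
                 buffers=2, bytes_per_pixel=16):
    """Rough estimate of the number of tiles for one module (simplified model)."""
    usable = (gpu_memory_mb - headroom_mb) * 1024 * 1024
    if usable <= 0:
        raise ValueError("headroom leaves no usable GPU memory")
    needed = width * height * bytes_per_pixel * buffers
    return max(1, math.ceil(needed / usable))


# hypothetical image of roughly 20 MPx on cards with 1GB and 512MB
print(tiles_needed(5472, 3648, gpu_memory_mb=1024))
print(tiles_needed(5472, 3648, gpu_memory_mb=512))
```

In this simplified model the image fits in one piece on the 1GB card, while the 512MB card needs three tiles.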

Before going into the details, the above already makes clear that we should not process several images in parallel with OpenCL. We already make maximum use of GPU memory by tiling and the nature of GPU processing will already parallelize processing to the max on a pixel by pixel basis. No room for additional parallelization. In preferences set “export multiple images in parallel” to 1.

When you are running darktable with OpenCL support and you suspect slow processing (especially during image exports), restart DT from a console with option -d opencl.

Watch out for modules that fail with an error message. Pay special attention to error code -4; this is the error we get when on-GPU memory allocation fails. Module “equalizer” is a hot candidate for this. Sometimes you might get a message about a module failing due to unfulfilled “roi” requests (esp. module “demosaic”). This can be ignored; it is a current darktable limitation but does not indicate any OpenCL problem.
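The numeric codes in these messages are standard OpenCL error codes. The following Python sketch translates a few of them into their symbolic names as defined in the Khronos OpenCL headers; the function name and the way it picks numbers out of a log line are assumptions for this illustration.

```python
import re

# Symbolic names from the Khronos OpenCL headers (cl.h and cl_ext.h)
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -46: "CL_INVALID_KERNEL_NAME",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}


def explain_error_codes(log_text):
    """Return (code, name) pairs for every known error code found in the text."""
    found = []
    # a minus sign not preceded by a letter, digit or dot, followed by digits
    for match in re.finditer(r"(?<![\w.])-\d+\b", log_text):
        code = int(match.group())
        if code in OPENCL_ERRORS:
            found.append((code, OPENCL_ERRORS[code]))
    return found


print(explain_error_codes("[opencl_init] could not get platforms: -1001"))
print(explain_error_codes("[opencl_atrous] couldn't enqueue kernel! -5"))
```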

If you get “-4” errors, open the file $HOME/.config/darktable/darktablerc, where DT stores its configuration parameters, and look for opencl_memory_headroom. This value tells darktable how many megabytes (out of the total available amount) should be left free for driver and video purposes. By default it is set to 300MB, which works well with current NVIDIA cards. If you increase this value (steps of 50 are a good choice), you further reduce the danger of running into allocation failures. On the negative side, this requires stronger tiling (more but smaller tiles), which is a bit less efficient. In the end you should rather accept more tiling than more allocation failures!
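The change itself is a one-line edit in a plain key=value file, so a text editor is all you need. For those who prefer to script it, here is a minimal Python sketch that works on the file's text; the function name is made up, and it is assumed that darktable is closed while the file is edited so that the change cannot be overwritten.

```python
def bump_headroom(rc_text, step=50, key="opencl_memory_headroom"):
    """Return rc_text with the value of `key` increased by `step` megabytes."""
    out = []
    for line in rc_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name == key:
            line = f"{key}={int(value) + step}"
        out.append(line)
    return "\n".join(out) + "\n"


# a few lines as they might appear in darktablerc
sample = "opencl=TRUE\nopencl_memory_headroom=300\nopencl_memory_requirement=768\n"
print(bump_headroom(sample), end="")
```

All other lines are passed through unchanged; only the headroom entry goes from 300 to 350.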

With current Radeon cards users have observed a different issue. Those cards often report less available memory than they physically have; typically 512MB out of 1GB. As a first consequence, DT will not accept them as usable OpenCL devices (we set a minimum requirement of 768MB). You can change this behavior by setting opencl_memory_requirement to 512. The good news is that Radeon cards seem to have less memory overhead (at least within the reported 512MB). Therefore you can try to set opencl_memory_headroom to a value as low as 150 or even 100. This should leave you with a quite reasonable amount of free GPU memory for OpenCL processing. Give it a try and share your success stories at darktable-users@sourceforge.net.
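How the two parameters play together can be sketched in a few lines of Python. This is a simplified model, not darktable's code: it assumes that a device is accepted when its reported memory is at least the requirement, and that the headroom is simply subtracted from the reported amount. The function name is made up.

```python
def check_device(reported_mb, requirement_mb=768, headroom_mb=300):
    """Return (accepted, usable_mb) for a device reporting `reported_mb` of memory."""
    accepted = reported_mb >= requirement_mb
    usable_mb = max(0, reported_mb - headroom_mb) if accepted else 0
    return accepted, usable_mb


print(check_device(512))                                       # default settings
print(check_device(512, requirement_mb=512, headroom_mb=150))  # settings suggested for Radeon
```

With the defaults a card reporting 512MB is rejected. With the adjusted settings it is accepted and, in this model, has 362MB left for OpenCL processing.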

These are comments from the old website, archived as static HTML

Pierre on Sat Mar 03 02:50:43 2012:

Damn, I just ordered my new laptop but it has only 512MB of graphics memory... no hope at all to have a chance to use OpenCL with that?
upegelow on Sat Mar 03 08:45:10 2012:

There is a configuration parameter opencl_memory_limit in .config/darktable/darktablerc. You could try to set it to something below 512 but you might not get much fun out of it.
Ivan on Tue Mar 06 00:43:20 2012:

using DT from git, I obtain an error:

ivan@it-notebook ~/src/darktable/build $ /opt/darktable/bin/darktable -d opencl
[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

How can I investigate about this?

Thanks
upegelow on Tue Mar 06 21:34:33 2012:

Looks like an incomplete or inconsistent OpenCL installation to me. libOpenCL.so was found and loaded but then was not able to do the next needed step, i.e. find the number of working OpenCL devices.

A few possible things to look at:

Search libOpenCL.so on your system. Assuming it is a symbolic link, follow it to find out what library it finally points to.

Most likely you find a revision number in that name. Check if this fits to the revision number of your graphics driver setup including the other relevant libraries and the kernel module!

Make sure that the right kernel module is really loaded, not a left-over one from an older driver version. In case of NVIDIA cards, pay special attention that the free "nouveau" kernel driver is *not* loaded. If in doubt, blacklist the nouveau driver.

Check if everything is complete. I for example run an NVIDIA setup under openSUSE. I need to install two RPMs to get everything working, one RPM (x11-video-nvidiaG02) for the X11 driver and a separate one (nvidia-computeG02) for GPU computing. Maybe it's the same for you.

Hope this helps.
Josh on Wed Mar 07 08:53:34 2012:

Many thanks to upeglow for the hints above that finally allowed me to get darktable running with opencl on my Arch system. Turns out I needed to add the cuda-toolkit package...
mobiphil on Wed Mar 07 16:52:30 2012:

fyi: http://www.phoronix.com/scan.php?page=news_item&px=MTA1Mzk
upegelow on Wed Mar 07 17:25:46 2012:

Good that you mention. For NVIDIA cards you need a working libcuda.so that matches your graphics driver. libcuda.so normally is a dynamic link that finally points to libcuda.so.abc.xyz where abc.xyz is your graphics driver revision number.

OpenCL on NVIDIA devices is implemented as a frontend for CUDA, NVIDIA's proprietary GPU computing interface.
upegelow on Wed Mar 07 17:26:18 2012:

Interesting! Certainly worth a trial. Do you have experiences?
tharkang on Fri Mar 09 01:34:55 2012:

ATI Mobility Radeon HD 5730

When running I got error:

[opencl_build_program] could not build program: -11
BUILD LOG:
/tmp/OCLPfK5n3.cl(45): warning: global variable declaration is corrected by
the compiler to have addrSpace constant
const sampler_t sampleri = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
^

/tmp/OCLPfK5n3.cl(136): error: identifier "M_PI" is undefined
if (H > 0.0f) H = H / (2.0f*M_PI);
^

/tmp/OCLPfK5n3.cl(137): error: identifier "M_PI" is undefined
else H = 1.0f - fabs(H) / (2.0f*M_PI);
^

/tmp/OCLPfK5n3.cl(149): error: identifier "M_PI" is undefined
float a = cos(2.0f*M_PI*LCH.z) * LCH.y;
^

3 errors detected in the compilation of "/tmp/OCLPfK5n3.cl".

Internal error: clc compiler invocation failed.

I edited it to M_PI_F and now it compiles, but I still get different errors ("could not create kernel `levels'! (-46)", "[default_process_tiling_cl] can not handle requested roi's. tiling for module 'demosaic' not possible.").
upegelow on Fri Mar 09 16:51:00 2012:

Looks like you are quite close to getting it running. The compiler warning is most likely not a show stopper. The three M_PI related errors are already fixed in master. Either get it from sourceforge or wait for the 1.0rc2 tarball to be released (most likely in the next few days). The (in fact false) "levels" warning has also been removed. And as written above, the roi issue is not related to OpenCL and can be safely ignored.
tharkang on Sat Mar 10 23:32:48 2012:

Last time I also had a problem with one of atrous kernels (for equalizer - IMO most awesome tool in DT), but using git master everything seems to work perfectly. Thanks!
Dave on Thu Mar 15 06:57:40 2012:

Hi.

I'm on Ubuntu 12.04, Beta2. HP Pavilion dv7, 8 Gigs RAM, ATI 6700M 2 Gig graphics card

darktable 1.0 running on OpenCL is a joy. Lightning fast.

Thanks so much. It is a pleasure to use. On average processes are about 5x faster :D
papa on Wed Mar 28 17:29:16 2012:

Hi,

I have

fglrxinfo
display: :0.0 screen: 0
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 6570
OpenGL version string: 4.2.11318 Compatibility Profile Context

when I start ~$ darktable -d opencl I get this output:

[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 2 devices
[opencl_init] device 0 `Turks' supports image sizes of 8192 x 8192
[opencl_init] device 0 `Turks' allows GPU memory allocations of up to 256MB
[opencl_init] device 0: Turks
MAX_WORK_GROUP_SIZE: 256
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_ITEM_SIZES: [ 256 256 256 ]

and more...

[opencl_init] discarding CPU device 1 `AMD Athlon(tm) 64 X2 Dual Core Processor 5600+' as it will not deliver any performance gain.
[opencl_init] successfully initialized.
[opencl_init] FINALLY: opencl is AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is ON.

and more...

it is a 2GB version of the AMD Radeon HD 6570, does it mean only 256 MB are used ?
upegelow on Fri Mar 30 18:06:53 2012:

Looks good. The numbers you see do not relate to available GPU memory. These are maximum dimensions for work groups in pixels.

DT does not display the amount of GPU memory at the moment (maybe we should add this). There is a small program called clTest which you can find on the net. It gives you all info on your OpenCL devices.
Rob. on Thu Apr 05 15:03:20 2012:

Thanks for the work on OpenCL and info here. I now have OpenCL running courtesy of a new graphics card and dt runs noticeably quicker. Card is a Radeon HD 6770A and does misreport memory, but reducing opencl_memory_requirement to 512 and opencl_memory_headroom to 100 as suggested works.
Coudy on Thu Apr 05 20:47:44 2012:

Hi, I have RV730 Radeon HD4670 1GB RAM, Intel Q8300, 4GB RAM.
Opencl is detected, but not enabled. Is there a chance to enable it ?

-> glxinfo | grep rende
direct rendering: Yes
OpenGL renderer string: ATI Radeon HD 4600 Series
GL_NV_conditional_render, GL_NV_copy_depth_to_color, GL_NV_copy_image,

-> grep opencl darktablerc
opencl_memory_headroom=300
opencl=TRUE
opencl_library=
opencl_memory_requirement=512
opencl_runtime=

-> darktable -d opencl
[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 2 devices
[opencl_init] discarding device 0 `ATI RV730' due to missing image support.
[opencl_init] discarding CPU device 1 `Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz' as it will not deliver any performance gain.
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
upegelow on Fri Apr 06 18:26:27 2012:

The relevant topic is this:

[opencl_init] discarding device 0 `ATI RV730' due to missing image support.

This card lacks an important feature that ATI probably only offers with recent models. Without this feature darktable can't use that device. Unfortunately you are out of luck.
Ghis Decorte on Wed Apr 11 14:39:27 2012:

I have Darktable 1.0 running on an ASUS X53S with Intel i7, 8Gb and 2Gb graphics card. Operating system is Linux Mint 12.
This is exactly the kind of software that made me dump Windows!
Dt has rapidly become a daily instrument. Thanks for the good work to all who ever collaborated.
Michal on Thu Apr 12 14:30:51 2012:

I am trying to run the linux version of Darktable on Nvidia GT 525M (with Optimus), so I need to use optirun to enable the Nvidia card.

System: ubuntu 12.04, bumblebee 3 installed from its PPA for Precise Pangolin.

Now the error is:

$ optirun darktable -d opencl
[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

Any help?
Thanks in advance!
delic on Fri Apr 20 11:06:05 2012:

Hi,

has anyone succeeded with an ATI card to get opencl working?

Have tried with an ati hd 85xx, on ubuntu maverick, with 0.9.3, got this message :

[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] could not get platforms: -1001

Not sure if opencl is correctly installed, perhaps there are some nvidia files hidden somewhere, as I changed gpu.

also cannot find .config/darktable/darktablerc on my system.

thx
delic on Thu May 03 19:50:21 2012:

~$ darktable -d opencl
[export_job] exported to `/media/sda4/shots/darktable/render/20120428_0082-hdr.jpg'

could get opencl working with a 560 gtx 2gb, on maverick ubuntu, 0.9.3, nice, but with heavy hdr files (made in darktable) darkroom mode shows white while displaying correctly in the top left preview, vm memory reaching something like 1.6 to 1.9 gb.

with raw files it displays correctly.
Fernando on Wed May 30 21:06:13 2012:

Hi, I have tried to install bumblebee + cuda (http://askubuntu.com/questions/131506/how-can-i-get-nvidia-cuda-or-opencl-working-on-a-laptop-with-nvidia-discrete-car)
in a core i7 with the integrated card and a nvidia gpu, but when running darktable using opencl, the system does not recognize opencl as available.

Has someone used darktable+opencl in a laptop with bumblebee?

Thanks
Eric on Thu May 31 21:11:15 2012:

[opencl_init] trying to load opencl library: ''
[opencl_init] could not find opencl runtime library 'libOpenCL'
[opencl_init] no working opencl library found. Continue with opencl disabled
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.

Newest AMD 12.4 drivers with ubuntu 12.04 and hd6870. I should be able to get it working right?
Eric on Sat Jun 02 09:11:59 2012:

Hi,
Can anyone help me with this output?

[opencl_init] trying to load opencl library: ''
[opencl_init] could not find opencl runtime library 'libOpenCL'
[opencl_init] no working opencl library found. Continue with opencl disabled
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
[pixelpipe_process] [export] using device -1
[pixelpipe_process] [export] using device -1

I have a hd6870 and installed the AMD graphic drivers.
upegelow on Sun Jun 03 12:01:06 2012:

Just check if you have /usr/lib/libOpenCL.so installed. If not you are probably missing part of the driver installation. I don't know AMD's setup, but for NVIDIA you need to install two packages. One for graphics and one for GPU computing.

Ulrich
kevin on Tue Jun 05 10:37:10 2012:

More on no OpenCL - can anyone please HELP!!!!!!

$ /opt/darktable/bin/darktable -d opencl
[opencl_init] trying to load opencl library: ''
[opencl_init] could not find opencl runtime library 'libOpenCL'
[opencl_init] no working opencl library found. Continue with opencl disabled
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

$ locate libOpenCL
/usr/lib64/nvidia/libOpenCL.so.1
/usr/lib64/nvidia/libOpenCL.so.1.0.0

$ locate libOpenCL|xargs rpm -qf
xorg-x11-drv-nvidia-libs-295.33-3.fc16.x86_64
xorg-x11-drv-nvidia-libs-295.33-3.fc16.x86_64

where the RPM is sourced from rpmfusion-nonfree-updates and installed via yum
upegelow on Tue Jun 05 20:30:17 2012:

OK, the folder /usr/lib64/nvidia is a bit uncommon. Typically your ld.so (dynamic linker), on which DT relies, will be configured to search in /lib, /usr/lib, /lib/lib64, ...
Just check if /usr/lib64/nvidia is found in one of the config files (either "/etc/ld.so.conf" or one of the files found in folder "/etc/ld.so.conf.d"). If a corresponding entry is missing, you need to add it and re-run ldconfig as root.
kevin on Thu Jun 07 10:10:51 2012:

SOLVED!

upegelow - thanks for your suggestions, but the software was installed via RPM and I did not have a say about the file locations.

The problem is that the RPM does not create the canonical file libOpenCL which is then linked to the current version. From a suggestion by Tobias

ln -s /usr/lib64/nvidia/libOpenCL.so.1 /usr/lib64/nvidia/libOpenCL.so
ldconfig

and all is well.
jo on Sun Jun 10 04:41:25 2012:

i'm using that exact setup on a dell laptop, it works fine (when running darktable through optirun). before i had installed bumblebee it was even simpler, X was running on the intel card and opencl on the nvidia device. if you can't run it, you have most likely not set up your driver/opencl correctly (e.g. look at /etc/OpenCL/vendors/nvidia.icd, it should point to libcuda.so).
Matthias on Fri Jun 15 15:06:41 2012:

I had the same problem and reinstalling my nvidia drivers and the cuda toolkit didn't help. however, by creating the file nvidia.icd in /etc/OpenCL/vendors/ with the only content being libcuda.so, I got it working.
Paraita Wohler on Mon Jun 18 13:14:46 2012:

Having worked with OpenCL for a while I can say that errors also happen on older NVidia cards not supporting certain operations (atomics), and while OpenCL might seem to work, you should check your graphics card's supported CUDA and OpenCL version.

For the problem encountered above, make sure LD_LIBRARY_PATH knows where libOpenCL.so is, in case ld doesn't.
Michal on Tue Jul 10 12:03:04 2012:

Thanks to the advice from Mathias I got it working. Thank you very much!
Richard on Thu Aug 30 17:51:24 2012:

As already said, there's the configuration parameter opencl_memory_limit, *and* there is the parameter opencl_memory_headroom. As I understand it (I may be wrong), the sum of them must not be more than the memory displayed.

I just reduced opencl_memory_limit to 256 (from 768) and opencl_memory_headroom to 200 (from 300), and while that meant darktable had to push through tiles instead of the whole image, it still meant a *very* noticeable speed-up, *and* darktable didn't eat up my CPU (which effectively made all of X11, including the mouse, freeze up for some considerable time when I was exporting).
Michal on Mon Sep 24 09:16:13 2012:

Oh, it stopped working again on optimus with 304.43 nvidia drivers...
Chytzkoi on Mon Oct 01 19:56:15 2012:

Is there significant difference while using for example 4G of DDR3 memory instead of 2GB? Some benchmark would be good.
paul on Sat Oct 13 18:17:16 2012:

I read this article a while ago, but could not get opencl working then.
My AMD Graphic Card has 512MB and only shows 256MB when i start darktable with the '-d opencl' option, so at first i thought that this was the reason.
But today I tried changing my darktablerc settings to opencl_memory_requirement=256 and opencl_memory_headroom=64. When trying this with the latest development version my desktop froze every time I tried changing something in the shadows/highlights module. So I went back to the stable version 1.0.5 and tried again - the results for opening and enlarging the same picture:
[dev_process_image] pixel pipeline processing took 0.324 secs (0.194 CPU) with opencl
[dev_process_image] pixel pipeline processing took 0.965 secs (2.251 CPU) without opencl

and best of all- until now no freezes (i haven't played around a lot, though).

I usually do not write comments, but your work is absolutely fantastic, and it is fun to play around with it - Thanks a lot for the work you are putting in this project!

Ok, just now darktable segfaulted and gave this message:
[opencl_atrous] couldn't enqueue kernel! -5 (might this be connected to my limited video memory?)
Never mind, it's fun to play around with this feature all the same
Matthias on Thu Oct 25 12:15:20 2012:

I ran into this problem again today after I updated my drivers, and it seems that the content of /usr/share/nvidia-current/nvidia.icd changed to libnvidia-opencl.so.1 . After some trial and error I found that the problem is fixed if the direct path of this new library is written into /etc/OpenCL/vendors/nvidia.icd (in my case this is /usr/lib/nvidia-current/libnvidia-opencl.so.304.60)
Michal on Tue Nov 13 09:54:16 2012:

worked like charm, thank you very much!
Michal on Tue Nov 20 13:19:23 2012:

I've been playing with PRIMUS to switch between the integrated card and my Optimus Nvidia card. It seems much more efficient than bare bumblebee (optirun), so I wonder if anybody got it to work with Darktable.

I managed to get it to work with other apps but not with Darktable (yet) for the usual "could not get platforms: -1001" error. Darktable works fine with Bumblebee though.

Anybody had more luck with it?
Mathew on Tue Nov 27 19:07:45 2012:

Hi, i have a Linux System with Mint 14 64Bit and AMD APU E-450 4GB RAM
Darktable couldn't start with opencl support. I reduced the variables step by step, but it didn't help. The AMD E-450 APU supports 512MB, AMD CommandCenter shows 384MB GPU RAM - any suggestions?

cache_memory=536870912
worker_threads=2
host_memory_limit=1500
opencl_memory_headroom=50
opencl_memory_requirement=256

darktable -d opencl
[opencl_init] trying to load opencl library: ''
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 2 devices
[opencl_init] discarding device 0 `Loveland' due to insufficient global memory (192MB).
[opencl_init] discarding CPU device 1 `AMD E-450 APU with Radeon(tm) HD Graphics' as it will not deliver any performance gain.
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
[pixelpipe_process] [export] using device -1
DM on Sat Dec 29 11:06:22 2012:

I would guess:
[opencl_init] discarding device 0 `Loveland' due to insufficient global memory (192MB).

is the key message here, since you have opencl_memory_requirement=256

An E450 is a pretty low-end thing to be running darktable on...

Cheers

David
Jarda-wien on Wed Jan 23 17:53:47 2013:

I finally got to measure some performance info.

SPECS:
Core i3 540@3.150
Radeon 5850 1GB (running default speeds)
8 Gigs of RAM
Camera Samsung NX10 (~15 megapixel images)
DT version 1.1.2, distro Arch, Catalyst 13.1, VRAM headroom dropped to 150MB

CPU only results:

[dev_pixelpipe] took 0.090 secs (0.067 CPU) initing base buffer [export]
[dev_pixelpipe] took 0.014 secs (0.033 CPU) processing `white balance' [export]
[dev_pixelpipe] took 0.145 secs (0.537 CPU) processing `demosaic' [export]
[dev_pixelpipe] took 0.120 secs (0.427 CPU) processing `input color profile' [export]
[dev_pixelpipe] took 0.045 secs (0.157 CPU) processing `crop and rotate' [export]
[dev_pixelpipe] took 9.545 secs (36.383 CPU) processing `equalizer' [export]
[dev_pixelpipe] took 0.060 secs (0.210 CPU) processing `tone curve' [export]
[dev_pixelpipe] took 0.092 secs (0.347 CPU) processing `levels' [export]
[dev_pixelpipe] took 1.881 secs (2.557 CPU) processing `highpass' [export]
[dev_pixelpipe] took 0.188 secs (0.663 CPU) processing `output color profile' [export]
[dev_pixelpipe] took 0.461 secs (1.633 CPU) processing `vignetting' [export]
[dev_pixelpipe] took 0.046 secs (0.167 CPU) processing `gamma' [export]

OpenCL results:

[dev_pixelpipe] took 0.090 secs (0.067 CPU) initing base buffer [export]
[dev_pixelpipe] took 0.014 secs (0.033 CPU) processing `white balance' [export]
[dev_pixelpipe] took 0.145 secs (0.537 CPU) processing `demosaic' [export]
[dev_pixelpipe] took 0.120 secs (0.427 CPU) processing `input color profile' [export]
[dev_pixelpipe] took 0.045 secs (0.157 CPU) processing `crop and rotate' [export]
[dev_pixelpipe] took 9.545 secs (36.383 CPU) processing `equalizer' [export]
[dev_pixelpipe] took 0.060 secs (0.210 CPU) processing `tone curve' [export]
[dev_pixelpipe] took 0.092 secs (0.347 CPU) processing `levels' [export]
[dev_pixelpipe] took 1.881 secs (2.557 CPU) processing `highpass' [export]
[dev_pixelpipe] took 0.188 secs (0.663 CPU) processing `output color profile' [export]
[dev_pixelpipe] took 0.461 secs (1.633 CPU) processing `vignetting' [export]
[dev_pixelpipe] took 0.046 secs (0.167 CPU) processing `gamma' [export]

I don't know if I got the right part of the log out, but you can definitely see a huge improvement in total time to process the image. It seems there are plugins that take longer on the GPU though. How come?
Leon on Thu Jan 24 20:44:43 2013:

The box is selected, but "activate opencl support" is greyed out on OSX 10.6.8 with ATI Radeon HD 2600XT with 256 MB.
I assume the problem is the ATI card with insufficient memory. Is that correct?
mogurakun on Thu Jan 31 10:06:36 2013:

Correct, additionally Mac OS X before 10.7 doesn't support images in OpenCL, which is a must for Darktable.