Daniel Playfair Cal’s Blog

The voice referendum

2023-10-11T18:27:00+11:00

Australia is having a referendum on October 14 about amending the Australian constitution to establish an Aboriginal and Torres Strait Islander voice to government. This feels like it has crept up on me - I regret to say that I haven’t discussed it with as many of my friends and family as much as I would have liked to. But I feel strongly that Australia should embrace the voice by voting yes in the referendum, and here I will explain why as best I can.

Australia as it is today was populated by Aborigines for at least 60,000 years before it was invaded and colonised by the British in 1788, 235 years ago. The British colonists spoke of “Terra Nullius” - the lie that the land wasn’t inhabited, and that what they were doing was not an invasion. Following this there was a long period of armed conflict, in the form the Frontier Conflicts, mostly in the early 1800s. The colonial forces were sometimes government military/police organisations, and sometimes private forces. Private individuals were often rewarded with land grants for the work of driving out and/or massacring the local indigenous population. A great read on this subject for me was the wikipedia article on the Bathurst War, fought between the Wiradjuri people and the colonisers. It’s an area that’s familiar to me, and the article tells the story very well. If you get through it, also check out the one on Abercrombie House - built on the 3200 acres given to the head of NSW Police at the time, who oversaw the massacres. That one is just an article about some loverly colonial architecture, but hits a different way when you read them in that order.

Having killed an estimated minimum of 100,000 Indigenous people in the Frontier wars, the Australian state began stealing the children of Aboriginal families in what is now known as the Stolen Generations, to be raised in missions or in white families. This was done to somewhere between 20,000 and 100,000 children between 1905 and 1967, which is 56 years ago. Many of these children are alive today.

I’m not personally responsible for doing any of these things and probably neither are you. But we all share responsibility for how Australia operates today, which includes how it moves forward from this history. That includes acknowledging the history, recognising the impact it has on how Australia is today, and helping Australia to do what is right today.

Despite the struggle to survive in colonised Australia, Aboriginal and Torres Strait Islander cultures are still active today. They have languages, traditions, and communities that have lived through colonisation. Also, the Australian government still makes decisions that affect them. Sometimes explicitly and deliberately, through policies like native title laws or the Northern Territory intervention, and sometimes implicitly or accidentally, for example in the provision of government services like COVID vaccines to areas with a high proportion of indigenous people. These are practical decisions that our government makes today, which it could make better with the direct input of first nations people.

The Uluru statement from the heart is an open letter from the first nations people of Australia to all other Australians. It calls for a constitutionally enshrined voice. It calls for Australia and first nations people to come together after a struggle, and to build a fair and truthful relationship.

I think considering that we still share this land with it’s original inhabitants and owners, the voice is a fair and modest request. And considering the grotesque treatment that Australia has subjected first nations people to, the offer to come together is a brave and generous one. It’s an offer to recognise our best intentions and to work with them.

The voice proposal was made with a consultative process with first nations people across the country. According to two polls, it is supported by about 80% of aboriginal people. It gives first nations people a guaranteed ability to speak to our government, which is a very powerful thing. It does not provide any direct decision making power. The referendum question and the amendment to the constitution are both very short and simple and you can read them here.

I think the choice to have a referendum is the right one. The Uluru Statement calls for the voice to be enshrined in the constitution. This makes sense. Having only the right to speak, the power of the voice would come from people listening to it, and particularly from the government’s awareness that the population wishes for it to be listened to. It’s important that it exists with the explicit support of Australians. If it is born from that explicit support (via a referendum), then it follows that it should be enshrined in the constitution, because its existence would be a decision by the Australian people, not open for governments to change without popular support.

I think it’s also the right call that the constitutional amendment doesn’t specify the exact model by which the voice will operate - i.e. things like the process for choosing representatives. That structure is something that will need to evolve, and it isn’t obvious that it would be perfect from day 1 - i.e. it’s not something which could be worked out with a binary question in a referendum. A yes vote in the referendum isn’t a vote for a particular model, it’s a vote for the existence of a voice. A yes result is Australia making a commitment that we will find a model that works.

The decision we make in the referendum is powerful and it will be respected. There are two choices only. Neither choice is the default, and we are each responsible for the choice we make and its predictable impact - both symbolic and practical. We will decide if Australia welcomes the Uluru Statement from the Heart and commits to listen to indigenous people on issues that affect them through the voice as proposed, or alternatively, that Australia rejects that idea for a generation.

I would really like to wake up on October 15th as a proud Australian, knowing that collectively we have looked to the future and made a commitment to our first nations people to listen to them, and to build the honest and truthful relationship that they want to have with us. It has to start by voting yes.

Introducing Dewobble 1.0.0

2021-09-28T23:14:00+10:00

After lots of research and experimentation, multiple rewrites, and lots of optimization, version 1.0.0 of Dewobble has been released! Dewobble is an open source C++ library (and accompanying FFmpeg filter, not yet released) that applies motion stabilisation and lens projection changes to videos. It’s not the first software to do either of those things, even among FFmpeg filters, but it has a number of advantages over most existing solutions.

In video form, here is what dewobble does:

Going for a run with a GoPro strapped to my chest

The killer feature: no wobbling jelly

There are many other softwares that change lens projection of videos, including with FFmpeg as I’ve previously written. There are also many other softwares that can do video stabilisation, like vid.stab, or closed source video editors like Premiere Pro or Final Cut Pro.

However, most of them (at least, all the FFmpeg filters) use an affine based motion model. I’ll explain more about this in a separate post, but the practical consequence is that they don’t work very well for videos shot with a wide field of view and/or large camera movements. This is because they don’t account for the way objects in the image are distorted when the lens is not facing directly towards them. The result is that there is a wobbling jelly like effect in the stabilised video, as the objects in the image are distorted in different ways depending on where the camera was facing.

Below is the most clear example of this I could produce. Dewobble and vid.stab are both configured with their fixed/tripod mode, which forces them to make large corrections even for slow camera movements. The grid makes the distortion obvious:

Dewobble (left) vs vid.stab (right)

Here’s a comparison with a more realistic video. Notice that although vid.stab does a good job of detecting the average camera movement in each frame, the wobbling jelly effect is clearly visible as the camera moves (e.g. the horizon changes shape, and the trees sway sideways):

Dewobble (left) vs vid.stab (right)

Action cameras like GoPros have become very popular. Two of their distinguishing features are that they have a wide field of view, and are small and light. Because they are small and light, they can be attached to the body or to moving objects. In these situations, the camera movement is often noisy and the resulting video is very shaky. This severe shakiness and the wide field of view both present a problem for most video stabilisation software.

This is the use case that Dewobble is optimized for. Dewobble uses a rotation model of camera motion. When it detects and compensates for camera movement, it does so in terms of camera orientation in 3 dimensions, rather than in terms of a translations/rotations on a 2 dimensional plane. The result is that the accuracy does not degrade further from the center of the image, and therefore it can accurately detect and correct for large camera movements with a wide angle camera.

Combined projection change and stabilisation

Another important feature of Dewobble is that it performs both projection changes and stabilisation at once (i.e. with a single pass of pixel interpolation). This has a number of advantages:

More efficient than 2 separate passes, therefore faster.
Reduces quality loss, since interpolation always results in some loss of information.
No cropping of intermediate frames (e.g. after stabilisation but before projection change). Every pixel is included as long as it can be mapped from the output to the input.

Cropping of intermediate frames

In the similar example above, I’ve worked around the cropping issue by inserting borders in the input as a buffer, in order to avoid cropping the original input. This dramatically slows down the process because the intermediate frames need to be larger in order to accommodate the borders. Here is what the comparison looks like without that workaround. Notice that Dewobble never crops the input, whereas the combination of a stabilisation filter such as vid.stab with a separate projection change filter results in unnecessary cropping.

Dewobble (left) vs vid.stab (right) – without cropping workaround

How to use it

1. Find the projection and field of view of your camera

Dewobble needs to know the projection and field of view (or focal length) of the camera. Fortunately, finding out this information is not as difficult as it may seem. Most cameras have a projection that is very close to the rectilinear projection (where straight lines in the real world remain straight in the image), or the fisheye projection. So if you took a guess at one of those, you would probably be right.

For the field of view, you may be able to look it up. GoPro publishes tables with the field of view for their cameras. You may also find information about your camera in the extensive Lensfun database. Failing that, you can measure it yourself by facing your camera towards a wall and doing some geometry. If you use Dewobble (or any other method) to convert to a rectilinear projection, you can also do a process of trial and error, adjusting the field of view setting until the output has straight lines. If you’re comfortable compiling C++ code, you can also measure it very accurately using OpenCV. There’s a mini guide for some of these methods in the README.

For example, I’m using a GoPro Hero 5 Black. I switch the picture mode to “4:3 Wide”, and disable the camera’s built in stabilisation (which crops the image, and effectively changes the field of view). The camera uses a fisheye projection, and has a 145.8° field of view (diagonal, corner to corner).

2. Transform your video

Using the FFmpeg filter

The easiest way to use Dewobble is to use the FFmpeg filter that wraps it. My patch to add the Dewobble filter to FFmpeg hasn’t yet been applied, but I hope soon it will be. In that case you will be able use it as follows:

$ ffmpeg \
    -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl \
    -i INPUT \
    -vf 'format=nv12,hwupload,libdewobble=in_p=fish:in_dfov=145.8:out_p=rect:out_dfov=145.8,hwdownload,format=nv12' \
    OUTPUT

Let’s break this down:

The first line of flags has -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl. This is configuring an OpenCL device for Dewobble to use. You can change the 0.0 depending on which OpenCL implementation and device you want to use. Ideally you should use one that makes use of a GPU, since that will be much faster than a CPU. This choice won’t affect the result, but it will affect the speed.

The second line specifies the input video. Then there is the video filter chain. format=nv12,hwupload uploads the video to your chosen OpenCL device, and afterwards hwdownload,format=nv12 downloads it back to the CPU. If you want you can change these to avoid copying the video to the CPU, especially if you use hardware encoding or decoding. This also won’t affect the result (in terms of what Dewobble does).

The important part is the Dewobble filter and its settings: libdewobble=in_p=fish:in_dfov=145.8:out_p=rect:out_dfov=145.8. What this sais is that the input video has a fisheye projection and a diagonal field of view of 145.8°, and that the output should have the same projection. Stabilisation is applied by default, so the output will be smooth.

There are quite a few things you can change here if you want, besides providing the information about your camera (in_p and in_dfov):

You can change the output camera projection with out_p and out_dfov, which results in changing the projection. In most of the examples in this post, I used out_p=rect:out_dfov=145.8. Reducing the field of view is equivalent to zooming in or cropping.
You can turn off stabilisation if you only want to change projection by using stab=none. You can also use stab=fixed to maintain a fixed camera position (AKA tripod mode). With the default stab=sg you can adjust stab_r from the default 30 frames to control the “smoothness” of the output. The number is how many frames ahead/behind are considered when plotting a smooth camera path. stab_h controls how many consecutive frames will have their motion interpolated (i.e. guessed) in case it can’t be detected.
You can change the output dimensions to differ from the input (out_w and out_h), and change where the center of the image is (out_fx and out_fy). In case your input isn’t centered you can also specify the focal point in the input with in_fx and in_fy.
You can change the pixel interpolation method, which affects the quality of the output, using the interp option.
You can change how the borders are coloured (i.e. areas that the camera did not see) using the border and border_rgb options. If you don’t like the default black borders, you can have the image reflected on the edges, replicated, or simply use a different colour.

Using the C/C++ library

You don’t have to use Dewobble with FFmpeg. Headers are provided for C++ and C – see the documentation.

Conclusion

I’ve had a great time writing Dewobble (which doesn’t mean its finished!) and I plan to share more details about it in the near future. Coming up soon is a detailed explanation of how it works, and a more comprehensive comparison between Dewobble and other stabilisation software.

I wrote Dewobble to solve my own problems filming my team’s dodgeball matches, and also as an experiment for my own learning and enjoyment. However I’d be happy to know if anybody else also finds it useful. I would love to hear from you if you decide to use it, or if you have any questions, suggestions, or patches. Happy dewobbling!

Transforming compressed video on the GPU using OpenCV

2021-03-05T20:20:00+11:00

In a previous post, I described various FFmpeg filters which I experimented with for the purpose of lens correction, and I mentioned I might follow it up with a similar post about video stabilisation. This post doesn’t quite fulfill that promise, but at least I have something to report about GPU acceleration!

For background: videos that I recorded of my dodgeball matches had not only lens distortion, but also unwanted shaking. Sometimes the balls would hit the net that the camera was attached to, and the video became very shaky.

Attempt 1: FFmpeg filter

The first thing I tried was to find an FFmpeg filter which could solve the problem. I found that the combination of vidstabdetect and vidstabtransform (wrappers for the vid.stab library) produced reasonably good results. However, this method had a number of issues:

It required 2 passes: one for detection of camera movement, and one to compensate for it.
It was very slow. The combination perspective remapping and the stabilisation resulted in a framerate of about 3fps. This meant that a 40 minute dogeball match took half a day to process!
It created “wobbling” when the camera was shaking.

Wobbling

The model used by vid.stab to represent the effect of camera movement is a limited affine transformation, including only translation, rotation, and scaling. In my application, the main way that the camera moved was by twisting – i.e. the camera remained at the same location, but it turned to face different directions as it shook. There was little rotation in practice, and little change in the position of the camera, so I don’t think that vid.stab detected much rotation or scaling. Instead I think it applied translation (basically moving a rectangle in 2 dimensions) in order to correct for changes in the angle of the camera.

The problem is that translation is not what happens to the image when you twist a camera – what happens is a perspective transformation. Close to the center of the image or at a high zoom level translation is a good approximation, but it gets worse further away from the center of the image and with a wider field of view. My camera had a very wide field of view, so the effect was quite significant.

Speed

There were a few reasons why the processing speed was so slow. One was that the expensive (and destructive) interpolation step was happening twice – once to correct for lens distotion, and then again for stabilisation. No matter how optimised the interpolation process was, this was a waste of time. In theory there is no reason not to perform the interpolation for both steps at once, but this wasn’t supported by the FFmpeg filters, and probably wouldn’t even make sense to do with the FFmpeg filter API.

Another opportunity was to use the GPU to speed up the tranformations. FFmpeg supports the use of GPUs with various APIs. The easiest thing to get working is compression and decompression. On Linux the established API for this is VA-API, which FFmpeg supports. I was already using VA-API to decompress the H.264 video from my GoPro camera, and to compress the H.264/H.265 output videos I was creating, but the CPU was still needed for the projection change and video stabilisation.

For more general computation on GPUs, there are various other APIs, including Vulkan and OpenCL. Although there are some FFmpeg filters that support these APIs, neither the lensfun or the vid.stab filters do. The consequence for me was that during the processing, the decoded video frames (a really large amount of data) had to be copied from the GPU memory to the main memory so that the CPU based filters could perform their tasks, and then the transformed frames copied back to the GPU for encoding.

This copying takes significant time. For example, I found that an FFmpeg pipeline which decoded and reencoded a video entirely on the GPU ran at about 380fps, whereas modifying that pipeline to copy the frames to the main memory and back again dropped this to about 100fps.

Attempt 2: OpenCV

At this point I felt like I had exhausted my ability to solve the problem with scripts that called the FFmpeg CLI, and that to make more progress I would need to work at a lower level. Here are the tools I used:

libavformat, libavcodec, libavutil: C libraries for muxing/demuxing and encoding/decoding (part of FFmpeg)
OpenCV: An extensive library for computer vision written in C++
VA-API: Linux API for GPU video encoding and decoding
OpenCL: API for working with objects in GPU memory, well supported by OpenCV

I knew that there were methods in OpenCV to do things like perspective remapping, and that many of its more popular methods had implementations that operated directly on GPU memory with OpenCL. In order to take advantage of this, I needed to take the VA-API frames from the GPU video decoder and convert them to OpenCV Mat objects. To make the process run as fast as possible, I wanted to do this entirely on the GPU, without copying frames to the main memory at any point.

Decoding with OpenCV VideoCapture

The first thing to do was to decode the input video and get VA-API frames. I first attempted to use OpenCVs VideoCapture API to do so. Depending on the platform there is a choice of backing APIs from which to retrieve decoded video. The applicable choices were CAP_FFMPEG, and CAP_GSTREAMER. There weren’t any capture properties in the OpenCV capture API at the time related to hardware decoding. While the FFmpeg backend only accepted a file path as input, the GStreamer backend also accepted a GStreamer pipeline. So with a bit of experimentation I came up with a GStreamer pipeline which decoded the video with VA-API (confirmed by running intel_gpu_top from igt-gpu-tools).

Mat frame;
VideoCapture cap(
    "filesrc location=/path/to/input.mp4 ! demux ! vaapih264dec ! appsink sync=false",
    CAP_GSTREAMER
);
while (true) {
    cap.read(frame);
    // Do stuff with frame
}

Although the decoding was done with VA-API, the resulting frame was not backed by GPU memory – instead the VideoCapture API copied the result to the main memory before returning it.

Aside: recently, support for hardware codec props has been added to the VideoCapture and VideoWriter APIs. Although this would simplify using VA-API with the Gstreamer backend (and make it possible with the FFmpeg backend), it still doesn’t return hardware backed frames. You can see the FFmpeg capture implementation copying the data to main memory in retrieveFrame and vice versa in writeFrame. Similarly in the GStreamer backend it looks like the buffer is always copied to main memory in retrieveFrame.

Decoding with libavcodec

VideoCapture seemed like a dead end, so instead I turned my attention to demuxing and decoding the video with libavformat and libavcodec. Although this required a lot more code, I found that it worked very well. There are lots of examples in the documentation, including for hardware codecs, OpenCL, and mapping different types of hardware frames. I wrote code to open a file, and create a demuxer and video decoder. Then I set up a finite state machine to pull video stream packets from the demuxer and send them to the decoder, as well as code to pull raw frames from the decoder and process them. It was something like this pseudocode:

state = AWAITING_INPUT_PACKETS
while (state != COMPLETE):
    switch(state):
        case COMPLETE:
            break
        case FRAMES_AVAILABLE:
            while (frame = get_raw_frame_from_decoder()):
                process_frame()
            state = AWAITING_INPUT_PACKETS
            break
        case AWAITING_INPUT_PACKETS:
            if (input_exhausted()):
                state = COMPLETE
            else:
                send_demuxed_video_packet_to_decoder()
                state = FRAMES_AVAILABLE
            break

This was mostly the result of copying examples like this one (except for the part that copies the VA-API buffer to main memory).

Converting VA-API frames to OpenCL memory

With the VA-API frames available, it was time to convert them into OpenCL backed OpenCV Mat objects. OpenCL has an Intel specific extension cl_intel_va_api_media_sharing which allows VA-API frames to be converted into OpenCL memory without copying them to the main memory. Luckily I had an Intel GPU.

I could see two options for using this extension. One was to use OpenCVs interop with VA-API, and another was to first map from VA-API to OpenCL in libavcodec. On the first attempt with libavcodec I couldn’t find a way to expose the OpenCL memory, so I chose the OpenCV VA-API interop option.

OpenCV VA-API interop

There were a few basic snags with OpenCV’s VA-API interop. OpenCV is built without it by default, and the Arch Linux package doesn’t include the necessary build flags. So I had to create a custom PKGBUILD and built it myself. In the process it became apparent that OpenCV was not compatible with the newer header provided by OpenCL-Headers, and only worked with the header from a legacy Intel specific package. So I had to also patch OpenCV to build with the more up to date headers (this is no longer necessary after this recent fix to OpenCV).

Making it work required some additional effort. The VA-API and OpenCL APIs both refer to memory on a specific GPU and driver, and also with a specific scope (a “display” in the case of VA-API and a “context” for OpenCL). So it’s necessary to initialise the scope of each API such that the memory is compatible and can be mapped between the APIs. The easiest way seemed to be to choose a DRM device, use it to create a VA-API VADisplay, and then use this to create an OpenCL context (which the OpenCV VA-API interop handles automatically). The code looked something like this:

#include 
extern "C" {
    #include 
}

void initOpenClFromVaapi () {
    int drm_device = open("/dev/dri/renderD128", O_RDWR|O_CLOEXEC);
    VADisplay vaDisplay = vaGetDisplayDRM(drm_device);
    close(drm_device);
    va_intel::ocl::initializeContextFromVA(vaDisplay, true);
}

The OpenCV API handles the OpenCL context in an implicit way - so after initializeContextFromVA you can expect that all the other functionality in OpenCV that uses OpenCL will use the VA-API compatible OpenCL context.

From there it was reasonably simple to create OpenCL backed Mat objects from VA-API backed AVFrames:

Mat get_mat_from_vaapi_frame(AVFrame *frame) {
    Mat result;
    va_intel::convertFromVASurface(
        vaDisplay,
        frame->data[0], // <- If I remember correctly...
        dimensions,
        result
    );
    return result
}

This method worked, but it wasn’t as fast as I had hoped. After reading the code I had a reasonably good idea why.

Video codecs like H.264 (and by extension APIs like VA-API) usually deal with video in NV12 format. NV12 is a semi planar format, which means instead of storing each pixel separately including all its colour channels, there are separate matrices to store the luminance/brightness of the whole image, and the chroma/colour (which incorporates 2 channels).

Also, OpenCL has various different types of memory, and they cannot all be treated the same way. OpenCV Mat objects when backed by OpenCL memory use an OpenCL Buffer, whereas VA-API works with instances of Image2D. So in order to create a OpenCL backed Mat from a VA-API frame, it’s necessary to first remap from an OpenCL Image2D to an OpenCL Buffer. What this means physically is dependent on the hardware and drivers.

The OpenCL VA-API interop handles both of these problems transparently. It maps VA-API frames to and from 2 Image2Ds corresponding to the luminance (Y) and chroma (UV) planes, and it uses an OpenCL kernel to convert between these images and a single OpenCL Buffer in a BGR pixel format. Both of these steps take time, so the speed to decode a video and convert each frame to a Mat with a BGR pixel format was about 260fps, compared to about 500fps for just decoding in VA-API.

libavcodec hw_frame_map

The OpenCV VA-API interop worked, but required patches to OpenCV and its build script, and it took away control over how the NV12 pixel format was handled. So I took another stab at doing the mapping with libavcodec. libavcodec has a lot more options for different types of hardware acceleration and for mapping data between the different APIs, so I was hopeful that then or in the future there might be a way to do it on non Intel GPUs.

As the OpenCV VA-API interop did, it was necessary to derive an OpenCL context from the VA-API display so that the VA-API frames could be mapped to OpenCL. It was also necessary to initialise OpenCV with the same OpenCL context as the libavcodec hardware context so that they could both work with the same OpenCL memory.

// These contexts need to be used for the decoder
// and for mapping VA-API frames to OpenCL
AVBufferRef *vaapi_device_ctx;
AVBufferRef *ocl_device_ctx;

void init_opencl_contexts() {

    // Create a libavcodec VA-API context
    av_hwdevice_ctx_create(
        &vaapi_device_ctx,
        AV_HWDEVICE_TYPE_VAAPI,
        NULL,
        NULL,
        0
    );

    // Create a libavcodec OpenCL context from the VA-API context
    av_hwdevice_ctx_create_derived(
        &ocl_device_ctx,
        AV_HWDEVICE_TYPE_OPENCL,
        vaapi_device_ctx,
        0
    );

    // Initialise OpenCV with the same OpenCL context
    init_opencv_opencl_context(ocl_device_ctx);
}

void init_opencv_opencl_context(AVBufferRef *ocl_device_ctx) {
    AVHWDeviceContext *ocl_hw_device_ctx =
        (AVHWDeviceContext *) ocl_device_ctx->data;
    AVOpenCLDeviceContext *ocl_device_ocl_ctx =
        (AVOpenCLDeviceContext *) ocl_hw_device_ctx->hwctx;
    size_t param_value_size;

    // Get context properties
    clGetContextInfo(
        ocl_device_ocl_ctx->context,
        CL_CONTEXT_PROPERTIES,
        0,
        NULL,
        &param_value_size
    );
    cl_context_properties *props = malloc(param_value_size);
    clGetContextInfo(
        ocl_device_ocl_ctx->context,
        CL_CONTEXT_PROPERTIES,
        param_value_size,
        props,
        NULL
    );

    // Find the platform prop
    cl_platform_id platform;
    for (int i = 0; props[i] != 0; i = i + 2) {
        if (props[i] == CL_CONTEXT_PLATFORM) {
            platform = (cl_platform_id) props[i + 1];
        }
    }

    // Get the name for the platform
    clGetPlatformInfo(
        platform,
        CL_PLATFORM_NAME,
        0,
        NULL,
        &param_value_size
    );
    char *platform_name = (char *) malloc(param_value_size);
    clGetPlatformInfo(
        platform,
        CL_PLATFORM_NAME,
        param_value_size,
        platform_name,
        NULL
    );

    // Finally: attach the context to OpenCV
    ocl::attachContext(
        platform_name,
        platform,
        ocl_device_ocl_ctx->context,
        ocl_device_ocl_ctx->device_id
    );
}

To make this work I had to fix a bug in FFmpeg where the header providing AVOpenCLDeviceContext was not copied to the include directory.

Next, I attached the VA-API hardware context to the decoder context and configured the decoder to output VA-API frames:

// AVCodecContext *decoder_ctx = avcodec_alloc_context3(decoder);
// ...etc

// Attach the previously created VA-API context to the decoder context
decoder_ctx->hw_device_ctx = av_buffer_ref(vaapi_device_ctx);
// Configure the decoder to output VA-API frames
decoder_ctx->get_format = get_vaapi_format;

// This just selects AV_PIX_FMT_VAAPI if present and errors otherwise
static enum AVPixelFormat get_vaapi_format(
    AVCodecContext *ctx,
    const enum AVPixelFormat *pix_fmts
);

At this point the decoder was generating VA-API backed frames, so we could map them to OpenCL frames on the GPU:

AVFrame* map_vaapi_frame_to_opencl_frame(AVFrame *vaapi_frame) {
    AVFrame *ocl_frame = av_frame_alloc();
    AVBufferRef *ocl_hw_frames_ctx;

    // Create an OpenCL hardware frames context from the VA-API
    // frame's frames context
    av_hwframe_ctx_create_derived(
        &ocl_hw_frames_ctx,
        AV_PIX_FMT_OPENCL,
        ocl_device_ctx, // <- The OpenCL device context from earlier
        vaapi_frame->hw_frames_ctx,
        AV_HWFRAME_MAP_DIRECT
    );

    // Assign this hardware frames context to our new OpenCL frame
    ocl_frame->hw_frames_ctx = av_buffer_ref(ocl_hw_frames_ctx);

    // Set the pixel format for our new frame to OpenCL
    ocl_frame->format = AV_PIX_FMT_OPENCL;

    // Map the contents of the VA-API frame to the OpenCL frame
    av_hwframe_map(ocl_frame, frame, AV_HWFRAME_MAP_READ);

    return ocl_frame;
}

Internally, av_hwframe_map uses the same Intel OpenCL extension as the OpenCV VA-API interop. However libavcodec supports many other types of hardware, and for all I know there are or will be other options that work on non Intel GPUs. For example it might work to first convert to a DRM hardware frame, then then to an OpenCL frame.

Next we need to convert the OpenCL backed AVFrame into an OpenCL Mat:

Mat map_opencl_frame_to_mat(AVFrame *ocl_frame) {
    // Extract the two OpenCL Image2Ds from the opencl frame
    cl_mem luma_image = (cl_mem) ocl_frame->data[0];
    cl_mem chrome_image = (cl_mem) ocl_frame->data[1];

    size_t luma_w = 0;
    size_t luma_h = 0;
    size_t chroma_w = 0;
    size_t chroma_h = 0;

    clGetImageInfo(cl_luma, CL_IMAGE_WIDTH, sizeof(size_t), &luma_w, 0);
    clGetImageInfo(cl_luma, CL_IMAGE_HEIGHT, sizeof(size_t), &luma_h, 0);
    clGetImageInfo(cl_chroma, CL_IMAGE_WIDTH, sizeof(size_t), &chroma_w, 0);
    clGetImageInfo(cl_chroma, CL_IMAGE_HEIGHT, sizeof(size_t), &chroma_h, 0);

    // You can/should also check things like bit depth and channel order
    // (I'm assuming that the input is in NV12),
    // and you can probably avoid repeating this for each frame.

    UMat dst;
    dst.create(luma_h + chroma_h, luma_w, CV_8U);

    cl_mem dst_buffer = (cl_mem) dst.handle(ACCESS_READ);
    cl_command_queue queue = (cl_command_queue) ocl::Queue::getDefault().ptr();
    size_t src_origin[3] = { 0, 0, 0 };
    size_t luma_region[3] = { luma_w, luma_h, 1 };
    size_t chroma_region[3] = { chroma_w, chroma_h * 2, 1 };

    // Copy the contents of each Image2Ds to the right place in the
    // OpenCL buffer which backs the Mat
    clEnqueueCopyImageToBuffer(
        queue,
        cl_luma,
        dst_buffer,
        src_origin,
        luma_region,
        0,
        0,
        NULL,
        NULL
    );
    clEnqueueCopyImageToBuffer(
        queue,
        cl_chroma,
        dst_buffer,
        src_origin,
        chroma_region,
        luma_w * luma_h * 1,
        0,
        NULL,
        NULL
    );

    // Block until the copying is done
    clFinish(queue);

    return mat;
}

I made a different choice to the OpenCV VA-API interop in this case – rather than converting the image to the BGR pixel format immediately, I copied it in the simplest/fastest way possible, preserving the NV12 pixel format. This makes sense to me because there are many algorithms that operate only on single channel images anyway, so it seems pointless to throw away the luminance plane. If I want to convert the frame to BGR, then I can do so with cvtColor, which also has an OpenCL implementation.

The combination of libavcodec mapping between VA-API and OpenCL hardware frames, OpenCL conversion from Image2D to Buffer, and cvtColor seems to be about as fast as the OpenCV VA-API interop.

The video stabilisation part

Anyway, this was an interesting adventure. The next step is to actually use the OpenCV API to do the change in lens projection and video stabilisation. That requires some more experimentation, so I will leave this here for now. At least I’m confident that even a very slow implementation will be miles faster than the 3fps I started out with!

P.S. In case you really want to see the source code, it’s here (probably in a mostly working state).

Making a new desk (part 1): Choosing a frame

2021-02-20T00:10:00+11:00

Like many people I spend a really huge amount of time working at a desk, so I think its important that my desk setup is comfortable and generally induces positive feelings. Until recently I was using a desk I found at a second hand store.

This desk served me well for many years. More recently though, I started to experience pain in various places and I realised that it was caused by sitting at my desk. Some of this pain was in my wrist and hands, and this I solved by changing my keyboard and mouse. But some of it was in my back and legs. I guessed that this was related to the slouchy posture that I tended to fall into when working at my desk. Although I tried, I struggled to remain seated with my back straight and feet flat on the floor for any length of time.

So I decided to build myself a new desk. In this first part, I’ll write about choosing the frame, and in subsequent parts I’ll write about building the tabletop.

The convention is wrong

There seems to be lots of agreement that a good seating position involves the following:

feet flat on the floor
shins vertical
thighs horizontal, resting on a chair
vertical torso
vertical upper arms
horizontal lower arms, with fingers resting on a keyboard

Basically every fixed height desk is 74cm (about 29 inches) high. People obviously vary in height but I am 183cm tall, which is above average. If I adjust the height of my chair and sit in it such that my body position matches the recommendations, the distance from the bottom of my elbows to the floor is 69cm. If my forearms are to be level and resting on a keyboard which is on my desk, then the height of my desk would have to be 69cm, less the height of my keyboard. This means that the 74cm standard height of desks is too high for almost all humans.

183cm tall human (left), ergonomic seating position (centre), result of standard 74cm height desk (right)

It gets worse though – even adjustable height sit/stand desks rarely go lower than 70-74cm. This is still too high, even for a relatively tall person like me! Fortunately there are some adjustable height desks that go low enough for most people. Typically these are the ones that have 3 segments in the telescopic legs, rather than just 2. They are more expensive, but at least they can be adjusted to the right height.

Choosing a desk frame

I wanted my desk to have an adjustable height to accommodate a comfortable and healthy position seated and standing, so I decided to buy a desk frame. I bought the UpDown desk pro series. This is a sturdy electric height adjustable desk frame , of the sort with 3 segment telescopic legs. It’s able to go down to 61.5cm (subject to the table top thickness), which can accommodate most humans comfortably without a foot stool (not just an uncommonly tall person). It arrived quickly and was very easy to assemble.

Next step: making a tabletop

A desk frame isn’t much use without a tabletop. For the tabletop I decided to take a more ambitious route and make it myself. Read more in part 2.

Making a new desk (part 2): Buying a timber slab

2021-02-20T00:10:00+11:00

In part 1 of this series I bought a desk frame, but I didn’t buy a tabletop to rest on it. That’s because I made my own out of a solid slab of Eucalyptus. What follows for the next few posts is not so much a how to, but an account of my journey making the tabletop. I’m an amateur and I was doing many things for the first time, so this includes many mistakes I made along the way.

Finding a slab

My brother recommended Go Natural Timbers, a business in Sydney which specialises in timber slabs and associated furniture, so I paid them a visit. There was a large warehouse full of slabs, in most cases stacked in their original positions in the tree. It doesn’t take that many large trees to fill a warehouse, so my choice was largely dictated by what was available – it wasn’t practical to choose a specific species and dimensions.

My research led me to believe there were a few objectively important characteristics to look for when choosing a slab:

stable moisture content, relative to the intended climate
lack of active rot, termites, or other wood eating creatures

There are cheap electronic meters that can measure moisture, but the appropriate moisture content varies depending on the climate. The aim is that the moisture content should be stable, since changes in moisture content cause the wood to swell or shrink, and therefore warp. I didn’t have one of these meters, and I didn’t know what an appropriate moisture content was for the Sydney’s climate. The staff told me that their slabs were kept outdoors for a long period of time before being sold, and that this ensured that they were dried appropriately.

It’s also important that there are no wood eating organisms living in a piece of furniture, because otherwise they would eat the furniture. My understanding is that there are two ways to ensure that this is the case. Either the wood is heated in a kiln to a temperature that kills the organisms, or it is treated with a chemical that kills them. The timber shop staff told me that their slabs are chemically treated.

So I took them at their word that the slabs on sale were dry and not decomposing. What remained was subjective choices. I wanted to build a desk that was reasonably deep (65-75cm) and 140cm wide, with a straight cut edge on the back and sides, and using the natural edge of the tree for the front edge. The depth requirement eliminated most of the slabs, which came from relatively narrow trees. The apartment I live in has a dark coloured carpet, so I wanted a light coloured wood to balance it. I also wanted wood with an interesting grain, with variations and imperfections. In my mind the purpose of using a slab rather than planks is to display the wood as it grows, and to embrace its natural form.

I returned a week later and had a look at a few slabs. They are kept in stacks, with gaps in between for ventilation. The grain is not easily visible because the wood is rough, unfinished, and has been exposed for a long time. To help see the grain and colour, the staff sanded patches of a few slabs that I was interested in, and sprayed some oil on them to make the grain visible. Given what was available, the choice was between Camphor (of which there seemed to be an infinite supply), or a light coloured Eucalypt they referred to as “true blue gum” (the exact species was unknown). When I thought about it I realised I really wanted my desk to be made from a Eucalypt. Eucalypts are the dominant trees in most of Australia and especially around Sydney where I live. They are the trees that I see on bushwalks, that constantly amaze me with their infinitely varying shapes and colours, and their various impressive abilities like surviving bushfires and growing out of tiny cracks in cliff faces. They remind me of many of my favourite places.

The original slab, milled but undressed

So I chose the “true blue gum”. It fit all my requirements, and it had a very irregular grain, and lots of knots and cracks. This would be terrible for anything structural, but great to look at. The tree must have been absolutely enormous. The photo shows only half of the log (the narrower half!), and the slabs underneath are closer to the middle of the trunk, and therefore wider. Even this piece led me to stray from my intended dimensions. The narrowest point was 75cm wide.

Wood species

I don’t know exactly what species the timber is, and it seemed like the staff at Go Natural Timbers didn’t either. There are apparently many species that are referred to as “true blue gum”. My brother is an arborist and suspects that it might be Eucalyptus Saligna, and it seems like Eucalyptus Globulus is also a possibility.

Some unrelated but awesome trees I came across in Sydney

Outsourcing

I asked Go Natural Timbers about the cost of doing various parts of the process (since they also sell bespoke furniture), and their advice to me as an amateur. It turned out that cutting a slab to shape is trivially easy (with a table saw), and levelling (dressing) it is also a quick process (and therefore cheap). However filling in cracks and sanding are labour intensive and therefore expensive. At the same time, levelling is difficult without specialised tools which I didn’t have access to.

So I had the shop level the faces of the slab and cut it to shape, and decided to do the rest myself. Even though my apartment has no space for such a large desk I asked for it to be cut with the largest depth possible (75-85cm) because it seemed like a travesty to discard more of the wood. The price including dressing and cutting was $600.

Delivery

The dressed slab, in my physical possession!

A week later I went to pick up my dressed slab. It had become significantly thinner, and some of the gaps turned out to be larger then they originally appeared. There was a small area of rot, but it seemed dry and still hard enough to be a desk. It also smelled really great.

And so I had successfully acquired a slab! In part 3 of this series I’ll write about the next steps – filling the gaps, sanding, and shaping the slab.

Making a new desk (part 3): Gap filling with epoxy and sanding

2021-02-20T00:10:00+11:00

In part 2 of this series I wrote about how I came into possession of a cut and dressed timber slab. With the slab in my possession, the first task was to turn it into an appropriate shape to be a desk. This included filling in various gaps, sanding surfaces, and rounding corners.

Bruno's previous project

My brother Bruno had previously built a coffee table out of a burl (a burl is to a tree what a tumour is to an animal). Bruno gave me a hand with my project – apart from lots of advice and lending me a lot of tools, he helped me with the mould and epoxy pouring, which sounds like it was the hardest part of his coffee table.

Cleaning and preparing

The first step is to remove all the loose bark, dried sap, dead animals, etc. There were quite a few holes in the slab from boring insect larvae, and some of them still contained larvae. One of them became visible when the slab was cut, which made me slightly nervous. I didn’t see it actively move, so I guess it was dead. Either way, I took care to dig out each of the holes carefully including any larvae. One important lesson was to wear safety goggles when digging stuff out of wood. At one point some fragments of dry sap flew into my eye and caused me significant pain over many hours. Another is to take the time to find the most appropriate tool to dig/scratch/hook loose things. It takes a surprisingly long time to find all the cracks.

We decorated the cracks by charring the inside edges with a blowtorch. This creates an interesting effect, but a disadvantage is that the charred wood tends to release bubbles into the epoxy during the gap filling. Potentially this could have been avoided with more limited burning or with slower curing epoxy.

Building a mould

The mould prevents the epoxy from escaping while it sets in the slab. To build the mould we used a single piece of melamine which we bought from Bunnings. Bunnings provides a free cutting service, so it was cut mostly to shape at the store, which meant that it fit in my car and we had less cutting to do.

Bruno, next to the mould

Building the mould was quite simple – just some use of a circular saw and inserting lots of screws. I tried to preserve the straight cut edges from Bunnings as much as possible rather than attempt to cut long straight edges myself.

Many tutorials mention products called “mould release” – a substance which is applied to the inside surface of the mould to prevent any epoxy from sticking to it. We used some cooking oil which we applied with a paper towel. This worked well enough.

There are lots of opinions about what kind of adhesive tape should be used to seal the piece when the epoxy is poured. We used painters tape (which looks like waxy paper). It’s cheap, easy to find, easy to apply, and perfectly capable of holding the epoxy in if applied neatly.

After demoulding: channels of epoxy that seeped through grooves between layers of tape

One issue I faced later is that the epoxy is able to spread down the grooves in between strips of tape. For this reason, I think it would have worked better to apply the tape in the direction of the grain. This would mean that any epoxy which travels down the grooves is unlikely to spread far from the crack it leaked out from.

Leakage and sticking where the warp caused the slab to rise above the mould

Another problem is keeping the piece firmly against the mould. We simply rested the mould on an outdoor table, but that IKEA table was not particularly rigid in the central area. Also by this time (i.e. after a month of researching how to get this far) the slab had developed a slight warp which caused the central area to lift slightly from the mould. This probably contributed to a small leakage of epoxy in the central area, which made it more difficult to remove the mould. It didn’t stick strongly (thanks to the tape, oil, and melamine coating), but because the stuck area was in the middle it required some effort with a machete to remove.

In hindsight, it would have been worthwhile to set up some vices to hold the slab firmly on the bottom of the mould. As well as preventing the epoxy leaking issue, this would have prevented the epoxy from baking in the warp, and instead caused it to hold the surrounding timber in a flat shape.

Pouring epoxy

I used two different epoxies – one which was left over from Bruno’s coffee table (Fibreglass Sales Epoxy 2020) and another which I bought for this project (Barnes Megapour). I discovered later that using two epoxies was a risk, because there are many different types with different compositions, and not all of them can form a chemical bond with each other. It was also a mistake to do the first pour before buying enough to finish the job. Each pour needs to be done before the previous pour has fully cured, because this allows a chemical bond to form between the layers. Like most glue, epoxy does not stick well to smooth surfaces, including unsanded epoxy!

There are many different kinds of epoxy. They vary in terms of their curing time, endotherm, viscosity, shrinkage during cure, clarity, UV yellowing resistance, and more. I chose to allow the fill to be transparent, and not to add any dye. This avoids the need to buy and mix in dye, but also makes bubbles and yellowing (including UV induced yellowing) is more visible. So it was useful to choose a UV resistant epoxy. Barnes Megapour is, but Fibreglass Sales Epoxy 2020 is not.

Small bubbles emerging from charred wood

Another consideration is the curing speed, which is closely related to the endotherm and maximum pouring depth. I used one with a 2-3 hour cure time, and another with 24-48 hours. I much preferred the longer curing time. The working times respectively were about 45 minutes and 3 hours. 45 minutes is short enough to feel rushed when there are many cracks to find and there is a constant stream of bubbles to pop. A lower viscosity and longer cure time means it is easier for bubbles to rise to the surface and pop rather than being frozen in. The combination of all the narrow and irregularly shaped cracks in the timber and the charred edges meant that bubbles kept appearing for a long time, and the results were better with the slow curing epoxy.

The first pour of epoxy (about half the depth)

It’s important not to pour too much depth at once, because the endothermic curing reaction can cause the epoxy to overheat and cure too fast. This results in the epoxy changing shape after it has hardened, which causes stress and uneven surfaces. The staff at the timber shop told me that in their pieces they typically pour 5 separate layers. My experience confirms that this is a good idea. With the 2020 epoxy, pouring too much at once (more than about 1-2cm) results in bubbles and shrinkage after cure. Barnes Megapour actually shrinks more (subjectively), but it does so before it hardens, so without any permanent impact. But the result of this is that no matter how carefully you fill in all the cracks the first time, the epoxy always sinks in further just before it hardens. So even with deep pouring epoxy, you need to do multiple pours. You may as well do them at a consistent depth and minimise bubbles.

Permanently soft semi cured epoxy (in the finished piece)

Its a silly mistake but its worth saying: make sure you mix the epoxy correctly. Use scales, measuring cups, calculators – do what you have to do. If you don’t mix it properly it will never cure, and you won’t know it until the soft paste is in your piece. I made this mistake with one pour. Luckily I was only filling some smaller cracks, so the soft parts on the finished product are not obvious unless you know where they are.

Bubbles stuck underneath a peak (but actually above a trough)

We poured the epoxy with the desk upside down. I think the right way up would have worked better. A place where you can see bubbles from above is a place where bubbles have space to rise to the surface and pop. But as you can see there are places where bubbles are visible from underneath but cannot rise. Also, there is no chance that a fold in the tape will get stuck in the epoxy on the top side, because there is no tape on the top side.

In hindsight I can see why it is common to tint the epoxy (even with a dark brown colour which does not stand out). There are a lot of visual imperfections that can happen in epoxy pouring, and a tint hides most of them.

First coat of sanding

The first coat of sanding after filling in the larger cracks is the only time its necessary to remove a lot of material. Inevitably there will be large areas where the epoxy is above the original surface of the wood, and these puddles need to be sanded away to get back to the wood.

Scratches made by the belt sander

We used a belt sander with 40 grit sanding belts for this purpose. This was very fast, but I can confirm that belt sanders can also cause a lot of damage very fast, so its important to maintain even pressure, use an appropriate pattern of movement, and avoid mistakes like rolling it over the edges or leaving it running in one position. It’s also important to adjust the rollers frequently, otherwise the machine will happily sand itself and self destruct.

I found that sanding can expose more gaps in the wood, and it pays to touch these up with epoxy later. This should be done as late as possible in the sanding process, but not so late that you need to repeat a coat of sanding. I’m not sure when that would be. I continued to use a 40 grit sanding belt after my final epoxy touch ups, but I suspect that doing it after one of the early coats with the random orbital sander would have left less unfilled cracks.

Filling other faces

Unfilled cracks on the top side (underneath for the first pour)

In my slab, the epoxy poured from the bottom face was able to reach most of the gaps in slab, but there were still some smaller gaps visible from other faces which were not filled in this stage. To fill these I did some additional pours at different angles.

The mould, reconstructed as a stand

I reused the mould mostly as it was to pour from the top face. Things got a little bit more difficult for the edges though. I decided to reuse the melamine from the mould to build a pair of stands to hold the slab at the appropriate angles while I poured into the edges.

For these pours, there was no need for a complete mould. Although were some gaps that weren’t sealed around each edge, it was good enough to cover any potential leakage points with painters tape.

Shaping the edges

For the cut edges, I took up an offer from Go Natural Timbers to use a router to round the corners. They also showed me how to roll a random orbital sander over corners to round them, which is what I did to round the corners of the natural edge. They seemed not to be too shocked by the work so far, which was reassuring. In fact they said: “nothing there that can’t be fixed”.

The natural edge was one of the harder parts of the project, and is probably the area with the largest issues in the finished product. The incorrectly mixed epoxy ended up in the natural edge, and there are also some cracks where subsequent layers of epoxy didn’t bond and have since separated. The varying angle of the natural edge made it more difficult to pour very much epoxy at once, and increased the risk of these mistakes.

I tried to fill in every last crack in the natural edge, and in doing so covered the entire surface in epoxy. I then had to remove this epoxy using wire brush drill attachments and a random orbital sander, which resulted in more of the wood being removed than I would have liked. I think a lighter touch with the epoxy (even at the expense of leaving some small cracks unfilled) would have removed the need for such aggressive use of the wire brush attachment and sanding, and preserved the natural appearance better. Perhaps also a finer/softer wire brush could have removed some discoloured wood from the surface without creating such deep scratches.

Final sanding

For the later coats of sand, I used a random orbital sander. This machine is much slower than the belt sander, but also much easier to use, and produces much better results with minimal visible scratches. I found that it was worth repeating the same grit that was done with the belt sander. In my case it took a long time with the random orbital sander at 40 grit to remove the scratches made by the belt sander at 40 grit, and the result was much smoother. I think it might even be worth going down a step or two, for example using 60-80 grit with the belt sander and then switching to the random orbital sander at 40 grit.

I found that using a foam backing pad reduced noise and vibration and produced less visible scratches. It was also useful to allow the sanding disc to wrap around the corners and over the natural edge. Take care to get a backing pad and sanding discs with size and hole positions matching the sander – otherwise things will get very dusty. Finer sanding grits produce much less sawdust, but it is finer and more irritating.

I did the following progression of grits: 40, 60, 80, 120, 180, 240, 320. One of the carpenters at Go Natural Timbers told me that it was necessary to go up to 180 for an epoxy finish (which is a deep coating), and 240 for a lacquer finish (typically a very shallow coating). The finer grits don’t take as long though, so it made sense to go slightly further. It’s hard to capture in photos, but in real life the sanding makes the wood look much better – all the small details in the grain become visible.

Sanding complete and corners rounded

Onwards to finishing

The slab was now in its final shape, so this concludes part 3 of the series. In part 4, I’ll write about the process of choosing and applying a finish to protect the wood and enhance the aesthetics.

Making a new desk (part 4): Choosing and applying a finish

2021-02-20T00:10:00+11:00

In part 3 of this series I ended up with a tabletop for my desk that was in its final shape, with smooth sanded surfaces, gaps filled, and nicely rounded corners. The next step was to apply a finish to the wood in order to protect it from the elements, and to enhance its aesthetic qualities including the appearance and texture.

Choosing a finishing product

Wow, there are a lot of options for how to finish wood, and a lot of strong opinions on what the best way is! I got held up for at least a month trying to understand all the different options here, and despite that I feel like I still have a very limited understanding.

One thing I learned is that there are many differences between the names that finishing products are marketed under, and the actual composition and properties of those products. Words like “lacquer” and “oil” do not have a consistent meaning, and some products have different names in different countries.

Most importantly I wanted a finish that was durable – I didn’t want to be stressed about burning it with a mug of hot coffee, scratching it with a mouse, or soaking it by spilling a drink. Ideally I also wanted to be able to use household cleaning products on it (e.g. surface cleaner) without causing any damage. This eliminated most “oils” (including products like “Danish oil” which is not just oil), because they do not entirely seal the timber and reliably keep water out. It also eliminated shellac, which is very easily dissolved.

While I researched finishing products, the slab was busy warping

The remaining choices were “lacquer” (which is a lot of things), polyurethane, and epoxy (epoxy has various chemical compositions but at least they all seem to work in a similar way). Epoxy is generally applied by pouring a large amount of it over a level surface so that it creates a puddle. The effect is a deep mirror finish which is extremely durable. I wanted a shallower finish which had more of the appearance of wood rather than glass. Perhaps more importantly, at this point my slab had a significant warp, and it seemed that this would complicate the process of applying a puddle of epoxy.

It seems that amongst professionals, the most preferred way of applying thin coatings is with a spray gun – a device that atomises paint into fine particles and blows it onto a surface. Using these is dangerous because paints are often extremely flammable and toxic. Probably this explains why such paints are hard to find in consumer hardware stores. Most of the higher quality coatings, and especially ones where there are clear advertised specifications, are designed to be applied with a spray gun.

I had already spent a large amount of time and money on this project and I didn’t want to compromise it with a sub par finishing product, or risk using a product that didn’t have any clear specifications (i.e. anything from hardware stores). However, I also didn’t want to spend a ridiculous sum on a spray gun that I would rarely use. The most common way to apply such finishing products is using a HVLP (high volume low pressure) spray gun. This requires a spray gun, but also a source of low pressure air from either an air compressor and pressure regulator or a turbine type motor (like a vacuum cleaner). Although most of these are expensive, I managed to find a integrated turbine and spray gun for $69 in the Chicago Air C600PS. I suspect that no professional would use this machine, but I hoped that it would be good enough for my purpose. This opened the possibility of using spray finishes.

I managed to find a number of businesses that sell wood coatings in Sydney, including Croma Coatings and BC Coatings. Each has a catalogue on their website with detailed data sheets explaining the properties of each finish. The sales rep from Croma Coatings was particularly helpful in explaining my options. The better polyurethane coatings (the 2 component or 2K ones) were not an option because they are highly toxic, to the extent that spraying them outside is illegal. Those that remained remained were lacquers.

The older type of lacquer (Nitrocellulose lacquer) was not ideal because it was less durable. It dries purely through the evaporation of solvents, so the same solvents can dissolve the paint after it dries. Also it has a tendency to turn yellow over time, especially when exposed to UV light. This was important to me because my desk was of a light coloured timber and I didn’t want it to turn yellow over time, even if it was in some sunlight.

There are newer types of lacquer which chemically harden in addition to drying through the evaporation of solvents, which makes the finish more durable. These are called “catalysed lacquer”. Some have a slow catalyst which is added before sale, while others must be mixed with the catalyst immediately before use. Some of these lacquers are clear and non yellowing. My choice was the post catalysed option, which is called “acid catalysed lacquer” in Australia, and “conversion varnish” in the USA. The specific product was CM1510 clear multicoat. Being a multicoat, it can be applied both as an undercoat to seal the wood and as a topcoat to create the finished surface, so there is no need for multiple finishing products. The sales rep told me that it was almost as durable as 2K polyurethane while being safer to apply. I chose the 25% gloss option (AKA “satin”). Besides the lacquer, I bought separately the catalyst, a thinner (which is important especially for cheaper and less powerful spray guns), and a UV absorber additive (to protect the non UV resistant 2020 epoxy that I had used earlier). I considered buying also a retarder additive, which slows down the evaporation of solvents. This can be necessary to prevent “blushing” or bubbles when spraying in hot and humid weather. The sales rep told me that it wouldn’t be necessary if I picked an appropriate day to spray outdoors. The total cost for a 4 litre can of lacquer plus the additives was $242.88.

Applying the finish

TIL: masks don't work so well when you have a beard

As well as the $69 paint sprayer, I bought some safety goggles and face masks to protect myself from the toxic lacquer fumes. Incredibly, a pack of disposable P2/N95 vapour masks was more expensive than the paint sprayer. Strange times.

So far I had done everything on the balcony of my apartment. Although my neighbours were very patient with loud power tools and probably some sawdust, I think spraying toxic paint fumes would have been a bridge too far (and also would have made my apartment borderline uninhabitable for a few days). My parents had an outdoor shed, which seemed like the best option. Lacquer dries to the touch in only a few minutes, but in that time its important to prevent small objects from falling onto it and creating imperfections in the finish. The shed was good because it was reasonably well ventilated but also covered and therefore sheltered from wind, dust, pollen, etc.

I experimented spraying some scrap pieces of wood to get used to the spray gun. I found that the best results were achieved with quite a large amount of thinner. The lacquer data sheet specified that it could be diluted with up to 30% thinner. I found that with any less than this, the paint gun was unable to evenly atomise it and instead threw large splotches of lacquer rather than an even mist. Unfortunately the more thinned mixture was also less able to resist dripping down vertical surfaces, which limited how much paint I could apply at once and created a dimpled surface on the vertical surfaces. Luckily my piece was almost entirely one horizontal surface, so this wasn’t a large issue, although the edges did end up with a slightly cloudy/frosty appearance. Judging by videos of automotive painting, better spray guns can atomise paint much more finely than the super cheap one I was using.

2-3 minutes after application. I was careful not to inhale while taking this photo!

It quickly became apparent that the toxicity of lacquer fumes is not a joke. Although the thinner has the strongest smell, the lacquer fumes are truly unpleasant. In large concentrations they cause a burning sensation in the nose, and in smaller concentrations they cause a feeling resembling nausea or a headache. Even a faint smell of it (e.g. a few rooms away with windows open in the second day of drying) makes it impossible to sleep. My P2 masks and safety goggles were the bare minimum of protection. I felt a very direct physical need to hold my breath while spraying and to leave the room before inhaling again.

About 10 minutes after application. Some lumps due to dust, and cracks that weren't filled with epoxy.

It turns out that its very difficult to prevent even a grain of dust from falling onto wet paint, even just for 15 minutes. The shed I was spraying the lacquer in had some dust on the roof, and although I did my best to shake/sweep/blow it away, some of it remained and fell on the wet lacquer. Some small hairs fell off my arms, at least one piece of pollen blew in (perhaps when I was opening/closing the shed door), and various small insects perished in the toxic lacquer fumes and fell rudely on the wet lacquer, creating imperfections.

The lacquer was also not able to fill even small cracks which were exposed by sanding after epoxying or by wood movement while I paused for weeks to read about finishes. With many coats it can fill in pores, but really nothing larger.

I ended up doing 3 coats on the top side, and 2 on the bottom side. The first coat raised the grain of the wood and ended up very rough. The second coat was much smoother, but was still a bit rough around the deeper pores in the wood. The third coat still didn’t fill in the pores entirely, but was very smooth to the touch. Before each coat I sprayed some lacquer on scrap wood to adjust the spray gun volume and check the conditions. Spraying on a wet piece of wood exaggerated the blushing effect which can happen when it is too hot or humid, and was a good test for the conditions. I got a noticeably smoother finish during the colder times of the day, in the context of some fairly typical Sydney days with 25C maximum temperature, relatively high humidity, and occasional showers.

After the first coat I sanded the surface using a random orbital sander with 320 grit discs, and the edges by hand. The second coat was smoother and I sanded it by hand with 400 grit sandpaper. Since this lacquer cannot dissolve previous coats, sanding is necessary to make subsequent coats stick. After sanding, I removed the sawdust by using the spray gun as a blower and wiping over the cracks with a dry microfibre cloth. It’s necessary to wait about 2 hours between spraying and sanding. Over one weekend of 3 nights and 2 days I was able to do all 5 applications, including all the preparation and cleanup.

I thought it looked pretty awesome at this point, and I felt really satisfied because I had finished all the tasks which I wasn’t familiar with. The finish looks and feels great in my opinion. It’s really smooth visually and to the touch, preserves the natural colour of the wood, and as far as I can tell is very durable. It even has a strong tendency to resist dust, much more so than my old desk, which I suspect was finished with nitrocellulose lacquer. It took a few days to stop smelling like poison, and a week to fully harden and resist scratches.

Next step: final assembly

The finishing of the tabletop was now complete, so the next and final step was to assemble the desk. For this, as well as my overall reflections on the project, check out part 5.

Making a new desk (part 5): Assembly and final reflections

2021-02-20T00:10:00+11:00

In part 4 I wrote about the process of choosing and applying a finish to my timber slab desk tabletop. There was now only a ceremonial amount of work left to attach the tabletop to the frame, and my desk would be complete.

Attaching the frame

Nobody knows what these cork screw things are called

Rather than screwing the frame directly into the slab, I used socket head screws as an intermediate layer between the screws and the wood. I used fast 5 minute curing epoxy to glue the sockets into holes in the slab. This was more effort, but I didn’t want any risk of cracking the slab, and I figured this way would distribute the force more evenly on a larger area of wood.

The finished product

Final reflections

Wow, this was a long journey! The whole project took a lot longer and was much more expensive than I expected (or cared to admit) before I started. Truthfully, I knew it would be a challenge and would take a long time – I had no detailed plan when I started, so I knew I would only know at the end what it would take. Buying a massive chunk of wood was a kind of forcing function – a way of committing myself to the project.

Cost

An incomplete set of approximate costs (AUD):

Desk frame: $700
Timber slab (cut and dressed): $600
Melamine for the mould: $39
Epoxy: $150
Sanding disks and foam backing pads: $150
Finishing products: $260
Spray gun, masks, and goggles: $150
Fastenings for assembly: $40
Access to many tools and materials that I already had or borrowed, about 100 hours of my time, and 20 hours of Bruno’s time: priceless

That’s a total of about $2089. If I remember correctly, Go Natural Timbers suggested a ballpark figure of $2500 for the cost of producing the tabletop themselves. This number was slightly shocking to me when I heard it, but having done this project I have a new appreciation for the costs that would be involved. A professional workshop can no doubt have many of the materials much cheaper. For example, I bought packs of 10 sanding disks of each grit only to use 1-2, and a 4 litre pack of epoxy to use only 2 litres. They would buy everything in large quantities and only use as much as necessary for each job. But on the other hand they have to pay for the labour, maintain a workshop, etc, and turn a profit. Overall I probably saved a small proportion of the cost by doing most of the work myself, but I definitely wouldn’t embark on this kind of project to save money!

Experience

There were a lot of challenges but I loved every bit. I learned how to use a number of new tools, including a belt sander, random orbital sander (I now also know what that is), spray gun, and blowtorch. I learned about the different types of epoxy and how to use them, and about the different types of wood finishes. I only wish that I had a degree in chemistry so I could understand the ingredients in epoxy and finishing products.

Result

In my opinion, my new desk is awesome, and it serves its purpose wonderfully. I feel much more comfortable using it than my old desk thanks to the adjustable height and the ability to stand, and the quality of the surface is as good as I could have hoped.

Besides its utility, It looks beautiful. This is mostly just down to it being made from a huge piece of wood – but I’m happy that I didn’t fuck it up too much. Whenever I am bored, or waiting for code to compile, I gaze at the grain and try to imagine the tree and how it grew that way.

Despite that it has many imperfections. There are filled cracks where the epoxy is permanently soft, delaminations between layers of epoxy, cracks that were not filled, epoxy shrinkage, bubbles, a splotchy surface on the edges, a minor warp, and probably more issues that I’ve forgotten or haven’t noticed. But none of these things ruin it. They remind me that I made it myself.

Correcting lens distortion using FFMpeg

2020-03-24T19:22:00+11:00

In recent less apocalyptic times, I used to play dodgeball - a team sport involving throwing foam balls at members of the opposing team. I hope that the COVID-19 outbreak passes soon and I can play it again, but in the meantime the closest I can get is watching videos of it and doing image processing on them. Our team records our matches for later analysis using a GoPro camera. The game is played in a rectangular court surounded by a net, and we mount the camera in the corner of the net so that the net does not obstruct the camera’s view of the court, and the camera’s field of view includes the entire court.

There are a number of issues with the recorded video which I wanted to fix:

The lens is a fisheye lens, and the captured video contains barrel distortion. This is not good for watching balls being thrown, because it makes it difficult to see if the trajectory of the ball was a straight line, or a curve.
The net that the camera is attached to is somewhat flexible and is frequently hit by balls and players. This causes the camera to shake violently, which makes for shaky videos.

For this post, I’ll focus on just the first issue of lens distortion.

I decided to make a first attempt using FFMpeg. FFMpeg is easy to get started with since it has a CLI and doesn’t require writing any code. It can read and write basically any media format, and also has a selection of filters that can be used to transform videos, some of which seemed relevant to my task. It’s also open source, which means it’s yours to do what you want with.

To test the various filters, I took a picture of the OpenCV chessboard calibration pattern, which looks like this:

OpenCV chessboard pattern

The photo of the pattern (which was on a TV screen, which is flat), looks like this:

Original photo of chessboard pattern

If you didn’t know that the test image was a chess board, it wouldn’t be obvious that in real life its all straight lines and right angles. My aim is to take this image from the camera, and produce an output that looks (geometrically) like the test image.

lenscorrection filter

The lenscorrection filter warps the image to correct for lens distortion according to the supplied parameters appropriate for the camera and lens. It acceps two parameters $k_1$ and $k_2$ , which correspond to a quadratic and cubic correction factor applied to the radius of a pixel from the center of the image.

Stack overflow

Obviously I’m not the first person to have this problem, and consequently I found a stack overflow thread where other people had posted various values for the same and slightly different cameras. Most of these did not work well at all, but one of them worked somewhat:

$ ffmpeg -i chess-raw.png -vf lenscorrection=k1=-0.227:k2=-0.022 chess-lenscorrection-so.png

chessboard corrected with lenscorrection filter using parameters from stack overflow

You could reasonably say that this is much worse than the raw image, even though in a sense it is closer to the ideal.

Hugin

I tried to use the Hugin lens calibration tool to find suitable values of $k_1$ and $k_2$ . The model of lens distortion that the lenscorrection filter uses is called poly5, and the value of $r_d$ (the radius in the original distorted image) is given as a function of $r_u$ (the radius in the corrected output image) as follows:

$r_d=r_u(1+k_1r_u^2+k_2r_u^4)$

Meanwhile, Hugin uses the following model (which is called ptlens):

$r_d=r_u⋅(ar_u^3+br_u^2+cr_u+1−a−b−c)$

To try to find common ground between these two models, we need to dispense with $k_2$ , because there is no $r_u^5$ term in ptlens. Similarly $a$ and $c$ have to go, because there is no $r_u^2$ or $r_u^4$ term in poly5. So setting $k_2=a=c=0$ the two equations simplify to the following:

$r_d=r_u(1+k_1r_u^2)$ $r_d=r_u⋅(1+br_u^2−b)$

If $b=k_1$ , these equations are “almost” the same. Unfortunately I couldn’t get rid of the $br_u$ term, but this is the closest thing I could find to an equivalence between the two models. Presumeably, the mismatched linear term would simply scale the image. So I took some pictures of apartment blocks and asked Hugin to find a ptlens model using only $b$ . Hugin gave the value $-0.08101$ , so I used that value as $k_1$ in the lenscorrection filter. This was the result:

$ ffmpeg -i chess-raw.png -vf lenscorrection=k1=-0.08101:k2=0 chess-lenscorrection-hugin.png

chessboard corrected with lenscorrection filter using parameters from Hugin

Obviously, this didn’t work well at all. I’m not sure where I went wrong.

This filter is simple and reasonably fast, but I could not find values for $k1$ and $k2$ which did a particularly good job of correcting for the distortion on my camera. Also, it only performs nearest neighbour interpolation, which results in visible aliasing in the output (I resized the images to 320x240 so that this is obvious).

lensfun filter

The lensfun filter is a wrapper for the lensfun library, which performs correction for many types of lens distortion including barrel distortion. It also includes a database of cameras and lenses and their measured characteristics. I found that the latest development version had a database entry for the GoPro HERO5 Black camera that I was using.

After some experimentation, I worked out that the parameters in the database were appropriate only for certain camera settings. The camera has a “Field of View” setting, with 3 different settings using a fisheye projection (more specifically, an imperfect stereographic projection, as I learned from the lensfun database), and also a “linear” setting (which results in standard rectilinear projection, but with a much smaller field of view). The camera also has a video stabilisation feature which results in a 10% crop of the recorded video (although the stabilisation itself did not work well). I found that selecting the “Wide” setting and turning off the stabilisation resulted in a video that was correctly undistorted by lensfun using the parameters in the database.

$ ffmpeg -i chess-raw.png -vf 'lensfun=make=GoPro:model=HERO5 Black:lens_model=fixed lens:mode=geometry:target_geometry=rectilinear' chess-lensfun.png

chessboard corrected with lensfun

scale parameter

Correcting for geometric lens distortion is a process that warps the image - i.e. it “moves” pixels from the source image to a different location in the destination image - or put another way, it maps pixels in the destination image to a different point in the source image. This means that the rectangular source image will not necessarily be mapped to a rectangle in the destination image. So there is a compromise to be made when choosing the scale of the output - either the output can be rectangular and have no blank areas (at the expense of discarding some of the input image), or it can include the entire input image (at the expense of having some blank areas in the output). Lensfun has a parameter called scale which controls this compromise. Unfortunately the FFMpeg filter wrapping lensfun did not have such an option. So I made a patch to add the option and pass it through to lensfun. This patch has been applied (hooray), so the following now works:

$ ffmpeg -i chess-raw.png -vf 'lensfun=make=GoPro:model=HERO5 Black:lens_model=fixed lens:mode=geometry:target_geometry=rectilinear:scale=0.4' chess-lensfun-scaled.png

chessboard corrected with lensfun, scaled to display entire input

interpolation

The default interpolation is bilinear, which is acceptable, and looks much better than the nearest neighbour interpolation as used in the lenscorrection filter. But lensfun also supports lanczos interpolation, which in theory should be better. I suspect there is a bug in it though, because the result doesn’t look as good as the default:

$ ffmpeg -i chess-raw.png -vf 'lensfun=make=GoPro:model=HERO5 Black:lens_model=fixed lens:mode=geometry:target_geometry=rectilinear:interpolation=lanczos' chess-lensfun-lanczos.png

chessboard corrected with lensfun, using lanczos interpolation

The lensfun filter is significantly slower than the lenscorrection filter, but did a much better job (more accurate corrections, and better interpolation). It also provides the ability to choose from multiple projections for the output (e.g. correct for the imperfections in the lens but maintain the stereographic projection, or output a equirectangular projection instead, etc), which I found interesting.

Aside: interesting learnings about lens distortion and projections

The term “distortion” comes with a negative connotation, but there are many reasonable ways to project a view of a 3D world onto a 2D image, each with different compromises. These projections are mappings from the angle at which light enters the camera lens $\theta$ (relative to the direction the lens is facing), to a distance $r$ (r for radius) from the center of the image. For example:

rectilinear projection $r=tan(\theta)$ : straight lines in the world appear straight in the image, but areas far from the center are stretched alot, and it is impossible to diplay points 90 degrees or more from the center (i.e. the image cannot show what is directly to the side or behind the camera). Most images use this projection, and most image processing algorithms assume that it is used.
stereographic projection $r=2tan(\theta/2)$ (approximately what the GoPro HERO5 Black produces with the “wide” FoV setting): Maintains angles as seen from the lens, does not stretch the edges of the image as much as the rectilinear projection, and works for any angle under 180 degrees (i.e. any direction except directly backwards). Does not maintain straight lines.

There are many other projections - see the lensfun list of projections and Wikipedia’s fisheye lens article.

Similarly, there are many different models for correcting the projection produced by real cameras and lenses (which may not be a simple mathematical formula) to suit one of the standard projections. These are usually polynomials applied to the radius of a pixel. Lensfun supports 4 different models for example. The lenscorrection filter appears to use the same model as lensfun’s LF_DIST_MODEL_POLY5. The lensfun database entry for my camera uses the different LF_DIST_MODEL_POLY3 model. Lensfun makes a relatively small correction to convert the image to a standard stereographic projection before separately converting it to the rectilinear projection.

v360 filter

After I did most of these experiments, the v360 filter was added to ffmpeg, which is very exciting. Like lensfun, it can convert between various common projections. Unlike lensfun, it does not do polynomial corrections to account for real world differences from standard projections, and it does not have a database of cameras and lenses. Instead, there are parameters to specify the standard projection and field of view of the input, and of the output. I found that the GoPro website helpfully lists the horizontal, diagonal, and vertical field of view for each of the field of view settings on my camera, and I know from reading the lensfun database that my camera creates images that are closest to a stereographic projection.

$ ffmpeg -i chess-raw.png -vf 'v360=input=sg:ih_fov=122.6:iv_fov=94.4:output=flat:d_fov=149.2:pitch=-90:w=320:h=240' chess-v360.png

chessboard corrected with v360, scaled to display entire input

This is not perfect (since my camera does not produce a perfect stereographic projection), but in my opinion it doesn’t look too bad. The curviness is less noticeable in the central area of the image, so if you adjust the output field of view enough that there are no unmapped areas, it looks better:

$ ffmpeg -i chess-raw.png -vf 'v360=input=sg:ih_fov=122.6:iv_fov=94.4:output=flat:d_fov=121:pitch=-90:w=320:h=240' chess-v360-zoom.png

chessboard corrected with v360, scaled to fill entire output

Roughly, v360 works in three stages. Firstly, it maps each input pixel to a vector, which represents the direction where the light came from (this is the inverse of the input projection). Then it optionally changes the camera angle according to the yaw/pitch/roll options (i.e. the direction vector for each pixel is rotated equally). This is different from cropping/translating the projected image because it moves the center of the image which all the projections are relative to. As a result, the resulting projected image looks exactly like it would have looked if the camera was facing in a different direction. The final step is to map these vectors to the destination image according to the chosen output projection and field of view. Here’s an example of using the rotation parameters to turn the virtual camera downwards by 15 degrees:

$ ffmpeg -i chess-raw.png -vf 'v360=input=sg:ih_fov=122.6:iv_fov=94.4:output=flat:d_fov=149.2:pitch=-105:w=320:h=240' chess-v360-down.png

chessboard corrected with v360, rotated down by 15 degrees

When I originally wrote this post, I found a few small but annoying bugs in the filter, which have since been fixed:

Although it can deduce the horizontal/vertical field of view if only the diagonal field of view is privided, it did not correctly do this unless the input projection was “fisheye” or “flat” (rectilinear), because that code was not implemented at the time. I worked around this problem by directly providing the horizontal and vertical field of view as parameters.
The stereographic projection had a built in pitch of 90 degrees, so if the input was stereographic, one had to use a pitch of -90 degrees to prevent the virtual camera from facing upwards instead of forwards.
The yaw and roll options both seemed to perform roll (i.e. twisting the camera lens) - while yaw was impossible (turning the camera sideways).

I also found some limitations (which remain):

The default output image dimensions are not sane.
The scale of the output is determined but the output field of view, so it is up to you to determine what that should be.

Despite not performing polynomial corrections and not having a database of lenses, v360 has a few advantages over lensfun. It is much faster, perhaps due to the presence of a SIMD optimised implementation in assembly. The rotations are useful if the camera wasn’t facing quite the right way, and produce much better output in this case than cropping. The lanczos interpolation works well. Its scope is smaller than lensfun and the code in my opinion is easier to read, if you like doing that.

Speed, quality, features: pick 2

If you don’t care about speed, you could use both lensfun (to perform accurate correction for a particular real world lens), and v360 (to use its perspective rotation feature):

$ ffmpeg -i chess-raw.png -vf 'lensfun=make=GoPro:model=HERO5 Black:lens_model=fixed lens:mode=geometry:target_geometry=fisheye_stereographic,v360=input=sg:ih_fov=122.6:iv_fov=94.4:output=flat:d_fov=140:pitch=-105:w=320:h=240' chess-lensfun-v360.png

chessboard corrected with lensfun, and rotated with v360

Conclusion

For my use case, I’ve found that using v360 is the best compromise. My camera produces images that are close enough to the stereographic projection that if I convert them to rectilinear using v360 they appear straight, at least if you aren’t thinking about lens distortion. The perspective rotation feature is useful if the camera wasn’t quite level, the interpolation works well, and it is faster than lensfun. The right compromise depends on your needs.

In the future, I might write a similar post about video stabilisation. I’m also currently working on a project using libavcodec, OpenCL, and OpenCV that I hope will be capable of video decoding, lens correction, stabilisation, and reencoding all on the GPU, which should be much faster than all these methods which run on the CPU.

Update (April 11, 2020)

updated the section on the v360 filter to point out that the bugs I found when I wrote this post have been fixed

Update (March 5, 2021)

The next post is up… but so far it’s more about GPU acceleration than video stabilisation. In the future I will get to the video stabilisation part!

Adventures making gnome-shell run in Valgrind: part 1

2019-12-30T20:13:20+11:00

There is a nasty sort of bug that can occur in languages without memory safety - bugs where memory is accessed incorrectly. There are a few reasons why these bugs are nasty, worse than many other types of bugs:

The consequences are often severe (usually the entire process crashes, or worse behaves unpredictably).
They often create security vulnerabilities.
They are difficult to debug. The symptoms may appear intermittently, and may manifest far away from the cause.

gnome-shell crashes

I came across such a bug in gnome-shell soon after the release of GNOME 3.24 in 2017. The symptom was that the gnome-shell process would crash a few times a day, at seemingly random times. Since I was using it as a Wayland compositor, this resulted in the entire session and all applications being closed. I was proud of my setup, having spent a few years working out how to completely avoid using Windows, and with a Linux setup using Wayland. Therefore, this bug was extremely annoying, and filled me with rage and an irrational desire to find and correct the problem at any cost.

First I reported the bug to the Arch Linux bug tracker. It was clear from the stacktrace that the crash was occuring in GJS, which is the GNOME Javascript runtime. Jan de Groot, the maintainer of the GJS package in Arch Linux, provided some test builds with different compilation flags, but these didn’t fix the problem. Soon after, a patch was provided by Philip Chimento, the upstream maintainer of GJS, to fix some memory access issues which he thought might be causing the bug. Some other users were satisfied that the problem was solved, but in my case the crashes kept happening. At this point it was clear to me that the problem did not lie in Arch Linux in particular, so I decided to engage with GJS upstream to try to debug the issue.

In order to make useful bug reports, I had to learn how to do a few things:

Compile packages from source in Arch Linux, especially with debug symbols (so that stacktraces contained locations in source code, not hex addresses in binary blobs), and with patches (to test bugfixes).
Read system logs efficiently (filtering by unit, showing logs only from the last boot, etc).
Read coredump files to extract stacktraces from crashed process dumps
Use GDB to attach to a running process and examine the state in the event of a crash
Switch TTYs in order to use a terminal even if the entire shell/UI was unusable for some reason (e.g., it was paused on a breakpoint)

Over the next few weeks, there were various bug reports of similar shell crashes. A series of patches appeared to fix the majority of those problems. All of these problems were intermittent and difficult to reproduce, and in most cases it appeared that the best the developer could do was to guess the cause and “[throw] patches into the void to see if they stick”.

Valgrind

I don’t write much C code, but I remember that in first year university I learned how to debug improper memory access with Valgrind. Valgrind runs your program and instruments memory access and allocation, providing warnings when your program accesses memory in incorrect or suspect ways. Where without Valgrind bad memory access might cause a program to crash in unrelated code some indeterminate time in the future, with Valgrind it would cause an error to be logged immediately, including a stacktrace of the bad memory access. It’s hard to overstate how useful Valgrind is in these cases.

There were two particular shell crash bugs which affected me and appeared to be the most difficult to fix, and on both those tickets the developers suggested that users who were able to reproduce the bug should run the shell under Valgrind and post a log. This would provide a clear indication to the developer of what caused the problem, even if they were not able to easily reproduce the bug.

This sounded easy - find the command used to run the shell, and put Valgrind in front of it. Unfortunately, it was not so simple:

code in various packages called from the same process triggered enormous quantities of Valgrind warnings. All these packages had to be compiled from source with the --enable-valgrind argument to configure or similar. This made the compiled binaries include special Valgrind annotations with exceptions for code that intentionally did things that triggered Valgrind warnings. In some cases that didn’t work and it was necessary to find a Valgrind suppression file elsewhere and use it manually when running Valgrind.
Running the shell directly did not have the same effect as running it from within a session (gnome-session). To have the shell work normally, it was necessary to create a session that ran the shell in Valgrind.
When gnome-shell was run as normal but in Valgrind, it tended to crash or hang and fail to start successfully. It turned out that the crashes/hangs happened reliably when the taskbar shell extension was in use - and also that the shell crash only happened when it was in use.

The first two problems were easily (if tediously) solveable, but the hangs/crashes presented a more significant barrier. How to debug a problem if the debugging tools prevent the problem from hapenning? I wasn’t the only user with this problem - other users also reported that they tried and failed to collect a Valgrind log because the shell simply hung forever.

This was sad to see - I assume that very many users had experienced the crash. A small proportion had found the stacktrace and looked up one of the bug reports. A smaller proportion had contributed useful information to them, and still less had attempted to reproduce the bug in Valgrind. These users could have provided the necessary information to solve the issue, but even that much effort was fruitless because Valgrind couldn’t easily be made to work.

A new quest

This problem was demoralising but also motivational. I didn’t understand the inner workings of GJS or SpiderMonkey and was not in a position to understand or solve the specific problems that were causing these crashes. But I could see that the inability of users to collect Valgrind logs presented a significant barrier to fixing the issues. This would also be the case for any other bug caused by bad memory access in the gnome-shell process (which was alot of bugs, most of them very bad bugs).

So my new quest was to discover why the shell did not run correctly under Valgrind. Nobody else seemed to know why that was the case or have enough time and interest to investigate it. I thought perhaps this could be evidence of deeper issues (code that doesn’t work in Valgrind probably doesn’t work reliably in general). Also, I thought that this was the underlying problem leading to the crashes, in a sense. There were various bugs observable in normal use and most of them were not directly caused by the Valgrind problem, but they were all difficult to fix as a result of the Valgrind problem, so in the long term the best path to fix such crashes was to fix the Valgrind issues so that memory access bugs could be diagnosed more easily.

To continue my investigation, I had to learn the relationship between a number of projects that seemed related to this issue (and who’s code ran in the problematic gnome-shell process):

DConf: key/value database used for storing, reading, and subscribing to changes in user settings.
GSettings: Wraps DConf (at least on Linux) and provides a cross platform API for reading, writing, and subscribing to changes in user settings. This is the standard API used for this purpose in GNOME.
GJS: The GNOME Javascript interpreter. Uses SpiderMonkey to run javascript in GNOME projects, and adds extra APIs including an interface to GSettings.
gnome-shell and extensions: Implement the shell UI. Mostly the UI code is written in javascript, which is run by GJS, and uses the GSettings API to access user settings.

Also, I had to learn a few more debugging techniques:

Valgrind can be run such that it integrates with GDB, and provides breakpoints when bad memory access occurs (using the --vgdb flag)
Valgrind has a --track-fds option, which logs a stacktrace whenever a file is opened
A stacktrace of the Javascript code in gnome-shell can be obtained from a running process by using the gjs_dumpstack function in GDB. This provides higher level context of what the process is doing (e.g. creating a clock widget as opposed to freeing a Javascript closure).

Symptom 1: immediate crash

The most easily reproduceable problem when running the shell in Valgrind was a crash that occured immediately after starting the shell, before the UI was visible. I found two open tickets about similar issues with the same stacktrace (i.e. crashes in normal use, without Valgrind). I also found other tickets going back at least a year (all unsolved) with similar symptoms and the same stacktrace. The stacktrace looked something like this:

#0 xkb_keymap_ref at src/keymap.c:59
#1 clutter_evdev_set_keyboard_map at evdev/clutter-device-manager-evdev.c:2399
#2 meta_backend_native_set_keymap at backends/native/meta-backend-native.c:427

In most of these cases, the issue was intermittent, and seemed to depend on the shell extensions installed. So it seemed like I had stumbled upon an opportunity, because I could reproduce the problem 100% reliably with Valgrind.

I posted more detailed information about how I debugged the cause starting from this comment. What I discovered after alot of breakpoints and calls to gjs_dumpstack was that:

The crash occurred because a system call to create a timer was failing with EMFILE (too many open files)
This timer was being opened in order to instantiate a Gnome.WallClock widget in JS land
The Gnome.WallClock was being instantiated inside a callback for when user settings are changed
The open files limit was reached primarily because instances of Gnome.WallClock were instantiated repeatedly until the limit (on my system, 1024) was reached

I felt that I was a step closer to understanding the problem, but It was not clear to me why that callback was being run so many times (Why would anyone want 1024 clock widgets? Why would a setting be changed 1024 times without any user input?). Without Valgrind, only a handful of the clock widgets were instantiated.

At least now I had a way of working around the problem. If I commented out the code that instantiated a Gnome.WallClock, the issue dissappeared. So I did that temporarily, and moved on to the next problem.

Symptom 2: hangs

The next problem was that the shell would hang when run in Valgrind with the taskbar extension enabled, and never reach the state where the UI was visible. The gnome-shell process was using 100% of a CPU core, so I assumed that the hang was caused by some sort of infinite loop.

I created a bug report for this behaviour. With alot of help from Philip Chimento I was able to use a mixture of pausing the process in GDB, breakpoints, gjs_dumpstack, and JS log statements to find the loop (or at least, the most obvious loop). There is a DConf setting disable-extensions which controls whether or not to disable shell extensions that are not verified to work with the current gnome-shell version. There is code in gnome-shell that responds to changes of that setting. When it changes, all extensions are disabled, and then the appropriate set of extensions is enabled again.

The problem was that the signal handler attached to changes of this setting was being called even though the setting had not actually changed. The loop happened because something that ran in the signal handler (e.g. in the process of disabling and reenabling all shell extensions) caused the original signal to be emitted again. The result was that the shell entered an infinite loop of disabling and reenabling all of the shell extensions.

Subsequently I discovered an existing bug report detailing a hang when running gnome-shell with the taskbar extension enabled on a BTRFS file system. My guess was that this was the same issue, and that in the right set of circumstances it could be triggered in normal use, without Valgrind.

This discovery threw new light onto the first symptom. After investigating that symptom further, it became clear that a similar issue was to blame. Signal handlers were called to handle changed settings even though the settings had not changed, and they were called (directly or indirectly) from inside the same signal handlers. It just so happenned that one of the things done in one of the signal handlers involved creating a timer, and as a result hitting the open file limit was the first thing to break.

Symptom 3: duplicated UI widgets

When running the shell in Valgrind with the workaround for the xkb_keymap_ref crash and with the taskbar extension disabled, I also noticed that one of the widgets created by another of my shell extensions was duplicated many times. Instead of there being one item on the status bar with CPU temperatures, there were several. This wasn’t a large problem, but it later turned out that it was related.

Fixing the bug

At this point I had found the nature of these various bugs and could explain why they occured. What remained was to determine how best to fix the problem.

The most obvious solution was to follow a pattern that had already been used in most cases in the gnome-shell Javascript code, save for a few exceptions. Any callback to a signal about changed settings should first read the new value and check if it was different from the previous value, and return early if it had not changed, thereby avoiding any side effects. In this bug report about the disable/enable extensions loop I posted some patches that did this. They solved the problem at least in some cases, but they didn’t seem like a good solution for a few reasons:

They complicated the JS code, because whenever a signal handler was used, the value of the settings being listened to had to be stored in JS, duplicating the existing state in DConf.
They required all JS code run in the shell (including code in third party extensions) to consistently perform these checks. Even if I did find and fix all the problematic callbacks in gnome-shell and in all of the extensions available today, the problem would probably return in the future when new callbacks were added, and people assumed that a changed signal meant that a setting had actually changed.
They felt like a workaround made at the wrong level of abstraction. Why should high level single threaded JS code need to workaround spurious signals caused by compromises in multiprocess C code?

My next thought was to go one level down the stack, to GJS and its wrapped version of the Gio/GSettings API. I suggested that similar checks should be performed transparently in GJS so that signals were only sent to client code when the values had changed. Philip did not like this idea because it would mean that the GJS interface to Gio/GSettings would no longer be a thin wrapper with no added behaviour, and would instead be a different API with different behaviour, requiring separate documentation. This approach would increase the complexity of the system. This assessment seemed reasonable to me. So I turned my attention deeper into the stack.

Finding the cause of the spurious signals

The various symptoms I had come across were not all triggered in the same way, as far as DConf/GSettings were concerned. I made two bug reports against GSettings:

spurious changed signals about specific keys (two different types of stacktraces)
spurious changed signals about all keys at once

At the time, DConf was (arguably) unmaintained. The original author and maintainer Allison Karlitskaya (formerly Allison Lortie, Ryan Lortie) appeared to have moved on from the project, and the only recent changes were related to the build system and not to functionality. Luckily two of the Glib maintainers, Mathias Clasen and Philip Withnall, helped me out. Mathias reviewed one of my patches, and Philip put me in contact with Allison, who reviewed one of the other patches. The first set of patches solved the problem, but also introduced other issues, from memory leaks to potential database consistency bugs.

As a brief aside, DConf has the following architecture (simplified, as is relevant):

There is a single writer service, which maintains the canonical database state and is solely responsible for making changes to the database and writing it to disk. It responds to requests sent over D-Bus to change settings, and emits signals over D-Bus when settings change.
There is a library called “engine” that can be linked with client software to access the database. Reads are done directly from the database on disk, while requests to write or to subscribe to changes are submitted over D-Bus to the writer service.
The engine library presents an optimistic (as in, optimistic concurrency) API to GSettings. Reads are synchronous and are handled within the same thread/process, but writes and subscribe requests also succeed immediately - before the central writer service has received and responded to the request.

With the help of Allison’s feedback, particularly relating to DConf’s consistency model, I identified 3 distinct fixable issues which caused spurious changed signals, and wrote a patch to address each:

The central writer service emitted changed signals for keys that were written, even if the new value was the same as the old value. The fix was to not do that.
The engine library emitted changed signals optimistically when write requests were made from that process, even if the new value was the same as the old value (according to that process’s view of the database). Again the fix was to not do that.
The engine library emitted changed signals for all keys in the database if a write occured in between when a subscription was requested and when it was confirmed by the writer service (because of the possibility that the change should have caused a signal, but didn’t because the writer service had not yet created the subscription). The fix was to keep track of keys with in progress subscription requests in the engine, and emit changed signals only for those keys in this case.

Bug fixed!

These fixes didn’t make it impossible for spurious changed signals to occur, but they made it much less likely, and even less likely for an infinite loop to occur as a result. I think they make it impossible for an infinite loop to occur unless the client code repeatedly sets keys to different values, or repeatedly subscribes and unsubscribes to the same key while writes are submitted at the same time (since otherwise an equilibrium would always be reached at a certain set of values and subscriptions).

With all of these patches, I could run gnome-shell in Valgrind with the taskbar extension enabled, and it worked normally. Also, startup time was significantly faster on account of not repeatedly enabling and disabling all extensions on startup (which previously happened 2-4 times on each startup even without Valgrind). I also found that the third symptom of duplicated UI widgets was gone, and presumeably also a similar issue with the Caja extension which had previously been traced to spurious GSettings signals.

Releasing the fixes

Having written these patches, confirmed that they fixed the problem, and reaching what I thought was a good enough consensus that they were the right approach, It seemed to me that the only thing left was to test them thoroughly, and to have them merged and released.

What followed was a longer and harder process than I expected - a process which is still ongoing. Part 2 of this post will go into the details of how I managed to have some of those patches released, and what still remains to merge the others.

The situation today is that the patch for issue 3 (see above) was merged in this pull request, along with a subsequent follow up pull request. These changes were released in GNOME 3.30 in 2018-09, and it appears that they were enough to solve the xkb_keymap_ref crash, which hasn’t been reopened since. I’ve just merged the pull request to solve issue 1, which is now on track to be released in GNOME 3.36 in 2020-03. Issue 2 is addressed by another pull request which still requires some cleanups before it can be merged.

Reflection

The original intermittent crash (or at least one of them) remained present for many months and caused me (and probably others) great annoyance. It was eventually fixed separately from my adventures with DConf. I don’t know exactly when, how, or by who, but I stopped experiencing it I think after the release of GNOME 3.26.

I hope that when all these fixes to DConf are released, and running gnome-shell in valgrind works without issue, that debugging similar problems is easier than it was when I experienced them. I’m glad that in this process I solved various other issues in GNOME that had been difficult to debug and had significantly detracted from the experience of many of its users.

Besides that, I’m glad to have been through this mad goose chase, and to have learned everything that I needed to learn along the way. Much of that knowledge and experience has been useful to me in other ways since.

Acknowledgements

Thankyou to Philip Chimento (ptomato), Allison Karlitskaya (desrt), Philip Withnall (pwithnall), and Mathias Clasen (mclasen) for the help and encouragement they gave me during this process. I couldn’t have done it without them.