Parallel Vector Drawing to a CGBitmapContext

November 17, 2025

I recently came across a post by Aurimas Gasiulis, “Parallel vector graphics rasterization on CPU”, which I found pretty intriguing. Gasiulis wrote a vector shape renderer named “Blaze” and made it work across multiple threads. Very cool stuff.

But I was shocked by the performance charts, which included comparisons to rendering with Apple’s Core Graphics library. Blaze was around 20× faster at rendering than Core Graphics.

This sounds like a wonderful nerd snipe to me.

Does Core Graphics (CG) already do parallel drawing? Not that I’m aware of, at least when it comes to rendering vector shapes. What if we just … draw on multiple threads with the same CGContextRef? That seems like a bad idea, and as far as I’m aware, CGBitmapContext isn’t thread-safe. And it turns out that yes, this is a bad idea. Segfaults abound.

What if we make a separate CGContextRef for each thread, but have them all reference the same backing store? And what if we clip each one to its own horizontal stripe, like the Blaze renderer does?

Well, that seems to work just fine, and it’s a nice speed boost. A quick little test in Acorn where it rendered 28,200 shapes took about 0.93 seconds single-threaded, and only 0.11 seconds with the parallel drawing.

Both are very fast for rendering that many shapes. But if we make the rendering a bit more complex by giving each of those shapes a 5pt drop shadow? Here we get 17.8 seconds for a single-thread render, and 2.7 seconds for a parallel render. The multiplier is in the same ballpark (about 8.5× for the plain shapes, about 6.6× with shadows), but 15 seconds saved is a lot.
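Here is a minimal sketch of the idea in plain C, with no Core Graphics involved. It only illustrates why stripes sharing one buffer can't step on each other. Everything in it (`StripeView`, `stripe_fill_rect`, `draw_scene`, `draw_in_stripes`) is a made-up name for illustration, the "scene" is two hard-coded rectangles, and the stripes run one after another in a loop rather than on real threads. There's also no anti-aliasing here, so it says nothing about edge differences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitmap context: it does not own the pixels,
   it only points at them and remembers which rows it is allowed to touch. */
typedef struct {
    uint8_t *pixels;          /* shared backing store, one byte per pixel */
    size_t width, height;
    size_t clip_y0, clip_y1;  /* writable rows: clip_y0 <= y < clip_y1 */
} StripeView;

/* Fill a rectangle, dropping any rows or columns outside the view's clip. */
static void stripe_fill_rect(const StripeView *v, size_t x, size_t y,
                             size_t w, size_t h, uint8_t value) {
    size_t y_start = y > v->clip_y0 ? y : v->clip_y0;
    size_t y_end = y + h < v->clip_y1 ? y + h : v->clip_y1;
    size_t x_end = x + w < v->width ? x + w : v->width;
    for (size_t row = y_start; row < y_end; row++)
        for (size_t col = x; col < x_end; col++)
            v->pixels[row * v->width + col] = value;
}

/* The "scene": every view runs exactly the same drawing commands. */
static void draw_scene(const StripeView *v) {
    stripe_fill_rect(v, 2, 1, 10, 30, 0x40);
    stripe_fill_rect(v, 5, 20, 8, 25, 0xC0);
}

/* Draw the scene once per stripe into the same buffer. The loop body is
   what each worker would run; the stripes are disjoint, so no two workers
   ever write the same byte. */
static void draw_in_stripes(uint8_t *pixels, size_t width, size_t height,
                            size_t stripe_height) {
    assert(stripe_height > 0);
    for (size_t y0 = 0; y0 < height; y0 += stripe_height) {
        size_t y1 = y0 + stripe_height < height ? y0 + stripe_height : height;
        StripeView v = { pixels, width, height, y0, y1 };
        draw_scene(&v);
    }
}
```

Every stripe repeats the whole scene and relies on the clip to throw away whatever falls outside its rows. That's wasted work per stripe, but it means the drawing code doesn't need to know it's being run in pieces.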

Were there any rendering glitches? There were very, very tiny differences in anti-aliased edges that were not perceptible to the eye, and the images were otherwise identical. (I used Acorn’s “Compare Two Front Windows” command, new in 8.3!, to figure this out.) The implementation is very simple as well:


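#import <Cocoa/Cocoa.h>

// Creates a second bitmap context that draws into the same pixel buffer as
// originalCTX. Only the buffer and pixel format are shared; graphics state
// such as the CTM and clip is not carried over. The caller owns the returned
// context and must release it with CGContextRelease(). The backing store must
// stay alive for as long as the returned context is in use.
CGContextRef FMCGBitmapContextCopySharingData(CGContextRef originalCTX) {
    CGContextRef ctx =
        CGBitmapContextCreate(CGBitmapContextGetData(originalCTX),
                              CGBitmapContextGetWidth(originalCTX),
                              CGBitmapContextGetHeight(originalCTX),
                              CGBitmapContextGetBitsPerComponent(originalCTX),
                              CGBitmapContextGetBytesPerRow(originalCTX),
                              CGBitmapContextGetColorSpace(originalCTX),
                              CGBitmapContextGetBitmapInfo(originalCTX));

    return ctx;
}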

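// Splits the bitmap into horizontal stripes and calls b once per stripe,
// in parallel. Each call gets its own context, clipped to that stripe.
void FMCGBitmapContextParallelDraw(CGContextRef context, void (^b)(CGContextRef chunkContext, CGRect chunkRect)) {

    size_t height = CGBitmapContextGetHeight(context);
    size_t chunkSize = 100; // Stripe height in pixels.
    size_t chunks = (height + chunkSize - 1) / chunkSize; // Round up so the last partial stripe is included.

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);

    // dispatch_apply blocks until every stripe has finished drawing.
    dispatch_apply(chunks, queue, ^(size_t chunk) {

        // A private context per stripe, all writing into the same pixels.
        CGContextRef chunkContext = FMCGBitmapContextCopySharingData(context);

        CGRect chunkRect = CGRectMake(0, chunkSize * chunk, CGBitmapContextGetWidth(chunkContext), chunkSize);

        // Make the stripe's context current for AppKit drawing on this thread.
        NSGraphicsContext *currentNSContext = [NSGraphicsContext graphicsContextWithCGContext:chunkContext flipped:NO];
        [NSGraphicsContext saveGraphicsState];
        [NSGraphicsContext setCurrentContext:currentNSContext];

        CGContextSaveGState(chunkContext);

        // Anything drawn outside this stripe is discarded.
        CGContextClipToRect(chunkContext, chunkRect);

        b(chunkContext, chunkRect);

        CGContextRestoreGState(chunkContext);
        [NSGraphicsContext restoreGraphicsState];

        // Balance the create in FMCGBitmapContextCopySharingData so the
        // per-stripe context isn't leaked.
        CGContextRelease(chunkContext);
    });
}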
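The stripe arithmetic is the one part that's easy to get wrong by one, so here it is on its own as a small plain C sketch. `Stripe`, `stripe_count`, and `stripe_at` are hypothetical helpers, not part of the code above. The one difference from the posted code is that the last stripe is clamped to the rows that are left; in the posted code the final rect can hang past the bottom of the bitmap, which should be harmless for clipping but matters if the drawing block uses the rect for its own calculations.

```c
#include <assert.h>
#include <stddef.h>

/* One horizontal stripe: first row and number of rows. */
typedef struct {
    size_t y;
    size_t height;
} Stripe;

/* Number of stripes needed to cover total_height rows (rounds up). */
static size_t stripe_count(size_t total_height, size_t stripe_height) {
    assert(stripe_height > 0);
    return (total_height + stripe_height - 1) / stripe_height;
}

/* The index-th stripe, with the last one shortened to fit the bitmap. */
static Stripe stripe_at(size_t index, size_t total_height,
                        size_t stripe_height) {
    assert(index < stripe_count(total_height, stripe_height));
    Stripe s;
    s.y = index * stripe_height;
    size_t remaining = total_height - s.y;
    s.height = remaining < stripe_height ? remaining : stripe_height;
    return s;
}
```

For a 250 pixel tall bitmap and 100 pixel stripes, this gives three stripes, and the last one starts at row 200 and is 50 rows tall.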

Can this be made any faster? Maybe by reusing the temporary contexts? But a 6× to 8× speed improvement is good enough for me for now. Will I use this code in production? I’m not sure yet, but I’m tempted.