Moving to Metal Episode V, Weird Performance Hacks


December 23, 2017

Part I, Part II, Part III, Part IV.

Over on Twitter I brought up how some things are actually slower when using IOSurfaceRefs vs. using CGBitmapContexts to store bitmap data. For example, Acorn's painting engine is over 6 times faster when drawing through Core Image to a CGBitmapContext vs. drawing to an IOSurface. I was able to fix this problem by trying out some new async drawing APIs added in High Sierra, and since the Metal changes are only going to show up if you're on 10.13 or later, this was a reasonable solution.

Performance was much better, and possibly even better than before (I didn't take measurements, sorry) but drawing was still pretty slow for small brushes when the spacing was set close together. This has always been slower than I'd like, so it wasn't a new thing to worry about. But I was still thinking about it as I was doing dishes the other night (ideas always come at strange times) and I thought: instead of drawing individual dabs for the brushing, what if I batched up a handful and drew them all at once? It would take a bit of rearchitecting to make it happen correctly, but I could hack something in as a proof of concept.

It took about 10 minutes to implement as a hack, and brushing was noticeably faster right away. On my iMac, it was 4.5x faster when drawing to an IOSurface. Whoa.

This is the kind of thing I like to see, and what's cool is that this optimization would be useful for folks running on 10.11 and 10.12, right? So I turned off Metal and IOSurface drawing, which caused brushing to perform incredibly badly. In fact it was way slower than it would be normally. OK, so we'll have to make that optimization conditional on what kind of pixel store I'm drawing to*.
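To make the idea concrete, here's a rough sketch of batching dabs and only doing so for one kind of pixel store. This is not Acorn's code, just an illustration in Python. The names `DabBatcher`, `PixelStore`, and the `draw` callback are all made up for the example, and the `max_batch` value of 8 is an arbitrary assumption (the footnote below only says that some upper limit exists).

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple

Dab = Tuple[float, float]  # (x, y) center of one brush dab


class PixelStore(Enum):
    IOSURFACE = auto()       # stands in for the IOSurface path
    BITMAP_CONTEXT = auto()  # stands in for the CGBitmapContext path


@dataclass
class DabBatcher:
    """Collects dabs and hands them to `draw` in groups.

    `draw` is a hypothetical callback that renders a list of dabs
    in a single pass; it is not a real Core Image or Metal API.
    """
    store: PixelStore
    draw: Callable[[List[Dab]], None]
    max_batch: int = 8  # assumed cap, chosen only for illustration
    _pending: List[Dab] = field(default_factory=list)

    def add(self, dab: Dab) -> None:
        if self.store is not PixelStore.IOSURFACE:
            # Batching was slower on the bitmap path, so draw one at a time.
            self.draw([dab])
            return
        self._pending.append(dab)
        if len(self._pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        """Draw whatever is still pending; call this when a stroke ends."""
        if self._pending:
            self.draw(self._pending)
            self._pending = []
```

Feeding 20 dabs through a batcher set to `PixelStore.IOSURFACE` would call `draw` three times (8, 8, and 4 dabs), while the `BITMAP_CONTEXT` path calls it once per dab. The cap matters because batching too many dabs at once is where the artifacts showed up.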

Another weird performance trick I've encountered, and I'm honestly baffled by, is drawing all layers to an IOSurface first and then handing that off to CIContext for drawing to the screen. It's easy enough to then save those bits as a cache so blending doesn't have to happen as often, but it actually makes things faster when painting with a brush, or applying filters. Why is this? Aren't things being composited to a texture anyway? It's basically a double composite, right? It's more work, so why are things going faster?

I think getting real answers about this won't happen until I can sit down with some engineers at WWDC.

But the basic takeaway is that switching to Metal doesn't mean everything automatically goes faster. Some things certainly do, such as panning and applying filters, but other things need to be rethought to get the max performance out of them.

And then there's the issue of actually trying Metalcorn (get it, Metal + Acorn?) on another machine. Up to now it's been on a couple of iMacs. But I threw it on a 2012 MBP the other day and wow, it was janky and not at all what I was expecting. Rendering problems related to alpha channels, jittery behavior, and general funkiness.

When rendering through GL + CGBitmapContexts I always got predictable behavior that scaled linearly with the CPU power of your machine. Not so with Metal.

* You might be wondering why the painting optimization didn't work when drawing to CGBitmapContexts. Well, I have some ideas but I don't feel confident enough to voice them with any sort of authority. Bus speeds vs local memory vs destination contexts? Also there was an upper limit to the number of dabs I could batch up before I saw graphics artifacts / corruption appear. GPUs are weird.