The Shape of Everything
A website mostly about Mac stuff, written by Gus Mueller

The Core Image format kCIFormatBGRA8 means the pixel data will be ordered BGRA in memory (alpha last). To match that with Core Graphics, you'll have to use kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Little. Yes, it's alpha first for CG but last for CI.

Which doesn't make sense until you stare at it for a long time. It's right there though: alpha first (ARGB), but little endian byte order for a 32 bit word, so the bytes land in memory as BGRA. Not a 16 bit word, which at first glance you might assume to be the case. Which I did.
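
If it helps, here's a minimal sketch (not Acorn's code, and the width, height, and color space are placeholders) of a CGBitmapContext set up so its memory layout matches kCIFormatBGRA8:

    #import <CoreGraphics/CoreGraphics.h>

    // BGRA bytes in memory, alpha last, which is what kCIFormatBGRA8 expects.
    static CGContextRef FMCreateBGRA8Context(void *pixels, size_t width, size_t height,
                                             size_t bytesPerRow, CGColorSpaceRef colorSpace) {
        return CGBitmapContextCreate(pixels, width, height,
                                     8,           // bits per component
                                     bytesPerRow, // usually width * 4
                                     colorSpace,
                                     kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Little);
    }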

January 3, 2018

In Part II I mentioned how drawing through MTKView wasn't happening as frequently as I was asking it to, and in Part IV I mentioned that if I specifically called the -draw method of MTKView when I needed frequent updates, things ran a lot smoother.

I wasn't happy with that fix because it needed to be sprinkled around my code base whenever there were operations that caused a high load on the system. And what's fine on my machine might not be fine on a six year old machine. But OK, I'll do it because it obviously needs to be done. But I'm not happy about it, because it just felt wrong.

A few days ago I found myself needing to subclass MTKView (for a reason I'll get to in a moment), and while exploring its headers I noticed it implemented CALayerDelegate, which had me wondering: could I move my -draw call into one of the delegate methods? It seemed like -displayLayer: was the obvious choice, so I implemented that in my subclass, called the super method and [self draw] immediately after, and everything began running perfectly smooth. Hurray!
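
Here's roughly what that subclass looks like, as a minimal sketch (the class name is made up; the forced draw is the important part):

    #import <MetalKit/MetalKit.h>

    @interface FMMetalCanvasView : MTKView
    @end

    @implementation FMMetalCanvasView

    // MTKView conforms to CALayerDelegate, so this is called when the layer
    // wants to display. Kick off the draw right away instead of waiting on
    // MTKView's internal scheduling.
    - (void)displayLayer:(CALayer *)layer {
        [super displayLayer:layer];
        [self draw];
    }

    @end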

I still don't understand why MTKView isn't doing this automatically for me, but my solution seems to be working out everywhere so I'll probably stop looking for reasons.

There are two ways to draw with MTKView. You can set a delegate on the view that implements -drawInMTKView:, or you can subclass MTKView and do your drawing in -drawRect:. I originally chose the delegate approach since I already had the drawing operations set up in another class.
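
For reference, the delegate approach looks roughly like this (a sketch, with a made up class name):

    #import <MetalKit/MetalKit.h>

    @interface FMCanvasRenderer : NSObject <MTKViewDelegate>
    @end

    @implementation FMCanvasRenderer

    - (void)mtkView:(MTKView *)view drawableSizeWillChange:(CGSize)size {
        // Recreate any size dependent resources here.
    }

    - (void)drawInMTKView:(MTKView *)view {
        // Encode a command buffer and present view.currentDrawable here.
    }

    @end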

Experienced Cocoa devs will notice that -drawInMTKView: only passes in a view to draw to, and not an update rect of the region that needs to be changed. I thought that was a bit odd at first, but once you understand the architecture of how Metal displays pixels you'll soon realize that you're not drawing to the same texture on every pass like you would with regular NSView subclasses or even OpenGL. Instead, there are usually three textures used for getting pixels up to the display: one for your app to draw to, one for the GPU, and one for the display. It's covered pretty well (with this timestamp link) in What's New in Metal, Part 2 from WWDC 2015.

And even if you subclass MTKView and implement -drawRect: which passes a region to be updated, you'll soon discover that the region handed to you is the entirety of the view. Which makes the CGRect part of that method call kind of pointless.

I get it though: you've got three different textures that need to be managed, it'll need to remember which region was updated on the previous draws, and the textures aren't always given in the same order. And those textures come and go as your view resizes, or maybe just for other reasons. So MTKView throws its hands up and says "Draw everything, all the time".

I still think that's suboptimal though, and I'm not afraid of managing those updates, so I wrote a (surprisingly small) bit of code to do just that and I ended up seeing this when I turned off display sync (via CAMetalLayer):

That's the Quartz Debug app showing a peak of 205 frames per second worth of drawing in Acorn. That was on a Late 2015 iMac, with a specially crafted brush, but wowsers anyway.
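
For the curious, the bookkeeping looks something like this. It's a rough sketch, not Acorn's actual code, and it assumes three drawables in flight: each frame repaints its own dirty rect unioned with whatever the other in-flight drawables haven't seen yet.

    #import <Foundation/Foundation.h>
    #import <CoreGraphics/CoreGraphics.h>

    static const NSUInteger kInFlightDrawables = 3; // assumption: triple buffering

    @interface FMDirtyRectTracker : NSObject
    @property (nonatomic, strong) NSMutableArray<NSValue *> *recentRects;
    @end

    @implementation FMDirtyRectTracker

    - (instancetype)init {
        if ((self = [super init])) {
            _recentRects = [NSMutableArray array];
        }
        return self;
    }

    // Pass in the region that changed this frame, get back the region to
    // repaint in the current drawable (the new rect unioned with the recent
    // history from the other drawables).
    - (CGRect)rectToRedrawForDirtyRect:(CGRect)dirtyRect {
        CGRect redraw = dirtyRect;
        for (NSValue *value in self.recentRects) {
            redraw = CGRectUnion(redraw, NSRectToCGRect(value.rectValue));
        }
        [self.recentRects addObject:[NSValue valueWithRect:NSRectFromCGRect(dirtyRect)]];
        if (self.recentRects.count > kInFlightDrawables - 1) {
            [self.recentRects removeObjectAtIndex:0];
        }
        return redraw;
    }

    @end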

Previously: Part I, Part II, Part III, Part IV, Part V.

December 23, 2017

Part I, Part II, Part III, Part IV.

Over on Twitter I brought up how some things are actually slower when using IOSurfaceRefs vs. using CGBitmapContexts to store bitmap data. For example, Acorn's painting engine is over 6 times faster when drawing through Core Image to a CGBitmapContext vs. drawing to an IOSurface. I was able to fix this problem by trying out some new async drawing APIs added in High Sierra, and since the Metal changes are only going to show up if you're on 10.13 or later, this was a reasonable solution.

Performance was much better, and possibly even better than before (I didn't take measurements, sorry), but drawing was still pretty slow for small brushes when the spacing was set close together. This has always been slower than I'd like, so it wasn't a new thing to worry about. But I was still thinking about it as I was doing dishes the other night (ideas always come at strange times), and I thought: instead of doing individual dabs for the brushing, what if I batched up a handful and drew them all at once? It would take a bit of rearchitecting to make it happen correctly, but I could hack something in as a proof of concept.

It took about 10 minutes to implement as a hack, and brushing was noticeably faster right away. On my iMac, it was 4.5x faster when drawing to an IOSurface. Whoa.
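
To give a feel for the idea, here's a rough sketch of the batching. This is not Acorn's actual brush code, the names and batch size are made up, and it assumes the dabs are CIImages already positioned in canvas coordinates.

    #import <CoreImage/CoreImage.h>
    #import <IOSurface/IOSurface.h>

    @interface FMDabBatcher : NSObject
    @property (nonatomic, strong) NSMutableArray<CIImage *> *pendingDabs;
    @property (nonatomic, strong) CIContext *ciContext;
    @property (nonatomic, assign) IOSurfaceRef surface;
    @property (nonatomic, assign) CGColorSpaceRef colorSpace;
    @end

    @implementation FMDabBatcher

    - (instancetype)init {
        if ((self = [super init])) {
            _pendingDabs = [NSMutableArray array];
        }
        return self;
    }

    - (void)addDab:(CIImage *)dab {
        [self.pendingDabs addObject:dab];
        if (self.pendingDabs.count >= 16) { // arbitrary batch size
            [self flushPendingDabs];
        }
    }

    - (void)flushPendingDabs {
        if (self.pendingDabs.count == 0) {
            return;
        }
        // Composite the dabs into a single image so there's one render pass
        // for the whole batch instead of one per dab.
        CIImage *batch = nil;
        for (CIImage *dab in self.pendingDabs) {
            batch = batch ? [dab imageByCompositingOverImage:batch] : dab;
        }
        [self.ciContext render:batch
                   toIOSurface:self.surface
                        bounds:batch.extent
                    colorSpace:self.colorSpace];
        [self.pendingDabs removeAllObjects];
    }

    @end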

This is the kind of thing I like to see, and what's cool is that this optimization would be useful for folks running on 10.11 and 10.12, right? So I turned off Metal and IOSurface drawing, which caused brushing to perform incredibly badly. In fact it was way slower than it normally would be. OK, so we'll have to make that optimization conditional on what kind of pixel store I'm drawing to*.

Another weird performance trick I've encountered, and one I'm honestly baffled by, is drawing all layers to an IOSurface first and then handing that off to CIContext for drawing to the screen. It's easy enough to then save those bits as a cache so blending doesn't have to happen as often, but it actually makes things faster when painting with a brush or applying filters. Why is this? Aren't things being composited to a texture anyway? It's basically a double composite, right? It's more work, so why are things going faster?

I think getting real answers about this won't happen until I can sit down with some engineers at WWDC.

But the basic takeaway is that switching to Metal doesn't mean everything automatically goes faster. Some things certainly do, such as panning and applying filters, but other things need to be rethought to get the max performance out of them.

And then there's the issue of actually trying Metalcorn (get it, Metal + Acorn?) on another machine. Up to now it's been on a couple of iMacs. But I threw it on a 2012 MBP the other day and wow, it was janky and not at all what I was expecting. Rendering problems related to alpha channels, jittery behavior, and general funkiness.

When rendering through GL + CGBitmapContexts I always got predictable behavior that scaled linearly with the CPU power of your machine. Not so with Metal.

* You might be wondering why the painting optimization didn't work when drawing to CGBitmapContexts. Well, I have some ideas but I don't feel confident enough to voice them with any sort of authority. Bus speeds vs local memory vs destination contexts? Also there was an upper limit to the number of dabs I could batch up before I saw graphics artifacts / corruption appear. GPUs are weird.

December 21, 2017

Bloomberg: Apple Plans Combined iPhone, iPad & Mac Apps to Create One User Experience.

I have thoughts and feelings about this report by Mark Gurman. From the article:

"The same approach hasn’t worked nearly as well on Apple’s desktops and laptops. The Mac App Store is a ghost town of limited selection and rarely updated programs. Now Apple plans to change that by giving people a way to use a single set of apps that work equally well across its family of devices: iPhones, iPads and Macs."

I find myself upset about the App Store quite often, but I think calling it a ghost town is a bit much. Does it have all the best apps available? No, it does not. Does it have frequently updated and exclusive apps of its own? Yes, it does.

"What’s more, Apple customers have long complained that some Mac apps get short shrift. For example, while the iPhone and iPad Twitter app is regularly updated with the social network’s latest features, the Mac version hasn’t been refreshed recently and is widely considered substandard. With a single app for all machines, Mac, iPad and iPhone users will get new features and updates at the same time."

There's an easy solution for updating the Mac version of Twitter to have the same features as its iOS peers. Twitter has to care enough to update it. That's it. It's not as if a massive engineering effort needs to be put behind it. It's not as if the road hasn't already been explored and the server APIs already exposed (which they must have done for the iOS version). They just need to put some effort and care into it. Tweetie for the Mac, which Twitter for the Mac is based on, was built from the ground up by a single person. All Twitter had to do was maintain it. And Twitter, Inc. couldn't be bothered.

And we see the same behavior from other vendors happening on the iPad where a shared framework already exists (Instagram being the prime example). The opening line of that paragraph could easily have been written as "What's more, Apple customers have long complained that some iPad apps get short shrift".

"The plans are still fluid, the people said, so the implementation could change or the project could still be canceled."

I hate lines like this, as it gives complete cover for the reporter if nothing described ever comes to pass.

"It’s unclear if Apple plans to merge the separate Mac and iOS App Stores as well, but it is notable that the version of the store running on iPhones and iPads was redesigned this year while the Mac version has not been refreshed since 2014."

Again, someone has to care. In this case it's Apple. The Mac App Store hasn't been refreshed since 2014 because Apple doesn't see it as a priority. So we get broken APIs for developers, no gifting, and no video previews for the platform where QuickTime was invented.

How a shared UI framework is going to make big companies somehow care is beyond me. This mysterious new framework isn't going to magically give iOS the ability to use mouse input, or let all Macs gain touch input via the screen. Work will still need to be done making the UIs work properly on their respective platforms. And someone's going to need to care enough to make that happen.

"Several years ago, the company began designing its own processors for iOS devices. It has started doing the same for the Mac, recently launching a T2 chip in the iMac Pro that offloads features like security and power management from the main Intel processor onto Apple-designed silicon. Much the way Apple plans to unify apps, it could also one day use the same main processor on Macs and iOS devices."

The dream for Macs running on an A-series chip from Apple will never die (I, for one, would welcome this, assuming performance would be acceptable).

I feel like this article from Gurman could have been reduced down to: "We think Apple might some day have a shared UI framework for iOS and MacOS. Apple could even create some sort of cross store bundling or a single store with a single binary for all platforms when using this framework (even though there's nothing stopping Apple from doing this today). That sounds neat and wouldn't it be cool if all platforms also used the same processor to boot? This may or may not happen starting next year, and it could very likely be canceled as well. Apple declined to comment on our sensational story."

What about the crux of the article, that Apple is working on a shared UI framework between iOS and MacOS? I wouldn't find it surprising. I could also see it being written completely in Swift (though personally I'd rather it be in Obj-C for maximum interop with existing frameworks).

But history is filled with cross platform UIs and write once run anywhere dreams. None of them turned out insanely great.

I think cross platform UI classes like MTKView (which inherits from NSView or UIView depending on the platform you're on) are a great idea, and a great way to share common code. I'm not sure that sharing a single hierarchy of classes across platforms is going to go the way folks think it will.

December 19, 2017

Marc Edwards: Colour management, part 1 and part 2.

"If someone asked you to build a coffee table and they specified the legs as a height of 50, what do you think that would mean? 50 kilometres? 50 feet? 50 inches? 50 millimetres? Probably 50 centimetres. You can’t know for sure, but you can guess, based on the table’s intended use — 50 kilometres, 50 feet, and 50 inches are way too big for a coffee table, and 50 millimetres is way too small."

Marc does a great job at explaining color spaces for developers, and non-devs as well. If you're a developer who deals with color at all, I highly recommend you read these posts.

December 18, 2017

Part I, Part II, Part III.

A status update of sorts.

I got the redraw issues with MTKView figured out. The problem was that I would ask the Metal view to update 400 times over 5 seconds, and it wouldn't. Presumably this was because of a high load on the system while doing a flood fill or magic wand operation, because once I stopped or slowed down a bit everything caught up and started drawing again. I think MTKView treats setNeedsDisplayInRect: as more of an advisement, and if you really want things to draw then you also need to call draw on the instance. Because of the way things are set up in Acorn I could make this happen with every canvas update call, but for now I'm only doing so with high load operations like flood fill and brushing.
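
In code, the stopgap is basically this (a sketch, with a made up function name):

    #import <MetalKit/MetalKit.h>

    // Invalidate the changed region, then poke -draw so the update actually
    // lands while the system is under heavy load.
    static void FMUpdateCanvasRect(MTKView *metalView, NSRect dirtyRect) {
        [metalView setNeedsDisplayInRect:dirtyRect];
        [metalView draw];
    }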

Deep color is also working through Metal now, and everything is great with 128 and 32 bit images. 64 bit images though… well, there's a mismatch between the pixel format Acorn currently uses for those and what IOSurface will take.

Acorn 6 and earlier uses uint16_t to store a pixel component (16 bits per component x 4 for RGBA = 64 bits) because that's super easy to grab the values out of for picking out colors or finding opaque bounds of images.
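
For example, picking a color out of a 64 bit image is just array indexing. A tiny sketch, with made up parameter names:

    #include <stdint.h>
    #include <stddef.h>

    static void FMReadRGBA16Pixel(const uint16_t *baseAddress, size_t bytesPerRow,
                                  size_t x, size_t y, uint16_t outComponents[4]) {
        const uint16_t *pixel =
            (const uint16_t *)((const uint8_t *)baseAddress + y * bytesPerRow) + x * 4;
        outComponents[0] = pixel[0]; // R
        outComponents[1] = pixel[1]; // G
        outComponents[2] = pixel[2]; // B
        outComponents[3] = pixel[3]; // A
    }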

CGBitmapContext supports 16 bit RGBA (kCGImageAlphaPremultipliedLast), and Core Image does as well (kCIFormatRGBA16), but IOSurface doesn't.

IOSurface does however support 16 bit ARGB, which would be easy enough to switch to, but CGBitmapContext and Core Image don't.

The only format that all three support is 16 bit half float RGBA.

Half what? What's a half float? Well, it's a format that GPUs are very keen on but there's no built in type for it in C, Objective-C, or Swift. This makes reading the individual pixel values kind of a pain.

There's some fancy bit twiddling magic out there for treating a half float as a uint16 and then converting that into a regular 32 bit float, but you can also use vImage's vImageConvert_Planar16FtoPlanarF to do the same with a little bit of setup. That's what I'll probably end up doing. (Intel also has some intrinsics for doing this, but they crash the clang compiler when I try to use them.)
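
Here's a sketch of the vImage route, assuming an interleaved RGBA half float buffer. Since the conversion is per component, the interleaved data can be treated as one big plane:

    #import <Accelerate/Accelerate.h>

    static void FMConvertRGBAHalfToFloat(const uint16_t *src, float *dst,
                                         size_t width, size_t height) {
        size_t componentCount = width * height * 4; // RGBA, so 4 per pixel
        vImage_Buffer srcBuf = {
            .data = (void *)src,
            .height = 1,
            .width = componentCount,
            .rowBytes = componentCount * sizeof(uint16_t)
        };
        vImage_Buffer dstBuf = {
            .data = dst,
            .height = 1,
            .width = componentCount,
            .rowBytes = componentCount * sizeof(float)
        };
        vImageConvert_Planar16FtoPlanarF(&srcBuf, &dstBuf, kvImageNoFlags);
    }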

I'm not keen on changing a pixel format in a .1 update of Acorn, but with enough testing it should be alright. I just have to carefully create some new OpenCL routines and then all will be good. Right?

December 16, 2017

Part I, Part II.

One obvious question that hasn't been asked of me yet, but the answer of which I will go on and on about (especially if you were unlucky enough to be sitting next to me last Thursday at Cyclops for our bi-monthly dev meetup, come join us!), is why now? Why has it taken Acorn so long to begin using IOSurfaceRefs for images?

The answer is slightly complicated, involving older codebases and moving tech and me hating OpenGL and a couple of other reasons, but it basically comes down to one thing:

I'm an idiot.

Or to put some kinder words on it, my understanding of how IOSurfaces work was incomplete.

Let's take a look at what Apple has to say. The first sentence from IOSurface's documentation is as follows:

"The IOSurface framework provides a framebuffer object suitable for sharing across process boundaries."

IOSurface is neat. A shared bitmap that can cross between programs, and it's got a relatively easy API including two super critical functions named IOSurfaceLock and IOSurfaceUnlock. I mean, if you're sharing the data across process boundaries then you'll need to lock things so that the two apps don't step on each other's toes. But of course if you're not sharing it across processes, then you can ignore those locks, right? Right?

Of course not, as I eventually found out.

The thing was, I was already mixing IOSurfaceRefs and CGBitmapContexts successfully in Acorn without any major hiccups. I could make an IOSurface, grab its base address (which is where the pixels are stored), point a CGBitmapContext at it, and go on my merry way. I could draw to it, and clear it, and make CGImageRefs which would then turn into CIImages for compositing, and everything was awesome.

What I couldn't do though, was make a CIImage directly from that IOSurface. Every time I tried, I'd end up with an image that was either 100% blue, or 100% red. I had convinced myself that these were some sort of mysterious debugging messages, but I just hadn't come across the correct documentation letting me know what it was. So once or twice a year I would mess with it, get nowhere, and go back to the way that worked.

Well, a couple of weeks ago I was trying again, and I got more frustrated than usual. I searched Google and GitHub for IOSurface and CGBitmapContext (in anger!), but I couldn't find anything relevant to what I wanted to do. More anger. This should work! Then I thought… what if I searched my own computer using Spotlight? Maybe it'll turn something up…

And then a single file came back, named IOSurface2D.mm, which was some obscure sample code from Apple that I had received at one point a number of years ago.

I opened it, I looked, and I was happy and angry and relieved and sooo very mad at myself.

Yes, you can use a CGBitmapContext with an IOSurface without locking it. But then some other frameworks are eventually going to grab that same IOSurface for drawing and they are going to lock it and then some crazy black magic is going to swoop in and completely ruin your image. Even if you aren't using it across processes. So you better make sure to lock it, even if you're not actively drawing to it, or else things are going to go south.

And that's what I did. All I needed to do was call IOSurfaceLock and Unlock before doing anything with it, and everything was smooth and happy. And I quickly found that if I turned off beam-synced updates in OpenGL I could peg Quartz Debug's FrameMeter to over 90fps.
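
In practice it looks something like this. A sketch (not the sample's code), assuming a BGRA surface:

    #import <IOSurface/IOSurface.h>
    #import <CoreGraphics/CoreGraphics.h>

    // Bracket any CPU access to the surface's base address with
    // IOSurfaceLock/IOSurfaceUnlock, even if the surface never leaves
    // your process.
    static void FMDrawIntoSurface(IOSurfaceRef surface, CGColorSpaceRef colorSpace,
                                  void (^drawBlock)(CGContextRef ctx)) {
        IOSurfaceLock(surface, 0, NULL);

        CGContextRef ctx = CGBitmapContextCreate(IOSurfaceGetBaseAddress(surface),
                                                 IOSurfaceGetWidth(surface),
                                                 IOSurfaceGetHeight(surface),
                                                 8,
                                                 IOSurfaceGetBytesPerRow(surface),
                                                 colorSpace,
                                                 kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Little);
        drawBlock(ctx);
        CGContextRelease(ctx);

        IOSurfaceUnlock(surface, 0, NULL);
    }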

That was nice. And it was about time.

I've updated my FMMicroPaintPlus sample code to use this new technique, which you can find in FMIOSurfaceAccumulator.m

Since that discovery I've moved Acorn off OpenGL to Metal 2 as well as using newer Core Image APIs introduced in 10.13 (if you are on previous OS releases, it'll use the old way of drawing).

And now for a completely uninformed discussion about IOSurface

What is this black magic? Why does locking an IOSurface before wrapping a CGContext around it matter? Where, exactly, does the memory for the IOSurface live? Is it on the GPU or is it in main memory? Or is it both?

I can take a guess, and I'm probably wrong, but it's the only thing I've got right now. I think that IOSurface is mirrored across the GPU and main memory. And after you've unlocked it for drawing then something in the background will shuttle the data or subregions of it to or from the GPU. You can address the memory as if it's local, and everything just works.

If this is true, then I think that's amazing. Apple will have made a wonderful tech that transparently moves bits around to where it's needed and I don't even have to think about fiddling with the GPU.

Apple just needs to add a note to the documentation that locks are needed even if you aren't sharing the surface across process boundaries.

December 14, 2017

This is continuing from my previous post on moving Acorn to Metal.

Over the past week I've made some pretty good strides on having Acorn use Metal 2 and the new Core Image APIs introduced in 10.13 High Sierra. I've reworked how the pixels draw from the canvas NSView subclass to the display by introducing a little shim view which rests above the canvas and does the actual rendering. (Previously the canvas was an NSOpenGLView subclass). So if you're running Acorn on MacOS 10.12 or earlier you will get the previous OpenGL rendering, and if you're on 10.13 or later you'll get the super fast Metal 2 + IOSurface rendering.

And wow is it buttery smooth. Opening up a 100 megapixel image and then panning, brushing, and zooming around is super fun and has made this work 100% worth it. I've also found a couple of other optimizations that will help folks out on previous OS releases as well, so there are wins to go all around.

Of course, this is all just testing on my dev machines for the time being. Who knows what's going to happen when it gets out onto everyone's machines. GPUs are notorious for being flaky. But with that in mind I've also added a preference to switch back to the OpenGL renderer, as well as a software renderer (which, because of the architecture I came up with, only took about a dozen lines of code to support).

All my regression tests are now run against both renderers as well, and I found a couple of instances where the drawing deviated between the CPU and the GPU, but it wasn't anything unexpected.

So what's left to do?

So far, I've not been able to get deep color to display correctly when using Metal and IOSurface. I can set the pixel format to MTLPixelFormatBGR10A2Unorm, and the first few frames render correctly, but then things quickly go south from there with the introduction of funky fluorescent colors and black boxes showing up. I've got more digging to do, but I think it might actually be an issue with IOSurface and not Metal. That's just more of a hunch at this point though.

The other show stopping bug I'm running into is that drawing through MTKView isn't happening as frequently as I'd like when I seem to be taking up a lot of CPU power. I have a feeling this has to do with my ignorance of Core Animation and some sort of transaction system I'm not familiar with.

This problem mainly shows up when using the magic wand tool. Using the magic wand you can click on the canvas and drag out to change the tolerance for the pixels you'd like to be selected. For each mouse event I get I recalculate the tolerance, perform the operation on the pixels, and then tell the view to update the changed region.

If I've got the renderer setup to use OpenGL or software, then I get a 1:1 mapping of the tool calling setNeedsDisplayInRect: to update a region, and then drawRect: being called to actually do the new drawing.

If I'm using Metal, then I can call setNeedsDisplayInRect: 20 times before drawRect: is actually ever called. And it only seems to happen when I stop moving the mouse and the load on the CPUs goes down. This makes it look like things are going slower than they actually are, or that the tool isn't even working.

I'm also seeing something which might be related to this when using the Window ▸ Zoom Window menu item. When Zoom is called and I'm using Metal, drawRect: is never called during the animation, and instead my image is squished or expanded, depending on which way the window is moving. That's no good.

But everything else is good. Really good in fact.

December 14, 2017

Primate Labs has just acquired VoodooPad.

I originally wrote VoodooPad in 2003, and then sold it to Plausible Labs in 2013 so I could focus on Acorn. Besides rewriting the encryption, Plausible never really updated VoodooPad. This seemed a shame to me, and I felt my customers were let down by this lack of updates.

But now VoodooPad is in the hands of Primate Labs, and I'm hopeful something will happen with it. I've known John Poole (the founder of Primate Labs) for a number of years, and I trust him. His company has a number of apps, and most importantly has shown that they know how to ship updates.

I had no idea this was coming, but I'm super happy it did. I still use VoodooPad every day and I'd love to see an update.

December 8, 2017

I'm taking a little break from building out the New App to get started on Acorn 6.1. I don't do code names for releases anymore, but if I did this one would be called Acorn "Whatever Gus F'n Wants to Do" Version 6.1.

So what do I want to do in 6.1? A couple of things. No new user facing features, updating the pixel plumbing, and an option for bringing color back to the UI (color has actually already started for 6.0.4, with a secret defaults pref: defaults write com.flyingmeat.Acorn6 colorPalette 1. If you turn it on it's obviously not finished, but you might like it better regardless).

Why would I want to update the "pixel plumbing", and just what is this plumbing anyway?

Acorn versions 4 through 6 store the memory for your layers in your computer's main memory (i.e., not on the GPU). Drawing is still done through the GPU via OpenGL, but in most cases I make sure that the pixel processing happens on the CPU.

There's a couple of really good reasons to have the pixel processing happen on the CPU. The main reason I've done this is for fidelity. For many years GPUs have been more concerned about speed than accuracy. Well, I care about accuracy so that's why Acorn has filters running on the CPU.

Another reason is so Acorn has easy and fast access to the pixels for operations which can't be done on the GPU in a reasonable manner. Things like seed fill operations (which make up flood fill, magic wand, instant alpha) need quick access to the pixels and I haven't found a great way to run that on the GPU yet. It's not an inherently parallel operation.

And the last major reason is just about the amount of memory Acorn can gobble up. I've had people create and edit terapixel images in Acorn (Mega: 1 million pixels. Giga: 1 billion pixels. Tera: 1 trillion pixels). It might be slow, but it's possible. My fear is that if someone tries to do that with the GPU, then it just won't be possible because of the limited amount of memory available there. (The obvious solution to this is to fall back to CPU rendering in these cases, but to be honest I haven't really explored that yet.)

This past summer at WWDC I got some good news about Core Image on the GPU- most of my concerns about fidelity are gone in MacOS 10.13 High Sierra. So I decided then that Acorn 6.1 would try and switch to Metal when running on 10.13.

Then 10.13 was released, and I put out the usual maintenance releases to fix little bugs that come with any major OS update. But something very odd happened as well- 10.13 customers were reporting slow brushing with images up to a certain size. I couldn't get the problem to reproduce on my test images, so I had folks send in the images they were having problems with and all of a sudden I got the problems to reproduce. The problem on my end was that the images I was testing with were too big.

What was going on? Core Image was taking my images created in main memory and copying them to (what I presume to be) IOSurfaceRefs. This is fine for smaller images, but when you get to bigger images those copies can take a while. But there's an upper limit to the amount of memory Core Image is willing to copy before it says screw it, and my test images were over that limit. So instead of making copies to an internal CI only buffer, it would then reference the original memory for these giant images. So brushing on big images was ironically faster.

While I've got some workarounds in Acorn 6.0.4 to keep copying down to a minimum, it isn't a solution I'm happy with. Instead I should change how Acorn stores the memory for images to something which has less of an impedance mismatch with Core Image.

So not only is Acorn 6.1 switching to Metal on 10.13 High Sierra, it will also be switching to using IOSurfaceRefs to store the pixels in. These are two very big changes and I've made some great progress over the past week with this, so I'm 99.9% sure it'll ship this way.

So that's what the new pixel plumbing will look like: moving Acorn from local memory backed images (CGImageRef + CGBitmapContext) pushed through OpenGL, to IOSurfaceRef backed images pushed through Metal 2.

It's been fun so far, and hopefully I'll have a test build later this month or early 2018 for people to play with.