Around 10 years ago – wow, time flies! – I had a conversation with Íñigo Quílez – genius, friend and fellow Spaniard – about the extreme difficulty of including 3d models in 4kb demos.
While thinking of ideas for a demo to submit to the 2018 edition of the js1k contest, I remembered this conversation with Íñigo. Knowing how difficult it is in 4kb, my first reaction was to think it would be impossible in only 1kb. Soon after, I thought it might be possible with a model at a low level of detail – but with enough detail to make something resembling an interesting shape, like a human head. So I had to try. Challenge accepted.
Once I had accomplished the first step – something resembling a human head – I used all my knowledge to keep increasing the model quality, up to the point of my submission to js1k. The development has been quite a journey, which I will try to describe in this post.
The Rendering Method
The demo has 3 main parts: a rendering method, a data decompressor and the data itself. The rendering method is a parallel-projected cylindrical heightmap.
A heightmap is a greyscale image where each pixel's intensity represents the elevation of a 3d surface:
If you apply the heightmap to a cylinder, you get a cylindrical heightmap:
Finally, by parallel projection I mean that the demo does not use perspective, which makes it a really simple rendering method – probably the simplest 3d rendering method that allows rotating an object.
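To make the method concrete, here is a minimal, unoptimized sketch of the effect – not the demo's golfed code. The `height(u, v)` sampler, the buffer layout and the shading are my own assumptions:

```javascript
// Parallel-projected cylindrical heightmap: one greyscale value per pixel.
// `height(u, v)` is a hypothetical sampler returning elevation in 0..1;
// `out` is a W*H array of shades. No perspective: x depends only on the
// angle and radius, never on depth, which is what keeps the method simple.
function renderFrame(out, W, H, height, rotation) {
  const zbuf = new Float32Array(W * H).fill(-Infinity);
  const STEPS = 512;                        // samples around the cylinder
  for (let v = 0; v < H; v++) {             // one cylinder slice per scanline
    for (let s = 0; s < STEPS; s++) {
      const u = s / STEPS;                  // position around the cylinder
      const a = u * 2 * Math.PI + rotation;
      const r = 0.5 + 0.5 * height(u, v / H);  // base radius + elevation
      const x = Math.round(W / 2 + r * (W / 3) * Math.sin(a));
      const z = r * Math.cos(a);            // depth toward the viewer
      if (x < 0 || x >= W || z <= zbuf[v * W + x]) continue;
      zbuf[v * W + x] = z;                  // keep the closest sample
      out[v * W + x] = 55 + 200 * r;        // simple height-based shading
    }
  }
}
```

Rotating the object is then just a matter of changing `rotation` between frames.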
This method is an old demoscene effect. I first saw it in Despair by Iguana, from 1996, and thought it might be the first implementation, but I contacted the author and he pointed me to earlier demos that included it: Parallax by Zif, Neural Assault by Rage and Hardwired by The Silents & Crionics – thanks Jare for the info!
With the cylindrical heightmap in place, I needed a good 3d model. A cylindrical heightmap is limited in the types of shapes it can reproduce, so a good model is one that adapts well to it.
The Nefertiti bust not only adapts really well to a cylindrical heightmap, it also has many properties that make it perfect for the demo. It is well known and recognizable, it is one of the finest pieces of art in history, it is extremely symmetrical – which allowed me to store data for only half of it and mirror the other half –, it is graceful and, last but not least, a 3d scan of it is readily available.
With the scanned data, I adjusted the rotation to find the maximum symmetry, and then generated and edited this heightmap, which mixes parts of the left and right sides of the bust:
The Data Decompressor
As heightmaps are images, the data decompressor is a lossy image decompressor.
The most efficient lossy image compression methods use entropy encoding. Unfortunately, based on my experiments, I believe that it is not worth using entropy encoding for js1k, due to the contest's limitations. Given this, the remaining options are limited, and I tried a good number of them:
- Simple texture compression methods – S3TC, PVRTC – don't provide enough compression.
- A decoder for ASTC – the state of the art in texture compression – takes too much space.
- Vector-based approaches provide reasonable/good results, but quite a large number of face and head features need to be captured to make Nefertiti recognizable. After several attempts with different types of curves and gradients, I didn't get any outstanding results.
- Multi-scale methods feel natural for an image with so many soft areas; however, the increased code size and the lack of entropy encoding make them a poor fit.
- Lossless compressors suitable for js1k, like jscrush, are not good enough to be used in place of entropy encoding.
There was one more option. Years ago I had experimented in this same field, trying to fit a decompressor and the data for an image into the minimum space with the highest quality, and I came up with a block compressor based on sinusoidal circular waves. I even created a js1k entry attempt that I never submitted.
The idea of sinusoidal circular waves came from thinking of ways to generate blocks containing multiple types of gradients, angles, curves and folds, with fixed-size data and without any kind of quantisation or entropy encoding. These are some examples of the blocks Nefertiti's code generates:
And these are the waves where the blocks are clipped from:
As it is not a proper transform, it is only useful in a limited-size context like js1k – for any other purpose it would be quite useless – but hey, it works great for js1k!
For images, blocking artifacts are not too bad, and they can even increase the perceived sharpness. For heightmaps they are really bad, as the grid pattern they generate is clearly visible in 3d. And they are especially bad on a face, as is the case here. So I had to implement a deblocking filter.
The deblocking I implemented is applied to all blocks as the equivalent of a bilinear interpolation:
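The idea can be sketched like this (my own reconstruction with a hypothetical `blockValue` evaluator, not the demo's golfed code): instead of filling each pixel from a single block's wave, blend the waves of the four nearest blocks with bilinear weights, so values agree at block boundaries and the grid pattern fades away.

```javascript
// Deblocking as bilinear interpolation between block generators.
// blockValue(bx, by, x, y) is a hypothetical function evaluating block
// (bx, by)'s wave at absolute pixel (x, y); here the grid is regular,
// with one block center every `blockSize` pixels.
function deblockedPixel(x, y, blockSize, blockValue) {
  // Cell of the block-center grid containing (x, y).
  const fx = x / blockSize - 0.5, fy = y / blockSize - 0.5;
  const bx = Math.floor(fx), by = Math.floor(fy);
  const tx = fx - bx, ty = fy - by;       // bilinear weights, 0..1
  const v00 = blockValue(bx,     by,     x, y);
  const v10 = blockValue(bx + 1, by,     x, y);
  const v01 = blockValue(bx,     by + 1, x, y);
  const v11 = blockValue(bx + 1, by + 1, x, y);
  return (v00 * (1 - tx) + v10 * tx) * (1 - ty)
       + (v01 * (1 - tx) + v11 * tx) * ty;
}
```

The cost is four wave evaluations per pixel instead of one, which is fine for a decompressor that runs once.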
I wanted to concentrate the detail in the face, so I made the grid intervals non-regular, as shown in the image, moving detail from the surrounding areas into the face. The image on the right is the one I ended up compressing:
Compressing the Heightmap
Before adding the deblocking, the compression was straightforward: brute-forcing block by block found the globally best solution in a couple of seconds.
But after adding the deblocking, the search space explodes: each block affects its surroundings, so a change in a block can propagate, causing a change in one of its neighbours, and so on, creating a wave of changes. So, how to compute it?
For optimization purposes, my favourite and first-choice algorithm is hill climbing. In some cases it works great; in this case I was having issues. It reached local minima very fast, and the solutions were bad. One issue even caused visual glitches: a bad block would get fixed by a neighbouring block, sometimes leaving a glitch in the process, with the two blocks becoming interlocked.
Trying to find a solution to these issues, I had a magical idea. I have searched Google to find out whether it had been discovered before, without results, but I can't be sure I was the first. If any expert is reading this – is there anybody reading this at all? – and knows about it, please contact me! Anyway, the idea:
Find 2 or more evaluation functions that share a global minimum for your problem. Choose one of those functions randomly at each step of the hill climbing.
The evaluation functions I used were mean squared error, mean absolute error and others. All the issues disappeared with this method. It is by far the best strategy I have ever used with hill climbing, and it increased the PSNR by 3dB.
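A sketch of the strategy (my own reconstruction, not the demo's tooling): at each step a random mutation is kept only if it improves a randomly chosen error metric. Since all the metrics share the same global minimum – a perfect reconstruction – switching between them helps escape the local minima that any single metric gets stuck in.

```javascript
// Hill climbing with a randomly chosen evaluation function per step.
// `mutate` perturbs a copy of the parameters, `decode` reconstructs the
// image from them, and `target` is the image being approximated.
function hillClimb(params, mutate, decode, target, steps) {
  const metrics = [
    (a, b) => a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0),    // squared error
    (a, b) => a.reduce((s, v, i) => s + Math.abs(v - b[i]), 0), // absolute error
  ];
  let best = params.slice();
  for (let i = 0; i < steps; i++) {
    const err = metrics[Math.floor(Math.random() * metrics.length)];
    const candidate = mutate(best.slice());
    if (err(decode(candidate), target) < err(decode(best), target))
      best = candidate;                     // keep only strict improvements
  }
  return best;
}
```

In the real compressor, `decode` would be the block decompressor including the deblocking, and `target` the heightmap.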
Optimizing the Decompressor for the Heightmap
The following code is the one that generates the blocks:
49 + p % .787 * 139 * Math.sin(p - ((p % 67 - 33 - i * 2) ** 2 + (p % 53 - 26 - j * 3) ** 2) / 132) ** (3 + p % 6 * 2) + p % .909 * 198
Several of the numeric constants are configurable parameters for the decompressor. By adjusting these parameters, you can generate blocks that fit the target image better or worse. I automated a search for good parameters, and so the decompressor is adapted to the heightmap.
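Wrapped into a function for readability (the wrapper and the parameter names are mine – the demo inlines the expression), the two squared terms are the squared distance to a wave center, which is what makes the waves circular. My reading of the expression is that `p` is a per-block integer parameter decoded from the data and `(i, j)` are coordinates inside the block:

```javascript
// The block-generator expression from above, unchanged, as a function.
// For integer p, the exponent 3 + p % 6 * 2 is an odd integer, so the
// sine can safely be raised to it even when it is negative.
function blockPixel(p, i, j) {
  const wave = Math.sin(
    p - ((p % 67 - 33 - i * 2) ** 2 + (p % 53 - 26 - j * 3) ** 2) / 132
  );
  return 49 + p % .787 * 139 * wave ** (3 + p % 6 * 2) + p % .909 * 198;
}
```

The `p % .787` and `p % .909` terms act as cheap pseudo-random offsets derived from the same parameter.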
Comparison with JPEG
Just as a reference, a comparison with JPEG:
To be fair, I should mention that this says more about how bad JPEG is at high compression rates than about how good my compressor is. I did the comparison by compressing the heightmap resized to 128x128. To help JPEG, I also tried resizing it to 64x64; the PSNR increased, but it was still far from my compressor's.
I also compared against WebP. WebP was clearly better than JPEG, but still worse than my compressor.
The other option available in browsers is H.264. I thought of using a single-frame video stored in H.264, but I was unable to generate a video file small enough to fit in 1kb. It might be possible, but my knowledge of video compression is limited.
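For reference, the PSNR figures quoted in these comparisons follow the usual definition for 8-bit images; a minimal sketch:

```javascript
// PSNR in dB between two equally sized arrays of 8-bit values (0..255).
// Higher is better; identical images give Infinity.
function psnr(a, b) {
  const mse = a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0) / a.length;
  return 10 * Math.log10(255 * 255 / mse);
}
```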