Animation Compression
Simple concepts that helped flesh out animation compression as a graduate student.
Hello World!
Who am I?
My name is Kerwin Ghigliotty Rivera, but you can call me Rein. I am a Tools Programmer and have worked in the games industry for some time now, specializing in UI tools for different aspects of gaming. I received a Master’s Degree in Game Programming from DePaul University in Chicago, IL, and I have been working on graphics, engines and tools in my spare time as part of my research.
What do I want to do with this post?
Just document my research steps for any would-be game developer.
Why should you listen to me?
You shouldn’t. As with any information on the internet, take what is posted here with a grain of salt and try it out yourself.
Animation Compression
Note: We are not going to be talking about Track Compression, Skinning, Rigging, Inverse Kinematics, Blending, Curves or Warping in this post, just how to handle compression of data for animated objects.
1. What is a skeletal animation
Skeletal animation refers to a list of keyframes at specific intervals for the bones or joints of a model. The term keyframe here means that for each bone we list its position, rotation and scale at a specific frame. More keyframes mean more precision in the animation, which can be both good and bad; we will get to that in a moment.
Let’s say that our animation has frames 1, 5 and 10 already made by our artists. When we read keyframe data, accessing frame 5 is just a memcpy because we already have the data for that frame. Getting frame 8 is more involved because it is not stored: we need to fetch the data for frames 5 and 10 with memcpy, then blend (interpolate) between the two, using how far frame 8 sits between them in time as the blend factor, to get the appropriate pose at that frame.
This is where having more frames is good: fewer cycles to go through, since reading a stored frame is just one memcpy, and more precision, since we don’t let the engine guess what the animation looks like between frames 1 and 10. The bad part is that we are adding more data to our animation, which comes into play when we talk about Track Compression (not here, by the way).
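To make the frame 8 case concrete, here is a minimal sketch of a linear blend between two stored position keys. The `Vec3`, `PositionKey`, `Lerp` and `SamplePosition` names are hypothetical and only for illustration; rotations would normally be blended with a quaternion interpolation instead, which is out of scope here.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration, not the engine types used in this post.
struct Vec3 { float x, y, z; };

struct PositionKey {
    int  frame;     // frame number authored by the artist
    Vec3 position;  // bone position at that frame
};

// Linear interpolation between two positions; t is expected to be in [0, 1].
inline Vec3 Lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

// Samples the position at `frame`, which must lie between the two stored keys.
inline Vec3 SamplePosition(const PositionKey& prev, const PositionKey& next, float frame) {
    assert(next.frame > prev.frame);
    assert(frame >= static_cast<float>(prev.frame) && frame <= static_cast<float>(next.frame));
    // Blend factor: how far `frame` sits between the two keys.
    const float t = (frame - static_cast<float>(prev.frame)) /
                    static_cast<float>(next.frame - prev.frame);
    return Lerp(prev.position, next.position, t);
}
```

For keys at frames 5 and 10, sampling frame 8 gives a blend factor of (8 - 5) / (10 - 5) = 0.6, so the result sits 60% of the way from the frame 5 position to the frame 10 position.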
2. What do we intend to do with compression?
First of all, let’s look at what our data for each keyframe looks like:
Translation (Position)
This is the x, y and z components of the bone’s position in our space
Scale
This determines the size adjustment of the bone at any given frame, in case we need to stretch by x, y or z
Rotation
This is a quaternion determining the current rotation of the bone in 3D space determined by x, y, z and w
Given this information, if we want some level of precision we need to store these values as decimal values. So we are going to use floats, as doubles are too expensive unless you require extreme levels of precision (we do not).
And so our initial data looks like this:
Now we see that our initial data’s size is 40 bytes (10 floats at 4 bytes each). Let’s say that our animation is comprised of 30 frames; that means 40 bytes * 30 = 1200 bytes, or ~1.17KB per bone. Modern AAA games use 100s of bones per model, so with 100 bones that is ~117KB for the whole animation.
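A layout consistent with the 40 bytes above could look like the following sketch. The `Vect`, `Quat`, `T`, `Q` and `S` names are taken from the output shown later in this post; the `Bone` struct name and the constants are assumptions for illustration.

```cpp
#include <cstddef>

struct Vect { float x, y, z; };     // 3 * 4 bytes = 12 bytes
struct Quat { float x, y, z, w; };  // 4 * 4 bytes = 16 bytes

// One uncompressed keyframe for one bone.
struct Bone {
    Vect T;  // translation (position)
    Quat Q;  // rotation
    Vect S;  // scale
};

static_assert(sizeof(float) == 4, "this sketch assumes 32 bit floats");
static_assert(sizeof(Bone) == 40, "12 + 16 + 12 bytes");

// Footprint for the example in the text: 30 frames, 100 bones.
constexpr std::size_t kFrames            = 30;
constexpr std::size_t kBones             = 100;
constexpr std::size_t kBytesPerBoneTrack = sizeof(Bone) * kFrames;       // 1200 bytes
constexpr std::size_t kBytesPerAnimation = kBytesPerBoneTrack * kBones;  // 120000 bytes
```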
Our intention here is to reduce the memory footprint our data has, to allow for better performance.
3. Major Points in Compression
We will come back to these at the end but I wanted to make sure we have them in mind going forward.
Compression Ratio - How much did we save by compressing
Quality - How much data did we give up to compress this
Compression Performance - How long does it take to compress
Decompression Performance - How long does it take to read
4. How do we approach compression
We need to understand how games work. Normally we do not scale objects in an animation unless we are doing VFX or some specific animation that requires it, so for our purposes we will assume that scale is a uniform value, and in most cases it will be 1.
So we can eliminate that scale vector and just apply a uniform value.
Now our data looks like this:
We removed 8 bytes already (a 12 byte vector replaced by a single 4 byte float), so it goes down to 32 bytes (~0.94KB for our 30 frames, ~94KB for our 100 bones).
That’s already a 20% decrease in size, which is good, but we can do better.
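As a sketch of this step, replacing the 12 byte scale vector with one 4 byte float removes 8 bytes from a 40 byte keyframe, which leaves 32 bytes. The `BoneUniformScale` name is an assumption for illustration.

```cpp
struct Vect { float x, y, z; };     // 12 bytes
struct Quat { float x, y, z, w; };  // 16 bytes

// Keyframe with the scale vector collapsed into one uniform value.
struct BoneUniformScale {
    Vect  T;  // translation, 12 bytes
    Quat  Q;  // rotation, 16 bytes
    float S;  // uniform scale, 4 bytes (usually 1.0f)
};

static_assert(sizeof(BoneUniformScale) == 32, "12 + 16 + 4 bytes");
```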
5. Introducing Float Quantization
The idea for quantization here is to take our float values and transform them into integers with a few steps.
If your values are not normalized, identify the minimum and maximum of the original 32 bit float values.
Example: -50.5 ←→ 112.5
Get the range for the normalization.
range of values = max - min ⇒ 112.5 - (-50.5) = 163
Shift the value so that the range starts at 0 (for an OriginalValue of 23.75)
CalculatedValue = OriginalValue - min ⇒ 23.75 - (-50.5) = 74.25
Normalize the calculated value to 0 ←→ 1
NormalizedValue = CalculatedValue / range of values ⇒ 74.25 / 163 ≈ 0.45552
Map NormalizedValue to a 16 bit integer by multiplying by the maximum value of a signed 16 bit integer.
Because NormalizedValue is in 0 ←→ 1, the mapped values fall in 0 ←→ 32767; the full -32767 ←→ 32767 range is only used if the values are normalized to -1 ←→ 1 instead
MappedValue = NormalizedValue * 32767 ⇒ 0.45552 * 32767 ≈ 14926
Round to the nearest integer; this is key, as we lose a lot of information by rounding incorrectly
To get the original value back we need to perform these operations in reverse order, but our decoder needs to know the original min, max and range if our initial values are not normalized.
NormalizedValue = MappedValue / 32767 ⇒ 14926 / 32767 ≈ 0.45552
CalculatedValue = NormalizedValue * range of values ⇒ 0.45552 * 163 ≈ 74.25
OriginalValue = CalculatedValue + min ⇒ 74.25 + (-50.5) ≈ 23.75
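The steps above can be sketched as a pair of functions. This is a minimal sketch under the assumption that the minimum and maximum of the track are known to both the encoder and the decoder; the `Quantize` and `Dequantize` names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest value of a signed 16 bit integer, as used in the steps above.
constexpr float kMaxQuantized = 32767.0f;

// Maps a value in [minValue, maxValue] to an integer in [0, 32767].
inline std::int16_t Quantize(float value, float minValue, float maxValue) {
    assert(maxValue > minValue);
    assert(value >= minValue && value <= maxValue);
    const float range      = maxValue - minValue;
    const float normalized = (value - minValue) / range;  // 0..1
    // Round to nearest instead of truncating to halve the worst-case error.
    return static_cast<std::int16_t>(std::lround(normalized * kMaxQuantized));
}

// Reverses Quantize; the result is the original value up to rounding error.
inline float Dequantize(std::int16_t quantized, float minValue, float maxValue) {
    const float range      = maxValue - minValue;
    const float normalized = static_cast<float>(quantized) / kMaxQuantized;  // 0..1
    return normalized * range + minValue;
}
```

With rounding to nearest, the round-trip error is at most half a quantization step, that is range / 32767 / 2, which is roughly 0.0025 for a range of 163.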
This means that our data can now be stored as 16 bit integers instead of 32 bit floats for translation and scale.
We removed 8 bytes again, so it goes down to 24 bytes (720 bytes or ~0.70KB for our 30 frames, ~70KB for our 100 bones), a great 40% down from our original 40 bytes.
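One layout that adds up to the 24 bytes above is sketched below. It assumes that the three translation components and the uniform scale are each stored as a 16 bit integer while the quaternion is still four floats; the struct names are hypothetical, and the per-track min and max needed for decoding would be stored once elsewhere, not per keyframe.

```cpp
#include <cstdint>

struct Quat { float x, y, z, w; };               // 16 bytes, not compressed yet
struct QuantizedVect { std::int16_t x, y, z; };  // 3 * 2 bytes = 6 bytes

// Keyframe after quantizing translation and scale.
// Q is placed first so that the 4 byte alignment of its floats adds no padding.
struct BoneQuantized {
    Quat          Q;  // rotation, 16 bytes
    QuantizedVect T;  // quantized translation, 6 bytes
    std::int16_t  S;  // quantized uniform scale, 2 bytes
};

static_assert(sizeof(BoneQuantized) == 24, "16 + 6 + 2 bytes");
```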
6. Quaternion Compression
This one is a bit tricky. Some people say that reducing the Quaternion to 32 bits (10 bits per element and 2 bits for an index) is optimal, but the post I followed mentioned that this caused some shaking in complex animations due to rounding errors and the like, and that 48 bits per Quaternion is the absolute minimum we should try. I will stick with 56 bits per Quaternion (three 16 bit elements and an 8 bit index), as that is enough to prove the concept and easier to show.
Here is where the “smallest three” trick comes in from this article
The idea is that since a Quaternion represents a rotation, its length must be 1.
This, to be exact: x² + y² + z² + w² = 1
We can instead move things around and derive the following: w = sqrt(1 - x² - y² - z²)
Here x, y and z are smaller in absolute value than w, hence the “smallest three” name.
This is the case where w has the largest absolute value; otherwise drop x, y or z, whichever has the largest absolute value.
And we don’t need to worry about the sign of the dropped component, because if it is negative we can just negate the entire quaternion before encoding; q and -q represent the same rotation ⇒ q(x, y, z, w) == q(-x, -y, -z, -w)
Now our data looks like this (where a, b and c are the 3 smallest and index is the index of the largest component so we can get it using the other 3)
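A minimal sketch of the trick, keeping the three remaining components as floats for now. The `SmallestThree` struct and the function names are hypothetical, and ties between equally large components are broken in favor of the first one, which may differ from what my exporter does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Quat { float x, y, z, w; };

struct SmallestThree {
    float        a, b, c;  // the three kept components, in x, y, z, w order
    std::uint8_t index;    // dropped component: 0 = x, 1 = y, 2 = z, 3 = w
};

inline SmallestThree EncodeSmallestThree(const Quat& q) {
    const float v[4] = { q.x, q.y, q.z, q.w };
    // Find the component with the largest absolute value.
    std::uint8_t largest = 0;
    for (std::uint8_t i = 1; i < 4; ++i) {
        if (std::fabs(v[i]) > std::fabs(v[largest])) largest = i;
    }
    // q and -q are the same rotation, so flip all signs if the dropped
    // component is negative; the decoder can then assume it is non-negative.
    const float sign = (v[largest] < 0.0f) ? -1.0f : 1.0f;
    float kept[3];
    int k = 0;
    for (int i = 0; i < 4; ++i) {
        if (i != largest) kept[k++] = v[i] * sign;
    }
    return { kept[0], kept[1], kept[2], largest };
}

inline Quat DecodeSmallestThree(const SmallestThree& s) {
    assert(s.index < 4);
    const float sumSquares = s.a * s.a + s.b * s.b + s.c * s.c;
    // The clamp guards against a tiny negative value caused by rounding.
    const float dropped = std::sqrt(std::fmax(0.0f, 1.0f - sumSquares));
    const float kept[3] = { s.a, s.b, s.c };
    float v[4];
    int k = 0;
    for (int i = 0; i < 4; ++i) {
        v[i] = (i == s.index) ? dropped : kept[k++];
    }
    return { v[0], v[1], v[2], v[3] };
}
```

Note that decoding a quaternion whose largest component was negative returns -q instead of q, which is the same rotation.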
In my personal testing (with an 11 bone model) I found that it works nicely with some precision lost and size is reduced by around half.
//RESULTS FROM OUTPUT WINDOW
// Bone: 5
pTmp->poBone[5].T = Vect( 0.000030f, -0.008568f, 0.000247f );
pTmp->poBone[5].Q = Quat( -0.592112f, 0.386528f, 0.386527f, 0.592112f );
pTmp->poBone[5].S = Vect( 1.000000f, 1.000001f, 1.000000f );
// COMPRESSED Bone: 5
pTmp->poBone[5].T = Vect( 2, -561, 16 );
pTmp->poBone[5].S = 1;
Smallest three = ( -0.592112f, 0.386528f, 0.386527f);
Index = 3 (w)
// DECOMPRESSED Bone: 5
pTmp->poBone[5].T = Vect( 0.000031f, -0.008560f, 0.000244f );
pTmp->poBone[5].Q = Quat( -0.592112f, 0.386528f, 0.386527f, 0.592112f );
pTmp->poBone[5].S = Vect( 1.000000f, 1.000000f, 1.000000f );
Exported Results
If we go a step further we can take the same approach as the position and apply quantization to the quaternion.
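As a sketch of that step, each kept component is already in -1 ←→ 1, so it can be scaled by 32767 directly with no min and max to store. The names below are hypothetical, the layout is one assumption that yields 56 bits of payload per quaternion and 16 bytes per keyframe including padding, and this version rounds to nearest, so the last digit can differ from a truncating implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr float kQuantScale = 32767.0f;  // largest signed 16 bit value

// Smallest three with each kept component stored as a signed 16 bit integer.
struct PackedQuat {
    std::int16_t a, b, c;  // 3 * 16 bits
    std::uint8_t index;    // 8 bits, 56 bits of payload (sizeof is 8 with padding)
};

// Assumed layout for the fully compressed keyframe.
struct QuantizedVect { std::int16_t x, y, z; };
struct BoneCompressed {
    QuantizedVect T;  // 6 bytes
    std::int16_t  S;  // 2 bytes
    PackedQuat    Q;  // 8 bytes including 1 byte of padding
};
static_assert(sizeof(BoneCompressed) == 16, "expected 16 bytes including padding");

// Maps a value in [-1, 1] to [-32767, 32767].
inline std::int16_t QuantizeUnit(float v) {
    assert(v >= -1.0f && v <= 1.0f);
    return static_cast<std::int16_t>(std::lround(v * kQuantScale));
}

inline float DequantizeUnit(std::int16_t q) {
    return static_cast<float>(q) / kQuantScale;
}

// Rebuilds the dropped (largest) component from the three quantized ones.
inline float ReconstructDropped(const PackedQuat& p) {
    const float a = DequantizeUnit(p.a);
    const float b = DequantizeUnit(p.b);
    const float c = DequantizeUnit(p.c);
    return std::sqrt(std::fmax(0.0f, 1.0f - (a * a + b * b + c * c)));
}
```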
Trying these changes now produces this
//RESULTS FROM OUTPUT WINDOW
// Bone: 11
pTmp->poBone[11].T = Vect( 0.003649f, 0.000000f, 0.002990f );
pTmp->poBone[11].Q = Quat( 0.707107f, 0.000000f, 0.707107f, -0.000000f );
pTmp->poBone[11].S = Vect( 1.000000f, 1.000000f, 1.000000f );
// COMPRESSED Bone: 11
pTmp->poBone[11].T = Vect( 120, 0, 98 );
pTmp->poBone[11].S = 1;
Smallest three = ( 0, 23169, 0);
Index = 0 (x)
// DECOMPRESSED Bone: 11
pTmp->poBone[11].T = Vect( 0.003662f, 0.000000f, 0.002991f );
pTmp->poBone[11].Q = Quat( 0.707130f, 0.000000f, 0.707083f, 0.000000f );
pTmp->poBone[11].S = Vect( 1.000000f, 1.000000f, 1.000000f );
These are the results of my testing. I think they are pretty good, considering that the models used in AAA titles are huge compared to my test model and their animations can be lengthy.
7. Conclusion
Applying these methods gave a large reduction in memory footprint at the cost of some precision, while keeping the animation viable.
The size was reduced by 60% (from 40 bytes to 16 bytes per keyframe) and can still be improved. For example, you can do some bit packing and turn the index for the smallest three trick into 2 bits (00, 01, 10, 11), which would remove 6 bits per quaternion (not a lot, but when sending data over a network every bit counts). You can also perform bit packing on a, b and c, and even on the translation elements, to make them take less space (a, b and c being 15 bits each and the index being 3 bits, for 48 bits in total). I chose to keep this as is to keep some precision. This is just me scratching the surface; I am sure there are ways to improve it further.
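A minimal sketch of the 48 bit packing idea, which I did not implement for this post. It assumes 15 bits per kept component and a 3 bit index, and it also remaps each component from the -1/sqrt(2) ←→ 1/sqrt(2) range (the largest magnitude a non-largest component of a unit quaternion can have) so that no bits are wasted. All names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int           kComponentBits = 15;
constexpr std::uint64_t kComponentMax  = (std::uint64_t(1) << kComponentBits) - 1;  // 32767
constexpr float         kLimit         = 0.70710678f;  // 1 / sqrt(2)

// Maps a component in [-kLimit, kLimit] to an unsigned 15 bit value.
inline std::uint64_t PackComponent(float v) {
    float normalized = (v + kLimit) / (2.0f * kLimit);  // 0..1
    normalized = std::fmin(1.0f, std::fmax(0.0f, normalized));
    return static_cast<std::uint64_t>(
        std::lround(normalized * static_cast<float>(kComponentMax)));
}

// Reads the low 15 bits of `bits` and maps them back to [-kLimit, kLimit].
inline float UnpackComponent(std::uint64_t bits) {
    const float normalized = static_cast<float>(bits & kComponentMax) /
                             static_cast<float>(kComponentMax);
    return normalized * 2.0f * kLimit - kLimit;
}

// Bit layout, high to low: index (3 bits) | a (15 bits) | b (15 bits) | c (15 bits).
inline std::uint64_t PackQuat48(float a, float b, float c, std::uint8_t index) {
    assert(index < 4);
    return (static_cast<std::uint64_t>(index) << 45) |
           (PackComponent(a) << 30) |
           (PackComponent(b) << 15) |
            PackComponent(c);
}
```

To unpack, shift the value right by 45 for the index, and by 30, 15 and 0 before calling `UnpackComponent` for a, b and c. Only the low 48 bits of the result are used, so a real exporter would write 6 bytes per quaternion.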
These are my animations for each major step:
Original Animation @ 40 Bytes
Compressed Animation @ 24 bytes
Compressed Animation Final @ 16 bytes (given padding)
As you can see, all 3 animations behave the same (I promise), which is what we want.
Now going back to our compression points:
Compression Ratio - How much did we save by compressing
We save around 60% of memory footprint per keyframe
Quality - How much data did we give up to compress this
Not enough to make it unviable, and it can still be improved further
Compression Performance - How long does it take to compress
Minimal cycles as we just need to apply quantization
Decompression Performance - How long does it take to read
Minimal Cycles as we just need to reverse the quantization
Material I researched for this exercise:
For more interesting animation stuff in video games please check these YouTube channels as they helped me understand a lot on video game concepts.