So bytecode injection into a fla file is neat, yes, but many people are wondering if there is a single useful thing it can do. Well that is certainly unfair. Isn't being cool enough anymore? Should I notify Hollywood?
Ok, here is something semi useful. If you want to put data into flash in a very compact way, chucking it in an array isn't going to do it. We can look at the vector data for shapes for a comparison, as that can get quite large.
A simple box in the drawing API is generally made up of four points and an origin (though the swf stores 8 points). So that is 10 numbers, and all of them are usually floats (meaning they have a decimal). If you store those in an array, each number takes 9 bytes (8 for a float, plus a push type byte), plus some array overhead. So you are looking at something like 100 bytes just to define the shape of a box: no color, no stroke, etc. Ouch.
In order to get that shape data smaller, the first thing you should do is multiply each number by 20 (the highest resolution in swf is 1/20th of a pixel, called a twip), and round it to an integer (no decimal). Storing an integer in an array only takes 5 bytes (four plus a push type). Then when reading your array, you just divide everything by 20 and you get pixels. This brings the same box down to a little more than 50 bytes - hey that was easy.
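To make the arithmetic concrete, here is a minimal Python sketch of the multiply-by-20 trick. The box coordinates and the names to_twips and to_pixels are made up for illustration; the byte counts in the comments are just the 9-versus-5 figures from the text, ignoring array overhead.

```python
TWIPS_PER_PIXEL = 20  # the finest swf resolution is 1/20th of a pixel

def to_twips(pixels):
    """Scale pixel coordinates by 20 and round to whole twips."""
    return [int(round(p * TWIPS_PER_PIXEL)) for p in pixels]

def to_pixels(twips):
    """Undo the scaling when reading the array back."""
    return [t / TWIPS_PER_PIXEL for t in twips]

# Hypothetical box: an origin plus four corners, as x,y pairs (10 numbers).
box = [10.0, 10.0, 110.5, 10.0, 110.5, 60.25, 10.0, 60.25, 10.0, 10.0]
twips = to_twips(box)

bytes_as_floats = 9 * len(box)    # 8 data bytes + 1 push type byte each
bytes_as_ints = 5 * len(twips)    # 4 data bytes + 1 push type byte each
```

With these numbers the float version costs 90 bytes and the integer version 50, before any array overhead.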
Still if you look at typical data, there will be a lot of zeros in there. The number 13.5 will be stored as something like 0000000270. In fact most of your numbers under a million will tend to have a lot of zeros in them. One thing you can do here, what the swf format does in fact, is find the minimum number of bits needed to represent every number, and only store that number of bits. Working still in decimal for a minute, you could take your list of numbers (the ones multiplied by 20), and it might look like this:
423
270
4522
34
23423
343
Clearly the fifth one there is the problem child. It needs five digits, the others don't. Still if you can save those all in five digits rather than the original ten, that is cutting the size in half. So now you might store it as a long sequence that looks like:
56004230027004522000342342300343
First, the 5 says there are 5 digits in each number. Second, the 6 says there will be six entries. Third, the entries, 5 digits each. You can see it better like this:
5 6 00423 00270 04522 00034 23423 00343
That is exactly how you do it, except you use binary instead of decimal. Of course you don't want to use binary yourself, because that is only sexy if someone is watching. And no one likes to watch someone using binary. So you write a little program that converts numbers to a binary nBit stream, which is very sexy - and people like to watch that.
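Here is roughly what such a little program could look like, sketched in Python rather than actionscript. The layout is an assumption made for the example: a 5-bit field for the bit width (the same size swf uses for its own nBits fields), a 16-bit entry count, then the entries, padded with zero bits to a whole byte. It only handles non-negative numbers, and the names pack_nbits and unpack_nbits are hypothetical.

```python
def pack_nbits(values, nbits_field=5, count_field=16):
    """Pack non-negative ints as: bit width, entry count, then the entries."""
    nbits = max(1, max((v.bit_length() for v in values), default=0))
    bits = format(nbits, "0%db" % nbits_field)
    bits += format(len(values), "0%db" % count_field)
    for v in values:
        bits += format(v, "0%db" % nbits)
    bits += "0" * (-len(bits) % 8)          # pad to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def unpack_nbits(data, nbits_field=5, count_field=16):
    """Read the stream back: width first, then count, then each entry."""
    bits = "".join(format(b, "08b") for b in data)
    pos = 0

    def take(n):
        nonlocal pos
        value = int(bits[pos:pos + n], 2)
        pos += n
        return value

    nbits = take(nbits_field)
    count = take(count_field)
    return [take(nbits) for _ in range(count)]

numbers = [423, 270, 4522, 34, 23423, 343]   # the list from the text
packed = pack_nbits(numbers)
```

The problem child 23423 needs 15 bits, so the six entries plus the header come to 111 bits, or 14 bytes, against 30 bytes for six pushed ints.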
Ok, for those who are still reading, this is the reward part - because I'm actually going to start talking about the subject I'm supposed to be talking about - bytecode and data. When you write your little program to convert your data to nBits (that is the official name I've given this type of encoding, because that is the variable name the swf format chooses for the 'number of bits'. I have no idea if that is right, so don't use it at a party until someone else does). Sorry, did I start a sentence? Ok, here is the predicate, but first a refresher: 'When you write the nBit program' ... you will notice that every fifth byte you write is 07 (the push type for an int), or 00000111 as they say in binary. So at least 20% of your data will be overhead. No problem, you think: what else can we push? Something bigger than integers, perhaps? Well, here is the table:
push 0: // string     len+1 bytes
push 1: // sFloat     4 bytes
push 2: // null       0 bytes
push 3: // undefined  0 bytes
push 4: // register   1 byte
push 5: // bool       1 byte
push 6: // dFloat     8 bytes
push 7: // int        4 bytes
push 8: // lookup     1 byte
Ok, double floats look good, strings look interesting. So how do you push such a float? When you push an int, it's only a little bit tricky, mostly because when the top bit is set, the number becomes negative. However you can just use ~num+1 to get that right. The problems with floats are 1) they are very very hard to derive from bits because the decimal moves, they are made of parts, etc, and 2) there are certain numbers that are just invalid.
So the idea is that instead of calculating floats you write those bits directly into bytecode, and who cares what float they are supposed to represent. You just want the bits, which you will mask, shift, and convert to little integers. Ok, so go write that real quick and come back - really - it isn't that hard. . . . Ok, did you test it? If you did test it, you may have noticed one little thing - it didn't work. Actually the swf player will convert that value to a float before anything else, which doesn't give up its bits easily, especially when its value is NaN. Oh well, sorry about that.
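You can see the NaN problem without the player. The Python sketch below reinterprets raw 64-bit patterns as doubles, using the standard IEEE 754 rule that an all-ones exponent with a non-zero mantissa is NaN. The two patterns and the function names are made up for the example, and it says nothing about what the player does internally.

```python
import math
import struct

def bits_to_double(pattern):
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", pattern))[0]

def is_nan_pattern(pattern):
    """NaN means: exponent bits all ones, mantissa bits not all zero."""
    exponent = (pattern >> 52) & 0x7FF
    mantissa = pattern & ((1 << 52) - 1)
    return exponent == 0x7FF and mantissa != 0

# Two different chunks of 'data' that both land in NaN territory.
a = bits_to_double(0x7FF8000000000001)
b = bits_to_double(0x7FF80000DEADBEEF)
both_nan = math.isnan(a) and math.isnan(b)
```

Both values are NaN as far as any arithmetic is concerned, so once your data is a number the script has no obvious way to tell which bits it started with.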
That does leave strings though. So why not just paste a string into actionscript, like:
x="don't make me write another program that doesn't work you goober".
Well that is fine if all your numbers are between about 32 and 128. The other ones can be a little tricky to paste. You may be old enough to know that some of them can even cause a little bell to ring, or the carriage to return with a zingzang sound. There are also many 'international' versions of ASCII (which only include European languages, it seems). This tells us pasting will be problematic, and programmers should really get out more often. Above and beyond that, there is UTF8 and Unicode, which explains why the world prefers programmers who choose to remain indoors.
UTF8 is weird. It is a variable number of bytes per character, and some of those bytes are guaranteed to never be certain values. That kind of leaves it out for our purposes, but writing international characters in bytecode would be a good way to ensure their integrity and include them in the fla. You might as well, most people wouldn't even notice.
For our purposes we will use single byte characters. Strings are always null terminated (0) in swf, so putting hex data into them is easy as pie - just write out your bytes and add a zero. The keeners amongst you (and anyone still reading has to be a keener) will be wondering what happens when your data includes a zero in it. No worries, the player will just terminate your string there and it won't work. Ok, so strings are out too, which leaves you with ints. At least those are easy to put into flash as hex values.
Just kidding, but not as much as I would have liked. What you can do is modify your data in such a way that you will not get any unwanted zeros. As a lame example of this, you could escape your zeros. So pick a number for an escape flag, say 0x6D (I've always hated that number) and whenever you see it, the next number will determine if the pair represents 0x6D, or zero.
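The lame example fits in a few lines of Python. The flag 0x6D is the one from the text; the two second-byte codes (0x01 for a zero, 0x02 for a real 0x6D) are picked arbitrarily for this sketch, the only rule being that neither of them is zero.

```python
ESC = 0x6D        # the escape flag from the text
ESC_ZERO = 0x01   # hypothetical code: ESC, 0x01 stands for a 0x00 byte
ESC_SELF = 0x02   # hypothetical code: ESC, 0x02 stands for a literal 0x6D

def escape_zeros(data):
    """Rewrite the data so that no 0x00 byte remains."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += bytes([ESC, ESC_ZERO])
        elif b == ESC:
            out += bytes([ESC, ESC_SELF])
        else:
            out.append(b)
    return bytes(out)

def unescape_zeros(data):
    """Reverse escape_zeros: each ESC pair turns back into one byte."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            code = next(it)                  # the byte after the flag
            out.append(0x00 if code == ESC_ZERO else ESC)
        else:
            out.append(b)
    return bytes(out)

raw = bytes([0x05, 0x00, 0x6D, 0x00, 0xFF])
safe = escape_zeros(raw)
```

Every zero and every 0x6D now costs two bytes instead of one, so this only pays off if those values are rare in your data.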
Of course, if you are going that route anyway you can squeeze some extra compression out of it. You could use the second number(s) to represent an index in a table, and a repetition count - then use that for the most common segments of data. Don't use table index zero, and don't use a zero count, voilà! This type of thing can be very effective, and though it is somewhat CPU intensive to compress (from an actionscript point of view), you are generating your data beforehand so that doesn't matter. It is, however, very fast to uncompress, it is smaller, and it has no zeros. Life is good.
I will post an example of this in the next little while, as I add that option in AsDraw (right now it uses 32bit signed ints). More importantly, I'll do some speed and size tests - I'm not sure what those will look like. The code will probably be in flasm though.
As an aside, below is the way you push data into an array in swf (all numbers hex). More info in the Swf6 spec from the good people at Macromedia.
// [1,2,3,4,5];
// 96 1E00 07 05000000 07 04000000 07 03000000 07 02000000 07 01000000 07 05000000 42 17 00
96          Push
1E 00       length of push data (always the lsb first, so that is 0x001E)
07          Push type (int) - see above table
05000000    data(5) - they pop off in order, so the last index is always pushed first
07          Push type (int)
04000000    data(4)
07          Push type (int)
03000000    data(3)
07          Push type (int)
02000000    data(2)
07          Push type (int)
01000000    data(1)
07          Push type (int)
05000000    number of items in array
42          InitArray - creates the array of 'popped' length, and pop,pop... values. This value gets pushed.
17          Pop - this pulls it off the stack, normally you would assign it to a var, etc, here.
00          End of actions.
Ints are always stored backwards, but your nBits data can be stored left to right as you read it if you like. You don't care about the int values at all, so it makes no difference.
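If you would rather generate those bytes than type them, the listing above can be rebuilt with a short Python sketch. The opcode values are the ones shown in the listing; the function name push_int_array is made up, and like the listing it only covers the push / InitArray / Pop / End case.

```python
import struct

ACTION_PUSH = 0x96
ACTION_INIT_ARRAY = 0x42
ACTION_POP = 0x17
ACTION_END = 0x00
PUSH_TYPE_INT = 0x07

def push_int_array(values):
    """Build the bytes that push an array of 32-bit ints, then pop it."""
    payload = b""
    for v in reversed(values):               # the last index is pushed first
        payload += struct.pack("<Bi", PUSH_TYPE_INT, v)
    # The item count goes on last, as one more pushed int.
    payload += struct.pack("<Bi", PUSH_TYPE_INT, len(values))
    header = struct.pack("<BH", ACTION_PUSH, len(payload))  # length, lsb first
    tail = bytes([ACTION_INIT_ARRAY, ACTION_POP, ACTION_END])
    return header + payload + tail

code = push_int_array([1, 2, 3, 4, 5])
```

For [1,2,3,4,5] the payload is 30 bytes (0x1E), six of which are the 07 type bytes - that is the 20% overhead mentioned earlier.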
Last thing..
If you want a look at a pretty fast nBits routine written in actionscript, you can look at the _gBits and _gBitsN (negative) methods here.
Whew, now back to drinking.
posted on Thursday, October 16, 2003 4:49 AM