New compression method: Exomizer 2
After I finished optimizing aPLib's decoder, I was told that pucrunch compressed better. While looking for information about pucrunch (it turns out that sometimes aPLib wins, but most of the time pucrunch is a bit better), I found another packer that clearly beats both on compression: exomizer 2.
http://hem.bredband.net/magli143/exo/
Since there was no Z80 decompressor, I wrote one, and also made an optimizer that takes the output, removes two unneeded bits and rearranges the bit organization, so the depackers can be optimized a bit more.
The depackers are between 169 and 190 bytes, and they also need space to generate a 156-byte table, but that table can be discarded after decompression is done, so if you place it in a buffer it shouldn't take any extra memory. Decompression is slow, because the algorithm is complex.
http://www.speccy.org/metalbrain/exo_v3.zip
The normal versions of the depackers use the output from exomizer in raw mode, forwards, so you produce the data file with the command line:
exomizer raw -o output input
The optimized versions also require running my optimizer, so the command lines would be:
exomizer raw -o temporal input
exoopt temporal output
As for the difference between deexo.asm and deexo_simple.asm, the simple version requires the table to be aligned on a 256-byte boundary, and doesn't handle literal sequences (which are rare; you may also force exomizer not to generate them using the -c parameter).
Comments
I did a few tests on large (~40K) blocks and it compresses to ~1K less than aPLib or puCrunch.
Thanks!
123609 bytes for bitbuster_extreme0.1
118341 bytes for MegaLZ
117791 bytes for pletter 0.5a
114715 bytes for aPLib
113927 bytes for pucrunch
110121 bytes for exomizer 2 (no literal sequences)
110116 bytes for exomizer 2 (no literal sequences, optimized)
110089 bytes for exomizer 2
110085 bytes for exomizer 2 (optimized)
Antonio Villena has optimized the depackers, so they're now a bit smaller and also faster:
http://www.speccy.org/metalbrain/exo_v4.zip
PS: Depacker sizes are now between 163 and 187 bytes.
What I want to be able to do is to decompress pieces; not all my data at once. I realise you can't do this with an ordinary compressed file. And if you compress lots of small files separately, each will have its own tree/table for decompression, which takes up more space.
But maybe compress one or more separate files using one shared compression table/tree into one archive, then the ability to pull out one file at a time from the archive? For level data in a game, where there are many similarities between levels (for good compression) but you want to decompress one level at a time.
- IONIAN-GAMES.com -
It depends on the compression technique used. The LZ techniques aren't usually based on tables or trees, but just on looking back at previous data. A bigger piece of data will usually compress better than two separate ones because the second half can use the first half as a reference, not because they share the same tree and therefore save space on it.
Having said that, exomizer 2 is a bit special, because it DOES have a small table (26 bytes, later expanded into a bigger table) at the beginning to decode the most used lengths. aPLib or pucrunch will probably serve you better if you have many small pieces of data.
I don't know of any method that would serve your particular interests for your game. Maybe it's better if you develop one yourself. Hmmm... I'm thinking about a "dictionary" added at the beginning of each level (providing the same "lookback context" for all levels), and then removing the common part from every level... but everything depends on the data. How similar would those similarities be?
What you say about trees seems more related to statistical methods such as Huffman or arithmetic coding... but those are usually adaptive, building and modifying the tree as they progress instead of storing it and using it for the whole file.
Does the decompression always need to see everything it has already decompressed, or just some of it? Can you work your way through the data from the beginning, decompressing it all, but into a small buffer (e.g. 1K) that wraps around and overwrites itself as it goes? So you throw away older stuff and just keep the most recent 1K, then stop when you get to the end of your chosen piece?
So for example if your level data is 1K per level, and you have 10 levels, to start with you compress 10K of data. If you want to get level 5, you tell it to decompress up to 5K of data into a rolling 1K buffer. Then you know, once it has worked its way through 5K, you are still holding the last 1K of data it decompressed and that is level 5.
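Something like this minimal sketch, in C for clarity (next_byte() and decoder_state are hypothetical stand-ins for whatever streaming depacker you'd use, not any real routine):

#include <stdint.h>
#include <stddef.h>

#define LEVEL_SIZE 1024                      /* 1K per level, as above */

struct decoder_state;                        /* hypothetical decoder */
extern uint8_t next_byte(struct decoder_state *st);

/* Decompress up to and including level n (1-based); on return,
   buf[] holds the last LEVEL_SIZE bytes produced, i.e. level n. */
void extract_level(struct decoder_state *st, uint8_t buf[], int n)
{
    size_t total = (size_t)n * LEVEL_SIZE;
    for (size_t i = 0; i < total; i++)
        buf[i % LEVEL_SIZE] = next_byte(st); /* wraps and overwrites */
}

This only works if no match in the compressed stream looks further back than the buffer, which is exactly the limitation raised in the next reply.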
(But is it quick enough to use in a game between levels?)
The alternative would be to compress 6K of data, extract it to a (blacked-out) screen, copy some level data into higher memory, then clear the screen. But then you are limited to 6K files.
- IONIAN-GAMES.com -
Actually, it would not be so hard. If you use some algorithm of the LZ77 family, you could do as you proposed.
BUT - LZ77 works by replacing a sequence of bytes with a pair [offset, length] whenever the same sequence was seen in the 'past'. In this case the 'past' reaches back as far as the maximum value of the offset, so you need a buffer big enough to keep <offset> bytes. A smaller maximum offset means a smaller buffer, but also seeing less of the other data. So, generally speaking, LZ77 is not a good choice here. Still, it should outperform RLE significantly.
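As a rough C sketch of that [offset, length] mechanism (hypothetical stream format, illustration only), the decoder keeps a ring buffer at least as big as the maximum offset:

#include <stdint.h>
#include <stddef.h>

#define WINDOW 1024          /* must be >= the maximum match offset */

static uint8_t window[WINDOW];
static size_t  pos;          /* total bytes output so far */

static void emit(uint8_t b)  /* every output byte goes into the window */
{
    window[pos++ % WINDOW] = b;   /* ...and to the real consumer here */
}

/* Copy a match of `length` bytes starting `offset` bytes back;
   overlapping copies (offset < length) work byte by byte. */
static void copy_match(size_t offset, size_t length)
{
    size_t src = pos - offset;    /* requires offset <= pos */
    while (length--)
        emit(window[src++ % WINDOW]);
}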
On the other side, LZ78 tries to create a sequence dictionary and replace every occurrence of a sequence with a reference to the dictionary. But it is a big mistake to consider the dictionary as something that makes the data bigger. Actually, the principle is the very same as in LZ77: replace an occurrence of a sequence with a pair [pointer to somewhere, length of sequence].
The difference is that LZ77 accesses the data sequentially, while LZ78 takes a random-access approach.
In your case LZ78 would be the better choice, I think.
I have seen a simple text packer for the ZX which used a kind of dictionary compression.
The purpose of the compressor was similar to yours: it allows decompressing a particular message while exploiting similarities across all messages.
The compressor is strictly byte-based. It uses a special character as a separator, so the input to the compressor looks like message1<separator>message2<separator>... At first, it scans the whole input and adds the separator and every character present as the first N entries of the dictionary (for English case-sensitive messages N could be around 70).
The dictionary has 256 entries, so 256-N entries are still empty.
Now the compressor scans for occurrences of byte pairs in the input and selects the most common byte pair.
Then it adds that pair as the next entry in the dictionary and replaces all its occurrences with the number of that dictionary entry. This is repeated until the dictionary is full.
It is important to understand that not only characters but also dictionary entries can be paired, so it is not uncommon for a later entry to actually represent quite a deep binary tree.
The <separator> can be either included in or excluded from this process; that's your choice:
- if you include the <separator> in the pairing process you get better compression, but you have to start decompression from the beginning of the data, count the <separator>s you pass, and throw everything away until you reach the desired message
- on the other side, if you exclude the <separator> from compression, it remains as a separator between the compressed messages, and searching for a particular message is very easy and much faster.
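A simplified sketch of one pairing round in C (ignoring the separator handling above; all names are made up, and it's quadratic, for illustration only):

#include <stdint.h>
#include <string.h>
#include <stddef.h>

uint8_t pair_l[256], pair_r[256];       /* the two halves of each entry */

/* One replacement round over data[0..len): find the most common
   adjacent pair, record it as entry next_code, replace it everywhere.
   Returns the new length, or len unchanged if no pair repeats. */
size_t pack_one_pair(uint8_t *data, size_t len, uint8_t next_code)
{
    static int counts[256][256];
    memset(counts, 0, sizeof counts);
    for (size_t i = 0; i + 1 < len; i++)
        counts[data[i]][data[i + 1]]++;

    int best = 1, bl = -1, br = -1;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (counts[a][b] > best) { best = counts[a][b]; bl = a; br = b; }
    if (bl < 0)
        return len;                     /* nothing worth replacing */

    pair_l[next_code] = (uint8_t)bl;    /* record the new dictionary entry */
    pair_r[next_code] = (uint8_t)br;

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (i + 1 < len && data[i] == bl && data[i + 1] == br) {
            data[out++] = next_code;    /* replace the pair... */
            i++;                        /* ...and skip its second byte */
        } else
            data[out++] = data[i];
    }
    return out;
}

The caller just repeats this with next_code running from N up to 255, stopping early if the returned length stops shrinking.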
The decompression process is quite simple:
You have a dictionary of 256 entries, each occupying two bytes. The first N+1 entries are occupied by the <separator> and the N characters present in the unpacked data. These entries have the character in the first byte; the second byte is unused. The other entries are dictionary phrases consisting of two other entries (each of which can be either a character or a phrase).
You take a byte from the compressed message and check whether it is a character.
If so, output the character and continue with the next byte.
If the byte is a phrase, take its first byte and recursively decompress it. When you return (from possibly many nested recursive calls), do the same with the second byte, and then return.
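In C, the whole decoder is little more than this sketch (FIRST_PAIR and the table layout follow the description above; the names are made up, not taken from any real packer):

#include <stdio.h>
#include <stdint.h>

#define FIRST_PAIR 71   /* separator + ~70 characters, as in the text */

extern uint8_t pair_l[256], pair_r[256];   /* built during compression */

void expand(uint8_t code)
{
    if (code < FIRST_PAIR)
        putchar(pair_l[code]);   /* literal entry: character in first byte */
    else {
        expand(pair_l[code]);    /* left half, possibly another phrase */
        expand(pair_r[code]);    /* then the right half */
    }
}

Decompressing a message is then just calling expand() on each byte of the compressed message until the separator comes up.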
I don't know how wide a code space you need for your levels, but in case you have fewer than 100 tiles the algorithm should be quite efficient. There is also always the possibility of working not with bytes but with wider or narrower code spaces, say 7 bits or 10 bits. Obviously, that complicates the implementation.
Speaking about efficiency, what I described above is, more or less, the Tolkien 5.0 text compressor used by Proxima in the 90s for some of their adventures. The author of Tolkien, George K., stated the compression ratio to be up to 50%. Tolkien was used at least in 'Heroes' and 'Jmeno ruze', both using case-sensitive texts with Czech diacritics (so the code space could be slightly smaller than 90 characters), and while I don't know the exact compression ratio, the amount of text in those games is rather impressive.
Oops! :oops:
I screwed up when re-adding the old comments to the new code: I added to the normal (non-simple) version an old line that shouldn't be there and that f*cks everything up. It's been fixed already, and you may download it again from the same link.
It might have been about an Amiga port of Rainbow Islands?
The article described how the compression and decompression of the level data were handled.
I think it mentioned it could take hours to compress but only seconds to decompress?
Does anyone remember what the magazine was?
At the time it inspired me to write a huffman coding routine using turbo pascal.
That would be an interesting article if anyone knows the link...
We've just made a small new release. The code hasn't changed at all, but after being asked a couple of times about the depacking routines' license, I've decided to put them under the LGPL 2.1. The license and updated files can be found here:
https://sourceforge.net/p/emuscriptoria/code/HEAD/tree/deexo/
http://goo.gl/q2j0NZ - make ZX Spectrum choose your own adventure games, no programming needed (release 3)
I switched to Exomizer for my TR-DOS loader at some point[1] for better compression, and I remember making a trivial (tm) modification to the code so that IY is replaced by IX and vice versa, allowing me to save 1-2 bytes elsewhere. Meanwhile, the source code of the loader isn't released (there's hardly any point), and the software I've used it with so far is mostly closed source as well. In LGPL terms, I suppose this amounts to 'static linking'[2] of a 'modified library' with 'proprietary software' and therefore is a violation of this licence. Is this correct?
[1] For the record, there was no mention of a licence and/or usage terms within the old unpacker source at the time of this switch!
[2] Additionally, a clarification of the meaning of 'static/dynamic linking' on the Speccy would be nice. ORG HardcodedAddress: INCBIN "exomizer.bin"...? (:
http://goo.gl/q2j0NZ - make ZX Spectrum choose your own adventure games, no programming needed (release 3)
Sorry for the late response, I didn't really expect any kind of opposition to this.
1 - If you were using an old version from before the license was added, there's no need for you to follow it.
2 - If someone wants to break the license, I promise I won't cry. I won't contact any lawyer either, or send any threatening or nasty email. I did this just because some people were asking for it, and chose LGPL because it's a well-known one (and it's not as restrictive as GPL: I would perfectly understand opposition against that one).
However, I also think that releasing the modified file isn't really that hard.
3 - I don't think dynamic linking makes sense on the Spectrum. I guess technically it might be possible to do it (as some kind of "proof of concept"), but I think it would be too complex and wouldn't bring any advantage.
Meanwhile, for those who do not want to break it, it wouldn't hurt to give at least a rough idea of what does and does not constitute breaking this sort of license in terms of the Speccy, don't you think?
In any case, here's the modified source as used in my loader :)
I mean, how many lines of code is this? And how many lines does the licence now consist of? Doesn't it all seem a bit ridiculous when you look at it like that?
- IONIAN-GAMES.com -
I strongly disagree.
Most people prefer to create tools only for themselves, or sell them without source code and forbid people from making changes to them. Nobody complains when that happens, since it's obviously their right to do it.
However when a few people take the time to produce tools useful to others, release them for free, and allow others to modify them freely for their own needs (although any changes must be given back), there's always someone saying that's not good enough!
BTW almost all my own Speccy code is released under an informal "free to use, with credit", including my ZX7 decompressor (only my ZX7 compressor has a formal open source license). So I'm not defending this opinion here in self-interest. I just happen to think it's unfair to criticize anyone for releasing their work as open source instead of closed source...
Code size is irrelevant. What matters is, this is no trivial code. If it was, people could simply write their own. If anyone wants to use this, then it deserves to be released under whatever distribution conditions the author has chosen.
Look I'm not objecting to the idea of being made to feed back any changes I might make to this code. That's just being fine and public-spirited. But if that's all you want then it's a one-liner in your own terms and conditions.
By using the LGPL it's also forcing the person using it in a game to pass a ton of junk (the source code, a copy of the LGPL licence) onto any download or sale of that game, which to accompany a 48K file on a cassette is just crazy.
- IONIAN-GAMES.com -
What's the big deal? Even if the zipped source code + LGPL license text is another 48Kb, I'm fairly sure everyone's bandwidth can handle it :)
Nobody expects a physical Speccy cassette to contain source code and LGPL license text. Just mention LGPL somewhere in the inlay and provide a URL or email that buyers can contact to obtain the source code. This is clearly within the "spirit" of this license!
But when it comes to a library, or in this case a single function, you're actually taking away the right of the person who uses it to distribute their product how they want. You're saying "if you want to use my hundred lines of code, you have to give away the thousands of lines you've written too". That's not being helpful. That's being awkward.
If that's not what you want then don't use that licence. Just write a few lines that say what you do want instead.
- IONIAN-GAMES.com -
Yes, that's the idea behind open source in general.
That's not how LGPL works. You have just described GPL, which is not the same as LGPL.
LGPL license basically says:
1.) If you take someone else's LGPL library and modify it, then you must distribute your changes back the same way, under the LGPL. It's not fair to take advantage of someone else's work by distributing your result but hiding your source code changes.
2.) If you take someone else's LGPL library and modify it, but in a way that pretends you didn't, then it makes no difference. All the conditions above still apply.
3.) If your program uses someone else's library only as a "black box", simply the way it was intended to be used, then that's OK. Your program doesn't have to be LGPL.
The LGPL license is simply a formal description of this idea.
For instance:
* If you use Metalbrain's decompressor code with some changes, it will fall into condition 1 above.
* If you copy/paste Metalbrain's decompressor code without any changes, but then you write some additional code to access his internal routines directly in order to obtain a different behavior (for instance a decompressor that handles specific cases differently), it's condition 2 above.
* If you copy/paste Metalbrain's decompressor code and your program simply calls it "as is" to decompress something, using it in a standard way, it's condition 3 above.
If open source licenses were only a few lines of informal text, then lots of people would abuse them and take advantage of others' work whenever there was money to be made. They would try to find loopholes and claim those few lines of license didn't precisely specify what to do in these cases.
The main open source licenses are detailed for a reason. And when these license texts are updated, it's usually to close new loopholes exploited by others. For instance, one of the reasons GPL was updated to version 3 was preventing an exploit by Microsoft involving Novell...
Even so, open source licenses try to be as clear and concise as possible. Try comparing them with the size of a typical commercial software EULA!
1. Compile that function separately as a binary.
2. Write my own code to load that binary separately from my own code, in a way that allows a different version of it to also be loaded (and relocated where necessary) before calling it. This is so that it can be considered 'dynamically linked' and 'independent'.
3. Offer to provide, in the same media (i.e. on cassette), the source code of that unpacker, and a copy of the LGPL licence, to anyone who uses my software if they ask for it.
- IONIAN-GAMES.com -
If that is how this code is meant to be used then the LGPL is the wrong licence.
- IONIAN-GAMES.com -
No.
The terms "static linking" and "dynamically linking" are not even mentioned in the LGPL license.
Instead, LGPL section 6a says it's enough to provide the modified library source code, and your own program's "object code and/or source code", such that if others change the library again (but keep the same interface) and replace it, the program will still work. This condition is intended to ensure nobody can modify someone else's library but pretend they didn't (i.e. my condition 2 above).
On modern systems, the easiest way to comply with this condition is dynamically linking with LGPL libraries. This is the reason most LGPL explanations recommend dynamic linking.
On a Speccy, the easiest way to comply is simply copy/pasting an LGPL library, using it only as a "black box", and providing the library source code only. Generating Speccy programs doesn't require a linker, so anyone will be able to make changes to this library, recompile it to the same address, and it should just work. You will be compliant with LGPL section 6a easily.
It doesn't matter how you compile it, as long as the final result is compliant with section 6a (see above).
You don't need to "simulate" dynamic linking in a Speccy program in order to comply with LGPL.
Again, dynamic linking is just people's interpretation of the easiest way to comply with the LGPL on a modern system. There's no reason to do it on the Speccy, when you can simply comply with the LGPL license directly instead.
The LGPL license doesn't impose "in the same media". If you follow my previous suggestions for cassette releases, you should be compliant with section 6c just fine.
I see Metalbrain's reply about this, that it's okay to use the older source in a non-LGPL way, but I get the impression that it's more his interpretation, and things might be different in the 'legal world'.