
Originally Posted by
MegaADY
My whole character set is: 0123456789;=? I think the most repeated character is 0, so I thought about this scheme:
0 - 0
00 - c
000 - k
0000 - M
and so on... Do you have any ideas?
Very difficult to say without seeing more examples of your data.
The method described by Luciano will give you a guaranteed 2:1 compression irrespective of what the data is. If you need to compress the data more than that, and you have instances where characters (other than 0) repeat for 2 or more consecutive places, then how about this for an idea?
As you say, you only have a character set of 13 characters and Luciano has already pointed out that you could fit two of your characters into a single byte.
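For reference, here is a rough Python sketch of that two-characters-per-byte packing. It relies on the fact that in ASCII the characters 0-9 ; = ? are 0x30-0x39, 0x3B, 0x3D and 0x3F, so the low 4 bits of each character are already a unique code. The names pack_pairs, unpack_pairs and the PAD value are my own assumptions for illustration, not something from Luciano's post.

```python
ALPHABET = "0123456789;=?"
PAD = 0x0A  # assumption: code 10 (':') is not in the character set, so it can mark an odd tail


def pack_pairs(text: str) -> bytes:
    """Pack two 4-bit character codes into each byte (2:1 on even-length input)."""
    codes = []
    for ch in text:
        if ch not in ALPHABET:
            raise ValueError(f"character {ch!r} is not in the 13-character set")
        codes.append(ord(ch) & 0x0F)  # low nibble of the ASCII code
    if len(codes) % 2:
        codes.append(PAD)  # fill the unused half of the last byte
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_pairs(data: bytes) -> str:
    """Reverse pack_pairs, dropping any PAD nibble."""
    out = []
    for b in data:
        for code in (b >> 4, b & 0x0F):
            if code != PAD:
                out.append(chr(0x30 | code))  # rebuild the ASCII character
    return "".join(out)
```

With this sketch, 40 characters always come out as 20 bytes, whatever the data looks like.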
My idea is similar, but one half of the byte contains the code for the character and the other half holds the count for that character, e.g.
nnnncccc where nnnn = number of times the character repeats and cccc = character code
08700900200 (11 bytes) would be (in hex) 10 18 17 20 19 20 12 20 = 8 bytes = about 73% of the original size
however
22377777773354500000661666 (26 bytes) would be
22 13 77 23 15 14 15 50 26 11 36 = 11 bytes = 42%
Obviously you can't have a count greater than 15, but if you had, say, twenty-five zeros, then that would compress as F0 A0 = 25 characters into two bytes = 8%
As I say, compression depends on the data, but that is true for any compression routine. Worst case (no character is ever followed by the same character), your 40 characters take 40 bytes; best case, your 40 characters are all identical, e.g.
40 x 2, which would compress to three bytes
F2 F2 A2 = 15 x 2 + 15 x 2 + 10 x 2 = 40 x 2
You need to analyse your data for repeating patterns to see if that would help.
It is easy to encode:
Get the first character and set nnnn = 1.
While the next character is the same and nnnn < 15, increase nnnn.
Store the byte, then get the next character and set nnnn = 1 again.
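The encoding steps above could be sketched in Python roughly like this. It is only an illustration under a couple of assumptions: the function name rle_encode is made up, and the 4-bit character code is taken as the low nibble of the ASCII code (0-9 give 0-9, ; gives B, = gives D, ? gives F).

```python
def rle_encode(text: str) -> bytes:
    """Encode text as nnnncccc bytes: high nibble = run length (1-15), low nibble = character code."""
    allowed = "0123456789;=?"
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in allowed:
            raise ValueError(f"character {ch!r} is not in the 13-character set")
        count = 1
        # Extend the run while the next character matches and the counter still fits in 4 bits.
        while i + count < len(text) and text[i + count] == ch and count < 15:
            count += 1
        out.append((count << 4) | (ord(ch) & 0x0F))
        i += count
    return bytes(out)


print(rle_encode("08700900200").hex())   # 1018172019201220
print(rle_encode("0" * 25).hex())        # f0a0
```

A run longer than 15 is simply split into several bytes, which is how twenty-five zeros end up as F0 A0.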
To decode each byte:
For x = 1 to nnnn
output the character cccc
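A matching decoder might look like the sketch below. Again, the name rle_decode is hypothetical, and it assumes the character code in the low nibble is the low nibble of the ASCII character, so OR-ing it with 0x30 gives the character back.

```python
def rle_decode(data: bytes) -> str:
    """Expand nnnncccc bytes: repeat the character for cccc nnnn times."""
    out = []
    for b in data:
        count = b >> 4          # nnnn: how many times the character repeats
        code = b & 0x0F         # cccc: 4-bit character code
        out.append(chr(0x30 | code) * count)
    return "".join(out)


packed = bytes.fromhex("2213772315141550261136")
print(rle_decode(packed))  # 22377777773354500000661666
```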
Thinking a bit further, this method could also be used for a larger number of characters, e.g. A to Z (upper case only) is 26 characters, so each could be stored in 5 bits, leaving the remaining 3 bits to be used as a counter. That means up to 7 consecutive identical letters could be compressed to a single byte.
Thanks for making me think about this... it's given me an idea for something I am already working on.
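As a rough illustration of that A to Z variant, here is a Python sketch with a 3-bit counter in the top bits and a 5-bit letter code in the bottom bits. The layout nnnccccc, the letter codes (A = 0 up to Z = 25) and the function names are all my assumptions; any consistent choice would do.

```python
def rle_encode_letters(text: str) -> bytes:
    """Encode upper-case A-Z as nnnccccc bytes: 3-bit run length (1-7), 5-bit letter code."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if not "A" <= ch <= "Z":
            raise ValueError(f"character {ch!r} is not an upper-case letter A-Z")
        count = 1
        # A 3-bit counter can hold at most 7, so longer runs are split.
        while i + count < len(text) and text[i + count] == ch and count < 7:
            count += 1
        out.append((count << 5) | (ord(ch) - ord("A")))
        i += count
    return bytes(out)


def rle_decode_letters(data: bytes) -> str:
    """Reverse rle_encode_letters."""
    return "".join(chr(ord("A") + (b & 0x1F)) * (b >> 5) for b in data)


print(rle_encode_letters("AAAB").hex())  # 6021
```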
Regards
Keith
www.diyha.co.uk
www.kat5.tv