PicBasic text compress routine


Closed Thread
Results 1 to 12 of 12
  1. #1
    Join Date
    Jun 2004
    Posts
    43

    Default PicBasic text compress routine

    Hello ! I am working on a project in which I need to store some result aquired in a serial EEPROM. The problem is that I need a lot of space in the memory, and I don't want to use cascaded eeproms or flash cards. So I thought about a text compressing routine. Can anyone help me with this ? Thank you

  2. #2
    Join Date
    Nov 2005
    Location
    Perth, Australia
    Posts
    429


    Did you find this post helpful? Yes | No

    Default

    Ive got a feeling that the pbp code required to compress and de-compress text would be awfully long. I hope you have a lot of program space left on your PIC.
    Last edited by Kamikaze47; - 25th November 2005 at 05:54.

  3. #3
    Join Date
    Jun 2004
    Posts
    43


    Did you find this post helpful? Yes | No

    Default

    I know, that's why I use only 1k of 8k program memory.

  4. #4
    tdavis80's Avatar
    tdavis80 Guest


    Did you find this post helpful? Yes | No

    Talking

    any chance that any of the data would contain 'often used' words?

    for example: If it was storing text that seemed to have lots of DayOf MonthOf , YearOf info, you could parse the data so those words are stored as a single byte. If you used bytes > 127 as 'compressed' words, you could compress about 127 words.

    Tuesday, November 1, 2005 = 25 bytes
    [128], [129] 1, [130] = 8 bytes

    The above example assumes that all of the received text would have an ascii value less than 128 so that you could use the high half of a byte for compressed characters.

    peudoCode:

    if ascii of character < 128 then character is normal
    if ascii of character > 127 then
    table_position = character - 127 ' example: 128 - 127 = 1
    lookup table_position to find uncompessed word
    endif

    does this make sense?

  5. #5
    Join Date
    Jun 2004
    Posts
    43


    Did you find this post helpful? Yes | No

    Default

    My whole character set is : 0123456789;=? And I think the most repeated character is 0 . So I thought about this scheme:

    0 - 0
    00 - c
    000 - k
    0000 - M
    and so on .. do you have any ideea ?

  6. #6
    Join Date
    Oct 2004
    Location
    Italy
    Posts
    695


    Did you find this post helpful? Yes | No

    Default

    Hi,

    - What is the possible range of the values?

    Example:

    0.00 (First possible value)
    0.01
    0.99
    ...
    ...
    199.99 (Last possible value).


    - How many stored values do you need?

    - What is the available space in the EEPROM? (bytes)


    Luciano

  7. #7
    Join Date
    Jun 2004
    Posts
    43


    Did you find this post helpful? Yes | No

    Default

    I need to store strings of 40 characters like
    938437437=002;3482?0000284=332

    Theese are not values ... just 40 chars strings
    I have 512k eeprom
    and I need to store as much as possible

  8. #8
    Join Date
    Oct 2004
    Location
    Italy
    Posts
    695


    Did you find this post helpful? Yes | No

    Default

    Hi,

    Your characters are in the range from ASCII 48 to ASCII 63.
    You can put two characters in one byte.

    The characters : > < are not used.
    You can use them to implement your idea.

    Use : for 00
    Use < for 000
    Use > for 0000

    Best regards,

    Luciano

    Last edited by Luciano; - 25th November 2005 at 17:09.

  9. #9
    Join Date
    Feb 2003
    Posts
    432


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by MegaADY
    My whole character set is : 0123456789;=? And I think the most repeated character is 0 . So I thought about this scheme:

    0 - 0
    00 - c
    000 - k
    0000 - M
    and so on .. do you have any ideea ?
    Very difficult to say without seeing more examples of your data.

    The method described by Luciano will give you a guaranteed 2:1 compression irrespective of what the data is. If you need to compress the data more than that and have instances where characters (other than 0 ) repeat for 2 or more consecutive places then how about this for an idea.

    As you say, you only have a character set of 13 characters and Luciano has already pointed out that you could fit two of your characters into a single byte.

    My idea is similar but one half of the byte contains the code for the character and the other half of the byte the number of counts for that character eg

    nnnncccc where n = number of characters and c = character

    08700900200 (11 bytes) would be 10 18 17 20 19 20 12 20 = 8 bytes = 72%

    however

    22377777773354500000661666 (26 byte) would be
    22 13 77 23 15 14 15 50 26 11 36 = 11 bytes = 42%

    Obviously you cant have a count greater than 15 but if you had say twenty five "Zeros" then that would compress as F0 A0 = 25 chars into two bytes = 8%

    As I say, compression depends on the data, but that is true for any compression routine. Worst case, your 40 characters take 40 bytes, best case your 40 characters are all identical eg
    40 x 2 and would compress to three bytes
    F2 F2 A2 = 15 x 2 + 15 x 2 + 10 x 2 = 40 x 2

    You need to analyse your data for repeating patterns to see if that would help.

    Easy to encode.
    Get first character and nnnn = 1
    while next character is the same and nnnn is < 15 increase nnnn
    store it and get the next character and nnnn = 1.

    To decode
    For x = 1 to nnnn
    character = cccc

    Thinking a bit further, this method could also be used for a larger number of characters eg A to Z (upper case only) is 26 characters so could be stored in 5 bits allowing the remaining 3 bits to be used as a counter meaning that upto 7 consecutive letters could be compressed to a single byte.

    Thanks for makng me think about this.... its given me an idea for something I am already working on

    Regards
    Keith

    www.diyha.co.uk
    www.kat5.tv

  10. #10
    tdavis80's Avatar
    tdavis80 Guest


    Did you find this post helpful? Yes | No

    Talking

    As I read your example another idea comes to mind.

    Are the strings predictable? Are the non-numerics always in the same place?
    If yes, ignore them and only store the numbers.

    for the numbers, use packed decimal format. It is VERY easy to encode/decode with picbasic. Packed decimal stores 2 numbers per byte by using a nibble per digit.

    Example:

    14332569 = 8 bytes
    Store the hex as: 14 33 25 69 = 4 bytes

    You can encode/decode by simply shifting the data on the byte.

    If you 40 chars were 36 numbers plus 4 alphas, the result would be 18 bytes which you could decode and reinsert the alphas in the appropriate places.

    If the alphas could appear anywhere but are always 1 of a small set of possibilities your still in luck. The packed decimal uses 0-9 but still leaves you a-f for your own use (6 special characters).

    another example:

    Lets assume ";" = A "=" = B

    Then 12254324;3654=6554 (18 bytes) would encode as:

    hex= 12 25 43 24 A3 65 4B 65 54 (9 bytes)

    are we getting closer?

    ps: I just read your comment about your entire character set. This is good news because it means you could definately use the packed decimal format for 2:1 encoding with low code overhead.
    Last edited by tdavis80; - 26th November 2005 at 05:04.

  11. #11
    Join Date
    Oct 2004
    Location
    Italy
    Posts
    695


    Did you find this post helpful? Yes | No

    Exclamation

    Is your data the TRACK 2 of a credit card?

    See this link and read about TRACK 2:
    http://www.gae.ucm.es/~padilla/extrawork/tracks.html

    Also in this forum you have started another thread where
    you are asking how to send out data over a DC power line.
    Is that the power line of a credit card reader?

    Thread about "Serial comms on VCC line"
    http://www.picbasic.co.uk/forum/showthread.php?t=2806


    I hope I am wrong.

    Luciano

  12. #12
    Join Date
    Jun 2004
    Posts
    43


    Did you find this post helpful? Yes | No

    Default

    Thank you guys for helping me with the compression stuff. This week I will start to analyze the data to see what is the propper method of compression.
    Track2 ? No, it's just a meteo station,whit a lot of home made modules, made by someone who was fired and left no documentation of what he's done, and I have to dig into his work, and improove it .
    Thank you all !

Similar Threads

  1. Picbasic VS C Compiler
    By koossa in forum mel PIC BASIC Pro
    Replies: 8
    Last Post: - 11th October 2005, 21:44
  2. Replies: 22
    Last Post: - 12th July 2005, 17:39
  3. Can an asm interrupt contain gosub to a picbasic routine?
    By sougata in forum mel PIC BASIC Pro
    Replies: 5
    Last Post: - 21st April 2005, 19:49
  4. PicBasic Fundamentals
    By Billyc in forum General
    Replies: 9
    Last Post: - 4th May 2004, 10:04
  5. PicBasic Pro & PicBasic syntax different
    By Billyc in forum General
    Replies: 5
    Last Post: - 16th April 2004, 21:19

Members who have read this thread : 1

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts