Optimizing CRC16 routine.


Closed Thread
Results 1 to 20 of 20

Hybrid View

  1. #1
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,617


    Did you find this post helpful? Yes | No

    Default

    Bert,
    That is exactly like I had it when trying to exclude the outer array handling loop from the timing. I ran 9 bytes thru and got ~1530 cycles. 1530/9=170 cycles per byte and your test shows 169 - so that matches pretty well. I DID get a slight difference when I changed the poly and I guess that depends on the actual value of the byte going thru the routine, if you run the 9 byte sequence 123456789 thru it you should see a slight difference between the two polynomial values.

    Mike,
    So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?

    Question is why there's such a difference between Darrel's code and the one I'm using. The poly itself does make a slight difference, I've verified that but it's still way off. I haven't actually verified the claimed 825us (I've no reason to doubt it) but I'll do that tonight just to make sure.

    Thanks a lot guys, I appreciate the help!
    /Henrik.

  2. #2
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by HenrikOlsson View Post
    Mike,
    So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?
    I posted 16F1827 code. A slightly unfair comparison because 18F devices don't have a "lsrf" instruction.

    I didn't realize you were using an 18F until you posted your assembler output, and so I tested the following code which is similar to yours, for an 18F in BoostC;

    Code:
        MB_CRC ^= work;             //
        for(MB_j=0; MB_j<8; MB_j++) //                    
        { if(MB_CRC.0)              //
          { MB_CRC >>= 1;           //
            MB_CRC ^= 0xA001;       //
          }
          else                      //
          { MB_CRC >>= 1;           //
          }
        }                           // Next bit in the byte
    Here's the BoostC output which uses "bra" instructions and runs in about 115 cycles (worst case);

    Code:
    0008            label1
    0008  500A      	MOVF gbl_work, W
    000A  1A05      	XORWF gbl_MB_CRC, F
    000C  6A09      	CLRF gbl_MB_j
    000E            label2
    000E  0E08      	MOVLW 0x08
    0010  6009      	CPFSLT gbl_MB_j
    0012  D7FA      	BRA	label1
    0014  A005      	BTFSS gbl_MB_CRC,0
    0016  D008      	BRA	label3
    0018  90D8      	BCF STATUS,C
    001A  3206      	RRCF gbl_MB_CRC+D'1', F
    001C  3205      	RRCF gbl_MB_CRC, F
    001E  0E01      	MOVLW 0x01
    0020  1A05      	XORWF gbl_MB_CRC, F
    0022  0EA0      	MOVLW 0xA0
    0024  1A06      	XORWF gbl_MB_CRC+D'1', F
    0026  D003      	BRA	label4
    0028            label3
    0028  90D8      	BCF STATUS,C
    002A  3206      	RRCF gbl_MB_CRC+D'1', F
    002C  3205      	RRCF gbl_MB_CRC, F
    002E            label4
    002E  2A09      	INCF gbl_MB_j, F
    0030  D7EE      	BRA	label2
    Last edited by Mike, K8LH; - 17th January 2011 at 14:31.

  3. #3
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by HenrikOlsson View Post
    Mike,
    So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?
    Please forgive me if it seems I was suggesting one compiler is better than another. I'm just trying to help you characterize loop timing but I don't have PBP.

    Cheerful regards, Mike
    Last edited by Mike, K8LH; - 17th January 2011 at 14:28.

  4. #4
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,617


    Did you find this post helpful? Yes | No

    Default

    Mike,
    Not at all - at least I didn't read it like that and your help is much appreciated! I'm not that good at assembler though (hence the need for PBP ;-) ) Sorry for not mentioning I was using an 18 series PIC, I should've thought about that.

    Ioannis,
    MB_i is declared as a BYTE.

    At this point the best I've got is (was) this:
    Code:
    MB_k = MB_Buffer_Ptr - 3                   ' Find out where in the array to stop.
    FOR MB_i = 0 to MB_k                       ' Iterate thru the frame, stop before the last two bytes which are the actual CRC.
        MB_CRC = MB_CRC ^ MB_Buffer[MB_i]      ' Bitwise EXOR the current low byte of the CRC "value" with the byte in the buffer
     
        MB_j = 0
     
        DO WHILE MB_j.3 = 0
            If MB_CRC.Bit0 THEN                ' If the LSB is set
                MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
                MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
            ELSE
                MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
            ENDIF
        MB_j = MB_j + 1
        LOOP
    NEXT                                       ' Next byte in the frame
    RETURN
    This works and does the 9 byte test-array in 1496 cycles which is faster than with the FOR-NEXT loop I originally had. I then tried changing the meat of it to:
    Code:
    DO WHILE MB_j.3 = 0
            MB_CRC = MB_CRC >> 1
            If MB_CRC.BIT0 THEN MB_CRC = $A001 ^ MB_CRC
        MB_j = MB_j + 1
    LOOP
    This runs a little bit faster, 1394 cycles, but it calculates the wrong CRC value. This is because, the shifts is done before checking BIT0 while in the working code it checks BIT0 THEN shifts and XOR's (or not).

    Since the shift right command uses the RRCF instruction which shifts thru the CARRY flag I tried:
    Code:
    DO WHILE MB_j.3 = 0
      MB_CRC = MB_CRC >> 1
      If STATUS.0 THEN MB_CRC = $A001 ^ MB_CRC  'Check carry, XOR if set.
      MB_j = MB_j + 1
    LOOP
    This runs in 1414 cycles and produces the correct CRC value. Can anyone see any potential issues with doing it like this?

    All in all we've gone from 1730 to 1414 cycles, that's a ~22% increase - not bad! I wonder what more we can squeeze out of it? ;-)

    Thanks guys!
    /Henrik.
    Last edited by HenrikOlsson; - 17th January 2011 at 16:42.

  5. #5
    Join Date
    Nov 2003
    Location
    Greece
    Posts
    4,167


    Did you find this post helpful? Yes | No

    Default

    I am pretty sure I put the shift inside the loop.

    OK, why Darrel's routine is faster? Maybe is the shift? Can it be that Left Shift is faster than Right?

    I see no other obvious reason.

    Interresting.

    Ioannis

  6. #6
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,617


    Did you find this post helpful? Yes | No

    Default

    I am pretty sure I put the shift inside the loop
    Absolutely, but shifting before checking bit0 is different from checking bit0 before shifting ;-)

    I haven't yet tried Darrels version but I have looked at the ASM code it compiles to and as far as I can see it's now "longer" than what my current code compiles to. I'm actually starting to doubt it'll do the 9 byte test sequence in 825 cycles. Darrel, are you there?

    Thanks!
    /Henrik.

  7. #7
    Join Date
    Jul 2003
    Posts
    2,405


    Did you find this post helpful? Yes | No

    Default

    Run Darrels' example through MPSIM with the stopwatch to see if it really takes only 825 cycles to process 9 bytes through his CRC routine.

    Yours looks pretty darn tight as-is.
    Regards,

    -Bruce
    tech at rentron.com
    http://www.rentron.com

Members who have read this thread : 0

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts