Optimizing CRC16 routine.


Closed Thread
Results 1 to 20 of 20
  1. #1
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523

    Default Optimizing CRC16 routine.

    Hi everyone,
    This is kind of a doublepost as I originally put this question as a response in this thread without noticing the thread was in the wishlist section of the forum. Hopefully it gets a bit more attention here.

    Allright, I'm working a MODBUS stack for PBP. I found a CRC routine (I think it was in another thread here) which I've modified slightly and it works nicely. The problem is that it is, or seems, quite slow. Here's the current code:
    Code:
    MB_CRC = 65535
    MB_k = MB_Buffer_Ptr - 3                   ' Find out where in the array to stop.
    FOR MB_i = 0 to MB_k                       ' Iterate thru the frame, stop before the last two bytes which are the actual CRC.
        MB_CRC = MB_CRC ^ MB_Buffer[MB_i]      ' Bitwise EXOR the current low byte of the CRC "value" with the byte in the buffer
     
        For MB_j = 0 to 7                      ' Cycle thru the low byte of the CRC value                      
            If MB_CRC.Bit0 THEN                ' If the LSB is set
                MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
                MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
            ELSE
                MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
            ENDIF
        Next                                   ' Next bit in the byte
    NEXT                                       ' Next byte in the frame
    RETURN
    This thing takes 1730 cycles, including a GOSUB and RETURN, to calculate the CRC value for 9 bytes (123456789). In the tread I linked to above Darrel has a, what looks to me, very similar routined which is said to do it in 825 cycles. For reference, here's Darrels code:
    Code:
    CRC       VAR WORD
    CRC_IN    VAR BYTE
    Idx       VAR BYTE
    CRC_Len   CON 16
    CRC_Poly  CON $1021  ; x16 + x12 + x5 + 1
    ;-----CRC-16 -----------------------------------------------------
    CRC16:
      CRC.HighByte = CRC.HighByte ^ CRC_IN
      FOR Idx = 0 to 7
        IF CRC.0(CRC_Len-1) THEN
          CRC = (CRC << 1) ^ CRC_Poly
        ELSE
          CRC = CRC << 1
        ENDIF
      NEXT Idx
    RETURN
    As you can see it's very simmilar, it uses another polynominal (which I don't think should make any difference (?)) and it shifts to the left instead of the right so it's another kind of CRC than what I need or I'd use it directly.

    Does anyone have any idea why mine is so much slower than Darrels and/or what I might do to speed it up? MODBUS frames can get quite large and currently it takes more than 12000 cycles to go thru a frame of 67 bytes.

    Thanks!
    /Henrik.

  2. #2
    Join Date
    Aug 2010
    Location
    Maryland, USA
    Posts
    869


    Did you find this post helpful? Yes | No

    Default

    Not saying I get this whole CRC thing, just looking at the 2 snippits, the biggest thing I see is you have a nested loop and Darrel doesn't. But that seems like it would change things by more than double. (approx difference between the 2)

    Try using his poly instead of yours in your code to see if that changes your timing. I don't know if you can just do that, but if so it will eliminate that as a factor.
    -Bert

    The glass is not half full or half empty, Its twice as big as needed for the job!

    http://foamcasualty.com/ - Warbird R/C scratch building with foam!

  3. #3
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    Hi Bert,
    I tested with $1021 instead of $A001 as the poly and it changed from 1730 to 1682 cycles - still nowhere nere ~800.

    Yes, mine iterates thru the array of 9 bytes (in this case) in one go while Darrels would have to be called 9 times with a new byte in CRC_In each time. I don't think that is what makes it that much slower.

    I'm new to this CRC thing as well and don't yet fully understand it enough to try to come with a completely different aproach.

    Thanks!
    /Henrik.

  4. #4
    Join Date
    Aug 2010
    Location
    Maryland, USA
    Posts
    869


    Did you find this post helpful? Yes | No

    Default

    What if you combine these 2 lines:
    Code:
     MB_CRC = MB_CRC >> 1          
     MB_CRC = $A001 ^ MB_CRC
    to be this:
    Code:
     MB_CRC = (MB_CRC >> 1) ^ $A001 ; or try DT's poly here just for consistancy
    This would kinda make sense to me as why yours might be ~twice as long, PBP may be creating double code to do it in 2 steps.
    -Bert

    The glass is not half full or half empty, Its twice as big as needed for the job!

    http://foamcasualty.com/ - Warbird R/C scratch building with foam!

  5. #5
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    Hi,
    Thank you for your ideas, I appreciate it.

    Unfortunately it's the opposite of what you think. Originally I did have those two lines combined but splitting the statement into two increased the performance by ~9% (it went from 1894 to 1730 cycles for the 9 test-bytes) it also saved me 8 bytes of program space - which is secondary but still nice.

    /Henrik.

  6. #6
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    Greetings Henrik,

    Is there a chance you have a lot of overhead associated with array handling? Have you looked at the assembly language generated by that routine?

    I checked out the inner loop using BoostC and found worst case cycle times of 123 cycles in a "for/next" loop and 100 cycles in a "do/while" loop.

    Cheerful regards, Mike

    Code:
      crcword ^= work;         // 123 cycles (BoostC)
      for(char i=0; i<8; i++)  //                      
      { crcword >>= 1;         //
        if(status.C)           //
          crcword ^= 0x0A01;   //
      }
    Code:
      crcword ^= work; i = 0;   // 100 cycles (BoostC)
      do {                      //
        crcword >>= 1;          //
        if(status.C)            //
          crcword ^= 0x0A01;    //
        i++;                    //
      } while(i.3 == 0);        //
    Last edited by Mike, K8LH; - 16th January 2011 at 18:57.

  7. #7
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    Hi Mike,
    I'm using TMR1 to measure the execution time of the routine. I just moved the starting and stopping of the timer so it's around the inner FOR-NEXT loop. In other words the timer is started before the inner FOR and stopped after the inner NEXT effectively excluding the array handling of the outer FOR-NEXT loop from the timing.

    This time I got 1533 cycles.

    So, as far as I can see it is the inner loop that is eating away the time and I still don't understand why. I'll try a DO-WHILE construct instead of the FOR-NEXT and see what I'll get.

    Thanks!
    /Henrik.

  8. #8
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    ... <snip> ... I'm using TMR1 to measure the execution time of the routine ... <snip> ...
    Could that be a problem? Do you have anything similar to the MPLAB Simulator Stopwatch?

    ... <snip>... This time I got 1533 cycles ... <snip> ...
    My goodness, that doesn't sound quite right. That's close to what you got when you were processing nine bytes. Could there be a problem with the method you're using to count cycles?

    I would expect you to get code similar to the assembler code generated from my BoostC "Do/While" loop;

    Code:
      crcword ^= work; i = 8;   //
    0040  0845  	MOVF crc16_00000_arg_work, W
    0041  06B7  	XORWF gbl_crcword, F
    0042  3008  	MOVLW 0x08
    0043  00C6  	MOVWF crc16_00000_1_i
    
      do {                      //
    0044        label5
    
        crcword >>= 1;          //
    0044  36B8  	LSRF gbl_crcword+D'1', F
    0045  0CB7  	RRF gbl_crcword, F
    
        if(status.C)            //
    0046  1C03  	BTFSS gbl_status,0
    0047  284C  	GOTO	label6
    004C        label6
    
          crcword ^= 0x0A01;    //
    0048  3001  	MOVLW 0x01
    0049  06B7  	XORWF gbl_crcword, F
    004A  300A  	MOVLW 0x0A
    004B  06B8  	XORWF gbl_crcword+D'1', F
    
      } while(--i);             //
    004C  03C6  	DECF crc16_00000_1_i, F
    004D  1D03  	BTFSS STATUS,Z
    004E  2844  	GOTO	label5
    }
    Last edited by Mike, K8LH; - 16th January 2011 at 18:53.

  9. #9
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    Hi Mike,
    I tried a DO-WHILE construct and I'm quite surprised by the results. I moved the starting and stopping back to where I had it originally and with the DO-WHILE I get 1496 cycles instead of 1730 for the 9 test-bytes. A 67 byte array, previously taking ~12600 cycles now executes in ~10800. Getting there....

    Here's the current ASM code:
    Code:
    0012A0                    M _MB_CRC_16
    0012A0 6831               M         setf    _MB_CRC
    0012A2 6832               M         setf    (_MB_CRC) + 1
    0012A4 0E03               M         movlw   003h
    0012A6 5C3F               M         subwf   _MB_Buffer_Ptr, W
    0012A8 6E42               M         movwf   _MB_k
    0012AA 6A40               M         clrf    _MB_i
    0012AC                    M L00108
    0012AC 0004               M         clrwdt
    0012AE 5040               M         movf    _MB_i, W
    0012B0 5C42               M         subwf   _MB_k, W
    0012B2 A0D8               M         btfss   STATUS, C
    0012B4 EF80 F009          M         goto    L00109
    0012B8 5040               M         movf    _MB_i, W
    0012BA 0F56               M         addlw   low (_MB_Buffer)
    0012BC 6EE9               M         movwf   FSR0L
    0012BE 0E01               M         movlw   (_MB_Buffer) >> 8
    0012C0 6AEA               M         clrf    FSR0H
    0012C2 22EA               M         addwfc  FSR0H, F
    0012C4 CFEF F013          M         movff   INDF0, T1
    0012C8 5013               M         movf    T1,  W
    0012CA 1A31               M         xorwf   _MB_CRC, F
    0012CC 6A41               M         clrf    _MB_j
    0012CE                    M L00110
    0012CE 0004               M         clrwdt
    0012D0 B641               M         btfsc   _MB_j, 003h
    0012D2 EF7D F009          M         goto    L00111
    0012D6 0004               M         clrwdt
    0012D8 A031               M         btfss   _MB_CRC, 000h
    0012DA EF78 F009          M         goto    L00112
    0012DE 90D8               M         bcf     STATUS, C
    0012E0 3232               M         rrcf    _MB_CRC + 1, F
    0012E2 3231               M         rrcf    _MB_CRC, F
    0012E4 0E01               M         movlw   low (0A001h)
    0012E6 1A31               M         xorwf   _MB_CRC, F
    0012E8 0EA0               M         movlw   (0A001h) >> 8
    0012EA 1A32               M         xorwf   _MB_CRC + 1, F
    0012EC EF7B F009          M         goto    L00113
    0012F0                    M L00112
    0012F0 90D8               M         bcf     STATUS, C
    0012F2 3232               M         rrcf    _MB_CRC + 1, F
    0012F4 3231               M         rrcf    _MB_CRC, F
    0012F6                    M L00113
    0012F6 2A41               M         incf    _MB_j, F
    0012F8 D7EA               M         bra     L00110
    0012FA                    M L00111
    0012FA 2A40               M         incf    _MB_i, F
    0012FC A0D8               M         btfss   STATUS, C
    0012FE D7D6               M         bra     L00108
    001300                    M L00109
    001300 0012               M         return
    /Henrik.

  10. #10
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    The inner loop starts at label "L00110" and uses about 18 instruction cycles (worst case) each loop iteration for a whopping total of right around 144 cycles per byte processed.

    I have not check the rest of the code. I must get back to my studies.

    Cheerful regards, Mike
    Last edited by Mike, K8LH; - 16th January 2011 at 19:35.

  11. #11
    Join Date
    Aug 2010
    Location
    Maryland, USA
    Posts
    869


    Did you find this post helpful? Yes | No

    Default Trying to get up to speed

    Hi Henrik,
    here is the snippit I am testing:
    Code:
    Main:      'DO WHATEVER YOU WANT HERE
    tmr1=0
    t1con = 1
        For MB_j = 0 to 7                      ' Cycle thru the low byte of the CRC value                      
            If MB_CRC.Bit0 THEN                ' If the LSB is set
                MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
                MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
            ELSE
                MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
            ENDIF
        Next                                   ' Next bit in the byte
    t1con = 0
    stopnow:
    goto stopnow
    This takes 169 cycles according to TMR1 setup as you see here. I have tried with both $A001 and $1021. Both are the same. Does this fall in line with what you have?

    BTW, MPLAB stopwatch reports the same

    Next I think I will try out DT's snippet
    Last edited by cncmachineguy; - 16th January 2011 at 19:58. Reason: added
    -Bert

    The glass is not half full or half empty, Its twice as big as needed for the job!

    http://foamcasualty.com/ - Warbird R/C scratch building with foam!

  12. #12
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    I just realized that PBP assembler code is using two word "goto" instructions for those short branches instead of one word "bra" instructions. That makes it worst case 20 cycles instead of 18 cycles per interation through the inner loop (160 cycles per byte). Ouch!

  13. #13
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    Bert,
    That is exactly like I had it when trying to exclude the outer array handling loop from the timing. I ran 9 bytes thru and got ~1530 cycles. 1530/9=170 cycles per byte and your test shows 169 - so that matches pretty well. I DID get a slight difference when I changed the poly and I guess that depends on the actual value of the byte going thru the routine, if you run the 9 byte sequence 123456789 thru it you should see a slight difference between the two polynomial values.

    Mike,
    So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?

    Question is why there's such a difference between Darrel's code and the one I'm using. The poly itself does make a slight difference, I've verified that but it's still way off. I haven't actually verified the claimed 825us (I've no reason to doubt it) but I'll do that tonight just to make sure.

    Thanks a lot guys, I appreciate the help!
    /Henrik.

  14. #14
    Join Date
    Nov 2003
    Location
    Greece
    Posts
    3,825


    Did you find this post helpful? Yes | No

    Default

    Hi Henrik.

    How did you declare MB_i variable? Byte or Word?

    Also instead of this:

    Code:
    MB_CRC = 65535
    MB_k = MB_Buffer_Ptr - 3                   ' Find out where in the array to stop.
    FOR MB_i = 0 to MB_k                       ' Iterate thru the frame, stop before the last two bytes which are the actual CRC.
        MB_CRC = MB_CRC ^ MB_Buffer[MB_i]      ' Bitwise EXOR the current low byte of the CRC "value" with the byte in the buffer
     
        For MB_j = 0 to 7                      ' Cycle thru the low byte of the CRC value                      
            If MB_CRC.Bit0 THEN                ' If the LSB is set
                MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
                MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
            ELSE
                MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
            ENDIF
        Next                                   ' Next bit in the byte
    NEXT                                       ' Next byte in the frame
    RETURN
    try this one:

    Code:
    MB_CRC = 65535
    MB_k = MB_Buffer_Ptr - 3                   ' Find out where in the array to stop.
    FOR MB_i = 0 to MB_k                       ' Iterate thru the frame, stop before the last two bytes which are the actual CRC.
        MB_CRC = MB_CRC ^ MB_Buffer[MB_i]      ' Bitwise EXOR the current low byte of the CRC "value" with the byte in the buffer
     
        For MB_j = 0 to 7                      ' Cycle thru the low byte of the CRC value                      
            MB_CRC = MB_CRC >> 1
            If MB_CRC.Bit0 THEN  MB_CRC = $A001 ^ MB_CRC               ' If the LSB is set
                                   ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
        Next                                   ' Next bit in the byte
    NEXT                                       ' Next byte in the frame
    RETURN
    Ioannis

  15. #15
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by HenrikOlsson View Post
    Mike,
    So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?
    I posted 16F1827 code. A slightly unfair comparison because 18F devices don't have a "lsrf" instruction.

    I didn't realize you were using an 18F until you posted your assembler output, and so I tested the following code which is similar to yours, for an 18F in BoostC;

    Code:
        MB_CRC ^= work;             //
        for(MB_j=0; MB_j<8; MB_j++) //                    
        { if(MB_CRC.0)              //
          { MB_CRC >>= 1;           //
            MB_CRC ^= 0xA001;       //
          }
          else                      //
          { MB_CRC >>= 1;           //
          }
        }                           // Next bit in the byte
    Here's the BoostC output which uses "bra" instructions and runs in about 115 cycles (worst case);

    Code:
    0008            label1
    0008  500A      	MOVF gbl_work, W
    000A  1A05      	XORWF gbl_MB_CRC, F
    000C  6A09      	CLRF gbl_MB_j
    000E            label2
    000E  0E08      	MOVLW 0x08
    0010  6009      	CPFSLT gbl_MB_j
    0012  D7FA      	BRA	label1
    0014  A005      	BTFSS gbl_MB_CRC,0
    0016  D008      	BRA	label3
    0018  90D8      	BCF STATUS,C
    001A  3206      	RRCF gbl_MB_CRC+D'1', F
    001C  3205      	RRCF gbl_MB_CRC, F
    001E  0E01      	MOVLW 0x01
    0020  1A05      	XORWF gbl_MB_CRC, F
    0022  0EA0      	MOVLW 0xA0
    0024  1A06      	XORWF gbl_MB_CRC+D'1', F
    0026  D003      	BRA	label4
    0028            label3
    0028  90D8      	BCF STATUS,C
    002A  3206      	RRCF gbl_MB_CRC+D'1', F
    002C  3205      	RRCF gbl_MB_CRC, F
    002E            label4
    002E  2A09      	INCF gbl_MB_j, F
    0030  D7EE      	BRA	label2
    Last edited by Mike, K8LH; - 17th January 2011 at 14:31.

  16. #16
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by HenrikOlsson View Post
    Mike,
    So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?
    Please forgive me if it seems I was suggesting one compiler is better than another. I'm just trying to help you characterize loop timing but I don't have PBP.

    Cheerful regards, Mike
    Last edited by Mike, K8LH; - 17th January 2011 at 14:28.

  17. #17
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    Mike,
    Not at all - at least I didn't read it like that and your help is much appreciated! I'm not that good at assembler though (hence the need for PBP ;-) ) Sorry for not mentioning I was using an 18 series PIC, I should've thought about that.

    Ioannis,
    MB_i is declared as a BYTE.

    At this point the best I've got is (was) this:
    Code:
    MB_k = MB_Buffer_Ptr - 3                   ' Find out where in the array to stop.
    FOR MB_i = 0 to MB_k                       ' Iterate thru the frame, stop before the last two bytes which are the actual CRC.
        MB_CRC = MB_CRC ^ MB_Buffer[MB_i]      ' Bitwise EXOR the current low byte of the CRC "value" with the byte in the buffer
     
        MB_j = 0
     
        DO WHILE MB_j.3 = 0
            If MB_CRC.Bit0 THEN                ' If the LSB is set
                MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
                MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
            ELSE
                MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
            ENDIF
        MB_j = MB_j + 1
        LOOP
    NEXT                                       ' Next byte in the frame
    RETURN
    This works and does the 9 byte test-array in 1496 cycles which is faster than with the FOR-NEXT loop I originally had. I then tried changing the meat of it to:
    Code:
    DO WHILE MB_j.3 = 0
            MB_CRC = MB_CRC >> 1
            If MB_CRC.BIT0 THEN MB_CRC = $A001 ^ MB_CRC
        MB_j = MB_j + 1
    LOOP
    This runs a little bit faster, 1394 cycles, but it calculates the wrong CRC value. This is because, the shifts is done before checking BIT0 while in the working code it checks BIT0 THEN shifts and XOR's (or not).

    Since the shift right command uses the RRCF instruction which shifts thru the CARRY flag I tried:
    Code:
    DO WHILE MB_j.3 = 0
      MB_CRC = MB_CRC >> 1
      If STATUS.0 THEN MB_CRC = $A001 ^ MB_CRC  'Check carry, XOR if set.
      MB_j = MB_j + 1
    LOOP
    This runs in 1414 cycles and produces the correct CRC value. Can anyone see any potential issues with doing it like this?

    All in all we've gone from 1730 to 1414 cycles, that's a ~22% increase - not bad! I wonder what more we can squeeze out of it? ;-)

    Thanks guys!
    /Henrik.
    Last edited by HenrikOlsson; - 17th January 2011 at 16:42.

  18. #18
    Join Date
    Nov 2003
    Location
    Greece
    Posts
    3,825


    Did you find this post helpful? Yes | No

    Default

    I am pretty sure I put the shift inside the loop.

    OK, why Darrel's routine is faster? Maybe is the shift? Can it be that Left Shift is faster than Right?

    I see no other obvious reason.

    Interresting.

    Ioannis

  19. #19
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,523


    Did you find this post helpful? Yes | No

    Default

    I am pretty sure I put the shift inside the loop
    Absolutely, but shifting before checking bit0 is different from checking bit0 before shifting ;-)

    I haven't yet tried Darrels version but I have looked at the ASM code it compiles to and as far as I can see it's now "longer" than what my current code compiles to. I'm actually starting to doubt it'll do the 9 byte test sequence in 825 cycles. Darrel, are you there?

    Thanks!
    /Henrik.

  20. #20
    Join Date
    Jul 2003
    Posts
    2,405


    Did you find this post helpful? Yes | No

    Default

    Run Darrels' example through MPSIM with the stopwatch to see if it really takes only 825 cycles to process 9 bytes through his CRC routine.

    Yours looks pretty darn tight as-is.
    Regards,

    -Bruce
    tech at rentron.com
    http://www.rentron.com

Members who have read this thread : 0

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts