Optimizing CRC16 routine.


Closed Thread
Results 1 to 20 of 20

Hybrid View

  1. #1
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    Greetings Henrik,

    Is there a chance you have a lot of overhead associated with array handling? Have you looked at the assembly language generated by that routine?

    I checked out the inner loop using BoostC and found worst case cycle times of 123 cycles in a "for/next" loop and 100 cycles in a "do/while" loop.

    Cheerful regards, Mike

    Code:
      crcword ^= work;         // 123 cycles (BoostC)
      for(char i=0; i<8; i++)  //                      
      { crcword >>= 1;         //
        if(status.C)           //
          crcword ^= 0x0A01;   //
      }
    Code:
      crcword ^= work; i = 0;   // 100 cycles (BoostC)
      do {                      //
        crcword >>= 1;          //
        if(status.C)            //
          crcword ^= 0x0A01;    //
        i++;                    //
      } while(i.3 == 0);        //
    Last edited by Mike, K8LH; - 16th January 2011 at 18:57.

  2. #2
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,615


    Did you find this post helpful? Yes | No

    Default

    Hi Mike,
    I'm using TMR1 to measure the execution time of the routine. I just moved the starting and stopping of the timer so it's around the inner FOR-NEXT loop. In other words the timer is started before the inner FOR and stopped after the inner NEXT effectively excluding the array handling of the outer FOR-NEXT loop from the timing.

    This time I got 1533 cycles.

    So, as far as I can see it is the inner loop that is eating away the time and I still don't understand why. I'll try a DO-WHILE construct instead of the FOR-NEXT and see what I'll get.

    Thanks!
    /Henrik.

  3. #3
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    ... <snip> ... I'm using TMR1 to measure the execution time of the routine ... <snip> ...
    Could that be a problem? Do you have anything similar to the MPLAB Simulator Stopwatch?

    ... <snip>... This time I got 1533 cycles ... <snip> ...
    My goodness, that doesn't sound quite right. That's close to what you got when you were processing nine bytes. Could there be a problem with the method you're using to count cycles?

    I would expect you to get code similar to the assembler code generated from my BoostC "Do/While" loop;

    Code:
      crcword ^= work; i = 8;   //
    0040  0845  	MOVF crc16_00000_arg_work, W
    0041  06B7  	XORWF gbl_crcword, F
    0042  3008  	MOVLW 0x08
    0043  00C6  	MOVWF crc16_00000_1_i
    
      do {                      //
    0044        label5
    
        crcword >>= 1;          //
    0044  36B8  	LSRF gbl_crcword+D'1', F
    0045  0CB7  	RRF gbl_crcword, F
    
        if(status.C)            //
    0046  1C03  	BTFSS gbl_status,0
    0047  284C  	GOTO	label6
    004C        label6
    
          crcword ^= 0x0A01;    //
    0048  3001  	MOVLW 0x01
    0049  06B7  	XORWF gbl_crcword, F
    004A  300A  	MOVLW 0x0A
    004B  06B8  	XORWF gbl_crcword+D'1', F
    
      } while(--i);             //
    004C  03C6  	DECF crc16_00000_1_i, F
    004D  1D03  	BTFSS STATUS,Z
    004E  2844  	GOTO	label5
    }
    Last edited by Mike, K8LH; - 16th January 2011 at 18:53.

  4. #4
    Join Date
    Oct 2005
    Location
    Sweden
    Posts
    3,615


    Did you find this post helpful? Yes | No

    Default

    Hi Mike,
    I tried a DO-WHILE construct and I'm quite surprised by the results. I moved the starting and stopping back to where I had it originally and with the DO-WHILE I get 1496 cycles instead of 1730 for the 9 test-bytes. A 67 byte array, previously taking ~12600 cycles now executes in ~10800. Getting there....

    Here's the current ASM code:
    Code:
    0012A0                    M _MB_CRC_16
    0012A0 6831               M         setf    _MB_CRC
    0012A2 6832               M         setf    (_MB_CRC) + 1
    0012A4 0E03               M         movlw   003h
    0012A6 5C3F               M         subwf   _MB_Buffer_Ptr, W
    0012A8 6E42               M         movwf   _MB_k
    0012AA 6A40               M         clrf    _MB_i
    0012AC                    M L00108
    0012AC 0004               M         clrwdt
    0012AE 5040               M         movf    _MB_i, W
    0012B0 5C42               M         subwf   _MB_k, W
    0012B2 A0D8               M         btfss   STATUS, C
    0012B4 EF80 F009          M         goto    L00109
    0012B8 5040               M         movf    _MB_i, W
    0012BA 0F56               M         addlw   low (_MB_Buffer)
    0012BC 6EE9               M         movwf   FSR0L
    0012BE 0E01               M         movlw   (_MB_Buffer) >> 8
    0012C0 6AEA               M         clrf    FSR0H
    0012C2 22EA               M         addwfc  FSR0H, F
    0012C4 CFEF F013          M         movff   INDF0, T1
    0012C8 5013               M         movf    T1,  W
    0012CA 1A31               M         xorwf   _MB_CRC, F
    0012CC 6A41               M         clrf    _MB_j
    0012CE                    M L00110
    0012CE 0004               M         clrwdt
    0012D0 B641               M         btfsc   _MB_j, 003h
    0012D2 EF7D F009          M         goto    L00111
    0012D6 0004               M         clrwdt
    0012D8 A031               M         btfss   _MB_CRC, 000h
    0012DA EF78 F009          M         goto    L00112
    0012DE 90D8               M         bcf     STATUS, C
    0012E0 3232               M         rrcf    _MB_CRC + 1, F
    0012E2 3231               M         rrcf    _MB_CRC, F
    0012E4 0E01               M         movlw   low (0A001h)
    0012E6 1A31               M         xorwf   _MB_CRC, F
    0012E8 0EA0               M         movlw   (0A001h) >> 8
    0012EA 1A32               M         xorwf   _MB_CRC + 1, F
    0012EC EF7B F009          M         goto    L00113
    0012F0                    M L00112
    0012F0 90D8               M         bcf     STATUS, C
    0012F2 3232               M         rrcf    _MB_CRC + 1, F
    0012F4 3231               M         rrcf    _MB_CRC, F
    0012F6                    M L00113
    0012F6 2A41               M         incf    _MB_j, F
    0012F8 D7EA               M         bra     L00110
    0012FA                    M L00111
    0012FA 2A40               M         incf    _MB_i, F
    0012FC A0D8               M         btfss   STATUS, C
    0012FE D7D6               M         bra     L00108
    001300                    M L00109
    001300 0012               M         return
    /Henrik.

  5. #5
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    The inner loop starts at label "L00110" and uses about 18 instruction cycles (worst case) each loop iteration for a whopping total of right around 144 cycles per byte processed.

    I have not check the rest of the code. I must get back to my studies.

    Cheerful regards, Mike
    Last edited by Mike, K8LH; - 16th January 2011 at 19:35.

  6. #6
    Join Date
    Aug 2010
    Location
    Maryland, USA
    Posts
    869


    Did you find this post helpful? Yes | No

    Default Trying to get up to speed

    Hi Henrik,
    here is the snippit I am testing:
    Code:
    Main:      'DO WHATEVER YOU WANT HERE
    tmr1=0
    t1con = 1
        For MB_j = 0 to 7                      ' Cycle thru the low byte of the CRC value                      
            If MB_CRC.Bit0 THEN                ' If the LSB is set
                MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
                MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
            ELSE
                MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
            ENDIF
        Next                                   ' Next bit in the byte
    t1con = 0
    stopnow:
    goto stopnow
    This takes 169 cycles according to TMR1 setup as you see here. I have tried with both $A001 and $1021. Both are the same. Does this fall in line with what you have?

    BTW, MPLAB stopwatch reports the same

    Next I think I will try out DT's snippet
    Last edited by cncmachineguy; - 16th January 2011 at 19:58. Reason: added
    -Bert

    The glass is not half full or half empty, Its twice as big as needed for the job!

    http://foamcasualty.com/ - Warbird R/C scratch building with foam!

  7. #7
    Join Date
    Aug 2005
    Location
    Michigan, USA
    Posts
    224


    Did you find this post helpful? Yes | No

    Default

    I just realized that PBP assembler code is using two word "goto" instructions for those short branches instead of one word "bra" instructions. That makes it worst case 20 cycles instead of 18 cycles per interation through the inner loop (160 cycles per byte). Ouch!

Members who have read this thread : 0

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts