Optimizing CRC16 routine.

**HenrikOlsson** · - 17th January 2011, 06:14

Bert,
That is exactly like I had it when trying to exclude the outer array handling loop from the timing. I ran 9 bytes thru and got ~1530 cycles. 1530/9=170 cycles per byte and your test shows 169 - so that matches pretty well. I DID get a slight difference when I changed the poly and I guess that depends on the actual value of the byte going thru the routine, if you run the 9 byte sequence 123456789 thru it you should see a slight difference between the two polynomial values.

Mike,
So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?

Question is why there's such a difference between Darrel's code and the one I'm using. The poly itself does make a slight difference, I've verified that but it's still way off. I haven't actually verified the claimed 825us (I've no reason to doubt it) but I'll do that tonight just to make sure.

Thanks a lot guys, I appreciate the help!
/Henrik.

**Mike, K8LH** · - 17th January 2011, 13:58

Originally Posted by HenrikOlsson

Mike,
So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?

I posted 16F1827 code. A slightly unfair comparison because 18F devices don't have a "lsrf" instruction.

I didn't realize you were using an 18F until you posted your assembler output, and so I tested the following code which is similar to yours, for an 18F in BoostC;

Code:

    MB_CRC ^= work;             //
    for(MB_j=0; MB_j<8; MB_j++) //                    
    { if(MB_CRC.0)              //
      { MB_CRC >>= 1;           //
        MB_CRC ^= 0xA001;       //
      }
      else                      //
      { MB_CRC >>= 1;           //
      }
    }                           // Next bit in the byte

Here's the BoostC output which uses "bra" instructions and runs in about 115 cycles (worst case);

Code:

0008            label1
0008  500A      	MOVF gbl_work, W
000A  1A05      	XORWF gbl_MB_CRC, F
000C  6A09      	CLRF gbl_MB_j
000E            label2
000E  0E08      	MOVLW 0x08
0010  6009      	CPFSLT gbl_MB_j
0012  D7FA      	BRA	label1
0014  A005      	BTFSS gbl_MB_CRC,0
0016  D008      	BRA	label3
0018  90D8      	BCF STATUS,C
001A  3206      	RRCF gbl_MB_CRC+D'1', F
001C  3205      	RRCF gbl_MB_CRC, F
001E  0E01      	MOVLW 0x01
0020  1A05      	XORWF gbl_MB_CRC, F
0022  0EA0      	MOVLW 0xA0
0024  1A06      	XORWF gbl_MB_CRC+D'1', F
0026  D003      	BRA	label4
0028            label3
0028  90D8      	BCF STATUS,C
002A  3206      	RRCF gbl_MB_CRC+D'1', F
002C  3205      	RRCF gbl_MB_CRC, F
002E            label4
002E  2A09      	INCF gbl_MB_j, F
0030  D7EE      	BRA	label2

**Mike, K8LH** · - 17th January 2011, 14:08

Originally Posted by HenrikOlsson

Mike,
So obviously BoostC compiles this to faster code but doesn't the BoostC compiled code you posted use GOTO as well?

Please forgive me if it seems I was suggesting one compiler is better than another. I'm just trying to help you characterize loop timing but I don't have PBP.

Cheerful regards, Mike

**HenrikOlsson** · - 17th January 2011, 16:40

Mike,
Not at all - at least I didn't read it like that and your help is much appreciated! I'm not that good at assembler though (hence the need for PBP ;-) ) Sorry for not mentioning I was using an 18 series PIC, I should've thought about that.

Ioannis,
MB_i is declared as a BYTE.

At this point the best I've got is (was) this:

Code:

MB_k = MB_Buffer_Ptr - 3                   ' Find out where in the array to stop.
FOR MB_i = 0 to MB_k                       ' Iterate thru the frame, stop before the last two bytes which are the actual CRC.
    MB_CRC = MB_CRC ^ MB_Buffer[MB_i]      ' Bitwise EXOR the current low byte of the CRC "value" with the byte in the buffer
 
    MB_j = 0
 
    DO WHILE MB_j.3 = 0
        If MB_CRC.Bit0 THEN                ' If the LSB is set
            MB_CRC = MB_CRC >> 1           ' We shift CRC on bit to the right....
            MB_CRC = $A001 ^ MB_CRC        ' and EXOR it with out polynomial for MODBUS CRC-16 ($A001)
        ELSE
            MB_CRC = MB_CRC >> 1           ' If the LSB is not set we simply shift CRC one bit to the right.
        ENDIF
    MB_j = MB_j + 1
    LOOP
NEXT                                       ' Next byte in the frame
RETURN

This works and does the 9 byte test-array in 1496 cycles which is faster than with the FOR-NEXT loop I originally had. I then tried changing the meat of it to:

Code:

DO WHILE MB_j.3 = 0
        MB_CRC = MB_CRC >> 1
        If MB_CRC.BIT0 THEN MB_CRC = $A001 ^ MB_CRC
    MB_j = MB_j + 1
LOOP

This runs a little bit faster, 1394 cycles, but it calculates the wrong CRC value. This is because, the shifts is done before checking BIT0 while in the working code it checks BIT0 THEN shifts and XOR's (or not).

Since the shift right command uses the RRCF instruction which shifts thru the CARRY flag I tried:

Code:

DO WHILE MB_j.3 = 0
  MB_CRC = MB_CRC >> 1
  If STATUS.0 THEN MB_CRC = $A001 ^ MB_CRC  'Check carry, XOR if set.
  MB_j = MB_j + 1
LOOP

This runs in 1414 cycles and produces the correct CRC value. Can anyone see any potential issues with doing it like this?

All in all we've gone from 1730 to 1414 cycles, that's a ~22% increase - not bad! I wonder what more we can squeeze out of it? ;-)

Thanks guys!
/Henrik.

**Ioannis** · - 17th January 2011, 18:53

I am pretty sure I put the shift inside the loop.

OK, why Darrel's routine is faster? Maybe is the shift? Can it be that Left Shift is faster than Right?

I see no other obvious reason.

Interresting.

Ioannis

**HenrikOlsson** · - 17th January 2011, 20:06

I am pretty sure I put the shift inside the loop

Absolutely, but shifting before checking bit0 is different from checking bit0 before shifting ;-)

I haven't yet tried Darrels version but I have looked at the ASM code it compiles to and as far as I can see it's now "longer" than what my current code compiles to. I'm actually starting to doubt it'll do the 9 byte test sequence in 825 cycles. Darrel, are you there?

Thanks!
/Henrik.

**Bruce** · - 17th January 2011, 23:52

Run Darrels' example through MPSIM with the stopwatch to see if it really takes only 825 cycles to process 9 bytes through his CRC routine.

Yours looks pretty darn tight as-is.

Thread: Optimizing CRC16 routine.

Thread Tools

Search Thread

Display

Hybrid View

Members who have read this thread : 0

Bookmarks

Bookmarks

Posting Permissions