Bert,
That is exactly how I had it when I was trying to exclude the outer array-handling loop from the timing. I ran 9 bytes through and got ~1530 cycles. 1530/9 = 170 cycles per byte and your test shows 169, so that matches pretty well. I DID get a slight difference when I changed the poly, and I guess that depends on the actual value of the bytes going through the routine. If you run the 9-byte sequence 123456789 through it you should see a slight difference between the two polynomial values.
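To show why the cycle count can depend on both the poly and the data, here is a rough C sketch of a bitwise CRC that counts how often the "XOR with the polynomial" branch is taken. This is not the routine from this thread: the 16-bit width, MSB-first shifting, zero initial value, the function name crc16_counted and the two polys (0x1021 and 0x8005) are all assumptions picked for illustration. On a PIC the taken and not-taken paths usually cost a different number of cycles, so a different branch count means a slightly different total.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only: bitwise CRC-16, MSB first, initial value 0.
 * The width, init value and bit order are assumptions, not the thread's routine.
 * *xor_taken receives how many times the polynomial XOR was actually executed. */
uint16_t crc16_counted(const uint8_t *data, size_t len,
                       uint16_t poly, unsigned *xor_taken)
{
    uint16_t crc = 0;
    unsigned taken = 0;

    for (size_t i = 0; i < len; i++) {
        /* Bring the next byte into the top half of the CRC register. */
        crc ^= (uint16_t)((uint16_t)data[i] << 8);

        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                /* Top bit set: shift and XOR in the poly (the longer path). */
                crc = (uint16_t)((crc << 1) ^ poly);
                taken++;
            } else {
                /* Top bit clear: shift only (the shorter path). */
                crc = (uint16_t)(crc << 1);
            }
        }
    }

    *xor_taken = taken;
    return crc;
}
```

Calling it on the 9 ASCII bytes "123456789" once with 0x1021 and once with 0x8005 and comparing the two xor_taken values shows how many of the 72 bit steps take the longer path for each poly. Since the poly feeds back into the register, it changes which later bits trigger the XOR, which would explain a small data-dependent difference in cycles.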

Mike,
So obviously BoostC compiles this to faster code, but doesn't the BoostC-compiled code you posted use GOTO as well?

The question is why there's such a difference between Darrel's code and the one I'm using. The poly itself does make a slight difference (I've verified that), but the timing is still way off. I haven't actually verified the claimed 825us (I've no reason to doubt it), but I'll do that tonight just to make sure.

Thanks a lot guys, I appreciate the help!
/Henrik.