Run Darrel's example through MPSIM with the Stopwatch to see whether it really takes only 825 cycles to process 9 bytes through his CRC routine.
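
For a quick sanity check on whatever the Stopwatch reports, the claimed figure works out to roughly 92 instruction cycles per byte. The clock speed below is an assumption, not something stated here: at 4 MHz on a PIC where one instruction cycle is Fosc/4 (1 µs), the whole 9-byte message would take about 825 µs.

$$
\frac{825\ \text{cycles}}{9\ \text{bytes}} \approx 91.7\ \text{cycles/byte},
\qquad
825\ \text{cycles} \times \frac{4}{4\ \text{MHz}} = 825\ \mu\text{s}
$$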

Yours looks pretty darn tight as-is.