This should take about 13 µs at 16 MHz (roughly 51 instruction cycles at 0.25 µs each, going by the cycle counts in the comments).
Code:
BitCount  VAR BYTE BANK0
testByte  VAR BYTE BANK0
LoopCount VAR BYTE BANK0

testByte = %00100100

ASM
   MOVE?CB  8, _LoopCount     ;2  loop through 8-bits
   MOVE?CB  0, _BitCount      ;2  clear the Bit Count
TestLoop
   ifdef BSR                  ;   for 18F's
      rrcf  _testByte, F      ;1     rotate LSB into carry
   else                       ;   for 16F's
      rrf   _testByte, F      ;1     rotate LSB into carry
   endif
   btfsc    STATUS, C         ;1  if a 1 rotated into carry
   incf     _BitCount, F      ;1    increment the Bit Count
   decfsz   _LoopCount, F     ;1  skip if rotated all bits
   goto     TestLoop          ;2  do the rest of the bits
ENDASM
When it's finished, the number of 1's in the byte will be in BitCount.
The number of 0's would be 8 - BitCount.
And the testByte variable will have been destroyed, since its bits get rotated out through carry.
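If you need to keep testByte, one alternative is to count the bits in plain PBP using the bit-array syntax instead of rotating. This is only a sketch, not timed, and it will be slower than the ASM version. The declarations are repeated here (without BANK0) just so the snippet stands on its own; drop them if the variables are already declared.
Code:
BitCount  VAR BYTE
testByte  VAR BYTE
LoopCount VAR BYTE

testByte = %00100100

BitCount = 0                                      ' clear the Bit Count
FOR LoopCount = 0 TO 7                            ' step through bits 0..7
    BitCount = BitCount + testByte.0[LoopCount]   ' add that bit (0 or 1)
NEXT LoopCount
Here testByte is only read, so it still holds its original value afterwards.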

hth,