Quote (originally posted by Darrel Taylor):
Here's an example of measuring the time to do a 16/16 bit divide. But you can have any number of statements in between, as long as the time does not exceed 65535 instruction cycles.
An aside, from another thread discussing math execution time in PBP 2.50B: it seems that ALL add/subtract/multiply/divide operations are performed as signed 31-bit operations, rather than being handled differently for byte/word/long/mixed operands.
If I'm reading pbppi18L.Lib correctly, the worst case for an s31/s31 divide is about 1,127 cycles (signed, 2^30-1 divided by 1), and the best case looks to be about 713 cycles (unsigned, 1 divided by 0).
Code:
;****************************************************************
;* DIV        : 32 x 32 divide                                  *
;* Input      : R0 / R1                                         *
;* Output     : R0 = quotient                                   * 
;*            : R2 = remainder                                  *
;* Notes      : R2 = R0 MOD R1                                  *
;****************************************************************

    ifdef DIVS_USED
  LIST
DIVS	clrf	R3 + 3		; Clear sign difference indicator
	btfss	R0 + 3, 7	; Check for R0 negative
	bra	divchkr1	; Not negative
	btg	R3 + 3, 7	; Flip sign indicator
	clrf	WREG		; Clear W for subtracts
	negf	R0		; Flip value to plus
	subfwb	R0 + 1, F
	subfwb	R0 + 2, F
	subfwb	R0 + 3, F
divchkr1 btfss	R1 + 3, 7	; Check for R1 negative
	bra	divdo		; Not negative
	btg	R3 + 3, 7	; Flip sign indicator
	clrf	WREG		; Clear W for subtracts
	negf	R1		; Flip value to plus
	subfwb	R1 + 1, F
	subfwb	R1 + 2, F
	subfwb	R1 + 3, F
	bra	divdo		; Skip unsigned entry
  NOLIST
DIV_USED = 1
    endif

    ifdef DIV_USED
  LIST
DIV
      ifdef DIVS_USED
	clrf	R3 + 3		; Clear sign difference indicator	
      endif
divdo	clrf	R2		; Do the divide
	clrf	R2 + 1
	clrf	R2 + 2
	clrf	R2 + 3

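	movlw	32		; Default: 32 iterations

; Proposed (untested) shortcut, not in the library; largest skip first:
;   IF R0.word1 = 0 AND R0.byte1 = 0 THEN movlw 8,  pre-shift R0 left 24 bits
;   ELSE IF R0.word1 = 0 THEN movlw 16, pre-shift R0 left 16 bits
;   ELSE IF R0.byte3 = 0 THEN movlw 24, pre-shift R0 left 8 bits
; Only the dividend R0 is checked and pre-shifted. The divisor R1 is
; subtracted from the remainder R2 unshifted; shifting R1 as well would
; divide by R1 shifted left by the same amount, not by R1.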

	movwf	R3		; R3 = iteration counter

divloop	rlcf	R0 + 3, W	; Dividend MSB -> carry (W result unused)
	rlcf	R2, F		; ...and on into the remainder R2
	rlcf	R2 + 1, F
	rlcf	R2 + 2, F
	rlcf	R2 + 3, F
	movf	R1, W		; Trial subtract: R2 = R2 - R1
	subwf	R2, F
	movf	R1 + 1, W
	subwfb	R2 + 1, F
	movf	R1 + 2, W
	subwfb	R2 + 2, F
	movf	R1 + 3, W
	subwfb	R2 + 3, F

	bc	divok		; No borrow (C set): quotient bit = 1
	movf	R1, W		; Borrow: add R1 back to restore R2
	addwf	R2, F
	movf	R1 + 1, W
	addwfc	R2 + 1, F
	movf	R1 + 2, W
	addwfc	R2 + 2, F
	movf	R1 + 3, W
	addwfc	R2 + 3, F

	bcf	STATUS, C	; Quotient bit = 0

divok	rlcf	R0, F		; Shift quotient bit (carry) into R0
	rlcf	R0 + 1, F
	rlcf	R0 + 2, F
	rlcf	R0 + 3, F

	decfsz	R3, F		; Next iteration until R3 = 0
	bra	divloop

      ifdef DIVS_USED
	btfss	R3 + 3, 7	; Should result be negative?
	bra	divdone		; Not negative
	clrf	WREG		; Clear W for subtracts
	negf	R0		; Flip quotient to minus
	subfwb	R0 + 1, F
	subfwb	R0 + 2, F
	subfwb	R0 + 3, F
	negf	R2		; Flip remainder to minus
	subfwb	R2 + 1, F
	subfwb	R2 + 2, F
	subfwb	R2 + 3, F
divdone
    endif

	movf	R0, W		; Get low byte to W
	goto	DUNN
  NOLIST
DUNN_USED = 1
    endif
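To make the listing easier to follow, here is a hypothetical C model of the DIV/DIVS code above (the names div_u32, div_s32 and divresult are made up, and this is my reading of the assembler, not PBP source). It is a restoring shift-and-subtract divide: each iteration shifts the dividend's MSB into the remainder, tries R2 - R1, and keeps the result only if there was no borrow. Two things fall out of the model. First, like the listing, it drops the bit shifted out of the top of R2, so it only matches C's unsigned / and % for divisors up to 2^31 (always true for DIVS, which makes both operands positive first). Second, DIVS negates the remainder whenever the operand signs differ, so 7 / -2 gives a remainder of -1 where C's % gives 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the DIV/DIVS listing; names are made up.
   R0 = dividend in / quotient out, R1 = divisor, R2 = remainder. */
typedef struct {
    uint32_t quotient;   /* R0 on exit */
    uint32_t remainder;  /* R2 on exit */
} divresult;

/* DIV: 32 iterations of restoring shift-and-subtract. */
static divresult div_u32(uint32_t r0, uint32_t r1)
{
    uint32_t r2 = 0;                         /* clrf R2 .. R2 + 3 */
    for (int r3 = 32; r3 != 0; r3--) {       /* movlw 32 ... decfsz R3 */
        r2 = (r2 << 1) | (r0 >> 31);         /* dividend MSB into remainder */
        uint32_t bit = 0;
        if (r2 >= r1) {                      /* subtract leaves no borrow */
            r2 -= r1;
            bit = 1;
        }   /* otherwise R2 is unchanged (the asm subtracts, then adds R1 back) */
        r0 = (r0 << 1) | bit;                /* quotient bit into R0 */
    }
    return (divresult){ r0, r2 };
}

/* DIVS: make both operands positive, divide, then negate quotient AND
   remainder if the operand signs differed (R3 + 3, bit 7). */
static divresult div_s32(int32_t a, int32_t b)
{
    int negate = 0;
    uint32_t r0 = (uint32_t)a, r1 = (uint32_t)b;
    if (a < 0) { negate = !negate; r0 = 0u - r0; }   /* negf/subfwb */
    if (b < 0) { negate = !negate; r1 = 0u - r1; }
    divresult d = div_u32(r0, r1);
    if (negate) {
        d.quotient = 0u - d.quotient;
        d.remainder = 0u - d.remainder;
    }
    return d;
}

/* Spot-check the unsigned model against C for divisors up to 2^31. */
static void check_div_u32(uint32_t a, uint32_t b)
{
    divresult d = div_u32(a, b);
    assert(d.quotient == a / b && d.remainder == a % b);
}
```

As a side effect of the algorithm, dividing by 0 gives a quotient of 0xFFFFFFFF and leaves the dividend in the remainder, since every trial subtraction of 0 succeeds.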
The divide loop itself takes up to 34 cycles per iteration, for 32 iterations: 1,088 cycles worst case. Getting into the loop takes another 25 cycles and getting out of it 14 cycles (1,088 + 25 + 14 = 1,127).
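The time per iteration depends on the data, because the add-back code only runs when the trial subtraction borrows; otherwise bc divok skips those eight instructions plus the bcf. A hypothetical C helper (restore_passes is a made-up name) that counts how many of the 32 iterations take the add-back path is sketched below. It counts iterations only, not PIC cycles. By this model every iteration restores when the dividend is smaller than the divisor (e.g. 0 / 1), and none do when the divisor is 0, as in the 1 / 0 best case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many of the 32 iterations of the DIV loop
   fall through "bc divok" into the add-back (restore) code. Counts
   iterations only; it does not model PIC cycle counts. */
static int restore_passes(uint32_t dividend, uint32_t divisor)
{
    uint32_t r0 = dividend, r2 = 0;
    int restores = 0;
    for (int pass = 0; pass < 32; pass++) {
        r2 = (r2 << 1) | (r0 >> 31);   /* next dividend bit into remainder */
        r0 <<= 1;                      /* quotient bits in R0 are not needed here */
        if (r2 >= divisor)
            r2 -= divisor;             /* no borrow: branch to divok */
        else
            restores++;                /* borrow: divisor added back */
    }
    return restores;
}
```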
I think (and I haven't played with it yet) that a few checks could be added between the movlw 32 and the movwf R3 above, to see whether the dividend's upper bytes (or, going further, its individual upper bits) are clear. If they are, pre-shift R0 left and lop off 8, 16, or 24 iterations of the loop itself (up to 30 if checking bit by bit).
The worst case (long operands) would still take the full 1,127 cycles, plus a few cycles for the checks, but with a byte-sized dividend the loop would run only 8 iterations instead of 32, cutting the loop's share of the 713-cycle best case to roughly a quarter.
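A hypothetical C sketch of that shortcut (div_u32_preshift and check_preshift are made-up names, and this is not library code) is below. It checks only the dividend, pre-shifts only R0, and runs 8, 16, 24 or 32 iterations of the same restoring loop. For any nonzero divisor it should match the full 32-iteration loop; for a zero divisor it returns 0xFF, 0xFFFF or 0xFFFFFF instead of the full loop's 0xFFFFFFFF, which may or may not matter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the proposed shortcut (not PBP library code).
   Leading zero bytes of the dividend are skipped by pre-shifting R0 and
   running fewer iterations; the divisor is never shifted. */
static uint32_t div_u32_preshift(uint32_t r0, uint32_t r1, uint32_t *rem)
{
    int passes = 32;                           /* movlw 32 */
    if ((r0 & 0xFFFFFF00u) == 0) {             /* upper 24 bits clear */
        passes = 8;
        r0 <<= 24;
    } else if ((r0 & 0xFFFF0000u) == 0) {      /* R0.word1 = 0 */
        passes = 16;
        r0 <<= 16;
    } else if ((r0 & 0xFF000000u) == 0) {      /* R0.byte3 = 0 */
        passes = 24;
        r0 <<= 8;
    }

    uint32_t r2 = 0;                           /* remainder */
    for (int i = 0; i < passes; i++) {
        r2 = (r2 << 1) | (r0 >> 31);           /* next dividend bit in */
        uint32_t bit = 0;
        if (r2 >= r1) {                        /* trial subtract succeeds */
            r2 -= r1;
            bit = 1;
        }
        r0 = (r0 << 1) | bit;                  /* quotient bit in, dividend out */
    }
    *rem = r2;
    return r0;                                 /* quotient */
}

/* Compare against C's / and % (nonzero divisors up to 2^31). */
static void check_preshift(uint32_t a, uint32_t b)
{
    uint32_t r;
    assert(div_u32_preshift(a, b, &r) == a / b && r == a % b);
}
```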
The same could be said for the add, subtract and multiply routines, although the gains would be minimal for each, especially since the PIC18 has a built-in single-cycle 8x8 hardware multiplier.
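For the multiply side, a hypothetical sketch (mul32_from_8x8 is a made-up name, not the library's routine) of how the low 32 bits of a long product can be built from 8x8 products like the one a single MULWF delivers. Only byte pairs whose weight stays below 2^32 are needed, ten in all; the zero-byte skip is there only to count how many products actually matter, which drops to one when both operands fit in a byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: low 32 bits of a*b from 8x8 -> 16-bit products.
   Pairs with i + j >= 4 only affect bits above 2^32 and are never formed. */
static uint32_t mul32_from_8x8(uint32_t a, uint32_t b, int *products_used)
{
    uint32_t result = 0;
    int used = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t ai = (uint8_t)(a >> (8 * i));
        for (int j = 0; i + j < 4; j++) {         /* weight 2^(8(i+j)) < 2^32 */
            uint8_t bj = (uint8_t)(b >> (8 * j));
            if (ai == 0 || bj == 0)
                continue;                         /* contributes nothing */
            uint16_t p = (uint16_t)(ai * bj);     /* one 8x8 hardware multiply */
            result += (uint32_t)p << (8 * (i + j));
            used++;
        }
    }
    *products_used = used;
    return result;
}

/* Compare against a full 64-bit product truncated to 32 bits. */
static void check_mul(uint32_t a, uint32_t b)
{
    int n;
    assert(mul32_from_8x8(a, b, &n) == (uint32_t)((uint64_t)a * b));
}
```

Even the full case is only ten hardware multiplies plus some adds, so trimming it for byte operands likely saves far less than trimming the 32-iteration divide loop.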