Optimizing DIV


Closed Thread
Results 1 to 40 of 42

Thread: Optimizing DIV

Hybrid View

  1. #1
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Did some thinking...
    This code tries to optimize the divide loops by preshifting the other way, to the right, getting rid of the trailing zero's (i.e. 16/8 is the same as 2/1, saves 3 times thru the #divloop).
    This works 99.99999% of the time, all the way up to 32 bits. The only thing I can get to fail is 0 / 0, zero divided by zero. Special case needs special case code.
    Problem is...I can't seem to see any improvements! The instruction cycle savings is there. I can count them by hand. On my setup, if I use PBP's divide, I get about 2600 divides per minute. If I use OPT 1,2 or 3 (both byte and bit together), I get roughly the same 2600 divides per minute. My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).
    The code I'm using has some minimal LCD output in it, so it may be that most of the time is being used up by the display code and not a lot left over for the divides themselves.

    EDIT: The LCD routines are definitely killing the speed...Divides per minute are more around 280,000+...

    Code:
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
    		endif
    #divdo
    	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    	movlw	32
    	movwf	R3
    ;added to speed up s-31 divide op's by ignoring zero'd bytes
    	        ifdef SKI_DIV_OPT
    	        	if ( SKI_DIV_OPT == 1 | SKI_DIV_OPT == 3 )
    SkiOpt
    	movf    R0, W      ; IF R0(0)= 0 
    	bnz     #divloop
    
    	movf    R1, W      ;   AND R1(0)= 0 then 
    	bnz     #divloop
    
    	movff   R0 + 1, R0 + 0 ;      and preshift R0
    	movff   R0 + 2, R0 + 1
    	movff   R0 + 3, R0 + 2
    	clrf    R0 + 3
    
    	movff   R1 + 1, R1 + 0 ;      and R1 over 8 bits
    	movff   R1 + 2, R1 + 1
    	movff   R1 + 3, R1 + 2
    	clrf    R1 + 3
    
    	movlw   8              ;      loops - 8
    	subwf   R3, F
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    			endif
    	        endif
       
    ;added to speed up s-31 divides by skipping clr'd bits in divisor/dividend lsb
    		ifdef SKI_DIV_OPT
    			if ( SKI_DIV_OPT == 2 | SKI_DIV_OPT == 3 )
    SkiOpt2
    	btfsc	R0, 0	; if lowest bit set, goto divloop
    	bra	#divloop
    	btfsc	R1, 0	; if lowest bit set, goto divloop
    	bra	#divloop
    
    	bcf    	STATUS, C	;clr carry-shift over complete R0
    	rrcf	R0 + 3, F	;shift R0+3, .0 into carry
    	rrcf	R0 + 2, F	;shift R0+2
    	rrcf	R0 + 1, F	;shift R0+1
    	rrcf	R0 + 0, F	;shift R0+0
    
    	bcf	STATUS, C	;clr carry-shift over complete R1
    	rrcf	R1 + 3, F	;shift R1, .0 into carry
    	rrcf	R1 + 2, F	;shift R1+2
    	rrcf	R1 + 1, F	;shift R1+1
    	rrcf	R1 + 0, F	;shift R1+0
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	btfss	STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    			endif
    		endif
    
    ;above added to speed divide operations
    #divloop
    	rlcf	R0 + 3, W     ;NOTE 1
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, C
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    #divdone
    		endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    Any other swell ideas? I've got another idea, but not sure how to implement it...
    At the NOTE 1 above (about 2/3 down thru the code), pre-check both R0 and R1, find the highest set bit from both R0 and R1 (i.e. if R0=4 and R1=128, then the highest set bit of the two is R1.7). Shift THAT bit into the carry and make the loop count start from there (in this case the loop count would be 7). The MSB of R0 (i.e. R0.31) is used as the basis of whether or not to do the divloop in the first place. If that bit is clear, and the corresponding bit in R1 is clear, no need to do that loop. But shifting both R0 and R1 up one place and decrementing the loop count will mess up the subtraction process.
    Last edited by skimask; - 13th September 2008 at 05:03.

  2. #2
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by skimask View Post
    Did some thinking...
    Uht OH!

    > My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).

    Uhhh, did you forget what thread this was in...
    You don't need a stopwatch. Try Post #2

    Added: Or SteveB's "Code Timer"
    http://www.picbasic.co.uk/forum/showthread.php?t=9350
    <br>
    Last edited by Darrel Taylor; - 13th September 2008 at 06:01. Reason: Code Timer
    DT

  3. #3
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    Uhhh, did you forget what thread this was in...
    You don't need a stopwatch. Try Post #2
    <br>
    I know, but that'll only measure one or maybe a handful of divides at a time, a single divide operation.
    What I was looking for was a multitude of divides, with a bunch of different numbers, large, small, etc., which is why I kept the inner/outer loops that you originally had, but added large steps (prime numbers of course ). So, I figured doing as many different divides with different numbers in a minute would be a better indicator as to whether or not the optimizations worked or not....overall....

  4. #4
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    That's all stuff that can be done later if things work out.

    I think right now, to prove that it does "optimize" the divide routine, you need a "Base Line".
    How long does it take PBPL to do a single signed 32-bit divide?
    And how long does it take yours?

    Just a few tests at various bit sizes will tell if there's any benefit or not.
    <br>
    DT

  5. #5
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    That's all stuff that can be done later if things work out.
    I think right now, to prove that it does "optimize" the divide routine, you need a "Base Line".
    How long does it take PBP to do a single signed 32-bit divide.
    And how long does it take yours.
    Just a few tests at various bit sizes will tell if there's any benefit or not.
    <br>
    That's the thing...It depends on the bits in the divisor and the dividend.
    I'm sure you know how binary division works, set a result bit, shift everything over one, subtract, if there's a carry, add it back in, reset the result bit, try again.
    Like I said, I used your little error-checker, and the only error I can get out of the last code posted was 0/0. Everything else checks out good (unless my implementation is wrong!).
    As it is, I changed up the code just a bit to only send out to the LCD once every 256 loops (if bl.byte0=0 then -do the lcd-). So, obviously, the LCD was taking up loads of cycles, which it does anyways.
    The optimize's as they stand right now (the right shift version) do speed things up a little bit, overall, similar to how a right shift saves time over a divide by 2. In some cases though, they use a few more cycles.
    I know the 'left shift loop reduction' method on R0 and R1 has decent grounding. Just can't figure out how to implement it without wrecking the results!
    It might be easier to just use 4 different versions, where the version chosen is based on the larger of R0 and R1 (32, 24, 16 and 8 bit versions). Will use a bit more memory though...

  6. #6
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    What, are you going to make me do it?

    I will have exact numbers, good or bad.
    Do you want to be the first to know ... (a.k.a. you do it)

    Or do you give me a free Jab? Without a glove.
    <br>
    DT

  7. #7
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Results are in ....

    I'm taping up my hand.

    Anything you want to change?
    <br>
    DT

Similar Threads

  1. Optimizing LCD commands?
    By jblackann in forum mel PIC BASIC Pro
    Replies: 1
    Last Post: - 4th December 2007, 16:30

Members who have read this thread : 0

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts