Optimizing DIV



  1. #1
    Darrel Taylor
    Join Date: Jul 2003, Location: Colorado Springs, Posts: 4,959



    Now I get nothing with SKI_DIV_OPT 1.

    SKI_DIV_OPT 2 still looks the same.

    Are you still using the Simulator?
    DT

  2. #2
    skimask (Guest)



    Originally posted by Darrel Taylor:
    > Now I get nothing with SKI_DIV_OPT 1.
    >
    > SKI_DIV_OPT 2 still looks the same.
    >
    > Are you still using the Simulator?

    Yes, yes, yes... I know...Don't use the simulator.
    I'll get on it good this weekend with hardware (probably just temporarily reprogram my OBD reader for grins)...

    OK, so, Opt 1 - nothing. What mean you by 'nothing'? Zeros all the way around?
    Opt 2 - gets roughly the same garbage as before?

  3. #3
    Darrel Taylor
    Join Date: Jul 2003, Location: Colorado Springs, Posts: 4,959



    Originally posted by skimask:
    > Ok, So, Opt 1 - nothing - What mean you by 'nothing'? Zeros all the way around?

    No serial output at all. It's stuck in the optimize section, getting dizzy doing loops.
    It may have something to do with subtracting 8 loops from what's now 25 (was 32), but I'm not sure.

    > Opt 2 - gets roughly the same garbage as before?

    Same stuff. The quotient is 0, and the remainder has the full A value.
    DT

  4. #4
    skimask (Guest)



    Jeeze...up in SkiShift3...
    Code:
    	movlw	8
    	subwf	R3, F
    	
    	movf	R3, W
    	btfss	STATUS, C	;if it's 1 (actually 1-8)
    	bra	Ski_Shift3	;jump out
    Not going to get much of a carry from that, am I?
    The subwf already sets STATUS as appropriate, so I should be able to remove the movf R3, W above the branch.
    I'm still looking through my code in MCS...
    I might not have to worry about the most significant bit of R0/R1, since the code at the beginning presets it to 0. That would remove the need to check bit 30 instead of bit 31 of R0/R1.
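
For reference, the flag behavior in question can be checked off-chip. The sketch below is a Python model (not PIC code) of an 8-bit `subwf f, F`. On PIC parts the carry flag acts as an inverted borrow, so C stays set as long as the subtraction does not go below zero. The names `subwf` and `trace` are illustrative assumptions; the starting counts 32 and 25 are the ones mentioned in the posts above.

```python
def subwf(f, w):
    """Model 8-bit PIC `subwf f, F` (f = f - W).

    Returns (result, carry, zero). Carry is the inverted borrow:
    1 when f >= w, 0 when the subtraction wrapped below zero.
    """
    result = (f - w) & 0xFF
    carry = 1 if f >= w else 0
    zero = 1 if result == 0 else 0
    return result, carry, zero


def trace(count, step=8, max_steps=6):
    """Subtract `step` repeatedly and record (result, C, Z) per step."""
    rows = []
    for _ in range(max_steps):
        count, c, z = subwf(count, step)
        rows.append((count, c, z))
        if c == 0 or z == 1:  # borrow or exact zero: stop tracing
            break
    return rows


if __name__ == "__main__":
    print(trace(32))  # reaches exactly zero on the fourth subtraction
    print(trace(25))  # never reaches zero, borrows on the fourth subtraction
```

In this model a loop count of 25 goes 17, 9, 1 and then wraps to 249 with C clear, so an exit test on Z alone would never fire, while a test on C would.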
    Last edited by skimask; 11th September 2008 at 23:18.

  5. #5
    skimask (Guest)



    Did some pencil/paper work on the s31 divide operations at the bit level...
    Trying to optimize at the bit level is fruitless. Preshifting bits accomplishes the same thing that #divloop does, except that when the subtraction fails, #divloop restores the working registers (Rx). A few cycles may be wasted restoring the Rx registers, but any cycles saved there would have been spent on the preshifting anyway.
    Optimizing at the byte level should still show a fair amount of cycle savings...
    More pencil/paper work...

  6. #6
    skimask (Guest)



    Did some thinking...
    This code tries to optimize the divide loops by preshifting the other way, to the right, getting rid of the trailing zeros (e.g. 16/8 is the same as 2/1, which saves 3 passes through #divloop).
    This works 99.99999% of the time, all the way up to 32 bits. The only case I can get to fail is 0 / 0, zero divided by zero. A special case needs special-case code.
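
The arithmetic identity behind this preshift can be checked off-chip. Dividing both operands by the same power of two leaves the quotient unchanged, but the remainder comes out scaled down by that power, so it has to be shifted back if the remainder is needed. The following is a minimal Python sketch (a model, not PIC code); the function names are illustrative assumptions, and 0 / 0 is rejected up front because the shift loop would never end.

```python
def strip_common_trailing_zeros(a, b):
    """Shift a and b right while both have a clear low bit.

    Returns (a_reduced, b_reduced, k), where k is the number of shifts.
    """
    if a == 0 and b == 0:
        raise ZeroDivisionError("0 / 0 needs special-case handling")
    k = 0
    while (a & 1) == 0 and (b & 1) == 0:
        a >>= 1
        b >>= 1
        k += 1
    return a, b, k


def divide_reduced(a, b):
    """Divide the reduced operands, then rescale the remainder."""
    a_red, b_red, k = strip_common_trailing_zeros(a, b)
    quotient, remainder = divmod(a_red, b_red)
    return quotient, remainder << k  # remainder was scaled down by 2**k


if __name__ == "__main__":
    print(strip_common_trailing_zeros(16, 8))  # three shifts, as in 16/8 -> 2/1
    print(divide_reduced(24, 16), divmod(24, 16))
```

This checks only the identity itself. It says nothing about how the reduced operands interact with the loop count in #divloop.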
    Problem is...I can't seem to see any improvements! The instruction cycle savings are there; I can count them by hand. On my setup, if I use PBP's divide, I get about 2600 divides per minute. If I use OPT 1, 2, or 3 (both byte and bit together), I get roughly the same 2600 divides per minute. My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).
    The code I'm using has some minimal LCD output in it, so it may be that most of the time is being used up by the display code, with not much left over for the divides themselves.

    EDIT: The LCD routines are definitely killing the speed...Divides per minute are more around 280,000+...
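
A rough worked calculation shows why the stopwatch could not see the difference. It assumes (this is not stated outright in the post) that the 280,000 figure was taken with the per-divide LCD output removed and everything else unchanged. The helper name `ms_per_divide` is made up for the example.

```python
def ms_per_divide(divides_per_minute):
    """Average time per loop iteration, in milliseconds."""
    return 60_000.0 / divides_per_minute


with_lcd = ms_per_divide(2600)        # loop including LCD output
without_lcd = ms_per_divide(280_000)  # loop without LCD output (assumed)

# Fraction of each iteration spent outside the divide itself.
lcd_share = (with_lcd - without_lcd) / with_lcd

# Rate with LCD output if the divide took no time at all.
best_case_rate = 60_000.0 / (with_lcd - without_lcd)

if __name__ == "__main__":
    print(f"{with_lcd:.2f} ms per iteration with LCD")
    print(f"{without_lcd:.2f} ms per iteration without LCD")
    print(f"{lcd_share:.1%} of the time is display overhead")
    print(f"{best_case_rate:.0f} divides per minute at best with LCD")
```

Under that assumption, about 99% of each iteration is display overhead, and even a divide that cost nothing would only raise the count from 2600 to roughly 2624 per minute, well inside stopwatch error.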

    Code:
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
    		endif
    #divdo
    	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    	movlw	32
    	movwf	R3
    ;added to speed up s-31 divide op's by ignoring zero'd bytes
    	        ifdef SKI_DIV_OPT
    	        	if ( SKI_DIV_OPT == 1 | SKI_DIV_OPT == 3 )
    SkiOpt
    	movf    R0, W      ; IF R0(0)= 0 
    	bnz     #divloop
    
    	movf    R1, W      ;   AND R1(0)= 0 then 
    	bnz     #divloop
    
    	movff   R0 + 1, R0 + 0 ;      and preshift R0
    	movff   R0 + 2, R0 + 1
    	movff   R0 + 3, R0 + 2
    	clrf    R0 + 3
    
    	movff   R1 + 1, R1 + 0 ;      and R1 over 8 bits
    	movff   R1 + 2, R1 + 1
    	movff   R1 + 3, R1 + 2
    	clrf    R1 + 3
    
    	movlw   8              ;      loops - 8
    	subwf   R3, F
    	btfss   STATUS, Z      ; stop if no loops left (0/0)
    	bra     SkiOpt
    			endif
    	        endif
       
    ;added to speed up s-31 divides by skipping clr'd bits in divisor/dividend lsb
    		ifdef SKI_DIV_OPT
    			if ( SKI_DIV_OPT == 2 | SKI_DIV_OPT == 3 )
    SkiOpt2
    	btfsc	R0, 0	; if lowest bit set, goto divloop
    	bra	#divloop
    	btfsc	R1, 0	; if lowest bit set, goto divloop
    	bra	#divloop
    
    	bcf    	STATUS, C	;clr carry-shift over complete R0
    	rrcf	R0 + 3, F	;shift R0+3, .0 into carry
    	rrcf	R0 + 2, F	;shift R0+2
    	rrcf	R0 + 1, F	;shift R0+1
    	rrcf	R0 + 0, F	;shift R0+0
    
    	bcf	STATUS, C	;clr carry-shift over complete R1
    	rrcf	R1 + 3, F	;shift R1, .0 into carry
    	rrcf	R1 + 2, F	;shift R1+2
    	rrcf	R1 + 1, F	;shift R1+1
    	rrcf	R1 + 0, F	;shift R1+0
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	btfss	STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    			endif
    		endif
    
    ;above added to speed divide operations
    #divloop
    	rlcf	R0 + 3, W     ;NOTE 1
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, C
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    #divdone
    		endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    Any other swell ideas? I've got another idea, but I'm not sure how to implement it...
    At NOTE 1 above (about 2/3 of the way down through the code), pre-check both R0 and R1 and find the highest set bit of the two (e.g. if R0=4 and R1=128, the highest set bit of the two is R1.7). Shift THAT bit into the carry and make the loop count start from there (in this case the loop count would be 7). The MSB of R0 (i.e. R0.31) is used as the basis of whether or not to do the divloop in the first place. If that bit is clear, and the corresponding bit in R1 is clear, there's no need to do that loop. But shifting both R0 and R1 up one place and decrementing the loop count will mess up the subtraction process.
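
One way to experiment with this idea before touching hardware is a Python model of #divloop (not PIC code). The sketch below is a simplified variant of the idea, not the code from this post: it skips only the leading zero bits of the dividend R0 and leaves the divisor R1 alone. It assumes non-negative 31-bit operands and a nonzero divisor, and the names `divloop` and `divide_skip_leading_zeros` are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def divloop(r0, r1, loops=32):
    """Model of #divloop: restoring division on 32-bit registers.

    r0 = dividend (ends up holding the quotient), r1 = divisor.
    Returns (r0, r2) after `loops` iterations, r2 being the remainder.
    """
    r2 = 0
    for _ in range(loops):
        carry = (r0 >> 31) & 1             # rlcf R0+3, W: top bit of R0
        r2 = ((r2 << 1) | carry) & MASK32  # rlcf R2 .. R2+3
        if r2 >= r1:                       # subtraction does not borrow
            r2 -= r1
            qbit = 1
        else:                              # borrow: R2 keeps its old value
            qbit = 0
        r0 = ((r0 << 1) | qbit) & MASK32   # rlcf R0 .. R0+3
    return r0, r2


def divide_skip_leading_zeros(dividend, divisor):
    """Left-align the dividend, then loop only over its significant bits."""
    if dividend == 0:
        return 0, 0
    skipped = 32 - dividend.bit_length()   # leading zero bits in R0
    aligned = (dividend << skipped) & MASK32
    return divloop(aligned, divisor, loops=32 - skipped)


if __name__ == "__main__":
    print(divloop(1000, 7), divmod(1000, 7))
    print(divide_skip_leading_zeros(4, 128), divmod(4, 128))
```

In this model the loop count can be lowered only because R0 is left-aligned first and R1 is untouched. With fewer than 32 iterations and no alignment, the low bits of the dividend are never shifted through R2. Whether counting leading zeros costs fewer cycles than the iterations it saves on the PIC is a separate question that only a timing run can answer.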
    Last edited by skimask; 13th September 2008 at 05:03.

  7. #7
    Darrel Taylor
    Join Date: Jul 2003, Location: Colorado Springs, Posts: 4,959



    Originally posted by skimask:
    > Did some thinking...

    Uht OH!

    > My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).

    Uhhh, did you forget what thread this was in...
    You don't need a stopwatch. Try Post #2

    Added: Or SteveB's "Code Timer"
    http://www.picbasic.co.uk/forum/showthread.php?t=9350
    Last edited by Darrel Taylor; 13th September 2008 at 06:01. Reason: Code Timer
    DT
