Now I get nothing with SKI_DIV_OPT 1.
SKI_DIV_OPT 2 still looks the same.
Are you still using the Simulator?
<br>
Now I get nothing with SKI_DIV_OPT 1.
SKI_DIV_OPT 2 still looks the same.
Are you still using the Simulator?
<br>
DT
Yes, yes, yes... I know...Don't use the simulator.
I'll get on it good this weekend with hardware (probably just temporarily reprogram my OBD reader for grins)...
Ok, So, Opt 1 - nothing - What mean you by 'nothing'? Zero's all the way around?
Opt 2 - gets roughly the same garbage as before?
No serial output at all. It's stuck in the optimize section, getting dizzy doing loops.
May have something to do with subtracting 8 loops from what's now 25 (was 32), but I'm not sure.
Same stuff. Quotient is 0, Remainder has the full A value.Opt 2 - gets roughly the same garbage as before?
<br>
DT
Jeeze...up in SkiShift3...
Not going to get much of a carry from that am I?Code:movlw 8 subwf R3, F movf R3, W btfss STATUS, C ;if it's 1 (actually 1-8) bra Ski_Shift3 ;jump out
The subwf should set STATUS as appropriate, should be able to remove movf R3, W above the branch.
I'm still looking thru my code in MCS...
Might not have to worry about the most-sig-bit in R0/R1 since it's preset to 0 by the code at the beginning, therefore, that'll negate checking bit 30 instead of bit 31 of R0/R1.
Last edited by skimask; - 11th September 2008 at 23:18.
Did some pencil/paper work on the s31 divide operations at the bit level...
Trying to optimize at the bit level is fruitless. Preshifting bits accomplishes the same thing that #divloop does except the fact that if the subtraction fails, #divloop restores the working registers (Rx). A few cycles may be wasted there with the restoration of the R(x) register, but those same cycles that may have been saved there, would have been used in the preshifting anyway.
Optimizing at the byte level should still show a fair amount of cycle savings...
More pencil/paper work...
Did some thinking...
This code tries to optimize the divide loops by preshifting the other way, to the right, getting rid of the trailing zero's (i.e. 16/8 is the same as 2/1, saves 3 times thru the #divloop).
This works 99.99999% of the time, all the way up to 32 bits. The only thing I can get to fail is 0 / 0, zero divided by zero. Special case needs special case code.
Problem is...I can't seem to see any improvements! The instruction cycle savings is there. I can count them by hand. On my setup, if I use PBP's divide, I get about 2600 divides per minute. If I use OPT 1,2 or 3 (both byte and bit together), I get roughly the same 2600 divides per minute. My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).
The code I'm using has some minimal LCD output in it, so it may be that most of the time is being used up by the display code and not a lot left over for the divides themselves.
EDIT: The LCD routines are definitely killing the speed...Divides per minute are more around 280,000+...
Any other swell ideas? I've got another idea, but not sure how to implement it...Code:ASM ifdef DIVS_USED LIST #DIVS clrf R3 + 3 ; Clear sign difference indicator btfss R0 + 3, 7 ; Check for R0 negative bra #divchkr1 ; Not negative btg R3 + 3, 7 ; Flip sign indicator clrf WREG ; Clear W for subtracts negf R0 ; Flip value to plus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F #divchkr1 btfss R1 + 3, 7 ; Check for R1 negative bra #divdo ; Not negative btg R3 + 3, 7 ; Flip sign indicator clrf WREG ; Clear W for subtracts negf R1 ; Flip value to plus subfwb R1 + 1, F subfwb R1 + 2, F subfwb R1 + 3, F bra #divdo ; Skip unsigned entry NOLIST DIV_USED = 1 endif ifdef DIV_USED LIST #DIV ifdef DIVS_USED clrf R3 + 3 ; Clear sign difference indicator endif #divdo clrf R2 ; Do the divide clrf R2 + 1 clrf R2 + 2 clrf R2 + 3 movlw 32 movwf R3 ;added to speed up s-31 divide op's by ignoring zero'd bytes ifdef SKI_DIV_OPT if ( SKI_DIV_OPT == 1 | SKI_DIV_OPT == 3 ) SkiOpt movf R0, W ; IF R0(0)= 0 bnz #divloop movf R1, W ; AND R1(0)= 0 then bnz #divloop movff R0 + 1, R0 + 0 ; and preshift R0 movff R0 + 2, R0 + 1 movff R0 + 3, R0 + 2 clrf R0 + 3 movff R1 + 1, R1 + 0 ; and R1 over 8 bits movff R1 + 2, R1 + 1 movff R1 + 3, R1 + 2 clrf R1 + 3 movlw 8 ; loops - 8 subwf R3, F btfss STATUS, Z ; stop if no loop's left (0/0) bra SkiOpt endif endif ;added to speed up s-31 divides by skipping clr'd bits in divisor/dividend lsb ifdef SKI_DIV_OPT if ( SKI_DIV_OPT == 2 | SKI_DIV_OPT == 3 ) SkiOpt2 btfsc R0, 0 ; if lowest bit set, goto divloop bra #divloop btfsc R1, 0 ; if lowest bit set, goto divloop bra #divloop bcf STATUS, C ;clr carry-shift over complete R0 rrcf R0 + 3, F ;shift R0+3, .0 into carry rrcf R0 + 2, F ;shift R0+2 rrcf R0 + 1, F ;shift R0+1 rrcf R0 + 0, F ;shift R0+0 bcf STATUS, C ;clr carry-shift over complete R1 rrcf R1 + 3, F ;shift R1, .0 into carry rrcf R1 + 2, F ;shift R1+2 rrcf R1 + 1, F ;shift R1+1 rrcf R1 + 0, F ;shift R1+0 movlw 1 ;subtract one from the loop count subwf R3, F btfss STATUS, Z ;stop if no more loops bra SkiOpt2 endif endif ;above added to speed divide operations #divloop rlcf R0 + 3, W ;NOTE 1 rlcf R2, F rlcf R2 + 1, F rlcf R2 + 2, F rlcf R2 + 3, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F movf R1 + 2, W subwfb R2 + 2, F movf R1 + 3, W subwfb R2 + 3, F bc #divok movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F movf R1 + 2, W addwfc R2 + 2, F movf R1 + 3, W addwfc R2 + 3, F bcf STATUS, C #divok rlcf R0, F rlcf R0 + 1, F rlcf R0 + 2, F rlcf R0 + 3, F decfsz R3, F bra #divloop ifdef DIVS_USED btfss R3 + 3, 7 ; Should result be negative? bra #divdone ; Not negative clrf WREG ; Clear W for subtracts negf R0 ; Flip quotient to minus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F negf R2 ; Flip remainder to minus subfwb R2 + 1, F subfwb R2 + 2, F subfwb R2 + 3, F #divdone endif movf R0, W ; Get low byte to W goto DUNN NOLIST DUNN_USED = 1 endif ENDASM
At the NOTE 1 above (about 2/3 down thru the code), pre-check both R0 and R1, find the highest set bit from both R0 and R1 (i.e. if R0=4 and R1=128, then the highest set bit of the two is R1.7). Shift THAT bit into the carry and make the loop count start from there (in this case the loop count would be 7). The MSB of R0 (i.e. R0.31) is used as the basis of whether or not to do the divloop in the first place. If that bit is clear, and the corresponding bit in R1 is clear, no need to do that loop. But shifting both R0 and R1 up one place and decrementing the loop count will mess up the subtraction process.
Last edited by skimask; - 13th September 2008 at 05:03.
Uht OH!
> My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).
Uhhh, did you forget what thread this was in...
You don't need a stopwatch. Try Post #2
Added: Or SteveB's "Code Timer"
http://www.picbasic.co.uk/forum/showthread.php?t=9350
<br>
Last edited by Darrel Taylor; - 13th September 2008 at 06:01. Reason: Code Timer
DT
Bookmarks