Code I'm using now. Using MPLAB simulator to get accurate cycle counts.
Cycle counts for various loop and step sizesCode:resetplaceholder: '18f4685 code DEFINE OSC 40 '40 Mhz clock for proto work DEFINE NO_CLRWDT 1 'no extra clear watchdog timer instructions DISABLE DEFINE SKI_DIV_OPT 3 AL VAR LONG : BL VAR LONG : SkiQ VAR LONG : SkiR VAR LONG PBPQ VAR LONG : PBPR VAR LONG : AW VAR WORD : BW VAR WORD : RW VAR WORD ERROR VAR BIT : errorcount var long : maxnum var long : s1 var long pbpq = al / bl 'need to use at least one PBP divide to kick in DIVS for now al3 var al.byte3 : al2 var al.byte2 : al1 var al.byte1 : al0 var al.byte0 bl3 var bl.byte3 : bl2 var bl.byte2 : bl1 var bl.byte1 : bl0 var bl.byte0 'aliased so I can find it easily in MPSIM testcode: @ nop ;stp = easy to search and add a breakpoint in SIM maxnum = 9 : s1 = 1 : gosub dodivide maxnum = 99 : s1 = 1 : gosub dodivide maxnum = 999 : s1 = 19 : gosub dodivide maxnum = 9999 : s1 = 193 : gosub dodivide maxnum = 99999 : s1 = 1193 : gosub dodivide maxnum = 999999 : s1 = 31279 : gosub dodivide maxnum = 9999999 : s1 = 112791 : gosub dodivide maxnum = 99999999 : s1 = 327913 : gosub dodivide maxnum = 999999999 : s1 = 1279137 : gosub dodivide @ nop ;stp END stop dodivide: For AL = 0 to maxnum step s1 for BL = 0 to maxnum step s1 PBPQ = AL / BL next BL next AL @ nop ;stp For AL = 0 to maxnum step s1 for BL = 0 to maxnum step s1 @ MOVE?NN _AL, R0 @ MOVE?NN _BL, R1 ; AL / BL @ L?CALL #DIVS @ MOVE?ANN R0, _SkiQ @ MOVE?NN R2, _SkiR next BL next AL @ nop ;stp return ASM ifdef DIVS_USED LIST #DIVS clrf R3 + 3 ; Clear sign difference indicator btfss R0 + 3, 7 ; Check for R0 negative bra #divchkr1 ; Not negative btg R3 + 3, 7 ; Flip sign indicator clrf WREG ; Clear W for subtracts negf R0 ; Flip value to plus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F #divchkr1 btfss R1 + 3, 7 ; Check for R1 negative bra #divdo ; Not negative btg R3 + 3, 7 ; Flip sign indicator clrf WREG ; Clear W for subtracts negf R1 ; Flip value to plus subfwb R1 + 1, F subfwb R1 + 2, F subfwb R1 + 3, F bra #divdo ; Skip unsigned entry NOLIST DIV_USED = 1 endif ifdef DIV_USED LIST #DIV ifdef DIVS_USED clrf R3 + 3 ; Clear sign difference indicator endif #divdo clrf R2 ; Do the divide clrf R2 + 1 clrf R2 + 2 clrf R2 + 3 movlw 32 movwf R3 ;added to speed up s-31 divides by using byte and bit shifting, ;added checking for 32, 24, 16 and 8 bit divides ;and using those routines if R0 and R1 are small enough ifdef SKI_DIV_OPT SkiOpt3 ;shift down bytes if low bytes are 0'd movf R0, W ; IF R0(0)= 0 bnz SkiOpt4 movf R1, W ; AND R1(0)= 0 then bnz SkiOpt4 movff R0 + 1, R0 + 0 ; and preshift R0 movff R0 + 2, R0 + 1 movff R0 + 3, R0 + 2 clrf R0 + 3 movff R1 + 1, R1 + 0 ; and R1 over 8 bits movff R1 + 2, R1 + 1 movff R1 + 3, R1 + 2 clrf R1 + 3 movlw 8 ; loops - 8 subwf R3, F btfss STATUS, Z ; stop if no loop's left (0/0) bra SkiOpt3 bra #divdone SkiOpt4 ;shift down bytes if low bits are 0'd btfsc R0, 0 ; if lowest bit set, goto divloop bra skiopt5 btfsc R1, 0 ; if lowest bit set, goto divloop bra skiopt5 bcf STATUS, C ;clr carry-shift over complete R0 rrcf R0 + 3, F ;shift R0+3, .0 into carry rrcf R0 + 2, F ;shift R0+2 rrcf R0 + 1, F ;shift R0+1 rrcf R0 + 0, F ;shift R0+0 bcf STATUS, C ;clr carry-shift over complete R1 rrcf R1 + 3, F ;shift R1, .0 into carry rrcf R1 + 2, F ;shift R1+2 rrcf R1 + 1, F ;shift R1+1 rrcf R1 + 0, F ;shift R1+0 movlw 1 ;subtract one from the loop count subwf R3, F btfss STATUS, Z ;stop if no more loops bra SkiOpt4 bra #divdone skiopt5 ;check if can use different divide methods (32, 24, 16, 8) movlw 32 ;load loop count movwf R3 movf R0 + 3, W bnz #divloop ;use normal div movf R1 + 3, W bnz #divloop ;use normal div movlw 24 ;load loop count movwf R3 movf R0 + 2, W bnz #divloop24 ;jump out to 24 bit movf R1 + 2, W bnz #divloop24 ;jump out to 24 bit movlw 16 ;load loop count movwf R3 movf R0 + 1, W bnz #divloop16 ;jump out to 16 bit movf R1 + 1, W bnz #divloop16 ;jump out to 16 bit movlw 8 ;load loop count movwf R3 bra #divloop8 ;fall thru to 8 bit endif ;above added to speed divide operations #divloop ;32 bit rlcf R0 + 3, W rlcf R2, F rlcf R2 + 1, F rlcf R2 + 2, F rlcf R2 + 3, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F movf R1 + 2, W subwfb R2 + 2, F movf R1 + 3, W subwfb R2 + 3, F bc #divok movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F movf R1 + 2, W addwfc R2 + 2, F movf R1 + 3, W addwfc R2 + 3, F bcf STATUS, C #divok rlcf R0, F rlcf R0 + 1, F rlcf R0 + 2, F rlcf R0 + 3, F decfsz R3, F bra #divloop ifdef DIVS_USED btfss R3 + 3, 7 ; Should result be negative? bra #divdone ; Not negative clrf WREG ; Clear W for subtracts negf R0 ; Flip quotient to minus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F negf R2 ; Flip remainder to minus subfwb R2 + 1, F subfwb R2 + 2, F subfwb R2 + 3, F bra #divdone endif #divloop24 ;24 bit rlcf R0 + 2, W rlcf R2, F rlcf R2 + 1, F rlcf R2 + 2, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F movf R1 + 2, W subwfb R2 + 2, F bc #divok24 movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F movf R1 + 2, W addwfc R2 + 2, F bcf STATUS, C #divok24 rlcf R0, F rlcf R0 + 1, F rlcf R0 + 2, F decfsz R3, F bra #divloop24 ifdef DIVS_USED btfss R3 + 3, 7 ; Should result be negative? bra #divdone ; Not negative clrf WREG ; Clear W for subtracts negf R0 ; Flip quotient to minus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F negf R2 ; Flip remainder to minus subfwb R2 + 1, F subfwb R2 + 2, F subfwb R2 + 3, F bra #divdone endif #divloop16 ;16 bit rlcf R0 + 1, W rlcf R2, F rlcf R2 + 1, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F bc #divok16 movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F bcf STATUS, C #divok16 rlcf R0, F rlcf R0 + 1, F decfsz R3, F bra #divloop16 ifdef DIVS_USED btfss R3 + 3, 7 ; Should result be negative? bra #divdone ; Not negative clrf WREG ; Clear W for subtracts negf R0 ; Flip quotient to minus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F negf R2 ; Flip remainder to minus subfwb R2 + 1, F subfwb R2 + 2, F subfwb R2 + 3, F bra #divdone endif #divloop8 ;8 bit rlcf R0, W rlcf R2, F movf R1, W subwf R2, F bc #divok8 movf R1, W addwf R2, F bcf STATUS, C #divok8 rlcf R0, F decfsz R3, F bra #divloop8 ifdef DIVS_USED btfss R3 + 3, 7 ; Should result be negative? bra #divdone ; Not negative clrf WREG ; Clear W for subtracts negf R0 ; Flip quotient to minus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F negf R2 ; Flip remainder to minus subfwb R2 + 1, F subfwb R2 + 2, F subfwb R2 + 3, F endif #divdone movf R0, W ; Get low byte to W goto DUNN NOLIST DUNN_USED = 1 endif ENDASM END
Obviously, the biggest speed increase comes from using the various divide routines (32, 24, 16, 8), and once you get into the really big numbers, you lose cycles.Code:loop count s1=step size 'pbp clock count ski clock count increase 9 : s1 = 1 '105,278 26,124 4.033 99 : s1 = 1 '10,632,832 2,507,692 4.240 999 : s1 = 19 '2,981,601 1,136,305 2.624 9999 : s1 = 193 '2,869,397 1,174,807 2.442 99999 : s1 = 1193 '7,479,745 3,913,171 1.911 999999 : s1 = 31279 '1,081,858 722,091 1.498 9999999 : s1 = 112791 '8,388,778 5,638,902 1.462 99999999 : s1 = 327913 '98,376,761 98,167,489 1.002 999999999 : s1 = 1279137 '646,580,978 663,354,535 0.973
In a bit, I'm going to load this into my protoboard and see what happens (PBP div vs. Ski_Opt divides) as far as accuracy goes.
I still think shifting left and cutting the loop count ahead of time has merit. As I said, I know I did it on that 6809E back in the day. I just can't remember how...
EDIT: Added the error checking code back in and put a WATCH on the error counter variable in MPSIM.
So far, the only error I've gotten is on 0/0. Ski_Opt comes back with both zero's. PBP comes back with a max'd out quotient (2^32 -1), 0 in the remainder.
Slow going in the sim with a step of 1 running from 0 to (2^31 -1)....
Just calculated out the situation as described in the line above...
Running on the SIM on my laptop (Dell Insp. 8200 P4 @ 1.7Ghz), it could take about 571,232,829 years, 5 months, 13 days to complete the loop as written!!!
Ya...not so much!!!
Last edited by skimask; - 14th September 2008 at 03:49.
Well, after letting it run a couple of days into the higher numbers, I've found more errors, random bit/byte errors (at least I can't find a pattern to them). Either my PIC is screwing up the data, or my code with ALL of the optimizations is jacked up somewhere. So, after screwing around with it for a couple of days, I give up!
One thing that does seem to work correctly though is the '4 different divides' version of the code, the normal s-31 divide, and 3 other divides for unsigned 24, 16, and 8 bit divides. When I get that code cleaned up and commented, it'll be here. A fair amount of speed increase for math intensive code, as DT said, when the numbers are small. I know in a couple of my programs, the speed increase will be a welcome enhancement.
Latest and greatest (so far)...
Seems to work great for me...will probably fail miserably for others!
Haven't packaged it up real nice yet, some comments could stand to be added yet, some removed...
8 bit only divides run about 4.2 times faster
16 bit only divides run about 2.4 times faster
16/24 bit mixed divides run about 1.5 times faster
24/32 bit mixed divides actually run about 2% slower
But the cycle counts, and therefore the actual figures aren't 100% accurate because they include the cycles used by the For/Next loops.
Code:resetplaceholder: '18f4685 code DEFINE OSC 40 '40 Mhz clock for proto work DEFINE NO_CLRWDT 1 'no extra clear watchdog timer instructions DISABLE DEFINE SKI_DIV_OPT 1 AL VAR LONG : BL VAR LONG : SkiQ VAR LONG : SkiR VAR LONG : PBPQ VAR LONG PBPR VAR LONG : AW VAR WORD : BW VAR WORD : RW VAR WORD : ERROR VAR BIT errorcount var long : indicator var word : loopcount var long minnum var long : maxnum var long : s1 var long pbpq = al / bl 'need to use at least one PBP divide to kick in DIVS for now 'indicator commented out in ASM code during -speed- runs testcode: minnum = 1000 : maxnum = 1255 : s1 = 1 : gosub dodivide gosub dodivide : gosub dodivide : gosub dodivide gosub dodivide : gosub dodivide : gosub dodivide @ nop ;stopper - easy to find point to set a breakpoint in MPSIM END stop dodivide: For AL = minnum to maxnum step s1 For BL = minnum to maxnum step s1 ' indicator.lowbyte = $f0 'watch in MPSIM to show what's going on ' loopcount = loopcount + 1 'watch in MPSIM to show what's going on @ MOVE?NN _AL, R0 ;use only during OPT run @ MOVE?NN _BL, R1 ; AL / BL ;use only during OPT run @ L?CALL #DIVS ;use only during OPT run @ MOVE?NN R0, _SkiQ ;use only during OPT run @ MOVE?NN R2, _SkiR ;use only during OPT run ' indicator.lowbyte = $0f 'watch in MPSIM to show what's going on ' PBPQ = AL / BL ;use only during PBP run ' PBPR = AL // BL ;use only when doing -both- runs to check for errors ' indicator.lowbyte = 0 'watch in MPSIM to show what's going on ' if ( pbpq <> skiq ) or ( pbpr <> skir ) then errorcount = errorcount + 1 'watch in MPSIM to show what's going on next BL next AL return end ASM ifdef DIVS_USED LIST #DIVS clrf R3 + 1 ;clear shiftout counter clrf R3 + 3 ; Clear sign difference indicator btfss R0 + 3, 7 ; Check for R0 negative bra #divchkr1 ; Not negative btg R3 + 3, 7 ; Flip sign indicator clrf WREG ; Clear W for subtracts negf R0 + 0 ; Flip value to plus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F #divchkr1 btfss R1 + 3, 7 ; Check for R1 negative bra #divdo ; Not negative btg R3 + 3, 7 ; Flip sign indicator clrf WREG ; Clear W for subtracts negf R1 + 0 ; Flip value to plus subfwb R1 + 1, F subfwb R1 + 2, F subfwb R1 + 3, F bra #divdo ; Skip unsigned entry NOLIST DIV_USED = 1 endif ifdef DIV_USED LIST #DIV ifdef DIVS_USED clrf R3 + 3 ; Clear sign difference indicator endif #divdo clrf R2 + 0 ; Do the divide clrf R2 + 1 clrf R2 + 2 clrf R2 + 3 movlw 32 movwf R3 + 0 ifdef SKI_DIV_OPT ;check for zero case and send directly to divloop if they are zero movf R0 + 0, W bnz #divzero1 movf R0 + 1, W bnz #divzero1 movf R0 + 2, W bnz #divzero1 movf R0 + 3, W bnz #divzero1 bra #divloopa ;if R0 is zero, do #divloop #divzero1 movf R1 + 0, W bnz SkiOpt5 movf R1 + 1, W bnz SkiOpt5 movf R1 + 2, W bnz SkiOpt5 movf R1 + 3, W bnz SkiOpt5 ;if R1 is not zero, continue OPT bra #divloopa ;if R1 is zero, do divloop SkiOpt5 ;check if can use different divide methods (32, 24, 16, 8) movlw 32 ;load loop count movwf R3 + 0 movf R0 + 3, W bnz #divloopa ;use normal div movf R1 + 3, W bnz #divloopa ;use normal div movlw 24 ;load loop count movwf R3 + 0 movf R0 + 2, W bnz #divloop24a ;jump out to 24 bit movf R1 + 2, W bnz #divloop24a ;jump out to 24 bit movlw 16 ;load loop count movwf R3 + 0 movf R0 + 1, W bnz #divloop16a ;jump out to 16 bit movf R1 + 1, W bnz #divloop16a ;jump out to 16 bit movlw 8 ;load loop count movwf R3 + 0 movf R0 + 0, W bnz #divloop8a ;jump out to 8 bit movf R1 + 0, W bnz #divloop8a movlw 32 ;reload loop count with max count movwf R3 + 0 endif ;above added to speed divide operations #divloopa ;32 bit divide ; movlw 50 ; movwf _indicator + 1 #divloop rlcf R0 + 3, W rlcf R2, F rlcf R2 + 1, F rlcf R2 + 2, F rlcf R2 + 3, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F movf R1 + 2, W subwfb R2 + 2, F movf R1 + 3, W subwfb R2 + 3, F bc #divok movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F movf R1 + 2, W addwfc R2 + 2, F movf R1 + 3, W addwfc R2 + 3, F bcf STATUS, 0 #divok rlcf R0, F rlcf R0 + 1, F rlcf R0 + 2, F rlcf R0 + 3, F decfsz R3, F bra #divloop ifdef DIVS_USED bra #divnegchk ; Check for negative result endif ifdef SKI_DIV_OPT #divloop24a ;24 bit divide ; movlw 36 ; movwf _indicator + 1 #divloop24 rlcf R0 + 2, W rlcf R2, F rlcf R2 + 1, F rlcf R2 + 2, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F movf R1 + 2, W subwfb R2 + 2, F bc #divok24 movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F movf R1 + 2, W addwfc R2 + 2, F bcf STATUS, 0 #divok24 rlcf R0, F rlcf R0 + 1, F rlcf R0 + 2, F decfsz R3, F bra #divloop24 ifdef DIVS_USED bra #divnegchk ; Check for negative result endif #divloop16a ;16 bit divide ; movlw 22 ; movwf _indicator + 1 #divloop16 rlcf R0 + 1, W rlcf R2, F rlcf R2 + 1, F movf R1, W subwf R2, F movf R1 + 1, W subwfb R2 + 1, F bc #divok16 movf R1, W addwf R2, F movf R1 + 1, W addwfc R2 + 1, F bcf STATUS, 0 #divok16 rlcf R0, F rlcf R0 + 1, F decfsz R3, F bra #divloop16 ifdef DIVS_USED bra #divnegchk ; Check for negative result endif #divloop8a ;8 bit divide ; movlw 8 ; movwf _indicator + 1 #divloop8 rlcf R0, W rlcf R2, F movf R1, W subwf R2, F bc #divok8 movf R1, W addwf R2, F bcf STATUS, 0 #divok8 rlcf R0, F decfsz R3, F bra #divloop8 endif ifdef DIVS_USED #divnegchk btfss R3 + 3, 7 ; Should result be negative? bra #divdone ; Not negative clrf WREG ; Clear W for subtracts negf R0 ; Flip quotient to minus subfwb R0 + 1, F subfwb R0 + 2, F subfwb R0 + 3, F negf R2 ; Flip remainder to minus subfwb R2 + 1, F subfwb R2 + 2, F subfwb R2 + 3, F endif #divdone movf R0 + 0,W goto DUNN NOLIST DUNN_USED = 1 endif ENDASM END
This may be of interest, 32x16 divide in 395 cycles:
http://cablemodem.fibertel.com.ar/at...C%20micros.htm
Aye lad...but alas, those routines are unsigned!
This was all mainly an exercise on my part to see how stuff worked...turned out to do other stuff.
Mainly, the routine above was meant to replace the 32/32 signed divide routine in the pbppi18l.lib library. The regular PBP library treats all math as though it were 32 bit, even if the numbers are smaller. You don't gain much by optimizing adds, subtracts, and multiplies, but divides stand to gain quite a bit if the routines can differentiate between different size numbers and just to different, faster routines accordingly. The library adds about 100 bytes to the routine, but it's only 100 bytes once, not for every divide, and it could possibly shave a load of cycles off the routine.
It was something I stumbled upon while browsing the MC forums. My signed 32x16 divide executes in 19 instruction cycles (what we talked about), hehe.
Ski, BTW, are you located across the pond
Bookmarks