Optimizing DIV


Closed Thread
Page 1 of 2 12 LastLast
Results 1 to 40 of 42

Thread: Optimizing DIV

  1. #1
    skimask's Avatar
    skimask Guest

    Default Optimizing DIV

    Quote Originally Posted by Darrel Taylor View Post
    Here's an example of measuring the time to do a 16/16 bit divide. But you can have any number of statements inbetween, as long as the time does not exceed 65535 instructions.
    An aside from another thread discussing math execution time in PBP 2.50B, where it seems that ALL add/subtract/multiply/divide operations are done as though they are signed-31-bit operations vs. differentiating between byte/word/long/mixed operations.
    IF I'm reading pbppi18L.Lib correctly, it seems that the worst case for a s31/s31 divide is about 1,127 cycles (signed, 2^30-1 divided by 1), best case looks to be about 713 cycles (unsigned, 1 divided by 0)
    Code:
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    
        ifdef DIVS_USED
      LIST
    DIVS	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    divchkr1 btfss	R1 + 3, 7	; Check for R1 negative
    	bra	divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
        endif
    
        ifdef DIV_USED
      LIST
    DIV
          ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
          endif
    divdo	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32
    
    IF R0.byte3 = 0 AND R1.byte3 = 0 then movlw 24
    and preshift R0 and R1 over 8 bits
    
    IF R0.word1 = 0 AND R1.word1 = 0 then movlw 16
    and preshift R0 and R1 over 16 bits
    
    IF R0.word1 = 0 AND R1.word1 = 0 and R0.byte1 = 0 and R1.byte1 = 0 then movlw 8
    and preshift R0 and R1 over 24 bits
    
    	movwf	R3
    
    divloop	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    
    	bc	divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    
    	bcf	STATUS, C
    
    divok	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	decfsz	R3, F
    	bra	divloop
    
          ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    divdone
        endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
        endif
    The divide loop itself takes 34 cycles per iteration for 32 iterations. Worst case, 1,088 cycles, going into the loop 25 cycles, coming out of the loop 14 cycles.
    I think, and I haven't played with it yet, in the highlighted section above, a couple of checks (in italics) could be put in there to check if the dividend and/or divisor's upper bytes (or break it down to individual bits) are cleared. If they are, then preshift R0 and R1 and lop off 8, 16, or 24 (up to 30 if checking by bit) iterations of the loop itself.
    The worst case (long dividend/divisor) would still take the full 1,127 cycles (plus some cycles for the checks), but the best case (byte dividend/divisor) could be knocked down from 713 cycles to about 24 cycles.
    Same could be said for the add, subtract and multiply routines, although gains would be minimal for each, and the PIC has the built-in single cycle 8x8 hardware multiplier.

  2. #2
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    This doesn't quite follow the Italics section, but I think It's what you meant.
    Code:
    	movlw	32             ; start with 32 loops
    	movwf	R3
    
    SkiOpt
    	movf    R0 + 3, W      ; IF R0.byte3 = 0 
    	bnz     divloop
    	movf    R1 + 3, W      ;   AND R1.byte3 = 0 then 
    	bnz     divloop
    
    	movlw   8              ;      loops - 8  ; movlw 24
    	subwf   R3, F
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    
    	movf    R3, W
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    Does it look like what you were thinking?
    <br>
    DT

  3. #3
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    This doesn't quite follow the Italics section, but I think It's what you meant...................
    Does it look like what you were thinking?
    <br>
    I was going to handle it today but got sidetracked with a bunch of welding/fabricating.

    Yep, that's pretty much exactly what I was thinking and it would, by default, kick out of the shifting loop if either R0/R1 was 'negative'.
    Then just test it out by doing LONG divides in a loop ( 0 thru 2^31, looping the divisor inside another loop for the dividend, would take a LONG-LONG time even at 40Mhz ), save the result and remainder, remultiply (which obviously hasn't been modified) add the remainder back in and see if that result is the same as the original integer. That should verify everything works correctly. After that, rewrite the code to break it down by bits instead of bytes.
    Should end up to be a significant speed increase and a relatively simple cut-and-paste into the .lib file...

  4. #4
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Code:
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    
        ifdef DIVS_USED
      LIST
    DIVS	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    divchkr1 btfss	R1 + 3, 7	; Check for R1 negative
    	bra	divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
        endif
    
        ifdef DIV_USED
      LIST
    DIV
          ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
          endif
    divdo	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32             ; start with 32 loops
    	movwf	R3
    
            ifdef SKI_DIV_SPEEDUP
    SkiOpt
    	movf    R0 + 3, W      ; IF R0.byte3 = 0 
    	bnz     divloop
    	movf    R1 + 3, W      ;   AND R1.byte3 = 0 then 
    	bnz     divloop
    
    	movlw   8              ;      loops - 8  ; movlw 24
    	subwf   R3, F
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    
    	movf    R3, W
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    
            endif
    
    divloop	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    
    	bc	divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    
    	bcf	STATUS, C
    
    divok	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	decfsz	R3, F
    	bra	divloop
    
          ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    divdone
        endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
        endif
    This optimization (byte level optimize) seems to work very well on my end using MPLAB sim and the stopwatch.
    I've got to finish up a 2nd demo board with an LCD, one running the non-optimized version, one running the optimized version, just to see which one counts faster and by how much.
    I still want to change the optimization down to the bit level and have the library recognize the DEFINE as none (not defined), byte or bit.
    How do you get MPASM to recognize different parameters using DEFINE in PBP? I can't seem to get it to work right...

  5. #5
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by skimask View Post
    How do you get MPASM to recognize different parameters using DEFINE in PBP? I can't seem to get it to work right...
    Several ways ... here's one.

    Code:
    DEFINE  OptLevel  BYTE
    
    ASM
    BIT = 1
    BYTE = 8
    WORD = 16
    LONG = 32
    
        IFDEF OptLevel  
            IF (OptLevel == BYTE)
                ; -- Byte level specified
            ENDIF
            IF (OptLevel == WORD)
                ; -- Word level specified
            ENDIF
        ENDIF
    ENDASM
    DT

  6. #6
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    Several ways ... here's one.
    Noticed that this morning when I looked thru a couple of my .LST files.
    Not sure if my problem was the missing parentheses or the double-equal in the IF statement.

    My next step is to 'include' various levels of 32 bit divide optimization, none, byte, and bit level
    i.e.
    DEFINE SKI_DIV_OPT BYTE (speeds up divides 8 bits at a time)
    DEFINE SKI_DIV_OPT BIT (speeds up divides at the bit level)

  7. #7
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Ok, did some fooling around with the divide optimizations...
    This looks like it should work...
    Code:
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;*                                                              *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;*                                                              *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    
        ifdef DIVS_USED
      LIST
    DIVS	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    divchkr1 btfss	R1 + 3, 7	; Check for R1 negative
    	bra	divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
        endif
    
        ifdef DIV_USED
      LIST
    DIV
          ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
          endif
    divdo	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32
    	movwf	R3
    
    ;added to speed up s-31 bit divide operations by ignoring
    ;zero'd bytes
            ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 1 )
    SkiOpt
    	movf    R0 + 3, W      ; IF R0.byte3 = 0 
    	bnz     divloop
    	movf    R1 + 3, W      ;   AND R1.byte3 = 0 then 
    	bnz     divloop
    
    	movlw   8              ;      loops - 8  ; movlw 24
    	subwf   R3, F
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    
    	movf    R3, W
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    		endif
            endif
    
    ;above added to speed divide operations
    
    ;added to speed up s-31 bit divides by skipping cleared bits in divisor/dividend
    	ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 2 )
    SkiOpt2
    	btfsc	R0 + 3, 7	; if highest bit set, goto divloop
    	bra	divloop
    	btfsc	R1 + 3, 7	; if highest bit set, goto divloop
    	bra	divloop
    
    ;streamlined code here...old stuff is gone...
    	bsc	status, 0	;clear carry - shift over complete R0
    	rlcf	R0, F		;shift R0, .7 into carry
    	rlcf	R0 + 1, F	;shift R0+1
    	rlcf	R0 + 2, F	;shift R0+2
    	rlcf	R0 + 3, F	;shift R0+3
    
    	bsc	status, 0	;clear carry - shift over complete R1
    	rlcf	R1, F		;shift R1, .7 into carry
    	rlcf	R1 + 1, F	;shift R1+1
    	rlcf	R1 + 2, F	;shift R1+2
    	rlcf	R1 + 3, F	;shift R1+3
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	movf	R3, W
    	btfss STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    
    		endif
    	endif
    ;above added to speed up divides at the bit level
    
    divloop	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    
    	bc	divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    
    	bcf	STATUS, C
    
    divok	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	decfsz	R3, F
    	bra	divloop
    
          ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    divdone
        endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
        endif
    See any glaring problems?
    I'm still a bit fuzzy on using the DEFINEs in PBP and relating them to the assembler though, so not sure if that will work.
    And, with the bit level optimization, I'm not sure that'll save any time anyways.
    More testing to come...
    Last edited by skimask; - 8th September 2008 at 02:06. Reason: Streamlined code a bit, changed shift from hi->low, to low->high by using carry bit...

  8. #8
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Well, looks like it may need something else.

    I wrapped it in a program, and changed the labels so it would coexist with PBP, then used a couple FOR loops to cycle through some numbers. It does the optimized divide, then does a normal PBP divide and compares the results. If they aren't the same, is sends a message out the USART.

    With the DEFINE SKI_DIV_OPT line commented out, both versions are always the same. Which shows that the test program is working, and the output looks like this ...
    Code:
    A=93  No ERRORs
    A=94  No ERRORs
    A=95  No ERRORs
    A=96  No ERRORs
    A=97  No ERRORs
    A=98  No ERRORs
    A=99  No ERRORs
    A=100  No ERRORs
    A=101  No ERRORs
    A=102  No ERRORs
    A=103  No ERRORs
    A=104  No ERRORs
    With SKI_DIV_OPT defined as either 1 or 2, the results are the same. Here's some output from SKI_DIV_OPT 1 ...
    Code:
     Quotient Error: A=59  B=51  Ski=0  PBP=1
    Remainder Error: A=59  B=51  Ski=59  PBP=8
     Quotient Error: A=59  B=52  Ski=0  PBP=1
    Remainder Error: A=59  B=52  Ski=59  PBP=7
     Quotient Error: A=59  B=53  Ski=0  PBP=1
    Remainder Error: A=59  B=53  Ski=59  PBP=6
     Quotient Error: A=59  B=54  Ski=0  PBP=1
    Remainder Error: A=59  B=54  Ski=59  PBP=5
     Quotient Error: A=59  B=55  Ski=0  PBP=1
    Remainder Error: A=59  B=55  Ski=59  PBP=4
     Quotient Error: A=59  B=56  Ski=0  PBP=1
    Remainder Error: A=59  B=56  Ski=59  PBP=3
     Quotient Error: A=59  B=57  Ski=0  PBP=1
    Remainder Error: A=59  B=57  Ski=59  PBP=2
     Quotient Error: A=59  B=58  Ski=0  PBP=1
    Remainder Error: A=59  B=58  Ski=59  PBP=1
     Quotient Error: A=59  B=59  Ski=0  PBP=1
    Remainder Error: A=59  B=59  Ski=59  PBP=0
     Quotient Error: A=60  B=0  Ski=255  PBP=4294967295
     Quotient Error: A=60  B=1  Ski=0  PBP=60
    Remainder Error: A=60  B=1  Ski=60  PBP=0
     Quotient Error: A=60  B=2  Ski=0  PBP=30
    Remainder Error: A=60  B=2  Ski=60  PBP=0
     Quotient Error: A=60  B=3  Ski=0  PBP=20
    Remainder Error: A=60  B=3  Ski=60  PBP=0
     Quotient Error: A=60  B=4  Ski=0  PBP=15
    Remainder Error: A=60  B=4  Ski=60  PBP=0
     Quotient Error: A=60  B=5  Ski=0  PBP=12
    Remainder Error: A=60  B=5  Ski=60  PBP=0
     Quotient Error: A=60  B=6  Ski=0  PBP=10
    Remainder Error: A=60  B=6  Ski=60  PBP=0
     Quotient Error: A=60  B=7  Ski=0  PBP=8
    Remainder Error: A=60  B=7  Ski=60  PBP=4
     Quotient Error: A=60  B=8  Ski=0  PBP=7
    Remainder Error: A=60  B=8  Ski=60  PBP=4
     Quotient Error: A=60  B=9  Ski=0  PBP=6
    Remainder Error: A=60  B=9  Ski=60  PBP=6
     Quotient Error: A=60  B=10  Ski=0  PBP=6
    Remainder Error: A=60  B=10  Ski=60  PBP=0
     Quotient Error: A=60  B=11  Ski=0  PBP=5
    Remainder Error: A=60  B=11  Ski=60  PBP=5
     Quotient Error: A=60  B=12  Ski=0  PBP=5
    Remainder Error: A=60  B=12  Ski=60  PBP=0
     Quotient Error: A=60  B=13  Ski=0  PBP=4
    Here's the program, any 18F will do ...
    Code:
    '****************************************************************
    '*  Name    : Test_SkiDIV.pbp                                   *
    '*  Author  : Darrel Taylor                                     *
    '*  Date    : 9/9/2008                                          *
    '*  Version : 1.0                                               *
    '*  Thread  : instruction execution time                        *
    '*      http://www.picbasic.co.uk/forum/showthread.php?p=61992  *
    '****************************************************************
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;*                                                              *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;*                                                              *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    
    DEFINE OSC 40
    
    DEFINE HSER_TXSTA 24h 'Hser transmit status init 
    DEFINE HSER_RCSTA 90h 'Hser receive status init 
    DEFINE HSER_BAUD 38400 'Hser baud rate 
    DEFINE HSER_CLROERR 1 'Hser clear overflow automatically 
     
    DEFINE SKI_DIV_OPT 1
    
    AL    VAR LONG
    BL    VAR LONG
    SkiQ  VAR LONG   ; ASM quotient
    SkiR  VAR LONG   ; ASM remainder
    
    PBPQ  VAR LONG   ; PBP quotient
    PBPR  VAR LONG   ; PBP remainder
    
    AW  VAR WORD
    BW  VAR WORD
    RW  VAR WORD
    
    ERROR VAR BIT
    
    For AL = 0 to 1000
        ERROR = 0
        For BL = 0 to 1000
            @ MOVE?NN  _AL, R0
            @ MOVE?NN  _BL, R1  ; AL / BL
            @ L?CALL   #DIVS
            @ MOVE?ANN R0, _SkiQ
            @ MOVE?NN  R2, _SkiR
            PBPQ = AL / BL                   ; do same in PBP
            PBPR = AL // BL
            
            if SkiQ != PBPQ then 
                HSEROUT [" Quotient Error: A=",DEC AL,"  B=",DEC BL, _
                         "  Ski=",dec SkiQ,"  PBP=",DEC PBPQ,13,10]
                ERROR = 1
            endif
            if SkiR != PBPR then 
                HSEROUT ["Remainder Error: A=",DEC AL,"  B=",DEC BL, _
                         "  Ski=",dec SkiR,"  PBP=",DEC PBPR,13,10]
                ERROR = 1
            endif
        next BL
        if ERROR = 0 then HSEROUT ["A=",dec AL,"  No ERRORs",13,10]
    next AL
    
    stop
    
    ; -------------------
    ASM
        ifdef DIVS_USED
      LIST
    #DIVS  clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1  btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
        endif
    
        ifdef DIV_USED
      LIST
    #DIV
          ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
          endif
    #divdo	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32
    	movwf	R3
    
    ;added to speed up s-31 bit divide operations by ignoring
    ;zero'd bytes
            ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 1 )
    SkiOpt
    	movf    R0 + 3, W      ; IF R0.byte3 = 0 
    	bnz     #divloop
    	movf    R1 + 3, W      ;   AND R1.byte3 = 0 then 
    	bnz     #divloop
    
    	movlw   8              ;      loops - 8  ; movlw 24
    	subwf   R3, F
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    
    	movf    R3, W
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    		endif
            endif
    
    ;above added to speed divide operations
    
    ;added to speed up s-31 bit divides by skipping cleared bits in divisor/dividend
    	ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 2 )
    SkiOpt2
    	btfsc	R0 + 3, 7	; if highest bit set, goto divloop
    	bra	#divloop
    	btfsc	R1 + 3, 7	; if highest bit set, goto divloop
    	bra	#divloop
    
    ;streamlined code here...old stuff is gone...
    	bcf    	STATUS, 0	;clear carry - shift over complete R0
    	rlcf	R0, F		;shift R0, .7 into carry
    	rlcf	R0 + 1, F	;shift R0+1
    	rlcf	R0 + 2, F	;shift R0+2
    	rlcf	R0 + 3, F	;shift R0+3
    
    ;	bsc	status, 0	;clear carry - shift over complete R1
    	bcf	    STATUS, 0	;clear carry - shift over complete R1
    	rlcf	R1, F		;shift R1, .7 into carry
    	rlcf	R1 + 1, F	;shift R1+1
    	rlcf	R1 + 2, F	;shift R1+2
    	rlcf	R1 + 3, F	;shift R1+3
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	movf	R3, W
    	btfss STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    
    		endif
    	endif
    ;above added to speed up divides at the bit level
    
    #divloop	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    
    	bcf	STATUS, C
    
    #divok	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	decfsz	R3, F
    	bra	#divloop
    
          ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    #divdone
        endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
        endif
    ENDASM
    DT

  9. #9
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    You got to it before I could finish it up. I see what you've done.
    Yep, sure doesn't work! I did have it working though...at least in MPLAB's sim, watching registers come up the same. Never did finish up the 2nd LCD board...
    Question : I would've thought that using the pound sign in front of a label and for a label would jack up the assembler. This works as you've written?

  10. #10
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Yup, the pound sign is just another character to the assembler.

    Psuedo-ops like #define, could just as easily been ?define or Tdefine (had microchip chosen to do so).
    It's just a word that uses # as one of it's characters.
    DT

  11. #11
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    One possible error I can see is that these are 'signed-31' bit numbers, not 32 bit numbers.
    Probably shouldn't shift everything over into the msb of the MSB, stop one bit short of that.

    Code:
    '****************************************************************
    '*  Name    : Test_SkiDIV.pbp                                   *
    '*  Author  : Darrel Taylor                                     *
    '*  Date    : 9/9/2008                                          *
    '*  Version : 1.0                                               *
    '*  Thread  : instruction execution time                        *
    '*      http://www.picbasic.co.uk/forum/showthread.php?p=61992  *
    '****************************************************************
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    ;1755 bytes used
    DEFINE OSC 40
    DEFINE HSER_TXSTA 24h 'Hser transmit status init 
    DEFINE HSER_RCSTA 90h 'Hser receive status init 
    DEFINE HSER_BAUD 38400 'Hser baud rate 
    DEFINE HSER_CLROERR 1 'Hser clear overflow automatically 
    DEFINE SKI_DIV_OPT 1
    AL    VAR LONG
    BL    VAR LONG
    SkiQ  VAR LONG   ; ASM quotient
    SkiR  VAR LONG   ; ASM remainder
    PBPQ  VAR LONG   ; PBP quotient
    PBPR  VAR LONG   ; PBP remainder
    AW  VAR WORD
    BW  VAR WORD
    RW  VAR WORD
    ERROR VAR BIT
    For AL = 0 to 1000
        ERROR = 0
        For BL = 0 to 1000
            @ MOVE?NN  _AL, R0
            @ MOVE?NN  _BL, R1  ; AL / BL
            @ L?CALL   #DIVS
            @ MOVE?ANN R0, _SkiQ
            @ MOVE?NN  R2, _SkiR
            PBPQ = AL / BL                   ; do same in PBP
            PBPR = AL // BL
            if SkiQ != PBPQ then 
                HSEROUT [" Quotient Error: A=",DEC AL,"  B=",DEC BL, _
                         "  Ski=",dec SkiQ,"  PBP=",DEC PBPQ,13,10]
                ERROR = 1
            endif
            if SkiR != PBPR then 
                HSEROUT ["Remainder Error: A=",DEC AL,"  B=",DEC BL, _
                         "  Ski=",dec SkiR,"  PBP=",DEC PBPR,13,10]
                ERROR = 1
            endif
        next BL
        if ERROR = 0 then HSEROUT ["A=",dec AL,"  No ERRORs",13,10]
    next AL
    stop
    ; ---- nothing has been changed below
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
    		endif
    #divdo
    	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32
    	movwf	R3
    
    ;added to speed up s-31 divide op's by ignoring zero'd bytes
            ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 1 )
    SkiOpt
    	movf    R0 + 3, W	; IF R0.byte3 (low 7 bits) = 0 
    	bcf	W, 7		; Clear 'sign'
    	bnz	#divloop	; 
    	movf    R1 + 3, W	; AND R1.byte3 (low 7 bits)=0 then
    	bcf	W, 7		; Clear 'sign'
    	bnz     #divloop
    	;but only check Rx+3.7 the first time thru (1st time)
    	;after that, instead of a byte shift,
    	;shift everything << 7 (2nd time),
    	;then do 2 more byte shifts if possible
    
    Ski_Shift7
    	movlw	6
    	movwf	R3
    	
    	rlcf	R0, F		;shift 7 times
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	rlcf	R1, F
    	rlcf	R1 + 1, F
    	rlcf	R1 + 2, F
    	rlcf	R1 + 3, F
    
    	movlw	1
    	subwf	R3, F
    	
    	movf	R3, W
    	btfss	STATUS, Z
    	bra	Ski_Shift7	
    
    	;check again after 7 shifts
    	movlw	25
    	movwf	R3
    
    Ski_Shift3
    	movf    R0 + 3, W	; IF R0.byte3 = 0 
    	bnz	#divloop	; 
    	movf    R1 + 3, W	; AND R1.byte3 = 0 then
    	bnz     #divloop
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    	
    	movlw	8
    	subwf	R3, F
    	
    	movf	R3, W
    	btfss	STATUS, C	;if it's 1 (actually 1-8)
    	bra	Ski_Shift3	;jump out
    	
    	bra     SkiOpt
    		endif
            endif
    
    ;above added to speed divide operations
    
    ;added to speed up s-31 divides by skipping clr'd bits in divisor/dividend
    	ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 2 )
    SkiOpt2
    		;change 7 to 6
    	btfsc	R0 + 3, 6	; if highest bit set, goto divloop
    	bra	#divloop
    		;change 7 to 6
    	btfsc	R1 + 3, 6	; if highest bit set, goto divloop
    	bra	#divloop
    	;check Rx+3.6 instead of .7 and do shift if possible
    
    ;streamlined code here...old stuff is gone...
    	bcf    	STATUS, C	;clr carry-shift over complete R0
    	rlcf	R0, F		;shift R0, .7 into carry
    	rlcf	R0 + 1, F	;shift R0+1
    	rlcf	R0 + 2, F	;shift R0+2
    	rlcf	R0 + 3, F	;shift R0+3
    
    	bcf	STATUS, C	;clr carry-shift over complete R1
    	rlcf	R1, F		;shift R1, .7 into carry
    	rlcf	R1 + 1, F	;shift R1+1
    	rlcf	R1 + 2, F	;shift R1+2
    	rlcf	R1 + 3, F	;shift R1+3
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	movf	R3, W
    	btfss STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    
    		endif
    	endif
    ;above added to speed up divides at the bit level
    
    #divloop
    	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, C
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    #divdone
    		endif
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    Last edited by skimask; - 11th September 2008 at 05:42. Reason: Code changes...what else? x2 after post #17

  12. #12
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Thumbs up

    Hmmm, that's promissing! ...
    Code:
    ...
    A=247  No ERRORs
    A=248  No ERRORs
    A=249  No ERRORs
    A=250  No ERRORs
    A=251  No ERRORs
    A=252  No ERRORs
    A=253  No ERRORs
    A=254  No ERRORs
    A=255  No ERRORs
    A=256  No ERRORs
    A=257  No ERRORs
    A=258  No ERRORs
    A=259  No ERRORs
    A=260  No ERRORs
    ...
    More Testing and Timing is in order.
    DT

  13. #13
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    .................................................. ............
    EDIT: It'll fail above 24 bits...and if it's negative...Nothing in there shifting over the bytes!

    EDIT #2 : Replaced code in post #15
    Last edited by skimask; - 11th September 2008 at 05:43. Reason: Edit's abound...

  14. #14
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Ok, this looks good...er...I think...
    Code:
    '****************************************************************
    '*  Name    : Test_SkiDIV.pbp                                   *
    '*  Author  : Darrel Taylor                                     *
    '*  Date    : 9/9/2008                                          *
    '*  Version : 1.0                                               *
    '*  Thread  : instruction execution time                        *
    '*      http://www.picbasic.co.uk/forum/showthread.php?p=61992  *
    '****************************************************************
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    ;1755 bytes used
    DEFINE OSC 40
    DEFINE HSER_TXSTA 24h 'Hser transmit status init 
    DEFINE HSER_RCSTA 90h 'Hser receive status init 
    DEFINE HSER_BAUD 38400 'Hser baud rate 
    DEFINE HSER_CLROERR 1 'Hser clear overflow automatically 
    DEFINE SKI_DIV_OPT 1
    AL    VAR LONG
    BL    VAR LONG
    SkiQ  VAR LONG   ; ASM quotient
    SkiR  VAR LONG   ; ASM remainder
    PBPQ  VAR LONG   ; PBP quotient
    PBPR  VAR LONG   ; PBP remainder
    AW  VAR WORD
    BW  VAR WORD
    RW  VAR WORD
    ERROR VAR BIT
    For AL = 0 to 1000
        ERROR = 0
        For BL = 0 to 1000
            @ MOVE?NN  _AL, R0
            @ MOVE?NN  _BL, R1  ; AL / BL
            @ L?CALL   #DIVS	'throw in a little bit of clock counting?
            @ MOVE?ANN R0, _SkiQ
            @ MOVE?NN  R2, _SkiR
            PBPQ = AL / BL                   ; do same in PBP
            PBPR = AL // BL		'throw in a bit of clock counting here too?
            if SkiQ != PBPQ then 
                HSEROUT [" Quotient Error: A=",DEC AL,"  B=",DEC BL, _
                         "  Ski=",dec SkiQ,"  PBP=",DEC PBPQ,13,10]
                ERROR = 1
            endif
            if SkiR != PBPR then 
                HSEROUT ["Remainder Error: A=",DEC AL,"  B=",DEC BL, _
                         "  Ski=",dec SkiR,"  PBP=",DEC PBPR,13,10]
                ERROR = 1
            endif
        next BL
        if ERROR = 0 then HSEROUT ["A=",dec AL,"  No ERRORs",13,10]
    next AL
    stop
    ; ---- nothing has been changed below
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
    		endif
    #divdo
    	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32
    	movwf	R3
    
    ;added to speed up s-31 divide op's by ignoring zero'd bytes
            ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 1 )
    SkiOpt
    	movf    R0 + 3, W	; IF R0.byte3 (low 7 bits) = 0 
    	bcf	W, 7		; Clear 'sign'
    	bnz	#divloop	; 
    	movf    R1 + 3, W	; AND R1.byte3 (low 7 bits)=0 then
    	bcf	W, 7		; Clear 'sign'
    	bnz     #divloop
    	;but only check Rx+3.7 the first time thru (1st time)
    	;after that, instead of a byte shift,
    	;shift everything << 7 (2nd time),
    	;then do 2 more byte shifts if possible
    
    Ski_Shift7
    	movlw	6
    	movwf	R3
    	
    	bcf	STATUS, C
    	rlcf	R0, F		;shift 7 times
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	rlcf	R1, F
    	rlcf	R1 + 1, F
    	rlcf	R1 + 2, F
    	rlcf	R1 + 3, F
    
    	movlw	1
    	subwf	R3, F
    	
    	movf	R3, W
    	btfss	STATUS, Z
    	bra	Ski_Shift7	
    
    	;check again after 7 shifts
    	movlw	25
    	movwf	R3
    
    Ski_Shift3
    	movf    R0 + 3, W	; IF R0.byte3 = 0 
    	bnz	#divloop	; 
    	movf    R1 + 3, W	; AND R1.byte3 = 0 then
    	bnz     #divloop
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    	
    	movlw	8
    	subwf	R3, F
    	
    	movf	R3, W
    	btfss	STATUS, C	;if it's 1 (actually 1-8)
    	bra	Ski_Shift3	;jump out
    
    		endif
            endif
    
    ;above added to speed divide operations
    
    ;added to speed up s-31 divides by skipping clr'd bits in divisor/dividend
    	ifdef SKI_DIV_OPT
    		if ( SKI_DIV_OPT == 2 )
    SkiOpt2
    		;change 7 to 6
    	btfsc	R0 + 3, 6	; if highest bit set, goto divloop
    	bra	#divloop
    		;change 7 to 6
    	btfsc	R1 + 3, 6	; if highest bit set, goto divloop
    	bra	#divloop
    	;check Rx+3.6 instead of .7 and do shift if possible
    
    ;streamlined code here...old stuff is gone...
    	bcf    	STATUS, C	;clr carry-shift over complete R0
    	rlcf	R0, F		;shift R0, .7 into carry
    	rlcf	R0 + 1, F	;shift R0+1
    	rlcf	R0 + 2, F	;shift R0+2
    	rlcf	R0 + 3, F	;shift R0+3
    
    	bcf	STATUS, C	;clr carry-shift over complete R1
    	rlcf	R1, F		;shift R1, .7 into carry
    	rlcf	R1 + 1, F	;shift R1+1
    	rlcf	R1 + 2, F	;shift R1+2
    	rlcf	R1 + 3, F	;shift R1+3
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	movf	R3, W
    	btfss STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    
    		endif
    	endif
    ;above added to speed up divides at the bit level
    
    #divloop
    	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, C
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    #divdone
    		endif
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    Last edited by skimask; - 11th September 2008 at 06:34.

  15. #15
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Now I get nothing with SKI_DIV_OPT 1.

    SKI_DIV_OPT 2 still looks the same.

    Are you still using the Simulator?
    <br>
    DT

  16. #16
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    Now I get nothing with SKI_DIV_OPT 1.

    SKI_DIV_OPT 2 still looks the same.

    Are you still using the Simulator?
    <br>
    Yes, yes, yes... I know...Don't use the simulator.
    I'll get on it good this weekend with hardware (probably just temporarily reprogram my OBD reader for grins)...

    Ok, So, Opt 1 - nothing - What mean you by 'nothing'? Zero's all the way around?
    Opt 2 - gets roughly the same garbage as before?

  17. #17
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by skimask View Post
    Ok, So, Opt 1 - nothing - What mean you by 'nothing'? Zero's all the way around?
    No serial output at all. It's stuck in the optimize section, getting dizzy doing loops.
    May have something to do with subtracting 8 loops from what's now 25 (was 32), but I'm not sure.

    Opt 2 - gets roughly the same garbage as before?
    Same stuff. Quotient is 0, Remainder has the full A value.
    <br>
    DT

  18. #18
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Jeeze...up in SkiShift3...
    Code:
    	movlw	8
    	subwf	R3, F
    	
    	movf	R3, W
    	btfss	STATUS, C	;if it's 1 (actually 1-8)
    	bra	Ski_Shift3	;jump out
    Not going to get much of a carry from that am I?
    The subwf should set STATUS as appropriate, should be able to remove movf R3, W above the branch.
    I'm still looking thru my code in MCS...
    Might not have to worry about the most-sig-bit in R0/R1 since it's preset to 0 by the code at the beginning, therefore, that'll negate checking bit 30 instead of bit 31 of R0/R1.
    Last edited by skimask; - 11th September 2008 at 23:18.

  19. #19
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Did some pencil/paper work on the s31 divide operations at the bit level...
    Trying to optimize at the bit level is fruitless. Preshifting bits accomplishes the same thing that #divloop does except the fact that if the subtraction fails, #divloop restores the working registers (Rx). A few cycles may be wasted there with the restoration of the R(x) register, but those same cycles that may have been saved there, would have been used in the preshifting anyway.
    Optimizing at the byte level should still show a fair amount of cycle savings...
    More pencil/paper work...

  20. #20
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Did some thinking...
    This code tries to optimize the divide loops by preshifting the other way, to the right, getting rid of the trailing zero's (i.e. 16/8 is the same as 2/1, saves 3 times thru the #divloop).
    This works 99.99999% of the time, all the way up to 32 bits. The only thing I can get to fail is 0 / 0, zero divided by zero. Special case needs special case code.
    Problem is...I can't seem to see any improvements! The instruction cycle savings is there. I can count them by hand. On my setup, if I use PBP's divide, I get about 2600 divides per minute. If I use OPT 1,2 or 3 (both byte and bit together), I get roughly the same 2600 divides per minute. My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).
    The code I'm using has some minimal LCD output in it, so it may be that most of the time is being used up by the display code and not a lot left over for the divides themselves.

    EDIT: The LCD routines are definitely killing the speed...Divides per minute are more around 280,000+...

    Code:
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
    		endif
    #divdo
    	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    	movlw	32
    	movwf	R3
    ;added to speed up s-31 divide op's by ignoring zero'd bytes
    	        ifdef SKI_DIV_OPT
    	        	if ( SKI_DIV_OPT == 1 | SKI_DIV_OPT == 3 )
    SkiOpt
    	movf    R0, W      ; IF R0(0)= 0 
    	bnz     #divloop
    
    	movf    R1, W      ;   AND R1(0)= 0 then 
    	bnz     #divloop
    
    	movff   R0 + 1, R0 + 0 ;      and preshift R0
    	movff   R0 + 2, R0 + 1
    	movff   R0 + 3, R0 + 2
    	clrf    R0 + 3
    
    	movff   R1 + 1, R1 + 0 ;      and R1 over 8 bits
    	movff   R1 + 2, R1 + 1
    	movff   R1 + 3, R1 + 2
    	clrf    R1 + 3
    
    	movlw   8              ;      loops - 8
    	subwf   R3, F
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    			endif
    	        endif
       
    ;added to speed up s-31 divides by skipping clr'd bits in divisor/dividend lsb
    		ifdef SKI_DIV_OPT
    			if ( SKI_DIV_OPT == 2 | SKI_DIV_OPT == 3 )
    SkiOpt2
    	btfsc	R0, 0	; if lowest bit set, goto divloop
    	bra	#divloop
    	btfsc	R1, 0	; if lowest bit set, goto divloop
    	bra	#divloop
    
    	bcf    	STATUS, C	;clr carry-shift over complete R0
    	rrcf	R0 + 3, F	;shift R0+3, .0 into carry
    	rrcf	R0 + 2, F	;shift R0+2
    	rrcf	R0 + 1, F	;shift R0+1
    	rrcf	R0 + 0, F	;shift R0+0
    
    	bcf	STATUS, C	;clr carry-shift over complete R1
    	rrcf	R1 + 3, F	;shift R1, .0 into carry
    	rrcf	R1 + 2, F	;shift R1+2
    	rrcf	R1 + 1, F	;shift R1+1
    	rrcf	R1 + 0, F	;shift R1+0
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	btfss	STATUS, Z	;stop if no more loops
    	bra	SkiOpt2
    			endif
    		endif
    
    ;above added to speed divide operations
    #divloop
    	rlcf	R0 + 3, W     ;NOTE 1
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, C
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    #divdone
    		endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    Any other swell ideas? I've got another idea, but not sure how to implement it...
    At the NOTE 1 above (about 2/3 down thru the code), pre-check both R0 and R1, find the highest set bit from both R0 and R1 (i.e. if R0=4 and R1=128, then the highest set bit of the two is R1.7). Shift THAT bit into the carry and make the loop count start from there (in this case the loop count would be 7). The MSB of R0 (i.e. R0.31) is used as the basis of whether or not to do the divloop in the first place. If that bit is clear, and the corresponding bit in R1 is clear, no need to do that loop. But shifting both R0 and R1 up one place and decrementing the loop count will mess up the subtraction process.
    Last edited by skimask; - 13th September 2008 at 05:03.

  21. #21
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by skimask View Post
    Did some thinking...
    Uht OH!

    > My counts aren't that accurate since I'm using the gLCD and a handheld stopwatch (I quit using the simulator completely).

    Uhhh, did you forget what thread this was in...
    You don't need a stopwatch. Try Post #2

    Added: Or SteveB's "Code Timer"
    http://www.picbasic.co.uk/forum/showthread.php?t=9350
    <br>
    Last edited by Darrel Taylor; - 13th September 2008 at 06:01. Reason: Code Timer
    DT

  22. #22
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    Uhhh, did you forget what thread this was in...
    You don't need a stopwatch. Try Post #2
    <br>
    I know, but that'll only measure one or maybe a handful of divides at a time, a single divide operation.
    What I was looking for was a multitude of divides, with a bunch of different numbers, large, small, etc., which is why I kept the inner/outer loops that you originally had, but added large steps (prime numbers of course ). So, I figured doing as many different divides with different numbers in a minute would be a better indicator as to whether or not the optimizations worked or not....overall....

  23. #23
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    That's all stuff that can be done later if things work out.

    I think right now, to prove that it does "optimize" the divide routine, you need a "Base Line".
    How long does it take PBPL to do a single signed 32-bit divide?
    And how long does it take yours?

    Just a few tests at various bit sizes will tell if there's any benefit or not.
    <br>
    DT

  24. #24
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    That's all stuff that can be done later if things work out.
    I think right now, to prove that it does "optimize" the divide routine, you need a "Base Line".
    How long does it take PBP to do a single signed 32-bit divide.
    And how long does it take yours.
    Just a few tests at various bit sizes will tell if there's any benefit or not.
    <br>
    That's the thing...It depends on the bits in the divisor and the dividend.
    I'm sure you know how binary division works, set a result bit, shift everything over one, subtract, if there's a carry, add it back in, reset the result bit, try again.
    Like I said, I used your little error-checker, and the only error I can get out of the last code posted was 0/0. Everything else checks out good (unless my implementation is wrong!).
    As it is, I changed up the code just a bit to only send out to the LCD once every 256 loops (if bl.byte0=0 then -do the lcd-). So, obviously, the LCD was taking up loads of cycles, which it does anyways.
    The optimize's as they stand right now (the right shift version) do speed things up a little bit, overall, similar to how a right shift saves time over a divide by 2. In some cases though, they use a few more cycles.
    I know the 'left shift loop reduction' method on R0 and R1 has decent grounding. Just can't figure out how to implement it without wrecking the results!
    It might be easier to just use 4 different versions, where the version chosen is based on the larger of R0 and R1 (32, 24, 16 and 8 bit versions). Will use a bit more memory though...

  25. #25
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    What, are you going to make me do it?

    I will have exact numbers, good or bad.
    Do you want to be the first to know ... (a.k.a. you do it)

    Or do you give me a free Jab? Without a glove.
    <br>
    DT

  26. #26
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Results are in ....

    I'm taping up my hand.

    Anything you want to change?
    <br>
    DT

  27. #27
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Arg! I fell asleep in my chair thinking about it.
    I'm tellin' ya... I wrote a complement of 32bit math routines for my 6809E back in the day on my CoCo2. There's probably one friggin thing I'm forgetting. And no I haven't google'd anything because I don't want to.
    I'll remember it eventually... Make that jab to the back of the head. Maybe that shot will knock it forward.

  28. #28
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Ok, getting somewhere now...
    Had a thought earlier of adding a few extra loops to knock down the '32bit only' divide into sections dealing with only 32, 24, 16 and 8 bit divides.
    It's working well in the short tests, sometimes knocking cycle counts down by more than 3/4, usually by about 1/2.
    Fairly sure I fixed the 0/0 bug also...that could've been a bit more obvious...but I don't see how.

  29. #29
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    <img align=left border=1 hspace=10 vspace=10 src="http://www.picbasic.co.uk/forum/attachment.php?attachmentid=2857" />
    <!-- Name:  kapow.jpg
Views: 1329
Size:  5.7 KB -->It appears that along with some new errors, ...
    you've added an average of 3 instruction cycles to the PBP DIV time.

    According to the theory, the most optimization would be gained when using small numbers. So 8-bit/8-bit should see the greatest effect.<hr>

    --Testing 8/8, Skip 0/0 this time--
    Test- A=1-255, B=0-255
    No ERRORs:
    SkiMIN=762 SkiMAX=1018
    PBPMIN=759 PBPMAX=1015


    This is the latest test program ... SkiDiv5.pbp

    These are the results at different OPT levels ...
    SKI_DIV_OPT_1
    SKI_DIV_OPT_2
    SKI_DIV_OPT_3
    SKI_DIV_OPT_0

    This test program uses the Timing methods described in Post #2.
    DT

  30. #30
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    It appears that along with some new errors, ...
    Hmmm... That code I posted last night at 22:17 worked good for me, as far as errors go. 0/0 was the only one I got and I found the source of that one. But not much for savings.

    you've added an average of 3 instruction cycles to the PBP DIV time.
    According to the theory, the most optimization would be gained when using small numbers. So 8-bit/8-bit should see the greatest effect.
    That's what I'm getting also now that I've got a 'method' working.
    Last edited by skimask; - 14th September 2008 at 03:52.

  31. #31
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by skimask
    It's working well in the short tests, sometimes knocking cycle counts down by more than 3/4, usually by about 1/2.
    Keep in mind that the previous test program was only looking for the accuracy of the results, not the timing.

    It used 2 divides, 1 for the quotient and another for the remainder.
    Code:
            PBPQ = AL / BL                   ; do same in PBP
            PBPR = AL // BL
    When it looked like you were getting 1/2 the cycle count. It's because PBP had to do it twice.

    The new test program is geared towards Timing.
    PBP only has to do it once now.
    <br>
    DT

  32. #32
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    It used 2 divides, 1 for the quotient and another for the remainder.
    Ya, I noticed that last night after everything was running TWICE as fast.
    I knew it was too good to be true.
    I'm cooking on some numbers right now based on loop sizes of 9, 99, and 999 (step 1)...and so on...
    After those get done cooking, I'm going to expand those number out to 32 bit with larger steps and let it cook again. Like you annotated, could take months!
    Last edited by skimask; - 14th September 2008 at 00:08.

  33. #33
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Wink

    Quote Originally Posted by skimask View Post
    I'm cooking on some numbers right now ... After those get done cooking, I'm going to expand those number out to ... and let it cook again. Like you annotated, could take months!
    Go for it! I think the SkiDiv5 program should give you enough information to give it a worthy effort.

    Just remember. I'm only the guy giving the results.
    You've chosen to attempt bettering Jeff Schmoyer ...
    Good luck.
    <br>
    DT

  34. #34
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    Go for it! I think the SkiDiv5 program should give you enough information to give it a worthy effort.
    Just remember. I'm only the guy giving the results.
    You've chosen to attempt bettering Jeff Schmoyer ...
    Good luck.
    <br>
    Working on it... I'm surely not trying to better Jeff! That would be fruitless. If this does work as planned, I'm sure it'll be one of those tradeoffs...speed for space.
    You're the only guy giving the results? So far...
    Standby...shouldn't be too long now...

  35. #35
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Code I'm using now. Using MPLAB simulator to get accurate cycle counts.
    Code:
    resetplaceholder:	'18f4685 code
    DEFINE	OSC		40	'40 Mhz clock for proto work
    DEFINE	NO_CLRWDT	1	'no extra clear watchdog timer instructions
    DISABLE
    DEFINE SKI_DIV_OPT 3
    AL VAR LONG : BL VAR LONG : SkiQ VAR LONG : SkiR VAR LONG
    PBPQ VAR LONG : PBPR VAR LONG : AW VAR WORD : BW VAR WORD : RW VAR WORD
    ERROR VAR BIT : errorcount var long : maxnum var long : s1 var long
    pbpq = al / bl	'need to use at least one PBP divide to kick in DIVS for now
    al3 var al.byte3 : al2 var al.byte2 : al1 var al.byte1 : al0 var al.byte0
    bl3 var bl.byte3 : bl2 var bl.byte2 : bl1 var bl.byte1 : bl0 var bl.byte0
    'aliased so I can find it easily in MPSIM
    testcode:
    	@ nop	;stp = easy to search and add a breakpoint in SIM
    		maxnum = 9 : s1 = 1 : gosub dodivide
            	maxnum = 99 : s1 = 1 : gosub dodivide
            	maxnum = 999 : s1 = 19 : gosub dodivide
            	maxnum = 9999 : s1 = 193 : gosub dodivide
    		maxnum = 99999 : s1 = 1193 : gosub dodivide
    		maxnum = 999999 : s1 = 31279 : gosub dodivide
    		maxnum = 9999999 : s1 = 112791 : gosub dodivide
    		maxnum = 99999999 : s1 = 327913 : gosub dodivide
    		maxnum = 999999999 : s1 = 1279137 : gosub dodivide
    	@ nop	;stp
    END
    stop
    dodivide:	For AL = 0 to maxnum step s1
    			for BL = 0 to maxnum step s1
    				PBPQ = AL / BL
    			next BL
    		next AL
    		@ nop	;stp
    		For AL = 0 to maxnum step s1
    			for BL = 0 to maxnum step s1
    				@ MOVE?NN	_AL, R0
    				@ MOVE?NN	_BL, R1		; AL / BL
    				@ L?CALL	#DIVS
    				@ MOVE?ANN	R0, _SkiQ
    				@ MOVE?NN	R2, _SkiR
    			next BL
    		next AL
    		@ nop	;stp
    		return
    
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
    		endif
    #divdo
    	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    	movlw	32
    	movwf	R3
    
    ;added to speed up s-31 divides by using byte and bit shifting,
    ;added checking for 32, 24, 16 and 8 bit divides
    ;and using those routines if R0 and R1 are small enough
    		ifdef SKI_DIV_OPT
    SkiOpt3	;shift down bytes if low bytes are 0'd
    	movf    R0, W      ; IF R0(0)= 0 
    	bnz     SkiOpt4
    
    	movf    R1, W      ;   AND R1(0)= 0 then 
    	bnz     SkiOpt4
    
    	movff   R0 + 1, R0 + 0 ;      and preshift R0
    	movff   R0 + 2, R0 + 1
    	movff   R0 + 3, R0 + 2
    	clrf    R0 + 3
    
    	movff   R1 + 1, R1 + 0 ;      and R1 over 8 bits
    	movff   R1 + 2, R1 + 1
    	movff   R1 + 3, R1 + 2
    	clrf    R1 + 3
    
    	movlw   8              ;      loops - 8
    	subwf   R3, F
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt3
    	bra	#divdone
    
    SkiOpt4	;shift down bytes if low bits are 0'd
    	btfsc	R0, 0	; if lowest bit set, goto divloop
    	bra	skiopt5
    	btfsc	R1, 0	; if lowest bit set, goto divloop
    	bra	skiopt5
    
    	bcf    	STATUS, C	;clr carry-shift over complete R0
    	rrcf	R0 + 3, F	;shift R0+3, .0 into carry
    	rrcf	R0 + 2, F	;shift R0+2
    	rrcf	R0 + 1, F	;shift R0+1
    	rrcf	R0 + 0, F	;shift R0+0
    
    	bcf	STATUS, C	;clr carry-shift over complete R1
    	rrcf	R1 + 3, F	;shift R1, .0 into carry
    	rrcf	R1 + 2, F	;shift R1+2
    	rrcf	R1 + 1, F	;shift R1+1
    	rrcf	R1 + 0, F	;shift R1+0
    
    	movlw	1		;subtract one from the loop count
    	subwf	R3, F
    
    	btfss	STATUS, Z	;stop if no more loops
    	bra	SkiOpt4
    	bra	#divdone
    
    skiopt5	;check if can use different divide methods (32, 24, 16, 8)
    	movlw	32		;load loop count
    	movwf	R3
    	movf	R0 + 3, W
    	bnz	#divloop	;use normal div
    	movf	R1 + 3, W
    	bnz	#divloop	;use normal div
    
    	movlw	24		;load loop count
    	movwf	R3
    	movf	R0 + 2, W
    	bnz	#divloop24	;jump out to 24 bit
    	movf	R1 + 2, W
    	bnz	#divloop24	;jump out to 24 bit
    
    	movlw	16		;load loop count
    	movwf	R3
    	movf	R0 + 1, W
    	bnz	#divloop16	;jump out to 16 bit
    	movf	R1 + 1, W
    	bnz	#divloop16	;jump out to 16 bit
    
    	movlw	8		;load loop count
    	movwf	R3
    	bra	#divloop8	;fall thru to 8 bit
    	
    		endif
    ;above added to speed divide operations
    
    #divloop	;32 bit
    	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, C
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    	bra	#divdone
    		endif
    
    #divloop24	;24 bit
    	rlcf	R0 + 2, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	bc	#divok24
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	bcf	STATUS, C
    #divok24
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	decfsz	R3, F
    	bra	#divloop24
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    	bra	#divdone
    		endif
    		
    #divloop16	;16 bit
    	rlcf	R0 + 1, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	bc	#divok16
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	bcf	STATUS, C
    #divok16
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	decfsz	R3, F
    	bra	#divloop16
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    	bra	#divdone
    		endif
    		
    #divloop8	;8 bit
    	rlcf	R0, W
    	rlcf	R2, F
    	movf	R1, W
    	subwf	R2, F
    	bc	#divok8
    	movf	R1, W
    	addwf	R2, F
    	bcf	STATUS, C
    #divok8
    	rlcf	R0, F
    	decfsz	R3, F
    	bra	#divloop8
    
    		ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    		endif
    #divdone
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    END
    Cycle counts for various loop and step sizes
    Code:
    loop count	s1=step size	'pbp clock count     ski clock count	increase
    9 : 		s1 = 1		'105,278	          26,124	4.033
    99 : 		s1 = 1		'10,632,832	       2,507,692	4.240
    999 : 		s1 = 19		'2,981,601	       1,136,305	2.624
    9999 : 	        s1 = 193	'2,869,397	       1,174,807	2.442
    99999 : 	s1 = 1193	'7,479,745	       3,913,171	1.911
    999999 : 	s1 = 31279	'1,081,858	         722,091	1.498
    9999999 : 	s1 = 112791	'8,388,778	       5,638,902	1.462
    99999999 : 	s1 = 327913	'98,376,761	      98,167,489	1.002
    999999999 : 	s1 = 1279137	'646,580,978	     663,354,535	0.973
    Obviously, the biggest speed increase comes from using the various divide routines (32, 24, 16, 8), and once you get into the really big numbers, you lose cycles.
    In a bit, I'm going to load this into my protoboard and see what happens (PBP div vs. Ski_Opt divides) as far as accuracy goes.
    I still think shifting left and cutting the loop count ahead of time has merit. As I said, I know I did it on that 6809E back in the day. I just can't remember how...

    EDIT: Added the error checking code back in and put a WATCH on the error counter variable in MPSIM.
    So far, the only error I've gotten is on 0/0. Ski_Opt comes back with both zero's. PBP comes back with a max'd out quotient (2^32 -1), 0 in the remainder.
    Slow going in the sim with a step of 1 running from 0 to (2^31 -1)....
    Just calculated out the situation as described in the line above...
    Running on the SIM on my laptop (Dell Insp. 8200 P4 @ 1.7Ghz), it could take about 571,232,829 years, 5 months, 13 days to complete the loop as written!!!
    Ya...not so much!!!
    Last edited by skimask; - 14th September 2008 at 03:49.

  36. #36
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Well, after letting it run a couple of days into the higher numbers, I've found more errors, random bit/byte errors (at least I can't find a pattern to them). Either my PIC is screwing up the data, or my code with ALL of the optimizations is jacked up somewhere. So, after screwing around with it for a couple of days, I give up!
    One thing that does seem to work correctly though is the '4 different divides' version of the code, the normal s-31 divide, and 3 other divides for unsigned 24, 16, and 8 bit divides. When I get that code cleaned up and commented, it'll be here. A fair amount of speed increase for math intensive code, as DT said, when the numbers are small. I know in a couple of my programs, the speed increase will be a welcome enhancement.

  37. #37
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Latest and greatest (so far)...
    Seems to work great for me...will probably fail miserably for others!
    Haven't packaged it up real nice yet, some comments could stand to be added yet, some removed...
    8 bit only divides run about 4.2 times faster
    16 bit only divides run about 2.4 times faster
    16/24 bit mixed divides run about 1.5 times faster
    24/32 bit mixed divides actually run about 2% slower

    But the cycle counts, and therefore the actual figures aren't 100% accurate because they include the cycles used by the For/Next loops.

    Code:
    resetplaceholder:	'18f4685 code
    DEFINE	OSC		40	'40 Mhz clock for proto work
    DEFINE	NO_CLRWDT	1	'no extra clear watchdog timer instructions
    DISABLE
    DEFINE SKI_DIV_OPT 1
    AL VAR LONG : BL VAR LONG : SkiQ VAR LONG : SkiR VAR LONG : PBPQ VAR LONG
    PBPR VAR LONG : AW VAR WORD : BW VAR WORD : RW VAR WORD : ERROR VAR BIT
    errorcount var long : indicator var word : loopcount var long
    minnum var long : maxnum var long : s1 var long
    pbpq = al / bl	'need to use at least one PBP divide to kick in DIVS for now
    'indicator commented out in ASM code during -speed- runs
    testcode:	minnum = 1000 : maxnum = 1255 : s1 = 1 : gosub dodivide
                        gosub dodivide : gosub dodivide : gosub dodivide
                        gosub dodivide : gosub dodivide : gosub dodivide
    		@ nop	;stopper - easy to find point to set a breakpoint in MPSIM
    END
    stop
    dodivide:	For AL = minnum to maxnum step s1
                       For BL = minnum to maxnum step s1
    '			indicator.lowbyte = $f0		'watch in MPSIM to show what's going on
    '			loopcount = loopcount + 1	'watch in MPSIM to show what's going on
    
    			@ MOVE?NN	_AL, R0				;use only during OPT run
    			@ MOVE?NN	_BL, R1		; AL / BL	;use only during OPT run
    			@ L?CALL	#DIVS				;use only during OPT run
    			@ MOVE?NN	R0, _SkiQ			;use only during OPT run
    			@ MOVE?NN	R2, _SkiR			;use only during OPT run
    
    '			indicator.lowbyte = $0f		'watch in MPSIM to show what's going on
    
    '			PBPQ = AL / BL					;use only during PBP run
    
    '			PBPR = AL // BL					;use only when doing -both- runs to check for errors
    '			indicator.lowbyte = 0		'watch in MPSIM to show what's going on
    '			if ( pbpq <> skiq ) or ( pbpr <> skir ) then errorcount = errorcount + 1	'watch in MPSIM to show what's going on
    		next BL
              next AL
         return
    end
    ASM
    	ifdef DIVS_USED
      LIST
    #DIVS
    	clrf	R3 + 1		;clear shiftout counter
    	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	#divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0 + 0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    #divchkr1
    	btfss	R1 + 3, 7	; Check for R1 negative
    	bra	#divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1 + 0		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	#divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
    	endif
    	ifdef DIV_USED
      LIST
    #DIV
    		ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator
    		endif
    #divdo
    	clrf	R2 + 0		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    	movlw	32
    	movwf	R3 + 0
    		ifdef SKI_DIV_OPT
    ;check for zero case and send directly to divloop if they are zero
    	movf	R0 + 0, W
    	bnz	#divzero1
    	movf	R0 + 1, W
    	bnz	#divzero1
    	movf	R0 + 2, W
    	bnz	#divzero1
    	movf	R0 + 3, W
    	bnz	#divzero1
    	bra	#divloopa	;if R0 is zero, do #divloop
    #divzero1
    	movf	R1 + 0, W
    	bnz	SkiOpt5
    	movf	R1 + 1, W
    	bnz	SkiOpt5
    	movf	R1 + 2, W
    	bnz	SkiOpt5
    	movf	R1 + 3, W
    	bnz	SkiOpt5		;if R1 is not zero, continue OPT
    	bra	#divloopa	;if R1 is zero, do divloop
    SkiOpt5	;check if can use different divide methods (32, 24, 16, 8)
    	movlw	32		;load loop count
    	movwf	R3 + 0
    	movf	R0 + 3, W
    	bnz	#divloopa	;use normal div
    	movf	R1 + 3, W
    	bnz	#divloopa	;use normal div
    	movlw	24		;load loop count
    	movwf	R3 + 0
    	movf	R0 + 2, W
    	bnz	#divloop24a	;jump out to 24 bit
    	movf	R1 + 2, W
    	bnz	#divloop24a	;jump out to 24 bit
    	movlw	16		;load loop count
    	movwf	R3 + 0
    	movf	R0 + 1, W
    	bnz	#divloop16a	;jump out to 16 bit
    	movf	R1 + 1, W
    	bnz	#divloop16a	;jump out to 16 bit
    	movlw	8		;load loop count
    	movwf	R3 + 0
    	movf	R0 + 0, W
    	bnz	#divloop8a	;jump out to 8 bit
    	movf	R1 + 0, W
    	bnz	#divloop8a
    	movlw	32		;reload loop count with max count
    	movwf	R3 + 0
    		endif
    ;above added to speed divide operations
    #divloopa	;32 bit divide
    ;	movlw	50
    ;	movwf	_indicator + 1
    #divloop
    	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    	bc	#divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    	bcf	STATUS, 0
    #divok
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    	decfsz	R3, F
    	bra	#divloop
    		ifdef DIVS_USED
    	bra	#divnegchk	; Check for negative result
    		endif
    		ifdef SKI_DIV_OPT
    #divloop24a	;24 bit divide
    ;	movlw	36
    ;	movwf	_indicator + 1
    #divloop24
    	rlcf	R0 + 2, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	bc	#divok24
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	bcf	STATUS, 0
    #divok24
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	decfsz	R3, F
    	bra	#divloop24
    		ifdef DIVS_USED
    	bra	#divnegchk	; Check for negative result
    		endif
    #divloop16a	;16 bit divide
    ;	movlw	22
    ;	movwf	_indicator + 1
    #divloop16
    	rlcf	R0 + 1, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	bc	#divok16
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	bcf	STATUS, 0
    #divok16
    	rlcf	R0, F
    	rlcf	R0 + 1, F
    	decfsz	R3, F
    	bra	#divloop16
    		ifdef DIVS_USED
    	bra	#divnegchk	; Check for negative result
    		endif
    #divloop8a	;8 bit divide
    ;	movlw	8
    ;	movwf	_indicator + 1
    #divloop8
    	rlcf	R0, W
    	rlcf	R2, F
    	movf	R1, W
    	subwf	R2, F
    	bc	#divok8
    	movf	R1, W
    	addwf	R2, F
    	bcf	STATUS, 0
    #divok8
    	rlcf	R0, F
    	decfsz	R3, F
    	bra	#divloop8
    		endif
    		ifdef DIVS_USED
    #divnegchk
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	#divdone	; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    		endif
    #divdone
    	movf	R0 + 0,W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
    	endif
    ENDASM
    END

  38. #38
    Join Date
    May 2007
    Posts
    604


    Did you find this post helpful? Yes | No

    Default

    This may be of interest, 32x16 divide in 395 cycles:
    http://cablemodem.fibertel.com.ar/at...C%20micros.htm

  39. #39
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by rmteo View Post
    This may be of interest, 32x16 divide in 395 cycles:
    http://cablemodem.fibertel.com.ar/at...C%20micros.htm
    Aye lad...but alas, those routines are unsigned!

    This was all mainly an exercise on my part to see how stuff worked...turned out to do other stuff.
    Mainly, the routine above was meant to replace the 32/32 signed divide routine in the pbppi18l.lib library. The regular PBP library treats all math as though it were 32 bit, even if the numbers are smaller. You don't gain much by optimizing adds, subtracts, and multiplies, but divides stand to gain quite a bit if the routines can differentiate between different size numbers and just to different, faster routines accordingly. The library adds about 100 bytes to the routine, but it's only 100 bytes once, not for every divide, and it could possibly shave a load of cycles off the routine.

  40. #40
    Join Date
    May 2007
    Posts
    604


    Did you find this post helpful? Yes | No

    Default

    It was something I stumbled upon while browsing the MC forums. My signed 32x16 divide executes in 19 instruction cycles (what we talked about), hehe.

    Ski, BTW, are you located across the pond

Similar Threads

  1. Optimizing LCD commands?
    By jblackann in forum mel PIC BASIC Pro
    Replies: 1
    Last Post: - 4th December 2007, 16:30

Members who have read this thread : 1

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts