Optimizing DIV


Closed Thread
Results 1 to 40 of 42

Thread: Optimizing DIV

Hybrid View

  1. #1
    skimask's Avatar
    skimask Guest

    Default Optimizing DIV

    Quote Originally Posted by Darrel Taylor View Post
    Here's an example of measuring the time to do a 16/16 bit divide. But you can have any number of statements inbetween, as long as the time does not exceed 65535 instructions.
    An aside from another thread discussing math execution time in PBP 2.50B, where it seems that ALL add/subtract/multiply/divide operations are done as though they are signed-31-bit operations vs. differentiating between byte/word/long/mixed operations.
    IF I'm reading pbppi18L.Lib correctly, it seems that the worst case for a s31/s31 divide is about 1,127 cycles (signed, 2^30-1 divided by 1), best case looks to be about 713 cycles (unsigned, 1 divided by 0)
    Code:
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    
        ifdef DIVS_USED
      LIST
    DIVS	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    divchkr1 btfss	R1 + 3, 7	; Check for R1 negative
    	bra	divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
        endif
    
        ifdef DIV_USED
      LIST
    DIV
          ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
          endif
    divdo	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32
    
    IF R0.byte3 = 0 AND R1.byte3 = 0 then movlw 24
    and preshift R0 and R1 over 8 bits
    
    IF R0.word1 = 0 AND R1.word1 = 0 then movlw 16
    and preshift R0 and R1 over 16 bits
    
    IF R0.word1 = 0 AND R1.word1 = 0 and R0.byte1 = 0 and R1.byte1 = 0 then movlw 8
    and preshift R0 and R1 over 24 bits
    
    	movwf	R3
    
    divloop	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    
    	bc	divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    
    	bcf	STATUS, C
    
    divok	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	decfsz	R3, F
    	bra	divloop
    
          ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    divdone
        endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
        endif
    The divide loop itself takes 34 cycles per iteration for 32 iterations. Worst case, 1,088 cycles, going into the loop 25 cycles, coming out of the loop 14 cycles.
    I think, and I haven't played with it yet, in the highlighted section above, a couple of checks (in italics) could be put in there to check if the dividend and/or divisor's upper bytes (or break it down to individual bits) are cleared. If they are, then preshift R0 and R1 and lop off 8, 16, or 24 (up to 30 if checking by bit) iterations of the loop itself.
    The worst case (long dividend/divisor) would still take the full 1,127 cycles (plus some cycles for the checks), but the best case (byte dividend/divisor) could be knocked down from 713 cycles to about 24 cycles.
    Same could be said for the add, subtract and multiply routines, although gains would be minimal for each, and the PIC has the built-in single cycle 8x8 hardware multiplier.

  2. #2
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    This doesn't quite follow the Italics section, but I think It's what you meant.
    Code:
    	movlw	32             ; start with 32 loops
    	movwf	R3
    
    SkiOpt
    	movf    R0 + 3, W      ; IF R0.byte3 = 0 
    	bnz     divloop
    	movf    R1 + 3, W      ;   AND R1.byte3 = 0 then 
    	bnz     divloop
    
    	movlw   8              ;      loops - 8  ; movlw 24
    	subwf   R3, F
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    
    	movf    R3, W
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    Does it look like what you were thinking?
    <br>
    DT

  3. #3
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    This doesn't quite follow the Italics section, but I think It's what you meant...................
    Does it look like what you were thinking?
    <br>
    I was going to handle it today but got sidetracked with a bunch of welding/fabricating.

    Yep, that's pretty much exactly what I was thinking and it would, by default, kick out of the shifting loop if either R0/R1 was 'negative'.
    Then just test it out by doing LONG divides in a loop ( 0 thru 2^31, looping the divisor inside another loop for the dividend, would take a LONG-LONG time even at 40Mhz ), save the result and remainder, remultiply (which obviously hasn't been modified) add the remainder back in and see if that result is the same as the original integer. That should verify everything works correctly. After that, rewrite the code to break it down by bits instead of bytes.
    Should end up to be a significant speed increase and a relatively simple cut-and-paste into the .lib file...

  4. #4
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Code:
    ;****************************************************************
    ;* DIV        : 32 x 32 divide                                  *
    ;* Input      : R0 / R1                                         *
    ;* Output     : R0 = quotient                                   * 
    ;*            : R2 = remainder                                  *
    ;* Notes      : R2 = R0 MOD R1                                  *
    ;****************************************************************
    
        ifdef DIVS_USED
      LIST
    DIVS	clrf	R3 + 3		; Clear sign difference indicator
    	btfss	R0 + 3, 7	; Check for R0 negative
    	bra	divchkr1	; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip value to plus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    divchkr1 btfss	R1 + 3, 7	; Check for R1 negative
    	bra	divdo		; Not negative
    	btg	R3 + 3, 7	; Flip sign indicator
    	clrf	WREG		; Clear W for subtracts
    	negf	R1		; Flip value to plus
    	subfwb	R1 + 1, F
    	subfwb	R1 + 2, F
    	subfwb	R1 + 3, F
    	bra	divdo		; Skip unsigned entry
      NOLIST
    DIV_USED = 1
        endif
    
        ifdef DIV_USED
      LIST
    DIV
          ifdef DIVS_USED
    	clrf	R3 + 3		; Clear sign difference indicator	
          endif
    divdo	clrf	R2		; Do the divide
    	clrf	R2 + 1
    	clrf	R2 + 2
    	clrf	R2 + 3
    
    	movlw	32             ; start with 32 loops
    	movwf	R3
    
            ifdef SKI_DIV_SPEEDUP
    SkiOpt
    	movf    R0 + 3, W      ; IF R0.byte3 = 0 
    	bnz     divloop
    	movf    R1 + 3, W      ;   AND R1.byte3 = 0 then 
    	bnz     divloop
    
    	movlw   8              ;      loops - 8  ; movlw 24
    	subwf   R3, F
    
    	movff   R0 + 2, R0 + 3 ;      and preshift R0
    	movff   R0 + 1, R0 + 2
    	movff   R0 + 0, R0 + 1
    	clrf    R0
    
    	movff   R1 + 2, R1 + 3 ;      and R1 over 8 bits
    	movff   R1 + 1, R1 + 2
    	movff   R1 + 0, R1 + 1
    	clrf    R1
    
    	movf    R3, W
    	btfss   STATUS, Z      ; stop if no loop's left (0/0)
    	bra     SkiOpt
    
            endif
    
    divloop	rlcf	R0 + 3, W
    	rlcf	R2, F
    	rlcf	R2 + 1, F
    	rlcf	R2 + 2, F
    	rlcf	R2 + 3, F
    	movf	R1, W
    	subwf	R2, F
    	movf	R1 + 1, W
    	subwfb	R2 + 1, F
    	movf	R1 + 2, W
    	subwfb	R2 + 2, F
    	movf	R1 + 3, W
    	subwfb	R2 + 3, F
    
    	bc	divok
    	movf	R1, W
    	addwf	R2, F
    	movf	R1 + 1, W
    	addwfc	R2 + 1, F
    	movf	R1 + 2, W
    	addwfc	R2 + 2, F
    	movf	R1 + 3, W
    	addwfc	R2 + 3, F
    
    	bcf	STATUS, C
    
    divok	rlcf	R0, F
    	rlcf	R0 + 1, F
    	rlcf	R0 + 2, F
    	rlcf	R0 + 3, F
    
    	decfsz	R3, F
    	bra	divloop
    
          ifdef DIVS_USED
    	btfss	R3 + 3, 7	; Should result be negative?
    	bra	divdone		; Not negative
    	clrf	WREG		; Clear W for subtracts
    	negf	R0		; Flip quotient to minus
    	subfwb	R0 + 1, F
    	subfwb	R0 + 2, F
    	subfwb	R0 + 3, F
    	negf	R2		; Flip remainder to minus
    	subfwb	R2 + 1, F
    	subfwb	R2 + 2, F
    	subfwb	R2 + 3, F
    divdone
        endif
    
    	movf	R0, W		; Get low byte to W
    	goto	DUNN
      NOLIST
    DUNN_USED = 1
        endif
    This optimization (byte level optimize) seems to work very well on my end using MPLAB sim and the stopwatch.
    I've got to finish up a 2nd demo board with an LCD, one running the non-optimized version, one running the optimized version, just to see which one counts faster and by how much.
    I still want to change the optimization down to the bit level and have the library recognize the DEFINE as none (not defined), byte or bit.
    How do you get MPASM to recognize different parameters using DEFINE in PBP? I can't seem to get it to work right...

  5. #5
    Join Date
    Jul 2003
    Location
    Colorado Springs
    Posts
    4,959


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by skimask View Post
    How do you get MPASM to recognize different parameters using DEFINE in PBP? I can't seem to get it to work right...
    Several ways ... here's one.

    Code:
    DEFINE  OptLevel  BYTE
    
    ASM
    BIT = 1
    BYTE = 8
    WORD = 16
    LONG = 32
    
        IFDEF OptLevel  
            IF (OptLevel == BYTE)
                ; -- Byte level specified
            ENDIF
            IF (OptLevel == WORD)
                ; -- Word level specified
            ENDIF
        ENDIF
    ENDASM
    DT

  6. #6
    skimask's Avatar
    skimask Guest


    Did you find this post helpful? Yes | No

    Default

    Quote Originally Posted by Darrel Taylor View Post
    Several ways ... here's one.
    Noticed that this morning when I looked thru a couple of my .LST files.
    Not sure if my problem was the missing parentheses or the double-equal in the IF statement.

    My next step is to 'include' various levels of 32 bit divide optimization, none, byte, and bit level
    i.e.
    DEFINE SKI_DIV_OPT BYTE (speeds up divides 8 bits at a time)
    DEFINE SKI_DIV_OPT BIT (speeds up divides at the bit level)

Similar Threads

  1. Optimizing LCD commands?
    By jblackann in forum mel PIC BASIC Pro
    Replies: 1
    Last Post: - 4th December 2007, 16:30

Members who have read this thread : 0

You do not have permission to view the list of names.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts