32 bit square root

**Charles Linquis** · - 6th March 2009, 17:12

I'm trying to use the asm code found in Microchip's technical brief TB40 to accomplish a square root of a LONG.
I'm using PBPL. Sometimes it makes it through the ASM routine, sometimes not. The answer is always wrong.

Are there any PBP/ASM gurus out there that would be willing to help?

Code:

; ASM part modified version of Microchip TB40
 
       DEFINE OSC 40
                                                                                ' Got to start somewhere!
        DEFINE NO_CLRWDT 1
        DEFINE _18F8720 1
        DEFINE HSER_RCSTA 90H
        DEFINE HSER_TXSTA 24H                                                   
       
        DEFINE HSER_CLROERR 1
        DEFINE CCP1_REG PORTC
        DEFINE CCP1_BIT 2
        DEFINE LOADER_USED 1                                                    ' Bootloader
        Define USE_LFSR 1
        DEFINE ADC_BITS 10
        DEFINE ADC_SAMPLEUS 6
        
        SPBRG = 255
 
    INPUTVAL VAR LONG
 
    ARGA0 VAR     BYTE bankA SYSTEM ; various argument registers
    ARGA1 VAR     Byte bankA system
    ARGA2 VAR     Byte bankA system
    ARGA3 VAR     Byte bankA system
   
    ARG1H VAR     Byte SYSTEM
    ARG1L VAR     Byte SYSTEM
    ARG2H VAR     Byte SYSTEM
    ARG2L VAR     Byte SYSTEM

    SARG1 VAR     Byte SYSTEM
    SARG2 VAR     Byte SYSTEM

    RES1 VAR     Byte bankA system
    RES0 VAR     Byte bankA system

    SQRES0 VAR     Byte SYSTEM
    SQRES1 VAR     Byte SYSTEM
    SQRES2 VAR     Byte SYSTEM

    SQRES3 VAR     Byte SYSTEM

    BITLOC0 VAR   Byte  SYSTEM
    BITLOC1 VAR   Byte  SYSTEM
    TEMP0 VAR     Byte  SYSTEM
    TEMP1 VAR     Byte  SYSTEM
    ; *******************************************************************
    ; *******************************************************************
    ; The function of this square root routine is to determine the root
    ; to the nearest integer. At the same time the root is found at the
    ; best possible speed; therefore, the root is found a little differently
    ; for the two basic sizes of numbers, 16-bit and 32-bit. The following
    ; differentiates the two and jumps to the appropriate function.
    ; Sqrt(ARGA3:ARGA2:ARGA1:ARGA0) = RES1:RES0

     TRISC = %10111111
     pause 10
 
hello: 
 
    hserout ["hello world",13,10]

    pause 100
   
   
   For InputVal = 0 to $FFFFFFFF
      
   
   ARGA0 = INPUTVAL.BYTE0
   ARGA1 = INPUTVAL.BYTE1
   ARGA2 = INPUTVal.BYTE2
   ARGA3 = INPUTVAL.BYTE3


ASM
 
  
Sqrt tstfsz ARGA3,1 ; determine if the number is 16-bit
        bra Sqrt32 ; or 32-bit and call the best function
        tstfsz ARGA2, 1
        bra Sqrt32
        clrf RES1, 1
        bra Sqrt16


Sqrt16 clrf TEMP0, 1 ; clear the temp solution
        movlw 0x80 ; setup the first bit
        movwf BITLOC0, 1
        movwf RES0, 1
Square8 movf RES0, W, 1 ; square the guess
        mulwf RES0, 1
        movf PRODL, W, 1 ; ARGA - PROD test
        subwf ARGA0, W, 1
        movf PRODH, W, 1
        subwfb ARGA1, W, 1
        btfsc STATUS, C, 1
        bra NextBit ; if positive then next bit
    ; if negative then rotate right
        movff TEMP0, RES0 ; move last good value back into RES0
        rrncf BITLOC0, F, 1 ; then rotote the bit and put it
        movf BITLOC0, W, 1 ; back into RES0
        iorwf RES0, F, 1
        btfsc BITLOC0, 7, 1; if last value was tested then get
        bra Done ; out
        bra Square8 ; elso go back for another test
NextBit movff RES0, TEMP0 ; copy the last good approximation
        rrncf BITLOC0, F, 1 ; rotate the bit location register
        movf BITLOC0, W, 1
        iorwf RES0, F, 1
        btfsc BITLOC0, 7, 1 ; if last value was tested then get
        bra Done ; out
        bra Square8
Done movff TEMP0,RES0 ; put the final result in RES0
        bra TotallyDone
    
   
   
Sqrt32 clrf TEMP0, 1 ; clear the temp solution
        clrf TEMP1, 1
        clrf BITLOC0, 1 ; setup the first bit
        clrf RES0, 1
        movlw 0x80
        movwf BITLOC1, 1 ; BitLoc = 0x8000
        movwf RES1, 1 ; RES = 0x8000
Squar16 movff RES0, ARG1L ; square the guess
        movff RES1, ARG1H
        call Sq16
        movf SQRES0, W, 1 ; ARGA - PROD test
        subwf ARGA0, W, 1
        movf SQRES1, W, 1
        subwfb ARGA1, W, 1
        movf SQRES2, W, 1
        subwfb ARGA2, W, 1
        movf SQRES3, W, 1
        subwfb ARGA3, W, 1
        btfsc STATUS, C, 1
        bra NxtBt16 ; if positive then next bit
    ; if negative then rotate right
        addlw 0x00 ; clear carry
        movff TEMP0, RES0 ; move last good value back into RES0
        movff TEMP1, RES1
        rrcf BITLOC1, F, 1 ; then rotote the bit and put it
        rrcf BITLOC0, F, 1
        movf BITLOC1, W, 1 ; back into RES1:RES0
        iorwf RES1, F, 1
        movf BITLOC0, W, 1
        iorwf RES0, F, 1
        btfsc STATUS, C, 1 ; if last value was tested then get
        bra Done32 ; out
        bra Squar16 ; elso go back for another test
NxtBt16 addlw 0x00 ; clear carry
        movff RES0, TEMP0 ; copy the last good approximation
        movff RES1, TEMP1
        rrcf BITLOC1, F, 1 ; rotate the bit location register
        rrcf BITLOC0, F, 1
        movf BITLOC1, W, 1 ; and put back into RES1:RES0
        iorwf RES1, F, 1
        movf BITLOC0, W, 1
        iorwf RES0, F, 1


        btfsc STATUS, C, 1 ; if last value was tested then get
        bra Done32 ; out
        bra Squar16
Done32 movff TEMP0,RES0 ; put the final result in RES1:RES0
        movff TEMP1,RES1
        bra TotallyDone

  
Sq16 movf ARG1L, W, 1
        mulwf ARG1L ; ARG1L * ARG2L ->
    ; PRODH:PRODL
        movff PRODH, SQRES1 ;
        movff PRODL, SQRES0 ;
        movf ARG1H, W, 1
        mulwf ARG1H ; ARG1H * ARG2H ->
    ; PRODH:PRODL
        movff PRODH, SQRES3 ;
        movff PRODL, SQRES2 ;
        movf ARG1L, W, 1
        mulwf ARG1H ; ARG1L * ARG2H ->
    ; PRODH:PRODL
        movf PRODL, W, 1 ;
        addwf SQRES1, F, 1 ; Add cross
        movf PRODH, W, 1 ; products
        addwfc SQRES2, F, 1 ;
        clrf WREG, 1 ;
        addwfc SQRES3, F, 1 ;
        movf ARG1H, W, 1 ;
        mulwf ARG1L ; ARG1H * ARG2L ->
    ; PRODH:PRODL
        movf PRODL, W, 1 ;
        addwf SQRES1, F, 1 ; Add cross
        movf PRODH, W, 1 ; products
        addwfc SQRES2, F, 1 ;
        clrf WREG, W ;
        addwfc SQRES3, F, 1 ;
        return

TotallyDone    
    
ENDASM

  HSEROUT ["outputval = ",HEX2 RES1,HEX2 RES0,13,10]

  next inputval

end

**Acetronics2** · - 6th March 2009, 17:25

Hi, Charles

Why not use the PBPL "SQR" function ... ???

it also extracts a 32 bits number square root.

I can Understand you're looking for "something different" or smaller in code ... but ???

Alain

**Charles Linquis** · - 6th March 2009, 18:49

I didn't realize PBPL had that ability! Thanks! I'm going to "time" that routine, though. It has to run as quickly as possible.

**Charles Linquis** · - 6th March 2009, 19:07

For those who are interested -

Taking the SQR of a LONG in PBPL takes about 650- 760 processor cycles, depending on the number.

**tenaja** · - 6th March 2009, 22:51

Originally Posted by Charles Linquis

For those who are interested -

Taking the SQR of a LONG in PBPL takes about 650- 760 processor cycles, depending on the number.

If it is very time sensitive, you might consider Proton. It does the sqrt in about 380 cycles. It is by far the most efficient sqrt routine I've found, and I tested about a half dozen last year, most of them in asm. I don't know of any other compilers that use the routine, but it sure is tight. It's not original, though--I saw the asm code in a couple places. You'll have to search, though, because I don't have my notes on it.

**Charles Linquis** · - 7th March 2009, 01:34

Thanks! I'm at the point where a hundred cycles makes the difference whether my ISR gets messed up or not.

**Acetronics2** · - 7th March 2009, 10:38

Hi, Charles

Did you see that one ???

http://www.piclist.com/techref/micro...qrt/sqrt32.htm

Alain

**Charles Linquis** · - 7th March 2009, 18:27

No, I missed that one. I'll give it a try.
I'm calculating the RMS currents of several non-sinusoidal waveforms simultaneously, along with checking limits and times, and sending the results on to another PIC for display and data collection. I have to sample frequently to get enough accuracy, and I have to do everything on a cycle-by-cycle basis.

**Charles Linquis** · - 10th March 2009, 06:28

I see that the 'movfp' instruction is used in the example. I'm not familiar with the 16F or 17F parts. Is that instruction equivalent to the 18F instruction 'movf' ?

**Acetronics2** · - 10th March 2009, 12:21

Hi, Charles

it's just a particular Movf for PIC 17 ...

from f : Register file address (00h to FFh)
to p : Peripheral register file address (00h to 1Fh)

I do not see any warning to use a simple " Movf " with a 18F Device ... ( banking may be ??? )

Alain

**DSaum** · - 22nd March 2009, 00:01

Charles

How accurate does your square root have to be?

I have some code to do roots within 1% by a Taylor expansion with some relatively fast pbp code using sqr, >>, <<, +, - and a /.

Similarly I have additional code to do roots to 1% when the exponents
are near square, e.g. 0.45 or 0.65 rather than 0.50

My application is measuring air flow rate from Pitot tube type differential
pressure measurements.

**Charles Linquis** · - 22nd March 2009, 00:45

1% is good enough. I'm doing RMS current, and I'm sampling 4 channels on a 740uSec interrupt. I have to be able to do all A/D + squaring + avg + sqr in
that interval. I'm already "interleaving" - doing the math on the previous sample while the A/D is converting, but I can't quite get everything done in the allocated period - even while running 40MHz. I don't have to gain too much to make it work.

I would definitely be interested in your code.

**DSaum** · - 22nd March 2009, 19:09

Charles

Give this a try

'----------- Start PicBasicPro Code--------
'Taylor expansion to compute 1% accurate square roots scaled up by 64
'Copyright by Dave Saum 2009, no rights reserved
'Please email me if you find errors or improvements: DSaum at infiltec dot com
'
'Variables:
Sqr64x var word 'output 1% accurate sq root scaled up by 64
Input var word 'input integer for sq root
'
'Calculation:
Sqr64x=sqr Input 'first approximation to square root, unscaled
Sqr64x=(Sqr64x<<6)+((((Input-(Sqr64x*Sqr64x))<<5)+(Sqr64x>>1))/Sqr64x) 'more accurate sqr root, scaled up by 64
'
'Sample Calculation:
'If you take the sq root of 5 and scale it up by 64, the answer is (5^0.5)*64=143.108
'If you use the PBP sqr function for this: sqr(5)*64 = 2*64 = 128 or 10.6% error under 143.108
'But, using the Sqr64x code above:
'Sqr64x= (2*64)+((((5-(2*2))*32)+(2/2))/2)
' = 128+(((32+1))/2 = 128 +16 = 144 or 0.6% error over 143.108
'------------- end code ------

Hope this helps,

Dave
http://www.infiltec.com/seismo
http://www.infiltec.com/Infrasound@home
http://www.infiltec.com/SID-GRB@home
http://www.infiltec.com/inf-fun.htm

**DSaum** · - 22nd March 2009, 21:56

Charles

One the other hand, a faster and simpler way to go might
be to just scale up the inputs to be sq rooted as
much as possible with shifting rather than
multiplication or division, use the pbp sqr routine, and then
shift the scale out. There would be no multiplies or
divides or adds or subtracts. Here is some code that I
have not tested completely. It would take less
memory if the only the shift factors were in
the case statements and a single sqr and shift
were after the case statements. The final shift might
include a rounding term for more accuracy at the cost
of an additional add.

'---------------start PicBasicPro code --------------
'Shift routine to compute 1% accurate square roots scaled up by 64
'Copyright by Dave Saum 2009, no rights reserved
'Please email me if you find errors or improvements: DSaum at infiltec dot com
'
'Variables:
DispNumber4x var word ' input number whose sq root is to be calculated
AD1 var word 'dummy
AD2 var word 'output square root estimate scaled up by 64

'Calculations:
Select Case DispNumber4x ' Find maximum scale factors for input

Case DispNumber4x<4 ' DispNumber4x < 2^16/2^7*2^7=2^2 =4
AD1 = DispNumber4x<<14 ' Example 2, pbp: sqr(2*16384)/2=181/2= 90 vs float: 64*2^.05=1.414*64= 90.496, (0.5%)
AD2= (sqr AD1)>>1 ' Scale root up by 2^6=64 by using part of 2^7 scale

Case DispNumber4x<16 ' DispNumber4x < 2^16/2^6*2^6=2^4 =16
AD1 = DispNumber4x<<12 ' Example 10, pbp: sqr(10*4096)/1=202/1= 202 vs float: 64*10^.05=3.162*64= 202.39, (0.2%)
AD2= (sqr AD1)>>0 ' Scale root up by 2^6=64 by using part of 2^6 scale

Case DispNumber4x<64 ' DispNumber4x < 2^16/2^5*2^5=2^6 =64
AD1 = DispNumber4x<<10 ' Example 45, pbp: sqr(45*1024)*2=214*2= 428 vs float: 64*45^.05=6.708*64= 429.32, (0.3%)
AD2= (sqr AD1)<<1 ' Scale root up by 2^6=64 by using part of 2^5 scale

Case DispNumber4x<256 ' DispNumber4x < 2^16/2^4*2^4=2^8 =256
AD1 = DispNumber4x<<8
AD2= (sqr AD1)<<2 ' Scale root up by 2^6=64 by using part of 2^4 scale

Case DispNumber4x<1024 ' DispNumber4x < 2^16/2^3*2^3=2^10 =1024
AD1 = DispNumber4x<<6
AD2= (sqr AD1)<<3 ' Scale root up by 2^6=64 by using part of 2^3 scale

Case DispNumber4x<4096 ' DispNumber4x < 2^16/2^2*2^2=2^12 =4096
AD1 = DispNumber4x<<4
AD2= (sqr AD1)<<4 ' Scale root up by 2^6=64 by using part of 2^2 scale

Case DispNumber4x<16384 ' DispNumber4x < 2^16/2^1*2^1=2^14 =16304
AD1 = DispNumber4x<<2
AD2= (sqr AD1)<<5 ' Scale root up by 2^6=64 by using part of 2^1 scale

Case DispNumber4x<65534 ' DispNumber4x < 2^16/2^0*2^0=2^16 =65554
AD1 = DispNumber4x<<0
AD2= (sqr AD1)<<6 ' Scale root up by 2^6=64 by using part of 2^0 scale

End select

'--------------end pbp code

Hope this helps,

Dave
http://www.infiltec.com/seismo
http://www.infiltec.com/Infrasound@home
http://www.infiltec.com/SID-GRB@home
http://www.infiltec.com/inf-fun.htm

**Charles Linquis** · - 23rd March 2009, 02:23

Thanks to all who replied. I have all the other routines done and working. Now I just have to speed up the SQR routine. I'll try them all and post my results.

Thanks again.

**ScaleRobotics** · - 1st May 2009, 17:41

I compared your assembly code to the original Microchip TB040 to see if I could make it work. I did notice that a = 0 in the document, and in your code, every place that a was, now = 1. I am not very good at assembly, so I would like to know the difference between:

Code:

        movwf BITLOC0, 1

and

Code:

        movwf BITLOC0, 0

But, after doing this, I was able to get both the 16 bit and 32 bit working. I did not try setting them all back to 1, to see if that made a difference. But I did accidentally leave in one "1", and it did not work until I made it a zero. I am very curious to know the speed differences of each, if you have done any tests.

Thanks,

Walter

Code:

ASM
 
  
Sqrt tstfsz ARGA3,0 ; determine if the number is 16-bit
        bra Sqrt32 ; or 32-bit and call the best function
        tstfsz ARGA2, 0
        bra Sqrt32
        clrf RES1, 0
        bra Sqrt16


Sqrt16 clrf TEMP0, 0 ; clear the temp solution
        movlw 0x80 ; setup the first bit
        movwf BITLOC0, 0
        movwf RES0, 0
Square8 movf RES0, W, 0 ; square the guess
        mulwf RES0, 0
        movf PRODL, W, 0 ; ARGA - PROD test
        subwf ARGA0, W, 0
        movf PRODH, W, 0
        subwfb ARGA1, W, 0
        btfsc STATUS, C, 0
        bra NextBit ; if positive then next bit
    ; if negative then rotate right
        movff TEMP0, RES0 ; move last good value back into RES0
        rrncf BITLOC0, F, 0 ; then rotote the bit and put it
        movf BITLOC0, W, 0 ; back into RES0
        iorwf RES0, F, 0
        btfsc BITLOC0, 7, 0; if last value was tested then get
        bra Done ; out
        bra Square8 ; elso go back for another test
NextBit movff RES0, TEMP0 ; copy the last good approximation
        rrncf BITLOC0, F, 0 ; rotate the bit location register
        movf BITLOC0, W, 0
        iorwf RES0, F, 0
        btfsc BITLOC0, 7, 0 ; if last value was tested then get
        bra Done ; out
        bra Square8
Done movff TEMP0,RES0 ; put the final result in RES0
        bra TotallyDone
    
   
   
Sqrt32 clrf TEMP0, 0 ; clear the temp solution
        clrf TEMP1, 
        clrf BITLOC0, 0 ; setup the first bit
        clrf RES0, 0
        movlw 0x80
        movwf BITLOC1, 0 ; BitLoc = 0x8000
        movwf RES1, 0 ; RES = 0x8000
Squar16 movff RES0, ARG1L ; square the guess
        movff RES1, ARG1H
        call Sq16
        movf SQRES0, W, 0 ; ARGA - PROD test
        subwf ARGA0, W, 0
        movf SQRES1, W, 0
        subwfb ARGA1, W, 0
        movf SQRES2, W, 0
        subwfb ARGA2, W, 0
        movf SQRES3, W, 0
        subwfb ARGA3, W, 0
        btfsc STATUS, C, 0
        bra NxtBt16 ; if positive then next bit
    ; if negative then rotate right
        addlw 0x00 ; clear carry
        movff TEMP0, RES0 ; move last good value back into RES0
        movff TEMP1, RES1
        rrcf BITLOC1, F, 0 ; then rotote the bit and put it
        rrcf BITLOC0, F, 0
        movf BITLOC1, W, 0 ; back into RES1:RES0
        iorwf RES1, F, 0
        movf BITLOC0, W, 0
        iorwf RES0, F, 0
        btfsc STATUS, C, 0 ; if last value was tested then get
        bra Done32 ; out
        bra Squar16 ; elso go back for another test
NxtBt16 addlw 0x00 ; clear carry
        movff RES0, TEMP0 ; copy the last good approximation
        movff RES1, TEMP1
        rrcf BITLOC1, F, 0 ; rotate the bit location register
        rrcf BITLOC0, F, 0
        movf BITLOC1, W, 0 ; and put back into RES1:RES0
        iorwf RES1, F, 0
        movf BITLOC0, W, 0
        iorwf RES0, F, 0


        btfsc STATUS, C, 0 ; if last value was tested then get
        bra Done32 ; out
        bra Squar16
Done32 movff TEMP0,RES0 ; put the final result in RES1:RES0
        movff TEMP1,RES1
        bra TotallyDone

  
Sq16 movf ARG1L, W, 0
        mulwf ARG1L ; ARG1L * ARG2L ->
    ; PRODH:PRODL
        movff PRODH, SQRES1 ;
        movff PRODL, SQRES0 ;
        movf ARG1H, W, 0
        mulwf ARG1H ; ARG1H * ARG2H ->
    ; PRODH:PRODL
        movff PRODH, SQRES3 ;
        movff PRODL, SQRES2 ;
        movf ARG1L, W, 0
        mulwf ARG1H ; ARG1L * ARG2H ->
    ; PRODH:PRODL
        movf PRODL, W, 0 ;
        addwf SQRES1, F, 0 ; Add cross
        movf PRODH, W, 0 ; products
        addwfc SQRES2, F, 0 ;
        clrf WREG, 0 ;
        addwfc SQRES3, F, 0 ;
        movf ARG1H, W, 0 ;
        mulwf ARG1L ; ARG1H * ARG2L ->
    ; PRODH:PRODL
        movf PRODL, W, 0 ;
        addwf SQRES1, F, 0 ; Add cross
        movf PRODH, W, 0 ; products
        addwfc SQRES2, F, 0 ;
        clrf WREG, W ;
        addwfc SQRES3, F, 0 ;
        return

TotallyDone    
    
ENDASM

**ScaleRobotics** · - 1st May 2009, 19:14

I used LCDOUT instead of your serial routine though.

**Charles Linquis** · - 2nd May 2009, 02:39

I'll try timing it Monday. I eventually got everything to work using the SQR routine in PBPL. I gained some extra time by not using the ADCIN command. Instead, I start a channel, then do the computation for the previous channel. Not having to wait on the A/D bought me enough time to get everything done. I'm doing true RMS on 4 channels simultaneously using an '8723 at 40MHz.

**ScaleRobotics** · - 2nd May 2009, 18:25

Originally Posted by Charles Linquis

I'll try timing it Monday.

Thanks Charles,

No worries, I was just curious if pbpl was any slower compared to using pbp with the square root assembly code. Since pbpl uses a bit more space, I was curious to know if it was faster, or slower. It appears that a 20mhz chip can do about 5 or 6 of these in 1 ms using the assembly include file.

If anyone is interested, to make things easier, I have attached it as an include file. It can only be used on PIC18 chips, and according to TB040, must be modified for use with PIC17 devices that have a hardware multiplier. But if you did not have a new version of pbp (that had pbpl included), this would allow you to perform 32 bit square root. And it is much smaller than compiling in pbpl.

To use, load argh with the upper 16 bits, and argl with the lower 16 bits, then call square. Result will be in word variable RES.

Code:

INCLUDE "square.pbp"

'some defines here
main:
'and a little bit of code there....

ARGH = $0001       'load upper 16 bits into argument (any value you want)
ARGL = $ffff       'load lower 16 bits into argument (any value you want)
call square        'call square assembly function
lcdout $FE,1,#RES  'print result to lcd

Here are the results from codetimer.bas:
Time: 84.66328 usec
OSC Freq: 48 Mhz

**ScaleRobotics** · - 6th May 2009, 04:16

Well, I was wrong. PBPL won my crude speed comparison, and was 1.4 times faster when compared to the above TB040 assembly code. I had thought that Microchip's TB040 code would be pretty optimized, but that is pretty impressive MeLabs! Maybe I can stop being afraid to us pbpl now!

**Bruce** · - 6th May 2009, 04:37

You would flip if you knew some of the big manufacturers we've sold PBP to that have
produced multi-million dollar gadgets with it.

It's a lot more solid than some folks give it credit for.

Thread: 32 bit square root

Thread Tools

Search Thread

Display

32 bit square root

square roots within 1%?

pbp code for 1% accurate square root

using shifts to get 1% square roots with pbp?

Assembly code now working

Square Root 16 and 32 bit include file

Similar Threads

Bits, Bytes Words and Arrays

Best way to get the square root of a 16/32 bit unsigned integer

PICBasic newbie problem

32 bit data displaying on LCD

USART interrupt not interrupting right

Members who have read this thread : 0

Bookmarks

Bookmarks

Posting Permissions