pbp code for 1% accurate square root
Charles
Give this a try
'----------- Start PicBasicPro Code--------
'Taylor expansion to compute 1% accurate square roots scaled up by 64
'Copyright by Dave Saum 2009, no rights reserved
'Please email me if you find errors or improvements: DSaum at infiltec dot com
'
'Variables:
Sqr64x var word 'output 1% accurate sq root scaled up by 64
Input var word 'input integer for sq root
'
'Calculation:
Sqr64x=sqr Input 'first approximation to square root, unscaled
Sqr64x=(Sqr64x<<6)+((((Input-(Sqr64x*Sqr64x))<<5)+(Sqr64x>>1))/Sqr64x) 'more accurate sqr root, scaled up by 64
'
'Sample Calculation:
'If you take the sq root of 5 and scale it up by 64, the answer is (5^0.5)*64=143.108
'If you use the PBP sqr function for this: sqr(5)*64 = 2*64 = 128 or 10.6% error under 143.108
'But, using the Sqr64x code above:
'Sqr64x= (2*64)+((((5-(2*2))*32)+(2/2))/2)
' = 128+(((32+1))/2 = 128 +16 = 144 or 0.6% error over 143.108
'------------- end code ------
Hope this helps,
Dave
http://www.infiltec.com/seismo
http://www.infiltec.com/Infrasound@home
http://www.infiltec.com/SID-GRB@home
http://www.infiltec.com/inf-fun.htm
using shifts to get 1% square roots with pbp?
Charles
One the other hand, a faster and simpler way to go might
be to just scale up the inputs to be sq rooted as
much as possible with shifting rather than
multiplication or division, use the pbp sqr routine, and then
shift the scale out. There would be no multiplies or
divides or adds or subtracts. Here is some code that I
have not tested completely. It would take less
memory if the only the shift factors were in
the case statements and a single sqr and shift
were after the case statements. The final shift might
include a rounding term for more accuracy at the cost
of an additional add.
'---------------start PicBasicPro code --------------
'Shift routine to compute 1% accurate square roots scaled up by 64
'Copyright by Dave Saum 2009, no rights reserved
'Please email me if you find errors or improvements: DSaum at infiltec dot com
'
'Variables:
DispNumber4x var word ' input number whose sq root is to be calculated
AD1 var word 'dummy
AD2 var word 'output square root estimate scaled up by 64
'Calculations:
Select Case DispNumber4x ' Find maximum scale factors for input
Case DispNumber4x<4 ' DispNumber4x < 2^16/2^7*2^7=2^2 =4
AD1 = DispNumber4x<<14 ' Example 2, pbp: sqr(2*16384)/2=181/2= 90 vs float: 64*2^.05=1.414*64= 90.496, (0.5%)
AD2= (sqr AD1)>>1 ' Scale root up by 2^6=64 by using part of 2^7 scale
Case DispNumber4x<16 ' DispNumber4x < 2^16/2^6*2^6=2^4 =16
AD1 = DispNumber4x<<12 ' Example 10, pbp: sqr(10*4096)/1=202/1= 202 vs float: 64*10^.05=3.162*64= 202.39, (0.2%)
AD2= (sqr AD1)>>0 ' Scale root up by 2^6=64 by using part of 2^6 scale
Case DispNumber4x<64 ' DispNumber4x < 2^16/2^5*2^5=2^6 =64
AD1 = DispNumber4x<<10 ' Example 45, pbp: sqr(45*1024)*2=214*2= 428 vs float: 64*45^.05=6.708*64= 429.32, (0.3%)
AD2= (sqr AD1)<<1 ' Scale root up by 2^6=64 by using part of 2^5 scale
Case DispNumber4x<256 ' DispNumber4x < 2^16/2^4*2^4=2^8 =256
AD1 = DispNumber4x<<8
AD2= (sqr AD1)<<2 ' Scale root up by 2^6=64 by using part of 2^4 scale
Case DispNumber4x<1024 ' DispNumber4x < 2^16/2^3*2^3=2^10 =1024
AD1 = DispNumber4x<<6
AD2= (sqr AD1)<<3 ' Scale root up by 2^6=64 by using part of 2^3 scale
Case DispNumber4x<4096 ' DispNumber4x < 2^16/2^2*2^2=2^12 =4096
AD1 = DispNumber4x<<4
AD2= (sqr AD1)<<4 ' Scale root up by 2^6=64 by using part of 2^2 scale
Case DispNumber4x<16384 ' DispNumber4x < 2^16/2^1*2^1=2^14 =16304
AD1 = DispNumber4x<<2
AD2= (sqr AD1)<<5 ' Scale root up by 2^6=64 by using part of 2^1 scale
Case DispNumber4x<65534 ' DispNumber4x < 2^16/2^0*2^0=2^16 =65554
AD1 = DispNumber4x<<0
AD2= (sqr AD1)<<6 ' Scale root up by 2^6=64 by using part of 2^0 scale
End select
'--------------end pbp code
Hope this helps,
Dave
http://www.infiltec.com/seismo
http://www.infiltec.com/Infrasound@home
http://www.infiltec.com/SID-GRB@home
http://www.infiltec.com/inf-fun.htm
Assembly code now working
I compared your assembly code to the original Microchip TB040 to see if I could make it work. I did notice that a = 0 in the document, and in your code, every place that a was, now = 1. I am not very good at assembly, so I would like to know the difference between:
and
But, after doing this, I was able to get both the 16 bit and 32 bit working. I did not try setting them all back to 1, to see if that made a difference. But I did accidentally leave in one "1", and it did not work until I made it a zero. I am very curious to know the speed differences of each, if you have done any tests.
Thanks,
Walter
Code:
ASM
Sqrt tstfsz ARGA3,0 ; determine if the number is 16-bit
bra Sqrt32 ; or 32-bit and call the best function
tstfsz ARGA2, 0
bra Sqrt32
clrf RES1, 0
bra Sqrt16
Sqrt16 clrf TEMP0, 0 ; clear the temp solution
movlw 0x80 ; setup the first bit
movwf BITLOC0, 0
movwf RES0, 0
Square8 movf RES0, W, 0 ; square the guess
mulwf RES0, 0
movf PRODL, W, 0 ; ARGA - PROD test
subwf ARGA0, W, 0
movf PRODH, W, 0
subwfb ARGA1, W, 0
btfsc STATUS, C, 0
bra NextBit ; if positive then next bit
; if negative then rotate right
movff TEMP0, RES0 ; move last good value back into RES0
rrncf BITLOC0, F, 0 ; then rotote the bit and put it
movf BITLOC0, W, 0 ; back into RES0
iorwf RES0, F, 0
btfsc BITLOC0, 7, 0; if last value was tested then get
bra Done ; out
bra Square8 ; elso go back for another test
NextBit movff RES0, TEMP0 ; copy the last good approximation
rrncf BITLOC0, F, 0 ; rotate the bit location register
movf BITLOC0, W, 0
iorwf RES0, F, 0
btfsc BITLOC0, 7, 0 ; if last value was tested then get
bra Done ; out
bra Square8
Done movff TEMP0,RES0 ; put the final result in RES0
bra TotallyDone
Sqrt32 clrf TEMP0, 0 ; clear the temp solution
clrf TEMP1,
clrf BITLOC0, 0 ; setup the first bit
clrf RES0, 0
movlw 0x80
movwf BITLOC1, 0 ; BitLoc = 0x8000
movwf RES1, 0 ; RES = 0x8000
Squar16 movff RES0, ARG1L ; square the guess
movff RES1, ARG1H
call Sq16
movf SQRES0, W, 0 ; ARGA - PROD test
subwf ARGA0, W, 0
movf SQRES1, W, 0
subwfb ARGA1, W, 0
movf SQRES2, W, 0
subwfb ARGA2, W, 0
movf SQRES3, W, 0
subwfb ARGA3, W, 0
btfsc STATUS, C, 0
bra NxtBt16 ; if positive then next bit
; if negative then rotate right
addlw 0x00 ; clear carry
movff TEMP0, RES0 ; move last good value back into RES0
movff TEMP1, RES1
rrcf BITLOC1, F, 0 ; then rotote the bit and put it
rrcf BITLOC0, F, 0
movf BITLOC1, W, 0 ; back into RES1:RES0
iorwf RES1, F, 0
movf BITLOC0, W, 0
iorwf RES0, F, 0
btfsc STATUS, C, 0 ; if last value was tested then get
bra Done32 ; out
bra Squar16 ; elso go back for another test
NxtBt16 addlw 0x00 ; clear carry
movff RES0, TEMP0 ; copy the last good approximation
movff RES1, TEMP1
rrcf BITLOC1, F, 0 ; rotate the bit location register
rrcf BITLOC0, F, 0
movf BITLOC1, W, 0 ; and put back into RES1:RES0
iorwf RES1, F, 0
movf BITLOC0, W, 0
iorwf RES0, F, 0
btfsc STATUS, C, 0 ; if last value was tested then get
bra Done32 ; out
bra Squar16
Done32 movff TEMP0,RES0 ; put the final result in RES1:RES0
movff TEMP1,RES1
bra TotallyDone
Sq16 movf ARG1L, W, 0
mulwf ARG1L ; ARG1L * ARG2L ->
; PRODH:PRODL
movff PRODH, SQRES1 ;
movff PRODL, SQRES0 ;
movf ARG1H, W, 0
mulwf ARG1H ; ARG1H * ARG2H ->
; PRODH:PRODL
movff PRODH, SQRES3 ;
movff PRODL, SQRES2 ;
movf ARG1L, W, 0
mulwf ARG1H ; ARG1L * ARG2H ->
; PRODH:PRODL
movf PRODL, W, 0 ;
addwf SQRES1, F, 0 ; Add cross
movf PRODH, W, 0 ; products
addwfc SQRES2, F, 0 ;
clrf WREG, 0 ;
addwfc SQRES3, F, 0 ;
movf ARG1H, W, 0 ;
mulwf ARG1L ; ARG1H * ARG2L ->
; PRODH:PRODL
movf PRODL, W, 0 ;
addwf SQRES1, F, 0 ; Add cross
movf PRODH, W, 0 ; products
addwfc SQRES2, F, 0 ;
clrf WREG, W ;
addwfc SQRES3, F, 0 ;
return
TotallyDone
ENDASM
1 Attachment(s)
Square Root 16 and 32 bit include file
Quote:
Originally Posted by
Charles Linquis
I'll try timing it Monday.
Thanks Charles,
No worries, I was just curious if pbpl was any slower compared to using pbp with the square root assembly code. Since pbpl uses a bit more space, I was curious to know if it was faster, or slower. It appears that a 20mhz chip can do about 5 or 6 of these in 1 ms using the assembly include file.
If anyone is interested, to make things easier, I have attached it as an include file. It can only be used on PIC18 chips, and according to TB040, must be modified for use with PIC17 devices that have a hardware multiplier. But if you did not have a new version of pbp (that had pbpl included), this would allow you to perform 32 bit square root. And it is much smaller than compiling in pbpl.
To use, load argh with the upper 16 bits, and argl with the lower 16 bits, then call square. Result will be in word variable RES.
Code:
INCLUDE "square.pbp"
'some defines here
main:
'and a little bit of code there....
ARGH = $0001 'load upper 16 bits into argument (any value you want)
ARGL = $ffff 'load lower 16 bits into argument (any value you want)
call square 'call square assembly function
lcdout $FE,1,#RES 'print result to lcd
Here are the results from codetimer.bas:
Time: 84.66328 usec
OSC Freq: 48 Mhz