PIC Math - the need for speed



HenrikOlsson
- 9th October 2005, 11:34
Hi,
First post here for me, I've been on the pBasic-l list for quite some time though. Anyway, I'd love some input on the following.

Basically, what I'm doing is a 'realtime pulse stretcher'. As soon as an input goes low I set an output high and start TMR1. When the input goes high I stop TMR1, get the count, add 30% to it and then delay that amount of time before I turn the output off. So I tried this:

NewPulseTime = OldPulseTime / 100
NewPulseTime = NewPulseTime * 130
Delay = NewPulseTime - OldPulseTime

That works but it takes too long (I measured it at about 600uS). So I tried this instead:

NewValue = Value >> 6 'Divide by 64.
NewValue = NewValue * 83 'Multiply by 83. 83/64=1.29, close enough
Delay = NewValue - Value 'Calculate the delay.

That works too and is good enough, but it seems to take almost 320uS to run.
Does anyone have any ideas on how to speed this up, other than running the PIC faster, that is?

Thanks,
/Henrik Olsson.
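
A minimal sketch, in Python rather than PicBasic Pro, of what the two snippets above compute. It assumes 16-bit WORD variables with truncating integer division and results that wrap at 65536, and one timer tick per microsecond (4 MHz clock, no prescaler); the function names are made up for illustration. It suggests that dividing before multiplying throws away the low bits, so the effective stretch drifts for short pulses:

```python
def stretch_div100(old):
    """Emulates: New = Old / 100 : New = New * 130 : Delay = New - Old."""
    new = ((old // 100) * 130) & 0xFFFF   # integer divide truncates first
    return (new - old) & 0xFFFF


def stretch_shift6(old):
    """Emulates: New = Old >> 6 : New = New * 83 : Delay = New - Old."""
    new = ((old >> 6) * 83) & 0xFFFF      # shift discards the low 6 bits
    return (new - old) & 0xFFFF


for ticks in (500, 550, 20000):
    d1 = stretch_div100(ticks)
    d2 = stretch_shift6(ticks)
    print(ticks, d1, round(100 * d1 / ticks, 1), d2, round(100 * d2 / ticks, 1))
```

Under these assumptions a 550-tick pulse gets about 18% from the first version and about 21% from the second, while a 20000-tick pulse gets close to 30% from both.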

Ron Marcus
- 9th October 2005, 15:45
The most important thing you omitted is how long the pulse is that you are measuring. 320 uS may be inconsequential for a 10 second pulse.
If timing is critical, then your best bet is to use an assembly interrupt that constantly looks at the input pin and calculates the stretch amount accordingly. If your processor is dedicated to the pulse stretching function, then only the number crunching needs to be in assembly. If you have a limited scope of pulse width, you could create a lookup table to return the delay, but the accuracy diminishes as the number of possible input values increases. How close to 30% of the value does the output pulse need to stretch?
A lookup table might take 12 cycles to implement. This would be your fastest route without bumping the clock up.
It might be easier to do it in hardware. With the appropriate RC value on an unused pin, when the pulse comes in, you turn on your output pin and RC pin to start charging the capacitor. When the pulse stops, turn the pin attached to the RC network into an input, and test it to see when it becomes a low to the processor. The longer time it has to charge, the longer it takes to discharge to a "LOW" value. As long as you have the right values for your RC charge and discharge components, this should be quite accurate.

Hope this helps,
Ron

HenrikOlsson
- 9th October 2005, 19:25
Thanks Ron,

The incoming pulses are between 0.5 and 50ms and I need 0.1ms resolution. The frequency varies from 4 to 50Hz; the pulse length won't be more than 20ms at 50Hz, of course.

Right now I'm just looping tight waiting for the pulse to come and then
set my output. That short delay is no problem. I then start TMR1 and go
back to looping, waiting for the pulse to end, grab the TMR1 value and do the math.
It's the math that takes a little longer than I like. Right now it won't
work correctly for pulses shorter than ~1.1ms.

The 30% is not critical, it can be 28% or 32%, but it needs to be consistent over the whole range, i.e. not 28% for a 2ms pulse and 32% for a 5ms pulse.

If I used a lookup table, wouldn't that need 495 entries ((50-0.5)*10)? And wouldn't it take a different amount of time to find the answer depending on where in the table it's located? Would you mind filling me in on how it would work, if it's still an available solution with the span and resolution I need?

I thought perhaps someone had an idea on how I could speed up the math even further than I managed with some secret tricks ;-)

Thanks again!
/Henrik Olsson.

Ron Marcus
- 9th October 2005, 20:12
OK. Since you are using timer 1 anyway, how about this... It will need a little ML, but not much. When your input goes high, set your output high. Set timer1 with no prescaler, to interrupt on overflow, and put overflow - 50 in its register. When it overflows, the ML interrupt routine adds 15 to your pauseus variable, and starts it from overflow - 50 again. This goes on until your input goes low. Use pauseus with the variable you have been adding to, and when it runs out, flip your output low too. If you are running at 4 MHz, this will give you 50 uS resolution. The upper limit will be dictated by the 16 bit size of timer1. You should be able to do this up to about a 110 mS pulse length.
You might be able to write it all in PBP, but at the lower pulse widths, it might not have great resolution.
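
A rough Python model of the arithmetic behind this idea, not of the interrupt routine itself. It assumes the timer overflows exactly every 50 uS while the pulse is present and that 15 is added per overflow; the function name and parameters are hypothetical:

```python
def stretch_by_overflow(pulse_us, period_us=50, step_us=15):
    """Delay accumulated by adding step_us at every timer overflow."""
    overflows = pulse_us // period_us   # only completed periods are counted
    return overflows * step_us


for pulse in (500, 1120, 20000):
    print(pulse, stretch_by_overflow(pulse))
```

Because 15/50 is exactly 0.3, the ratio stays constant over the range; the error is at most one 15 uS step from the partial period at the end of the pulse.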

Darrel Taylor
- 9th October 2005, 20:24
Henrik,

Try this ...

Delay = OldPulseTime */ $004D

That should take 48 instruction cycles, or 48uS @ 4MHz

Ron Marcus
- 9th October 2005, 21:30
OK Darrel,
I give up. What are you doing here? It sure beats my solution in simplicity, but I am not getting anywhere near 1.3 * OldPulseTime. This equation returns the middle 16 bits of the product, correct?

HenrikOlsson
- 10th October 2005, 07:21
Ron and Darrel,
Thank you for helping me with this!

Darrel, let's see if I understand this correctly. The */ is like multiplying by fractions of 256, right? So */ $004D is like multiplying by 77/256, which is 0.3, yeah... I like it. I was hoping for something like this but to be honest I hadn't (or still haven't, perhaps) grasped the concept of the */ operator.

Ron, I like your idea too. I think I'll go that way in case Darrel's idea doesn't work. But after really trying to understand what the */ does, I think it will work.

Thanks again guys!
/Henrik Olsson.

Darrel Taylor
- 10th October 2005, 08:48
You Got It!

*/ $004D - Just like you said, it multiplies by 77/256 or 0.30078125. It's not perfect, but it's close.

And, it can work with bigger numbers too. For instance, to calculate NewPulseTime from your previous example, you would multiply by 1.30078125

NewPulseTime = OldPulseTime */ $014D ; * 1.30078125

Now you can see why it's easier to express the numbers in hex. If you separate the high and low bytes, 01 4D, the 01 represents whole numbers and the 4D is the fractional part (in 1/256ths).

*/ $0280 = * 2.5
*/ $10C0 = * 16.75
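
A small Python sketch of the operator as described in this thread: the 16-bit by 16-bit product is 32 bits wide, and */ keeps the middle 16 bits, which is the same as dividing the product by 256. The helper name mid_mul is made up for illustration:

```python
def mid_mul(a, b):
    """Middle 16 bits of the 32-bit product a * b, i.e. (a * b) / 256."""
    return ((a * b) >> 8) & 0xFFFF


print(mid_mul(20000, 0x004D))   # 77/256   -> 6015
print(mid_mul(20000, 0x014D))   # 333/256  -> 26015
print(mid_mul(100, 0x0280))     # 2.5      -> 250
print(mid_mul(100, 0x10C0))     # 16.75    -> 1675
```

The high byte of the constant is the whole-number part and the low byte is the fraction in 1/256ths, so $004D gives the 30% delay directly and $014D gives the full stretched time.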

HenrikOlsson
- 10th October 2005, 19:35
Darrel,
Thanks again! How did you come to the conclusion it would take 48 cycles? By examining the .asm file? I'm asking because I just made a quick'n'dirty test like this:

Value = 20000
Start:
GPIO.0 = 1 'Set output
Delay = Value */ $004D
GPIO.0 = 0 'Reset output
PauseUs 700
Goto Start

But when scoping the output I get about 250uS. The pause is dead on 700uS so I know the oscillator is working correctly. Now 250uS is a lot better than my previous 'all time low' of 320uS, so at least I'm heading in the right direction. ;-)
Any ideas?

I really appreciate the help!
/Henrik Olsson

Ingvar
- 10th October 2005, 19:46
You could get 31.25% with just one addition and a couple of shift operations.


Dummy1 VAR WORD
Delay VAR WORD
OldPulseTime VAR WORD

Dummy1 = OldPulseTime >> 2 'Divide by 4 (=0.25)
Delay = Dummy1 >> 2 'Divide by 4 again(=0.0625)
Delay = Delay + Dummy1 'Add them to get 0.3125


Don't know how fast it will be, but it feels like it should be faster than 48 cycles. I also have a gut feeling that the following code will be even faster.


Dummy1 VAR WORD
Delay VAR WORD
OldPulseTime VAR WORD

Dummy1 = OldPulseTime >> 1 'Divide by 2 (=0.5)
Dummy1 = Dummy1 >> 1 'Divide by 2 (=0.25)
Delay = Dummy1 >> 1 'Divide by 2 (=0.125)
Delay = Delay >> 1 'Divide by 2 (=0.0625)
Delay = Delay + Dummy1 'Add them to get 0.3125

I can't run any of the code right now, but perhaps Darrel could measure these with one of his ingenious macros?
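
A short Python check of the shift-and-add arithmetic above, assuming 16-bit WORD variables (the function name is made up for illustration). It computes Old/4 + Old/16, which is 0.3125 * Old apart from truncation:

```python
def delay_shift_add(old):
    """Delay = (old >> 2) + (old >> 4), i.e. 25% + 6.25% = 31.25%."""
    quarter = (old >> 2) & 0xFFFF     # Dummy1 = OldPulseTime >> 2
    sixteenth = quarter >> 2          # Delay  = Dummy1 >> 2
    return (quarter + sixteenth) & 0xFFFF


for ticks in (500, 20000, 50000):
    d = delay_shift_add(ticks)
    print(ticks, d, round(100 * d / ticks, 2))
```

The ratio stays within a fraction of a percent of 31.25% from 500 to 50000 ticks, which is the consistency over the range that was asked for.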

Darrel Taylor
- 10th October 2005, 20:38
I determined it was 48 cycles by using this procedure ...

instruction execution time
http://www.picbasic.co.uk/forum/showthread.php?t=365

I've double checked, and it's definitely 48. Although my hardware is different, 18F452 @ 20MHz, it shouldn't make that much difference.

For your test routine I get these results:
                          ' Cycles   uS@20MHz
Value = 20000
Start:
PORTD.0 = 1               '      1        0.2
Delay = Value */ $004D    '     48        9.6
PORTD.0 = 0               '      1        0.2
PauseUs 700               '   3501      700.2
Goto Start                '      2        0.4
                          '  -----      -----
' Loop Total                  3553      710.6

And, this I didn't expect. Here are the results from Ingvar's first example:
                              ' Cycles   uS@20MHz
Dummy1 = OldPulseTime >> 2    '     31        6.2
Delay = Dummy1 >> 2           '     31        6.2
Delay = Delay + Dummy1        '      4        0.8
                              '  -----      -----
' Total                             66       13.2


But, his second one looks pretty quick. As long as .3125 is close enough.
                              ' Cycles   uS@20MHz
Dummy1 = OldPulseTime >> 1    '      5        1.0
Dummy1 = Dummy1 >> 1          '      3        0.6
Delay = Dummy1 >> 1           '      5        1.0
Delay = Delay >> 1            '      3        0.6
Delay = Delay + Dummy1        '      4        0.8
                              '  -----      -----
' Total                             20        4.0

HenrikOlsson
- 10th October 2005, 22:24
Darrel, Ingvar,
Hmm, this is interesting.
I tried Ingvar's first code and measured it at 82uS. Then I tried the second one and measured it at 22uS, which is great! 31.25% works fine for me.

I'm still wondering though why the numbers don't match. Ingvar's first code should take 66uS but I measure it at ~80uS. His second one should take 20 and it does, close enough. Then we have Darrel's 48 cycle code that takes ~250uS on my hardware. I just verified that again to make sure.

I would think my scope was out of calibration if it wasn't for the fact that the 700uS pause I have measures correctly on the scope.

The application hardware is 12F629 but the testing above is on a 16F877 @ 4MHz.

Thanks guys!
/Henrik Olsson.

Darrel Taylor
- 10th October 2005, 23:18
Apparently, your Scope works Really Well.

I ran the numbers again for a 16F877 this time, and the results are rather surprising.
                          ' Cycles   uS@20MHz
Start:
PORTD.0 = 1               '      1        0.2
Delay = Value */ $004D    '    243       48.6
PORTD.0 = 0               '      1        0.2
PauseUs 700               '   3502      700.4
Goto Start                '      4        0.8
                          '  -----      -----
' Loop Total                  3751      750.2
' Without Pause                249


                              ' Cycles   uS@20MHz
Dummy1 = OldPulseTime >> 2    '     37        7.4
Delay = Dummy1 >> 2           '     37        7.4
Delay = Delay + Dummy1        '      6        1.2
                              '  -----      -----
' Total                             80       16.0


                              ' Cycles   uS@20MHz
Dummy1 = OldPulseTime >> 1    '      5        1.0
Dummy1 = Dummy1 >> 1          '      3        0.6
Delay = Dummy1 >> 1           '      5        1.0
Delay = Delay >> 1            '      3        0.6
Delay = Delay + Dummy1        '      6        1.2
                              '  -----      -----
' Total                             22        4.4

I really didn't expect that much difference between the same code compiled for the 2 different chips.

Way to go Ingvar.
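
For anyone comparing cycle counts with scope readings: on these PICs one instruction cycle is four oscillator clocks, so the same cycle count takes five times longer at 4MHz than at 20MHz. A small Python sketch of the conversion (the function name is made up for illustration):

```python
def cycles_to_us(cycles, fosc_hz):
    """Execution time in microseconds; one instruction cycle = 4 / Fosc."""
    return cycles * 4_000_000 / fosc_hz


print(cycles_to_us(48, 20_000_000))    # 18F452 count at 20MHz
print(cycles_to_us(243, 20_000_000))   # 16F877 count at 20MHz
print(cycles_to_us(243, 4_000_000))    # 16F877 count at 4MHz
print(cycles_to_us(22, 4_000_000))     # shift-and-add count at 4MHz
```

At 4MHz the 243 cycles come out at 243uS, which lines up with the roughly 250uS seen on the scope once the two pin writes are included.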

HenrikOlsson
- 11th October 2005, 06:57
Darrel,
Wow, thanks!
I agree, that's a bit strange isn't it? It's gonna be interesting to see how fast it will run on the 12F629 where it actually will reside and operate when it's finished. That timing routine of yours seems to come in handy from time to time!

Thanks again!
/Henrik Olsson.

Ingvar
- 12th October 2005, 12:12
Darrel,
You forget that your 18F uses its hardware multiplier. The 16 series does not have one and needs to do a software multiply, that's a HUGE difference. I'm a little surprised that the ">>2" takes that much longer than ">>1". I suspected around 10-20 cycles difference, not 60. It should take a little longer since it uses looped code rather than the straight code ">>1" produces. Strange, perhaps I'll look into it sometime ............ or not.

Henrik,
You should make sure that the variables reside in the same memory bank. You do that by using the "BANK" modifier. If you don't, it will be slower since the PIC needs to switch banks between each statement. Declare your variables like .....


Dummy1 VAR WORD BANK0
Delay VAR WORD BANK0
OldPulseTime VAR WORD BANK0


/Ingvar

P.S. Henrik, where should I send the invoice? ;)

Darrel Taylor
- 13th October 2005, 00:30
Right Ingvar,

That's what it is, the Hardware Multiplier.

The funny thing is that a couple of years ago, I did a comparison between 16F and 18F multiplication, and found that PBP didn't use the hardware MUL for 18F's. So, I made my own MUL routine for 18F's and have been using it ever since (when I need the speed). I hadn't looked since then, but, meLabs must have snuck it in there when I wasn't paying attention. Nothing listed in the Version History.

That's good to know!

P.S. Ingvar, Add some to that bill for me too. :)

Ingvar
- 13th October 2005, 08:27
Sure, as soon as Henrik tells me where to send it :D

HenrikOlsson
- 13th October 2005, 10:24
Ingvar,
I thought you already had my billing address since you helped me with that asm isr a couple of years ago.....
You're in Stockholm, right? Me and two friends are going there on Friday-Saturday to have some fun and visit the famous jazz-club Stampen. Come there and I'll buy you a beer or two! You too Darrel, but I guess it's a little too far off for you...

Things have come in between me and this project this week but I'll let you know how it all works out.
I appreciate the help guys, I really do!

/Henrik Olsson.

Ingvar
- 14th October 2005, 08:45
Sorry Henrik, I live out my days on the west coast, Kungsbacka to be precise. Good luck with the rest of the project ....... and don't smoke anything in a jazz-club, I hear it's not always tobacco :p Not that I'd really know since I'm not a jazz fan.