How can I speed this code up? SHIFTOUT is slowing it down and I need a faster way.

1. ## How can I speed this code up? SHIFTOUT is slowing it down and I need a faster way.

I think what is slowing this down is the SHIFTOUT.

I am running a 16f88 @ 16MHz and I need to know the fastest way to get through this loop:

Code:
```
LOOP:
    FOR DATA = 4095 to 0 step -1
        GOSUB SUB1
        GOSUB SUB2
    NEXT
    FOR DATA = 0 to 4095 step 1
        GOSUB SUB1
        GOSUB SUB2
    NEXT
    GOTO LOOP

SUB1:
    FOR C1 = 0 TO 15
        shiftout dpin,clk,1,[DATA]
    NEXT
    PORTB = %00100000
    PORTB = %00000000
    RETURN

SUB2:
    PORTB = %00000100
    PORTB = %00000000
    FOR C3 = 0 TO 4095
        PORTA = %00001000
        PORTA = %00000000
    NEXT
    RETURN
```

2.
131,072 SHIFTOUTs in that loop, and it is shifting the same value 15 times, getting another value and shifting that 15 times, again and again... That will take some time.

Unless I am looking at it cross eyed...

Is the above really what you want to do?
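
For reference, the counts can be tallied directly from the loop bounds in the first post. This is only a back-of-envelope sketch in Python (not PBP), and the constant names are made up for illustration. It also counts the PORTA pulses in SUB2, which the same outer loops trigger:

```python
# Loop bounds copied from the code in the first post.
DATA_VALUES = 4096        # each FOR DATA loop covers 0..4095 (up or down)
PASSES = 2                # one count-down loop plus one count-up loop
SHIFTOUTS_PER_SUB1 = 16   # FOR C1 = 0 TO 15 runs 16 times
PULSES_PER_SUB2 = 4096    # FOR C3 = 0 TO 4095

calls = PASSES * DATA_VALUES                    # SUB1/SUB2 call pairs per LOOP pass
total_shiftouts = calls * SHIFTOUTS_PER_SUB1
total_porta_pulses = calls * PULSES_PER_SUB2

print(total_shiftouts)       # 131072
print(total_porta_pulses)    # 33554432
```

The PORTA pulse count comes out 256 times larger than the SHIFTOUT count, so SUB2 deserves a look as well.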

3.
Originally Posted by mackrackit
> 131,072 SHIFTOUTs in that loop, and it is shifting the same value 15 times, getting another value and shifting that 15 times, again and again... That will take some time.
>
> Unless I am looking at it cross eyed...
>
> Is the above really what you want to do?

Yes, but 16 times, not 15. And really it's not all that bad if I could have it shifting at, say, 1MHz or more.

4.
I wonder if SHIFTING all 16 values at once would be faster than looping? Darn zeros
Code:
`shiftout dpin,clk,1,[DATA,DATA,DATA,DATA,DATA,...]`

5.
Originally Posted by mackrackit
> I wonder if SHIFTING all 16 values at once would be faster than looping? Darn zeros
> Code:
> `shiftout dpin,clk,1,[DATA,DATA,DATA,DATA,DATA,...]`

I dunno, let me try it...
Last edited by wolwil; - 8th May 2010 at 06:54.

6.
Nope, the same.

The part that is slow is going from SUB1 to SUB2, not the looping through them 4096 times.

I just tried taking out the GOSUBs and it is still the same thing.

7.
There are a couple of workarounds available:
1) Change the crystal to 20MHz.
2) If the SHIFTOUT command is too slow
Code:
```
SHIFTOUT dpin,clk,1,[Dat]
```
then don't use it. Do it another way (e.g. the code below).
Code:
```
dpin = Dat.0(7) : clk = 1 : clk = 0
dpin = Dat.0(6) : clk = 1 : clk = 0
dpin = Dat.0(5) : clk = 1 : clk = 0
dpin = Dat.0(4) : clk = 1 : clk = 0
dpin = Dat.0(3) : clk = 1 : clk = 0
dpin = Dat.0(2) : clk = 1 : clk = 0
dpin = Dat.0(1) : clk = 1 : clk = 0
dpin = Dat.0(0) : clk = 1 : clk = 0
```
This will run much faster but consume more code space. That is a trade-off you have to live with.
BTW, DATA is a reserved word, so I changed it to Dat.

BR,
-Gusse-
Last edited by Gusse; - 8th May 2010 at 09:05. Reason: Crystal comment added

8.

## Nope the same.

Originally Posted by Gusse
> Code:
> ```
> dpin = Dat.0(7) : clk = 1 : clk = 0
> dpin = Dat.0(6) : clk = 1 : clk = 0
> dpin = Dat.0(5) : clk = 1 : clk = 0
> dpin = Dat.0(4) : clk = 1 : clk = 0
> dpin = Dat.0(3) : clk = 1 : clk = 0
> dpin = Dat.0(2) : clk = 1 : clk = 0
> dpin = Dat.0(1) : clk = 1 : clk = 0
> dpin = Dat.0(0) : clk = 1 : clk = 0
> ```
>
> BR,
> -Gusse-

Nope, the same.

A 20MHz clock will still be too slow with SHIFTOUT.

I am assuming your code accesses individual bits of the word-sized Dat variable. So if I wanted to access the 11th bit I would do this: Dat.1(2), right?

Would anyone have a faster way I could do this in assembly?

Also, does anyone know how many clock pulses SHIFTOUT uses?
Last edited by wolwil; - 8th May 2010 at 17:16.

9.
Originally Posted by wolwil
> Nope, the same.
>
> A 20MHz clock will still be too slow with SHIFTOUT.

In your 1st post you said that you are running @16MHz. 20MHz is 25% faster than your present system.

Originally Posted by wolwil
> I am assuming your code accesses individual bits of the word-sized Dat variable. So if I wanted to access the 11th bit I would do this: Dat.1(2), right?
>
> Would anyone have a faster way I could do this in assembly?
>
> Also, does anyone know how many clock pulses SHIFTOUT uses?

The code does exactly the same as SHIFTOUT, just a little bit faster.
If this didn't help, then SHIFTOUT is not the bottleneck.
Keep looking for other solutions.

The 11th bit would be Dat.0(10).
Example below (remember MSBFIRST).
Code:
```
Dat     VAR BYTE [2]

dpin = Dat.0(7) : clk = 1 : clk = 0
dpin = Dat.0(6) : clk = 1 : clk = 0
dpin = Dat.0(5) : clk = 1 : clk = 0
dpin = Dat.0(4) : clk = 1 : clk = 0
dpin = Dat.0(3) : clk = 1 : clk = 0
dpin = Dat.0(2) : clk = 1 : clk = 0
dpin = Dat.0(1) : clk = 1 : clk = 0
dpin = Dat.0(0) : clk = 1 : clk = 0

dpin = Dat.0(15) : clk = 1 : clk = 0
dpin = Dat.0(14) : clk = 1 : clk = 0
dpin = Dat.0(13) : clk = 1 : clk = 0
dpin = Dat.0(12) : clk = 1 : clk = 0
dpin = Dat.0(11) : clk = 1 : clk = 0
dpin = Dat.0(10) : clk = 1 : clk = 0  '<- 11th
dpin = Dat.0(9)  : clk = 1 : clk = 0
dpin = Dat.0(8)  : clk = 1 : clk = 0
```
EDIT: If you or anybody knows a faster SHIFTOUT workaround in PBP, I would be interested.

BR,
-Gusse-
Last edited by Gusse; - 8th May 2010 at 18:16. Reason: EDIT

10.
Originally Posted by Gusse
> The code does exactly the same as SHIFTOUT, just a little bit faster.
> If this didn't help, then SHIFTOUT is not the bottleneck.
> Keep looking for other solutions.
>
> BR,
> -Gusse-

Well, I have come to the realization that it is not the SHIFTOUT slowing me down but the pulsing of PORTA.3 4096 times. I need to get this to be faster.

For better help on this I am going to add on to an older post that deals with the chip I am using so I don't create multiples on here.

11.
Use a 16 bit counter in assembly to count to 0FFF.
In that loop, use the instruction -

btg LATA.3

You can't get much quicker than that.

I haven't counted exactly, but it looks to be under 14 cycles. At 40MHz (100ns per instruction cycle), that is 1.4uSec = 714kHz.

12.
I should have stated that it is 1.4uSec per pulse CYCLE. So 4096 cycles would take about 5.7 milliseconds.
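
The arithmetic behind those figures, as a rough sketch in Python. The 14-cycle loop length is the upper-bound estimate quoted above, not a measured value, and the helper name `instruction_cycle_ns` is made up for this example:

```python
def instruction_cycle_ns(fosc_mhz):
    """One instruction cycle takes 4 oscillator clocks on these PICs."""
    return 4 * 1000 / fosc_mhz

LOOP_CYCLES = 14      # "under 14 cycles" estimate from the post, assumed here
PULSES = 4096

tcy_ns = instruction_cycle_ns(40)       # 100 ns at 40 MHz
period_ns = LOOP_CYCLES * tcy_ns        # time for one pulse
freq_khz = 1e6 / period_ns              # pulses per millisecond, i.e. kHz
total_ms = PULSES * period_ns / 1e6     # time for the whole burst

print(f"{period_ns / 1000:.1f} us per pulse, {freq_khz:.0f} kHz, {total_ms:.2f} ms total")
```

Calling `instruction_cycle_ns(20)` instead shows how the same loop length would scale on a 20MHz part, assuming the cycle count carried over unchanged.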

13.
Originally Posted by Charles Linquis
> Use a 16 bit counter in assembly to count to 0FFF.
> In that loop, use the instruction -
>
> btg LATA.3
>
> You can't get much quicker than that.
>
> I haven't counted exactly, but it looks to be under 14 cycles. At 40MHz, that is 1.4uSec = 714kHz.

So do I just replace this:
Code:
```
FOR C3 = 0 TO 4095
    PORTA = %00001000
    PORTA = %00000000
NEXT
```
with this:
Code:
```
FOR C3 = 0 TO 4095
    btg LATA.3
NEXT
```
I do not know assembly at all, which is why I went with PBP. Keep in mind I am using a 16MHz clock, but I will be switching it up to 20MHz.
Last edited by wolwil; - 9th May 2010 at 00:19.

14.
No, it will take a bit more than that.

I would have posted something more complete, but I don't have access to my development board until Monday. When I get to my board, I'll be able to send you something that has been tested.

15.
Originally Posted by Charles Linquis
> No, it will take a bit more than that.
>
> I would have posted something more complete, but I don't have access to my development board until Monday. When I get to my board, I'll be able to send you something that has been tested.

That would be awesome! Thanks

16.
Originally Posted by wolwil
> That would be awesome! Thanks

It will take me a bit longer than I thought. I think you said you were using a 16F chip. Bad news for me! The 16Fs are missing some of my favorite ASM instructions.

I don't have any 16F stuff lying around, I'll have to find one.

17.
Originally Posted by Charles Linquis
> It will take me a bit longer than I thought. I think you said you were using a 16F chip. Bad news for me! The 16Fs are missing some of my favorite ASM instructions.
>
> I don't have any 16F stuff lying around, I'll have to find one.

That's alright. I have been trying to get it to work with HPWM, but I just don't quite understand how. I put an LED on PORTB.0 and typed the following code, but nothing happens to the light:

Code:
```
DEFINE OSC 4

C1 VAR WORD

DEFINE CCP1_REG PORTB   'no clue what this does but I think I need it for the following line
DEFINE CCP1_BIT 1     ' because 1 = portb.0 according to the manual I think?

FOR C1 = 0 to 255
    HPWM 1,C1,1000
NEXT
```
I was thinking I could use this for my clock pulse of 4096 times like this:

Code:
```
FOR C1 = 0 TO 15
    HPWM 1,127,frequency '50% duty/square wave
NEXT
```
For right now I have a second external clock hooked up that I turn on by throwing a pin high for a couple of milliseconds. It fixed what I am trying to fix, but it's not exact like I need it to be.

Thanks again for your time with this!

18.
Hi,
Some PICs can map the output of the CCP module to different pins; the 'F88 can map it to either RB0 or RB3. The defines you have tell PBP to generate code to map the CCP1 output to the pin you specify. However, BIT 1 as you've specified means it tries to map it to PORTB.1, which isn't valid.

/Henrik.

19.

## How about ASM replacement for PBP Shiftout?

Hi Wolwil,

You could replace PBP SHIFTOUT with the attached ASM code example and reduce the time spent in SHIFTOUT + loop to ~1/8. This is kind of tested, but no promises... Check if it is useful for you.

PBP_shiftout execution time was around 814uS, but ASM_shiftout took only 104uS to do the same.

Files:
PBP_shiftout.pbp.txt <- Original, copied from 1st post
ASM_shiftout.pbp.txt
(Just remove .txt from end of files)

I know that the ASM version is spaghetti code, but I don't really care.
It has been a long time since I last did anything with ASM.

BR,
-Gusse-
Last edited by Gusse; - 11th May 2010 at 20:18.

20.
Originally Posted by HenrikOlsson
> Hi,
> However, BIT 1 as you've specified means it tries to map it to PORTB.1, which isn't valid.
>
> /Henrik.

See, I was confused because the manual said:
bit 3-0 CCP1M<3:0>: CCP1 Mode Select bits
0000 = Capture/Compare/PWM disabled (resets CCP1 module)
0100 = Capture mode, every falling edge
0101 = Capture mode, every rising edge
0110 = Capture mode, every 4th rising edge
0111 = Capture mode, every 16th rising edge
1000 = Compare mode, set output on match (CCP1IF bit is set)
1001 = Compare mode, clear output on match (CCP1IF bit is set)
1010 = Compare mode, generate software interrupt on match (CCP1IF bit is set, CCP1 pin is unaffected)
1011 = Compare mode, trigger special event (CCP1IF bit is set, CCP1 pin is unaffected); CCP1 resets TMR1 and starts an A/D conversion (if A/D module is enabled)
11xx = PWM mode
So I was thinking the xx was what I was setting, so I figured the LSB was 1 for port B0 and 2 for B3. Like I said, I was confused.

So you are saying all I need to do is say 0 and not 1, right?

Originally Posted by Gusse
> Hi Wolwil,
>
> You could replace PBP SHIFTOUT with the attached ASM code example and reduce the time spent in SHIFTOUT + loop to ~1/8. This is kind of tested, but no promises... Check if it is useful for you.
>
> PBP_shiftout execution time was around 814uS, but ASM_shiftout took only 104uS to do the same.
>
> BR,
> -Gusse-

Thanks, I will try this out tonight!
Last edited by wolwil; - 11th May 2010 at 21:34.

21.

## I was able to get the HPWM to work but no luck on the ASM

Here is the code for the PWM:
Code:
```
C1 var byte

For C1 = 0 to 31
    hpwm 1,127,32767
next
```
Looks like I didn't need all the extra mumbo jumbo before it, as the 16F88 defaults to B0 for channel 1.

The sad thing is this still is not fast enough to do what I need it to do.

I am trying to generate 4096 clock pulses as fast as possible. The chip will allow up to a 30MHz clock pulse to drive it, but using HPWM all I can get is 32,767Hz, which is like paying full price for a Lamborghini that only has first gear.

Does anyone have any ideas on how I might be able to achieve this?

22.

## Hpwm pbp

Hi
@30MHz you can get way faster hardware PWM from the chip, BUT you can NOT use the HPWM command in PBP. You need to get inside the car and drive yourself. Set up the PWM module manually; see the examples in Table 9-3 on page 87 of the datasheet.

Hope this helps

23.
Originally Posted by sinoteq
> Hi
> @30MHz you can get way faster hardware PWM from the chip, BUT you can NOT use the HPWM command in PBP. You need to get inside the car and drive yourself. Set up the PWM module manually; see the examples in Table 9-3 on page 87 of the datasheet.
>
> Hope this helps

HAHA, well sure, I could use the formula to find out if it will be fast enough, but one needs to know what "log, log2 and bits" are first. I wish these manuals would explain the formulas a little better.

How fast is "way faster"? Keep in mind the 16F88 will be running at 20MHz, but the external chip I am interfacing with can be driven at up to 30MHz.

See, I was thinking the fastest way would be to use some simple ASM to turn a pin HIGH and then LOW, and that this could possibly achieve maybe a 5MHz signal.

25.
Hi,
I think that the highest frequency you can get out of the CCP module is 2.5MHz when the chip is running at 20MHz. Even so, I don't think it's going to do what you want, since there's no easy way to control the number of cycles or "pulses" going out, and as far as I understand you need exactly 4096 "pulses".

Perhaps it's possible to connect the CCP output back into a counter input and set that up to generate an interrupt which stops the CCP module at the right spot, but it seems like a long shot.

I'm definitely no "ASM-guy", but to set and reset a pin (or rather the port latch) in ASM you can do:
Code:
```
@ BSF PORTA, 3
@ BCF PORTA, 3
```
But even if you put 4096 of those in a row you won't get more than 2.5MHz, because each instruction takes one cycle and the instruction clock is Fosc/4, meaning 200ns per cycle @20MHz, so the frequency would be 1/400ns=2.5MHz. Not to mention you'd fill up the flash memory of the 16F88....

Code:
```
i VAR BYTE
j VAR BYTE
For i = 1 to 16      '16*256=4096
    For j = 0 to 255
@  BSF PORTA, 3
@  BCF PORTA, 3
    NEXT
NEXT
```
If I'm right that inner loop seems to execute in 7 cycles and the outer loop seems to take another 12 for a total of (7*256*16)+(12*16)=28864 cycles @ 200ns each. A total of 5772.8us for 4096 pulses or an average frequency of ~709kHz. There will be some jitter in the pulsestream when it goes from the inner to outer loop but I guess it doesn't matter(?)
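
Those numbers can be double-checked with a few lines of Python. This is only a sketch: the 7- and 12-cycle loop costs are the estimates from the paragraph above (taken on trust, not measured), and the constant names are made up for the example.

```python
FOSC_HZ = 20_000_000
TCY_NS = 4 * 1_000_000_000 // FOSC_HZ     # 200 ns per instruction cycle

INNER_CYCLES = 7       # per pass of the inner loop (estimate from the post)
OUTER_CYCLES = 12      # extra per pass of the outer loop (estimate from the post)
INNER_PASSES = 256
OUTER_PASSES = 16

pulses = INNER_PASSES * OUTER_PASSES
total_cycles = INNER_CYCLES * INNER_PASSES * OUTER_PASSES + OUTER_CYCLES * OUTER_PASSES
total_us = total_cycles * TCY_NS / 1000
average_khz = pulses * 1000 / total_us

# Upper bound for comparison: BSF + BCF back to back, 2 cycles per pulse.
unrolled_mhz = 1000 / (2 * TCY_NS)

print(pulses, total_cycles, total_us)
print(f"average {average_khz:.1f} kHz, unrolled limit {unrolled_mhz} MHz")
```

The gap between the average and the 2.5MHz limit is all loop overhead, which is why the later replies concentrate on shrinking or spreading out that overhead.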

/Henrik.

26.
Here's a neat trick to get 1MHz with a 20MHz oscillator. This takes 5 instruction cycles to toggle the pin, and Timer1 keeps track of the toggle count for you.

You don't need to use any incrementing or decrementing loops or variables.

1. Set T1CKI pin, and make the pin an output.
2. Load Timer1 low & high registers with 65,536 - the number of clocks to output & count.
3. Setup Timer1 for external clock, 1:1 prescaler, and turn it on.

T1CKI outputs your clock while Timer1 count increments on every low-to-high transition on T1CKI.

Code:
```
PORTC.0 = 1       ' set pin so 1st low-to-high increments count
TRISC.0 = 0       ' make pin an output

Main:
    TMR1H = $F0    ' 65,536 - 4096 = 61,440 = $F000
    TMR1L = $00    ' so 4096 toggles will set TMR1IF
    T1CON = %00000011 ' 1:1 prescaler, external clock, Timer1 on

ASM
Pulse
    BCF PORTC,0       ; clear T1CKI pin
    BSF PORTC,0       ; set T1CKI pin
    BTFSS PIR1,TMR1IF ; when TMR1 overflows, count is complete
    GOTO Pulse        ; loop until Timer1 overflow
    BCF  PIR1,TMR1IF  ; clear TMR1 overflow flag bit
ENDASM
```
When PIR1,TMR1IF = 1 you have 4096 clocks. I tested this on a 16F877A, so just change the T1CKI pin to whatever it is on your PIC type.
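
To reuse the trick for a different pulse count, the preload bytes and the loop rate can be worked out as below. This is a Python sketch for the arithmetic only; `timer1_preload` is a made-up helper name, and the cycle counts assume the usual mid-range timing (1 cycle per instruction, 2 for GOTO, BTFSS not skipping while the loop runs).

```python
def timer1_preload(pulses):
    """Return (TMR1H, TMR1L) so the 16-bit timer overflows after `pulses` counts."""
    if not 1 <= pulses <= 65536:
        raise ValueError("pulse count must fit a 16-bit timer")
    start = (65536 - pulses) & 0xFFFF
    return start >> 8, start & 0xFF

FOSC_MHZ = 20
TCY_NS = 4 * 1000 // FOSC_MHZ       # 200 ns per instruction cycle

# BCF (1) + BSF (1) + BTFSS, no skip (1) + GOTO (2)
LOOP_CYCLES = 1 + 1 + 1 + 2
loop_frequency_hz = 1_000_000_000 // (LOOP_CYCLES * TCY_NS)

high, low = timer1_preload(4096)
print(f"TMR1H = ${high:02X}, TMR1L = ${low:02X}")
print(loop_frequency_hz, "Hz")
```

For 4096 pulses this gives the same $F0/$00 pair used in the code above.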
Last edited by Bruce; - 18th May 2010 at 15:52. Reason: A better way

27.
Son of a gun, that works (please note that T1CKI is RB6 on a 16F88).

The three cycle loop 'overhead' (BTFSS & GOTO) is a bottleneck. You could get better performance if you spread it out over more pulses. If you were to produce 16 pulses within the loop then you could use a single byte variable counter instead of Timer 1 and bump the output up to 2+ MHz.

Regards, Mike
Last edited by Mike, K8LH; - 18th May 2010 at 07:13.

28.
Thank you, Bruce, for the code!!! I will try to get it loaded and tested tonight.

Originally Posted by Mike, K8LH
> Son of a gun, that works (please note that T1CKI is RB6 on a 16F88).
>
> The three cycle loop 'overhead' (BTFSS & GOTO) is a bottleneck. You could get better performance if you spread it out over more pulses. If you were to produce 16 pulses within the loop then you could use a single byte variable counter instead of Timer 1 and bump the output up to 2+ MHz.
>
> Regards, Mike

Mike, are you saying to use a FOR loop outside of the ASM like this, or is there an ASM FOR loop I should be using for it? (Keep in mind I don't know ASM.)
Code:
```
PORTB.6 = 1       ' set pin so 1st low-to-high increments count
TRISB.6 = 0       ' make pin an output
T1CON = %00000011 ' 1:1 prescaler, external clock, Timer1 on
C1 VAR BYTE

FOR C1 = 0 to 255
ASM
Pulse
BCF PORTB,6       ; clear T1CKI pin / 1
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 2
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 3
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 4
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 5
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 6
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 7
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 8
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 9
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 10
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 11
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 12
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 13
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 14
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 15
BSF PORTB,6       ; set T1CKI pin
BCF PORTB,6       ; clear T1CKI pin / 16
BSF PORTB,6       ; set T1CKI pin
ENDASM
NEXT
```

29.
I think this is what Mike was talking about?

This will give you 2.5MHz with every 16th logic 1 bit stretched to 4 cycles VS 1. If you can live with this bit being a tad longer, it will definitely speed things up.
Code:
```
' define port & pin use in _Pulse
@ #DEFINE PORT PORTB ' use any port you prefer, but declare it here
@ #DEFINE PIN 6      ' same as above

PORTB.6 = 1       ' initialize pin to idle state
TRISB.6 = 0       ' make the pin an output

C1 VAR BYTE BANK0 SYSTEM

Main:
C1 = 0        ' clear loop count
CALL Pulse    ' generate 4096 pulses
GOTO Main

ASM
_Pulse
BCF PORT,PIN     ; clear pin / 1
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 2
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 3
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 4
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 5
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 6
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 7
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 8
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 9
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 10
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 11
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 12
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 13
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 14
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 15
BSF PORT,PIN     ; set pin
BCF PORT,PIN     ; clear pin / 16
BSF PORT,PIN     ; set pin
DECFSZ C1,F      ; decrement count, skip if 0
GOTO _Pulse      ; not done, keep going
RETURN
ENDASM
```
With either version, make sure you have WDT disabled.
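
For anyone checking the numbers on the unrolled version, here is the arithmetic as a small Python sketch. It models only the cycle counts (1 cycle per BCF/BSF/DECFSZ, 2 for GOTO, which is the usual mid-range timing and is assumed here); `decfsz_passes` is a made-up helper that mimics an 8-bit DECFSZ loop counter.

```python
def decfsz_passes(start):
    """Count loop passes until an 8-bit counter decremented by DECFSZ reaches 0."""
    count, passes = start, 0
    while True:
        passes += 1
        count = (count - 1) & 0xFF     # 8-bit wrap: 0 - 1 gives 255
        if count == 0:
            return passes

FOSC_MHZ = 20
TCY_NS = 4 * 1000 // FOSC_MHZ                     # 200 ns per instruction cycle
PULSES_PER_PASS = 16
CYCLES_PER_PASS = 2 * PULSES_PER_PASS + 1 + 2     # 16 x (BCF + BSF) + DECFSZ + GOTO

passes = decfsz_passes(0)                         # C1 = 0 means a full 256 passes
total_pulses = passes * PULSES_PER_PASS

burst_mhz = 1000 / (2 * TCY_NS)                   # rate inside the unrolled run
average_mhz = PULSES_PER_PASS * 1000 / (CYCLES_PER_PASS * TCY_NS)

# Normal high time is 1 cycle; DECFSZ (1) and GOTO (2) add 3 more on every 16th pulse.
stretched_high_cycles = 1 + 1 + 2

print(passes, total_pulses, burst_mhz, round(average_mhz, 2), stretched_high_cycles)
```

So starting C1 at 0 really does give 4096 pulses, at 2.5MHz within each run of 16 and a little under 2.3MHz averaged over the whole burst.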

30.

## Thanks Bruce that last bit of Code worked like a charm!!!

I didn't have to turn off the WDT. I think it's off by default, but I could not find where the manual says so. What will happen if it's on? Would I notice it?

I do have one issue, though it might just be from being on a breadboard: when I use DEFINE OSC 20 with the 20MHz clock it freaks out, does whatever it feels like, and is sensitive to the touch, but when I use no DEFINE or DEFINE OSC 4 and run it with the 20MHz clock it runs fine... I don't get it, haha.

Thanks again everyone, I am very grateful for all your help and suggestions
