How can I speed this code up?  SHIFTOUT is slowing it down and I need a faster way.
wolwil
- 8th May 2010, 05:33
I think what is slowing this down is the SHIFTOUT.  
I am running a 16f88 @ 16MHz and I need to know the fastest way to get through this loop:
LOOP:
	FOR DATA = 4095 to 0 step -1
    		GOSUB SUB1
	    	GOSUB SUB2
	NEXT
	FOR DATA = 0 to 4095 step 1
    		GOSUB SUB1
    		GOSUB SUB2
	NEXT
GOTO LOOP
SUB1:
	FOR C1 = 0 TO 15
    		shiftout dpin,clk,1,[DATA]
	NEXT
    	PORTB = %00100000 
    	PORTB = %00000000  
RETURN
SUB2:
	PORTB = %00000100 
	PORTB = %00000000 
	FOR C3 = 0 TO 4095
    		PORTA = %00001000
    		PORTA = %00000000
	NEXT
RETURN
mackrackit
- 8th May 2010, 05:59
131,072 SHIFTOUTs in that loop and it is shifting the same value 15 times, getting another value and shifting that 15 times, again, again...  That will take some time.
Unless I am looking at it cross eyed...
Is the above really what you want to do?
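For anyone checking that figure, here is a minimal sketch of the arithmetic. Python is used only as a calculator, and the constant and function names are made up for illustration. It assumes one pass means one ramp down plus one ramp up, as in the posted loop.

```python
# Count how many SHIFTOUT statements the posted loop executes per pass.
VALUES_PER_RAMP = 4096   # DATA runs 4095..0, then 0..4095
RAMPS = 2                # one ramp down, one ramp up
SHIFTS_PER_VALUE = 16    # FOR C1 = 0 TO 15 runs 16 times

def shiftouts_per_pass():
    """Total SHIFTOUT executions for one trip through LOOP."""
    return RAMPS * VALUES_PER_RAMP * SHIFTS_PER_VALUE

print(shiftouts_per_pass())  # 131072
```

The total only works out to 131,072 if each value is shifted 16 times, which is what a 0 TO 15 loop does.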
wolwil
- 8th May 2010, 06:08
Yes, but 16 times, not 15 ;) and really it's not all that bad if I could get it shifting at, say, 1MHz or more.
mackrackit
- 8th May 2010, 06:20
I wonder if SHIFTING all 16 values at once would be faster than looping? Darn zeros:cool:
shiftout dpin,clk,1,[DATA,DATA,DATA,DATA,DATA,...]
wolwil
- 8th May 2010, 06:25
I dunno let me try it...
wolwil
- 8th May 2010, 06:54
Nope The Same :(
The part that is slow is going from SUB1 to SUB2, not the looping through them 4096 times.
I just tried taking out the GOSUBs and it is still the same thing.
Gusse
- 8th May 2010, 08:56
There are a couple of workarounds available:
1) Change the crystal to 20MHz.
2) If the SHIFTOUT command is too slow
    SHIFTOUT dpin,clk,1,[Dat]
then don't use it. Do it another way (e.g. the code below).
    dpin = Dat.0(7) : clk = 1 : clk = 0
    dpin = Dat.0(6) : clk = 1 : clk = 0
    dpin = Dat.0(5) : clk = 1 : clk = 0
    dpin = Dat.0(4) : clk = 1 : clk = 0
    dpin = Dat.0(3) : clk = 1 : clk = 0
    dpin = Dat.0(2) : clk = 1 : clk = 0
    dpin = Dat.0(1) : clk = 1 : clk = 0
    dpin = Dat.0(0) : clk = 1 : clk = 0
This will run much faster but consume more code space. That is a trade-off you have to live with.
BTW, DATA is a reserved word, so I changed it to Dat.
BR,
-Gusse-
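To make the bit order of the unrolled version concrete, here is a small Python model of it. This is only a sketch for checking the sequence on a PC, not PIC code. The function name is hypothetical, and it assumes Dat.0(n) picks bit n of Dat with bit 0 as the least significant.

```python
def shift_out_msb_first(value, bits=8):
    """Return the bits in the order they would appear on dpin."""
    out = []
    for i in range(bits - 1, -1, -1):   # 7, 6, ... 0
        out.append((value >> i) & 1)    # dpin = Dat.0(i), then one clk pulse
    return out

print(shift_out_msb_first(0b10110001))  # [1, 0, 1, 1, 0, 0, 0, 1]
```

On the PIC the eight lines are written out instead of looped so that no time goes into loop bookkeeping, which is where the extra code space goes.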
wolwil
- 8th May 2010, 17:10
Nope the same. 
20 MHz Clock will still be too slow with SHIFTOUT.
I am assuming your code has something to do with accessing per bit in the word sized Dat variable.  So if I wanted to access the 11th bit I would do this Dat.1(2) right?
Would anyone have a faster way in Assembly I could do this?
Also Does anyone know how many clock pulses SHIFTOUT uses?
Gusse
- 8th May 2010, 17:42
Nope the same. 
20 MHz Clock will still be too slow with SHIFTOUT.
In your 1st post you are saying that you are running @16MHz. 20MHz is 25% faster than your present system.
I am assuming your code has something to do with accessing per bit in the word sized Dat variable.  So if I wanted to access the 11th bit I would do this Dat.1(2) right?
Would anyone have a faster way in Assembly I could do this?
Also Does anyone know how many clock pulses SHIFTOUT uses?
The code does exactly the same as SHIFTOUT, just a little bit faster.
If this didn't help, then SHIFTOUT is not the bottleneck.
Keep looking for other solutions.
11th bit would be Dat.0(10). 
Example below (remember MSBFIRST).
    Dat     VAR BYTE [2]
    dpin = Dat.0(7) : clk = 1 : clk = 0
    dpin = Dat.0(6) : clk = 1 : clk = 0
    dpin = Dat.0(5) : clk = 1 : clk = 0
    dpin = Dat.0(4) : clk = 1 : clk = 0
    dpin = Dat.0(3) : clk = 1 : clk = 0
    dpin = Dat.0(2) : clk = 1 : clk = 0
    dpin = Dat.0(1) : clk = 1 : clk = 0
    dpin = Dat.0(0) : clk = 1 : clk = 0
    
    dpin = Dat.0(15) : clk = 1 : clk = 0
    dpin = Dat.0(14) : clk = 1 : clk = 0
    dpin = Dat.0(13) : clk = 1 : clk = 0
    dpin = Dat.0(12) : clk = 1 : clk = 0
    dpin = Dat.0(11) : clk = 1 : clk = 0
    dpin = Dat.0(10) : clk = 1 : clk = 0  '<- 11th
    dpin = Dat.0(9)  : clk = 1 : clk = 0
    dpin = Dat.0(8)  : clk = 1 : clk = 0
EDIT: If you or anybody knows a faster SHIFTOUT workaround in PBP, I would be interested.
BR,
-Gusse-
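The flat bit index used in Dat.0(n) can be related to the two bytes of the array with a little integer arithmetic. The Python sketch below is an illustration only. It assumes the index counts up from bit 0 of the first byte, and both function names are made up.

```python
def locate_bit(n):
    """Map a flat bit index n (as in Dat.0(n)) to (byte index, bit within byte)."""
    return n // 8, n % 8

def send_order():
    """Bit indices in the order the unrolled example sends them."""
    return list(range(7, -1, -1)) + list(range(15, 7, -1))

print(locate_bit(10))   # (1, 2)
print(send_order())     # [7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8]
```

Under that assumption the 11th bit (index 10) sits in the second byte at bit position 2, and the example sends the first byte MSB first, then the second byte MSB first.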
wolwil
- 8th May 2010, 20:50
Code does exactly the same as SHIFTOUT but just little bit faster.
If this didn't help then SHIFTOUT is not the bottleneck.
Keep looking other solutions.
BR,
-Gusse-
Well, I have come to the realization that it is not the SHIFTOUT slowing me down but the pulsing of PORTA.3 4096 times.  I need to get this to be faster.
For better help on this I am going to add on to an older post that deals with the chip I am using so I don't create multiples on here.
Thanks everyone for your help.
Charles Linquis
- 8th May 2010, 23:46
Use a 16 bit counter in assembly to count to 0FFF.
In that loop, use the instruction -
btg LATA.3
You can't get much quicker than that.
I haven't counted exactly, but it looks to be under 14 cycles.  At 40MHz, that is 1.4uS per pass, or 714kHz.
Charles Linquis
- 8th May 2010, 23:48
I should have stated that it is 1.4uSec per CYCLE.  So 4096 cycles would take about 5.7 milliseconds.
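Those numbers follow from the fact that a PIC instruction cycle is four oscillator clocks. A minimal Python sketch of the calculation follows. The function name is hypothetical, and 14 cycles per pass is the upper estimate given above, not a measured count.

```python
def loop_timing(fosc_hz, cycles_per_pulse, pulses):
    """Return (seconds per pulse, pulse rate in Hz, total seconds)."""
    t_cycle = 4 / fosc_hz                  # one instruction cycle = 4 oscillator clocks
    t_pulse = cycles_per_pulse * t_cycle
    return t_pulse, 1 / t_pulse, pulses * t_pulse

t_pulse, rate, total = loop_timing(40e6, 14, 4096)
print(t_pulse, rate, total)

# Same cycle count at the 16 MHz clock mentioned in the first post.
print(loop_timing(16e6, 14, 4096))
```

At 40MHz this gives 1.4uS per pass, about 714kHz, and roughly 5.7mS for 4096 passes. The same 14 cycles at 16MHz would take 3.5uS each, or about 14.3mS in total.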
wolwil
- 9th May 2010, 00:14
so do I just replace this:
	FOR C3 = 0 TO 4095
    		PORTA = %00001000
    		PORTA = %00000000
	NEXT
with this:
FOR C3 = 0 TO 4095
     btg LATA.3
NEXT
I do not know assembly at all, that's why I went with PBP. Keep in mind I am using a 16MHz clock, but I will be switching it up to 20MHz.
Charles Linquis
- 9th May 2010, 00:39
No, it will take a bit more than that.
I would have posted something more complete, but I don't have access to my development board until Monday.   When I get to my board, I'll be able to send you something that has been tested.
wolwil
- 9th May 2010, 00:42
That would be awesome! Thanks
Charles Linquis
- 10th May 2010, 22:02
It will take me a bit longer than I thought.  I think you said you were using a 16F chip.   Bad news for me!  The 16Fs are missing some of my favorite ASM instructions.  
I don't have any 16F stuff lying around, I'll have to find one.
wolwil
- 11th May 2010, 17:28
That's alright. I have been trying to get it to work with HPWM, but I just don't quite understand how.  I put an LED on PORTB.0 and typed the following code, but nothing happens to the light:
DEFINE OCS 4
C1 VAR WORD
DEFINE CCP1_REG PORTB   'no clue what this does but I think I need it for the following line
DEFINE CCP1_BIT 1     ' because 1 = portb.0 according to the manual I think?
FOR C1 = 0 to 255
HPWM 1,C1,1000 
NEXT
I was thinking I could use this for my clock pulse of 4096 times like this:
FOR C1 = 0 TO 15
HPWM 1,127,frequency '50% duty/square wave
NEXT
For right now I have a second external clock hooked up that I turn on by driving a pin high for a couple of milliseconds. It fixed what I was trying to fix, but it's not as exact as I need it to be.
Thanks again for your time with this!
HenrikOlsson
- 11th May 2010, 19:33
Hi,
Some PICs can map the output of the CCP module to different pins; the 'F88 can map it to either RB0 or RB3. The DEFINEs you have tell PBP to generate code to map the CCP1 output to the pin you specify. However, BIT 1 as you've specified means it tries to map it to PORTB.1, which isn't valid.
/Henrik.
Gusse
- 11th May 2010, 20:14
Hi Wolwil,
You could replace PBP SHIFTOUT with the attached ASM code example and reduce the time spent in SHIFTOUT + loop to ~1/8. This is kind of tested, but no promises... Check if it is useful for you.
PBP_shiftout execution time was around 814uS, but ASM_shiftout took only 104uS to do the same.
Files:
4384   <- Original, copied from 1st post
4385
(Just remove .txt from end of files)
I know that the ASM version is spaghetti code but I don't really care :)
It has been a long time since I last did something with ASM.
BR,
-Gusse-
wolwil
- 11th May 2010, 21:31
Hi,
However, BIT 1 as you've specified means it tries to map it to PORTB.1 which isn't valid.
/Henrik.
 
See I was confused because in the manual it said:
bit 3-0 CCP1M<3:0>: CCP1 Mode Select bits
0000 = Capture/Compare/PWM disabled (resets CCP1 module)
0100 = Capture mode, every falling edge
0101 = Capture mode, every rising edge
0110 = Capture mode, every 4th rising edge
0111 = Capture mode, every 16th rising edge
1000 = Compare mode, set output on match (CCP1IF bit is set)
1001 = Compare mode, clear output on match (CCP1IF bit is set)
1010 = Compare mode, generate software interrupt on match (CCP1IF bit is set, CCP1 pin is unaffected)
1011 = Compare mode, trigger special event (CCP1IF bit is set, CCP1 pin is unaffected); CCP1 resets TMR1 and starts an A/D conversion (if A/D module is enabled)
11xx = PWM mode
So I was thinking the xx was what I was setting so I figured the LSB was 1 for port B0 and 2 for B3. Like I said I was confused :)
So you are saying all I need to do is say 0 and not 1, right?
Hi Wolwil,
You could replace PBP SHIFTOUT with the attached ASM code example and reduce the time spent in SHIFTOUT + loop to ~1/8. This is kind of tested, but no promises... Check if it is useful for you.
PBP_shiftout execution time was around 814uS, but ASM_shiftout took only 104uS to do the same.
BR,
-Gusse-
Thanks I will try this out tonight!
wolwil
- 17th May 2010, 05:35
here is the code for the PWM:
C1 var byte
For C1 = 0 to 31
    hpwm 1,127,32767
next
looks like I didn't need all the extra mumbo jumbo before it as the 16f88 defaults to b0 for channel 1
The sad thing is this still is not fast enough to do what I need it to do.
I am trying to generate 4096 clock pulses as fast as possible.  The chip will allow up to a 30MHz clock pulse to drive it but using the HPWM all I can get is 32,767Hz which is like paying full price for a Lamborghini that only has first gear.  
Does anyone have any ideas on how I might be able to achieve this?
sinoteq
- 17th May 2010, 07:13
Hi
@30Mhz you can get way faster HPWM from the chip BUT you can NOT use the HPWM command in PBP. You need to get inside the car and drive yourself. Set up the HPWM module manually, examples on Page 87 in the datasheet Table 9-3
Hope this helps
wolwil
- 17th May 2010, 15:04
HAHA well sure I could use the formula to find out if it will be fast enough but one needs to know what "log, log2 and bits" are first.  I wish these manuals would explain the formulas a little better.
How fast is "way faster"?  keep in mind the 16f88 will be running at 20MHz but the external chip I am interfacing with can be driven with up to 30MHz.  
See I was thinking the fastest way would be to use some simple ASM to turn a Pin HIGH then turn Pin LOW and was thinking this could possibly achieve maybe a 5MHz signal.
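On the "log, log2 and bits" question: the formulas in the PWM section of the mid-range PIC datasheets are, as far as I recall them, period = (PR2 + 1) * 4 * Tosc * prescale and resolution = log(Fosc / Fpwm) / log(2) bits. Check them against the table referenced above before relying on this. The Python sketch below just evaluates them; the function names are made up for illustration.

```python
import math

def pwm_frequency(fosc_hz, pr2, prescale=1):
    """PWM frequency from period = (PR2 + 1) * 4 * Tosc * prescale."""
    return fosc_hz / (4 * prescale * (pr2 + 1))

def pwm_resolution_bits(fosc_hz, fpwm_hz):
    """Duty-cycle resolution in bits: log(Fosc / Fpwm) / log(2)."""
    return math.log(fosc_hz / fpwm_hz) / math.log(2)

for pr2 in (255, 63, 1):
    f = pwm_frequency(20e6, pr2)
    print(pr2, f, pwm_resolution_bits(20e6, f))
```

"Bits" here means how finely the duty cycle can be set: n bits gives 2^n steps, and log2 simply answers "2 to what power gives this number". With a 20MHz oscillator, PR2 = 255 gives about 19.5kHz with 10 bits, PR2 = 63 gives about 78kHz with 8 bits, and PR2 = 1 gives 2.5MHz with only 3 bits. A plain 50% square wave needs very little resolution, so the low bit count at high frequency is not a problem for a clock signal.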
mackrackit
- 17th May 2010, 16:13
Have you seen this?
http://www.rentron.com/Infrared_Communication.htm
HenrikOlsson
- 17th May 2010, 19:43
Hi,
I think that the highest frequency you can get out of the CCP module is 2.5MHz when the chip is running at 20MHz. Even so, I don't think it's going to do what you want, since there's no easy way to control the number of cycles or "pulses" going out, and as far as I understand you need exactly 4096 "pulses". 
Perhaps it's possible to connect the CCP output back into a counter input and set that up to generate an interrupt which stops the CCP module at the right spot, but it seems like a long shot.
I'm definitely no "ASM-guy", but to set and reset a pin (or rather the port latch) in ASM you can do:
@ BSF PORTA, 3
@ BCF PORTA, 3
But even if you put 4096 of those in a row you won't get more than 2.5MHz, because each instruction takes one cycle and the instruction clock is Fosc/4, meaning 200ns per cycle @20MHz, so the frequency would be 1/400ns=2.5MHz. Not to mention you'd fill up the flash memory of the 16F88.... 
How about this, I THINK it should run faster than your For i = 0 to 4095 loop.
i VAR BYTE
j VAR BYTE
For i = 1 to 16      '16*256=4096
  For j = 0 to 255
@  BSF PORTA, 3
@  BCF PORTA, 3
  NEXT
NEXT
If I'm right that inner loop seems to execute in 7 cycles and the outer loop seems to take another 12 for a total of (7*256*16)+(12*16)=28864 cycles @ 200ns each. A total of 5772.8us for 4096 pulses or an average frequency of ~709kHz. There will be some jitter in the pulsestream when it goes from the inner to outer loop but I guess it doesn't matter(?) 
/Henrik.
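The cycle estimate above can be reproduced with a few lines of Python. This is only a check of the arithmetic. The 7-cycle and 12-cycle figures are the estimates from the post, the function names are hypothetical, and the 4096-word flash size is what I recall from the 16F88 datasheet.

```python
T_CYCLE_NS = 200          # 4 / 20 MHz
FLASH_WORDS_16F88 = 4096  # assumed program memory size, in instruction words

def nested_loop_cycles(inner_cycles=7, inner_count=256,
                       outer_cycles=12, outer_count=16):
    """Total instruction cycles for the two nested FOR loops."""
    return inner_cycles * inner_count * outer_count + outer_cycles * outer_count

def straight_line_words(pulses):
    """Flash words needed if every pulse is its own BSF/BCF pair."""
    return 2 * pulses

cycles = nested_loop_cycles()
total_us = cycles * T_CYCLE_NS / 1000
avg_khz = 4096 / total_us * 1000
print(cycles, total_us, avg_khz)
print(straight_line_words(4096), FLASH_WORDS_16F88)
```

That gives 28864 cycles, 5772.8uS, and an average just over 709kHz. The straight-line version would need 8192 words, twice the assumed flash size, which is why some kind of loop is unavoidable.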
Bruce
- 17th May 2010, 22:21
Here's a neat trick to get 1MHz with a 20MHz oscillator. This takes 5 instruction cycles to toggle the pin, and Timer1 keeps track of the toggle count for you.
 
You don't need to use any incrementing or decrementing loops or variables.
 
1. Set T1CKI pin, and make the pin an output.
2. Load Timer1 low & high registers with 65,536 - the number of clocks to output & count.
3. Setup Timer1 for external clock, 1:1 prescaler, and turn it on.
 
T1CKI outputs your clock while Timer1 count increments on every low-to-high transition on T1CKI.
 
PORTC.0 = 1       ' set pin so 1st low-to-high increments count
TRISC.0 = 0       ' make pin an output
 
Main:
  TMR1H = $F0    ' 65,536 - 4096 = 61,440 = $F000
  TMR1L = $00    ' so 4096 toggles will set TMR1IF
  T1CON = %00000011 ' 1:1 prescaler, external clock, Timer1 on
 
ASM
Pulse
    BCF PORTC,0       ; clear T1CKI pin
    BSF PORTC,0       ; set T1CKI pin
    BTFSS PIR1,TMR1IF ; when TMR1 overflows, count is complete
    GOTO Pulse        ; loop until Timer1 overflow
    BCF  PIR1,TMR1IF  ; clear TMR1 overflow flag bit
ENDASM
When PIR1,TMR1IF = 1 you have 4096 clocks. I tested this on a 16F877A, so just change the T1CKI pin to whatever it is on your PIC type.
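The preload value generalizes to any pulse count that fits in 16 bits. The Python sketch below models only the counter arithmetic, not the PIC hardware, and the function names are made up for illustration.

```python
def timer1_preload(pulses):
    """Return (TMR1H, TMR1L) so a 16-bit up-counter overflows after `pulses` edges."""
    if not 1 <= pulses <= 65536:
        raise ValueError("pulse count must fit in 16 bits")
    start = (65536 - pulses) & 0xFFFF
    return start >> 8, start & 0xFF

def edges_until_overflow(tmr1h, tmr1l):
    """Count rising edges until the counter rolls over from $FFFF to $0000."""
    count = (tmr1h << 8) | tmr1l
    edges = 0
    while True:
        count = (count + 1) & 0xFFFF
        edges += 1
        if count == 0:
            return edges

print([hex(b) for b in timer1_preload(4096)])  # ['0xf0', '0x0']
print(edges_until_overflow(0xF0, 0x00))        # 4096
```

The loop itself is five instruction cycles per pulse (BCF, BSF, BTFSS, and a two-cycle GOTO), which at 200ns per cycle is the 1MHz quoted above.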
Mike, K8LH
- 18th May 2010, 06:33
Son of a gun, that works (please note that T1CKI is RB6 on a 16F88).
The three cycle loop 'overhead' (BTFSS & GOTO) is a bottleneck.  You could get better performance if you spread it out over more pulses.  If you were to produce 16 pulses within the loop then you could use a single byte variable counter instead of Timer 1 and bump the output up to 2+ MHz.
Regards, Mike
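To see how much spreading the overhead helps, here is a rough Python estimate. It assumes each pulse costs two cycles (one BCF, one BSF) and each loop pass adds three cycles of overhead, with a 20MHz oscillator. The function name is hypothetical.

```python
def average_mhz(pulses_per_loop, fosc_mhz=20, overhead_cycles=3):
    """Average pulse rate when each loop pass emits several BCF/BSF pairs."""
    cycles_per_loop = 2 * pulses_per_loop + overhead_cycles
    cycle_rate_mhz = fosc_mhz / 4          # instruction cycles per microsecond
    return pulses_per_loop * cycle_rate_mhz / cycles_per_loop

for n in (1, 4, 16):
    print(n, round(average_mhz(n), 3))
```

One pulse per pass gives 1MHz, four give about 1.8MHz, and sixteen give about 2.3MHz. The gain flattens out as the count rises because the ceiling is 2.5MHz, so going much beyond 16 mostly costs code space.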
wolwil
- 18th May 2010, 19:16
Thank You Bruce for the code!!!  I will try to get it loaded and tested tonight.
Son of a gun, that works (please note that T1CKI is RB6 on a 16F88).
The three cycle loop 'overhead' (BTFSS & GOTO) is a bottleneck.  You could get better performance if you spread it out over more pulses.  If you were to produce 16 pulses within the loop then you could use a single byte variable counter instead of Timer 1 and bump the output up to 2+ MHz.
Regards, Mike
Mike, are you saying to use a FOR loop outside of the ASM like this, or is there an ASM FOR loop I should be using for it? (Keep in mind I don't know ASM.)
PORTB.6 = 1       ' set pin so 1st low-to-high increments count
TRISB.6 = 0       ' make pin an output
T1CON = %00000011 ' 1:1 prescaler, external clock, Timer1 on
C1 VAR BYTE
 
FOR C1 = 0 to 255
ASM
Pulse
    BCF PORTB,6       ; clear T1CKI pin / 1
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 2
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 3
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 4
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 5
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 6
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 7
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 8
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 9
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 10
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 11
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 12
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 13
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 14
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 15
    BSF PORTB,6       ; set T1CKI pin
    BCF PORTB,6       ; clear T1CKI pin / 16
    BSF PORTB,6       ; set T1CKI pin
ENDASM
NEXT
Bruce
- 19th May 2010, 18:14
I think this is what Mike was talking about?
 
This will give you 2.5MHz with every 16th logic 1 bit stretched to 4 cycles VS 1. If you can live with this bit being a tad longer, it will definitely speed things up.
 
' define port & pin use in _Pulse
@ #DEFINE PORT PORTB ' use any port you prefer, but declare it here
@ #DEFINE PIN 6      ' same as above
 
PORTB.6 = 1       ' initialize pin to idle state
TRISB.6 = 0       ' make the pin an output
 
C1 VAR BYTE BANK0 SYSTEM
 
Main:
    C1 = 0        ' clear loop count
    CALL Pulse    ' generate 4096 pulses
    GOTO Main
 
ASM
_Pulse
    BCF PORT,PIN     ; clear pin / 1
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 2
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 3
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 4
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 5
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 6
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 7
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 8
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 9
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 10
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 11
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 12
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 13
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 14
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 15
    BSF PORT,PIN     ; set pin
    BCF PORT,PIN     ; clear pin / 16
    BSF PORT,PIN     ; set pin
    DECFSZ C1,F      ; decrement count, skip if 0
    GOTO _Pulse      ; not done, keep going
    RETURN
ENDASM
With either version, make sure you have WDT disabled.
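The shape of the resulting pulse train can be checked with a small cycle-by-cycle model in Python. It is a sketch under the usual mid-range timing assumptions: BCF, BSF and DECFSZ take one cycle, DECFSZ takes two when it skips, and GOTO and RETURN take two. The function names are made up for illustration.

```python
def pulse_routine_levels(loops=256, pulses_per_loop=16):
    """Pin level at the end of each instruction cycle while _Pulse runs."""
    levels = []
    for n in range(loops):
        for _ in range(pulses_per_loop):
            levels.append(0)          # BCF: 1 cycle, pin low
            levels.append(1)          # BSF: 1 cycle, pin high
        last = (n == loops - 1)
        # DECFSZ (1 cycle) + GOTO (2), or on the last pass DECFSZ skip (2) + RETURN (2)
        levels.extend([1] * (4 if last else 3))
    return levels

def high_run_lengths(levels):
    """Lengths, in cycles, of each stretch where the pin stays high."""
    runs, run = [], 0
    for lv in levels:
        if lv:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return runs

levels = pulse_routine_levels()
runs = high_run_lengths(levels)
print(len(runs), len(levels), sorted(set(runs)))
```

The model counts 4096 high pulses in 8961 cycles, which at 200ns per cycle is about 1.79mS, an average of roughly 2.29MHz with a 2.5MHz peak. Every 16th high lasts 4 cycles instead of 1, and the final one runs into the RETURN, where the pin is left at its idle high level.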
wolwil
- 20th May 2010, 03:47
I didn't have to turn off the WDT. I think it's off by default, but I could not find where the manual says so.  What will happen if it's on?  Would I notice it?
I do have one issue, though it might just be from being on a breadboard: when I use DEFINE OSC 20 with the 20MHz clock, it freaks out, does whatever it feels like, and is sensitive to the touch. When I use no DEFINE, or DEFINE OSC 4, but still run it with the 20MHz clock, it runs fine... I don't get it haha 
Thanks again everyone, I am very grateful for all your help and suggestions
 