Circuit reliability issues



hkpatrice
- 22nd November 2007, 05:01
Dear All,

I have been reading the posts on this forum for the past 3 years, but today is the first time I am posting a message myself. Hopefully the members here can help me solve my problem.

I have designed a circuit to switch 16 outputs on and off. It is hooked up to an RS-485-like network (not exactly RS-485 but similar: two lines idling at 24 volts and a third one for ground; each 24V line may be switched off to represent digital "ones" and "zeros"). Up to 128 switchers may be connected in parallel at any time. Each of these "switchers" is addressable, so the controller can "talk" to each of them separately and switch the appropriate output on or off. So, in essence, the controller can switch a maximum of 2048 outputs (128 switchers x 16 outputs).

This whole network is used in multimedia applications, and the configuration of the network changes from one installation to the next. For instance, one application may require only 10 switchers whilst the next one may require 100.

The prototypes I built all worked perfectly on my workbench and reliability was 100%. Communication was rock-solid and, after much development, I had 300 of these switchers manufactured.

Here is where the problems started. Now that we are using them in the field, we are experiencing reliability issues. Out of 100 switchers, there are always 2-3 that refuse to start up, causing all kinds of problems and sometimes short-circuiting themselves.

The problem is that the circuit behaves as if the PIC doesn't start at all, or hangs, leaving the outputs in an indeterminate state and disturbing communication on the network. The circuit talks back to the controller using a current-loop kind of communication system, essentially switching a 270-ohm resistor between 24 volts and ground to represent digital "ones" and "zeros". This is fine when communicating at 9600 baud, because the duration of the "ones" is very short. However, if the PIC hangs and the transistor controlling this resistor somehow remains switched on, the resistor will eventually burn up, causing a short circuit in the network.
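
A rough check of why a stuck transistor is destructive while normal traffic is not. This is only a sketch: it assumes the full 24 V appears across the 270-ohm resistor (the transistor's saturation drop is ignored), and the resistor's power rating is not stated in the post.

```latex
\begin{align*}
I &= \frac{V}{R} = \frac{24\,\mathrm{V}}{270\,\Omega} \approx 89\,\mathrm{mA} \\
P &= \frac{V^{2}}{R} = \frac{(24\,\mathrm{V})^{2}}{270\,\Omega} = \frac{576}{270}\,\mathrm{W} \approx 2.1\,\mathrm{W} \\
t_{\mathrm{bit}} &= \frac{1}{9600\,\mathrm{s^{-1}}} \approx 104\,\mu\mathrm{s}
\end{align*}
```

During a normal reply the resistor only sees this load in bursts of about 104 microseconds per bit, so the average dissipation stays small. Held on continuously, it has to dissipate roughly 2 W, which would overheat any part rated well below that.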

This behavior is unpredictable. I may have a network with 100 switchers working perfectly at 3 PM, turn it off, turn it back on at 5 PM, and have 1 or 2 switchers not working anymore. Or I may have a system installed for 3 days with everything working perfectly at all times.

At first, I thought that the problem was related to manufacturing, and that somehow the soldering of the crystal or trace contaminants on the boards were to blame. I had each board cleaned and then applied 5 coats of conformal coating to each of them. Unfortunately, it did not solve the issue.

I then suspected power supply problems, but that doesn't seem to be the case. The circuit is powered by a clean 24 V DC line and the voltage is regulated to 5 volts by a low-dropout regulator (LM2931AZ-5.0) with a 100 uF can-type cap upstream and a 47 uF tantalum cap downstream, as per the manufacturer's recommendations. A check with my scope shows a very clean power rail.

The microcontroller is a PIC 16F877A, with both supply pins decoupled with a 0.1 uF cap and the MCLR pin tied to the power supply rail through a 4.7K resistor. No output pin is left unused and they all have 10K pull-down resistors.

I am using a 4 MHz crystal with 20 pF ceramic-disc caps, as close to the chip as can be. The whole circuit has a ground plane.

More and more, I am suspecting that the problem is software-related. Specifically, I am wondering if I did not make an error in setting my configuration fuses.

The symptoms are as if the switcher starts to communicate with the controller and then just stops. If that happens, the 270-ohm resistor stays connected between 24 volts and ground and will either short-circuit and/or disrupt the current-loop communication. The other way it can fail is that it just stays non-responsive, as if the crystal did not start up. Since this is intermittent and happens in the field, it is almost impossible for me to take readings with a scope when it happens. Please note that the switcher only communicates when asked; however, this problem pops up at power-up, even before I ask the switchers anything. I am sending a dummy message at the start of my program, because I read on this forum that it is preferable to do this when using the USART.

I am setting the fuses in the Melabs programmer, not in the program itself. My fuses are as follows:


Oscillator: HS
Watchdog timer: Enabled
Power-up Timer: Enabled
Brown-out Reset: Enabled
Low voltage programming: Disabled
Flash Program memory write: 0x1000-0x1FFF
code: Protected
Data EEPROM: Protected


I am wondering if a programming blunder somehow puts the PIC in reset, so that the outputs remain in an indeterminate state. But then again, the problem seems to affect only the pins driving the two transistors that control the 270-ohm resistors; the 16 transistors doing the switching are never affected. Or maybe the crystal doesn't start up? But why? Or maybe the 74ALS08 controlling the two "communication" transistors fails? I do have one unused pin on there that I forgot to tie to anything...

I have attached the listing of the program to this message, as well as a JPEG file of the circuit.

Any help or suggestions, be it hardware or software related will be more than welcome as I have been trying to solve this for the past 2 months and am now running out of ideas.

Thanks and best regards,

Patrice

Archangel
- 22nd November 2007, 06:06
Hi hkpatrice,
Have you tried installing a capacitor from your MCLR pin to ground to hold the PIC in a reset condition until the chip settles down internally?
I see no TRISE statement in the code . . . You probably need one so the chip knows which PORTE pins are inputs and which are outputs.

hkpatrice
- 22nd November 2007, 12:55
Joe s. said:

"Hi hkpatrice,
have you tried installing a capacitor on your MCLR pin to ground to hold the PIC in a reset condition until the chip settles down internally? I see (at least I think I see) you are feeding the MCLR pin through a resistor from the 24 volt line and limiting the voltage with a zener, I am wondering if that zener is introducing noise from its constant switching, cycling the voltage up/down. . . How does it work on a 5v regulated supply?"

Hello Joe and thanks for the quick answer.

-I haven't tried putting a cap on the MCLR line, however, I have the power-up timer enabled, doesn't that play a similar role?

-The MCLR pin is fed through a resistor from the 5V rail, not the 24V one... And no zener there... Sorry for the quality of the BMP, it's the best I could do while staying under the 200Kb limit.

-You are right about the missing TRISE statement! How could I miss that? Could that be the cause of the problems I've been experiencing?

Thanks for the input!

mister_e
- 22nd November 2007, 17:12
Can you try using PrimoPDF to post your schematic?
http://www.primopdf.com/

PrimoPDF is free and really nice, not sure 'bout the final PDF size so far

There also seems to be a problem with how DISABLE/ENABLE/RESUME are used 'round your ISR.

hkpatrice
- 23rd November 2007, 07:37
mister_e,

Thanks for the suggestion. I tried converting the file to PDF but it still comes in at 370Kb...
Still too much to post on here.

Please tell me more about this interrupt handling issue; you seem to have spotted something that eludes me. Of course, if the problem is with the ISR, that would explain the erratic behavior I've been experiencing...

Thanks for the help!

mister_e
- 23rd November 2007, 14:55
Try to ZIP your PDF, or send it to my e-mail.


The most notable place to use DISABLE is right before the actual interrupt handler. Or the interrupt handler may be placed before the ON INTERRUPT statement as the interrupt flag is not checked before the first ON INTERRUPT in a program.




ON INTERRUPT GOTO serialin ' Declare interrupt handler routine
'
'
'
'
' Subroutines

DISABLE ' Don't check for interrupts in this section

getbuf: ' move the next character in buffer to bufchar

index_out = (index_out + 1) ' Increment index_out pointer (0 to 63)
IF index_out > (buffer_size-1) THEN index_out = 0 ' Reset pointer if outside of buffer
ADDRESS = addressbuffer ' Read buffer location
COMMAND = commandbuffer[index_out]
RETURN


error: ' Display error message if buffer has overrun
errflag = 0 ' Reset the error flag
CREN = 0 ' Disable continuous receive to clear overrun flag
CREN = 1 ' Enable continuous receive
GOTO main ' Carry on


' Interrupt handler
' Where's the Disable???
serialin: ' Buffer the character received
IF PIR1.5=1 THEN 'IF THIS IS A USART INTERRUPT....
IF OERR THEN usart_error ' Check for USART errors
index_in = (index_in + 1) ' Increment index_in pointer (0 to 63)
IF index_in > (buffer_size-1) THEN index_in = 0 'Reset pointer if outside of buffer
IF index_in = index_out THEN buffer_error ' Check for buffer overrun
HSERIN badparity,10,badparity,[addressbuffer[index_in],commandbuffer[index_in]] ' Read USART and store character to next empty location
IF RCIF THEN serialin ' Check for another character while we're here
ENDIF

RESUME ' Return to program

badparity:
IF index_in=0 THEN
index_in= (buffer_size-1)
ELSE
index_in = (index_in - 1) ' Move pointer back to avoid corrupting the buffer.
ENDIF
GOTO main

' You don't even need this routine, as you used the HSER_CLROERR define.
' Even if CLROERR didn't work, YOU DON'T WANT TO use a GOTO inside the ISR,
' or you'll get a stack overflow/underflow one day or another. The same rule
' applies to the badparity sub, which is also called somewhere in the main loop.
usart_error:
errflag=1
GOTO main


buffer_error:
errflag.1 = 1 ' Set the error flag for software
IF index_in=0 THEN
index_in= (buffer_size-1)
ELSE
index_in = (index_in - 1) ' Move pointer back to avoid corrupting the buffer.
ENDIF

RESUME ' Return to program

MAYBE it's safe to place DISABLE BEFORE the ISR as long as there's another ENABLE somewhere after it, but I don't see any :(
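
For reference, the layout being described can be sketched as below. This is only an illustration, not a drop-in fix for the listing: rxbyte, rxflag and errflag are made-up names, the ring buffer and parity handling are left out, and the USART register values assume the 4 MHz crystal and 9600 baud mentioned earlier in the thread.

```
' Layout sketch only (PicBasic Pro, PIC16F877A assumed).
' rxbyte, rxflag and errflag are illustrative names, not from the listing.
rxbyte  VAR BYTE
rxflag  VAR BIT
errflag VAR BIT

rxflag = 0
errflag = 0

SPBRG = 25                      ' 9600 baud at 4 MHz with BRGH = 1 (assumed)
TXSTA = $24                     ' TXEN and BRGH set
RCSTA = $90                     ' SPEN and CREN set

ON INTERRUPT GOTO serialin      ' Declare the handler
PIE1.5 = 1                      ' RCIE: USART receive interrupt enable
INTCON = %11000000              ' GIE and PEIE set

main:
    IF errflag = 1 THEN         ' Errors are handled here, not in the ISR
        RCSTA.4 = 0             ' CREN off, then on again, clears OERR
        RCSTA.4 = 1
        errflag = 0
    ENDIF
    IF rxflag = 1 THEN
        rxflag = 0
        ' ... act on rxbyte ...
    ENDIF
    GOTO main

DISABLE                         ' No interrupt checks inside the handler
serialin:
    IF RCSTA.1 = 1 THEN errflag = 1   ' OERR: only set a flag
    IF PIR1.5 = 1 THEN                ' RCIF: a byte is waiting
        rxbyte = RCREG                ' Reading RCREG clears RCIF
        rxflag = 1
    ENDIF
    RESUME                      ' Single exit point, never GOTO main
ENABLE                          ' Interrupt checks back on for code below
```

The points to take from it are that the handler sits between DISABLE and ENABLE, that it always leaves through RESUME, and that error recovery is done in the main loop from a flag instead of jumping out of the handler.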

Maybe there's something else; those are the first things that jump out at me.

HTH