Software Reliability



Bill Legge
- 24th July 2010, 04:34
I've built a couple of PIC systems that run for months on end (weather station and water pump controller) and noticed that they occasionally go wrong.

I have structured my software so that the initialisation (clock speed, data direction, ADC settings and so on) comes first, followed by a loop that does the work. This loop should execute 'forever'.

Found this advice in the book 'Programming 16-Bit PIC Microcontrollers in C' by Lucio Di Jasio:

1. It is likely, if the MCU runs for long enough, that power supply fluctuation (undetected by the 'brownout' reset circuit) or noise will corrupt the essential control registers/peripherals.

2. Prevent this by putting periodic 'refresh' code into your main loop.

Like most good ideas - it is obvious once someone points it out.

I've not seen this advice anywhere else in the few years I've been programming PICs. Any comments?

Regards Bill Legge

Normnet
- 24th July 2010, 06:35
I add a loop counter and refresh the registers whenever it reaches a set count.

Norm

BrianT
- 24th July 2010, 08:18
I build data loggers that must run unattended for a year plus. There are several aspects to reliability.

The first is electrical interference: cosmic rays hitting a memory cell, static discharge or nearby lightning coming in via an I/O port, a dirty power supply, and so on. A precaution against this is to refresh all registers every time through the main loop. Something like:

code:
'************************* Initialise ******************************
Initialise:
ADCON0 = %00000000 ' ADC turned OFF until needed
ADCON1 = %00001001 ' 6 analog inputs, the rest digital
CMCON = %00000111 ' Comparators OFF
CVRCON = %00000000 ' Voltage Ref OFF
TRISA = %11111111 ' ECG, Accels input
PORTA = %00000000 '
TRISB = %10000010 '
TRISC = %11111111 ' temporary low power
TRISD = %01000000 ' TFU on D.6 ensure it is set as an input
PORTD = %00000000 ' temporary low power
TRISE = %00000000
PORTE = %00000000
OSCCON = %01101111 ' 4 MHz
endcode
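
The listing above shows the register values but not how the refresh ties into the main loop. Here is a minimal sketch of that structure, written in C so it can be tried on a PC rather than in PicBasic. The register variables are plain stand-ins for the real SFRs, and `REFRESH_INTERVAL`, `refresh_registers` and `main_loop_pass` are names made up for illustration. It refreshes every N passes, as Norm describes; set the interval to 1 to refresh on every pass.

```c
#include <stdint.h>

/* Stand-ins for the PIC special function registers (assumption: on real
   hardware these names come from the compiler's device header). */
volatile uint8_t ADCON0, ADCON1, CMCON, TRISA, TRISB;

/* Hypothetical interval: refresh once every 100 passes of the main loop. */
#define REFRESH_INTERVAL 100u

/* Write every control register back to its known-good value.
   Called once at start-up and then periodically from the main loop. */
void refresh_registers(void)
{
    ADCON0 = 0x00;  /* ADC off until needed           */
    ADCON1 = 0x09;  /* 6 analog inputs, rest digital  */
    CMCON  = 0x07;  /* comparators off                */
    TRISA  = 0xFF;  /* port A all inputs              */
    TRISB  = 0x82;  /* %10000010                      */
}

/* One pass of the main loop. Returns 1 if this pass refreshed the
   registers, 0 otherwise. */
int main_loop_pass(void)
{
    static uint16_t pass_count = 0;

    /* ... the real work of the application goes here ... */

    /* Use >= rather than == so a corrupted counter still triggers. */
    if (++pass_count >= REFRESH_INTERVAL) {
        pass_count = 0;
        refresh_registers();
        return 1;
    }
    return 0;
}
```

On the target, `main_loop_pass` would simply be the body of the endless loop. Only refresh registers whose values are meant to stay constant; anything the program changes at run time needs its current intended value written back instead.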

The second problem is software integrity: watchdog timer, error trapping, housekeeping checks, etc. That includes things like testing for A < B or A > B rather than A = B. This matters most with floating point code, where what you expect to be an equality might turn out to have a small rounding error that defeats the A = B test.
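
The comparison advice can be illustrated with a small C sketch, runnable on a PC. The helper names `accumulate`, `nearly_equal` and `limit_reached` are invented for this example, and the tolerance is something you would choose to suit your own data.

```c
#include <stdbool.h>

/* Add `step` to a running total `count` times. In binary floating point,
   summing 0.1 ten times typically lands just short of 1.0, so a test
   like `sum == 1.0` can fail even though it looks correct on paper. */
double accumulate(double step, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        sum += step;
    }
    return sum;
}

/* Treat two values as equal if they differ by no more than `tolerance`. */
bool nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= tolerance;
}

/* Range test: true once `value` has reached or passed `limit`. */
bool limit_reached(double value, double limit)
{
    return value >= limit;
}
```

For example, `nearly_equal(accumulate(0.1, 10), 1.0, 1e-9)` is true, and `limit_reached` still fires if the value steps straight past the limit without ever equalling it.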

The third is to have a second party go through your code looking for logical errors and ways to break it. I know of one software company working on mission-critical, real-time, 24/7 process control systems that has as many people in the test team as they have writing the code.

The fourth is good documentation. Give your code to a colleague and ask if they understand it and could maintain it.

A fifth is the hardware. Clean all flux off the board, or use an approved no-clean flux, don't stress your capacitors, no parts running hot, quality connectors with plating appropriate to the environment your box must work in. Don't let the box boil in direct sunlight or freeze overnight, etc.

HTH
BrianT

Charles Linquis
- 24th July 2010, 15:07
I have a couple more:

Learn all the ins and outs of Read-Modify-Write.
Be very careful when you use GOTOs and GOSUBs. Don't use one when you mean to use the other.
Put timeouts in EVERYTHING (HSERIN, for example), so that the code never waits forever for a state change or user input.
Actually use the watchdog timer. If you are careful, it can be done.
Always use 4 layer PCBs (at least) with solid power and GND planes.
Put several capacitors right next to the chip power pins (I like 2 X 22uF and 2 X 0.01uF ceramics).
Put the crystal close to the PIC.
Keep the MCLR line short. Use a 4.7K pull up to Vcc.
Design the hardware such that the circuit (or machine) behaves properly if the PIC is reset with power ON to everything else.
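
The "timeouts in everything" point can be sketched as a bounded polling loop. In PicBasic Pro, HSERIN takes an optional timeout and jump label that does this job for serial input; the C version below is a generic illustration that runs on a PC. The names `wait_with_timeout`, `flag_ready` and `flag_never_ready` are invented for the example, and the two flag functions only stand in for reading a real status bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback type: returns true once the awaited condition
   (a status flag, a received byte, a key press) is present. */
typedef bool (*ready_fn)(void);

/* Poll `ready` at most `max_polls` times.
   Returns true if the condition arrived, false if the wait timed out.
   The caller decides what to do on timeout (retry, log, carry on). */
bool wait_with_timeout(ready_fn ready, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (ready()) {
            return true;
        }
    }
    return false;  /* gave up: never wait forever */
}

/* Stand-ins for a hardware status flag, for trying the sketch on a PC. */
int polls_until_ready = 3;

bool flag_ready(void)
{
    /* Becomes true on the third poll. */
    return --polls_until_ready <= 0;
}

bool flag_never_ready(void)
{
    /* Simulates a peripheral that has locked up. */
    return false;
}
```

Because every wait is bounded, the main loop always comes round again, which is what lets you clear the watchdog in one place only and have a genuine hang trigger a reset.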