Weird problem with large(ish) array access



HenrikOlsson
- 11th September 2020, 21:24
Guys, I need some help here. This is probably one of the strangest bugs I've tried to figure out and I'm starting to believe it's actually not me (for once) doing something wrong.

I've reduced this from a much larger piece of code as much as I possibly can; here it is in its current state:

DEFINE OSC 48

BTN1 VAR PortD.7
BTN2 VAR PortH.3
LD1 VAR LATD.4
LD2 VAR LATE.4

CRC VAR WORD
Number_Of_Bytes VAR BYTE

Buffer VAR BYTE[256]

TRISD.4 = 0
TRISE.4 = 0
TRISD.7 = 1
TRISH.3 = 1



LD1 = 0
LD2 = 0
T0CON = %10000011 ' 1:16 prescaler TMR0 on

Main:
IF INTCON.2 = 1 THEN ' TMR0 interrupt flag?
LD2 = ~LD2 ' Toggle LED so we know we're alive and kickin'
INTCON.2 = 0
ENDIF


If BTN1 = 0 THEN
LD1 = 1
GOSUB SendMsg_1
LD1 = 0
WHILE BTN1 = 0 : WEND
ENDIF

IF BTN2 = 0 THEN
LD1 = 1
GOSUB SendMsg_2
LD1 = 0
WHILE BTN2 = 0 :WEND
ENDIF

Goto Main


SendMsg_1:
Buffer[0] = $87
Buffer[1] = $02
Buffer[2] = $06
Buffer[3] = $00
Buffer[4] = $40
Buffer[5] = $00
Buffer[6] = $00
Buffer[7] = $ea
Buffer[8] = $c4
Buffer[9] = $50
Buffer[10] = $00
Buffer[11] = $62
Buffer[12] = $3b
Buffer[13] = $30
Buffer[14] = $00
Buffer[15] = $b8
Buffer[16] = $9d
Buffer[17] = $80
Buffer[18] = $01
Buffer[19] = $6c
Buffer[20] = $00
Buffer[21] = $b8
Buffer[22] = $d5
Buffer[23] = $80
Buffer[24] = $01
Buffer[25] = $70
Buffer[26] = $00
Buffer[27] = $b8
Buffer[28] = $b7
Buffer[29] = $80
Buffer[30] = $01
Buffer[31] = $6a
Buffer[32] = $00
Buffer[33] = $b8
Buffer[34] = $57
Buffer[35] = $80
Buffer[36] = $01
Buffer[37] = $6c
Buffer[38] = $00
Buffer[39] = $b8
Buffer[40] = $c4
Buffer[41] = $80
Buffer[42] = $01
Buffer[43] = $7c
Buffer[44] = $00
Buffer[45] = $b8
Buffer[46] = $a7
Buffer[47] = $80
Buffer[48] = $01
Buffer[49] = $69
Buffer[50] = $00
Buffer[51] = $27
Buffer[52] = $0f
Buffer[53] = $80
Buffer[54] = $01
Buffer[55] = $8b
Buffer[56] = $00
Buffer[57] = $8b
Buffer[58] = $73
Buffer[59] = $80
Buffer[60] = $00
Buffer[61] = $0b
Buffer[62] = $00
Buffer[63] = $b8
Buffer[64] = $a3
Buffer[65] = $80
Buffer[66] = $01
Buffer[67] = $6c
Buffer[68] = $00
Buffer[69] = $b8
Buffer[70] = $88
Buffer[71] = $80
Buffer[72] = $01
Buffer[73] = $6c
Buffer[74] = $00
'*********************************************************
Pause 5 ' If this PAUSE is removed the program crashes
'*********************************************************
Buffer[75] = $b8
Buffer[76] = $ce
Buffer[77] = $80
Buffer[78] = $01
Buffer[79] = $7c
Buffer[80] = $00
Buffer[81] = $b8
Buffer[82] = $aa
Buffer[83] = $80
Buffer[84] = $01
Buffer[85] = $6e
Buffer[86] = $00
Buffer[87] = $f6
Buffer[88] = $71
Buffer[89] = $80
Buffer[90] = $00
Buffer[91] = $0b
Buffer[92] = $00
Buffer[93] = $2c
Buffer[94] = $88
Buffer[95] = $80
Buffer[96] = $01
Buffer[97] = $93
Buffer[98] = $00
Buffer[99] = $14
Buffer[100] = $a5
Buffer[101] = $80
Buffer[102] = $01
Buffer[103] = $84
Buffer[104] = $00
Buffer[105] = $b8
Buffer[106] = $ae
Buffer[107] = $80
Buffer[108] = $01
Buffer[109] = $69
RETURN


SendMsg_2:
@ NOP
RETURN


The target is the 18F87J50, and I'm using a PIC Clicker 2 from MikroE with their USB bootloader, so the CONFIG is whatever their bootloader uses. No hardware other than what's on the PIC Clicker 2 itself is connected. The board is powered by the USB cable, which is also used to download the code via their bootloader.

At this point the code isn't really doing much at all. If one button is pressed it does a NOP and returns; if the other button is pressed it loads an array with values and returns.
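For reference, the "alive" LED in Main works by polling the TMR0 overflow flag (INTCON.2). The C sketch below decodes a T0CON value and computes how often that flag sets. It assumes the core really runs at 48 MHz, as DEFINE OSC 48 claims (the actual clock comes from the bootloader's CONFIG, which isn't shown), and that TMR0 is clocked from Fosc/4. The function name tmr0_overflow_us is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Overflow period of PIC18 Timer0, in microseconds, for a given T0CON
   value and oscillator frequency (hypothetical helper). Bits decoded:
   bit 6 T08BIT (1 = 8-bit), bit 5 T0CS (0 = Fosc/4), bit 3 PSA
   (1 = prescaler bypassed), bits 2:0 T0PS (000 = 1:2 ... 111 = 1:256). */
static double tmr0_overflow_us(uint8_t t0con, double fosc_hz)
{
    assert((t0con & 0x20u) == 0);  /* model only covers the Fosc/4 clock source */
    unsigned counts   = (t0con & 0x40u) ? 256u : 65536u;            /* 8- or 16-bit */
    unsigned prescale = (t0con & 0x08u) ? 1u : (2u << (t0con & 0x07u));
    double tick_s = 4.0 * prescale / fosc_hz;  /* duration of one TMR0 increment */
    return counts * tick_s * 1e6;
}
```

With T0CON = %10000011 (0x83) at 48 MHz this works out to roughly 87.4 ms per overflow, so LD2 should toggle several times a second while the main loop is running.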

Here's the thing: see that PAUSE 5 about 2/3 of the way through the SendMsg_1 subroutine?
If I remove that PAUSE 5 statement the program works fine until SendMsg_1 is called, at which point it just locks up. Don't ask me how I came to that conclusion; it's been MANY hours of head scratching and trial'n'error, and I'm probably close to having worn out the FLASH on the poor 87J50.

I do believe it's some sort of RAM allocation/access issue, because the very last thing I tried before posting here was to remove the declaration of the CRC variable (which isn't used), and then all of a sudden it worked without the PAUSE statement in there. But if it IS some sort of RAM allocation/access issue, what difference does the PAUSE make? And what exactly am I doing wrong? Even if I go back to my full program and it starts to work when I put that magic PAUSE 5 in there, I really don't trust it; besides, I really can't waste 5 ms...

Compiling with PBP 3.1.2.4, assembling with MPASM 5.84.

Any insight here would be most appreciated, thanks!

/Henrik.

richard
- 12th September 2020, 08:02
Can't see any reason for it. The PAUSE is not inserted where the banks roll over, and it's not on a code page boundary.
There is nothing in the .lst that indicates a difference between the loads.
Perhaps the flash did wear out, or the bootloader weirds out somewhere.
I don't have one of those chips and my Proteus doesn't have that model, so I can't add much more than this:
try loading the buffer this way, for a different perspective
[on an 18F26K22, but you can work it out]

#CONFIG
CONFIG FOSC=INTIO67, PLLCFG=OFF, PRICLKEN=OFF, FCMEN=OFF, IESO=OFF
CONFIG PWRTEN=OFF, BOREN=SBORDIS, BORV=190, WDTEN=ON, WDTPS=32768
CONFIG CCP2MX=PORTC1, PBADEN=OFF, CCP3MX=PORTB5, HFOFST=ON, T3CMX=PORTC0
CONFIG P2BMX=PORTB5, MCLRE=EXTMCLR, STVREN=ON, LVP=OFF, XINST=OFF, DEBUG=OFF
CONFIG CP0=OFF, CP1=OFF, CP2=OFF, CP3=OFF, CPB=OFF, CPD=OFF, WRT0=OFF
CONFIG WRT1=OFF, WRT2=OFF, WRT3=OFF, WRTC=OFF, WRTB=OFF, WRTD=OFF, EBTR0=OFF
CONFIG EBTR1=OFF, EBTR2=OFF, EBTR3=OFF, EBTRB=OFF
#ENDCONFIG

define OSC 64
OSCCON = $70 ; 16 MHz HFINTOSC (64 MHz with the PLL below)
OSCTUNE.6 = 1 ; Enable 4x PLL

flashcount var word
goto overasm
ASM
Flash2Ram macro buffer, msg,size ; Fills the buffer from flash msg
movlw UPPER msg
movwf TBLPTRU
movlw HIGH msg
movwf TBLPTRH
movlw LOW msg
movwf TBLPTRL
movlw low buffer
movwf FSR2L
movlw High buffer
movwf FSR2H
MOVE?CW size ,_flashcount
L?CALL bfill
endm

bfill
tblrd *+
movf TABLAT,w
movwf POSTINC2
CHK?RP _flashcount
MOVF _flashcount,w ;16-bit DECREMENT
BTFSC STATUS,Z
DECF _flashcount +1, f
DECF _flashcount , f
BNZ bfill
MOVF _flashcount +1,w
BNZ bfill
RST?RP
return
ENDASM
overasm:

DEFINE DEBUG_REG PORTB
DEFINE DEBUG_BIT 7
DEFINE DEBUG_BAUD 9600
DEFINE DEBUG_MODE 0

TRISB.7=0 ;DEBUG
LATB.7=1 ;DEBUG


ansela=0
trisa=%11001111
BTN1 VAR Porta.2
BTN2 VAR Porta.3
LD1 VAR LATa.4
LD2 VAR LATa.5

CRC VAR WORD
Number_Of_Bytes VAR BYTE

Buffer VAR BYTE[256]





LD1 = 0
LD2 = 0
T0CON = %10000111 ' 1:256 prescaler, TMR0 on

Main:
IF INTCON.2 = 1 THEN ' TMR0 interrupt flag?
LD2 = ~LD2 ' Toggle LED so we know we're alive and kickin'
INTCON.2 = 0
ENDIF


If BTN1 = 0 THEN
LD1 = 1
GOSUB SendMsg_1
LD1 = 0
WHILE BTN1 = 0 : WEND
ENDIF

IF BTN2 = 0 THEN
LD1 = 1
GOSUB SendMsg_2
LD1 = 0
WHILE BTN2 = 0 :WEND
ENDIF

Goto Main


SendMsg_1:
@ Flash2Ram _Buffer, _msg1,110
' Buffer[0] = $87
' Buffer[1] = $02
' Buffer[2] = $06
' Buffer[3] = $00
' Buffer[4] = $40
' Buffer[5] = $00
' Buffer[6] = $00
' Buffer[7] = $ea
' Buffer[8] = $c4
' Buffer[9] = $50
' Buffer[10] = $00
' Buffer[11] = $62
' Buffer[12] = $3b
' Buffer[13] = $30
' Buffer[14] = $00
' Buffer[15] = $b8
' Buffer[16] = $9d
' Buffer[17] = $80
' Buffer[18] = $01
' Buffer[19] = $6c
' Buffer[20] = $00
' Buffer[21] = $b8
' Buffer[22] = $d5
' Buffer[23] = $80
' Buffer[24] = $01
' Buffer[25] = $70
' Buffer[26] = $00
' Buffer[27] = $b8
' Buffer[28] = $b7
' Buffer[29] = $80
' Buffer[30] = $01
' Buffer[31] = $6a
' Buffer[32] = $00
' Buffer[33] = $b8
' Buffer[34] = $57
' Buffer[35] = $80
' Buffer[36] = $01
' Buffer[37] = $6c
' Buffer[38] = $00
' Buffer[39] = $b8
' Buffer[40] = $c4
' Buffer[41] = $80
' Buffer[42] = $01
' Buffer[43] = $7c
' Buffer[44] = $00
' Buffer[45] = $b8
' Buffer[46] = $a7
' Buffer[47] = $80
' Buffer[48] = $01
' Buffer[49] = $69
' Buffer[50] = $00
' Buffer[51] = $27
' Buffer[52] = $0f
' Buffer[53] = $80
' Buffer[54] = $01
' Buffer[55] = $8b
' Buffer[56] = $00
' Buffer[57] = $8b
' Buffer[58] = $73
' Buffer[59] = $80
' Buffer[60] = $00
' Buffer[61] = $0b
' Buffer[62] = $00
' Buffer[63] = $b8
' Buffer[64] = $a3
' Buffer[65] = $80
' Buffer[66] = $01
' Buffer[67] = $6c
' Buffer[68] = $00
' Buffer[69] = $b8
' Buffer[70] = $88
' Buffer[71] = $80
' Buffer[72] = $01
' Buffer[73] = $6c
' Buffer[74] = $00
' Buffer[75] = $b8
' Buffer[76] = $ce
' Buffer[77] = $80
' Buffer[78] = $01
' Buffer[79] = $7c
' Buffer[80] = $00
' Buffer[81] = $b8
' Buffer[82] = $aa
' Buffer[83] = $80
' Buffer[84] = $01
' Buffer[85] = $6e
' Buffer[86] = $00
' Buffer[87] = $f6
' Buffer[88] = $71
' Buffer[89] = $80
' Buffer[90] = $00
' Buffer[91] = $0b
' Buffer[92] = $00
' Buffer[93] = $2c
' Buffer[94] = $88
' Buffer[95] = $80
' Buffer[96] = $01
' Buffer[97] = $93
' Buffer[98] = $00
' Buffer[99] = $14
' Buffer[100] = $a5
' Buffer[101] = $80
' Buffer[102] = $01
' Buffer[103] = $84
' Buffer[104] = $00
' Buffer[105] = $b8
' Buffer[106] = $ae
' Buffer[107] = $80
' Buffer[108] = $01
' Buffer[109] = $69
Number_Of_Bytes=0
debug 13,10
while Number_Of_Bytes <109
debug hex Buffer[Number_Of_Bytes]
Number_Of_Bytes=Number_Of_Bytes+1
if Number_Of_Bytes//10 then
debug ","
else
debug 13,10
endif
wend
debug hex Buffer[Number_Of_Bytes],13,10
RETURN


SendMsg_2:
@ NOP
RETURN


msg1:
ASM
DB 0X87,0X02,0X06,0X00,0X40,0X00,0X00,0Xea,0Xc4,0X50, 0X00,0X62,0X3b,0X30,0X00,0Xb8
DB 0X9d,0X80,0X01,0X6c,0X00,0Xb8,0Xd5,0X80,0X01,0X70, 0X00,0Xb8,0Xb7,0X80,0X01,0X6a
DB 0X00,0Xb8,0X57,0X80,0X01,0X6c,0X00,0Xb8,0Xc4,0X80, 0X01,0X7c,0X00,0Xb8,0Xa7,0X80
DB 0X01,0X69,0X00,0X27,0X0f,0X80,0X01,0X8b,0X00,0X8b, 0X73,0X80,0X00,0X0b,0X00,0Xb8
DB 0Xa3,0X80,0X01,0X6c,0X00,0Xb8,0X88,0X80,0X01,0X6c, 0X00,0Xb8,0Xce,0X80,0X01,0X7c
DB 0X00,0Xb8,0Xaa,0X80,0X01,0X6e,0X00,0Xf6,0X71,0X80, 0X00,0X0b,0X00,0X2c,0X88,0X80
DB 0X01,0X93,0X00,0X14,0Xa5,0X80,0X01,0X84,0X00,0Xb8, 0Xae,0X80,0X01,0X69
ENDASM
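The bfill loop above copies exactly size bytes using a two-byte countdown in _flashcount. The high byte is decremented (the borrow) whenever the low byte is about to wrap from 0, and the loop only exits once both bytes are zero. The following host-side C model mirrors those instructions one by one so the count can be checked off-target. It is an illustration, not part of Richard's code, and flash2ram_model is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the bfill loop (hypothetical name).
   lo/hi play the roles of _flashcount and _flashcount+1. */
static void flash2ram_model(uint8_t *dst, const uint8_t *src, uint16_t size)
{
    assert(dst && src);
    uint8_t lo = (uint8_t)(size & 0xFFu);  /* _flashcount     */
    uint8_t hi = (uint8_t)(size >> 8);     /* _flashcount + 1 */
    for (;;) {
        *dst++ = *src++;   /* tblrd *+ / movf TABLAT,w / movwf POSTINC2 */
        if (lo == 0)       /* MOVF _flashcount,w / BTFSC STATUS,Z       */
            hi--;          /* DECF _flashcount+1,f (borrow)             */
        lo--;              /* DECF _flashcount,f                        */
        if (lo != 0)       /* BNZ bfill                                 */
            continue;
        if (hi != 0)       /* MOVF _flashcount+1,w / BNZ bfill          */
            continue;
        return;            /* both bytes zero: all bytes copied         */
    }
}
```

As in the assembly, calling it with size = 0 would wrap the counter and copy 65536 bytes, so the macro should only be used with a non-zero size.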

HenrikOlsson
- 12th September 2020, 23:39
Thanks, Richard!
This one's got me stumped for sure. After posting yesterday I had to get on with what I was doing, so I just left the PAUSE 5 in there, added the remaining pieces of code back in, and it worked fine. Then this morning, after reading your post saying there "should not be" any difference, I removed the PAUSE 5 and it still worked (!)

Then, even more stumped, I went back to the code I posted yesterday and it's repeatable. Without the PAUSE 5 it hangs. If I then remove the declaration of the CRC variable it works again. With the CRC variable declared AND the PAUSE 5 it also works.

This piece of code is just something I need to send messages to the device I'm actually trying to write the code for. As it stands I've probably spent 3/4 of the time on the code for the "helper" device and 1/4 of the time on the ACTUAL device. The end device also uses the 87J50, so I can't help worrying that something bad might be happening.

Over the years I've come to trust what the PBP compiler produces, and I cannot remember a single time an issue has not been MY fault, but, honestly, I do not understand this one.

Thank you for the ASM array loader routines; I will most certainly try them once I get this piece of code out the door. Doing it "my way" is ugly, and ARRAYWRITE doesn't really work for hundreds of bytes.

/Henrik.

richard
- 13th September 2020, 05:48
Are you using 3.1xx?
I looked at a 3.0xxxx .lst file [Buffer started at 0x1F with CRC in, so 0x1F + 74d = 0x69, not near the bank end].
It might pay to check whether adding the CRC var pushes the array over a bank boundary near the PAUSE or not; the PAUSE may somehow get the BANKSEL re-evaluated and obscure the defect.
If that's the case, it will indeed be a nasty bug just waiting to pounce if not addressed.

HenrikOlsson
- 13th September 2020, 10:38
Yes, PBP 3.1.2.4
If I leave the CRC variable declared it puts it at 0x0026 and Buffer at 0x0029.


_BTN1 _PORTD??7
_BTN2 _PORTH??3
_Buffer 00000029
_CRC 00000026
_DEVID1 003FFFFE


If I remove the CRC variable then Buffer starts at 0x0027 instead.
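Those symbol-table addresses are enough to answer the bank-boundary question. PIC18 data memory is split into 256-byte banks, so a 256-byte Buffer can't fit in one bank and has to cross into the next one somewhere. The sketch below (hypothetical helper names, assuming the PIC18's flat 12-bit data address space) computes the first Buffer index that lands in the next bank for a given base address.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE  0x100u  /* PIC18 GPR banks are 256 bytes */
#define BUFFER_LEN 256u    /* Buffer VAR BYTE[256]          */

/* Bank number of a 12-bit data-memory address. */
static unsigned bank_of(uint16_t addr)
{
    assert(addr < 0x1000u);  /* PIC18 data space is 4 KB */
    return addr / BANK_SIZE;
}

/* First Buffer index that falls in the bank after the one holding
   Buffer[0]; returns BUFFER_LEN if the array never leaves that bank. */
static unsigned first_index_in_next_bank(uint16_t base)
{
    unsigned next_bank = (bank_of(base) + 1u) * BANK_SIZE;
    unsigned idx = next_bank - base;
    return idx < BUFFER_LEN ? idx : BUFFER_LEN;
}
```

With Buffer at 0x0029 the crossing is at Buffer[215], at 0x0027 it is Buffer[217], and at 0x001F (LONGs disabled) it is Buffer[225]. In every layout Buffer[108] is still in bank 0, well away from the rollover.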

With the CRC variable declared (Buffer starting at 0x0029) it crashes, but if I then comment out the lines that access Buffer[108] and Buffer[109] it does NOT crash.
If I comment out the access to Buffer[107] or any other (not that I've tried them all, but anyway) it still crashes.

So I thought it "had to be" something with the RAM access but then I added that PAUSE 5 again, which magically makes everything work.
CRC is still being put at 0x0026 and Buffer at 0x0029 so now I'm thinking it's NOT a RAM thing but a code memory thing....

Adding the PAUSE 5 adds library code, pushing "my" code down.
I can put the PAUSE 5 anywhere within the SendMsg_1 subroutine as long as it comes before the last two lines, which access Buffer[108] and Buffer[109].

Oh, it just hit me: I've been compiling with LONGs enabled.
If I disable that, it puts CRC at 0x001C and Buffer at 0x001F, and it works without the PAUSE 5 in there, which makes me think it's not a code thing but a RAM thing...

Could it have anything to do with the bootloader?
It seems customary to DEFINE RESET_ORG 1000h for USB HID bootloaders, but if I do that with the MikroE USB HID bootloader it seems to get stuck in bootload mode.

I'm really lost here, don't understand and can't figure out WTF is happening.

richard
- 13th September 2020, 11:21
Could it have anything to do with the bootloader?

It would be provable. The board has ICSP access, so with a modified cable you could overwrite the loader easily enough;
you can download the loader from MikroE to replace it afterwards.
Since you have a repeatable example to offer Charles, it couldn't hurt to ask his advice.
The cell at Buffer[108] is pretty close to the access bank SFR range; I wonder if there is a stray access RAM bit being set in an instruction.
That could do all sorts of bad stuff.
Have you re-verified the code, or can you?
Can you blank a code range and verify that it's blank? [Bootloaders: are they worth the doubts?]
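The stray access-bit idea can be made concrete. On PIC18 parts each file-register instruction carries an 8-bit address plus an "a" bit. With a = 1 the bank comes from BSR. With a = 0 the Access Bank is used: its lower part maps to the start of bank 0 and its upper part to the SFRs at the top of bank 15. The C sketch below models that mapping. The 0x60 split is an assumption for the 18F87J50 in non-extended mode (XINST off) and should be checked against the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: Access Bank split at 0x60 (non-extended mode, XINST off);
   verify against the PIC18F87J50 datasheet. */
#define ACCESS_SPLIT 0x60u

/* Effective 12-bit data address for an instruction's 8-bit file
   operand f. a_bit false: Access Bank; a_bit true: bank from BSR. */
static uint16_t effective_addr(uint8_t f, bool a_bit, uint8_t bsr)
{
    assert(bsr <= 0x0Fu);
    if (!a_bit)  /* Access Bank: low part is GPR in bank 0, high part is SFRs */
        return (f < ACCESS_SPLIT) ? f : (uint16_t)(0xF00u | f);
    return (uint16_t)(((uint16_t)bsr << 8) | f);
}
```

With Buffer at 0x0029, Buffer[108] is at 0x095. Written through BSR = 0 that is effective_addr(0x95, true, 0) = 0x095, but an instruction emitted with a = 0 would hit 0xF95 in SFR space instead. Which register sits at 0xF95 on this device is for the datasheet to say; the sketch only shows how such a bug could land a write on an SFR rather than on the array.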

richard
- 13th September 2020, 11:30
Another thought, on the non-LONG version:
if you load up another 30 or so cells in Buffer with 00 or FF, do the problems recur?
[Surely it's not just 01 or 0x69 that causes the issue.]

HenrikOlsson
- 13th September 2020, 17:19
The plot thickens...

Long story short: I flashed the device with the PICkit 3 and could not reproduce the error. Then I reflashed the bootloader into it, and now I can no longer reproduce the error at will; even my ACTUAL code now works without that magic PAUSE 5 in there.

Before doing that I had reduced the test code to this:

Main:
IF INTCON.2 = 1 THEN ' TMR0 interrupt flag?
LD2 = ~LD2 ' Toggle LED so we know we're alive and kickin'
INTCON.2 = 0
ENDIF

If BTN1 = 0 THEN
' Buffer[107] = $00 ' Testcase with LONGs
Buffer[117] = $00 ' Testcase without LONGs
WHILE BTN1 = 0 : WEND
ENDIF

IF BTN2 = 0 THEN
' Pause 5
' Buffer[108] = $00 ' Testcase with LONGs
Buffer[118] = $00 ' Testcase without LONGs
WHILE BTN2 = 0 :WEND
ENDIF

Goto Main

As soon as the code for BTN2 was executed it crashed, while the code for BTN1 worked fine. Enabling the PAUSE 5 made it work, and the value being written made no difference.
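Lining up the reduced test cases with the two Buffer base addresses reported earlier in the thread (0x0029 with LONGs, 0x001F without) shows what the working and hanging writes have in common. The helper below just adds the index to the base; buffer_addr and the cases table are hypothetical names, and the numbers are the ones from the posts.

```c
#include <assert.h>
#include <stdint.h>

#define BUFFER_LEN 256u  /* Buffer VAR BYTE[256] */

/* Absolute data-memory address of Buffer[index] (hypothetical helper). */
static uint16_t buffer_addr(uint16_t base, unsigned index)
{
    assert(index < BUFFER_LEN);
    return (uint16_t)(base + index);
}

/* Reduced test cases: base address, index that works, index that hangs. */
struct testcase { const char *build; uint16_t base; unsigned ok_idx, hang_idx; };
static const struct testcase cases[] = {
    { "LONGs enabled",  0x29, 107, 108 },
    { "LONGs disabled", 0x1F, 117, 118 },
};
```

Both builds give the same pair: the write that works lands on 0x094 and the one that hangs lands on 0x095. That alone doesn't explain the hang, but it is consistent with the absolute address, rather than the array index, being what matters.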

As for the question of whether bootloaders are worth it, I'd say maybe. In this particular case I'm juggling two boards but only have one PICkit, so swapping it between boards would be less than ideal. The USB bootloader is super fast but makes up for that by forcing you to manually reset the device and press Connect in the PC application before the bootloader times out.

Strange one this.

amgen
- 14th September 2020, 03:59
How about contact bounce on the button switches?

richard
- 14th September 2020, 04:58
The only thing I can think of is to verify that the flash as written by the bootloader is set the way it was intended to be.

Ioannis
- 14th September 2020, 09:04
Can you test the program on a different kind of PIC?

Also, without the bootloader, does it work OK without the magic PAUSE 5 line?

It does seem to be a RAM problem related to LONGs, but Charles may be able to help more.

Ioannis