Well, things have been running happily for a couple of hours, so I thought I'd post the solution in case it helps others who get bitten by this.
By cranking up the rate at which UART packets arrive (the sender is now dumping data 10X faster) and watching what gets sent or stored, I traced the problem back to the UART routine. Even though packets were received correctly and passed the validity test, they were sometimes stored in a variable with errors. Makes no sense, right?
Well, incoming UART packets are stored in an elastic buffer. Bytes are written to an array until an "end of transmission" packet is received; then the byte count is checked, the checksum is calculated, and so on.
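The validation step described above might look roughly like the following sketch. It is written in C only for illustration (the post does not say which language the firmware uses), and the frame layout is an assumption, not the author's format: a length byte, that many payload bytes, then an 8-bit additive checksum over the length and payload. The function names `checksum8` and `frame_is_valid` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* ASSUMED frame layout (hypothetical, not from the original post):
 *   [len][payload: len bytes][checksum]
 * checksum = low 8 bits of the sum of the len byte and all payload bytes.
 * The end-of-transmission marker itself is not stored in the buffer. */

/* Simple 8-bit additive checksum over n bytes (wraps modulo 256). */
static uint8_t checksum8(const uint8_t *data, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Called once the end-of-transmission marker has arrived.
 * buf holds the count bytes received for this frame. */
static bool frame_is_valid(const uint8_t *buf, size_t count)
{
    if (count < 2) {
        return false;                 /* need at least len + checksum */
    }
    size_t len = buf[0];
    if (count != len + 2) {
        return false;                 /* byte count must match the header */
    }
    /* Checksum covers the len byte plus the payload. */
    return checksum8(buf, len + 1) == buf[len + 1];
}
```

Note that a check like this only inspects what ended up in the buffer; it cannot tell whether the writes that filled the buffer also trampled other variables.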
When the PIC is off dealing with USB, or any number of other things, incoming bytes can keep piling up until they run past the array's defined size.
When you write to an array beyond its defined size, strange things happen. Nothing checks the index at run time, so those writes silently land on whatever variables happen to sit in RAM next to the array. It's almost as if the array provides a window on memory, and writing outside the range moves the window. No errors, no crashes, just bad values at the beginning of the array, and a few fewer hairs on the head of the coder. Doubling the size of the array seems to have resolved the issue.
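A larger array makes the overflow less likely under this load, but an explicit bounds check on every store guarantees the write index never runs off the end. Below is a minimal C sketch of that idea, assuming a receive interrupt that calls a store routine once per byte; `RX_BUF_SIZE`, `rx_put`, `rx_reset`, and `rx_overflow` are hypothetical names, not from the original code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical size. Must be <= 255 so the 8-bit rx_count can actually
 * reach it; with 256 the counter would wrap to 0 and overwrite the start. */
#define RX_BUF_SIZE 64u

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_count = 0;        /* bytes stored so far */
static volatile bool    rx_overflow = false; /* latched if a byte was dropped */

/* Store one received byte (e.g. from the UART receive interrupt).
 * Returns false and latches rx_overflow instead of writing past the end. */
static bool rx_put(uint8_t byte)
{
    if (rx_count >= RX_BUF_SIZE) {
        rx_overflow = true;   /* drop the byte; the frame will be rejected */
        return false;
    }
    rx_buf[rx_count++] = byte;
    return true;
}

/* Clear the buffer after a frame has been processed or rejected. */
static void rx_reset(void)
{
    rx_count = 0;
    rx_overflow = false;
}
```

When the end-of-transmission marker arrives, the main loop would discard the frame if `rx_overflow` is set rather than trusting its contents. On a real PIC, the main loop should also briefly disable the receive interrupt (or otherwise synchronize) while it reads and resets these shared variables.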
Now to clean up the mess I made of the code while chasing this one... On the plus side, I did uncover a couple of other bugs I might have missed without this snipe hunt.