I'm working on my MAX7219 display code and I'm in optimization mode. I've managed to get from 22 ms down to 6.5 ms for redrawing a screen of 81 matrices (at a 2 MHz SPI clock), and I don't think I can squeeze more performance out of that particular section of the code right now. Next in line is scrolling the framebuffer; here's the code for it:
NumberOfDisplays  CON 36
FrameBufferSize   CON NumberOfDisplays * 10 + 20     ' 10 rows of (NumberOfDisplays + 2) bytes

FrameBuffer       VAR BYTE[FrameBufferSize]
Row               VAR BYTE
Col               VAR WORD
Offset            VAR WORD
FrameBufferByte   VAR BYTE
MAX7219_Value     VAR BYTE


ScrollLeft:
    ' Shifts the content of the framebuffer one column to the left. Does not redraw the display.
    ' Input: None
    ' Output: None
    ' This version: 3.5 ms for 36 displays @ 64 MHz

    FOR Col = 0 TO NumberOfDisplays + 1                        ' One extra byte (8 columns) on the left and right sides, outside the display area
        FOR Row = 0 TO 9                                       ' One invisible row above and below the display area to facilitate vertical scrolling
            Offset = Row * (NumberOfDisplays + 2) + Col 
            MAX7219_Value = FrameBuffer[Offset] >> 1

            ' As long as the current byte is NOT the last byte in the row, pull in bit 0 of the next byte:
            IF Col < (NumberOfDisplays + 1) THEN
              FrameBufferByte = FrameBuffer[Offset + 1]
              MAX7219_Value.7 = FrameBufferByte.0
            ENDIF

            FrameBuffer[Offset] = MAX7219_Value
        NEXT
    NEXT
RETURN
Any ideas on how to make this run faster?
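For reference, below is a C model of the routine above. It is written in C rather than PBP so it can be compiled and checked on a PC, and it is a sketch under stated assumptions, not tested on the PIC. It assumes the layout implied by the declarations (10 rows, each NumberOfDisplays + 2 bytes wide) and that bit 0 of each byte is its leftmost pixel, which is why scrolling left is a right shift. `scroll_left_original` is a direct translation. `scroll_left_rowwise` and `variants_agree` are hypothetical names for one possible restructuring: walk the buffer row by row with a pointer, so there is no `Row * (NumberOfDisplays + 2)` multiply and no per-byte IF, and handle the last byte of each row after the inner loop.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout, taken from the PBP declarations: 10 rows (8 visible plus
   one spare above and below), each NUM_DISPLAYS + 2 bytes wide (one spare
   byte on each side). Bit 0 of a byte is assumed to be its leftmost pixel. */
#define NUM_DISPLAYS 36
#define ROW_BYTES (NUM_DISPLAYS + 2)
#define NUM_ROWS 10
#define FB_SIZE (NUM_ROWS * ROW_BYTES) /* = NUM_DISPLAYS * 10 + 20 */

/* Direct translation of ScrollLeft: column-major order, one multiply
   and one bounds check per byte. */
void scroll_left_original(uint8_t *fb)
{
    for (int col = 0; col < ROW_BYTES; col++) {
        for (int row = 0; row < NUM_ROWS; row++) {
            int offset = row * ROW_BYTES + col;
            uint8_t value = (uint8_t)(fb[offset] >> 1);
            if (col < ROW_BYTES - 1) {
                /* Pull the next byte's leftmost pixel (bit 0) into bit 7. */
                value |= (uint8_t)((fb[offset + 1] & 1u) << 7);
            }
            fb[offset] = value;
        }
    }
}

/* Row-major variant (hypothetical): a single pointer walks the buffer
   sequentially. Within a row, p[1] has not been modified yet when p[0]
   is processed, so both versions read the same original bits. */
void scroll_left_rowwise(uint8_t *fb)
{
    uint8_t *p = fb;
    for (int row = 0; row < NUM_ROWS; row++) {
        for (int col = 0; col < ROW_BYTES - 1; col++, p++) {
            /* Casting to uint8_t keeps only bit 0 of p[1] in bit 7. */
            p[0] = (uint8_t)((p[0] >> 1) | (p[1] << 7));
        }
        *p = (uint8_t)(*p >> 1); /* last byte of the row: nothing to its right */
        p++;
    }
}

/* Hypothetical helper: returns 1 if both versions produce identical
   buffers from the same input. */
int variants_agree(const uint8_t *input)
{
    uint8_t a[FB_SIZE], b[FB_SIZE];
    memcpy(a, input, FB_SIZE);
    memcpy(b, input, FB_SIZE);
    scroll_left_original(a);
    scroll_left_rowwise(b);
    return memcmp(a, b, FB_SIZE) == 0;
}
```

Whether the row-major version is actually faster once compiled by PBP has not been measured. The expected gains come from dropping the per-byte multiply and IF and from addressing the array sequentially. On a PIC18, a rotate-through-carry instruction could in principle carry the bit from byte to byte, but that would need inline assembly and is not shown here.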