Without getting into asm or anything, how about this (untested):
Code:
OFFSET_INCR CON NumberOfDisplays + 2
ScrollLeft:
    ' Shifts the contents of the frame buffer one pixel (one bit) to the left. It does not redraw the display.
    ' Input: None
    ' Output: None
    ' This version: 1.30 ms for 36 displays @ 64 MHz

    ' All byte columns except the last one (Col <= NumberOfDisplays):
    ' shift each byte right one bit and pull bit 0 of its right-hand neighbour into bit 7.
    FOR Col = 0 TO NumberOfDisplays             ' Includes the off-screen byte column (8 pixels) on the left side
        Offset = Col                            ' Start at row 0 of this column
        FOR Row = 0 TO 9                        ' One invisible row top and bottom of display area to facilitate vertical scrolling
            ' Offset is stepped by OFFSET_INCR instead of recomputing Row * (NumberOfDisplays + 2) + Col
            MAX7219_Value = FrameBuffer[Offset] >> 1

            FrameBufferByte = FrameBuffer[Offset + 1]
            MAX7219_Value.7 = FrameBufferByte.0

            FrameBuffer[Offset] = MAX7219_Value
            Offset = Offset + OFFSET_INCR       ' Same column, next row
        NEXT
    NEXT

    ' Last byte column (off-screen, right side): there is no byte to its right, so just shift; bit 7 becomes 0.
    Col = NumberOfDisplays + 1
    Offset = Col                                ' Start at row 0 of this column
    FOR Row = 0 TO 9                            ' One invisible row top and bottom of display area to facilitate vertical scrolling
        MAX7219_Value = FrameBuffer[Offset] >> 1
        FrameBuffer[Offset] = MAX7219_Value
        Offset = Offset + OFFSET_INCR           ' Same column, next row
    NEXT
RETURN
That cuts it down to 1.3 ms.
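
In case anyone wants to drop the routine in as-is, here is a rough sketch of the declarations it relies on. None of these come from the original program, so treat them as assumptions: 36 displays (taken from the timing comment), a buffer of 10 rows of NumberOfDisplays + 2 bytes, and Offset declared as a WORD because the index runs past 255 with a buffer this size. The variable names match the routine; the sizes are my guesses.

```
' Assumed declarations (not from the original program), adjust to your own code
NumberOfDisplays CON 36                 ' Assumption: 36 displays, as in the timing comment
FrameBuffer      VAR BYTE[380]          ' (36 + 2) bytes per row * 10 rows = 380 bytes
Col              VAR BYTE
Row              VAR BYTE
Offset           VAR WORD               ' Must be a WORD: indexes go up to 379 and the running offset steps past that
MAX7219_Value    VAR BYTE
FrameBufferByte  VAR BYTE

' Typical call: shift the buffer, then redraw with your own display routine
GOSUB ScrollLeft
```

If Offset were a BYTE it would wrap at 255, and the lower rows of the buffer would never be shifted, so that is the one declaration worth double-checking.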