Thanks everyone.
Let me answer step by step.
There are no significant delays with the ST7920 itself, and SPI does not need to be used for it. I'm looking for the best compatibility, and with LCDOUT it works like a charm. When reading data from the built-in EEPROM of the PIC16F886, I can update the whole 144x32 pixel screen at 20 fps. That's quite enough.
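To put the 20 fps figure in perspective, here is a rough calculation of the data rate it implies. This is only a sketch: it assumes the display is plain monochrome at 1 bit per pixel, and the function names are made up for illustration.

```python
def frame_bytes(width_px, height_px, bits_per_pixel=1):
    """Bytes needed for one full frame (assumes 1 bit per pixel by default)."""
    bits = width_px * height_px * bits_per_pixel
    return (bits + 7) // 8  # round up to whole bytes


def bytes_per_second(width_px, height_px, fps, bits_per_pixel=1):
    """Raw pixel data that must be moved per second at a given frame rate."""
    return frame_bytes(width_px, height_px, bits_per_pixel) * fps


if __name__ == "__main__":
    print(frame_bytes(144, 32))           # bytes in one 144x32 frame
    print(bytes_per_second(144, 32, 20))  # bytes per second at 20 fps
```

So a full 144x32 frame is 576 bytes, and 20 fps means about 11.5 KB of pixel data per second, which is the rate any external memory would have to sustain.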
Regarding the SPI flash, I tried to use the 25xx series, but was not able to communicate with it from PBP. And by the way, as far as I know, there is no direct support for the hardware MSSP in PBP, so it will be as slow as I2C, or I will have to use large ASM inserts or something similar. That would again limit the usable hardware, for example by requiring the PIC18 series, while my code above can be used even on a 16F628A.
Maybe I should use an old-style parallel EEPROM, like the ones used on old motherboards?