the 9x2x8 lcd "command delays" per text line are probably the worst part of the delay
your snippet leaves out that exact detail


the st7920 chip has an spi i/f that's a little faster and uses fewer pins too
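
for reference, a rough sketch of how one byte is framed on the st7920 serial i/f: a sync byte (five 1 bits, then RW, RS, 0) followed by the high and low nibbles, each padded with four zero bits. the function name st7920_frame is made up for illustration and only writes (RW = 0) are covered; clocking the three bytes out is left to whatever spi routine you already have.

```c
#include <stdint.h>

/* Build the 3-byte serial frame for one write to the ST7920.
   rs = 0 sends a command, rs != 0 sends data. RW is fixed at 0 (write).
   st7920_frame is a hypothetical helper name, not from any library. */
void st7920_frame(uint8_t rs, uint8_t value, uint8_t out[3])
{
    out[0] = (uint8_t)(0xF8u | (rs ? 0x02u : 0x00u)); /* 11111, RW=0, RS, 0 */
    out[1] = (uint8_t)(value & 0xF0u);                /* high nibble, then 0000 */
    out[2] = (uint8_t)((value << 4) & 0xF0u);         /* low nibble, then 0000 */
}
```

so every command or data byte costs three bytes on the wire, which is another reason to send whole rows at a time instead of many small writes.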

there are better methods, at least use a graphics row buffer
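
a minimal sketch of the row buffer idea, assuming a 128x64 st7920 panel already switched to the extended instruction set with graphics on. all names here (framebuf, fb_set_pixel, fb_pack_row) are made up for illustration. drawing only touches ram; each row then goes out as two address commands plus 16 data bytes.

```c
#include <stdint.h>
#include <string.h>

#define LCD_ROW_BYTES 16   /* 128 pixels / 8 bits */
#define LCD_ROWS      64

/* Hypothetical RAM copy of the display, one bit per pixel. */
static uint8_t framebuf[LCD_ROWS][LCD_ROW_BYTES];

void fb_clear(void)
{
    memset(framebuf, 0, sizeof framebuf);
}

/* Set one pixel in RAM only; the MSB of each byte is the leftmost pixel. */
void fb_set_pixel(uint8_t x, uint8_t y)
{
    if (x < 128 && y < LCD_ROWS)
        framebuf[y][x >> 3] |= (uint8_t)(0x80u >> (x & 7u));
}

/* Pack one row for transfer: out[0..1] are the two GDRAM address
   commands (send with RS = 0), out[2..17] are the 16 data bytes
   (send with RS = 1). Rows 32..63 map to horizontal address 8 with
   vertical address 0..31, as on the usual 128x64 modules. */
void fb_pack_row(uint8_t y, uint8_t out[2 + LCD_ROW_BYTES])
{
    y &= 0x3Fu;                                      /* keep y inside 0..63 */
    out[0] = (uint8_t)(0x80u | (y & 0x1Fu));         /* vertical address   */
    out[1] = (uint8_t)(0x80u | ((y & 0x20u) ? 8u : 0u)); /* horizontal address */
    memcpy(&out[2], framebuf[y], LCD_ROW_BYTES);
}
```

only the rows that changed need to be packed and sent, so a text line update is a handful of 16 byte bursts rather than a delay after every character.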

even unpacking a "DA" packed font from flash is faster than bit-banged i2c by a large margin


this example uses 16 byte row data transfers for either i/f [way more efficient]
http://www.picbasic.co.uk/forum/showthread.php?t=24218
see posts #26/27