Just a thought...

How much faster, if at all, would it be to check one pin rather than the whole port?

If it is faster, then maybe use a shift register feeding into one pin.

Probably no better.
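A rough host-side sketch of why, assuming an AVR-style MCU where inputs are read through an 8-bit port register (like PINx on an Uno). `read_port`, `read_pin`, `shift_in_byte` and `simulated_port` are all made-up names, and the register is just a variable here. Checking one pin still reads the whole port and then masks off one bit, so it is no quicker. Shifting 8 bits in through one pin takes 8 reads (plus clock pulses, not modelled), where a parallel read gets all 8 at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an input port register (e.g. PINx).
   Every call to read_port() counts as one register read. */
static unsigned port_reads = 0;
static uint8_t simulated_port = 0xA5;

static uint8_t read_port(void) {
    port_reads++;
    return simulated_port;
}

/* Checking one pin: the whole port is read anyway, then masked. */
static int read_pin(uint8_t bit) {
    assert(bit < 8);
    return (read_port() >> bit) & 1u;
}

/* Hypothetical shift register feeding one data pin (bit 0), MSB first.
   Needs one port read per bit; clock pulses are not modelled. */
static uint8_t shift_in_byte(const uint8_t serial_bits[8]) {
    uint8_t value = 0;
    for (int i = 0; i < 8; i++) {
        simulated_port = serial_bits[i]; /* next bit appears on pin 0 */
        value = (uint8_t)((value << 1) | read_pin(0));
    }
    return value;
}
```

So 8 inputs cost one read in parallel but 8 reads (and 8 clocks) through a single pin.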