I'd try mode 5 for SHIFTOUT and mode 6 for SHIFTIN.
I would also add some code to make sure #CS is held high for a while during startup. Then I would put a scope or logic analyzer on the lines to verify that what you think is happening is actually happening at the hardware level.
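
Something along these lines is what I have in mind for the startup part. It is only a rough sketch and assumes you are using PICBASIC PRO, since the numeric modes match its SHIFTOUT/SHIFTIN tables. The pin assignments, the 100 ms delay and the $00 command byte are all placeholders I made up, so substitute whatever your circuit and the device's datasheet call for.

```basic
' Hypothetical pin assignments: change these to match your wiring.
CS_PIN   VAR PORTB.0    ' chip select (#CS, active low)
CLK_PIN  VAR PORTB.1    ' serial clock
DOUT_PIN VAR PORTB.2    ' data from the PIC to the device
DIN_PIN  VAR PORTB.3    ' data from the device to the PIC
result   VAR BYTE

HIGH CS_PIN             ' deselect the device first
HIGH CLK_PIN            ' modes 5 and 6 expect the clock to idle high
PAUSE 100               ' arbitrary settling time, in milliseconds

LOW CS_PIN              ' select the device
SHIFTOUT DOUT_PIN, CLK_PIN, 5, [$00]    ' $00 is a placeholder command byte
SHIFTIN DIN_PIN, CLK_PIN, 6, [result]   ' read one byte back
HIGH CS_PIN             ' deselect again
```

With the scope or analyzer attached, you should see #CS and the clock both sitting high before the first transfer. If they aren't, sort that out before looking at the data bits.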