I realize you have reached your target, but here are some tips to help in the future.

The shiftin/shiftout commands are dreadfully, painfully, pathetically slow. And also very slow. If you use an onboard A/D, or at the very least use the hardware SPI module to talk to an external A/D, you will gain a huge amount of throughput: on the order of 10x faster, if the off-chip A/D can handle it.

Another trick is to send the data as raw binary values and format it on the receiving end. Since each record is fixed width, the receiver can still tell where every value starts. That will cut what you send by at least 30% in your case, or 20% if you want to leave an alignment byte.
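
To make the raw-versus-formatted idea concrete, here is a rough sketch in C. Everything in it is an assumption for illustration, not your actual packet: a 10-bit sample, a text format of four digits plus CR/LF, and the function names pack_ascii, pack_raw and unpack_raw are all made up. The savings in this toy layout will not match the 30%/20% figures above, which depend on your real record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical text format: one sample as 4 ASCII digits plus CR/LF,
 * e.g. "0512\r\n". Returns the number of bytes to send, or 0 if the
 * buffer is too small. */
size_t pack_ascii(uint16_t sample, char *out, size_t cap)
{
    assert(sample <= 1023);               /* assumed 10-bit A/D result */
    int n = snprintf(out, cap, "%04u\r\n", (unsigned)sample);
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}

/* Raw format: the same sample as two bytes, high byte first.
 * Always returns 2. */
size_t pack_raw(uint16_t sample, uint8_t *out)
{
    out[0] = (uint8_t)(sample >> 8);
    out[1] = (uint8_t)(sample & 0xFF);
    return 2;
}

/* Receiving end: rebuild the value from the two raw bytes. Any
 * formatting for display can happen here, off the microcontroller. */
uint16_t unpack_raw(const uint8_t *in)
{
    return (uint16_t)(((uint16_t)in[0] << 8) | in[1]);
}
```

In this made-up layout each sample costs 6 bytes as text and 2 bytes raw. The catch with raw data is that the receiver has no delimiter to resync on, which is where an alignment byte earns its keep.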