I've finally got around to doing some proper tests on the SPI buffers.

I can confirm that de-asserting SS does not clear the slave's transmit buffer. This is almost certainly why my original code wasn't working properly.

Disabling and re-enabling the SPI module does clear the transmit buffer. I didn't need any NOPs between disabling and re-enabling, so the whole thing takes only 2 instruction cycles. It's easy enough to program the master to pause for a few NOPs after de-asserting SS, so that it doesn't try to send again while the slave is still resetting.
This method of resetting the buffer does feel a bit wrong, but it's the only thing I've found that works.
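For reference, the reset itself is just a disable followed by an enable. This is only a rough sketch using the same placeholder names as the interrupt code further down (SPI1EN_reg for the register holding the enable bit, SPI1EN for the bit number). Where you run it, for example once SS has gone high, depends on your own code.
Code:
BCLR	SPI1EN_reg, #SPI1EN	; disable the SPI module, which clears the transmit buffer
BSET	SPI1EN_reg, #SPI1EN	; re-enable it straight away, no NOPs in between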

I've also tried another method. This involves keeping the SPI module disabled and only enabling it once SS is asserted. I created a change notification interrupt handler to do this automatically.
Code:
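.global __CNInterrupt
__CNInterrupt:
; SS is active low: enable the SPI module while SS is low, disable it while SS is high
BTSS	ss_reg, #ss		; skip the enable if SS is high (de-asserted)
BSET	SPI1EN_reg, #SPI1EN	; SS low: enable the SPI module
BTSC	ss_reg, #ss		; skip the disable if SS is low (asserted)
BCLR	SPI1EN_reg, #SPI1EN	; SS high: disable the SPI module
BCLR	IFS1, #CNIF		; clear the CN interrupt flag so the handler is not re-entered (CNIF is in IFS1 on most PIC24/dsPIC parts, check yours)
RETFIE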
When the SS line is asserted (active low) the module is enabled, and when it is de-asserted (high) the module is disabled again. Keep in mind that the master must pause after asserting SS: the enable/disable code is 4 instructions long, and the PIC also takes a few cycles to jump into the interrupt. Note that the BTSS/BSET pair comes first. This means the module is enabled 2 cycles sooner than it is disabled, since enabling quickly is likely to be more important.
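On the master side, the pause could look something like the sketch below. It assumes the master is also a PIC24/dsPIC running at the same instruction clock as the slave. ss_lat and ss_out are made-up names for the master's SS output latch and bit, and the count of 16 NOPs is a guess that should be tuned to the slave's actual interrupt latency.
Code:
BCLR	ss_lat, #ss_out	; assert SS (active low)
REPEAT	#15		; run the next instruction 16 times (literal + 1)
NOP			; wait for the slave to enter its interrupt and enable the module
MOV	W0, SPI1BUF	; W0 is assumed to hold the data; writing SPI1BUF starts the transfer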

If multiple SPI slaves are used then the slave's data output pin (SDO) must be configured as an input in its TRIS register. This is because disabling the module turns the pin back into a standard I/O pin. If it's configured as an output, it will hold the line in one state, which prevents any other slave from sending to the master.
Note: Some PICs may require the TRIS bit to be set to output in order for the SPI module to work. Thankfully the ones that I use bypass the TRIS when the module is enabled.
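As a sketch, with sdo_tris and sdo as made-up names for the TRIS register and bit of the slave's data output pin, this is a one-off line in the slave's setup code:
Code:
BSET	sdo_tris, #sdo	; TRIS bit = 1 makes the pin an input, so it is high impedance while the module is disabled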

I still don't like the fact that the module is being disabled at all, but this way does feel a bit nicer, and I guess there will probably be a small power saving while the module is off. This is the method I'll be using in my SPI slaves until someone can point me in a better direction.

Hopefully other people will find this post useful if they come across the same problem.