Debugin produces less code and runs faster than serin or serin2, so you can use a slower crystal.
I have used it for 9600 baud comms with a 4MHz crystal many times without problems, so I find it strange that I have the problem with this particular app.
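
For a rough sense of why 9600 baud is normally comfortable at 4MHz, here is a small back-of-envelope sketch. It assumes a PIC core that takes 4 oscillator clocks per instruction cycle; the function name and constant are only illustrative, and the sketch says nothing about how many cycles debugin itself needs per bit.

```python
# Rough timing budget for software (bit-banged) serial on a PIC.
# Assumption: the core needs 4 oscillator clocks per instruction cycle.
CLOCKS_PER_INSTRUCTION = 4


def cycles_per_bit(crystal_hz, baud):
    """Return the number of instruction cycles that fit in one serial bit."""
    instruction_rate = crystal_hz / CLOCKS_PER_INSTRUCTION
    return instruction_rate / baud


budget = cycles_per_bit(4_000_000, 9600)
print(f"{budget:.1f} instruction cycles per bit")
```

Under that assumption there are about 104 instruction cycles in each bit, so anything else the app does between or during bits (interrupts, long loops between reads) has to fit inside that window.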

I'm afraid I wouldn't know how to go about writing my own routines.