If you use GOTO DONE, it limits the use of your include to 14-bit devices.
In the 18F library, that label is called DUNN instead.

There's no need to use DONE, since the FSR hasn't been changed, and RETURN resets the BANK to 0 anyhow.

The bsf STATUS, C is only required if the end user is using the SEROUT2 command with a Flow Control Pin.
From what I've seen, nobody uses that function. Or if they do understand how to use Flow Control, they probably aren't using LCD_AnyPin.

However, to cover ALL the bases ... you are correct! bsf STATUS, C should be there. But it needs an @ sign in front of it, since it's an assembly instruction sitting in PBP code.
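
For illustration, the corrected line would presumably look something like this. The spacing and the comment are only a sketch; the instruction itself is the one discussed above.

```
@  bsf  STATUS, C     ; set the Carry flag (only matters for SEROUT2 with a Flow Control Pin)
```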

As for removing the delays ... I don't think that's a good idea.
I previously measured 1.8 ms per byte.
Most good LCDs take 1.6 ms for a Clear Screen, so removing the delays would be fine (with a good LCD).
But there are LCDs that take longer; 2.4 ms is not uncommon.

You could remove the LCD_DATAUS part, since the data delay will always be under 1.8 ms.

But I think the ability to add delays for LCD_COMMANDUS should remain in the program.
You can easily set LCD_COMMANDUS to 0 in the main program for normal displays.
For slower LCDs, just set it to the difference, say 2.4 ms - 1.8 ms = 0.6 ms, or 600 µs.
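
As a rough sketch of that arithmetic, assuming LCD_COMMANDUS is declared as a CON constant in the main program ahead of the include (the exact declaration style is an assumption here, so check it against your copy of the include), and using the 1.8 ms and 2.4 ms figures from above:

```
' Normal display: the 1.8 ms spent per byte already covers the command time
' LCD_COMMANDUS  CON 0

' Slower display that needs 2.4 ms per command:
'   2400 us needed - 1800 us already spent per byte = 600 us extra
LCD_COMMANDUS  CON 600
```

Use whichever line matches your display; the 2.4 ms value is just the example figure, so take the real number from your LCD's datasheet.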