There's the problem ...
Code:
TMR1L=0
TMR1H=128
@ INT_RETURN
By the time it gets all the way through the interrupt handler to that reload, Timer1 may already have counted another tick or two.
Those ticks are lost when you reload both bytes of the timer.

For 32768 crystals, the reload is really simple ...
Code:
TMR1H.7 = 1
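Setting only bit 7 adds 32768 to the count and leaves any ticks Timer1 has already counted in place, so nothing is lost.
And it's best to put it at the beginning of the handler.
Although in your case, the handler is fast enough that it won't matter where it is.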
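
For what it's worth, here's a rough sketch of a handler with the reload at the top, assuming a handler that ends with @ INT_RETURN like yours. The label ToggleLED1 and the LED1 pin alias are just placeholder names, not taken from your code.
Code:
ToggleLED1:
    TMR1H.7 = 1        ' reload first: adds 32768, keeps the ticks already counted
    TOGGLE LED1        ' placeholder for the once-per-second work
@ INT_RETURN
With a 32768 Hz crystal and a 1:1 prescaler, the next overflow then comes 32768 ticks (one second) after the previous one, no matter how long the handler takes.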