Like Henrik said, at 4 MHz the DIG command takes longer than the interrupt period.
But it's a little different than you might think.

The digit number that you request makes a huge difference in the execution time.

At 4 MHz, DIG 0 takes ~322 us, well within the 1 ms period.
But DIG 1 has to do another divide, so it takes ~638 us.
DIG 2 takes ~950 us.
And finally, DIG 3 takes ~1.28 ms, which is too long to fit in a 1 ms interrupt.
After that, the timer reload leaves it overflowed and it has to count all the way back around to 65536 before the next interrupt happens. That is why it seemed like it was doing 4 interrupts at a time.



These times were measured by wrapping the DIG command with HIGH/LOW statements and reading the pulse width on a scope.
Code:
HIGH PORTA.3
    n = Value DIG i    ' this statement causes missed interrupts after four are caught
LOW PORTA.3
After bumping up the oscillator to 16 MHz internal, the commands take a quarter of the time (DIG 3 drops to roughly 320 us) and everything works within the 1 ms period.

But since the digit only changes when the variable in the mainloop changes, there's no need to calculate it every time in the interrupt handler.
Setting the segment patterns in the mainloop will reduce the interrupt handler's time significantly. Then the interrupt handler just puts the pattern on the pins.
You could probably run at ~20 kHz, not that you would want to. 1 kHz is fine and there's lots of time left over for the mainloop to use.
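
Here's a rough sketch of that idea, not your actual program. The variable names (Value, OldValue, Pattern, DigitSel, Digit, Seg, i), the ShowDigit label, the use of PORTB for the segments and the common-cathode segment patterns are all assumptions on my part, so adjust them to match your hardware. The interrupt setup and the digit-select pins are left out.
Code:
' Sketch only. All names and the PORTB/segment wiring are assumptions.
Value    VAR WORD       ' number to display, changed elsewhere in the mainloop
OldValue VAR WORD       ' last value the patterns were built for
Pattern  VAR BYTE[4]    ' precomputed segment pattern for each digit
DigitSel VAR BYTE       ' digit the handler will show next
Digit    VAR BYTE
Seg      VAR BYTE
i        VAR BYTE

OldValue = Value + 1    ' force the patterns to be built on the first pass
DigitSel = 0

Main:
    IF Value <> OldValue THEN       ' only do the slow DIG work when Value changes
        OldValue = Value
        FOR i = 0 TO 3
            Digit = Value DIG i
            ' patterns for 0-9, common cathode, segment a on bit 0 (assumed wiring)
            LOOKUP Digit, [$3F,$06,$5B,$4F,$66,$6D,$7D,$07,$7F,$6F], Seg
            Pattern[i] = Seg
        NEXT i
    ENDIF
    GOTO Main

' Body of the 1 ms timer interrupt handler.
' Return from it however your interrupt setup requires.
ShowDigit:
    PORTB = Pattern[DigitSel]       ' no DIG, no divide, just an array read
    DigitSel = DigitSel + 1
    IF DigitSel > 3 THEN DigitSel = 0

All four DIG operations still cost about the same as before, but they now run in the mainloop where they can be interrupted, and only when Value actually changes.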

So that brings us back to how it could work on PBP 2.60 with a 16F, and not on PBP3 with an enhanced core.

Well, I don't think that's possible.
I ran the same program on a 16F886 at 4 MHz, and DIG took even longer:
DIG 0 = 385us
DIG 1 = 760us
DIG 2 = 1.15ms
DIG 3 = 1.54ms

Maybe you were running at a different frequency and didn't remember it?