I have it running on the bench, and it works perfectly fine with an optical encoder (Scancon 2RMHF-500, 500 lines / 2000 edges per rev) - IRL :-)
Like I said previously, I do not expect it to work reliably with a mechanical encoder unless it is properly debounced in hardware. Why your ASM version works with the same encoder, I do not know :-(

You have a DEBUG statement within the loop - that will obviously mess up the timing, but I know you know that.
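
To put rough numbers on the timing point, here is a back-of-envelope sketch in Python. The 2000 edges per rev comes from the encoder above. The two loop times are made-up placeholders, not measurements from either of our setups, so plug in your own figures. It assumes a polled loop that can catch at most one edge per pass.

```python
def max_rpm_without_missed_edges(edges_per_rev, loop_time_s):
    """Highest shaft speed (RPM) at which a polled loop still sees
    at most one encoder edge per pass, i.e. misses no edges."""
    edges_per_second = 1.0 / loop_time_s
    return 60.0 * edges_per_second / edges_per_rev


EDGES_PER_REV = 2000   # 500 lines x 4 edges, from the encoder above
LOOP_TIME_S = 50e-6    # ASSUMED time for one pass of the bare loop
DEBUG_TIME_S = 5e-3    # ASSUMED extra time added by the DEBUG statement

rpm_bare = max_rpm_without_missed_edges(EDGES_PER_REV, LOOP_TIME_S)
rpm_debug = max_rpm_without_missed_edges(EDGES_PER_REV, LOOP_TIME_S + DEBUG_TIME_S)

print(f"bare loop:  about {rpm_bare:.0f} RPM before edges get missed")
print(f"with DEBUG: about {rpm_debug:.1f} RPM before edges get missed")
```

With those placeholder figures the usable speed drops by roughly two orders of magnitude once DEBUG is in the loop, which is why I would take it out before drawing any conclusions.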