I understand your improvement, and I realize that I hadn't considered latency, but at 40 MHz it is only a few µs.

Even with a small timing error, it should get the first few bits right - but it doesn't!