Use a 16-bit counter in assembly to count to 0x0FFF.
In that loop, use the instruction:

btg LATA, 3

You can't get much quicker than that.

I haven't counted exactly, but the loop looks to be under 14 instruction cycles.
At 40 MHz, one instruction cycle is 100 ns (four clocks each), so 14 cycles is
1.4 µs per pass, or about 714 kHz.