Hi,
I'm definitely no expert on the low level stuff but I do think there's some misunderstaning / confusion going on here.

When you execute a GOTO Label it's NOT like the program starts from the top and then "searches" thru the code until it finds the Label to which you want to GOTO.

The physical adress of the piece of code starting "at" Label is known when the program is compiled so the GOTO Label is simply replaced with GOTO $xxxx where xxxx is the physical adress in program memory where the particular piece of code is located. A GOTO always takes TWO instruction cycles, no more no less.

But, some PICs have its program memory divided into pages and when a GOTO is executed it's important that it GOTO's the adress in the correct page. The compiler handles this for you when needed (that's the warning you see sometimes, Code crosses page boundries or something like that). Making sure you're in the correct page does take a couple of instructions as well so all in all the GOTO CAN take more than 2 instruction cycles but not much. So it CAN make a (small) difference rearanging the routines so it doesn't need to jump across page boundires when executing a GOTO but most of the time you simply wouldn't care.

In this particular case the GOTO Main will take 2 cycles. Only if the Main routine would end up starting in one page and ending in the next would it take more than 2 cycles.

Interrupt service routines are one thing where it can matter. Since the hardware interrupt vector is at adress $0004 (IIRC) it makes sense (if you're looking for the absolute best performance) to place your interrupt service routine at the top of the program (and jump over it of course). If you don't do this it MIGHT end up in another memory page and you'll loose a few cycles every time it needs to set the correct page bits before jumping to the ISR.

Again, I'm no expert so I might very well have some details wrong - I'm happy to be corrected.

/Henrik.