In raw instruction rate terms, a PIC running at 40 MHz executes 10 million instructions per second, because each instruction cycle takes 4 oscillator clocks. That assumes there are no jumps, returns, or anything else that takes 2 instruction cycles to execute.
Some other chips, like the Atmel AVRs or the SX28/SX52 (which are Scenix/Ubicom parts, not Atmel), can execute most instructions in a single clock cycle, so running at the same oscillator frequency as a PIC they will appear to be up to 4 times faster.
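To put numbers on that, here is a minimal sketch of the arithmetic in C. The function name `instr_per_sec` is made up for illustration, and it deliberately ignores 2-cycle instructions like jumps and returns, so treat the result as a best case rather than a real benchmark.

```c
#include <stdint.h>

/* Best-case instructions per second for a core that needs
   `clocks_per_instr` oscillator clocks for each instruction.
   Hypothetical helper: ignores jumps, returns and other
   multi-cycle instructions. */
uint32_t instr_per_sec(uint32_t osc_hz, uint32_t clocks_per_instr)
{
    return osc_hz / clocks_per_instr;
}
```

With a 40 MHz oscillator, `instr_per_sec(40000000, 4)` models the PIC and `instr_per_sec(40000000, 1)` models a one-clock-per-instruction core.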
But there's more to it than that. If your particular processor doesn't have the instructions you need, and you have to emulate a certain instruction by using multiple other instructions, then your effective rate of execution can drop dramatically.
For instance, the PIC18Fxxx series has a built-in 8x8 single-cycle multiply instruction. If the Atmel chip didn't have that instruction, it would have to emulate it with a number of other instructions in a loop, and for multiply-heavy code that would make the PIC look like the superior processor.
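As a rough idea of what that emulation looks like, here is a shift-and-add multiply sketched in C. The name `mul8x8_soft` is hypothetical, and a real compiler or hand-written assembly routine may well do it differently; the point is only that 8 loop passes of test, add, and shift replace what the hardware multiplier does in one instruction.

```c
#include <stdint.h>

/* Software 8x8 -> 16 bit multiply using shift-and-add.
   Hypothetical illustration of what a chip without a hardware
   multiplier has to do: one loop pass per bit of `b`. */
uint16_t mul8x8_soft(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;          /* a shifted left by the current bit position */

    for (uint8_t bit = 0; bit < 8; bit++) {
        if (b & 1u) {
            product += addend;    /* this bit of b is set: add the shifted a */
        }
        addend <<= 1;             /* next bit is worth twice as much */
        b >>= 1;                  /* move on to the next bit of b */
    }
    return product;
}
```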
Same thing with a divide instruction. The PIC18Fxxx series doesn't have a divide instruction built in, so you have to do the division manually in a loop, which eats a lot of time and cycles. Maybe the Atmel has a built-in divide instruction. In that case, the Atmel would appear to be the superior chip.
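The manual division loop could look something like the following C sketch of 8-bit shift-and-subtract (restoring) division. The name `div8_soft` is hypothetical, this is not taken from any particular compiler's library, and it assumes the caller never passes a divisor of 0.

```c
#include <stdint.h>

/* Software 8-bit unsigned divide using shift-and-subtract.
   Hypothetical illustration; assumes divisor != 0.
   Returns the quotient and stores the remainder in *remainder. */
uint8_t div8_soft(uint8_t dividend, uint8_t divisor, uint8_t *remainder)
{
    uint8_t quotient = 0;
    uint16_t rem = 0;

    for (int bit = 7; bit >= 0; bit--) {
        /* bring down the next dividend bit, most significant first */
        rem = (uint16_t)((rem << 1) | ((dividend >> bit) & 1u));
        if (rem >= divisor) {
            rem -= divisor;                    /* divisor fits: subtract it */
            quotient |= (uint8_t)(1u << bit);  /* and set this quotient bit */
        }
    }
    *remainder = (uint8_t)rem;
    return quotient;
}
```

Every one of the 8 passes costs a shift, a compare, and possibly a subtract, so on a chip without a divide instruction one division turns into dozens of instructions.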