PBP is fast for most operations, and if you need more speed in certain circumstances, you can just code in ASM.

But I don't think you will have any luck decoding MP3 in software on any 8-bit chip. When you are dealing with data that is 16 bits or wider, an 8-bit processor has to work through each value one byte at a time and carry between the bytes, so it ends up less than half as fast as a 16-bit device running at the same clock speed.
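To give a rough idea of where the extra work comes from, here is a small C sketch of a 16-bit add done the way an 8-bit ALU has to do it: low bytes first, then high bytes plus the carry. The function name add16_via_bytes is made up for illustration, and this only models the steps; it is not what PBP or any particular compiler actually emits.

```c
#include <stdint.h>

/* Illustrative only: add two 16-bit values using 8-bit steps.
   add16_via_bytes is a hypothetical name, not a PBP or PIC library call. */
uint16_t add16_via_bytes(uint16_t a, uint16_t b)
{
    /* Split each operand into the two bytes an 8-bit core works on. */
    uint8_t a_lo = (uint8_t)(a & 0xFF);
    uint8_t a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFF);
    uint8_t b_hi = (uint8_t)(b >> 8);

    /* Step 1: add the low bytes; the result wraps at 8 bits. */
    uint8_t sum_lo = (uint8_t)(a_lo + b_lo);

    /* Step 2: if the low sum wrapped around, there was a carry out. */
    uint8_t carry = (sum_lo < a_lo) ? 1 : 0;

    /* Step 3: add the high bytes plus that carry. */
    uint8_t sum_hi = (uint8_t)(a_hi + b_hi + carry);

    /* Reassemble the 16-bit result from the two bytes. */
    return (uint16_t)(((uint16_t)sum_hi << 8) | sum_lo);
}
```

A 16-bit core does the same add in one operation. On an 8-bit part every 16-bit value also has to be loaded and stored as two separate bytes, and multiplies are worse still, which is the kind of math an MP3 decoder does constantly.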