Are you using the s/w bit-bang command or the hardware bus commands? If it's the s/w command, the speed you're seeing is about as fast as it can go.
Have you tried reducing the values of your pullup resistors? Dropping down to 5k or less can sometimes shorten the transition (rise) times on the lines, and it is especially helpful if the signals are "on the fence" as far as timing goes.
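
To put rough numbers on the pullup suggestion, here is a minimal sketch that estimates the rise time of a line pulled high through a resistor, treating it as a simple RC charge. The 100 pF bus capacitance and the 30%/70% thresholds are my assumptions, not measurements from your setup, and `rise_time_s` is just an illustrative helper name. Measure or estimate your own capacitance before trusting the figures.

```python
import math

def rise_time_s(r_pullup_ohms, c_bus_farads, lo=0.3, hi=0.7):
    """Time for an RC-charged line to rise from lo*Vdd to hi*Vdd.

    Derived from V(t) = Vdd * (1 - exp(-t / (R*C))).
    The 30%/70% thresholds are assumed defaults.
    """
    return r_pullup_ohms * c_bus_farads * math.log((1 - lo) / (1 - hi))

C_BUS = 100e-12  # assumed bus capacitance (wiring + pins), in farads

# Compare a few common pullup values; smaller R gives a faster rising edge.
for r in (10_000, 4_700, 2_200):
    print(f"{r} ohm: {rise_time_s(r, C_BUS) * 1e9:.0f} ns")
```

The rise time scales linearly with the resistance, so halving the pullup roughly halves the rise time. The trade-off is that a smaller pullup means more current the driver has to sink when it pulls the line low, so don't go lower than your devices can handle.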