It is indeed a complex problem. Whichever option is chosen, there will be some overhead, lower or higher, I guess.

I think I will end up using asm code to rotate the variables.

Ioannis