That's all stuff that can be done later if things work out.
I think right now, to prove that it does "optimize" the divide routine, you need a baseline.
How long does it take PBPL to do a single signed 32-bit divide?
And how long does it take yours?
Just a few tests at various bit sizes will tell if there's any benefit or not.
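
Here's a rough sketch of the kind of side-by-side test I mean, written in plain C so it can be tried on a PC first. Everything in it is an assumption for illustration: `baseline_div` stands in for the stock divide, `candidate_div` is just a generic shift-and-subtract routine standing in for yours, and `time_divides` and `print_comparison` are made-up helper names. PC timings won't match the PIC, so treat it as the shape of the test rather than real numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef int32_t (*div_fn)(int32_t, int32_t);

/* Baseline: the compiler's own signed 32-bit divide (stand-in for the stock routine). */
static int32_t baseline_div(int32_t n, int32_t d) {
    return n / d;
}

/* Hypothetical candidate: shift-and-subtract division on magnitudes, sign fixed
   up at the end. Truncates toward zero like C's '/'. Caller must avoid d == 0
   and INT32_MIN / -1. */
static int32_t candidate_div(int32_t n, int32_t d) {
    assert(d != 0);
    uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((un >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= ud) {
            r -= ud;
            q |= (uint32_t)1 << i;
        }
    }
    /* Assumes the usual two's-complement conversion back to int32_t. */
    return ((n < 0) != (d < 0)) ? (int32_t)(0u - q) : (int32_t)q;
}

/* Average seconds per divide over 'reps' calls. The volatile operands and
   result keep the compiler from hoisting or deleting the call. */
static double time_divides(div_fn f, int32_t n, int32_t d, long reps) {
    volatile int32_t vn = n, vd = d, sink = 0;
    clock_t t0 = clock();
    for (long i = 0; i < reps; i++) {
        sink = f(vn, vd);
    }
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)reps;
}

/* Time both routines with negative dividends of increasing bit size. */
static void print_comparison(long reps) {
    const int32_t dividends[] = { -0xFF, -0xFFFF, -0xFFFFFF, -0x7FFFFFFF };
    const int bits[] = { 8, 16, 24, 31 };
    for (int i = 0; i < 4; i++) {
        double tb = time_divides(baseline_div, dividends[i], 7, reps);
        double tc = time_divides(candidate_div, dividends[i], 7, reps);
        printf("%2d-bit: baseline %.3e s, candidate %.3e s\n", bits[i], tb, tc);
    }
}
```

Call `print_comparison(1000000L)` from your own `main` to get one line per bit size. On the chip itself the same idea applies, but you'd time the loop with a hardware timer or a pin toggle and a scope instead of `clock()`.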