If it is very time sensitive, you might consider Proton. It does the sqrt in about 380 cycles. It is by far the most efficient sqrt routine I've found, and I tested about half a dozen last year, most of them in asm. I don't know of any other compilers that use the routine, but it sure is tight. It's not original to Proton, though: I saw the same asm code in a couple of places. You'll have to search for it yourself, because I don't have my notes on it.
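For anyone who wants a starting point while searching, here is a minimal sketch of the common bit-by-bit integer square root in C. This is an assumption on my part about the general family of tight routines, not the code Proton actually uses, and the name `isqrt32` is made up for the example. It uses only shifts, adds, and compares, which is why this style tends to port well to hand-written asm.

```c
#include <stdint.h>

/* Hypothetical helper, not Proton's routine: returns floor(sqrt(n))
   using the classic bit-by-bit (digit-by-digit) method. */
uint16_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = (uint32_t)1 << 30;  /* largest power of four in 32 bits */

    /* Start at the largest power of four that does not exceed n. */
    while (bit > n)
        bit >>= 2;

    /* Decide one result bit per pass, from the top down. */
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;            /* this bit belongs in the root */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;                 /* this bit stays clear */
        }
        bit >>= 2;
    }
    return (uint16_t)root;
}
```

The loop runs at most 16 passes for a 32-bit input. The cycle count on any given chip depends on the compiler and the word size, so don't expect this sketch to match the 380-cycle figure.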