Not wishing to be rude, but how can 255 iterations be quicker than NCD or the in-line assembler equivalent?

George