With the example where it’s loudest in the video (quite an obvious one),
I think I would have ’80s-style speech recognition down.
That variation of the code gave very repeatable results for words and phrases.