Hi! I currently wear a few hats as an Applied Scientist at Microsoft:
As an ML engineer, I’ve shipped world-class hybrid speech recognition models to 2 Azure APIs, CTS & Speech-to-text, which have powered products such as Word Transcribe, Twitter Spaces & PowerPoint Coach. Naturally, this also meant fun times wrangling our big-data & MLOps pipelines; a migration out of CNTK (legacy) involving lots of gnarly ML debugging; and contributing to a shiny new PyTorch hybrid ASR toolkit.
As a scientist, I closely track relevant research, adapting it as necessary (and, on occasion, trying to improve on it). After an initial investigation into accent robustness in '19, I’m currently diving deep into self-supervised representation learning, which I expect will soon transform all speech & audio-related tasks.
Recently, I’ve also spent time working across teams (multi-microphone processing, speech separation, ASR, diarization, NLP) on an ambitious incubation project targeting both in-person & remote conversations, arguably the hardest ASR/NLP domain due to its spontaneous structure + need for customization. For a complex product like this, it is paramount to prioritize & measure the right things, which I help with via error analysis.
I entered the code world via Python in 2015, ~6 yrs ago (MATLAB doesn’t really count, does it?). In a previous life as an undergraduate, I majored in Chemical Engineering and saw hydrofracking & nicotine manufacture up close. As part of a minor in Control Systems, I was exposed to some signal processing, which has now kind of come full circle given my long-time fascination with audio :)