Last month’s coverage of the LLM space was just the tip of the iceberg. Since then, it’s been a WILD month of new announcements (I’ve included a few older but relevant items too). Let’s go!
OpenAI launched an open-source speech-to-text model called Whisper that can transcribe 75+ languages out of the box with high accuracy. Some example use cases we’ve seen include subtitle generation and a Whisper/Stable Diffusion combo.
We saw two generative audio AI models: AudioGen, which can produce audio from text (e.g., birds chirping, an engine humming), and AudioLM, which can continue a speech or piano clip from a 3-second input, leveraging language models to keep the continuation coherent.
We’ve seen two brain-to-text models that successfully translated brain waves into corresponding text! One is an announcement from Meta AI, which was a bit more vague, and the other is this paper, which shows that the model could decode what a person was listening to, thinking about, or watching.
On the AI characters front, recent work includes using transformers to build agents capable of mastering games; text-to-motion, where text prompts are converted into a rigged 3D action; teaching AI bodies to play soccer; and Character.ai, a chatbot builder launched by an ex-Google researcher who helped invent Google’s LLMs.
Going back to text-to-image, there have been awesome continued experiments: this model you can train on your face, this model lets you tune a model’s style with 3-5 images, this model incorporates language models to guide the prompt for better results, Lexica added reverse image search, and this storybook creator generates an illustrated story from a simple prompt.
If that’s not enough, Adept Labs announced an Action Transformer that can surf the web and make purchases, Science.io launched its clinical NLP API, Replit’s AI mode is wild (and they keep adding features, like this voice-to-command feature), and apparently you can train a GPT-3-quality model for ~$450k (code is available).