Find concepts in GPT models, real-time speech to text in the browser, H100s are coming

Posted June 14, 2024 by @deepfates

Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI.

Editor’s note

The big open source AI news this week is the release of Stable Diffusion 3 Medium. People are already doing cool things with it, but public reaction has been mixed.

On a personal note, I got banned from X Dot Com. Apparently it is against the rules to change your profile picture to the old Twitter logo and announce “WE ARE SO BACK”.

Anyway, here’s some things that caught my eye this week. Find me on Bluesky, I guess.

— deepfates

Stable Diffusion 3 Medium

The long-awaited image generation model is released in the 2B size (no word yet about the larger 8B version).

Users say the model is much better at creating legible text, but that it has problems with anatomy and composition.

Model weights are available under a non-commercial license.

try on replicate

Cool tools

Find concepts in GPT models

OpenAI does dictionary learning on their own models to extract and interpret patterns that may correspond to specific concepts. It's a similar technique to the one Anthropic used to create Golden Gate Claude.

They release a research paper and feature explorer, but also code that will steer the (practically retro at this point) GPT-2-small model.
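
If "dictionary learning" and "steering" sound abstract, here is a toy sketch of the general idea in plain Python. It is not OpenAI's code: the tiny hand-written dictionary, the tied encoder/decoder weights, and the function names (`top_k_encode`, `decode`, `steer`) are all assumptions made up for illustration. A real sparse autoencoder learns its dictionary from millions of model activations.

```python
def top_k_encode(activation, encoder, k):
    """Score the activation against every dictionary feature, keep the k largest."""
    scores = [sum(w * a for w, a in zip(row, activation)) for row in encoder]
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Everything outside the top k is zeroed, which is what makes the code sparse.
    return [max(scores[i], 0.0) if i in keep else 0.0 for i in range(len(scores))]


def decode(features, decoder):
    """Rebuild the activation as a weighted sum of feature directions."""
    dim = len(decoder[0])
    return [sum(f * row[d] for f, row in zip(features, decoder)) for d in range(dim)]


def steer(activation, decoder, feature_index, strength):
    """Nudge an activation along one feature's decoder direction."""
    direction = decoder[feature_index]
    return [a + strength * d for a, d in zip(activation, direction)]


# Hypothetical 4-feature dictionary over a 3-dimensional activation space.
decoder = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.8, 0.0],
]
encoder = decoder  # tied weights, an assumption to keep the toy small

activation = [0.2, 0.9, 0.1]
features = top_k_encode(activation, encoder, k=2)
reconstruction = decode(features, decoder)
steered = steer(activation, decoder, feature_index=1, strength=5.0)
```

Interpreting a feature means looking at which inputs make it fire. Steering means pushing the activation along that feature's direction and letting the model carry on from there.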

post | paper | github | visualizer

Real-time speech to text in the browser

The Transformers.js project has implemented OpenAI’s Whisper model in JavaScript. This means you can open a browser tab, talk to it, and get an accurate transcript of your words in real time. No coding required.

demo

Research radar

A new way to tokenize images

Researchers at ByteDance find a way to encode images into a single short vector instead of a 2D grid of patches. The new vectors can be as short as 32 elements, instead of 256 or even 1024 for existing methods.

This could make multimodal models and image generators much more compute-efficient.
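
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the 32, 256, and 1024 figures act as per-image sequence lengths and that self-attention cost grows with the square of sequence length. It ignores everything else in the model, and the function name is made up for illustration. These are not measurements from the paper.

```python
def relative_attention_cost(num_tokens, baseline_tokens=32):
    """Self-attention cost relative to a baseline, assuming quadratic scaling."""
    return (num_tokens / baseline_tokens) ** 2


for num_tokens in (32, 256, 1024):
    ratio = relative_attention_cost(num_tokens)
    print(f"{num_tokens} tokens: {ratio:.0f}x the attention cost of 32 tokens")
```

Under those assumptions, going from 256 to 32 is a 64x cut in attention work per image, and going from 1024 to 32 is a 1024x cut.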

post | paper

Changelog

H100s are coming

We’ll soon be adding support for NVIDIA’s powerful H100 GPUs.

If you’re interested in getting early access to H100s, email support@replicate.com.

changelog

Bye for now

How am I doing so far? You going to keep opening these letters? Let me know, so I can fix everything to be exactly perfect. Thanks in advance.