<aside> 🚧 This is a bunch of sketchy notes for myself so far — far from even a first draft.
The final draft is now available on my blog 🔗.
</aside>
We’ve used color theory to build better tools and interfaces for understanding and working with color. We’ve used the theory of sound and sound reproduction to build better tools for understanding and creating music. Let’s do the same for the thoughts we read and speak.
This post is meant to be a kind of vision doc, communicating what I hope will become possible as we understand neural models better and get better at channeling those insights into interfaces for controlling them and using them to understand the world. I wrote more about this direction of thought in https://stream.thesephist.com/updates/1706984448.
I really love the way Dynamicland’s specific personality of illustration brings their ideas to life. But because we’re trying to communicate ideas about how humans should interact with information, I would be really delighted if these illustrations could come to life and animate. I’m not super tied to a particular style of illustration as long as there’s coherence throughout.
Highlights over text to communicate a kind of “heatmap” for information
The key benefit of heatmaps is that they let the user browse and navigate very large corpora with ease. Maybe we can show an interaction where the user starts with a collection of books or PDFs, turns on some lenses/filters for specific features, and then zooms into those parts to find relevant passages and paragraphs.
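The "lens" interaction above could be sketched as a simple ranking: score each passage by how strongly a chosen feature fires on it, then surface the hottest regions first. This is a minimal sketch with made-up activation values; in practice the scores would come from a sparse autoencoder run over the corpus.

```python
# Hypothetical "lens" over a corpus: rank passages by the activation of one
# chosen feature so the reader can zoom into the most relevant regions first.
# Activations here are fabricated stand-ins for SAE feature activations.

def rank_passages(passages, activations, feature, top_k=3):
    """Return up to top_k passages, sorted by activation for one feature."""
    scored = [(activations[i].get(feature, 0.0), p) for i, p in enumerate(passages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]

passages = ["ch1: the storm", "ch2: the harbor", "ch3: the trial"]
activations = [
    {"maritime": 0.9, "legal": 0.0},
    {"maritime": 0.7, "legal": 0.1},
    {"maritime": 0.0, "legal": 0.8},
]
print(rank_passages(passages, activations, "maritime", top_k=2))
# → ['ch1: the storm', 'ch2: the harbor']
```

Turning on a second lens would just mean ranking (or filtering) by another feature key at the same time.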
Also room for some kind of “semantic syntax highlighting” here.
A Mel spectrogram-style visualization of GPT-2 small sparse autoencoder features over tokens.
Animation could be something like a user reading a long scrolling story, then swiping in from the scrollbar edge of the screen to reveal the spectrogram. Scrubbing over a particular feature frequency “column” expands that column and tells you what that feature is, also revealing related feature columns/frequencies.
Feature decomposition is actually quite analogous to Fourier transforms, which is how we get spectrograms.
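Both re-express a signal in a basis of interpretable components: frequencies for audio, features for text. The spectrogram view then is just a feature-by-token activation matrix. A toy sketch, with fabricated activations standing in for real SAE outputs:

```python
# Minimal text "spectrogram": features as rows (like frequency bins),
# tokens as columns (like time), activation rendered as glyph intensity.
# The per-token activation dicts are made up for illustration.

def feature_spectrogram(token_acts, features):
    """Render features (rows) over tokens (columns) as intensity glyphs."""
    glyphs = " .:*#"  # low to high activation
    rows = []
    for feature in features:
        line = ""
        for acts in token_acts:  # one {feature: activation} dict per token
            level = round(acts.get(feature, 0.0) * (len(glyphs) - 1))
            line += glyphs[max(0, min(level, len(glyphs) - 1))]
        rows.append(f"{feature:>10} |{line}|")
    return "\n".join(rows)

token_acts = [{"nautical": 0.9}, {"nautical": 0.4}, {"legal": 0.8}, {}]
print(feature_spectrogram(token_acts, ["nautical", "legal"]))
```

Scrubbing over a row in the real interface would amount to selecting one of these feature rows and expanding it with its label and neighbors.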
https://x.com/graycrawford/status/1741538324819652736
Contextual menu for semantic editing actions. Hovering over a specific edit control shows all possible variations.
For example, when selecting a line in a technical document, you could imagine a toolbar that gives you some sliders for technical complexity, and a few drop-downs for perspective or tone.
For text, each variation should just show the diff. The diff is not at the token level, but semantic — it shows a heatmap of which tokens’ log likelihood was boosted or diminished the most by that particular feature’s presence.
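That semantic diff could be computed by comparing per-token logprobs with the feature off versus on, then ranking tokens by the size of the shift. A sketch with made-up logprob values (a real version would read these from two model forward passes):

```python
# Hypothetical "semantic diff": rank tokens by how much a feature's presence
# boosted or suppressed their log likelihood, rather than diffing the text
# token by token. The two logprob lists are fabricated for illustration.

def semantic_diff(tokens, logprobs_base, logprobs_steered, top_k=2):
    """Return tokens ranked by |delta logprob|, with signed deltas."""
    deltas = [
        (tok, steered - base)
        for tok, base, steered in zip(tokens, logprobs_base, logprobs_steered)
    ]
    deltas.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return deltas[:top_k]

tokens  = ["The", "ship", "sank", "slowly"]
base    = [-0.5, -3.2, -4.0, -2.1]
steered = [-0.5, -1.1, -4.8, -2.0]
print(semantic_diff(tokens, base, steered))
# "ship" was boosted most, "sank" suppressed most
```

The signed deltas map directly onto the heatmap: positive shifts in one hue, negative in another.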
https://twitter.com/redblobgames/status/1749145157096899058
MIDI controller for text — I want to edit text with this thing 👉
Each row of dials corresponds to a different feature class.
Touching each slider shows a heatmap of the token logprobs most likely to be pushed up or down by that particular control. Moving it edits the text, showing the most affected token logprobs.
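Under the hood, each dial could own a feature direction, and turning it would add a scaled copy of that direction to the model's hidden state before decoding. This is a toy sketch with tiny made-up vectors; a real implementation would patch the residual stream of a transformer.

```python
# Hypothetical dial-to-steering mapping: one feature direction per control,
# with the dial position setting the steering strength. Vectors are tiny
# fabricated lists standing in for real hidden states and SAE directions.

def steer(hidden, direction, strength):
    """Nudge a hidden state along a feature direction by `strength`."""
    return [h + strength * d for h, d in zip(hidden, direction)]

hidden = [0.2, -0.1, 0.4]
formality_dir = [0.5, 0.0, -0.5]  # made-up "formality" feature direction
print(steer(hidden, formality_dir, strength=2.0))
# → [1.2, -0.1, -0.6]
```

A row of dials is then just a list of (direction, strength) pairs applied in sequence, which is what makes a hardware controller a natural fit.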
Maybe the text itself can hover over the device like in AR, or you point the device at some text in a book or on paper and it changes.