About MoMA Browser
An experimental application that makes MoMA's collection navigable through meaning, not just metadata. Built with faceted semantic embeddings, a knowledge graph, and color analysis to reveal connections no filing system could produce.
The Idea
Museums organize by department, date, and medium. These categories are useful, but they say little about the connections that make art interesting: how a 1912 Cubist collage relates to a 1965 Minimalist sculpture, or why a photographer's color palette echoes a painter working thirty years earlier.
But the collection already holds deeper structure than its filing system reveals. Nearly a century of curators, directors, and acquisition committees decided what entered these walls and why. They titled the works, wrote the catalog entries, grouped the departments, named the movements, and authored the biographical notes that frame each artist's contribution. Every word in that accumulated record encodes a judgment about what matters and how things relate. The collection is not raw data waiting for computation. It is a body of curatorial thought, built by people whose position at the nexus of art, scholarship, and institutional authority let them shape what the public understands modern art to be.
Those decisions did not stay inside the museum. Each acquisition, each exhibition, each catalog essay provoked response: Clement Greenberg championing Abstract Expressionism, feminist scholars reframing who was absent, postcolonial critics challenging how non-Western art was presented, generations of students and writers drawing comparisons the curators never intended. MoMA shaped the discourse around modern art, and the discourse shaped how the world wrote about MoMA. That surrounding discourse is part of the linguistic environment modern embedding models inherit. So when a Rothko sits near certain Turners in semantic space, the neighborhood is driven not only by MoMA's own descriptions but also by the model's learned sense of which words and ideas tend to travel together in writing about color, transcendence, and the sublime.
MoMA Browser makes that accumulated structure navigable. Every entity in the collection has been represented as a point in a shared semantic space using OpenAI's embedding model. When we embed the descriptions, biographies, and curatorial framing attached to an item, the model compresses that language into a coordinate. Works that are related in meaning sit near each other in this space, regardless of when they were made, what department filed them, or what medium they use. The embeddings do not conjure relationships from nothing. They register what people have already said about how things connect. They also do not guarantee truth.
Like any map, this one distorts. Embeddings are sensitive to what is written down, what is missing, and what language dominates the record. Similarity is not influence, and proximity is not proof. The point is to surface candidates for attention, then let you look, read, and decide what holds.
The result is a collection you can drift through by meaning. Click a Rothko and find not just other Rothkos, but the particular Turners and Monets that share its luminous quality. Search for "anxiety about technology" and find works across a century that speak to the same concern. Follow the knowledge graph from Surrealism through its influences to the African art that also shaped Cubism.
Semantic Space
Every artwork, exhibition, artist, and art term has been embedded into a shared semantic space. The base layer uses OpenAI's text-embedding-3-small (1536 dimensions), with each entity's available metadata combined into a rich text description. Artworks carry additional faceted embeddings at higher resolution (3072 dimensions via text-embedding-3-large), each built from a different slice of the metadata so that "similar" can mean different things depending on what you are looking for.
Because all four entity types share the same embedding space, cross-type queries work naturally. An artist's embedding can find semantically similar artworks. A term's embedding can find works that embody that concept even if they were never explicitly tagged.
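Because every entity lives in the same space, a cross-type query is just a nearest-neighbor ranking under cosine similarity. A minimal sketch, with tiny 3-d vectors standing in for the real 1536-d embeddings and illustrative entity names:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, entities):
    """Rank entities of any type by similarity to a query embedding."""
    return sorted(entities,
                  key=lambda e: cosine_similarity(query_vec, e["embedding"]),
                  reverse=True)

# Toy stand-ins: an artist's embedding ranking artworks.
artist = {"name": "Artist A", "embedding": [0.9, 0.1, 0.0]}
works = [
    {"name": "Work 1", "embedding": [0.8, 0.2, 0.1]},
    {"name": "Work 2", "embedding": [0.0, 0.1, 0.9]},
]
ranked = nearest(artist["embedding"], works)
# Work 1 ranks first: its vector points the same way as the artist's.
```

The same function works whether the query vector came from an artist, a term, or an artwork, which is what makes cross-type lookups natural.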
Faceted Embeddings
A single embedding compresses everything about an artwork into one point. This is powerful but lossy: two works might be near each other because of medium, because of subject matter, or because of historical context, and you cannot tell which. Faceted embeddings separate these concerns. Each artwork can carry up to five independent vectors (semantic, visual, conceptual, catalog, and image), each encoding a different dimension of the work:
These facets power the Explore workbench, hybrid search, and the search facet pills (By Sight, By Meaning, By Metadata). The system degrades gracefully: any operation works with whatever facets are populated, falling back to the base semantic embedding when a specialized facet is unavailable.
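The graceful-degradation rule can be sketched as a simple lookup with a fallback. The dict shape here is an assumption for illustration; the facet names come from the text above:

```python
def pick_facet_vector(artwork, requested_facet):
    """Return the requested facet embedding if populated,
    otherwise fall back to the base semantic embedding."""
    facets = artwork.get("facets", {})
    vec = facets.get(requested_facet)
    if vec is not None:
        return vec, requested_facet
    # Base layer is always populated, so every operation can proceed.
    return facets["semantic"], "semantic"

work = {"facets": {"semantic": [0.1, 0.2], "visual": [0.3, 0.4]}}
vec, used = pick_facet_vector(work, "catalog")  # catalog missing -> semantic
```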
Intellectual Lineage
The idea that meaning can be represented as position in a space has a history. The primitives this app uses (high-dimensional geometry, distributional semantics, cosine similarity, nonlinear projection) draw on work from mathematics, linguistics, and computer science that converged only recently.
The 1536 dimensions in each embedding are not designed by humans. No one decided that dimension 47 measures abstractness or that dimension 1200 measures warmth. The model discovered axes that, taken together, place semantically similar text near each other. Individual dimensions have no names. Meaning is distributed across all of them simultaneously, and the useful structure lives in the distances and directions between points, not in what any single axis represents.
The full stack beneath this app: Riemann's n-dimensional geometry, Firth's distributional hypothesis, Salton's cosine similarity, neural embedding training, and UMAP projection. The mathematical pieces existed for decades, some for more than a century. The computational power and training data to combine them arrived in the last ten years.
Similarity Sources
Each entity type offers multiple ways to find neighbors. These are the "sources" you can switch between when viewing any item:
Similarity Banding
Neighbor results are grouped into bands by similarity score, computed as 100 × (1 − cosine distance). This is a convenience scale, not a calibrated probability. Higher means the model considers the descriptions more alike.
The "distant echoes" band often surfaces the most surprising connections. Two works with a score of 30 may share a quality neither would be filed under.
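The score and banding logic is a one-liner plus thresholds. Here is a minimal sketch: the formula is from the text, "distant echoes" is the band the text names, and the other band names and cutoff values are placeholder assumptions:

```python
def similarity_score(cosine_distance):
    """Convenience scale from the text: 100 * (1 - cosine distance)."""
    return 100 * (1 - cosine_distance)

def band(score):
    """Group a score into a band. Cutoffs and the first two
    band names are illustrative assumptions."""
    if score >= 70:
        return "close kin"
    if score >= 50:
        return "clear affinity"
    return "distant echoes"

# A cosine distance of 0.7 maps to a score of 30 -- distant echoes.
```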
Explore
The Browser answers "what is near this work?" Explore answers harder questions: "what lies between these two works?", "where does this concept concentrate across time?", "what shares one quality but diverges on another?" It is a workbench for performing semantic operations on the embedding space itself, not just querying it.
Every operation takes two inputs (artworks or text queries) loaded into Slot A and Slot B, plus a facet selector that determines which embedding dimension to operate on. The five operations:
Each operation can run against any populated embedding facet (semantic, visual, conceptual, catalog, image), so the same pair of inputs can produce different results depending on which dimension of similarity you choose to examine.
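One way to answer "what lies between these two works?" is to average the two embeddings and search near the midpoint. This is a plausible sketch of such an operation, not the app's actual implementation; re-normalizing keeps the result comparable under cosine similarity:

```python
import math

def midpoint(vec_a, vec_b):
    """Average two embeddings, then re-normalize to unit length."""
    mid = [(a + b) / 2 for a, b in zip(vec_a, vec_b)]
    norm = math.sqrt(sum(x * x for x in mid))
    return [x / norm for x in mid]

m = midpoint([1.0, 0.0], [0.0, 1.0])
# A nearest-neighbor query at m would surface works "between" A and B.
```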
Color System
Every artwork with an image has a 9-color palette extracted by dominant color clustering. These palettes are flattened into 27-dimensional vectors (RGB channels for each of 9 swatches) and indexed for fast Euclidean distance search.
This creates a parallel similarity axis independent of meaning. Two works might be semantically unrelated but share a striking palette. The color system lets you explore this dimension directly.
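The flattening and distance computation described above can be sketched directly; the palette values here are made up for illustration:

```python
import math

def flatten_palette(palette):
    """Flatten 9 (r, g, b) swatches into one 27-dimensional vector."""
    assert len(palette) == 9
    return [channel for swatch in palette for channel in swatch]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

dark = flatten_palette([(10, 10, 10)] * 9)
light = flatten_palette([(245, 245, 245)] * 9)
# Identical palettes have zero distance; opposite palettes sit far apart.
```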
Knowledge Graph
Embeddings capture that two things are similar, but not why. The knowledge graph adds named, directed relationships between entities. Each edge has a predicate, a confidence score, and evidence text explaining the connection.
Relationships were extracted using OpenAI's batch API, processing the collection in four passes (terms, artists, exhibitions, artworks). The graph currently contains 168,540 edges across 24 predicate types.
Relationship Types
Graph Traversal
The graph supports multi-hop traversal with confidence decay and shortest-path finding between any two entities. A recursive CTE walks edges up to three hops, multiplying confidence at each step so indirect connections are naturally weighted lower than direct ones.
This means you can ask: "How does this Cubist painting connect to this Minimalist sculpture?" and get a legible path through named relationships, not just a similarity percentage.
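The app does this traversal with a recursive SQL CTE; the same confidence-decay walk can be sketched as an in-memory breadth-first search. The graph contents and predicate names below are illustrative:

```python
def paths_with_decay(edges, start, max_hops=3):
    """Walk the graph up to max_hops, multiplying confidence at each
    step so indirect connections score lower than direct ones.
    `edges` maps a node to [(neighbor, predicate, confidence), ...]."""
    results = []
    frontier = [(start, [], 1.0)]
    for _ in range(max_hops):
        next_frontier = []
        for node, path, conf in frontier:
            for neighbor, predicate, edge_conf in edges.get(node, []):
                if neighbor == start or neighbor in [p[0] for p in path]:
                    continue  # avoid cycles
                new_path = path + [(neighbor, predicate)]
                new_conf = conf * edge_conf
                results.append((neighbor, new_path, new_conf))
                next_frontier.append((neighbor, new_path, new_conf))
        frontier = next_frontier
    return results

graph = {
    "Cubism": [("Picasso", "influenced", 0.9)],
    "Picasso": [("Minimalism", "preceded", 0.5)],
}
hits = paths_with_decay(graph, "Cubism")
# Direct hop: Picasso at 0.9; two-hop: Minimalism at 0.9 * 0.5 = 0.45.
```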
The Browser
The Browser is a single-page application that combines focused exploration with hierarchical collection browsing. All state lives in the URL hash, so every view is linkable and back-button friendly.
It operates in three modes:
AI Guide
The Guide system generates curated exhibition tours using Claude. Each tour is a sequence of artwork stops with narrative HTML, organized under immersive portals with themed entry experiences.
Tours are generated from version-controlled prompt sets, so the guide's voice and approach can be iterated independently of the application code. The system currently has 2 portals and 13 tours.
Portal experiences are rendered as full HTML documents in iframes, with postMessage-based navigation that connects door clicks to tour launches. Stop narratives are standalone HTML with injected navigation controls.
Semantic Search
Search is hybrid: every query runs through both full-text search (PostgreSQL tsvector) and vector similarity across all populated embedding columns simultaneously. Results from each channel are merged using Reciprocal Rank Fusion, so a work that ranks well in both keyword matching and semantic similarity rises to the top, while a work that only one system finds still appears.
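Reciprocal Rank Fusion itself is small: each item's fused score is the sum of 1 / (k + rank) over every result list it appears in. A minimal sketch, using the common default k = 60 (the app's actual k is not stated):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists. An item strong in several lists
    accumulates score; an item found by only one list still appears."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["A", "B", "C"]   # full-text channel
semantic = ["B", "D", "A"]  # vector channel
fused = reciprocal_rank_fusion([keyword, semantic])
# B rises to the top (ranked well in both); C and D still appear.
```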
Search facet pills let you shift the emphasis. "All" merges every signal. "By Sight" weights the visual embedding. "By Meaning" weights the conceptual embedding. "By Metadata" weights the catalog embedding. The same query can surface different works depending on which dimension you prioritize.
This is fundamentally different from keyword search. The query "loneliness in urban space" will surface Edward Hopper paintings, architecture photographs, and installation art that speaks to that theme, even if no metadata mentions the word "lonely."
Data Pipeline
The collection data comes from MoMA's open-access CSV datasets, enriched through several processing stages:
- Import: Artists, artworks, and exhibitions loaded from CSV with relational joins.
- Base embedding: All entities processed through OpenAI text-embedding-3-small (1536d) in batches. Artworks include art terms and visual descriptions in their input text.
- Faceted embeddings: Artworks re-embedded at 3072d via text-embedding-3-large across three facets (visual, conceptual, catalog), each built from a different slice of metadata.
- Image embeddings: Artwork images processed through CLIP ViT-L/14 locally, producing 768d vectors that encode pixel-level visual features independent of any text.
- Full-text indexing: PostgreSQL tsvector columns built from title, medium, classification, department, descriptions, and significance, weighted by field importance.
- Color extraction: 9-color palettes extracted from artwork images, flattened to 27-dim vectors.
- Wikipedia enrichment: Artist biographies and portraits fetched from Wikidata and Wikipedia REST APIs.
- Auto-tagging: Art terms matched to artworks by embedding proximity, extending the curated tagging.
- Graph extraction: Knowledge graph relationships extracted via OpenAI batch API in four passes.
- Indexing: HNSW indexes on all embedding and color vector columns for fast similarity search. Half-precision (halfvec) indexes on the 3072d columns.
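The auto-tagging pass above matches terms to artworks by embedding proximity. A minimal sketch of that idea, where the similarity threshold and the term vectors are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def auto_tag(artwork_vec, terms, threshold=0.5):
    """Attach every term whose embedding sits within `threshold`
    cosine similarity of the artwork. The threshold is an assumption."""
    return [name for name, vec in terms.items()
            if cosine(artwork_vec, vec) >= threshold]

terms = {"abstraction": [1.0, 0.0], "portraiture": [0.0, 1.0]}
tags = auto_tag([0.9, 0.1], terms)  # close to "abstraction" only
```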
Architecture
MoMA Browser is a Rails 8 application with no JavaScript build step. Hotwire (Turbo + Stimulus) handles interactivity, with importmaps for module loading. The Browser SPA and Explore workbench are each single Stimulus controllers that manage all state client-side, communicating with the server via JSON endpoints.
Three Layers of Meaning
The semantic infrastructure has three distinct layers that work together:
Embeddings give you intuition. Relational links give you facts. The knowledge graph gives you understanding. Together they make a collection that doesn't just organize art, but begins to interpret it.
MoMA Browser is an independent project by Jeremy Roush built in collaboration with Claude. It is not affiliated with, endorsed by, or produced by the Museum of Modern Art. All collection data comes from MoMA's open-access dataset on GitHub, published under a CC0 1.0 license. Similarity scores, graph edges, and AI-generated tours are exploratory and may be incomplete or wrong; treat them as starting points, not citations. Embedding generation uses OpenAI; guide tours use Anthropic's Claude. No search queries or personal data are stored.
v1.14.0 · Version History