About MoMA Browser
An experimental application that makes MoMA's collection navigable through meaning, not just metadata. Built with faceted semantic embeddings, a knowledge graph, and color analysis to reveal connections no filing system could produce.
The Idea
Museums organize by department, date, and medium. These categories are useful, but they say little about the connections that make art interesting: how a 1912 Cubist collage relates to a 1965 Minimalist sculpture, or why a photographer's color palette echoes a painter working thirty years earlier.
But the collection already holds deeper structure than its filing system reveals. Nearly a century of curators, directors, and acquisition committees decided what entered these walls and why. They titled the works, wrote the catalog entries, grouped the departments, named the movements, and authored the biographical notes that frame each artist's contribution. Every word in that accumulated record encodes a judgment about what matters and how things relate. The collection is not raw data waiting for computation. It is a body of curatorial thought, built by people whose position at the nexus of art, scholarship, and institutional authority let them shape what the public understands modern art to be.
Those decisions did not stay inside the museum. Each acquisition, each exhibition, each catalog essay provoked response: Clement Greenberg championing Abstract Expressionism, feminist scholars reframing who was absent, postcolonial critics challenging how non-Western art was presented, generations of students and writers drawing comparisons the curators never intended. MoMA shaped the discourse around modern art, and the discourse shaped how the world wrote about MoMA. That surrounding discourse is part of the linguistic environment modern embedding models inherit. So when a Rothko sits near certain Turners in semantic space, the neighborhood is driven not only by MoMA's own descriptions but also by the model's learned sense of which words and ideas tend to travel together in writing about color, transcendence, and the sublime.
MoMA Browser makes that accumulated structure navigable. Every entity in the collection has been represented as a point in a shared semantic space using OpenAI's embedding model. When we embed the descriptions, biographies, and curatorial framing attached to an item, the model compresses that language into a coordinate. Works that are related in meaning sit near each other in this space, regardless of when they were made, what department filed them, or what medium they use. The embeddings do not conjure relationships from nothing. They register what people have already said about how things connect. They also do not guarantee truth.
Like any map, this one distorts. Embeddings are sensitive to what is written down, what is missing, and what language dominates the record. Similarity is not influence, and proximity is not proof. The point is to surface candidates for attention, then let you look, read, and decide what holds.
The result is a collection you can drift through by meaning. Click a Rothko and find not just other Rothkos, but the particular Turners and Monets that share its luminous quality. Search for "anxiety about technology" and find works across a century that speak to the same concern. Follow the knowledge graph from Surrealism through its influences to the African art that also shaped Cubism.
Semantic Space
Every artwork, exhibition, artist, and art term has been embedded into a shared semantic space. The base layer uses OpenAI's text-embedding-3-small (1536 dimensions), with each entity's available metadata combined into a rich text description. Artworks carry additional faceted embeddings at higher resolution (3072 dimensions via text-embedding-3-large), each built from a different slice of the metadata so that "similar" can mean different things depending on what you are looking for.
Because all four entity types share the same embedding space, cross-type queries work naturally. An artist's embedding can find semantically similar artworks. A term's embedding can find works that embody that concept even if they were never explicitly tagged.
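Because every entity lives in the same space, a cross-type query is just a nearest-neighbor ranking under cosine similarity. A minimal sketch, with tiny 3-d vectors standing in for the real 1536-d embeddings and illustrative entity names:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, entities):
    """Rank entities of any type by similarity to a query embedding."""
    return sorted(entities,
                  key=lambda e: cosine_similarity(query_vec, e["embedding"]),
                  reverse=True)

# Toy stand-ins: an artist's embedding ranking artworks.
artist = {"name": "Artist A", "embedding": [0.9, 0.1, 0.0]}
works = [
    {"name": "Work 1", "embedding": [0.8, 0.2, 0.1]},
    {"name": "Work 2", "embedding": [0.0, 0.1, 0.9]},
]
ranked = nearest(artist["embedding"], works)
# Work 1 ranks first: its vector points the same way as the artist's.
```

The same function works whether the query vector came from an artist, a term, or an artwork, which is what makes cross-type lookups natural.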
Faceted Embeddings
A single embedding compresses everything about an artwork into one point. This is powerful but lossy: two works might be near each other because of medium, because of subject matter, or because of historical context, and you cannot tell which. Faceted embeddings separate these concerns. Each artwork can carry up to five independent vectors (semantic, visual, conceptual, catalog, and image), each encoding a different dimension of the work:
These facets power the Explore workbench, hybrid search, and the search facet pills (By Sight, By Meaning, By Metadata). The system degrades gracefully: any operation works with whatever facets are populated, falling back to the base semantic embedding when a specialized facet is unavailable.
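The graceful-degradation rule can be sketched as a simple lookup with a fallback. The dict shape here is an assumption for illustration; the facet names come from the text above:

```python
def pick_facet_vector(artwork, requested_facet):
    """Return the requested facet embedding if populated,
    otherwise fall back to the base semantic embedding."""
    facets = artwork.get("facets", {})
    vec = facets.get(requested_facet)
    if vec is not None:
        return vec, requested_facet
    # Base layer is always populated, so every operation can proceed.
    return facets["semantic"], "semantic"

work = {"facets": {"semantic": [0.1, 0.2], "visual": [0.3, 0.4]}}
vec, used = pick_facet_vector(work, "catalog")  # catalog missing -> semantic
```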
Intellectual Lineage
The idea that meaning can be represented as position in a space has a history. The primitives this app uses (high-dimensional geometry, distributional semantics, cosine similarity, nonlinear projection) draw on work from mathematics, linguistics, and computer science that converged only recently.
The 1536 dimensions in each embedding are not designed by humans. No one decided that dimension 47 measures abstractness or that dimension 1200 measures warmth. The model discovered axes that, taken together, place semantically similar text near each other. Individual dimensions have no names. Meaning is distributed across all of them simultaneously, and the useful structure lives in the distances and directions between points, not in what any single axis represents.
The full stack beneath this app: Riemann's n-dimensional geometry, Firth's distributional hypothesis, Salton's cosine similarity, neural embedding training, and UMAP projection. The mathematical pieces existed for decades, some for more than a century. The computational power and training data to combine them arrived in the last ten years.
Similarity Sources
Each entity type offers multiple ways to find neighbors. These are the "sources" you can switch between when viewing any item:
Similarity Banding
Neighbor results are grouped into bands by similarity score, computed as 100 × (1 − cosine distance). This is a convenience scale, not a calibrated probability. Higher means the model considers the descriptions more alike.
The "distant echoes" band often surfaces the most surprising connections. Two works with a score of 30 may share a quality neither would be filed under.
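The score and banding logic is a one-liner plus thresholds. Here is a minimal sketch: the formula is from the text, "distant echoes" is the band the text names, and the other band names and cutoff values are placeholder assumptions:

```python
def similarity_score(cosine_distance):
    """Convenience scale from the text: 100 * (1 - cosine distance)."""
    return 100 * (1 - cosine_distance)

def band(score):
    """Group a score into a band. Cutoffs and the first two
    band names are illustrative assumptions."""
    if score >= 70:
        return "close kin"
    if score >= 50:
        return "clear affinity"
    return "distant echoes"

# A cosine distance of 0.7 maps to a score of 30 -- distant echoes.
```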
Explore
The Browser answers "what is near this work?" Explore answers harder questions: "what lies between these two works?", "where does this concept concentrate across time?", "what shares one quality but diverges on another?" It is a workbench for performing semantic operations on the embedding space itself, not just querying it.
Every operation takes two inputs (artworks or text queries) loaded into Slot A and Slot B, plus a facet selector that determines which embedding dimension to operate on. The five operations:
Each operation can run against any populated embedding facet (semantic, visual, conceptual, catalog, image), so the same pair of inputs can produce different results depending on which dimension of similarity you choose to examine.
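One way to answer "what lies between these two works?" is to average the two embeddings and search near the midpoint. This is a plausible sketch of such an operation, not the app's actual implementation; re-normalizing keeps the result comparable under cosine similarity:

```python
import math

def midpoint(vec_a, vec_b):
    """Average two embeddings, then re-normalize to unit length."""
    mid = [(a + b) / 2 for a, b in zip(vec_a, vec_b)]
    norm = math.sqrt(sum(x * x for x in mid))
    return [x / norm for x in mid]

m = midpoint([1.0, 0.0], [0.0, 1.0])
# A nearest-neighbor query at m would surface works "between" A and B.
```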
Color System
Every artwork with an image has a 9-color palette extracted by dominant color clustering. These palettes are flattened into 27-dimensional vectors (RGB channels for each of 9 swatches) and indexed for fast Euclidean distance search.
This creates a parallel similarity axis independent of meaning. Two works might be semantically unrelated but share a striking palette. The color system lets you explore this dimension directly.
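The flattening and distance computation described above can be sketched directly; the palette values here are made up for illustration:

```python
import math

def flatten_palette(palette):
    """Flatten 9 (r, g, b) swatches into one 27-dimensional vector."""
    assert len(palette) == 9
    return [channel for swatch in palette for channel in swatch]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

dark = flatten_palette([(10, 10, 10)] * 9)
light = flatten_palette([(245, 245, 245)] * 9)
# Identical palettes have zero distance; opposite palettes sit far apart.
```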
Knowledge Graph
Embeddings capture that two things are similar, but not why. The knowledge graph adds named, directed relationships between entities. Each edge has a predicate, a confidence score, and evidence text explaining the connection.
Relationships were extracted using OpenAI's batch API, processing the collection in four passes (terms, artists, exhibitions, artworks). The graph currently contains 168,540 edges across 24 predicate types.
Relationship Types
Graph Traversal
The graph supports multi-hop traversal with confidence decay and shortest-path finding between any two entities. A recursive CTE walks edges up to three hops, multiplying confidence at each step so indirect connections are naturally weighted lower than direct ones.
This means you can ask: "How does this Cubist painting connect to this Minimalist sculpture?" and get a legible path through named relationships, not just a similarity percentage.
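The app does this traversal with a recursive SQL CTE; the same confidence-decay walk can be sketched as an in-memory breadth-first search. The graph contents and predicate names below are illustrative:

```python
def paths_with_decay(edges, start, max_hops=3):
    """Walk the graph up to max_hops, multiplying confidence at each
    step so indirect connections score lower than direct ones.
    `edges` maps a node to [(neighbor, predicate, confidence), ...]."""
    results = []
    frontier = [(start, [], 1.0)]
    for _ in range(max_hops):
        next_frontier = []
        for node, path, conf in frontier:
            for neighbor, predicate, edge_conf in edges.get(node, []):
                if neighbor == start or neighbor in [p[0] for p in path]:
                    continue  # avoid cycles
                new_path = path + [(neighbor, predicate)]
                new_conf = conf * edge_conf
                results.append((neighbor, new_path, new_conf))
                next_frontier.append((neighbor, new_path, new_conf))
        frontier = next_frontier
    return results

graph = {
    "Cubism": [("Picasso", "influenced", 0.9)],
    "Picasso": [("Minimalism", "preceded", 0.5)],
}
hits = paths_with_decay(graph, "Cubism")
# Direct hop: Picasso at 0.9; two-hop: Minimalism at 0.9 * 0.5 = 0.45.
```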
The Browser
The Browser is a single-page application that combines focused exploration with hierarchical collection browsing. All state lives in the URL hash, so every view is linkable and back-button friendly.
It operates in three modes:
AI Guide
The Guide system generates curated exhibition tours using Claude. Each tour is a sequence of artwork stops with narrative HTML, organized under immersive portals with themed entry experiences.
Tours are generated from version-controlled prompt sets, so the guide's voice and approach can be iterated independently of the application code. The system currently has 2 portals and 13 tours.
Portal experiences are rendered as full HTML documents in iframes, with postMessage-based navigation that connects door clicks to tour launches. Stop narratives are standalone HTML with injected navigation controls.
Semantic Search
Search is hybrid: every query runs through both full-text search (PostgreSQL tsvector) and vector similarity across all populated embedding columns simultaneously. Results from each channel are merged using Reciprocal Rank Fusion, so a work that ranks well in both keyword matching and semantic similarity rises to the top, while a work that only one system finds still appears.
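Reciprocal Rank Fusion itself is small: each item's fused score is the sum of 1 / (k + rank) over every result list it appears in. A minimal sketch, using the common default k = 60 (the app's actual k is not stated):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists. An item strong in several lists
    accumulates score; an item found by only one list still appears."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["A", "B", "C"]   # full-text channel
semantic = ["B", "D", "A"]  # vector channel
fused = reciprocal_rank_fusion([keyword, semantic])
# B rises to the top (ranked well in both); C and D still appear.
```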
Search facet pills let you shift the emphasis. "All" merges every signal. "By Sight" weights the visual embedding. "By Meaning" weights the conceptual embedding. "By Metadata" weights the catalog embedding. The same query can surface different works depending on which dimension you prioritize.
This is fundamentally different from keyword search. The query "loneliness in urban space" will surface Edward Hopper paintings, architecture photographs, and installation art that speaks to that theme, even if no metadata mentions the word "lonely."
Data Pipeline
The collection data comes from MoMA's open-access CSV datasets, enriched through several processing stages:
- Import: Artists, artworks, and exhibitions loaded from CSV with relational joins.
- Base embedding: All entities processed through OpenAI text-embedding-3-small (1536d) in batches. Artworks include art terms and visual descriptions in their input text.
- Faceted embeddings: Artworks re-embedded at 3072d via text-embedding-3-large across three facets (visual, conceptual, catalog), each built from a different slice of metadata.
- Image embeddings: Artwork images processed through CLIP ViT-L/14 locally, producing 768d vectors that encode pixel-level visual features independent of any text.
- Full-text indexing: PostgreSQL tsvector columns built from title, medium, classification, department, descriptions, and significance, weighted by field importance.
- Color extraction: 9-color palettes extracted from artwork images, flattened to 27-dim vectors.
- Wikipedia enrichment: Artist biographies and portraits fetched from Wikidata and Wikipedia REST APIs.
- Auto-tagging: Art terms matched to artworks by embedding proximity, extending the curated tagging.
- Graph extraction: Knowledge graph relationships extracted via OpenAI batch API in four passes.
- Indexing: HNSW indexes on all embedding and color vector columns for fast similarity search. Half-precision (halfvec) indexes on the 3072d columns.
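The auto-tagging pass above matches terms to artworks by embedding proximity. A minimal sketch of that idea, where the similarity threshold and the term vectors are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def auto_tag(artwork_vec, terms, threshold=0.5):
    """Attach every term whose embedding sits within `threshold`
    cosine similarity of the artwork. The threshold is an assumption."""
    return [name for name, vec in terms.items()
            if cosine(artwork_vec, vec) >= threshold]

terms = {"abstraction": [1.0, 0.0], "portraiture": [0.0, 1.0]}
tags = auto_tag([0.9, 0.1], terms)  # close to "abstraction" only
```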
Architecture
MoMA Browser is a Rails 8 application with no JavaScript build step. Hotwire (Turbo + Stimulus) handles interactivity, with importmaps for module loading. The Browser SPA and Explore workbench are each single Stimulus controllers that manage all state client-side, communicating with the server via JSON endpoints.
Three Layers of Meaning
The semantic infrastructure has three distinct layers that work together:
Embeddings give you intuition. Relational links give you facts. The knowledge graph gives you understanding. Together they make a collection that doesn't just organize art, but begins to interpret it.
MoMA Browser is an independent project by Jeremy Roush built in collaboration with Claude. It is not affiliated with, endorsed by, or produced by the Museum of Modern Art. All collection data comes from MoMA's open-access dataset on GitHub, published under a CC0 1.0 license. Similarity scores, graph edges, and AI-generated tours are exploratory and may be incomplete or wrong; treat them as starting points, not citations. Embedding generation uses OpenAI; guide tours use Anthropic's Claude. No search queries or personal data are stored.
v1.14.0 · Version History