28/4/2026

Domain Specific Foundation Models

Research

1. Introduction

Over the past few years, the AI race has been dominated by Big Tech players (OpenAI, Google, Anthropic, Meta) scaling ever-larger language models. Since ChatGPT was launched in 2022, massive capital has flowed into LLMs, and the window for venture-backed challengers in text has largely narrowed.

But intelligence isn’t only text. The physical world runs on sound, light, molecules, proteins, and motion, and foundation models for these domains are still early in terms of technology and/or commercial usage. The race to build foundation models beyond text remains open, with active research and early commercialization across multiple domains.

This paper focuses on domain-specific foundation models — models trained for a particular domain (e.g., audio, vision, biology) and adapted to multiple downstream tasks within that domain.[1] The focus is on foundation model builders and enabling platforms, and generally excludes narrow tools and point-solution application companies.

Throughout this paper, we separate the model layer from the application layer. The model layer covers companies building domain-specific foundation models and the infrastructure behind them (architectures, data, training, and inference). The application layer covers products that use these models in real workflows.

💡Definition of Foundation Model

“A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks.”

1.1 The Domains In Scope Of This Paper

This paper excludes text (LLMs), a mature domain where winners (Big Tech) have already emerged. We focus on five non-text domains that capture most current foundation-model activity: Audio & Speech, Vision, Embodied AI, Materials Science, and Biology. Each sits at a different maturity level (Audio/Vision most mature; Embodied AI at an inflection point; Materials/Biology earliest). The current maturity of each domain is illustrated below.

💡Maturity shorthand

GPT-2 era: technical promise, limited commercial adoption
GPT-3 era: strong capability, early product-market fit
ChatGPT era: mainstream usability, rapid adoption, clear winners

2. Audio & Speech

💡

Of the five domains examined in this paper, Audio & Speech is the most mature in terms of both technology and commercial usage. This is partly because most audio products are tightly interlinked with LLMs.

2.1 Technology

Most audio AI products today rely on a three-stage pipeline (speech-to-text → LLM → text-to-speech), shown in the picture below.

In this architecture, the LLM remains the core foundation model. Companies like ElevenLabs, Deepgram, and Cartesia are building sophisticated Speech-To-Text (STT) and Text-To-Speech (TTS) models, but they still rely on external LLMs.
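The three-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the `VoicePipeline` class and the `fake_stt`, `fake_llm`, and `fake_tts` functions are hypothetical stand-ins, chosen only to show that the speech models wrap an LLM that sits in the middle of the loop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Three-stage voice pipeline: speech-to-text, then LLM, then text-to-speech."""
    stt: Callable[[bytes], str]   # audio in -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)    # stage 1: STT
        reply_text = self.llm(transcript)  # stage 2: external LLM does the reasoning
        return self.tts(reply_text)        # stage 3: TTS


# Hypothetical stand-ins so the sketch runs without any vendor SDK.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")           # pretend the bytes are already words


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")            # pretend text bytes are a waveform


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Because each stage is just a callable, any of the three can be swapped independently, which is one way to read why the STT and TTS vendors remain dependent on whichever LLM fills the middle slot.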

That said, genuinely audio-native foundation models (no LLM in the loop) do exist in more niche sub-domains such as music generation. Suno, for example, uses diffusion/transformer approaches to generate audio directly.

2.2 Market Dynamics

The global voice AI market is already valued at ~$12B and is expected to reach $30B by 2030.[3] A few clear commercial winners have already emerged in this space:

ElevenLabs is trying to be the full-stack AI voice tool (raised $781M)
Suno, launched in December 2023, is one of the fastest-scaling consumer AI products ever built, with ~$300M of ARR as of February 2026 (raised $375M)
Cartesia is building ultra-low-latency voice models using a state-space model (SSM) architecture rather than transformers (raised $191M)
Hume AI is the niche leader in emotion-aware audio with its empathic voice interface (raised $80M)

Critically, Big Tech is also going after this domain. OpenAI, Google, and Meta are all building native voice capabilities directly into their foundation models, collapsing the STT → LLM → TTS pipeline into a single system. This threatens standalone audio AI companies by removing the need for third-party speech models in many use cases.

Startup landscape

Beyond the foundation model builders, several startups are operating within the Audio & Speech application layer:

Company	HQ	Founded	Stage / Funding	Investors
Gradium	Paris, France	2025	Seed / $70M	FirstMark Capital, Eurazeo
Parloa	Berlin, Germany	2018	Series B / $66M ($98M total)	Altimeter Capital, EQT Ventures
PolyAI	London, UK	2017	Series D / $86M ($200M+ total)	Khosla Ventures, NVentures
GetVocal	Paris, France	2023	Series A / $26M ($30M total)	Creandum, Speedinvest
Synthflow AI	Berlin, Germany	2023	Series A / $20M ($30M total)	Accel, Atlantic Labs
Gladia	Paris, France	2022	Series A / $16M ($20M total)	XAnge, XTX Ventures

3. Vision (Image & Video)

💡

Vision is a crowded space, with many of the same Big Tech players that dominate LLMs also competing here. Image generation foundation models have already commoditized, and video is heading in the same direction.

Generative vision has matured to the point where it is no longer obvious where value accrues. On one hand, the economics and competitive dynamics of training frontier image/video models increasingly resemble a scale game dominated by the deepest balance sheets, a reality underscored by OpenAI’s shutdown of Sora in March 2026 after reportedly burning ~$1M/day for just $2.1M in lifetime revenue.[4] On the other hand, the “model layer” is not monolithic: there are still pockets where differentiated data, distribution, or platform integration can create leverage.

3.1 Technology

Image generation models have two main architectures: diffusion models (Stable Diffusion, Imagen, DALL-E) that iteratively refine noise into images, and visual transformers (Black Forest Labs' FLUX) that apply LLM-style attention to visual data.

Recent releases such as GPT-4o image generation underscore how quickly image-generation capabilities are being productized and bundled into broader AI platforms.[5]

Video generation extends these techniques across time, but the jump in complexity is dramatic: a 10-second clip at 24fps requires 240 temporally consistent frames, pushing compute requirements up by orders of magnitude. A high-quality image costs $0.02–0.04 via API; a minute of video runs $2.40–30.00.[6]
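The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above (24fps, the image price range, and the per-minute video price range); the function and variable names are illustrative.

```python
def frame_count(seconds: float, fps: int = 24) -> int:
    """Number of frames a clip of the given length contains."""
    return round(seconds * fps)


clip_frames = frame_count(10)    # the 10-second clip from the text
minute_frames = frame_count(60)  # one minute of video

# Price ranges quoted in the text, in USD (low, high).
image_price = (0.02, 0.04)
video_minute_price = (2.40, 30.00)

# How many single images one minute of video costs,
# taking the narrowest and widest gap between the two ranges.
ratio_low = video_minute_price[0] / image_price[1]
ratio_high = video_minute_price[1] / image_price[0]

print(f"{clip_frames} frames per 10 s, {minute_frames} frames per minute")
print(f"one video minute costs about {ratio_low:.0f}x to {ratio_high:.0f}x one image")
```

On these prices, a minute of video costs somewhere between tens and roughly fifteen hundred times as much as a single image, which is the gap in orders of magnitude the paragraph refers to.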

3.2 Image Generation: Already Commoditized

The quality gap between leading image models has effectively closed — FLUX.2 Pro and GPT Image 1.5 are statistically tied on benchmarks, and open-source models now match commercial offerings in blind evaluations.[7] Pricing has compressed 10x in two years, with viable API options starting at $0.003/image.[8] When Google's Gemini image generation attracted 13 million new users in four days and produced 5 billion images within weeks, it confirmed that image generation is now a feature, not a product.[9]

The leading players are already clear. Midjourney dominates consumer/prosumer with $500M in 2025 revenue, entirely bootstrapped.[10] Black Forest Labs has emerged as the B2B infrastructure winner ($300M raised at $3.25B valuation), powering Grok and signing ~$300M in platform deals with Adobe, Canva, and Meta.[11][12] Google runs generation across 650 million MAUs.[13] Critically, Google now offers AI product photography for free in Merchant Center,[14] and Zalando reports AI imagery at 85% lower cost with conversion rates within 4% of professional photography.[15] The broader market sits at ~$3B in 2025, growing 30%+ annually — but the value is accruing to incumbents, not new entrants.[16]

3.3 Video Generation: Expensive and Unsettled

Unlike images, video remains fiercely competitive with no clear winner — and, tellingly, Chinese-origin models now dominate. On the Artificial Analysis leaderboard, HappyHorse 1.0 (Alibaba-affiliated) leads at Elo 1,362, followed by ByteDance's Seedance 2.0 and Kuaishou's Kling 3.0. The top Western model, Runway's Gen-4.5, ranks only #7.[6]

Sora's collapse is instructive: it hit #1 in the App Store at launch, then downloads fell 66% in three months.[4] Disney had committed $1B tied to Sora and learned of the shutdown less than an hour before the public.[17] Even OpenAI concluded that standalone video generation does not justify the economics.

Capital concentration is extreme: Luma AI raised $900M at $4B, Runway raised $315M at $5.3B, and total sector funding hit $3.08B in 2025 (+95% YoY). However, both players are now pivoting their narrative to "world models" — a sign that pure video generation alone may not sustain their valuations. The economics are improving (inference costs down ~1,000x in three years) but remain punishing: most models cap out at 5–20 seconds of coherent output, and no open-source model comes close to commercial leaders.

3.4 Market Dynamics

In many cases, value creation appears to be shifting toward the application layer — companies that treat generation models as interchangeable infrastructure and build differentiation through domain expertise, proprietary data, and workflow integration.

Vertical visual AI platforms are the most compelling category. Raspberry AI ($28M raised, a16z-led) has signed 70+ fashion brands by training custom models on brand-specific design DNA. Omi (€13M, Paris) generates photorealistic 3D product visuals for Clarins, Nestlé, and Meta. Photoroom ($500M valuation) has crossed 200M downloads. These companies embed industry-specific workflows that horizontal model providers cannot easily replicate.

Video editing and repurposing tools may actually be a better category than video generation itself. Descript has crossed $100M ARR with text-based editing. OpusClip ($215M valuation, SoftBank) has generated 172M clips and 57B views. Lightricks ($1.8B valuation) combines an open-source video model with a commercial studio. These businesses benefit from a structural asymmetry: they harvest AI productivity gains without bearing the full weight of generation compute costs.

Company	HQ	Founded	Stage / Funding	Investors
Raspberry AI	San Francisco, US	2019	Series A / $28M+	a16z
Omi	Paris, France	2023	Seed / €13M	Dawn Capital
Photoroom	Paris, France	2019	Late-stage / $100M+ total (private)	Balderton, Y Combinator (early)
Descript	San Francisco, US	2017	Growth / $100M+ total (private)	Andreessen Horowitz, Spark Capital
OpusClip	San Francisco, US	2021	Series A / ~$20M+	SoftBank Vision Fund (reported)
Lightricks	Jerusalem, Israel	2013	Late-stage / $200M+ total	Insight Partners, Goldman Sachs
Runway	New York, US	2018	Series C+ / $500M+ total	Google, NVIDIA, Coatue
Luma AI	San Francisco, US	2021	Late-stage / $900M+ total	—
Pika	Palo Alto, US	2023	Series A / $100M+ total	Lightspeed, Nat Friedman
Stability AI	London, UK	2020	Late-stage / $100M+ total	Coatue, Lightspeed
Midjourney	San Francisco, US	2021	Bootstrapped	—
Black Forest Labs	Freiburg, Germany	2024	Series B / $300M	—

4. Embodied AI

Foundation models in Embodied AI are large-scale, pretrained AI systems that learn generalizable representations of the physical world. They model how objects look, how they behave under force, and how a robot should move to accomplish a task. The models are typically general but built in such a way that they can be adapted to specific robots and use cases.

Instead of hard-coding behavior, you train a model that learns to act in the physical world. The goal is to build models that generalize across tasks, robots, and environments.

4.1 Technology

The embodied AI value chain can be grouped into four layers:

Layer	Description	Key Players
Hardware / Form Factor	The physical robots themselves	Figure, 1X, Apptronik, Agility Robotics, Boston Dynamics, ABB, FANUC, Universal Robots
System Integrators & Middleware	Companies that take foundation models and fine-tune/deploy them for specific robotic applications	Sereact, Covariant, Collaborative Robotics, Skild AI
Foundation Models (focus of this paper)	VLAs and World Models that serve as the general-purpose "brain"	Physical Intelligence, Google DeepMind, NVIDIA, Skild AI
Silicon & Compute	Custom chips and GPU infrastructure for robot inference and simulation	NVIDIA, Qualcomm, Intel

There are several types of foundation models for Embodied AI. The two that are currently the most widely adopted are Vision-Language-Action (VLA) models and World Models.

4.2 Vision-Language-Action (VLA) Models

VLAs are the core "brain" of many AI-powered robots today. A VLA takes in camera images of the robot's surroundings and written text instructions as input. The output is the physical movement of the robot to perform the instructions (e.g. "pick up the red mug"). What makes VLAs powerful is that they are built on top of large vision-language models that have been pretrained on massive amounts of internet data. This means the robot inherits a broad understanding of the world and can handle objects and situations it has never specifically been trained on.
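The input/output contract described above (camera image plus text instruction in, robot motion out) can be written down as a small interface. This is only a sketch under assumptions: `Observation`, `Action`, `VLAPolicy`, and the keyword-matching `KeywordPolicy` are hypothetical names, and the toy policy is hand-written rather than a learned model.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Observation:
    image: List[List[List[int]]]  # H x W x 3 RGB pixels from the robot's camera
    instruction: str              # natural-language command


@dataclass
class Action:
    joint_deltas: List[float]     # change to apply to each joint this step
    gripper_closed: bool


class VLAPolicy(Protocol):
    """Anything mapping (image, instruction) to a robot action."""
    def act(self, obs: Observation) -> Action: ...


class KeywordPolicy:
    """Toy stand-in for a VLA: ignores the image and matches a keyword."""

    def __init__(self, num_joints: int = 7) -> None:
        self.num_joints = num_joints

    def act(self, obs: Observation) -> Action:
        wants_grasp = "pick up" in obs.instruction.lower()
        return Action(joint_deltas=[0.0] * self.num_joints,
                      gripper_closed=wants_grasp)


def control_step(policy: VLAPolicy, obs: Observation) -> Action:
    """One tick of the control loop: observe, then act."""
    return policy.act(obs)


blank_image = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]
action = control_step(KeywordPolicy(),
                      Observation(blank_image, "Pick up the red mug"))
```

A real VLA would replace `KeywordPolicy` with a pretrained vision-language backbone that actually uses the image; the surrounding control loop would keep the same shape.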

VLA models are still new; the term was coined in 2023:[18]

2023: Google publishes RT-2 and coins the term "Vision-Language-Action model". The model shows that a VLM pretrained on internet data can directly output robot actions
2024: Stanford releases the open-source model OpenVLA, allowing anyone to experiment with and do research on VLA models
2025: NVIDIA releases GR00T N1, Physical Intelligence releases π₀, and Google releases Gemini Robotics. With these models, VLAs go from being theoretical to being used in commercial robots at scale

4.3 World Models

World models are generative models whose purpose is to predict what will happen next in a physical environment. This is useful for two reasons. First, it lets the robot plan ahead by simulating outcomes before actually moving. Second, it can be used to generate large amounts of synthetic training data — critical given the scarcity of real-world robot data.
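The first use, planning by simulating outcomes before moving, can be illustrated with a toy example. The sketch below uses random-shooting planning, one simple technique among many; the 1-D `toy_world_model` is hand-written as a stand-in for a learned model, and all names and parameters are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

WorldModel = Callable[[float, float], float]  # (state, action) -> predicted next state


def toy_world_model(state: float, action: float) -> float:
    """Hypothetical dynamics: the robot's 1-D position shifts by the action."""
    return state + action


def rollout(world_model: WorldModel, state: float, actions: Sequence[float]) -> float:
    """Imagine the final state after applying a sequence of actions."""
    for a in actions:
        state = world_model(state, a)
    return state


def plan(world_model: WorldModel, state: float, goal: float,
         horizon: int = 5, num_candidates: int = 200, seed: int = 0) -> List[float]:
    """Random shooting: sample action sequences, simulate each, keep the best."""
    rng = random.Random(seed)
    best_plan: List[float] = []
    best_cost = float("inf")
    for _ in range(num_candidates):
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = abs(rollout(world_model, state, candidate) - goal)
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
    return best_plan


best = plan(toy_world_model, state=0.0, goal=2.0)
final = rollout(toy_world_model, 0.0, best)
print(f"predicted final position: {final:.2f}")
```

No real motion happens until a plan is chosen: every candidate is evaluated inside the model. The same imagined rollouts are also the raw material for the second use, synthetic training data.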

World models as a concept have been around since 2018, but only in recent years have they become practical tools for training robots used in production.

2018: Ha & Schmidhuber publish the paper "World Models" - introducing the idea[19]
2020–2024: Hafner et al. publish Dreamer V1, V2, and V3, a series of RL agents that learn world models and use them to imagine future outcomes before acting
2024: Google DeepMind releases Genie 1 & 2
2025: NVIDIA launches Cosmos and Google releases Genie 3 - world models that are now becoming core infrastructure for training robots and autonomous vehicles

4.4 Market Dynamics

Foundation models for Embodied AI have the largest addressable market potential in 2030 of the domains covered in this paper. Market segments where they are expected to be multi-billion dollar markets by 2030 include humanoid robots, warehouse & logistics robotics, and autonomous vehicles. But as of 2026, most of this potential remains unrealized. Today's commercial revenue from AI-native robotics is growing rapidly but is still small, concentrated in warehouse picking and a handful of manufacturing pilots.

The Field Approaches Its Inflection Point

Lab-to-deployment in 18 months: VLA models went from academic papers to commercial warehouse and factory deployments (Sereact, Figure, Agility) by early 2025
Real commercial contracts: Figure–BMW, Apptronik–Mercedes-Benz, Agility–Amazon/GXO, Sereact–European logistics customers.
Hardware scaling: Agility Robotics opened RoboFab, the first humanoid robot factory, with initial capacity to produce ~10,000 units annually. Figure is building a factory targeting similar volumes
Open-source momentum: OpenVLA, LeRobot, and NVIDIA's open GR00T N1 are democratising access to capable robot foundation models - making it easier than ever before to build companies leveraging these models
Talent migration: Top researchers from Google DeepMind, Meta FAIR, and Stanford are founding or joining robotics startups (Physical Intelligence, Skild AI, Collaborative Robotics)

Why It's Still Early

The gap between impressive demos and reliable production deployment is shrinking, but it remains significant:

Sim-to-real gap: Behaviors that work perfectly in simulation frequently fail in the real world due to differences in surfaces, friction, lighting, and sensor noise. Scaling deployments still requires extensive on-site testing and fine-tuning
Data scarcity: High-quality real-world robot demonstration data is expensive and slow to collect. Open X-Embodiment has 1M+ trajectories, but this is orders of magnitude less data than the trillions of text tokens used to train LLMs[20]
Unit economics remain unproven: Most humanoid robot companies are pre-revenue or deeply unprofitable. The cost of a humanoid robot (~$50K–$150K target) must compete with human labour costs, and the payback period for customers is still unclear

The current state can be compared to when GPT-3 was released. It was impressive and useful for specific use cases, but the "ChatGPT moment" of broad, reliable, general-purpose deployment has not yet arrived.

The Leading VLA Models

VLA foundation models are dominated by Big Tech players and well-funded startups, making it hard for new entrants to break in without significant capital.

Company	Model	Funding / Valuation	Comment
Physical Intelligence	π₀, π₀.5	$1.1B raised, $5.6B val (2025)	Leading pure-play robot foundation model company.
Google DeepMind	RT-2, Gemini Robotics	Big Tech	Gemini Robotics; data + compute advantage.
NVIDIA	GR00T N1	Big Tech	GR00T + Isaac/Omniverse; core robotics stack.
Skild AI	Foundation model (stealth)	$1.7B+ raised, $14B+ val	CMU spin-out building a general-purpose robot brain.
Hugging Face	LeRobot	N/A	Open-source robotics stack (models, data, tools).
Covariant (now Amazon Robotics AI)	RFM-1	Acquired by Amazon (2024)	Early robot FM (RFM-1); acquired by Amazon.

The Leading World Models

As with VLA models, world models are currently capital intensive to build and train, and the field is dominated by Big Tech players and a few well-funded startups.

Company	Model	Status	Comment
NVIDIA	Cosmos, Isaac Sim, Omniverse	Production	Dominant sim + synthetic data stack.
Google DeepMind	Genie 1, 2, 3	Research → Platform	Genie world models for interactive environment generation.
Runway	Gen-series → World Models	Pivot	Pivoting from video gen toward world simulation.

Recent Investment Rounds

Total disclosed VC funding into Embodied AI startups exceeded $5B in 2024–2025. That figure is driven by a few very large rounds at the model and hardware layer — for example, Physical Intelligence, Figure (about $675M), and Skild AI (about $300M) — as well as repeated large financings across humanoid robotics, including 1X, Agility, and Apptronik. That makes Embodied AI the most heavily funded non-text domain in this paper after Vision (video generation). Unlike Vision, though, the capital is spread across more companies and stages, suggesting the market is still early and not yet consolidated.

Startup landscape

Embodied AI is seeing rapid capital formation, and valuations re-rate quickly. Many prominent teams reach $1B+ valuations and $100M+ rounds within a short time, making them difficult for many investors to access at early stages. The table below reflects this split: it lists earlier-stage companies first, followed by category leaders that we track for ecosystem context but that are largely out of reach for early-stage investors given their valuation and stage.

Company	HQ	Founded	Stage / Funding	Investors
Sereact	Stuttgart, Germany	2020	Series B / €110M	HV Capital, Creandum, Headline
Genesis AI	Paris / Palo Alto	2024	Seed / $105M	Eclipse, Khosla Ventures, Bpifrance, Alven
Skild AI	Pittsburgh, PA	2023	$1.7B+ / $14B+ val	SoftBank, NVentures, Lightspeed, Coatue
FieldAI	Irvine, CA	2023	$405M / $2B val	Bezos Expeditions, Temasek, NVentures, Intel Capital
World Labs	San Francisco, CA	2024	$1B / ~$5B val	NVIDIA, AMD, Autodesk, a16z, NEA
AMI Labs	Paris, France	2025	$1.03B seed / $3.5B val	Cathay Innovation, NVIDIA, HV Capital
Wayve	London, UK	2017	$1.5B total / $8.6B val	NVIDIA, Microsoft, Uber, Mercedes-Benz, SoftBank
Waabi	Toronto, Canada	2021	$1B+ / Series C	Khosla Ventures, G2 Venture Partners, Uber
Liquid AI	Boston, MA	2023	$297M / $2B val	AMD (lead), OSS Capital
Physical Intelligence	San Francisco, CA	2024	$1.1B total / $5.6B val	CapitalG, Thrive Capital, Lux Capital, Index Ventures

5. Materials Science

Foundation models in materials science are large-scale, pre-trained AI systems that learn generalizable representations across diverse chemical and structural data (mainly inorganic materials) and are then fine-tuned or prompted for specific downstream tasks. These models fall into three functional categories:

Property prediction models predict material properties — energy, stability, band gap — from structure.
Generative models perform inverse design, generating novel structures given desired property constraints.
Simulation acceleration models replace expensive density functional theory (DFT) calculations, enabling molecular dynamics simulations 10,000× faster than DFT at comparable accuracy.

5.1 Technology and Data

Five primary architectural families dominate the field:

Graph Neural Networks (GNNs) and Equivariant NNs
Treat crystal structures as graphs of atomic connections. Models such as DeepMind's GNoME and Microsoft's MatterSim learn energy and force landscapes from millions of DFT-computed structures.[21] Meta FAIR's EquiformerV2 (released in OMat24) sets new state-of-the-art benchmarks.
Generative Diffusion Models
Adapted from image generation, these models directly generate novel material structures conditioned on desired properties. Microsoft's MatterGen is the flagship — trained on 608,000 stable materials, it creates new crystal structures given prompts specifying mechanical, electronic, or magnetic properties. MatterGen achieved the first experimental synthesis of a generative-model-designed material — TaCr₂O₆ with bulk modulus within 20% of its design target (Nature, January 2025).[22]
Transformers with Geometric Inductive Biases
Meta's eSEN combines self-attention with rotationally equivariant spherical-harmonic encodings. EPFL submitted the largest model on the Matbench Discovery leaderboard — PET-OAM-XL at 730M parameters — ranking second overall in January 2026.
Language Models for Materials (MatLMs)
Domain-specific BERT/GPT models (MatBERT, CatBERTa, BatteryBERT, TransPolymer) trained on scientific literature and chemical databases. These extract synthesis conditions, predict properties, and mine patents. IBM released FM4M, an open-source multi-modal foundation model family with 100,000+ HuggingFace downloads.
Multimodal and Agentic AI Systems
MIT's Llamole integrates an LLM backbone with graph-based modules to design synthesizable molecules and generate synthesis plans. MatAgent (University of Tokyo, 2025) uses an LLM "brain" to guide inorganic materials search with natural language reasoning.

Critical Open Datasets

The release of massive open datasets in 2024–2025 was a watershed. Meta's OMat24 contains 110M+ DFT calculations — roughly two orders of magnitude larger than previous datasets. OMol25 (100M+ molecular DFT calculations requiring 6B CPU hours) is the largest quantum chemistry dataset ever created, described as dividing the field into "pre-OMol25" and "post-OMol25" eras. CuspAI and Meta jointly released OpenDAC, the world's largest dataset on CO₂ sorbent materials with 100M+ datapoints. These datasets reduce the "data moat" advantage and democratize foundation model development — but still remain heavily skewed toward inorganic crystalline materials, with critical gaps for polymers, composites, and amorphous materials.

5.2 Maturity: The Field Approaches Its Inflection Point but Awaits a Defining Breakthrough

Multiple authoritative voices now describe AI in materials science as approaching its inflection point.[23]

5.2.1 Evidence for the Inflection

Scale of discovery:
GNoME identified 2.2M new stable crystal structures — equivalent to 800 years of prior research — and 736 were independently verified experimentally
Benchmark performance:
The top model on Matbench Discovery achieves F1 = 0.931 for crystal stability prediction, making it a high-quality filter for candidate materials. Meta's UMA demonstrated a single model matching or beating specialized models across molecules, materials, and catalysts without fine-tuning.
Autonomous lab validation:
Berkeley's A-Lab synthesized 41 novel materials in 17 continuous days — 71% success rate on computationally predicted targets.
Nobel recognition:
AlphaFold winning the 2024 Nobel Prize in Chemistry validated the entire paradigm of AI-driven molecular science
Talent migration:
Researchers behind ChatGPT and DeepMind’s GNoME have begun founding new materials‑AI startups (e.g., Periodic Labs, founded by ChatGPT co‑creator Liam Fedus and former DeepMind materials lead Ekin Dogus Cubuk, who worked on GNoME) — echoing the 2014–2016 wave of deep‑learning talent that seeded the modern NLP startup ecosystem.
Massive capital influx:
$300M seed rounds and $100M+ Series A rounds for pre-revenue companies (Periodic Labs, CuspAI, Lila Sciences) echo the ChatGPT-era VC frenzy
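For readers less familiar with the F1 metric cited under benchmark performance above, the sketch below shows how it is computed for a stable/unstable classifier. The counts are made up for illustration and are not taken from Matbench Discovery; only the formula is standard.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of predicted-stable that are stable
    recall = true_pos / (true_pos + false_neg)     # share of truly stable that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical screening run: 100 truly stable crystals, 100 flagged as stable,
# 90 of the flagged ones correct.
example_f1 = f1_score(true_pos=90, false_pos=10, false_neg=10)
print(f"F1 = {example_f1:.3f}")
```

A score near 0.93 therefore means the model both misses few stable structures and flags few unstable ones, which is what makes it usable as a pre-filter before expensive DFT or lab validation.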

5.2.2 Why It's Still Early

However, MIT Technology Review reported in December 2025: "So far there has been no 'eureka' moment, no ChatGPT-like breakthrough — no discovery of new miracle materials or even slightly better ones."[24] Most predicted structures were trivial variants of known materials or computationally stable at absolute zero but impractical under real-world conditions.

Key friction points:

Data scarcity:
Materials science has orders of magnitude less training data than NLP or even biology. The primary constraint remains high-quality experimental data
Prediction-to-synthesis gap:
AI can generate millions of candidate structures, but validating them experimentally is costly and slow. Over 80% of AI-recommended materials exhibit crystallographic disorder causing properties to diverge from theory
Synthesizability:
Predicting a stable structure is easier than predicting whether it can actually be made. New LLM frameworks like SynCry are beginning to tackle this
Data integrity:
AI-generated microscopy images are indistinguishable from experimental data by experts, and 20–30% of materials characterization analyses contain errors
Domain gaps:
Training data is heavily skewed toward inorganic crystalline materials, leaving critical gaps for polymers, composites, and amorphous materials. Properties like manufacturability, processability, and real-world performance under operating conditions cannot be reliably predicted from structure alone

The current state is analogous to the GPT-2 era in the text domain: the technology clearly works at unprecedented scale, investment is surging, yet systematic commercialization and reliable real-world deployment lag behind proof-of-concept results.

5.3 Core Application Areas

Energy Materials (Batteries, Fuel Cells, Photovoltaics)
AI is actively screening electrolytes, cathode/anode materials, and solid-state ion conductors. SES AI generated $9.3M in H1 2025 revenue from AI-enhanced battery materials contracts. Aionics partners with Porsche (Cellforce Group) for bespoke EV batteries.[26] Microsoft's Azure Quantum Elements collaboration with Pacific Northwest National Lab identified a novel solid electrolyte reducing lithium use by ~70%, going from computation to prototype in under 9 months.[27] Related coverage also appeared in Science.[28] AI-driven active learning has demonstrated a 75% reduction in organic solar cell material discovery time.[29]
Catalysis and Carbon Capture
Foundation models guide catalyst design for CO₂ conversion, hydrogen evolution, and green chemical synthesis. CuspAI's OpenDAC dataset enables design of direct-air-capture materials. Orbital Materials' first commercial product — carbon capture using AI-designed sorbents — is in early-stage commercialization. Copernic Catalysts with Schrödinger achieved 47% energy reduction in ammonia synthesis.
Semiconductors and Quantum Materials
MIT's LLM-based synthesis framework improved prediction accuracy for quantum material synthesis pathways from under 40% to nearly 90%. Periodic Labs has already secured semiconductor customers for next-generation heat dissipation materials. GNoME's discoveries include candidate superconductors and novel optical materials.
Polymer and Specialty Chemical Design
TransPolymer and related LLMs enable inverse design of polymers with targeted thermal, mechanical, and dielectric properties. Chemify's Chemputation platform generates novel molecules for agricultural, pharmaceutical, and materials applications. Matmerize's PolymRize platform serves enterprise customers including Asahi Kasei.

5.4 Market Dynamics

5.4.1 Market size

This domain is a small market today but growing quickly: generative AI in materials science is estimated at $1.49B in 2025 and projected to reach $12.90B by 2035 (CAGR 24.1%).[25] For context, this is still much smaller than adjacent AI-native R&D categories like AI drug discovery; the implication is that near-term value capture is driven more by enterprise budgets and specific high-value use cases (batteries, semiconductors, carbon capture) than by a single, large, homogeneous "platform" market.
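The quoted growth rate can be sanity-checked from the two endpoints. The sketch below assumes ten compounding periods (2025 to 2035); the function name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in $B, as quoted in the text.
rate = cagr(start_value=1.49, end_value=12.90, years=10)
print(f"implied CAGR: {rate:.1%}")
```

The implied rate agrees with the cited 24.1% to within rounding, so the two endpoints and the growth rate are internally consistent.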

5.4.2 Startup Landscape

The startup landscape is still early but increasingly crowded, with a mix of data-platform players, autonomous-lab companies, and vertical specialists targeting batteries, carbon capture, polymers, and synthesis workflows. The table below includes both earlier-stage companies and a few category-defining outliers (notably Periodic Labs and Lila Sciences) that matter because they are setting the pace for capital formation and competitive intensity in the space.

Total disclosed VC funding into AI materials science startups exceeded $1.2B in 2024–2025 — and notably, Periodic Labs and Lila Sciences reached unicorn valuations pre-revenue, underscoring both conviction and valuation risk in the category.

Company	HQ	Founded	Stage / Funding	Investors
Periodic Labs	San Francisco, CA	2025	Seed / $300M	a16z, NVIDIA, Accel, Bezos, Eric Schmidt
Lila Sciences	Cambridge, MA	2023	Seed + Series A / $550M total	Flagship Pioneering, General Catalyst, Braidwell, In-Q-Tel
CuspAI	Cambridge, UK	2024	Series A / $130M+	NEA, Temasek, NVentures, Samsung, Hyundai
Chemify	Glasgow, UK	2019	Series B / ~$93M+	Gates Foundation
Orbital Materials	London / Princeton	2022	Seed + Series A / $21M	Radical Ventures, Toyota, NVentures
Kebotix	Cambridge, MA	2017	Series A / $16M+	Novo Holdings, One Way Ventures, SIT Capital
NobleAI	San Francisco, CA	2017	Series A / $31M+	M12, Sway Ventures, Dorilton Ventures
Polaron	London, UK	2023	Seed / ~£7M	Speedinvest
Materials Nexus	Cambridge, UK	2020	Seed / ~$5M	Ada Ventures (lead)
Materials Zone	Tel Aviv	2018	Series A / ~$9.7M	Insight Partners (lead)
Aqemia	Paris, France	2019	Series A–B / €38M+	Cathay Innovation (lead)
Radical AI	New York	2024	Seed / $55M	RTX Ventures (lead), NVentures

5.5 Moats and Risks

Where the Moat Lies: Data Flywheels

The competitive dynamics will likely not follow the LLM playbook. Unlike text data, materials science data is scarce, expensive to generate, fragmented, and often proprietary. This reshapes defensibility.

The most defensible positions emerge at four levels:

Proprietary experimental data + autonomous lab feedback loops — the strongest moat. Companies operating their own wet labs (Periodic Labs, Lila Sciences, Radical AI, Orbital Materials) generate proprietary data in closed loops where each experiment compounds the advantage
Synthesis-aware modeling — CuspAI's approach of generating materials that chemical companies can actually manufacture, bridging the prediction-to-production gap
Domain-specific workflow integration — Citrine and NobleAI embedding into enterprise R&D workflows creates switching costs
Vertical application expertise with deep industry partnerships (Aionics in battery electrolytes, Matmerize in polymers) builds customer lock-in and proprietary datasets within specific end markets

The likely outcome is multiple specialized foundation models for different material classes and length scales, with the most durable value accumulating at the application and data layers.

Key Risks

💡The simulation-to-reality gap is the single greatest risk

AI can predict millions of computationally stable materials, but "stable" does not mean "useful" or "synthesizable." Properties like manufacturability, processability, and real-world performance under operating conditions cannot yet be reliably predicted from atomic structure alone.

Data scarcity is structural, not temporary: Generating a single high-quality experimental dataset can cost $100K+ per campaign.
No commercial "win" yet: Despite $1.2B+ in startup funding, no AI-designed material has reached commercial deployment. The closest precedents are AI-designed drug candidates, and those are still in clinical trials.
Long validation cycles: Traditional materials development takes 20+ years from lab to market. AI may compress the early stages significantly, but regulatory timelines remain lengthy.
Customer willingness to pay: The chemical/materials industry is cost-sensitive and low-margin, which may limit what customers will pay for AI-driven discovery.
Valuation risk: Periodic Labs ($1.5B) and Lila Sciences ($1.3B) reached unicorn valuations pre-revenue, which raises the bar for execution.
Talent scarcity: The intersection of deep AI expertise and materials science knowledge is an extremely small talent pool, concentrated at a handful of institutions.

6. Biology

Similar to foundation models in materials science, foundation models in biology are large AI systems pretrained on vast biological data (protein sequences, DNA, cell measurements, medical images, clinical records) that can then be adapted to specific downstream tasks such as designing new drugs, predicting disease-causing mutations, or diagnosing cancer from tissue slides. Just as GPT learned the "language" of the internet, these models learn the "language" of biology.

The 2024 Nobel Prize in Chemistry, awarded in part for AlphaFold, was a strong validation of the paradigm.[30] Over 200 distinct biology foundation models have now been published, and the pace is accelerating. AI-designed drugs are entering Phase II/III clinical trials, pathology AI is FDA-cleared and deployed in thousands of clinics, and AI drug discovery companies such as Insilico Medicine have gone public.

One example of this clinical progress is Insilico Medicine’s published Phase IIa results for rentosertib.[31]

6.1 Technology: Five Distinct Fields

Biology foundation models span five fields, each at a different stage of development:

1. Protein Structure & Design
These models predict 3D protein structures and increasingly help design entirely new proteins, antibodies, and binders. This is the most scientifically mature biology subcategory, with strong validation and early translation into drug discovery. Key names include AlphaFold 3, IsoDDE, ESM3, Chai-2, and RFdiffusion / ProteinMPNN.
2. Genomic / DNA Models
These models learn the “language” of DNA to predict variant effects, classify sequences, and support gene-editing design. The category is earlier than protein models, but it is progressing quickly as models scale and become more biologically useful. Key names include Evo 2, Nucleotide Transformer, and OpenCRISPR-1.[32]
3. Single-Cell & Virtual Cell Models
These models aim to predict how individual cells respond to drugs, mutations, or perturbations, with the long-term goal of simulating cellular behavior. This remains mostly a research-stage category, but it could become foundational for future virtual-cell platforms. Key names include Geneformer V2, scGPT, and CZI’s Virtual Cell Platform.
4. Digital Pathology
These models analyze tissue-slide images to detect disease, classify tumors, and predict treatment response. This is the most commercially mature biology subcategory, with real clinical deployment and FDA-cleared products already in use. Key names include H-Optimus-1, PLUTO-4, and Paige PanCancer.
5. Clinical / Multimodal
These models try to combine molecular, biological, imaging, and clinical data into one system that can support diagnosis, trial design, and treatment decisions. This category is still early and ambitious, but it could become the broadest and most valuable layer if multimodal integration works at scale. Key names include Owkin, xTrimo V3, and Med-PaLM 2.

6.2 Maturity: Biology's GPT-2 Moment

Biology foundation models have clearly moved beyond pure research, but the field is still uneven in maturity. Protein structure prediction is already production-grade, digital pathology is commercial, and multiple AI-designed drug programs are in the clinic — enough to prove the paradigm is real, but not enough to say the category has fully broken out.

That is why biology looks more like the GPT-2/GPT-3 era than the ChatGPT era. The models are impressive to experts, capital is flowing, and the milestones are meaningful: the 2024 Nobel Prize for AlphaFold, Insilico Medicine’s IPO, Generate Biomedicines’ IPO filing, and 173+ AI-originated drug programs in development. But the field still lacks a single simple, undeniable breakthrough that changes mainstream adoption overnight.

6.3 Market Dynamics & Startup Landscape

6.3.1 Market Size

AI in drug discovery is still a relatively modest market today — roughly $2–7B in 2025 — but it is growing quickly toward an estimated $8–25B by 2030, implying 20–30% annual growth. More importantly, that near-term market likely understates the strategic ceiling: McKinsey estimates generative AI could create $60–110B of annual value across the broader pharma value chain if the technology delivers at scale.
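
As a rough sanity check on the growth figures above, the compound annual growth rate (CAGR) implied by two endpoints can be computed directly. The sketch below assumes a five-year horizon (2025 to 2030) and pairs the low estimates together and the high estimates together. That pairing is our assumption for illustration; the underlying market estimates do not specify it.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two market-size estimates."""
    return (end / start) ** (1 / years) - 1


YEARS = 5  # assumption: 2025 -> 2030

# Market-size ranges quoted in the text, in $B.
low_case = cagr(2.0, 8.0, YEARS)    # low 2025 estimate -> low 2030 estimate
high_case = cagr(7.0, 25.0, YEARS)  # high 2025 estimate -> high 2030 estimate

print(f"low-to-low CAGR:   {low_case:.1%}")
print(f"high-to-high CAGR: {high_case:.1%}")
```

Under this pairing, both cases land at roughly 29–32% per year, at or slightly above the top of the quoted 20–30% range. Other pairings of the endpoints give very different figures, so the range is best read as indicative.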

Demand is not the core constraint. The top 20 pharma companies already spend roughly $180B per year on R&D, and large partnership announcements such as Novartis–Schrödinger ($2.3B), Isomorphic–Lilly ($1.7B), and Sanofi–Helixon ($1.7B) show that major buyers are actively allocating budget to AI-enabled discovery platforms.

6.3.2 Startup Landscape

Company	HQ	Founded	Stage / Funding	Investors
Isomorphic Labs	London	2021	Series A / $600M	Alphabet, Thrive Capital, GV
EvolutionaryScale	San Francisco	2023	Seed / $142M	Nat Friedman, Daniel Gross, Lux Capital, AWS, NVentures
Chai Discovery	San Francisco	2024	Series B / $130M	General Catalyst, Oak HC/FT, Menlo Ventures, OpenAI, Thrive Capital
Generate Biomedicines	Cambridge, MA	2018	Series C / $273M	Flagship Pioneering, Amgen, NVentures, ADIA, Fidelity
Recursion	Salt Lake City	2013	Post-IPO Equity / $200M	Lux Capital, DCVC, Mubadala, Leaps by Bayer
Insitro	S. San Francisco	2018	Series C / $400M	a16z, ARCH Venture Partners, Foresite Capital, CPP Investments
Insilico Medicine	Hong Kong	2014	Series E / $110M	Value Partners, Lilly, Tencent, Temasek
Bioptimus	Paris	2024	Series A / $41M	Cathay Innovation, Sofinnova Partners, Bpifrance, Andera Partners
Owkin	Paris	2016	Series B / $50M	Sanofi, Bristol Myers Squibb, GV, Cathay Innovation
Cradle	Amsterdam	2021	Series B / $73M	IVP, Index Ventures, Kindred Capital
Basecamp Research	London	2019	Series B / $60M	Singular, S32, True Ventures, Hummingbird Ventures
Relation Therapeutics	London	2019	Follow-on / $26M	NVentures, DCVC, Magnetic Ventures, Novartis
Aqemia	Paris	2019	Series A / $38M	Cathay Innovation, Wendel, Bpifrance, Eurazeo, Elaia
Converge Bio	Israel	2024	Series A / $25M	Bessemer Venture Partners, TLV Partners, Vintage Investment Partners, Saras Capital

6.4 Moats and Risks

Where the Moat Lies

Data, not models. AlphaFold 3 was closed-source, yet within months three teams had replicated it. With over 380 new bio-AI models published annually, model moats erode fast. Proprietary experimental data (Recursion’s 50+ PB, Basecamp’s 9.8B protein sequences, Terray’s 5B chemistry interactions), by contrast, cannot be replicated without years of infrastructure building.
Wet-lab integration. The companies closing the predict → synthesize → test → retrain loop at scale build a compounding advantage. AI-only approaches have failed at translation; BenevolentAI’s Phase IIa failure is the cautionary tale.
Distribution. Schrödinger’s 30+ years of embedding into pharma workflows creates real switching costs. Benchling’s 150%+ net dollar retention demonstrates platform lock-in.
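
The predict → synthesize → test → retrain loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not any company's pipeline: `run_assay` is a hypothetical simulated experiment, `predict` is a deliberately simple nearest-neighbour surrogate, and the greedy selection rule is our own choice.

```python
import random


def run_assay(x: float) -> float:
    """Hypothetical stand-in for a wet-lab measurement; the optimum is hidden at x = 0.7."""
    return -(x - 0.7) ** 2


def predict(x: float, measured: dict) -> float:
    """Toy surrogate model: reuse the result of the nearest measured candidate."""
    nearest = min(measured, key=lambda seen: abs(seen - x))
    return measured[nearest]


def closed_loop(candidates: list, n_rounds: int, seed: int = 0):
    rng = random.Random(seed)
    # Seed dataset: two randomly chosen candidates that have already been tested.
    measured = {x: run_assay(x) for x in rng.sample(candidates, 2)}
    best_so_far = [max(measured.values())]
    for _ in range(n_rounds):
        untested = [x for x in candidates if x not in measured]
        # Predict: pick the untested candidate the current model rates highest.
        pick = max(untested, key=lambda x: predict(x, measured))
        # Synthesize + test: run the (simulated) experiment.
        measured[pick] = run_assay(pick)
        # Retrain: the nearest-neighbour model updates as soon as new data is added.
        best_so_far.append(max(measured.values()))
    return measured, best_so_far


candidates = [i / 100 for i in range(101)]
measured, best_so_far = closed_loop(candidates, n_rounds=10)
print(f"experiments run: {len(measured)}, best result: {best_so_far[-1]:.4f}")
```

The point of the sketch is structural: the `measured` dataset grows only through experiments the operator actually runs, which is why a competitor cannot obtain it simply by copying the model.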

Key Risks

No AI-designed drug has received FDA approval yet. The 90% clinical failure rate has not demonstrably improved. AI may accelerate early discovery, but the most expensive failures happen in Phase II/III.
Public market carnage. Recursion is down 57% from its IPO peak and AbCellera is down 93%. BenevolentAI delisted at ~€0.11/share. Generate Biomedicines’ IPO opened 21% below pricing.
The biobucks illusion. A $1.7B partnership headline typically means ~$35M in actual upfront cash. Rigorous diligence on real vs. aspirational revenue is essential.
Open-source commoditization. The most impactful releases (Chai-1/2, Boltz-1/2, Evo 2, ESM3) have been open. Companies whose primary asset is a model, without proprietary data, face existential risk.
Long timelines. Drug development takes 10–15 years. Even AI-compressed preclinical timelines (Insilico: 30 months vs. 6–8 years typical) still leave years before regulatory approval.
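
Two of the risks above reduce to simple arithmetic, sketched below using only the figures quoted in the text. Treating everything beyond the upfront payment as contingent on milestones is our simplifying assumption; actual deal structures vary.

```python
# Figures quoted in the text above.
headline_usd_m = 1_700  # $1.7B partnership headline, in $M
upfront_usd_m = 35      # ~$35M typical upfront cash, in $M

upfront_share = upfront_usd_m / headline_usd_m
# Assumption: the remainder is contingent (milestones, royalties) rather than committed cash.
contingent_share = 1 - upfront_share

# Preclinical timeline: 30 months (Insilico) vs. 6-8 years typical.
ai_months = 30
typical_months = (6 * 12, 8 * 12)
speedup = tuple(m / ai_months for m in typical_months)
months_saved = tuple(m - ai_months for m in typical_months)

print(f"upfront share of headline: {upfront_share:.1%}")
print(f"contingent share: {contingent_share:.1%}")
print(f"preclinical speed-up: {speedup[0]:.1f}x to {speedup[1]:.1f}x")
print(f"months saved: {months_saved[0]} to {months_saved[1]}")
```

On these figures, roughly 2% of the headline value is paid upfront, and a 30-month preclinical phase saves about 3.5 to 5.5 years against the typical range, with the clinical stages still ahead.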

7. Conclusion

Foundation models beyond text are progressing rapidly, but the technical maturity and commercial adoption levels vary significantly by domain. Across domains, training frontier models is typically capital-intensive, while many downstream products and platforms can build on increasingly capable model ecosystems.

Of the five domains covered, Embodied AI and Biology appear to be at particularly dynamic stages (rapid capability progress alongside evolving market structure), while Audio & Speech and Vision show more signs of consolidation and bundling by large platforms. Materials Science is advancing quickly in research benchmarks and dataset scale, but widespread commercial validation remains limited relative to the scale of investment to date.

A recurring theme is that Big Tech’s role differs by domain: in Audio & Speech and Vision, large platforms are often direct competitors through bundling; in Embodied AI and Materials Science, large platforms also play major roles as infrastructure providers (compute, simulation, developer ecosystems) and investors.

References and Resources

The information contained in this article is provided for informational and educational purposes only and does not constitute an investment recommendation or any other type of professional advice. The views and opinions are those of the author at the time of publication and are subject to change at any time. Any mention of a company name or security is not a recommendation to purchase.

Published on:

28/4/2026

Authors

Bin Jin, PhD

Data Science Manager

Jacob Emanuel Toresson

Associate
