1. Introduction
Over the past few years, the AI race has been dominated by Big Tech players (OpenAI, Google, Anthropic, Meta) scaling ever-larger language models. Since ChatGPT was launched in 2022, massive capital has flowed into LLMs, and the window for venture-backed challengers in text has largely narrowed.
But intelligence isn’t only text. The physical world runs on sound, light, molecules, proteins, and motion, and foundation models for these domains are still early in terms of technology and/or commercial usage. The race to build foundation models beyond text remains open, with active research and early commercialization across multiple domains.
This paper focuses on domain-specific foundation models — models trained for a particular domain (e.g., audio, vision, biology) and adapted to multiple downstream tasks within that domain.[1] The focus is on foundation model builders and enabling platforms, and generally excludes narrow tools and point-solution application companies.
Throughout this paper, we separate the model layer from the application layer. The model layer covers companies building domain-specific foundation models and the infrastructure behind them (architectures, data, training, and inference). The application layer covers products that use these models in real workflows.
1.1 The Domains In Scope Of This Paper
This paper excludes text (LLMs) because that domain is mature and its winners (Big Tech) are already emerging. We focus on five non-text domains that capture most current foundation-model activity: Audio & Speech, Vision, Embodied AI, Materials Science, and Biology. Each sits at a different maturity level (Audio/Vision most mature; Embodied AI at an inflection point; Materials/Biology earliest). The current maturity of each domain is illustrated below.

2. Audio & Speech
2.1 Technology
Most audio AI products today rely on a three-stage pipeline (speech-to-text → LLM → text-to-speech), shown in the picture below.

In this architecture, the LLM remains the core foundation model. Companies like ElevenLabs, Deepgram, and Cartesia are building sophisticated Speech-To-Text (STT) and Text-To-Speech (TTS) models, but they still rely on external LLMs.
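The cascaded design described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the three stand-in functions only pass text through so the example runs without external services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Cascaded voice agent: speech-to-text, then an LLM, then text-to-speech."""
    stt: Callable[[bytes], str]   # audio in -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


# Hypothetical stand-ins: real systems would call speech and language models here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Because each stage is just a callable, the speech components can be swapped independently of the LLM, which is the separation the paragraph above describes.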
That said, genuinely audio-native foundation models (no LLM in the loop) do exist in more niche sub-domains such as music generation. For example, Suno uses diffusion/transformer approaches to generate audio directly.
2.2 Market Dynamics
The global voice AI market is already valued at ~$12B and is expected to reach $30B by 2030.[3] A few clear commercial winners have already emerged in this space:
- ElevenLabs is trying to be the full-stack AI voice tool (raised $781M)
- Suno, launched in December 2023, is one of the fastest-scaling consumer AI products ever built and was at ~$300M of ARR as of February 2026 (raised $375M)
- Cartesia is building ultra-low-latency voice models using a state-space model (SSM) architecture rather than transformers (raised $191M)
- Hume AI is the niche leader in emotion-aware audio with its empathic voice interface (raised $80M)
Critically, Big Tech is also going after this domain. OpenAI, Google, and Meta are all building native voice capabilities directly into their foundation models, collapsing the STT → LLM → TTS pipeline into a single system. This threatens standalone audio AI companies by removing the need for third-party speech models in many use cases.
Startup landscape
Beyond the foundation model builders, several startups are operating within the Audio & Speech application layer:
3. Vision (Image & Video)
Generative vision has matured to the point where it is no longer obvious where value accrues. On one hand, the economics and competitive dynamics of training frontier image/video models increasingly resemble a scale game dominated by the deepest balance sheets — a reality underscored by OpenAI’s shutdown of Sora in March 2026 after reportedly burning ~$1M/day for just $2.1M in lifetime revenue.[4] On the other hand, “model layer” is not monolithic: there are still pockets where differentiated data, distribution, or platform integration can create leverage.
3.1 Technology
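Image generation models follow two main architectural approaches: diffusion models (Stable Diffusion, Imagen, DALL-E) that iteratively refine noise into images, and transformer-based models (Black Forest Labs' FLUX) that apply LLM-style attention to visual data. The two increasingly overlap: FLUX, for example, pairs a transformer backbone with a diffusion-style training objective.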
Recent releases such as GPT-4o image generation underscore how quickly image-generation capabilities are being productized and bundled into broader AI platforms.[5]
Video generation extends these techniques across time, but the jump in complexity is dramatic: a 10-second clip at 24fps requires 240 temporally consistent frames, pushing compute requirements up by orders of magnitude. A high-quality image costs $0.02–0.04 via API; a minute of video runs $2.40–30.00.[6]
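The frame count and price comparison above can be checked with simple arithmetic. The sketch below uses only the figures quoted in this section; the variable names are illustrative.

```python
FPS = 24
CLIP_SECONDS = 10

# Frame counts implied by the figures quoted above.
frames_per_clip = FPS * CLIP_SECONDS   # frames in a 10-second clip
frames_per_minute = FPS * 60           # frames in one minute of video

# API price ranges quoted above, in USD (low, high).
IMAGE_PRICE = (0.02, 0.04)
VIDEO_PRICE_PER_MINUTE = (2.40, 30.00)

# Implied price per generated video frame at the low and high end.
video_price_per_frame = tuple(p / frames_per_minute for p in VIDEO_PRICE_PER_MINUTE)

# One minute of video expressed as a multiple of one image, at the extremes.
min_ratio = VIDEO_PRICE_PER_MINUTE[0] / IMAGE_PRICE[1]
max_ratio = VIDEO_PRICE_PER_MINUTE[1] / IMAGE_PRICE[0]

print(f"{frames_per_clip} frames per clip, {frames_per_minute} frames per minute")
print(f"video minute = {min_ratio:.0f}x to {max_ratio:.0f}x the price of one image")
```

On these numbers, a minute of video costs roughly 60 to 1,500 times as much as a single image, while the implied price per video frame is about $0.002 to $0.02.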
3.2 Image Generation: Already Commoditized

The quality gap between leading image models has effectively closed — FLUX.2 Pro and GPT Image 1.5 are statistically tied on benchmarks, and open-source models now match commercial offerings in blind evaluations.[7] Pricing has compressed 10x in two years, with viable API options starting at $0.003/image.[8] When Google's Gemini image generation attracted 13 million new users in four days and produced 5 billion images within weeks, it confirmed that image generation is now a feature, not a product.[9]
The leading players are already clear. Midjourney dominates consumer/prosumer with $500M in 2025 revenue, entirely bootstrapped.[10] Black Forest Labs has emerged as the B2B infrastructure winner ($300M raised at $3.25B valuation), powering Grok and signing ~$300M in platform deals with Adobe, Canva, and Meta.[11][12] Google runs generation across 650 million MAUs.[13] Critically, Google now offers AI product photography for free in Merchant Center,[14] and Zalando reports AI imagery at 85% lower cost with conversion rates within 4% of professional photography.[15] The broader market sits at ~$3B in 2025, growing 30%+ annually — but the value is accruing to incumbents, not new entrants.[16]
3.3 Video Generation: Expensive and Unsettled

Unlike images, video remains fiercely competitive with no clear winner — and, tellingly, Chinese-origin models now dominate. On the Artificial Analysis leaderboard, HappyHorse 1.0 (Alibaba-affiliated) leads at Elo 1,362, followed by ByteDance's Seedance 2.0 and Kuaishou's Kling 3.0. The top Western model, Runway's Gen-4.5, ranks only #7.[6]
Sora's collapse is instructive: it hit #1 in the App Store at launch, then downloads fell 66% in three months.[4] Disney had committed $1B tied to Sora and learned of the shutdown less than an hour before the public.[17] Even OpenAI concluded that standalone video generation does not justify the economics.
Capital concentration is extreme: Luma AI raised $900M at $4B, Runway raised $315M at $5.3B, and total sector funding hit $3.08B in 2025 (+95% YoY). However, both players are now pivoting their narrative to "world models" — a sign that pure video generation alone may not sustain their valuations. The economics are improving (inference costs down ~1,000x in three years) but remain punishing: most models cap out at 5–20 seconds of coherent output, and no open-source model comes close to commercial leaders.
3.4 Market Dynamics
In many cases, value creation appears to be shifting toward the application layer — companies that treat generation models as interchangeable infrastructure and build differentiation through domain expertise, proprietary data, and workflow integration.
Vertical visual AI platforms are the most compelling category. Raspberry AI ($28M raised, a16z-led) has signed 70+ fashion brands by training custom models on brand-specific design DNA. Omi (€13M, Paris) generates photorealistic 3D product visuals for Clarins, Nestlé, and Meta. Photoroom ($500M valuation) has crossed 200M downloads. These companies embed industry-specific workflows that horizontal model providers cannot easily replicate.
Video editing and repurposing tools may actually be a better category than video generation itself. Descript has crossed $100M ARR with text-based editing. OpusClip ($215M valuation, SoftBank) has generated 172M clips and 57B views. Lightricks ($1.8B valuation) combines an open-source video model with a commercial studio. These businesses benefit from a structural asymmetry: they harvest AI productivity gains without bearing the full weight of generation compute costs.
4. Embodied AI
Foundation models in Embodied AI are large-scale, pretrained AI systems that learn generalizable representations of the physical world. They model how objects look, how they behave under force, and how a robot should move to accomplish a task. The models are typically general but built in such a way that they can be adapted to specific robots and use cases.
Instead of hard-coding behavior, developers train a model that learns to act in the physical world. The goal is to build models that generalize across tasks, robots, and environments.
4.1 Technology
The embodied AI value chain can be grouped into four layers:
There are several types of foundation models for Embodied AI. The two that are currently the most widely adopted are Vision-Language-Action (VLA) models and World Models.
4.2 Vision-Language-Action (VLA) Models
VLAs are the core "brain" of many AI-powered robots today. A VLA takes in camera images of the robot's surroundings and written text instructions as input. The output is the physical movement of the robot to perform the instructions (e.g. "pick up the red mug"). What makes VLAs powerful is that they are built on top of large vision-language models that have been pretrained on massive amounts of internet data. This means the robot inherits a broad understanding of the world and can handle objects and situations it has never specifically been trained on.
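The input/output contract described above (camera image plus text instruction in, robot action out) can be sketched as a small interface. This is an illustrative sketch only: `Observation`, `Action`, `VLAPolicy`, `KeywordPolicy`, and `run_episode` are hypothetical names, and `KeywordPolicy` is a trivial stand-in that ignores the image rather than a real pretrained model.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Observation:
    image: list[list[list[int]]]  # camera frame as H x W x 3 pixel values
    instruction: str              # e.g. "pick up the red mug"


@dataclass
class Action:
    joint_deltas: list[float]     # change to apply to each joint position
    gripper_closed: bool


class VLAPolicy(Protocol):
    """Anything that maps an observation to an action can act as the policy."""
    def act(self, obs: Observation) -> Action: ...


class KeywordPolicy:
    """Hypothetical stand-in for a trained VLA: ignores the image entirely."""

    def __init__(self, num_joints: int = 7) -> None:
        self.num_joints = num_joints

    def act(self, obs: Observation) -> Action:
        wants_grasp = "pick" in obs.instruction.lower()
        return Action(joint_deltas=[0.0] * self.num_joints, gripper_closed=wants_grasp)


def run_episode(policy: VLAPolicy, observations: list[Observation]) -> list[Action]:
    """Closed-loop control: produce one action per incoming observation."""
    return [policy.act(obs) for obs in observations]


frame = [[[0, 0, 0]]]  # 1x1 black image as a placeholder camera input
actions = run_episode(KeywordPolicy(), [Observation(frame, "Pick up the red mug")])
```

In a real system the policy would be a large pretrained network, but the surrounding control loop keeps the same shape: observe, query the policy, send the action to the robot.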
VLA models are still new; the term was first coined in 2023:[18]
- 2023: Google publishes RT-2 and coins the term "Vision-Language-Action model". The model showed that a VLM pretrained on internet data can directly output robot actions
- 2024: Stanford releases the open-source model OpenVLA, allowing anyone to experiment with and do research on VLA models
- 2025: NVIDIA releases GR00T N1, Physical Intelligence open-sources π₀ (first announced in late 2024), and Google releases Gemini Robotics. With these models, VLAs go from being theoretical to being used in commercial robots at scale
4.3 World Models
World models are generative models whose purpose is to predict what will happen next in a physical environment. This is useful for two reasons. First, it lets the robot plan ahead by simulating outcomes before actually moving. Second, it can be used to generate large amounts of synthetic training data — critical given the scarcity of real-world robot data.
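The first use case, planning by simulating outcomes before moving, can be illustrated with a toy example. The sketch below assumes a one-dimensional robot and a hand-written `world_model` function standing in for a learned model; the planner is a simple random-shooting search, chosen for brevity, and is not how any system named in this paper is implemented.

```python
import random


def world_model(position: float, action: float) -> float:
    """Toy dynamics standing in for a learned model: predicts the next position."""
    return position + action


def rollout(position: float, actions: list[float]) -> float:
    """Imagine the final position after applying a sequence of actions."""
    for action in actions:
        position = world_model(position, action)
    return position


def plan(position: float, goal: float, horizon: int = 5,
         num_candidates: int = 200, seed: int = 0) -> tuple[list[float], float]:
    """Random-shooting planner: sample action sequences, keep the best imagined one."""
    rng = random.Random(seed)
    best_actions: list[float] = []
    best_error = float("inf")
    for _ in range(num_candidates):
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        error = abs(goal - rollout(position, candidate))
        if error < best_error:
            best_actions, best_error = candidate, error
    return best_actions, best_error


best_actions, best_error = plan(position=0.0, goal=2.0)
```

All candidate plans are evaluated inside the model, so the robot only executes the sequence whose imagined outcome lands closest to the goal. The same rollouts could also be logged as synthetic training data, which is the second use case mentioned above.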
World models as a concept have been around since 2018, but only in recent years have they become practical tools for training robots used in production.
- 2018: Ha & Schmidhuber publish the paper "World Models", introducing the idea[19]
- 2020–2024: Hafner et al. publish Dreamer V1, V2, and V3, a series of RL agents that learn world models and use them to imagine future outcomes before acting
- 2024: Google DeepMind releases Genie 1 & 2
- 2025: NVIDIA launches Cosmos and Google releases Genie 3, world models that are now becoming core infrastructure for training robots and autonomous vehicles
4.4 Market Dynamics
Foundation models for Embodied AI have larger addressable market potential in 2030 than those for the other domains included in this paper. Market segments where foundation models for embodied AI are expected to be multi-billion dollar markets in 2030 include humanoid robots, warehouse & logistics robotics, and autonomous vehicles. But as of 2026, most of this potential remains unrealized. Today's commercial revenue from AI-native robotics is growing rapidly but is still small, concentrated in warehouse picking and a handful of manufacturing pilots.
The Field Approaches Its Inflection Point
- Lab-to-deployment in 18 months: VLA models went from academic papers to commercial warehouse and factory deployments (Sereact, Figure, Agility) by early 2025
- Real commercial contracts: Figure–BMW, Apptronik–Mercedes-Benz, Agility–Amazon/GXO, Sereact–European logistics customers.
- Hardware scaling: Agility Robotics opened RoboFab, the first humanoid robot factory, with initial capacity to produce ~10,000 units annually. Figure is building a factory targeting similar volumes
- Open-source momentum: OpenVLA, LeRobot, and NVIDIA's open GR00T N1 are democratising access to capable robot foundation models, making it easier than ever to build companies on top of them
- Talent migration: Top researchers from Google DeepMind, Meta FAIR, and Stanford are founding or joining robotics startups (Physical Intelligence, Skild AI, Collaborative Robotics)
Why It's Still Early
The gap between impressive demos and reliable production deployment is shrinking, but it remains significant:
- Sim-to-real gap: Behaviors that work perfectly in simulation frequently fail in the real world due to differences in surfaces, friction, lighting, and sensor noise. Scaling deployments still requires extensive on-site testing and fine-tuning
- Data scarcity: High-quality real-world robot demonstration data is expensive and slow to collect. Open X-Embodiment has 1M+ trajectories, but this is orders of magnitude less data than the trillions of text tokens used to train LLMs[20]
- Unit economics remain unproven: Most humanoid robot companies are pre-revenue or deeply unprofitable. The cost of a humanoid robot (~$50K–$150K target) must compete with human labour costs, and the payback period for customers is still unclear
The current state can be compared to when GPT-3 was released. It was impressive and useful for specific use cases, but the "ChatGPT moment" of broad, reliable, general-purpose deployment has not yet arrived.
The Leading VLA Models
VLA foundation models are dominated by Big Tech players and well-funded startups, making it hard for new entrants to break in without significant capital.
The Leading World Models
As with VLA models, world models are currently capital-intensive to build and train, and the space is mostly dominated by Big Tech players and a few well-funded startups.
Recent Investment Rounds
Total disclosed VC funding into Embodied AI startups exceeded $5B in 2024–2025. That figure is driven by a few very large rounds at the model and hardware layer — for example, Physical Intelligence, Figure (about $675M), and Skild AI (about $300M) — as well as repeated large financings across humanoid robotics, including 1X, Agility, and Apptronik. That makes Embodied AI the most heavily funded non-text domain in this paper after Vision (video generation). Unlike Vision, though, the capital is spread across more companies and stages, suggesting the market is still early and not yet consolidated.
Startup landscape
Embodied AI is seeing rapid capital formation, and valuations re-rate quickly. Many prominent teams move into $1B+ valuations and $100M+ rounds within a short time, making them difficult for many investors to access at early stages. The two tables below reflect this split: first, companies that are earlier stage; then, important category leaders that we track for ecosystem context but that are already at later stages and higher valuations.
5. Materials Science
Foundation models in materials science are large-scale, pre-trained AI systems that learn generalizable representations from diverse chemical and structural data (mainly on inorganic materials) and are then fine-tuned or prompted for specific downstream tasks. These models fall into three functional categories:
- Property prediction models predict material properties — energy, stability, band gap — from structure.
- Generative models perform inverse design, generating novel structures given desired property constraints.
- Simulation acceleration models replace expensive density functional theory (DFT) calculations, enabling molecular dynamics simulations 10,000× faster than DFT at comparable accuracy.
5.1 Technology and Data
Five primary architectural families dominate the field:
- Graph Neural Networks (GNNs) and Equivariant NNs
- These models treat crystal structures as graphs of atomic connections. Models such as DeepMind's GNoME and Microsoft's MatterSim learn energy and force landscapes from millions of DFT-computed structures.[21] Meta FAIR's EquiformerV2 models (released alongside OMat24) set new state-of-the-art benchmark results.
- Generative Diffusion Models
- Adapted from image generation, these models directly generate novel material structures conditioned on desired properties. Microsoft's MatterGen is the flagship — trained on 608,000 stable materials, it creates new crystal structures given prompts specifying mechanical, electronic, or magnetic properties. MatterGen achieved the first experimental synthesis of a generative-model-designed material — TaCr₂O₆ with bulk modulus within 20% of its design target (Nature, January 2025).[22]
- Transformers with Geometric Inductive Biases
- Meta's eSEN combines self-attention with rotationally equivariant spherical-harmonic encodings. EPFL submitted the largest model on the Matbench Discovery leaderboard — PET-OAM-XL at 730M parameters — ranking second overall in January 2026.
- Language Models for Materials (MatLMs)
- Domain-specific BERT/GPT models (MatBERT, CatBERTa, BatteryBERT, TransPolymer) trained on scientific literature and chemical databases. These extract synthesis conditions, predict properties, and mine patents. IBM released FM4M, an open-source multi-modal foundation model family with 100,000+ HuggingFace downloads.
- Multimodal and Agentic AI Systems
- MIT's Llamole integrates an LLM backbone with graph-based modules to design synthesizable molecules and generate synthesis plans. MatAgent (University of Tokyo, 2025) uses an LLM "brain" to guide inorganic materials search with natural language reasoning.
Critical Open Datasets
The release of massive open datasets in 2024–2025 was a watershed. Meta's OMat24 contains 110M+ DFT calculations — roughly two orders of magnitude larger than previous datasets. OMol25 (100M+ molecular DFT calculations requiring 6B CPU hours) is the largest quantum chemistry dataset ever created, described as dividing the field into "pre-OMol25" and "post-OMol25" eras. CuspAI and Meta jointly released OpenDAC, the world's largest dataset on CO₂ sorbent materials with 100M+ datapoints. These datasets reduce the "data moat" advantage and democratize foundation model development — but still remain heavily skewed toward inorganic crystalline materials, with critical gaps for polymers, composites, and amorphous materials.
5.2 Maturity: The Field Approaches Its Inflection Point but Awaits a Defining Breakthrough
Multiple authoritative voices now describe AI in materials science as approaching its inflection point.[23]
5.2.1 Evidence for the Inflection
- Scale of discovery:
- GNoME identified 2.2M new stable crystal structures — equivalent to 800 years of prior research — and 736 were independently verified experimentally
- Benchmark performance:
- The top model on Matbench Discovery achieves F1 = 0.931 for crystal stability prediction, making it a high-quality filter for candidate materials. Meta's UMA demonstrated a single model matching or beating specialized models across molecules, materials, and catalysts without fine-tuning.
- Autonomous lab validation:
- Berkeley's A-Lab synthesized 41 novel materials in 17 continuous days — 71% success rate on computationally predicted targets.
- Nobel recognition:
- AlphaFold winning the 2024 Nobel Prize in Chemistry validated the entire paradigm of AI-driven molecular science
- Talent migration:
- Researchers behind ChatGPT and DeepMind’s GNoME have begun founding new materials‑AI startups (e.g., Periodic Labs, founded by ChatGPT co‑creator Liam Fedus and former DeepMind materials lead Ekin Dogus Cubuk, who worked on GNoME) — echoing the 2014–2016 wave of deep‑learning talent that seeded the modern NLP startup ecosystem.
- Massive capital influx:
- $300M seed rounds and $100M+ Series A rounds for pre-revenue companies (Periodic Labs, CuspAI, Lila Sciences) echo the ChatGPT-era VC frenzy
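For readers less familiar with the F1 metric cited under benchmark performance, the sketch below shows how it is computed for a binary stable/unstable classifier. The counts in the example are hypothetical and are not taken from Matbench Discovery; `f1_score` is an illustrative helper, not a library function.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for a binary classifier."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of predicted-stable that are stable
    recall = true_pos / (true_pos + false_neg)     # share of truly stable that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical screening run: 90 stable materials correctly flagged,
# 10 unstable ones wrongly flagged, 10 stable ones missed.
example_f1 = f1_score(true_pos=90, false_pos=10, false_neg=10)
print(f"F1 = {example_f1:.3f}")
```

A high F1 means the model both finds most stable candidates and rarely flags unstable ones, which is what makes it useful as a pre-filter before expensive DFT calculations or lab synthesis.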
5.2.2 Why It's Still Early
However, MIT Technology Review reported in December 2025: "So far there has been no 'eureka' moment, no ChatGPT-like breakthrough — no discovery of new miracle materials or even slightly better ones."[24] Most predicted structures were trivial variants of known materials or computationally stable at absolute zero but impractical under real-world conditions.
Key friction points:
- Data scarcity:
- Materials science has orders of magnitude less training data than NLP or even biology. The primary constraint remains high-quality experimental data
- Prediction-to-synthesis gap:
- AI can generate millions of candidate structures, but validating them experimentally is costly and slow. Over 80% of AI-recommended materials exhibit crystallographic disorder causing properties to diverge from theory
- Synthesizability:
- Predicting a stable structure is easier than predicting whether it can actually be made. New LLM frameworks like SynCry are beginning to tackle this
- Data integrity:
- AI-generated microscopy images are indistinguishable from experimental data by experts, and 20–30% of materials characterization analyses contain errors
- Domain gaps:
- Training data is heavily skewed toward inorganic crystalline materials, leaving critical gaps for polymers, composites, and amorphous materials. Properties like manufacturability, processability, and real-world performance under operating conditions cannot be reliably predicted from structure alone
The current state is analogous to the GPT-2 era in the text domain: the technology clearly works at unprecedented scale, investment is surging, yet systematic commercialization and reliable real-world deployment lag behind proof-of-concept results.
5.3 Core Application Areas
- Energy Materials (Batteries, Fuel Cells, Photovoltaics)
- AI is actively screening electrolytes, cathode/anode materials, and solid-state ion conductors. SES AI generated $9.3M in H1 2025 revenue from AI-enhanced battery materials contracts. Aionics partners with Porsche (Cellforce Group) for bespoke EV batteries.[26] Microsoft's Azure Quantum Elements collaboration with Pacific Northwest National Lab identified a novel solid electrolyte reducing lithium use by ~70%, going from computation to prototype in under 9 months.[27] Related coverage also appeared in Science.[28] AI-driven active learning has demonstrated a 75% reduction in organic solar cell material discovery time.[29]
- Catalysis and Carbon Capture
- Foundation models guide catalyst design for CO₂ conversion, hydrogen evolution, and green chemical synthesis. CuspAI's OpenDAC dataset enables design of direct-air-capture materials. Orbital Materials' first commercial product — carbon capture using AI-designed sorbents — is in early-stage commercialization. Copernic Catalysts with Schrödinger achieved 47% energy reduction in ammonia synthesis.
- Semiconductors and Quantum Materials
- MIT's LLM-based synthesis framework improved prediction accuracy for quantum material synthesis pathways from under 40% to nearly 90%. Periodic Labs has already secured semiconductor customers for next-generation heat dissipation materials. GNoME's discoveries include candidate superconductors and novel optical materials.
- Polymer and Specialty Chemical Design
- TransPolymer and related LLMs enable inverse design of polymers with targeted thermal, mechanical, and dielectric properties. Chemify's Chemputation platform generates novel molecules for agricultural, pharmaceutical, and materials applications. Matmerize's PolymRize platform serves enterprise customers including Asahi Kasei.
5.4 Market Dynamics
5.4.1 Market size
This domain is a small market today but growing quickly: generative AI in materials science is estimated at $1.49B in 2025, projected to reach $12.90B by 2035 (CAGR 24.1%).[25] For context, this is still materially smaller than adjacent AI-native R&D categories like AI drug discovery; the implication is that near-term value capture is driven more by enterprise budgets and specific high-value use cases (batteries, semiconductors, carbon capture) than by a single, large, homogeneous "platform" market.
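The quoted growth rate follows directly from the two endpoint estimates. As a quick check, using the standard compound annual growth rate formula and the figures above (`cagr` is an illustrative helper name):

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size estimates quoted above, in USD billions.
rate = cagr(start_value=1.49, end_value=12.90, years=2035 - 2025)
print(f"Implied CAGR: {rate:.1%}")  # approximately 24.1%
```

The same helper can be applied to other market projections in this paper, provided both the base year and the target year are stated.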
5.4.2 Startup Landscape
The startup landscape is still early but increasingly crowded, with a mix of data-platform players, autonomous-lab companies, and vertical specialists targeting batteries, carbon capture, polymers, and synthesis workflows. The table below includes both earlier-stage companies and a few category-defining outliers (notably Periodic Labs and Lila Sciences) that matter because they are setting the pace for capital formation and competitive intensity in the space.
Total disclosed VC funding into AI materials science startups exceeded $1.2B in 2024–2025 — and notably, Periodic Labs and Lila Sciences reached unicorn valuations pre-revenue, underscoring both conviction and valuation risk in the category.
5.5 Moats and Risks
Where the Moat Lies: Data Flywheels
The competitive dynamics will likely not follow the LLM playbook. Unlike text data, materials science data is scarce, expensive to generate, fragmented, and often proprietary. This reshapes defensibility.
The most defensible positions emerge at four levels:
- Proprietary experimental data + autonomous lab feedback loops — the strongest moat. Companies operating their own wet labs (Periodic Labs, Lila Sciences, Radical AI, Orbital Materials) generate proprietary data in closed loops where each experiment compounds the advantage
- Synthesis-aware modeling — CuspAI's approach of generating materials that chemical companies can actually manufacture, bridging the prediction-to-production gap
- Domain-specific workflow integration — Citrine and NobleAI embedding into enterprise R&D workflows creates switching costs
- Vertical application expertise with deep industry partnerships (Aionics in battery electrolytes, Matmerize in polymers) builds customer lock-in and proprietary datasets within specific end markets
The likely outcome is multiple specialized foundation models for different material classes and length scales, with the most durable value accumulating at the application and data layers.
Key Risks
- Data scarcity is structural, not temporary: Generating a single high-quality experimental dataset costs $100K+ per campaign.
- No commercial "win" yet: Despite $1.2B+ in startup funding, no AI-designed material has achieved commercial deployment outside drug candidates in clinical trials
- Long validation cycles: Traditional materials development takes 20+ years lab-to-market. AI may compress early stages significantly, but regulatory timelines remain lengthy
- Customer willingness to pay: The chemical/materials industry is cost-sensitive and low-margin.
- Valuation risk: Periodic Labs ($1.5B) and Lila Sciences ($1.3B) reached unicorn valuations pre-revenue, creating execution risk
- Talent scarcity: The intersection of deep AI expertise and materials science knowledge is an extremely small talent pool, concentrated at a handful of institutions
6. Biology
Similar to foundation models in materials science, foundation models in biology are large AI systems pretrained on vast biological data (protein sequences, DNA, cell measurements, medical images, clinical records) that can then be adapted for specific downstream tasks like designing new drugs, predicting disease mutations, or diagnosing cancer from tissue slides. Just as GPT learned the "language" of the internet, these models learn the "language" of biology.
The 2024 Nobel Prize in Chemistry for AlphaFold validated the entire paradigm.[30] Over 200 distinct biology foundation models have now been published, and the pace is accelerating. AI-designed drugs are entering Phase 2/3 clinical trials, pathology AI is FDA-cleared and deployed in thousands of clinics, and AI drug discovery companies such as Insilico Medicine have gone public.
One example of this clinical progress is Insilico Medicine’s published Phase IIa results for rentosertib.[31]
6.1 Technology: Five Distinct Fields
Biology foundation models span five layers, each at a different stage of development:
- 1. Protein Structure & Design
- These models predict 3D protein structures and increasingly help design entirely new proteins, antibodies, and binders. This is the most mature biology subcategory, with strong scientific validation and early translation into drug discovery. Key names include AlphaFold 3, IsoDDE, ESM3, Chai-2, and RFdiffusion / ProteinMPNN.
- 2. Genomic / DNA Models
- These models learn the “language” of DNA to predict variant effects, classify sequences, and support gene-editing design. The category is earlier than protein models, but it is progressing quickly as models scale and become more biologically useful. Key names include Evo 2, Nucleotide Transformer, and OpenCRISPR-1.[32]
- 3. Single-Cell & Virtual Cell Models
- These models aim to predict how individual cells respond to drugs, mutations, or perturbations, with the long-term goal of simulating cellular behavior. This remains mostly a research-stage category, but it could become foundational for future virtual-cell platforms. Key names include Geneformer V2, scGPT, and CZI’s Virtual Cell Platform.
- 4. Digital Pathology
- These models analyze tissue-slide images to detect disease, classify tumors, and predict treatment response. This is the most commercially mature biology subcategory, with real clinical deployment and FDA-cleared products already in use. Key names include H-Optimus-1, PLUTO-4, and Paige PanCancer.
- 5. Clinical / Multimodal
- These models try to combine molecular, biological, imaging, and clinical data into one system that can support diagnosis, trial design, and treatment decisions. This category is still early and ambitious, but it could become the broadest and most valuable layer if multimodal integration works at scale. Key names include Owkin, xTrimo V3, and Med-PaLM 2.
6.2 Maturity: Biology's GPT-2 Moment
Biology foundation models have clearly moved beyond pure research, but the field is still uneven in maturity. Protein structure prediction is already production-grade, digital pathology is commercial, and multiple AI-designed drug programs are in the clinic — enough to prove the paradigm is real, but not enough to say the category has fully broken out.
That is why biology looks more like the GPT-2/GPT-3 era than the ChatGPT era. The models are impressive to experts, capital is flowing, and the milestones are meaningful — including the 2024 Nobel Prize for AlphaFold, Insilico Medicine’s IPO, Generate Biomedicines’ IPO filing, and 173+ AI-originated drug programs in development — but the field still lacks a single simple, undeniable breakthrough that changes mainstream adoption overnight.
6.3 Market Dynamics & Startup Landscape
6.3.1 Market Size
AI in drug discovery is still a relatively modest market today — roughly $2–7B in 2025 — but it is growing quickly toward an estimated $8–25B by 2030, implying 20–30% annual growth. More importantly, that near-term market likely understates the strategic ceiling: McKinsey estimates generative AI could create $60–110B of annual value across the broader pharma value chain if the technology delivers at scale.
Demand is not the core constraint. The top 20 pharma companies already spend roughly $180B per year on R&D, and large partnership announcements such as Novartis–Schrödinger ($2.3B), Isomorphic–Lilly ($1.7B), and Sanofi–Helixon ($1.7B) show that major buyers are actively allocating budget to AI-enabled discovery platforms.
6.3.2 Startup Landscape
6.4 Moats and Risks
Where the Moat Lies
- Data, not models. AlphaFold 3 was closed-source; within months, three teams replicated it. Over 380 new bio-AI models are published annually, so model moats erode fast. But proprietary experimental data (Recursion’s 50+ PB, Basecamp’s 9.8B protein sequences, Terray’s 5B chemistry interactions) cannot be replicated without years of infrastructure
- Wet-lab integration. The companies closing the predict → synthesize → test → retrain loop at scale build compounding advantage. AI-only approaches have failed at translation (BenevolentAI’s Phase IIa failure is the cautionary tale)
- Distribution. Schrödinger’s 30+ years embedding into pharma workflows creates real switching costs. Benchling’s 150%+ net dollar retention demonstrates platform lock-in
Key Risks
- No AI-designed drug has received FDA approval yet. The 90% clinical failure rate has not demonstrably improved. AI may accelerate early discovery but the most expensive failures happen in Phase II/III
- Public market carnage. Recursion −57% from IPO peak. AbCellera −93%. BenevolentAI delisted at ~€0.11/share. Generate Biomedicines IPO opened 21% below pricing
- The biobucks illusion. A $1.7B partnership headline typically means ~$35M in actual upfront cash. Rigorous diligence on real vs. aspirational revenue is essential
- Open-source commoditization. The most impactful releases (Chai-1/2, Boltz-1/2, Evo 2, ESM3) have been open. Companies whose primary asset is a model — without proprietary data — face existential risk
- Long timelines. Drug development takes 10–15 years. Even AI-compressed preclinical timelines (Insilico: 30 months vs. 6–8 years typical) still leave years before regulatory approval
7. Conclusion
Foundation models beyond text are progressing rapidly, but the technical maturity and commercial adoption levels vary significantly by domain. Across domains, training frontier models is typically capital-intensive, while many downstream products and platforms can build on increasingly capable model ecosystems.
Of the five domains covered, Embodied AI and Biology appear to be at particularly dynamic stages (rapid capability progress alongside evolving market structure), while Audio & Speech and Vision show more signs of consolidation and bundling by large platforms. Materials Science is advancing quickly in research benchmarks and dataset scale, but widespread commercial validation remains limited relative to the scale of investment to date.
A recurring theme is that Big Tech’s role differs by domain: in Audio & Speech and Vision, large platforms are often direct competitors through bundling; in Embodied AI and Materials Science, large platforms also play major roles as infrastructure providers (compute, simulation, developer ecosystems) and investors.




