Announcing Benchling Inference powered by Baseten: Scalable GPUs for scientific AI

Mihir Trivedi
Product Manager, Scientific Models
hero image

Today we are launching Benchling Inference, powered by Baseten. Benchling Inference provides scalable and cost-effective GPU capacity to train and run scientific models, without managing infrastructure. It comes preloaded with today's top scientific models and the integrations to make in silico discovery work out-of-the-box for biopharma companies.

We frequently hear that scientific model workloads are too bursty for traditional compute. You wait on the physical lab, data comes in waves, then you need to run 100,000 predictions in a few hours before going quiet for days. Cost-effective capacity that scales up fast and also scales to zero is difficult to acquire and maintain.

We've been running Baseten internally for Benchling's own Model Hub. We’ve learned a lot about tailoring inference for drug discovery and now we want customers to have the same access.

General-purpose inference wasn't designed for scientific workloads

The largest hyperscalers are on track to spend nearly $700 billion on AI infrastructure in 2026. Most of that is optimized for LLM workloads with reserved compute and predictable demand. 

There’s been an explosion in scientific models. Between 2020 and 2025, the number of new biology AI models released each year grew from 28 to more than 380. The challenge for most computational biology teams: HPC queues with multi-week backlogs, sending spreadsheets back and forth with wet lab researchers, large GPU reservations for jobs that run a fraction of the time, and predictions rationed during active campaigns. Protein structure prediction, molecular docking, and sequence analysis all have different cold start characteristics, memory requirements, and throughput patterns. 

The market is also split. Biotech startups can't get infrastructure at prices that work for them. Large pharma has invested heavily in HPC clusters, cloud commits, and computational teams, and new infrastructure has to work alongside that.

How Benchling Inference works

Baseten has spent six years building at the leading edge of inference infrastructure: making it reliable and economical to run specialized models in production, across 15+ cloud providers, with cold starts in 5–10 seconds. The foundation is the Baseten Inference Stack, a tightly integrated combination of a high-performance runtime (custom kernels, speculative decoding, KV cache optimizations) and inference-optimized infrastructure (intelligent request routing, autoscaling, active-active reliability across clouds). Every model deployed on Baseten seamlessly inherits these optimizations.

Benchling Inference combines Baseten with the scientific configuration and integrations that make inference work out-of-the-box for biopharma. By aggregating demand, we are also bringing better economics to biotech startups.

Scientists can either deploy third-party models or serve internal models built on their data from a unified compute environment. For teams with data sovereignty requirements, the Baseten Inference Stack runs identically in Baseten Cloud, your virtual private cloud (VPC), or a hybrid of both so predictions never have to leave your environment. Benchling also supports NVIDIA NIM microservices for additional GPU acceleration on specific models and algorithms. Computational scientists working in Jupyter notebooks or via SDK can call inference directly through Benchling.

Benchling Inference is available now. Sign up for access here

Powering breakthroughs for over 1,300 biotechnology companies

Helix Image