Native support for SambaNova inference in Vellum

Now you can run Llama 3.1 405B at 200 tokens/second via SambaNova on Vellum!


The Llama 3.1 405B model, with its 405 billion parameters, offers exceptional capabilities but requires substantial computational resources.

Running this model effectively requires high-performance hardware, including multiple GPUs with extensive VRAM.

SambaNova addresses these computational demands through its SN40L Reconfigurable Dataflow Unit (RDU), a processor specifically designed for AI workloads. The SN40L features a three-tier memory system comprising on-chip distributed SRAM, on-package High Bandwidth Memory (HBM), and off-package DDR DRAM. This architecture enables the chip to handle models with up to 5 trillion parameters and sequence lengths exceeding 256,000 tokens on a single system node.

Today, SambaNova serves the Llama 3.1 405B model (comparable to GPT-4o) at speeds of up to 200 tokens per second, roughly 2x faster than GPT-4o.
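To put that throughput in perspective, here is a rough sketch of what 200 tokens/second means for response time. This only models decode speed; it ignores time-to-first-token and network latency, and the 100 t/s comparison figure is simply half the quoted SambaNova speed:

```python
def generation_time_s(tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream a completion of the given length (decode only)."""
    return tokens / tokens_per_second

# A 1,000-token answer at the quoted throughput vs. half that speed:
print(generation_time_s(1000, 200))  # 5.0 seconds
print(generation_time_s(1000, 100))  # 10.0 seconds
```

For long, streamed responses, doubling throughput roughly halves the time a user spends watching tokens arrive.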

With this integration, you can test the Llama 3.1 405B model, and evaluate how it compares with your current model selection.

How the native integration works

Starting today, you can enable the Llama 3.1 405B (SambaNova) model in your workspace.

To enable it, get your API key from your SambaNova profile and add it as a Secret named SAMBANOVA on the “API keys” page:

Then, enable the model from your workspace by selecting the secret you just defined:

Then, in your prompts and workflow nodes, simply select the model you just enabled:
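If you want to sanity-check your SambaNova API key outside the Vellum UI, the sketch below assembles a request for SambaNova Cloud's OpenAI-compatible chat completions API. The endpoint URL and model identifier here are assumptions; confirm the exact values in your SambaNova profile:

```python
import json

# Assumed endpoint and model name for SambaNova Cloud's
# OpenAI-compatible API; verify these against your account.
SAMBANOVA_URL = "https://api.sambanova.ai/v1/chat/completions"
MODEL = "Meta-Llama-3.1-405B-Instruct"

def build_request(prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for one chat completion."""
    return {
        "url": SAMBANOVA_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Send with any HTTP client (urllib, requests, httpx):
req = build_request("Summarize this contract clause.", api_key="YOUR_KEY")
```

Once the key works here, the same secret powers the model selection inside Vellum.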

What you get with SambaNova

Comparison of Llama 3.1 405B vs. GPT-4o; check the leaderboard here.

SambaNova's integration with Vellum brings key advantages for developers working with the Llama 3.1 405B model:

Fast Performance: SambaNova Cloud runs Llama 3.1 405B at 200 tokens per second, which is 2x faster than running GPT-4o.

Lower output cost: SambaNova's pricing is $5 for input tokens and $10 for output tokens, compared to GPT-4o’s $5 for input and $15 for output.

Accurate Outputs: SambaNova keeps the original 16-bit precision of the model, so you get reliable and accurate results without cutting corners. Check how Llama 3.1 405B compares with other models in our LLM leaderboard.

Handles Complex Applications: The platform is designed to support demanding use cases like real-time workflows and multi-agent systems, making it flexible for a variety of projects.
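The pricing difference above compounds at scale. A back-of-envelope sketch, assuming the quoted prices are per million tokens:

```python
# Per-million-token prices from the comparison above (assumed units).
PRICES = {
    "sambanova_llama_405b": {"input": 5.00, "output": 10.00},
    "gpt_4o": {"input": 5.00, "output": 15.00},
}

def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a given volume of input and output tokens."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Example: 10M input + 10M output tokens in a month
print(token_cost("sambanova_llama_405b", 10_000_000, 10_000_000))  # 150.0
print(token_cost("gpt_4o", 10_000_000, 10_000_000))                # 200.0
```

Because input prices match, the entire gap comes from output tokens, so generation-heavy workloads (summarization, drafting, agents) see the largest savings.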

If you want to test the inference speed with SambaNova, get in touch! We provide the tooling and best practices for building and evaluating AI systems that you can trust in production.

ABOUT THE AUTHOR
Anita Kirkovska
Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.

Last updated: Dec 9, 2024