Fine-tuning open source models: why is it relevant now?

Five months ago, we wrote a blog post on when fine-tuning may be a good idea for your LLM application: there were clear cost and latency benefits for specialized tasks. However, five months is a long time in the world of LLMs! Since then, retrieval-augmented generation has become far more popular, and fine-tuning isn't supported on the latest instruction-tuned models from OpenAI or Anthropic. More recently, though, fine-tuning has started to make a comeback, coinciding with the rise of open source models. New open source models are being released quickly, with the hotly anticipated Llama 2 coming out yesterday (other top models include Falcon-40B and MPT-30B). And these models are very well suited for fine-tuning.

Why You Should Fine-Tune

"Prompt and prosper" may seem like the ideal mantra for working with LLMs, but eventually you'll find that relying exclusively on prompts can paint you into a corner. The initial ease of using prompts often gives way to challenges that become more pronounced over time. High costs, sub-optimal handling of edge cases, limited personalization, high latency, a tendency towards hallucination, and the gradual erosion of your competitive advantage are all potential issues that can take the sheen off your LLM deployment.

Enter fine-tuning: a method that enables you to optimize your LLMs for specific tasks, resulting in lower costs, improved accuracy, and lower latency. In the following sections, we'll explore fine-tuning further and show why it is likely to be an important technique moving forward.

What is Fine-Tuning?

In the realm of AI (not just LLMs), fine-tuning involves training a pre-existing model on a smaller, task-specific dataset to adapt it to a particular task or domain.

The foundation model, a pre-trained LLM, serves as the initial starting point. The weights of this network are then further optimized based on the data specific to the task at hand. This process allows the model to develop a nuanced understanding of the particular context and language patterns it's being fine-tuned for.

The result is a model that uses its pre-trained proficiency in general language to become an expert in your specific application, thanks to the additional layer of learning imparted through fine-tuning. In essence, fine-tuning is a process of specialization that enhances the general skills of a language model to perform better on task-specific applications.
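To make the mechanics concrete, here is a toy illustration in Python: a "pretrained" one-parameter model whose weight is further optimized by gradient descent on a small task-specific dataset. A real fine-tune would update an LLM's weights with a framework like PyTorch; the model, data, and numbers here are invented purely for illustration.

```python
# Toy illustration of fine-tuning: start from "pretrained" weights and
# continue gradient descent on a small task-specific dataset.

def fine_tune(w_pretrained, task_data, lr=0.1, epochs=50):
    """Nudge a pretrained weight toward the task using squared-error loss."""
    w = w_pretrained
    for _ in range(epochs):
        for x, y in task_data:
            pred = w * x
            grad = 2 * (pred - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# "Pretrained" weight learned on general data; the task data implies w ≈ 3.
w0 = 1.0
task = [(1.0, 3.0), (2.0, 6.0)]
w_tuned = fine_tune(w0, task)
print(round(w_tuned, 2))  # converges toward 3.0
```

The same idea scales up: the foundation model's weights are the starting point, and the task dataset pulls them toward the specialized behavior you want.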

The Resurgence of Fine-Tuning with Open Source Models

The AI industry is moving fast, and new developments constantly make us rethink our strategies. Recently released high quality open source models are doing just that. 

The reason for this renewed interest lies in their performance. Open source models are showing potential that can be harnessed using fine-tuning, making them an attractive choice for LLM applications. By employing your own data, you can tune these models to align better with your specific needs. This move not only adds an extra layer of specialization to the model but also empowers you to maintain control of your AI strategy.

Advantages and Disadvantages of Fine-Tuning

Before we get too deep into fine-tuning, it's crucial to understand its benefits and potential drawbacks. Later we'll share a step-by-step guide to fine-tuning.

Benefits of Fine-Tuning

  1. Improved performance on specific tasks: By tailoring the model to your specific requirements, fine-tuning can result in a significant performance boost.
  2. Lower cost / latency: A fine-tuned model no longer needs the full instruction prompt in every request, so each call uses fewer tokens and less compute, leading to cost and latency savings.
  3. Enhanced privacy: Since fine-tuning uses your own data and is deployed by you, it adds an extra layer of privacy to your operations.
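The cost point in (2) is easy to quantify: a prompted base model resends the full instruction block with every request, while a fine-tuned model only needs the user input. A rough back-of-the-envelope sketch, where the token counts and pricing are invented for illustration:

```python
# Rough cost comparison (illustrative numbers, not real pricing):
# a prompted base model resends the full instruction block on every call,
# while a fine-tuned model only needs the user input.

PROMPT_TOKENS = 800   # instructions + few-shot examples (assumed)
INPUT_TOKENS = 100    # the actual user input per request (assumed)
PRICE_PER_1K = 0.002  # hypothetical $ per 1K input tokens

def cost(tokens_per_call, calls):
    """Total input-token cost over a number of calls."""
    return tokens_per_call * calls / 1000 * PRICE_PER_1K

calls = 1_000_000
prompted = cost(PROMPT_TOKENS + INPUT_TOKENS, calls)
tuned = cost(INPUT_TOKENS, calls)
print(f"prompted: ${prompted:.0f}, fine-tuned: ${tuned:.0f}")
# prompted: $1800, fine-tuned: $200 (9x fewer input tokens per call)
```

The exact savings depend on how much of your prompt is fixed instruction versus variable input, but the fixed portion disappears entirely after fine-tuning.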

However, there are also some challenges to keep in mind.

Challenges with Fine-Tuning

  1. Time consuming: Fine-tuning a model requires a significant time investment. This includes training and optimizing the model, in addition to determining the best practices and techniques for your approach.
  2. Specific expertise needed: Fine-tuning is a difficult task (often why users turn to prompting despite lower performance on specific tasks). Achieving optimal results typically requires considerable knowledge and expertise in preparing data, training, inference techniques, etc.
  3. Infrastructure overhead: Fine-tuning an LLM on a large dataset can be a costly process, often requiring a complex setup and expensive GPU resources.
  4. Lack of contextual knowledge: Fine-tuned models are trained to perform very specific tasks and often lack the versatility demonstrated by closed source models like GPT-4.

A Step-by-Step Guide to Fine-Tuning Models

Embarking on the fine-tuning journey might seem daunting, but it doesn't have to be. Here's a straightforward guide to set you on the right path:

  1. Collect a substantial amount of quality data: Begin by collecting high-quality prompt and completion pairs. If you're already running prompts in production, store inputs and outputs in accordance with your Terms of Service; this data is invaluable for fine-tuning later. The better your data quality, the better your fine-tuned model will be. The amount of data needed to build a well-performing model depends on the use case and the type of data.
  2. Clean your data: Remove the instruction portion of each prompt and keep only the inputs (and their completions). The goal here is to have clean, structured data.
  3. Split your dataset: Split your dataset into training and validation sets (we suggest considering how much data you actually need for validation here instead of an arbitrary 80/20 split) to evaluate the performance of your fine-tuned model.
  4. Experiment with hyper-parameters: Test different foundation models and play around with hyper-parameters like learning rate, number of epochs, etc. The goal is to find the best cost, quality, and latency tradeoff for your specific use case.
  5. Fine-tuning: Armed with your optimized parameters, it's time to fine-tune. Be prepared - each fine-tuning task can take some time to run.
  6. Use your fine-tuned model: Once fine-tuned, use your model by passing only inputs and not the original prompts.
  7. Regularly update your model: To guard against data drift and ensure your model improves over time, repeat this process as your dataset grows and as new foundation models are released.
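Steps 1–3 above can be sketched in a few lines of Python. The instruction prefix, record shape, and validation-set size below are assumptions for illustration:

```python
import random

# Sketch of steps 1-3: clean prompt/completion pairs and hold out a
# fixed-size validation set (prefix and sizes are assumed for illustration).

INSTRUCTIONS = "Summarize the following support ticket:\n"  # hypothetical shared prefix

def clean(record):
    """Drop the shared instruction prefix so the model sees only the input."""
    return {
        "input": record["prompt"].removeprefix(INSTRUCTIONS).strip(),
        "completion": record["completion"].strip(),
    }

def split(records, val_size=200, seed=42):
    """Hold out a fixed validation set rather than an arbitrary 80/20 split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[val_size:], shuffled[:val_size]

raw = [{"prompt": INSTRUCTIONS + f"ticket {i}", "completion": f"summary {i}"}
       for i in range(1000)]
train, val = split([clean(r) for r in raw])
print(len(train), len(val))  # 800 200
```

Fixing the validation-set size (rather than a percentage) keeps evaluation stable as your training set grows between fine-tuning runs.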

Considerations to Keep in Mind

Fine-tuning is a potent tool, but like any tool, its effectiveness depends on how well you wield it. Here are some considerations to keep in mind:

  • Overfitting: Be wary of overfitting - a common pitfall where the model becomes too attuned to the training data and performs poorly on unseen data.
  • Quality of the dataset: The quality of your dataset plays a pivotal role in determining the efficacy of the fine-tuned model.
  • Hyper-parameters: Choosing the right hyper-parameters can make or break your fine-tuning process.
  • Privacy and Security Implications: Ensuring the privacy of your data during the fine-tuning process is crucial. Ensure that proper data handling and security protocols are in place.
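The overfitting point above is usually handled with early stopping on the held-out validation set: stop training once validation loss stops improving. A minimal check you could run after each epoch (the loss history here is made up for illustration):

```python
# Guarding against overfitting: stop once validation loss stops improving.

def should_stop(val_losses, patience=2):
    """True once the best validation loss hasn't improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

history = [2.1, 1.6, 1.3, 1.35, 1.4]  # val loss rising again: likely overfitting
print(should_stop(history))  # True
```

Training loss alone won't catch this; it typically keeps falling even as the model memorizes the training set, which is why the validation split from the guide above matters.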

Conclusion and Next Steps

Fine-tuning models can provide significant benefits and solve many of the challenges associated with using large language models. Despite some potential pitfalls, with the right approach and considerations, fine-tuning can be a robust tool in your AI arsenal.

To delve even deeper into fine-tuning, consider exploring more resources on the topic, such as online courses, tutorials, and research papers. And remember, you're not alone on this journey. Need help getting started or fine-tuning your model? Feel free to reach out to me at akash@vellum.ai

ABOUT THE AUTHOR
Akash Sharma
Co-founder & CEO

Akash Sharma, CEO and co-founder at Vellum (YC W23) is enabling developers to easily start, develop and evaluate LLM powered apps. By talking to over 1,500 people at varying maturities of using LLMs in production, he has acquired a very unique understanding of the landscape, and is actively distilling his learnings with the broader LLM community. Before starting Vellum, Akash completed his undergrad at the University of California, Berkeley, then spent 5 years at McKinsey's Silicon Valley Office.

LAST UPDATED
Jul 20, 2023