More than 40% of marketing, sales, and customer service organizations have embraced generative AI, placing it second only to IT and cybersecurity. Of all the generative AI technologies, conversational AI will spread rapidly across these industries due to its ability to bridge the current communication gaps between businesses and customers.

Yet many marketing leaders I’ve spoken with remain stuck at the crossroads of how to implement that technology. They don’t know which of the available large language models (LLMs) to choose, or whether to go open source or closed source. They worry about spending too much money on a new and unfamiliar technology.

Businesses can buy off-the-shelf conversational AI tools, but if the technology needs to be an essential part of their operations, they can build it themselves.

To help lower the fear factor for those who choose to build, I wanted to share some of the internal research my team and I did in our own quest to find the best LLM to build our conversational AI. We spent some time looking at the different LLM providers and how much you should expect to pay for each, depending on the inherent costs and the type of usage you expect your audience to have.

We chose to compare GPT-4o (OpenAI) and Llama 3 (Meta). These are two of the main LLMs that most companies will be weighing against each other, and we consider them to be the highest quality models out there. They also allow us to compare a closed source (GPT) and an open source (Llama) LLM.

How do you calculate the LLM costs for a conversational AI?

The two most important financial considerations when choosing an LLM are the start-up costs and the ongoing processing costs.

Startup costs include everything needed to get the LLM up and running towards your end goal, including development and operational costs. Processing costs are the actual cost of each call once your tool is live.

When it comes to setup, the cost-to-value ratio depends on what you are using the LLM for and how heavily you will be using it. If you want to deploy your product as quickly as possible, you might be happy to pay a premium for a model that requires little to no setup, like GPT-4o. Llama 3, by contrast, can take weeks to get running, time you could instead spend fine-tuning a GPT-based product for market.

However, if you manage a large number of clients or want more control over your LLM, you may be better off paying the higher start-up costs up front and reaping the benefits later.

When it comes to conversation processing costs, we look at token usage, as this allows for the most direct comparison. LLMs like GPT-4o and Llama 3 measure text in a basic unit called a “token”: a piece of text that the model can process as input and output. There is no universal standard for how tokens are defined across different LLMs; some count tokens per word, per subword, per character, or by other schemes.

All these factors make it difficult to directly compare LLMs to each other. However, we have managed to approximate this by simplifying the inherent costs of each model as much as possible.
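
As a quick, hands-on illustration of what a token is, the sketch below counts tokens with OpenAI’s tiktoken library and the o200k_base encoding used by GPT-4o. This is our own illustrative choice, not something the comparison depends on; Llama 3 ships its own tokenizer, so the same text would yield a different count there.

```python
# A minimal token-counting sketch, assuming the tiktoken package is installed
# (pip install tiktoken) and that it includes the o200k_base encoding used by GPT-4o.
import tiktoken

encoding = tiktoken.get_encoding("o200k_base")

text = "Hi, I'd like to change the delivery address on my last order."
tokens = encoding.encode(text)

print(len(text.split()))  # 12 words
print(len(tokens))        # token count under this particular encoding
```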

We found that while GPT-4o is cheaper in terms of initial cost, Llama 3 proves far more cost-effective over time. Let’s take a look at why, starting with the setup considerations.

What are the basic costs of each LLM?

Before we dive into the cost per session for each LLM, we first need to know how much it will cost to get each model up and running.

GPT-4o is a closed source model hosted by OpenAI. This means your tool only needs to ping OpenAI’s hosted infrastructure via a simple API call; minimal setup is required.
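
A minimal sketch of that call, using OpenAI’s official Python SDK, is below. The prompts are placeholders, and the API key is assumed to be set in the OPENAI_API_KEY environment variable.

```python
# Minimal GPT-4o call via the OpenAI Python SDK (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment; prompts are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful customer service assistant."},
        {"role": "user", "content": "Where is my order?"},
    ],
)

print(response.choices[0].message.content)
# usage reports the prompt and completion tokens you will be billed for
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```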

Llama 3, on the other hand, is an open source model that needs to be hosted on your own private servers or on cloud infrastructure providers. Your company can download the model components for free — after that, it’s up to you to find a host.

Hosting costs are a consideration here. Unless you buy your own servers, which is relatively uncommon at the start, you will have to pay a cloud provider to use its infrastructure, and each provider may structure its pricing differently.

Most hosting providers will “rent” you an instance and charge per hour or per second for the compute capacity. AWS’s ml.g5.12xlarge instance, for example, is billed by server time. Others bundle usage into packages and charge a fixed monthly or annual fee based on factors such as your storage needs.
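
As a back-of-envelope illustration of what an always-on instance implies, the sketch below uses a purely hypothetical hourly rate rather than any provider’s actual price; substitute your provider’s current figure.

```python
# Rough monthly cost of keeping one hosted instance running around the clock.
# HOURLY_RATE is a hypothetical placeholder, not a quoted AWS price.
HOURLY_RATE = 7.00            # USD per instance-hour (placeholder)
HOURS_PER_MONTH = 24 * 30

monthly_hosting = HOURLY_RATE * HOURS_PER_MONTH
print(f"~${monthly_hosting:,.0f} per month before any per-call costs")  # ~$5,040
```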

However, Amazon Bedrock charges based on the number of tokens processed, meaning it can prove to be a cost-effective solution for your business even if your usage volumes are low. Bedrock is a managed, serverless platform from AWS that also simplifies LLM deployment by handling the underlying infrastructure.
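
For reference, here is a minimal sketch of invoking Llama 3 through Bedrock with the AWS SDK for Python (boto3). The model ID, region and request fields reflect Bedrock’s Llama 3 interface at the time of writing; verify them against the current Bedrock documentation before relying on them.

```python
# Minimal Llama 3 call via Amazon Bedrock (pip install boto3).
# Assumes AWS credentials are configured and model access has been granted;
# the model ID and region below may change, so verify them in the Bedrock console.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": "Where is my order?",
    "max_gen_len": 512,
    "temperature": 0.5,
})

response = client.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=body,
)

result = json.loads(response["body"].read())
print(result["generation"])
# token counts are returned too, matching Bedrock's per-token billing
print(result["prompt_token_count"], result["generation_token_count"])
```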

In addition to the direct costs, getting your conversational AI up and running on Llama 3 will require you to spend a lot more time and money on operations, including the initial selection and setup of a server or serverless option and performing maintenance. You will also need to spend more on developing things like error logging tools and system alerts for any issues that may arise with the LLM servers.

The most important factors to consider when calculating baseline cost-performance are implementation time, expected usage volume (if you’re handling millions of calls per month, the setup costs are quickly offset by the per-call savings), and the level of control you need over your product and data (open source models work best here).

What are the costs per call for each LLM?

Now we can examine the basic cost of each conversation unit.

For our modeling we used the heuristic: 1,000 words = 7,515 characters = 1,870 tokens.

We assumed that the average consumer conversation would contain 16 messages in total between the AI and the human. This equated to an input of 29,920 tokens and an output of 470 tokens, for a total of 30,390 tokens. (The input is much higher due to prompt rules and logic.)
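
Written out as arithmetic, with the numbers taken straight from the assumptions above, the benchmark conversation looks like this:

```python
# The article's conversion heuristic and benchmark conversation, as plain arithmetic.
TOKENS_PER_1000_WORDS = 1_870   # 1,000 words ~ 7,515 characters ~ 1,870 tokens

INPUT_TOKENS = 29_920   # 16-message history plus prompt rules and logic
OUTPUT_TOKENS = 470     # the model's replies

print(INPUT_TOKENS + OUTPUT_TOKENS)                  # 30,390 tokens per call
print(INPUT_TOKENS / TOKENS_PER_1000_WORDS * 1_000)  # roughly 16,000 words of input
```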

On GPT-4o, the price per 1,000 input tokens is $0.005 and per 1,000 output tokens is $0.015, resulting in a ‘benchmark’ call costing approximately $0.16.

| GPT-4o input/output | Number of tokens | Price per 1,000 tokens | Cost |
| --- | --- | --- | --- |
| Input tokens | 29,920 | $0.00500 | $0.14960 |
| Output tokens | 470 | $0.01500 | $0.00705 |
| Total cost per call | | | $0.15665 |

For Llama 3-70B on AWS Bedrock, the price per 1,000 input tokens is $0.00265 and per 1,000 output tokens is $0.00350, resulting in a ‘benchmark’ call costing approximately $0.08.

| Llama 3-70B input/output | Number of tokens | Price per 1,000 tokens | Cost |
| --- | --- | --- | --- |
| Input tokens | 29,920 | $0.00265 | $0.07929 |
| Output tokens | 470 | $0.00350 | $0.00165 |
| Total cost per call | | | $0.08093 |
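
To make the comparison easy to reproduce (or to rerun with updated prices), here is the same arithmetic as a short script; the rates are the per-1,000-token prices quoted above and may have changed since.

```python
# Per-call cost for the 29,920-input / 470-output benchmark conversation,
# using the per-1,000-token prices quoted above (check current rates before use).
def cost_per_call(input_tokens, output_tokens, input_price_1k, output_price_1k):
    return (input_tokens / 1_000) * input_price_1k + (output_tokens / 1_000) * output_price_1k

INPUT_TOKENS, OUTPUT_TOKENS = 29_920, 470

gpt4o = cost_per_call(INPUT_TOKENS, OUTPUT_TOKENS, 0.005, 0.015)
llama3 = cost_per_call(INPUT_TOKENS, OUTPUT_TOKENS, 0.00265, 0.00350)

print(f"GPT-4o:      ${gpt4o:.5f}")             # $0.15665
print(f"Llama 3-70B: ${llama3:.5f}")            # $0.08093
print(f"Saving:      {1 - llama3 / gpt4o:.0%}") # ~48% cheaper per call
```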

In summary, once the two models are fully configured, the cost of a conversation on Llama 3 would be almost 50% lower than an equivalent conversation on GPT-4o. However, all server costs would have to be added to the Llama 3 calculation.

Keep in mind that this is just a snapshot of the full cost of each LLM. Many other variables come into play as you build the product for your unique needs, such as whether you use a multi-prompt approach or a single-prompt approach.

For companies looking to deploy conversational AI as a core service, but not as a fundamental element of their brand, the investment in building AI themselves may not be worth the time and effort compared to the quality you get from off-the-shelf products.

Whichever path you choose, integrating conversational AI can be incredibly useful. Just make sure you’re always guided by what makes sense for the context of your business and the needs of your customers.

Sam Oliver is a Scottish tech entrepreneur and serial startup founder.
