Demystifying AI Models: How to Choose the Right Ones

Juan Bernabe-Moreno, IBM Research Director for UK&I

Large Language Models (LLMs) have revolutionized artificial intelligence, transforming how businesses interact with and generate content across various mediums. As foundational tools of generative AI, LLMs offer immense potential but also raise important challenges around trust, transparency, and efficiency, making it crucial for companies to choose the right AI strategy.

Large Language Models – foundation models trained on massive text datasets – have transformed AI and taken the world, and businesses, by storm. Ever evolving, they continue to cement themselves as the basis of generative AI in the enterprise. LLMs enable AI platforms to generate, understand and interact with content, including language, images, or code – with enormous implications for today’s businesses and consumers.

While their popularity is indisputable, the rise of LLMs, these fundamental building blocks of AI, has also led to debates around performance, energy efficiency and data provenance. Proprietary LLMs typically provide little transparency into the data used to train them, and rarely give users the ability to tune them further. This makes it difficult for businesses to determine which models are safe and can be trusted.

Companies also often struggle to choose between general-purpose, ‘one-size-fits-all’ models and smaller, purpose-built models for specific use cases. And finally, there are simply a lot of models on the market, making it tricky to develop a strategy that maximises the benefits of AI while minimising risk and cost.

Overall, while most of our publicly available data has already been folded into foundation models, on the business side of things, the situation is different. Only about 1% of enterprise data is in LLMs today – which is simply not enough to unlock real value from AI for any company.

While features like context windows – the amount of data a user can fit in a prompt – and latency – the time a model takes to generate an output – are important, what other crucial elements should you consider when choosing the LLM strategy best fitted to your business?
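To make the context-window constraint concrete, here is a minimal sketch of how an application might check whether a document fits in a model’s prompt. The 4-characters-per-token ratio is a common rule of thumb for English text, not an exact figure – real tokenizers vary – and the window sizes below are illustrative, not tied to any particular product.

```python
# Rough check of whether a prompt fits a model's context window.
# CHARS_PER_TOKEN = 4 is a heuristic for English prose, not exact.

CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Crude token-count estimate for English text."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_context(prompt: str, context_window: int,
                 reserved_for_output: int = 512) -> bool:
    """True if the prompt plus reserved output tokens fit the window."""
    return estimate_tokens(prompt) + reserved_for_output <= context_window

document = "word " * 8000  # ~40,000 characters of input text

print(fits_context(document, context_window=8192))     # False: too long
print(fits_context(document, context_window=128_000))  # True: fits easily
```

A production system would use the model’s own tokenizer rather than a character heuristic, but the budgeting logic – input tokens plus reserved output tokens must stay under the window – is the same.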

Open-source models to build trust

First, you’ll have to choose between open-source and proprietary models – and make sure that whatever model you choose is a trusted one.

Businesses should be in control of their AI and know where the data the model is trained on comes from. Data transparency is often an issue for proprietary models, and open source is an important alternative. Open and transparent AI is beneficial for society as it spurs innovation and helps ensure AI safety. An open approach democratises access to a lot of foundational and broadly applicable advances and leads to more scrutiny from the tech community, boosting safety and security. That’s exactly why IBM, for example, open-sourced a wide selection of its most recent family of LLMs, the Granite models.

However, open-source models, just like proprietary models, should be properly governed to mitigate and minimise the risks of AI and ensure they are secure, safe, trustworthy, and fair. This goal underpins the AI Alliance IBM co-launched with Meta last December and other efforts, such as the AI Governance Alliance of the World Economic Forum and the EU’s European AI Alliance. Then there is the EU’s AI Act – the world’s first comprehensive set of AI laws, aimed at protecting our safety and fundamental rights as AI moves ever closer to being embedded in our daily lives.

While regulation at global and national levels is important, it’s imperative for companies to constantly monitor their models nonetheless – especially the data and the outcomes. Specific governance toolkits can automate and simplify this process, and IBM’s watsonx.governance is one of them. Such systems monitor and improve LLM outcomes continuously, making it easier for companies to manage risks stemming from the quality of the training data, by increasing transparency around how the model makes decisions and ensuring compliance with technology regulation.
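The core idea behind continuous output monitoring can be sketched in a few lines. This is an illustrative toy, not an actual governance-product API: the banned-term check and the 5% alert threshold stand in for the real policy checks such toolkits run (toxicity classifiers, PII detectors, drift metrics).

```python
# Toy sketch of continuous model-output monitoring: record every output,
# check it against a policy, and raise an alert when the violation rate
# crosses a threshold. All names and thresholds here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class OutputMonitor:
    banned_terms: set            # policy: substrings that must not appear
    alert_threshold: float = 0.05  # flag if >5% of outputs violate policy
    total: int = 0
    violations: int = 0
    flagged: list = field(default_factory=list)

    def record(self, output: str) -> None:
        """Record one model output and check it against the policy."""
        self.total += 1
        lowered = output.lower()
        if any(term in lowered for term in self.banned_terms):
            self.violations += 1
            self.flagged.append(output)

    @property
    def violation_rate(self) -> float:
        return self.violations / self.total if self.total else 0.0

    def needs_review(self) -> bool:
        """True when the violation rate crosses the alert threshold."""
        return self.violation_rate > self.alert_threshold

monitor = OutputMonitor(banned_terms={"ssn", "password"})
for text in ["Here is the summary.", "The password is hunter2.", "Done."]:
    monitor.record(text)
print(monitor.needs_review())  # True: 1 of 3 outputs violated policy
```

The design point is that monitoring is continuous and automatic – every output passes through the checks, and humans are pulled in only when aggregate behaviour drifts past a threshold.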

Task-specific models to cut costs

Another crucial consideration is the level of customisation, or the ability to modify a model’s output based on new data input. It’s important to have the ability to easily adapt a model to new scenarios and use cases. 

Fine-tuning is especially important. Consider the IBM Granite family of foundation models. Instead of creating a new model each time, users can build on Granite, applying methodologies like InstructLab to create fine-tuned models for specific use cases. InstructLab allows for continuous development of base models through a series of community contributions, enabling the model to ‘learn’ in a way similar to how people do. Not only does this method cut costs, it also drastically reduces the carbon footprint of AI.

Another example is geospatial data. Satellites that monitor Earth accumulate immense volumes of data daily. Teams at IBM and NASA tapped into this opportunity to develop an Earth-observation geospatial foundation model, which they made open-source so it can be fine-tuned for different tasks – from flood detection to forest restoration. In the future, the model could also be used for better wildfire behaviour forecasts and urban heatwave predictions.

Smaller models for greater efficiency

The next thing to consider is the model size. General-purpose LLMs are large and expensive both to train and run. Thus, it may be best to select specific models for the company’s unique use cases – a more cost- and energy-efficient approach. Additionally, one-size-fits-all models can struggle with performance across a broad range of tasks, so smaller models created to serve specific needs can be more effective.

The latest advances have shown that reducing the size of purpose-built models doesn’t jeopardise their performance or accuracy. One example is the TinyTimeMixer (TTM) model recently open-sourced by IBM, a time-series forecasting model created to help businesses better predict changes over time in a given asset, security, or economic variable. With fewer than 1 million parameters, TTM is the most downloaded time-series foundation model on Hugging Face in its weight class, and it still outperforms state-of-the-art models 1,000 times larger, with billions of parameters. Hugging Face’s Open LLM Leaderboard ranks foundation models across a range of general benchmarks – and it’s easy to see that smaller, purpose-built models with a fraction of the parameter count and training resources often deliver benchmark results similar to the one-size-fits-all options.
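The cost gap between model sizes follows from simple arithmetic: memory for weights scales linearly with parameter count. A back-of-the-envelope sketch, using illustrative sizes (a ~1M-parameter task-specific model versus a hypothetical 7B-parameter general-purpose model) rather than figures for any particular product:

```python
# Back-of-the-envelope serving cost: weight memory is parameter count
# times bytes per parameter. fp16/bf16 weights take 2 bytes each.

def weight_memory_gib(params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return params * bytes_per_param / 1024**3

tiny = weight_memory_gib(1_000_000)        # ~1M-parameter forecaster
large = weight_memory_gib(7_000_000_000)   # 7B general-purpose LLM

print(f"tiny:  {tiny:.4f} GiB")   # fits in a sliver of CPU memory
print(f"large: {large:.1f} GiB")  # needs a dedicated accelerator
print(f"ratio: {large / tiny:.0f}x")
```

Activations, KV caches, and batching add overhead on top of this, but the linear relationship is why a purpose-built model that matches a large model’s accuracy on one task can be orders of magnitude cheaper to serve.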

From assistants to agents

Inference options are also important, as models vary from self-managed deployment to API calls. LLMs first gave users digital ‘assistants,’ like a code assistant that helps people understand, transform and validate code. IBM’s Granite Code model, for example, is trained on 116 programming languages.

But LLMs are now rapidly evolving towards nearly autonomous ‘agents’ – AI that can go beyond assisting with tasks and modify its behaviour based on new inputs. These agents are already able to plan a trip, check how much vacation you have left or fetch your latest bank statement.

To deal with such tasks, the model has to call the right API (application programming interface) from a conversational prompt so it can use various software tools – such as calculators or web-search models. Dubbed ‘function calling’, this capability is an important measure of LLM competency. As they evolve, these AI agents should in future be able to handle even more sophisticated tasks, like acting on feedback from customers.
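The mechanics of function calling can be shown with a minimal sketch: the model emits a structured tool request (mocked here as a JSON string), and the application dispatches it to a registered function. The tool names, the vacation-lookup stub, and the wire format below are all hypothetical – real providers each define their own schemas.

```python
# Minimal function-calling dispatch: a registry of tools the model may
# invoke, and a dispatcher that parses the model's structured request.

import json
from typing import Callable

TOOLS: dict = {}

def tool(fn: Callable) -> Callable:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def remaining_vacation_days(employee_id: str) -> int:
    # Stand-in for a real HR-system lookup.
    return {"emp-42": 11}.get(employee_id, 0)

@tool
def calculator(expression: str):
    # Toy arithmetic evaluator; input restricted to digits and operators.
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported expression")
    return eval(expression)

def dispatch(model_output: str):
    """Parse the model's tool request and invoke the matching function."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

# What a model might emit for "How much vacation do I have left?"
request = '{"name": "remaining_vacation_days", "arguments": {"employee_id": "emp-42"}}'
print(dispatch(request))  # 11
```

The model never executes anything itself – it only names a tool and its arguments, and the surrounding application decides whether and how to run the call. That separation is what makes the capability governable.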

It’s imperative for businesses to have more enterprise data in neural networks, in trusted models with data transparency running on governed platforms. Once you’ve chosen your model – a trusted, ideally open-sourced base model – begin encoding data in a systematic way. Then you can deploy, scale – and start creating value with your AI. In our world of endless streams of data, we need all the help we can get to make use of it.

As it continues to evolve, AI designed for the enterprise is perfectly positioned to do just that.

