AI in Finance

How to Evaluate AI Vendors as a CFO

5 January 2026

AI vendor sales cycles are designed to create urgency and obscure comparability. The CFO evaluating AI tools in 2026 faces dynamics structurally similar to the CFO evaluating ERP systems in 2015: every vendor claims effortless integration, nobody demos the hard parts, the reference customers were selected for you, and the total cost of ownership numbers in the proposal bear little relation to what implementation actually costs.

I have been through enough technology evaluations to know how this goes. The framework below is built around the questions that separate vendors who will deliver from vendors who will deliver a good demo.


The urgency problem

The AI market in 2025 and 2026 is characterised by vendor-created urgency. The pitch is: your competitors are moving, the technology is advancing rapidly, first movers have an advantage, and if you wait another year you will be behind. Some version of this appears in almost every enterprise AI sales process.

Some of it is true. AI is advancing rapidly. Finance functions that invest wisely in the right tools and foundations now will be better positioned in three years. But the urgency framing encourages you to compress the evaluation process, reduce scrutiny on integration and implementation, and prioritise speed to signature over quality of decision. That is not in your interest.

A bad AI implementation costs you more than a delayed good one. Evaluate properly.


Part one: integration and data access

The most important question in any enterprise AI evaluation is: how does it actually connect to your existing systems?

In a demo, integration looks simple. The vendor shows you data flowing from your ERP to their platform, producing a result in seconds. What the demo does not show you is what happened in the three months before the demo to make that data flow work.

Ask these questions directly.

What systems have you integrated with in production? Not “we support integration with” or “our platform is compatible with.” Which specific customer implementations are live with the ERP version you are running, in the configuration you are running it in?

What does the integration require from your IT team? A pre-built connector that deploys in hours is a different thing from a custom integration requiring three months of professional services. Be specific about which one you are buying.

What is the data extraction method? Some AI tools work by reading data directly from your ERP database. Some work via API. Some require a data export that you maintain. Each has different implications for maintenance, security, and data freshness. Know which one you are buying and what the ongoing operational requirement is.

What are the data format requirements? If the tool requires data in a specific format and your data is not in that format, there is pre-processing work required. Who does it, how long does it take, and what happens when your source data format changes?


Part two: model transparency and explainability

In finance, explainability is not a nice-to-have. It is an audit requirement.

When an AI tool makes a recommendation, an approval, or a posting, you need to be able to explain to an auditor why that recommendation was made and what data it was based on. “The AI said so” is not an acceptable audit trail. “The AI matched this invoice to this purchase order based on these three data fields, with a confidence score of 94%, and the match was approved by this named user at this timestamp” is an audit trail.
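To make the shape of such an audit trail concrete, here is a minimal sketch of what a logged match record might contain. Every field name here is an assumption for illustration, not any vendor's actual schema; the point is that each element of the sentence above maps to a recorded, queryable field.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class MatchAuditRecord:
    """Illustrative audit record for an AI invoice-to-PO match.
    Field names are hypothetical, not a real vendor schema."""
    invoice_id: str
    purchase_order_id: str
    matched_fields: list      # the data fields the match was based on
    confidence: float         # model confidence for this decision
    approved_by: str          # the user who approved the match
    approved_at: str          # ISO 8601 timestamp of the approval
    model_version: str        # which model version made the decision

record = MatchAuditRecord(
    invoice_id="INV-10482",
    purchase_order_id="PO-2291",
    matched_fields=["supplier_id", "amount", "po_reference"],
    confidence=0.94,
    approved_by="j.smyth",
    approved_at=datetime(2026, 1, 5, 9, 30, tzinfo=timezone.utc).isoformat(),
    model_version="matcher-v3.2",
)

# The record serialises to the kind of log line an auditor can inspect.
print(json.dumps(asdict(record), indent=2))
```

If a vendor's logged output cannot be reduced to something like this for an individual transaction, the audit trail is a diagram, not a record.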

Ask the vendor to show you the audit trail for a specific decision in a real customer environment. Not a diagram of how the audit trail works: the actual logged output for an actual transaction. If they cannot show you this, that tells you something about whether their tool is built for finance professionals operating in a regulated environment, or for a different market where audit accountability is a lower priority.

Ask specifically about explainability for exceptions. When the AI cannot make a decision with confidence and escalates to a human, what information does the human see? Can the human understand the decision logic well enough to make an informed judgment?

Model explainability also matters for continuous improvement. If you cannot understand why the model is making the decisions it makes, you cannot improve it when it makes wrong ones. You can only retrain it and hope the problem goes away. That is not a sustainable approach in a finance environment where accuracy requirements are high and errors have consequences.


Part three: exception handling

Every AI tool makes errors. In a finance context, the question is not whether errors happen. The question is: how does the exception workflow work, and is it designed for a finance professional or an engineer?

Ask the vendor to walk you through a specific error scenario: the AI makes a wrong matching decision, it gets approved by a finance team member, and it is later identified as incorrect. What is the process? How is it reversed? What is the audit trail? How does the system learn from it to reduce recurrence?

The quality of exception handling tells you a lot about how the vendor thinks about their product’s deployment environment. A vendor whose product is designed for finance has thought carefully about this. A vendor whose product was designed for a different context and subsequently repositioned for finance typically has not.

Ask about escalation logic. When does the AI escalate versus decide autonomously? What are the confidence thresholds? Can you configure them? Overly confident systems that rarely escalate produce errors that are hard to catch. Overly cautious systems that escalate everything are expensive overhead. You want configurable thresholds that can be tuned over time as you understand the error distribution in your specific context.
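The threshold logic described above can be sketched as a simple routing rule. This is a minimal illustration under assumed threshold values, not how any particular product implements it; the thing to verify with a vendor is that these numbers are exposed to you and tunable.

```python
def route_decision(confidence: float,
                   auto_approve_at: float = 0.95,
                   escalate_below: float = 0.40) -> str:
    """Route an AI decision by confidence score.

    The two thresholds are illustrative defaults: they should be
    configurable and re-tuned as the error distribution in your
    environment becomes known."""
    if confidence >= auto_approve_at:
        return "auto_approve"        # high confidence: decide autonomously
    if confidence < escalate_below:
        return "reject"              # very low confidence: do not act
    return "escalate_to_human"       # the ambiguous middle goes to review

# Tightening auto_approve_at trades autonomy for safety; loosening it
# trades review workload for error risk.
print(route_decision(0.97))   # auto_approve
print(route_decision(0.70))   # escalate_to_human
print(route_decision(0.20))   # reject
```

An overly confident system is one where `auto_approve_at` is set too low; an overly cautious one is where it is set so high that nearly everything escalates. Both are configuration failures, which is why you want the dials, not just the defaults.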


Part four: implementation reality

The most valuable thing you can do in an AI vendor evaluation is talk to customers who implemented six months ago, not six weeks ago.

At six weeks, the implementation team is still on site, the customer is in the honeymoon period, and the problems are mostly known and being resolved. At six months, the implementation team has left, the customer is running it independently, and any structural problems have become visible.

Ask for five customer references. Specify that you want customers who have been in production for at least six months with a similar ERP, similar team size, and similar use case. If the vendor offers a curated list of two or three, that is a signal. Good vendors have enough successful customers that this request is easy to fulfil.

When you speak to those customers, ask: what took longer than expected? What was harder than the vendor said it would be? What would you do differently if you started again? What does your team actually use it for today versus what you planned at the start?

The parallel to ERP implementations is instructive. The failure modes in ERP implementations are well-documented and they repeat. AI implementations are showing similar patterns: underestimated integration complexity, data quality problems that were known but not addressed before go-live, change management treated as an afterthought, and a gap between the scope agreed in the contract and the scope that was delivered.

A vendor who has been through a difficult implementation and learned from it is often a better choice than one who has only had easy ones. Ask them what they changed in their implementation methodology because of problems they encountered.


Part five: total cost of ownership

The licensing number in the proposal is rarely the biggest cost in an AI implementation. In most cases it is not close to the biggest cost.

Licensing. The number you see first. Understand whether it is per user, per transaction volume, per module, or a flat fee. Understand what happens to the cost as your transaction volumes grow or your team expands.

Integration and professional services. The upfront integration work is usually scoped in the proposal. The ongoing maintenance of that integration when your ERP is upgraded, when your data structures change, or when you add entities is frequently not. Ask specifically what the ongoing integration maintenance cost looks like after year one.

Data preparation. Most AI tools require data preparation work before go-live. Who does this? If it is your team, what is the time cost? If it is the vendor, what is the fee? If it is a third party, how is that scoped?

Change management and training. This is almost always underestimated. Finance teams used to doing things a particular way take time to change their behaviour. The cost of that change management, done properly, is significant. Done improperly, it is the reason your AI tool sits underused two years after go-live.

Ongoing configuration and optimisation. AI tools require tuning over time. Thresholds need adjustment as you learn the error patterns. New exception categories need to be added. The model may need periodic retraining. Who does this work and what does it cost?

Add these together and compare against the productivity and accuracy gains from your pilot measurement. The CFO AI strategy framework covers how to structure that measurement. If total cost exceeds the measurable return in year one, that does not necessarily mean the investment is wrong. It means you need to be clear about what you are buying and over what time horizon it pays back.
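The addition above can be made explicit. Every figure below is an invented placeholder to show the arithmetic, not a benchmark: substitute the numbers from your own scoping and pilot measurement.

```python
# Illustrative three-year total cost of ownership. All figures are
# made-up placeholders, not market data.
costs_year1 = {
    "licensing": 120_000,
    "integration_and_professional_services": 90_000,
    "data_preparation": 35_000,
    "change_management_and_training": 50_000,
    "configuration_and_optimisation": 20_000,
}
ongoing_per_year = {  # years two and three
    "licensing": 120_000,
    "integration_maintenance": 25_000,
    "configuration_and_optimisation": 15_000,
}

tco_3yr = sum(costs_year1.values()) + 2 * sum(ongoing_per_year.values())
measured_annual_gain = 250_000  # from your pilot measurement, not a promise

print(f"Year-one cost:    {sum(costs_year1.values()):,}")   # 315,000
print(f"Three-year TCO:   {tco_3yr:,}")                     # 635,000
print(f"Three-year gain:  {3 * measured_annual_gain:,}")    # 750,000
```

In this worked example the year-one cost (315,000) exceeds the year-one gain (250,000), but the three-year return clears the three-year TCO. That is the distinction the paragraph above is drawing: a negative year one is not automatically a wrong investment, provided the payback horizon is explicit and credible.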


The question that matters most

The best AI vendor for your finance function is the one whose tool you are ready to use properly.

Every evaluation comes back to this. A sophisticated agentic AI platform is no use to a finance function with inconsistent data and undocumented processes. A well-designed narrow tool that does one thing well and integrates cleanly with your existing systems may deliver more value in year one than a more ambitious platform your function is not positioned to deploy fully.

Before you sign anything, complete the AI readiness assessment for your finance function. Know what you are ready for. Then choose a vendor whose tool fits where you actually are, not where their pitch imagines you to be.

The urgency in the market is real. The consequences of getting this wrong are also real. Evaluate properly. Build on foundations that will make the investment worthwhile.

Explore the complete AI in Finance Strategy for the full framework within which vendor selection sits.


Maebh Collins is a Fellow Chartered Accountant (FCA, ICAEW) with Big 4 training and twenty years of operational experience as a founder and senior finance leader. She writes about AI in finance transformation from the inside out.
