We know that a fast, responsive AI assistant is crucial for a great customer experience. This article provides a transparent look into our philosophy on speed, the significant performance improvements we’ve made, how we monitor performance, and why we will always prioritize the quality and accuracy of Fin's answers above all else.
Quality and accuracy first
Our primary goal for Fin is to be the best and most powerful AI agent for customer service. This means we prioritize quality above all else: delivering high resolution rates, handling complex queries, and following your support procedures in your unique brand voice. Since launching in 2023, Fin’s average resolution rate has climbed from 23% to 67%, with many customers seeing rates in the 70-90% range.
Achieving this level of quality requires sophisticated engineering and the use of cutting-edge language models, which are often not the fastest. Over time, Fin has become far more powerful and configurable. You can now use guidance to align Fin with your company's voice and policies, and Procedures allow Fin to automate complex queries like refunds and transaction disputes. While some competitors may choose to optimize for speed over quality, we will not make that tradeoff.
Our journey to improve Fin's speed
While quality is our priority, responsiveness is critical to the user experience. By November 2024, as we continued to add powerful capabilities, Fin's median Time to First Token (TTFT) had climbed to 17 seconds. While this was still dramatically faster than the median human support response time of 19 minutes, it didn’t feel good enough.
Our engineering team was not happy with this experience and invested substantial effort in making Fin faster. By early 2025, we had achieved a median TTFT of approximately 8 seconds and a 95th percentile of around 20 seconds.
Key improvements included:
Rearchitecting Fin: We completed a major rewrite of Fin's internals, which yielded broad performance improvements beyond just speed.
Optimizing Core Logic: We made significant changes to make fewer calls to large language models (LLMs), increase parallelization, and use new, more efficient LLMs (see the illustrative sketch after this list).
Improving Messenger Integration: For conversations in the Intercom Messenger, we kick off Fin’s response as early as possible and stream it back in real time.
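To make the parallelization and streaming points more concrete, here is a minimal, hypothetical sketch of the pattern in Python. It is illustrative only and not Fin's actual code: the function names (classify_intent, retrieve_content, stream_answer) and the timings are invented for this example.

```python
# Illustrative sketch only, not Fin's implementation. The names and delays
# below are invented to show the pattern of parallel steps plus streaming.
import asyncio
import time


async def classify_intent(message: str) -> str:
    await asyncio.sleep(1.0)  # stand-in for one LLM call
    return "order_status"


async def retrieve_content(message: str) -> list[str]:
    await asyncio.sleep(1.0)  # stand-in for a retrieval step
    return ["Relevant help content passage..."]


async def stream_answer(intent: str, passages: list[str]):
    # Stand-in for a streaming LLM call: tokens appear as they are produced.
    for token in ["Your", " order", " has", " shipped."]:
        await asyncio.sleep(0.2)
        yield token


async def respond(message: str) -> None:
    start = time.monotonic()
    # Running independent steps concurrently, rather than one after another,
    # is the kind of parallelization that shortens time to first token.
    intent, passages = await asyncio.gather(
        classify_intent(message), retrieve_content(message)
    )
    first = True
    async for token in stream_answer(intent, passages):
        if first:
            print(f"Time to first token: {time.monotonic() - start:.1f}s")
            first = False
        print(token, end="", flush=True)
    print()


asyncio.run(respond("Where is my order?"))
```

In this toy example, running the two preparatory steps concurrently saves about a second compared with running them sequentially, and streaming means the first words appear before the full answer has been generated.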
Why some answers require more time
Behind the scenes, Fin's system is a complex combination of different services working together. When a customer's query requires an external action, like checking an order status from Shopify or looking up a user in your database, Fin has to call that external service and wait for it to respond before providing an answer. Each of these dependencies can add a few seconds to the total response time.
Here are a few common examples of these steps, followed by a simplified sketch of how they can add up:
Tasks and data connectors: If you've set up Fin to use an external service (like checking an order status), it has to wait for that service to provide the information.
Image recognition: When a customer includes an image in their message, it takes a few extra seconds for Fin to process and understand what's in it.
Attribute classification: If your workflow is set up to automatically classify, tag, or set a priority based on a conversation's content, each of those steps can add a slight delay.
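To illustrate how these steps can add up, here is a hypothetical sketch. The functions and timings are invented for this example; they do not reflect Fin's real pipeline or the performance of any specific integration.

```python
# Hypothetical timings to show how each dependency adds to the total wait;
# these are not measurements of Fin or of any particular external service.
import time


def check_order_status(order_id: str) -> str:
    time.sleep(2.0)  # waiting for the external store's API to respond
    return "shipped"


def describe_image(image_url: str) -> str:
    time.sleep(3.0)  # extra processing when the message includes an image
    return "a screenshot of an order confirmation"


def classify_priority(conversation: str) -> str:
    time.sleep(1.0)  # an extra classification step configured in a workflow
    return "high"


start = time.monotonic()
status = check_order_status("#1234")
image_summary = describe_image("https://example.com/screenshot.png")
priority = classify_priority("Where is my order? (screenshot attached)")
print(f"Extra wait before an answer can start: {time.monotonic() - start:.0f}s")
```

Each step is quick on its own, but together they explain why an answer that depends on several external lookups takes noticeably longer than a simple question answered from help content alone.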
How we measure and maintain Fin's speed
To get a true sense of the experience from your customers' point of view, we measure what we call Time to First Token (TTFT). This metric captures the exact time from when a customer hits "send" to when Fin's response first begins to appear, which reflects the real end-user experience and keeps our measurements aligned with how Fin actually feels in use.
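As a simple illustration of what this metric captures, here is a hypothetical sketch. The measure_ttft helper and the stand-in functions are invented for this example and are not Intercom's measurement code.

```python
# Illustrative only: one hypothetical way to compute Time to First Token (TTFT).
import time


def measure_ttft(send_message, stream_response) -> float:
    """Seconds from the customer hitting "send" until the first token appears."""
    sent_at = time.monotonic()
    send_message()
    for _token in stream_response():
        return time.monotonic() - sent_at  # stop at the very first token
    return float("inf")  # the response never started


# Dummy stand-ins so the sketch runs on its own.
def fake_send() -> None:
    pass


def fake_stream():
    time.sleep(0.5)  # pretend the agent thinks for half a second
    yield "Hello"
    yield " there"


print(f"TTFT: {measure_ttft(fake_send, fake_stream):.2f}s")
```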
Our engineering team is constantly monitoring latency and is always looking for ways to make Fin faster. To ensure performance stays on track, we have implemented several internal processes:
Internal SLOs: We introduced Service Level Objectives (SLOs) so that any breach of our performance targets is investigated promptly.
Monitoring Outliers: In addition to core metrics, we receive weekly reports showing the customers with the slowest Fin experiences. This allows us to discover and address issues that impact these specific customers.
We believe current performance levels are close to the practical limit of what is possible given our primary focus on quality and configurability. Our priority is to maintain this speed while continuing to improve resolution rates and capabilities. Please note that new features sometimes add latency when first introduced, but we work to refine and optimize them over time.
Configuring Fin for speed
Customers who are sensitive to latency can choose to trade off some of Fin's power for more speed. Using Fin in its simplest mode will result in significantly faster replies than setting it up with more advanced features like Guidance, Tasks, and Actions.
The table below shows how different configurations can impact response times. We believe the added capabilities are worth the latency, especially when compared to human support alternatives that can be two orders of magnitude slower.
