AI-Powered End-to-End Testing | Applitools
https://app14743.cloudwayssites.com/
Applitools delivers full end-to-end test automation with AI infused at every step.

Engineering a Playwright-Native Developer Experience: One Flag, Three Strategies
https://app14743.cloudwayssites.com/blog/playwright-visual-testing-strategy/ | Thu, 19 Mar 2026

Visual testing in Playwright often forces teams to choose between strict failures, snapshot maintenance, and CI pipeline complexity. This article explores how a single configuration flag introduces three different strategies for handling visual differences and improving the Playwright developer experience.


Hello everyone! I’m Noam, an SDK developer on the Applitools JS-SDKs team. While my day-to-day focus is on core engineering, I work closely with our field teams and occasionally join technical deep-dive sessions with customers.

In these conversations, we frequently encounter questions about performance and the engineering philosophy behind our integration. Specifically, there is often curiosity about how to make visual testing feel more “Playwright-native” and natural to developers.

In this post, I’ll share the design logic behind these architectural choices so you can apply these patterns in your own CI pipelines in a way that fits your organization’s needs.

Adding unresolved to Playwright

Integrating visual regression testing into Playwright requires combining two different status models: Playwright’s binary Pass/Fail and the visual testing concept of unresolved.

In visual testing, instead of only two states (passed and failed), there is a third: unresolved. This state indicates a difference was detected, but a human decision is required to determine if it is a bug or a valid change that should be approved as a new baseline.

Playwright doesn’t support this third state out of the box. Visual test maintenance using Playwright’s native toHaveScreenshot API forces the developer into a cumbersome cycle requiring three separate test executions:

  1. First, the developer needs to run the test to see the failure.
  2. Then, they need to run with the --update-snapshots flag to create new baseline images (see the sketch after this list).
  3. Then, most developers would run again to validate that everything works as expected with the updated baseline—which isn’t always the case, because Playwright’s native comparison method (pixelmatch) tends to be very flaky, unlike Visual AI.
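For reference, here is a minimal sketch of the built-in assertion that drives this cycle; the URL and snapshot file name are placeholders.

```ts
// Minimal sketch of Playwright's native screenshot assertion.
// The URL and snapshot name are placeholders.
import { test, expect } from '@playwright/test';

test('home page has no visual regressions', async ({ page }) => {
  await page.goto('https://example.com');
  // Pixel comparison (pixelmatch) against the stored baseline image;
  // run `npx playwright test --update-snapshots` to accept a new baseline.
  await expect(page).toHaveScreenshot('home.png');
});
```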

After this local cycle, the developer must commit the new baseline images to the repository—bloating the git history—and wait for a new CI execution to provide final feedback. For dev-centered organizations that focus on feedback loop velocity, this workflow is… suboptimal. Personally, I believe that’s one of the reasons visual testing isn’t as popular as it should be among Playwright users.

When we engineered the Applitools fixture, one of our goals was to support this unresolved state natively, without disrupting Playwright’s core lifecycle—specifically its Worker Processes and Retry mechanisms.

The solution rests on two key engineering decisions: moving rendering to the background (async architecture) and giving developers control over the exit signal and performance tradeoffs (failTestsOnDiff).

We don’t block test execution when Applitools is rendering

The core value of visual testing lies in two things: AI-based comparison, which eliminates false positives, and multi-platform rendering.

Architecturally, these processes are cloud-native services.

  • AI-as-a-Service: Just like massive LLMs or other generative models, the Visual AI engine runs on specialized cloud infrastructure optimized for heavy inference. It cannot simply be “installed” on a lightweight CI agent.
  • Platform Constraints: Authentic cross-platform rendering (e.g., iOS Safari on a Linux CI agent) is physically impossible on a single local machine.

Since these operations inherently occur remotely, performing them synchronously would force the local test runner to idle while waiting for network round-trips and cloud processing.

To solve this, we designed the fixture around an asynchronous architecture:

  • Instant Capture: When eyes.check() is called (see the sketch after this list), we synchronously capture the DOM and CSS resources (instead of a rasterized image). This operation is extremely fast.
  • Immediate Release: We purposefully use soft assertions by design. We release the Playwright test thread immediately so the functional logic can proceed to the next step or test case without blocking.
  • Background Heavy Lifting: The heavy work—uploading assets, rendering across different browsers and operating systems, and performing the AI comparison in the Applitools cloud—starts immediately in the background, managed by the Worker process.
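To make the flow concrete, here is a minimal sketch of what a check looks like from the test’s point of view. It assumes the eyes-playwright fixture exposes an eyes object to the test; the exact import path and check() options are assumptions, so treat it as illustrative rather than canonical.

```ts
// Sketch only: the fixture entry point and check() options below are assumptions;
// consult the eyes-playwright docs for the exact API in your SDK version.
import { test } from '@applitools/eyes-playwright/fixture';

test('checkout page renders correctly', async ({ page, eyes }) => {
  await page.goto('https://example.com/checkout'); // placeholder URL
  // Captures the DOM and CSS resources synchronously, then returns immediately;
  // rendering and the Visual AI comparison continue in the background.
  await eyes.check('Checkout page', { fully: true });
  // The worker is free to run the next functional step or test case right away.
});
```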

The “Draining Queue” Effect

This architecture explains why the Playwright Worker sometimes remains active after the final test completes.

The background tasks are limited only by your account’s concurrency settings and the screenshot size. For example, when rendering a 10,000 px-tall page for a small mobile device, the rendering infrastructure might need time for scrolling and stitching. If your functional tests execute faster than the background workers can process the queue (rendering and comparing), the Worker process stays alive at the end solely to “drain the queue” and ensure data integrity.

While this ensures your test logic runs at maximum speed by offloading the processing cost to the background, it can cause friction and frustration when developers see workers “hanging” after tests are completed. When facing such issues, our support team is here to advise and assist: we can investigate execution logs and, if needed, make custom suggestions to tailor Eyes-Playwright to your needs.

Solving the Matrix Problem

Standard Playwright documentation recommends defining multiple projects in playwright.config.ts to cover different browsers (Chromium, Firefox, WebKit) and various viewport sizes.

While this ensures coverage, it introduces a linear performance penalty (O(N)). To test three browsers across two viewports, your CI must execute the functional logic (clicks, waits, navigation) six times. It’s 6x more load on the CI machine and the testing environment.

We recommend shifting this workload to the Ultrafast Grid (UFG).

In this mode, you execute the Playwright test once, typically on Chromium. We upload the DOM state, and our cloud infrastructure renders that state across all configured browsers and viewports in parallel.

This transforms an O(N) execution problem into an O(1) execution problem, significantly shortening the feedback loop.
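For contrast, this is roughly what the native matrix looks like in playwright.config.ts; each project re-runs the same functional logic. With the UFG, you would typically keep a single Chromium project and declare the target browsers and viewports in the Eyes configuration instead (the exact option names depend on your SDK setup).

```ts
// playwright.config.ts: the native cross-browser matrix.
// Each project re-executes the full functional flow, so 3 browsers x 2 viewports = 6 runs.
import { defineConfig, devices } from '@playwright/test';

const smallViewport = { width: 390, height: 844 };

export default defineConfig({
  projects: [
    { name: 'chromium-desktop', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox-desktop',  use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit-desktop',   use: { ...devices['Desktop Safari'] } },
    { name: 'chromium-small',   use: { ...devices['Desktop Chrome'], viewport: smallViewport } },
    { name: 'firefox-small',    use: { ...devices['Desktop Firefox'], viewport: smallViewport } },
    { name: 'webkit-small',     use: { ...devices['Desktop Safari'], viewport: smallViewport } },
  ],
});
```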

The Strategy: failTestsOnDiff

Since the actual comparison happens asynchronously and potentially completes after the test logic finishes, we need a mechanism to map the visual result back to the Playwright status.

This is controlled by the failTestsOnDiff flag. It’s not just a boolean; it’s a strategic choice for your CI pipeline.

Strategy A (failTestsOnDiff: false)

  • The Logic: This is the configuration our own Front-End team uses. We believe that a visual change is not, by itself, a test failure.
  • Behavior: The Playwright test passes (Green). The unresolved status is reported externally via our SCM integration (GitHub/GitLab).
  • Why: Retrying a visual test is computationally wasteful—the pixels won’t change on the second run. By keeping the test “Green,” we avoid triggering Playwright’s retry mechanism. The decision is moved to the Pull Request, where it belongs.

Read more about SCM integration or hop directly to our GitHub, Bitbucket, GitLab, or Azure DevOps articles.

Strategy B (failTestsOnDiff: afterAll)

  • The Logic: You need a “Red” pipeline to block deployment, but you want to avoid the noise of retries and gain a significant performance improvement.
  • Behavior: Individual tests pass, but the Worker process exits with a failure code if any diffs were found in the suite.
  • Why: This provides a hard gatekeeper for the build status. The Eyes rendering farm keeps processing visual results in the background without blocking the execution thread, so the worker can move on to handle more tests efficiently.

Strategy C (failTestsOnDiff: afterEach)

  • The Logic: Immediate feedback loop.
  • Behavior: Fails the test immediately in the afterEach hook.
  • Why: Best for local development, where you want to see the failure immediately in the console. It also pairs well with Playwright’s trace: 'retain-on-failure' setting, as it ensures traces are preserved for unresolved visual assertions. Not recommended for CI due to the retry loops described above. (A configuration sketch for all three modes follows below.)
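Here is a minimal configuration sketch for the flag. The three values come straight from the strategies above; where the flag lives (an eyesConfig block under use in playwright.config.ts) is an assumption, so check the eyes-playwright fixture docs for the exact location in your version.

```ts
// playwright.config.ts sketch. The eyesConfig block and its typing are assumptions;
// only the three failTestsOnDiff modes are taken from this post.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    eyesConfig: {
      // false       -> Strategy A: tests stay green, status reported via SCM integration
      // 'afterAll'  -> Strategy B: tests stay green, the worker exits non-zero if diffs exist
      // 'afterEach' -> Strategy C: the test fails right after each check
      failTestsOnDiff: false,
    },
  },
});
```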

TL;DR – When to use each setting

Mode: afterEach
  • Performance: Less performant. The Playwright worker waits after each test for all renders to be completed and for the Visual AI to compare the results.
  • Observability: Best. The Applitools reporter will show all statuses correctly; other reporters will consider unresolved tests as failing.
  • Best fit: Local testing.

Mode: afterAll
  • Performance: Best performance. The Playwright workers collect the resources and manage the rendering and Visual AI comparisons in the background.
  • Observability: Good. The Applitools reporter will show all statuses correctly; other reporters will consider unresolved tests as passing. You will get a failure of the worker process, but other reporters won’t link it to a specific test case.
  • Best fit: Local testing and CI environments without SCM integration.

Mode: false
  • Performance: Best performance. Similar to afterAll.
  • Observability: Great in pull requests (if SCM integration is enabled). The Applitools reporter will reflect the tests perfectly; other reporters will consider unresolved tests as passing.
  • Best fit: CI environments with SCM integration.

Closing the Visibility Gap: The Custom Reporter

If you adopt Strategy A (false) or Strategy B (afterAll), you introduce a secondary challenge: Visibility.

Since Playwright technically marks these tests as Passed to avoid retries, the standard Playwright HTML Report will show them as “Green,” potentially masking unresolved visual differences that require attention.

To bridge this gap without forcing developers to switch context, we developed a Custom Applitools Reporter.

This reporter extends the standard Playwright HTML report. It injects the actual visual status (Passed, Failed, or Unresolved) directly into the test results view.

  • True Status: You see which tests have visual diffs, even if the Playwright exit code was successful.
  • Direct Links: It provides a direct link from the test report to the specific batch results in the Applitools Dashboard.
  • Context: It enriches the report with UFG render status and batch information.

This ensures you get the best of both worlds: the optimization of a “Green” CI run (no retries), with the transparency of a report that highlights exactly where manual review is needed.
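Registering the reporter is a one-line change in playwright.config.ts. The module path below is an assumption; use the entry point documented for your version of the package.

```ts
// playwright.config.ts sketch: run the Applitools reporter alongside the standard HTML reporter.
// The '@applitools/eyes-playwright/reporter' path is an assumption, not a confirmed API.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['html'],
    ['@applitools/eyes-playwright/reporter'],
  ],
});
```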

Summary

The Applitools Playwright fixture is designed to be non-blocking and scalable. By leveraging an asynchronous architecture and the Applitools Ultrafast Grid, we offload the heavy lifting from your CI. By correctly configuring failTestsOnDiff, you ensure that your pipeline reflects your team’s engineering culture—whether that’s strict gating or modern, PR-based visual review.

Quick Answers

What is visual regression testing in Playwright?

Visual regression testing in Playwright verifies that changes to an application’s UI do not introduce unintended visual differences. Playwright can perform basic visual regression checks using screenshot comparisons like toHaveScreenshot, while dedicated visual testing tools (such as Applitools Eyes) extend this by detecting meaningful UI changes, managing baselines, and enabling review workflows for approving visual updates.

What is the best way to do visual testing in Playwright?

Playwright supports basic visual testing through screenshot comparisons such as toHaveScreenshot, but this approach can become difficult to maintain at scale. Dedicated visual testing tools, like Applitools Eyes, extend Playwright by adding Visual AI comparison, cross-browser rendering, and review workflows that allow teams to detect visual regressions without maintaining large sets of screenshot baselines.

How does Playwright screenshot testing (toHaveScreenshot) compare to visual regression testing tools?

Playwright’s toHaveScreenshot performs pixel-by-pixel image comparisons against stored baseline images. While this works for simple cases, it often requires updating and maintaining many snapshots. Visual regression testing tools like Applitools Eyes use Visual AI to detect meaningful UI changes while ignoring insignificant rendering differences, provide review workflows to approve or reject visual changes, and allow custom match levels for different regions of the screen.

Can Playwright run visual tests across multiple browsers and devices?

Yes, but with a limited scope. Natively, Playwright supports three browser engines (Chromium, Firefox, and WebKit), but it does not execute tests across different real operating systems or mobile devices. This lack of OS-level rendering limits coverage and creates a risk of missing platform-specific visual bugs. For example, see how a frontend team caught a visual bug specific to Mac Retina screens that a standard engine check would miss.

How can you run cross-browser visual tests in Playwright without running tests multiple times?

Normally, cross-browser testing requires executing the same tests separately for each browser configuration. Tools like Applitools Ultrafast Grid allow tests to run once while visual rendering is executed across multiple browsers and viewport combinations in parallel. This removes the need to multiply test execution across the full browser matrix.

Why is cross-browser testing in Playwright so slow?

Natively, cross-browser testing introduces a significant performance penalty. Playwright must execute the entire test logic (clicks, waits, network requests) separately for every browser and viewport configuration. Modern visual testing tools (e.g., Applitools Ultrafast Grid) eliminate this overhead by executing the test logic just once locally, performing the cross-browser rendering and visual comparison in parallel in the cloud.

A New Chapter in Customer Success at Applitools
https://app14743.cloudwayssites.com/blog/new-chapter-customer-success-applitools/ | Wed, 11 Mar 2026

Applitools is expanding its approach to Customer Success to help teams achieve measurable outcomes from their testing strategy. Chief Customer Officer Kunal Rao outlines how structured onboarding, success planning, and ongoing partnership will help customers move faster with greater confidence.


By Kunal Rao, Chief Customer Officer, Applitools

Teams adopt Applitools to deliver results, not just run tests. That means faster releases, greater confidence in test results, and stronger engineering productivity.

Our responsibility is to ensure that value is realized consistently and without surprises.

To make that experience even stronger, we are evolving how we partner with our customers through Customer Success. The goal is simple: every customer should clearly understand what success looks like, how we are progressing toward it, and how our work together is driving meaningful outcomes for their business.

This evolution brings a more structured and proactive way of working together—built around joint planning, proactive engagement, and shared accountability.

Aligning Early on What Success Looks Like

Every good partnership starts with clarity.

Early in our relationship, we will work with you to define what success means in measurable terms. Together, we will identify your most important use cases, the outcomes you want to achieve, and the milestones that indicate progress.

These goals will be captured in a Joint Success Plan—a simple, shared framework that outlines priorities, stakeholders, timelines, and measurable outcomes. This ensures there is always a clear view of what we are working toward and how progress will be tracked.

Accelerating Time to Value

One of the most important goals of Customer Success is helping teams realize value quickly.

Rather than trying to implement everything at once, we will focus on the shortest path to meaningful impact. Our onboarding approach will prioritize your highest-value workflows and provide clear, prescriptive guidance to get them running smoothly.

This includes:

  • Structured onboarding plans focused on your top use cases
  • Practical enablement and proven templates based on real customer success stories
  • Clear milestones that show progress toward measurable outcomes

The result is faster adoption, early wins, and a strong foundation for long-term success.

Proactive Engagement—Before Issues Become Blockers

Customer Success should not be reactive.

We are investing in better signals and monitoring to understand adoption patterns, product usage, and areas where teams might benefit from additional support.

This allows us to reach out proactively with insights, recommendations, and best practices—often before small issues become larger blockers.

Our goal is to ensure you always have the guidance you need to move forward confidently.

Clear Progress and Shared Accountability

Transparency is essential to a strong partnership.

Through our success plans and regular engagement cadence, you will always have visibility into:

  • Your goals and milestones
  • Current progress and upcoming priorities
  • Action plans for any risks or blockers

We will also hold Executive Business Reviews (EBRs) to ensure alignment at the leadership level and maintain accountability on both sides. These conversations help us step back, evaluate progress, and ensure Applitools continues to support your evolving priorities. We will also share roadmap updates and gather your input as we shape the product around our customers’ needs.

Coordinated Support Across Teams

When additional expertise is needed, we will bring the right people to the table—whether from Support, Product, or Services.

Our goal is to provide a coordinated experience so you never feel like you are navigating the organization alone or repeating the same information across teams.

Instead, you will have a unified plan and a single partnership focused on your goals.

What Happens Next

Your Applitools team will work with you to confirm your success goals and establish a cadence for check-ins and value reviews tailored to your needs. Together, we will ensure that your priorities remain aligned with the outcomes you want to achieve with Applitools.

If there is a specific initiative you want to accelerate this quarter, we encourage you to share it with us. We will help you build a plan and move it forward.

Our Commitment

At Applitools, we are focused on one thing: helping you get measurable value from our product—consistently and predictably.

To support that commitment, we are building a new operating model that clarifies how we work together, how we measure progress, and how we continually improve the experience for our customers.

We will continue refining this approach based on what we learn from every engagement—your feedback, adoption patterns, support interactions, and renewal conversations.

Our promise is straightforward:
We will show up as a proactive partner, stay accountable to outcomes, and help your team realize the full value of your investment in Applitools.

Thank you for trusting us with that responsibility.

— Kunal

Kunal Rao

Chief Customer Officer, Applitools

Kunal Rao is a seasoned customer success and go-to-market executive with more than 20 years of global experience building, scaling, and transforming post-sales organizations at technology companies. Read more.

What Test Execution Demands That Generative AI Can’t Guarantee
https://app14743.cloudwayssites.com/blog/test-execution-generative-ai/ | Thu, 26 Feb 2026

Generative AI excels at creating tests—but execution demands repeatability and trust. Learn why deterministic approaches matter for reliable test automation.


TL;DR

• Generative AI is highly effective for creating tests, data, and analysis, but execution has different requirements.
• Test execution demands repeatability, determinism, and explainable failures.
• Probabilistic systems, including LLMs, introduce variability that leads to flaky tests and loss of trust.
• Teams that separate where generative AI helps from where deterministic execution is required scale testing more reliably.

Generative AI has dramatically changed how teams create tests. Requirements can be translated into test cases in seconds. Automation scripts can be bootstrapped with natural language. Test data can be generated on demand.

But many teams are discovering an uncomfortable truth: faster test creation does not automatically lead to more reliable releases.

Execution is where confidence is earned or lost. And test execution demands guarantees that generative AI—including large language models (LLMs)—was never designed to provide.

Where generative AI fits well in testing

Generative AI excels in parts of the testing lifecycle that tolerate variation. These are areas where approximation is acceptable and speed matters more than precision.

Teams are successfully using AI to:

  • Generate test cases from requirements
  • Assist with unit and integration test authoring
  • Create realistic and varied test data
  • Summarize test results and surface patterns

In most of these cases, teams are relying on LLMs to generate intent, not to make final execution or release decisions.

These use cases benefit from flexibility. Minor differences in output rarely introduce risk, and human review is often part of the workflow.

The challenge emerges when that same probabilistic behavior is extended into execution.

Why test execution is fundamentally different

Test execution is not a creative task. It is a verification task.

Execution requires:

  • The same test to behave the same way, run after run
  • Assertions that are precise and stable
  • Failures that can be reproduced and diagnosed
  • Outcomes that can be explained clearly to stakeholders

Generative AI systems—particularly LLMs—are probabilistic by design. That variability is useful for exploration and generation, but it works against the repeatability and determinism execution depends on.

As AI accelerates development, repeatability becomes more important than intelligence in test execution.

How probabilistic execution creates real problems

When probabilistic systems are used to drive execution, teams often encounter the same failure modes:

  • Tests that pass one run and fail the next without code changes
  • Assertions that subtly change or disappear
  • Longer debugging cycles because failures can’t be reproduced
  • Rising compute costs from repeated executions
  • Engineers losing confidence in automation results

When failures aren’t repeatable, teams stop trusting their tests—and that’s when automation becomes a bottleneck instead of a benefit.

– Shaping Your 2026 Testing Strategy

Once trust erodes, teams compensate. Manual validation creeps back in. Releases slow down. Automation becomes something teams work around rather than rely on.

Execution amplifies risk: security, governance, and explainability

Execution is also where risk concentrates.

When AI systems drive test execution, they may:

  • Send application context externally
  • Make decisions that can’t be fully explained
  • Produce outcomes that are difficult to audit

These concerns are most visible in regulated and high-risk environments, but they apply broadly. Any team responsible for production releases needs to be able to explain why a test failed—or why a release was approved.

Reliable execution is not just a technical concern. It’s a governance concern.

Why deterministic execution matters at scale

Deterministic systems behave predictably. Given the same inputs, they produce the same outcomes.

In test execution, this enables:

  • Reliable failure reproduction
  • Faster root cause analysis
  • Lower maintenance overhead
  • Clear audit trails
  • Reduced noise in pipelines

What test execution demands is not intelligence, but guarantees: the same inputs producing the same outcomes, every time.

Reliable test execution depends on determinism, not creativity.

Rethinking AI’s role in execution

The goal is not to abandon generative AI. It’s to use it where it fits.

Effective teams are separating responsibilities:

  • Generative AI for creation, exploration, and analysis
  • Deterministic systems for execution and verification

This separation allows teams to move quickly without sacrificing confidence.

What this means for engineering and QE teams

As AI becomes more deeply embedded in testing workflows, the key decision is no longer whether to use AI—but where.

Teams that succeed will:

  • Accept variability where it’s safe
  • Demand determinism where decisions are made
  • Measure success by signal quality, not test count
  • Optimize for trust before speed

The biggest risk in AI-driven testing isn’t lack of automation—it’s lack of trust.

Choosing confidence over convenience

Generative AI has changed how tests are created. It should not change the standards by which tests are trusted.

Execution is where reliability matters most. Teams that recognize this distinction will scale testing with confidence, even as AI continues to reshape software development.

Watch Shaping Your 2026 Testing Strategy now.


Quick Answers

Why can’t generative AI reliably execute tests?

Generative AI systems, including LLMs, are probabilistic by design. This variability leads to inconsistent execution flows, unstable assertions, and failures that are difficult to reproduce.

Is generative AI bad for test automation?

No. Generative AI is highly effective for test creation, data generation, and analysis. Problems arise when it is used to drive execution and release decisions.

What does deterministic test execution mean?

Deterministic test execution produces consistent results given the same inputs, enabling repeatable failures, faster debugging, and greater trust in automation.

Why does execution matter more than test creation?

Test creation accelerates coverage, but execution determines confidence. Reliable releases depend on predictable, explainable test outcomes.

How should teams combine generative AI and LLMs with deterministic systems?

Use generative AI and LLMs where flexibility is helpful, and deterministic systems where verification and decision-making require guarantees.

AI Testing in 2026: Why Signal, Trust, and Intentional Choices Matter More Than Ever
https://app14743.cloudwayssites.com/blog/ai-testing-strategy-in-2026/ | Tue, 10 Feb 2026

AI is reshaping software testing—but more AI often means more noise. Learn how engineering leaders can build trust, reduce flakiness, and scale test automation.


TL;DR

• AI is now foundational to software testing, but more AI often creates more noise.
• AI-assisted development increases code volume and pressure on QA teams.
• The biggest bottleneck in testing today is signal-to-noise, not execution speed.
• Successful testing strategies in 2026 prioritize trust, explainability, and reliable results.

AI has quietly moved from the edges of software testing into the center of it. For most teams, it’s no longer a question of whether AI plays a role in testing, but how deeply—and how intentionally.

Quality and Engineering leaders are feeling this shift firsthand. AI-assisted development is increasing the volume and pace of code changes. Release cycles are accelerating. At the same time, testing teams are being asked to scale confidence without scaling headcount.

In this environment, speed alone is not the differentiator. Trust is. 

In AI-driven testing, speed without trust slows teams down.

AI is no longer optional in testing

Across the software delivery lifecycle, AI is already embedded in day-to-day workflows. Teams are using it to generate test cases from requirements, assist with automation, create test data, and analyze results. In many organizations, this adoption didn’t start with QA—it started with developers.

What’s changed is that AI is no longer experimental or isolated. It’s shaping how testing actually happens.

This matters because AI-assisted coding changes the scale of the testing problem. More code is being produced, faster than before, and not all of it is high quality. That shift pushes pressure downstream, straight onto QA and QE teams.

More AI hasn’t reduced pressure on QA—it’s increased it

For many Engineering Managers, AI has delivered productivity gains on the development side while increasing complexity on the testing side. Test suites grow larger. Pipelines generate more results. Failures are harder to interpret.

As Applitools CEO Anand Sundaram recently described, the imbalance is real:

“You have more code to be tested, sometimes not the best code, more coverage required, and fewer people.”

Shaping Your 2026 Testing Strategy

This combination exposes a deeper issue. As tooling improves, teams don’t just get more data, they get more noise. And noise is expensive.

The real bottleneck is signal-to-noise

Most mature teams are no longer blocked by how fast they can run tests. They’re blocked by how confidently they can interpret the results. 

As AI accelerates development, signal quality matters more than test volume.

False positives, flaky tests, and inconsistent outcomes force teams into defensive behaviors: re-running pipelines, manually validating changes, and delaying releases “just to be safe.” Over time, automation stops accelerating delivery and starts slowing it down.

This is where many AI-driven testing initiatives struggle. AI can generate more tests and more output, but without reliable signals, that output doesn’t lead to better decisions.

Not all AI is suitable for testing decisions

One clear theme for 2026 is that AI is not a single, interchangeable capability. Different phases of the testing lifecycle have very different requirements.

Large language models excel at tasks that tolerate variation: generating test ideas, creating data, summarizing results, and assisting with analysis. But test execution and release decisions demand consistency, repeatability, and explainability.

This distinction becomes especially clear when you look at test execution. Unlike test generation or analysis, execution depends on consistent behavior and repeatable outcomes.

When test outcomes change run to run, teams lose trust. When failures can’t be reproduced, debugging slows down. And when decisions can’t be explained clearly, confidence erodes—both within engineering and with leadership.

Trust, explainability, and repeatability matter more than novelty

As AI adoption grows, testing teams are being forced to answer harder questions. Can we trust these results? Can we explain them? Can we confidently make release decisions based on them?

These questions matter in regulated and high-risk environments, but they’re just as relevant for any team shipping customer-facing software at speed. Reliability is not a constraint on velocity—it’s what makes velocity sustainable.

Teams operating under stricter compliance requirements have already learned that explainability and repeatability are non-negotiable for AI-driven testing decisions. (Read more—AI Testing in Regulated Environments: Smarter Testing Starts With Stability, Not More Code.)

This is why many teams are rethinking how they apply AI to testing. Deterministic approaches—systems that behave consistently and predictably—make it easier to reduce noise, identify real failures, and move faster with confidence.

What this means for testing strategy in 2026

The takeaway for Quality and Engineering leaders isn’t to slow down AI adoption. It’s to be more intentional about it.

Successful testing strategies in 2026 will share a few characteristics:

  • AI is treated as foundational, not experimental
  • Different phases of testing use different kinds of AI
  • Reliability and explainability are prioritized where decisions are made
  • Signal quality and maintenance reduction are explicit goals

Not all AI belongs everywhere. Choosing where reliability matters most is becoming a core leadership responsibility for engineering and quality teams. The biggest risk in AI-driven testing isn’t lack of automation—it’s lack of trust.

Choosing progress over noise

AI is reshaping software testing whether teams are ready or not. The challenge now is judgment. Knowing where AI accelerates quality—and where it quietly undermines it—is what separates teams that scale confidently from those that drown in noise.

The fastest teams aren’t the ones chasing the newest tools. They’re the ones that trust what their tests are telling them.

Watch Shaping Your 2026 Testing Strategy now.


Quick Answers

Why does AI increase noise in software testing and how does this affect testing strategy in 2026?

AI accelerates code changes and test generation, but probabilistic (non-deterministic) systems can introduce inconsistent results, leading to flaky tests and false positives. Teams that make intentional choices about where and how AI is used will scale faster with less noise and higher confidence.

What is the biggest risk of AI-driven software testing?

The biggest risk in AI-driven software testing is loss of trust. When test results aren’t repeatable or explainable, teams slow down releases and reintroduce manual validation.

Is AI bad for test automation?

No, not all AI is bad for test automation. AI is highly effective for test generation, data creation, and analysis. Problems arise when probabilistic (non-deterministic) AI is used for execution and decision-making.

What should engineering leaders prioritize in AI testing strategies?

Software engineering and QA/QE leaders should prioritize reliable signals, reduced maintenance, and explainable results over raw test volume or novelty.

Applitools Named a Strong Performer in The Forrester Wave™: Autonomous Testing Platforms Report, Q4 2025
https://app14743.cloudwayssites.com/blog/applitools-forrester-wave-autonomous-testing-q4-2025/ | Tue, 20 Jan 2026

Applitools has been named a Strong Performer in The Forrester Wave™: Autonomous Testing Platforms, Q4 2025. The report examines how autonomous testing is evolving as AI reshapes automation, accuracy, and scale. This post highlights key themes from the evaluation and what they mean for engineering, QA, and design teams planning their testing strategy.


TL;DR

• Reducing test maintenance and improving result accuracy are becoming core evaluation criteria for autonomous testing platforms
• Visual validation is increasingly used to ensure UI accuracy across web, mobile, and native applications
• These capabilities help teams maintain release confidence and reduce risk in complex, dynamic, user-facing experiences at scale

Modern software teams ship faster than ever, and testing teams need tooling that keeps up. In Q4 2025, Forrester published The Forrester Wave™: Autonomous Testing Platforms, Q4 2025, evaluating autonomous testing platform providers.

Applitools is named a Strong Performer in this evaluation.

The momentum behind autonomous testing

Teams now build and ship across more devices, frameworks, and release cadences. That reality pushes quality practices toward higher automation, better maintenance efficiency, and faster feedback loops.

Forrester frames this market shift directly:

“This is why we changed this Forrester Wave™ category from ‘continuous automation testing platforms’ to ‘autonomous testing platforms.’”

The Forrester Wave™: Autonomous Testing Platforms, Q4 2025, Forrester Research, Inc., Q4 2025.

What buyers should look for in autonomous testing platforms

When you evaluate autonomous testing platforms in 2025, three practical questions usually help teams make sense of the space:

  • Platform fit: Can the platform support your mix of apps and test types, plus your workflows across engineering and QA?
  • AI-infused automation: Does the platform reduce authoring and maintenance effort in a way you can trust and govern?
  • Testing AI-enabled experiences: As more teams ship AI-enabled features, can your testing approach keep pace with new failure modes and higher variability?

These questions help teams connect product capabilities to real delivery constraints: speed, coverage, confidence, and operating cost.

How the report characterizes Applitools

This report describes Applitools’ approach through Visual AI and ML-resilience oriented toward UI accuracy and maintenance reduction:

“(Applitools) It features Visual AI to validate UI accuracy across web, mobile, and native apps and support modern digital experiences at scale.”

The Forrester Wave™: Autonomous Testing Platforms, Q4 2025, Forrester Research, Inc., Q4 2025.

It also cites a strategy emphasis on reducing maintenance and improving accuracy:

“Applitools stands out for innovation, gaining an above-par score due to its Visual AI and ML-driven resilience that reduce test maintenance and improve accuracy.”

The Forrester Wave™: Autonomous Testing Platforms, Q4 2025, Forrester Research, Inc., Q4 2025.

What this can mean for engineering, QA, and design teams in 2025

Engineering teams can treat autonomous testing as a way to protect delivery speed. When teams reduce flaky failures and avoid constant test repairs, they shorten the path from code change to deployable signal.

QA teams can prioritize scalability and governance. As test suites grow, teams need tools and workflows that improve coverage without creating unsustainable maintenance load.

Design teams can connect UI intent to release confidence. When teams validate UI accuracy consistently across browsers, devices, and releases, they reduce risk in UX-heavy, customer-facing journeys.

Across all three groups, teams can get more value when they align on what “quality” means for the product and then choose automation approaches that enforce that definition consistently.

Read the report

While you’re evaluating autonomous testing priorities for 2025, read the full report to understand the evaluation criteria, methodology, and vendor profiles in context.

Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change. For more information, read about Forrester’s objectivity here.

AI Testing in Regulated Environments: Smarter Testing Starts With Stability, Not More Code
https://app14743.cloudwayssites.com/blog/ai-testing-for-regulated-environments/ | Thu, 04 Dec 2025

Regulated teams face growing pressure to deliver quality at speed while maintaining strict oversight. Learn how a deterministic, Visual AI-driven approach reduces maintenance, increases reliability, and helps teams preserve audit-ready evidence.


TL;DR

• Code-centric automation continues to slow teams down as UI changes multiply, making stability and evidence hard to maintain.
• AI code generators don’t solve the problem because they still produce brittle test code that requires constant oversight.
• Live LLM-driven execution introduces unpredictability. Regulated teams need deterministic runs, not improvisation.
• A clearer path is intent-driven authoring paired with deterministic engines and Visual AI that detects visual drift and preserves audit-ready evidence.

Request our Governance Readiness Checklist

Teams in regulated environments face a familiar strain. Applications grow in complexity, expectations for fast releases keep rising, and every update requires clarity about what changed and whether required elements still appear as intended. Traditional automation wasn’t built for that pace or level of oversight, and the recent wave of AI coding tools hasn’t solved the core challenges.

A better model is emerging—one that uses AI to reduce the workload of authoring and maintaining tests while keeping execution deterministic, reviewable, and aligned with how people evaluate digital experiences.

This post breaks down why the legacy testing model is hitting its limits and how AI can support a more stable, more trustworthy approach.

Why traditional automation keeps slowing teams down

As digital experiences expand across pages, portals, member journeys, and product flows, test code becomes difficult to scale. Even minor UI changes break locators and assertions, creating unpredictable test runs, delayed reviews, and long maintenance cycles.

Developers are often asked to take on more of the testing responsibility. While this can improve feedback loops, it does not reduce the burden of maintaining code that reacts poorly to UI changes. And when teams already lack time, context switching between product development and test diagnostics becomes expensive.

The result is a predictable bottleneck: too many tests tied directly to implementation details and not enough stability across releases.

Why AI-generated test code hasn’t fixed the problem

The last few years have produced a surge of tools that promise to generate automation code automatically. But teams report the same issues repeating in a new form. LLMs can produce code quickly, yet the resulting output still inherits all the maintenance challenges of coded automation.

AI code generators also excel more at producing new code than updating existing flows. They struggle with assertions, hallucinate element behavior, and require human supervision to validate every step. For regulated teams that must show repeatability and generate evidence for every release, inconsistency becomes a risk rather than a convenience.

If the goal is to escape brittle code, producing more of it is not the answer.

Why live LLM-driven execution creates instability

Another idea gaining attention is allowing an LLM to operate the UI directly during test execution. In theory, this removes the need to write code. In practice, teams quickly run into new risks: undefined steps, inconsistent interactions, slow decision-making, and no reliable way to debug.

Execution in regulated environments must be predictable. It must be reviewable. And it must produce evidence that can be traced, explained, and defended. Live improvisation during a test run undermines each of these requirements.

Determinism matters more than novelty. A testing approach must produce the same result today, tomorrow, and during an audit review.

A clearer path forward: intent-driven authoring with deterministic execution

A more reliable model is emerging that uses AI to simplify authoring without relying on AI to make real-time decisions during execution.

Teams describe test intent in natural language. An AI system translates that intent into structured steps during authoring, where humans can review and adjust. Execution is then handled by deterministic engines and Visual AI that observe the rendered UI and detect visual changes, required-element presence, placement consistency, and contrast.

This separation delivers two advantages:

  • People write and maintain far fewer lines of test code
  • Test runs become stable, repeatable, and easier to verify

Visual AI provides a complete view of the screen state and compares each run against an approved baseline. When something changes, the system surfaces the difference, captures evidence, and supports reviewer approvals. When the change is expected, one acceptance updates the baseline and applies it across browsers and devices.

The outcome is a testing layer that is easier to maintain and easier to trust.

What this looks like in practice

Teams adopting this approach typically see changes across several parts of their workflow:

  • Tests are written in plain language, without selectors or framework setup
  • Visual AI validates full screens for layout, presence, placement, and readability
  • Changes are highlighted automatically to reduce manual inspection
  • Evidence is captured through screenshots, diffs, timestamps, and logs
  • Debugging takes place in an environment where runs behave the same every time
  • Reusable flows and data-driven steps integrate into the same natural-language format

Instead of managing a growing volume of fragile code, teams maintain intent-level descriptions supported by deterministic execution.

What this means for oversight and compliance

For teams in financial services, healthcare, insurance, or life sciences, the benefits go beyond efficiency.

A visually grounded testing model helps confirm that required notices, disclosures, language-access elements, and other regulated UI content remain present and placed as expected. It documents what changed and preserves evidence for review. It supports consistent experiences across browsers, devices, and PDFs, though it does not check whether values, data, or regulatory text are correct.

Most importantly, it delivers predictable results.

Regulated environments depend on clarity and traceability. When every test run yields reviewable outputs, and every change is captured with context, teams can maintain confidence and release with speed.

If you’re assessing how well your testing workflow supports stability and audit readiness, request our Governance Readiness Checklist. We’ll share the version designed for your stage—whether you’re evaluating Applitools or optimizing an existing deployment.

Frequently Asked Questions

What makes AI testing viable in regulated environments?

AI testing in regulated environments must be deterministic. Generative AI can help describe test intent, but live LLM execution introduces inconsistent behavior and slow debugging. Regulated teams need predictable, repeatable runs that avoid improvisation and produce evidence they can review and defend.

How does Visual AI support oversight?

Visual AI checks the rendered UI against an approved baseline, highlighting visual drift, and capturing screenshots, diffs, and timestamps for audit review. Learn more about Visual AI.

Why is reducing test maintenance so important for regulated organizations?

Code-centric UI tests break frequently as interfaces evolve. This creates delays, slows approvals, and complicates reviews. Using intent-based authoring paired with Visual AI reduces locator churn and helps teams maintain consistent coverage with less rework. Read more about PDF change detection and baseline comparison.

Does AI testing validate regulatory correctness?

No. AI testing can detect visual drift, confirm required-element presence and placement, and preserve evidence. Validation of regulatory correctness, plan data, rates, or clinical content remains a human and organizational responsibility.

Buyer’s Checklist for Autonomous Testing in Regulated Environments
https://app14743.cloudwayssites.com/blog/buyers-checklist-autonomous-testing-regulated-industries/ | Mon, 17 Nov 2025

Regulated teams are adopting autonomous testing, but only with the right guardrails. This checklist outlines the core capabilities, governance features, and risk-based controls to look for when evaluating AI-driven testing platforms.


TL;DR

• Autonomous testing is maturing quickly, but regulated organizations must evaluate platforms through the lens of traceability, auditability, and control.
• Forrester’s Autonomous Testing Platforms Landscape, Q3 2025 shows that the real differentiators now are explainability, risk-based orchestration, and AI governance—not just automation speed.
• Use this checklist to choose a platform that accelerates delivery while protecting oversight.

Download Forrester’s full report for detailed market insights

Rethinking Autonomy for Regulated Teams

With hundreds of tools now promising “AI-driven automation,” sorting true autonomy from clever scripting has become increasingly difficult. This matters even more for regulated teams planning their 2026 quality strategy. Speed is no longer the only concern. Proof, traceability, and controlled execution are now essential.

Forrester’s recent analysis highlights a market shifting from test automation to AI-augmented and agentic systems that generate, maintain, and execute tests under human supervision. The key question for regulated buyers is not whether autonomy will help, but whether the platform provides clear governance around how that autonomy operates.

Use this checklist to evaluate solutions with the guardrails required for safety-critical or compliance-heavy environments.

Core Capabilities Every Autonomous Testing Platform Should Provide

These capabilities form the baseline for operating safely and efficiently in regulated sectors.

Plain-language test authoring and execution
Non-technical reviewers should contribute without adding risk. Natural-language authoring and guardrails make collaboration safe and auditable.

Transparent AI actions
Every generated or changed step must be reviewable. No black-box maintenance. No silent updates.

Evidence management and auditability
Exportable logs, change histories, and evidence packs should support internal and external audits without manual rework.

Role-based control and gated approvals
Automation should accelerate work, but never bypass required compliance workflows.

Adaptive, governed maintenance
Self-healing is useful only when changes are traceable and reversible. Regulated teams need adaptive maintenance under human oversight.

If a platform lacks any of these essentials, it’s not built for environments where documentation and control are mandatory.

Where Advanced Platforms Differentiate

Once the fundamentals are covered, regulated organizations should look at the capabilities that separate mature autonomous solutions from those still catching up.

Intent-based visual and experience validation
Pixel comparison is brittle. Intent-driven validation ensures the interface appears correct, accessible, and compliant across devices and browsers.

Governance dashboards
AI actions, risk coverage, and test triggers should be visible and easy to trace for auditors and managers.

Actionable analytics and reporting
Evidence should turn into insights that support risk management, release approvals, and executive reporting.

Risk-based orchestration
Platforms should prioritize tests based on business criticality, change impact, and historical issues—not just run everything in bulk.

Applying Autonomous Testing in Regulated Workflows

Organizations across healthcare, life sciences, financial services, and other regulated industries are already adopting autonomous testing—but always with governance in place.

In the pharmaceutical sector, EVERSANA INTOUCH takes a hybrid approach, combining Applitools Eyes for Visual AI validation with Applitools Autonomous for intelligent test generation. This end-to-end strategy ensures quality products, supports compliance-ready evidence, reduces maintenance, and provides end-to-end coverage across complex workflows—all while keeping human reviewers in charge. Read the EVERSANA INTOUCH case study.

These hybrid models show how autonomy can increase coverage and speed without loosening control.

Applying the Checklist to Your Evaluation Process

Use this framework when comparing platforms side by side:

  • Map your highest-risk business journeys. Focus on areas tied to compliance, customer safety, or financial impact.
  • Prioritize transparency. Ensure the platform shows why AI takes each action and allows review before changes go live.
  • Assess evidence and governance. Exportable results, audit-ready logs, and approval gates are non-negotiable.
  • Evaluate adaptability. Autonomous maintenance should reduce manual effort but still operate inside defined boundaries.
  • Reassess regularly. The market is moving fast. Capabilities that seem advanced today will become baseline expectations.

Choosing with Confidence

Autonomous testing is reaching maturity, but regulated organizations need more than speed—they need governance, visibility, and trust. Forrester’s research confirms that platforms built with explainability and risk alignment at the center are the ones best suited for compliance-driven teams.

Use Forrester’s analysis and this checklist to guide your next evaluation and choose an autonomous testing solution that accelerates both delivery and confidence. Download the Autonomous Testing Platforms Landscape, Q3 2025 report.

Frequently Asked Questions

What is an autonomous testing solution?

An autonomous testing solution uses AI to create, execute, and maintain tests automatically—continuously improving speed, coverage, and reliability.

Are autonomous testing tools safe for regulated industries?

Yes, as long as the platform provides explainable AI actions, governed maintenance, exportable evidence logs, and strict access controls. These guardrails ensure autonomy operates within compliance requirements.

How does autonomous testing support audit readiness?

Modern platforms capture evidence automatically, record AI-driven changes, and produce exportable logs that simplify internal and external audits. This reduces manual documentation effort while increasing traceability.

Can autonomous testing replace human testers?

No—it complements them. By automating maintenance and execution, it frees QA and engineering teams to focus on strategy, risk, and user experience.

When is a team ready to invest in autonomous testing?

When test maintenance slows releases or expanding coverage requires more effort than resources allow. Teams with established CI/CD pipelines gain the most immediate benefit.

What should regulated organizations look for in autonomous testing tools?

Key capabilities include transparent AI actions, controlled authoring, audit-ready evidence, risk-based test prioritization, and dashboards that show why the AI took specific actions.

Agentic Automation: Preparing QA Leaders for the Next Leap in Testing
https://app14743.cloudwayssites.com/blog/agentic-automation-ai-augmented-testing/ | Thu, 30 Oct 2025

Forrester’s Autonomous Testing Platforms Landscape (Q3 2025) identifies AI-augmented, agentic automation as the next leap in QA. Learn what it means and how to prepare.

The post Agentic Automation: Preparing QA Leaders for the Next Leap in Testing appeared first on AI-Powered End-to-End Testing | Applitools.


Update & TL;DR

This post was written while Forrester’s research on agentic and autonomous testing was still emerging. Since publication, Applitools has been included in The Forrester Wave™: Autonomous Testing Platforms, Q4 2025. The perspective outlined below reflects how this shift has since been validated and formalized by independent industry analysts.

• Agentic automation shifts testing from brittle, script-driven execution to intelligent systems that adapt based on change, risk, and context.
• AI augments human intent rather than replacing QA teams, enabling people to focus on quality strategy, governance, and risk decisions.
• This model is increasingly shaping how autonomous testing platforms are evaluated in the market.

Forrester, a leading global research and advisory firm, identified a major turning point in software testing in its Autonomous Testing Platforms Landscape, Q3 2025. The research describes a shift from traditional scripted automation to AI-augmented systems that can learn, adapt, and act under human guidance. This shift signals the rise of agentic automation: intelligent systems that create, run, and optimize tests within defined boundaries.

As delivery cycles compress and complexity grows, quality and engineering leaders are redefining what effective testing means in practice. Agentic automation bridges human intent with machine-driven precision—transforming testing from a reactive maintenance task into a proactive engine for reliability, speed, and continuous improvement.

From Automation to Intelligence

Traditional automation accelerated execution but left teams managing brittle scripts and endless maintenance. AI-augmented testing changes that dynamic. These systems:

  • Learn continuously from results and application change.
  • Adapt test scope and prioritization based on business risk.
  • Optimize coverage while maintaining human oversight.

The result is testing that behaves less like a checklist and more like a self-improving quality partner, one that scales reliability across every release.

The Three Business Values Driving This Shift

Forrester highlights three outcomes motivating investment in more intelligent testing systems:

  1. Accelerate Time to Value – AI-driven generation and self-healing shorten feedback loops and reduce maintenance.
  2. Reduce Strategic Risk – Risk-based orchestration and built-in governance connect quality metrics directly to business priorities.
  3. Democratize Testing – Low-code authoring and natural-language interaction let non-developers participate in quality, closing skill gaps.

Agentic automation brings these together: human-directed intent, machine-driven efficiency, and transparent oversight.

How AI-Augmented Systems Complement Human Expertise

AI in testing works best as augmentation, not replacement. By handling repetitive execution and maintenance, intelligent systems free QA professionals to focus on:

  • Defining risk models and quality strategy.
  • Shaping governance and compliance guardrails.
  • Collaborating earlier in the delivery process.

Agentic automation shifts QA leadership from running tests to steering quality outcomes.

The Role of Visual and Experience Validation

Intelligent automation depends on reliable validation signals. Traditional assertions can’t always capture what matters to real users: layout, accessibility, and experience consistency. 

Visual and experience validation fill that gap, giving AI-augmented systems context they can trust. When machines validate what users actually experience, teams gain both speed and confidence—without rigid pixel-level comparison.

Building Toward AI-Augmented Readiness

Forrester describes this as a maturing market: organizations are blending traditional automation with AI capabilities to move toward greater autonomy over time. QA leaders can start by:

  1. Stabilizing automation foundations and addressing flakiness.
  2. Adopting AI-assisted detection of UI and data changes.
  3. Integrating experience-level validation for richer feedback.
  4. Connecting quality analytics to business metrics for continuous improvement.

Each step builds the trust and data maturity required for agentic automation to succeed under human orchestration. As adoption increases, these maturity steps align with how leaders in the market are being evaluated on autonomous capabilities.

What QA Leaders Can Do Next

Forward-looking teams are already experimenting with:

  • Adaptive execution that prioritizes tests dynamically.
  • Governance dashboards linking coverage, risk, and compliance.
  • Visual AI that helps systems understand real user impact.

The goal isn’t full autonomy—it’s AI-augmented confidence: testing that’s faster, smarter, and more inclusive across roles. Read the full report now.

Frequently Asked Questions

What is agentic automation in software testing?

Agentic automation refers to AI-augmented systems that can learn, adapt, and act within human-defined boundaries to create, run, and optimize tests. Instead of simply executing scripts, these systems continuously improve based on feedback and business context.

How does AI-augmented testing reduce maintenance?

By using self-healing and adaptive test generation, AI-augmented testing identifies and fixes broken tests automatically. It also adjusts coverage based on application changes and risk, minimizing the need for manual upkeep.

What business benefits does agentic automation deliver?

The Forrester research identifies three key outcomes: faster time to value through automation and learning; reduced strategic risk through governance and risk-based prioritization; and democratized testing through natural-language and low-code interfaces.

How do human testers fit into agentic automation?

AI systems handle repetitive execution and maintenance so human experts can focus on strategy—defining risk models, shaping governance, and collaborating earlier in the delivery process. This partnership amplifies QA’s influence across engineering.

Why is visual and experience validation essential for intelligent testing?

Visual and experience validation let AI systems measure what users actually see and feel—not just code-level outputs. This gives machine-driven tests the contextual awareness to evaluate accessibility, layout, and experience consistency accurately.

The post Agentic Automation: Preparing QA Leaders for the Next Leap in Testing appeared first on AI-Powered End-to-End Testing | Applitools.

Test Maintenance at Scale: How Visual AI Cuts Review Time and Flakiness https://app14743.cloudwayssites.com/blog/test-maintenance-at-scale-visual-ai/ Tue, 21 Oct 2025 20:22:00 +0000 https://app14743.cloudwayssites.com/?p=61615 Reduce flakiness, speed up reviews, and see how teams like Peloton cut maintenance time by 78% using Visual AI.

The post Test Maintenance at Scale: How Visual AI Cuts Review Time and Flakiness appeared first on AI-Powered End-to-End Testing | Applitools.

smarter test maintenance at scale

Why Test Maintenance Breaks at Scale

Test maintenance at scale slows releases. Teams that rely on coded assertions spend more time updating tests than improving coverage. Brittle locators, environment drift, and false positives all add up—turning automation itself into a maintenance burden.

Neglecting maintenance is like skipping car care: small issues snowball into costly downtime. A smarter approach replaces manual review and locator-based scripts with automated, visual validation that adapts as your UI evolves.

How Visual AI Delivers Test Maintenance at Scale

Visual AI replaces dozens of coded assertions with a single checkpoint that mimics how humans see. It validates full UI states, detecting layout shifts, missing elements, and text overlaps automatically.

By consolidating validations into one Visual AI check, teams cut review time, reduce false positives, and gain faster feedback cycles.
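To make this concrete, here is a minimal sketch of what a single checkpoint can look like in a Playwright test. It assumes the classic Eyes/Target API exposed by the @applitools/eyes-playwright SDK; the app name, test name, and URL are placeholders, not prescriptions.

  // A hedged sketch: one Visual AI checkpoint in place of many coded assertions.
  // Assumes the Eyes and Target exports from @applitools/eyes-playwright.
  import { test } from '@playwright/test';
  import { Eyes, Target } from '@applitools/eyes-playwright';

  test('dashboard renders correctly', async ({ page }) => {
    const eyes = new Eyes();
    await eyes.open(page, 'My App', 'Dashboard visual check');

    await page.goto('https://example.com/dashboard');

    // One checkpoint validates the full UI state: layout shifts, missing elements, text overlaps.
    await eyes.check('Dashboard', Target.window().fully());

    await eyes.close();
  });

The single check stands in for the stack of element-by-element assertions you would otherwise write and maintain by hand.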

Scale Reviews with Ultrafast Grid and Grouping

Running tests one browser at a time no longer scales. The Applitools Ultrafast Grid executes a single test once, then validates results across every browser and device combination in parallel.

Batching and grouping features make reviews equally efficient—approve or reject similar changes across entire runs in just a few clicks.

How it works

  • Replace assertions with one visual checkpoint
  • Run once across all browsers and devices
  • Batch results for unified review
  • Approve or reject in bulk
  • Tune match levels for dynamic content

Together, these capabilities eliminate redundant effort and make large-scale testing faster to maintain.
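As a rough illustration of the run-once, render-everywhere model, the sketch below wires an Eyes test to the Ultrafast Grid. It assumes @applitools/eyes-playwright exposes the same runner and configuration classes (VisualGridRunner, Configuration, BrowserType, DeviceName) as the other Applitools JS SDKs; the browser matrix and names are placeholders.

  // A hedged sketch of run-once, render-everywhere with the Ultrafast Grid.
  import { test } from '@playwright/test';
  import {
    Eyes,
    Target,
    VisualGridRunner,
    Configuration,
    BrowserType,
    DeviceName,
  } from '@applitools/eyes-playwright';

  // One runner shared across tests; concurrency controls parallel renders in the Grid.
  const runner = new VisualGridRunner({ testConcurrency: 5 });

  test('checkout renders across browsers and devices', async ({ page }) => {
    const eyes = new Eyes(runner);

    const config = new Configuration();
    config.addBrowser(1200, 800, BrowserType.CHROME);
    config.addBrowser(1200, 800, BrowserType.FIREFOX);
    config.addBrowser(1200, 800, BrowserType.SAFARI);
    config.addDeviceEmulation(DeviceName.iPhone_X); // mobile viewport rendered by the Grid
    eyes.setConfiguration(config);

    await eyes.open(page, 'My App', 'Checkout visual check');
    await page.goto('https://example.com/checkout');

    // The test executes once locally; the Grid validates every target in parallel.
    await eyes.check('Checkout', Target.window().fully());
    await eyes.close();
  });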

Customer Results: 78% Less Maintenance

Teams that adopt this approach see measurable ROI. At Peloton, replacing a legacy visual testing tool with Applitools Visual AI produced a 78% reduction in maintenance time and saved about 130 hours per month.

With dynamic leaderboards, live data, and responsive layouts across web and native mobile, Peloton maintains quality at scale without expanding test overhead.

Three Features That Change Maintenance

“Ultrafast Grid, Visual AI match levels, and bulk grouping—those three change the game.”

Mike Millgate, Smarter Test Maintenance at Scale

These three features deliver flexible validation, fast execution, and effortless maintenance. Each removes manual steps and accelerates the feedback loop that keeps releases reliable.

Smarter Maintenance for Modern Teams

Smarter test maintenance isn’t about writing more code—it’s about automating intelligently. Visual AI reduces flakiness, speeds reviews, and scales across devices and environments.

To see what’s next, explore Applitools Eyes 10.22, featuring faster review cycles, new Storybook and Figma integrations, and even shorter feedback loops for test maintenance at scale.

Frequently Asked Questions

What is Visual AI testing?

Visual AI uses automated visual assertions to validate full UI states, catching layout and content changes that code-heavy checks miss.

How does Visual AI reduce test maintenance at scale?

One visual checkpoint replaces dozens of brittle assertions, while batching and grouping speed reviews across browsers and devices.

What’s the difference between Visual AI and visual regression testing?

Visual AI applies learned match levels and region logic to reduce false positives and handle dynamic content; classic visual diffing is more brittle. Learn more about Visual AI.

How do match levels help with dynamic content?

Layout, text, and color match levels tune sensitivity so teams can ignore cosmetic shifts while catching meaningful UI regressions.
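For example, here is a hedged sketch of both approaches, assuming the fluent check-settings API in @applitools/eyes-playwright; the region-scoped method name may vary by SDK version, and the selector is a placeholder.

  import { test } from '@playwright/test';
  import { Eyes, Target } from '@applitools/eyes-playwright';

  test('leaderboard tolerates live data churn', async ({ page }) => {
    const eyes = new Eyes();
    await eyes.open(page, 'My App', 'Leaderboard visual check');
    await page.goto('https://example.com/leaderboard');

    // Layout match level validates structure while ignoring changing text and numbers.
    await eyes.check('Leaderboard', Target.window().fully().layout());

    // Or stay strict overall and relax only the dynamic region
    // (layoutRegions is the fluent name in recent JS SDKs; confirm for your version).
    await eyes.check(
      'Dashboard',
      Target.window().fully().layoutRegions('#live-activity-feed')
    );

    await eyes.close();
  });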

Does Visual AI work with my framework (Selenium, Cypress, Playwright)?

Yes—Applitools offers drop-in SDKs that let you run your existing tests and add a single Visual AI checkpoint. Learn how to quickly integrate Applitools into your current tech stack.

The post Test Maintenance at Scale: How Visual AI Cuts Review Time and Flakiness appeared first on AI-Powered End-to-End Testing | Applitools.

Test Your Components Where You Build with the Applitools Storybook Addon https://app14743.cloudwayssites.com/blog/test-your-components-where-you-build-with-the-applitools-storybook-addon/ Fri, 17 Oct 2025 11:07:00 +0000 https://app14743.cloudwayssites.com/?p=61542 Test Storybook components with Visual AI inside Storybook. Catch UI bugs early, bulk-maintain baselines, and scale cross-browser coverage.

The post Test Your Components Where You Build with the Applitools Storybook Addon appeared first on AI-Powered End-to-End Testing | Applitools.


Local dev is where most UI changes happen (and where regressions sneak in). States drift, styles diverge, and tiny tweaks pile up until something breaks in CI. The Applitools Storybook Addon brings AI-powered visual testing straight into Storybook so you can catch issues as you code, approve the good changes quickly, and keep your CI/CD pipelines green.

AI-Powered Visual Testing Inside Storybook

Open your Storybook and run visual tests from an Applitools Eyes tab – no context switching. Results are grouped by component to mirror your Storybook structure, and a reporter widget highlights what needs attention first so you can review diffs in minutes, not hours. Learn more on our Storybook Component Testing with Applitools page.

  • Catch bugs where you build. Validate component states during local development and avoid surprises later.
  • Review faster with Visual AI. See only meaningful, human-perceptible UI changes without pixel-to-pixel noise. Tune sensitivity with AI match levels when you need to.
  • Scale coverage painlessly. Run once; render everywhere with Ultrafast Grid across browsers, devices, and viewports in parallel.

How to Use the Applitools Eyes Storybook Addon

Getting started takes just a couple of minutes.

  1. Install the SDK & Addon
    Add Applitools Eyes to your project and enable the Storybook addon (React, Vue, Angular supported). See the installation instructions in the Eyes Storybook Addon docs.
  2. Run Applitools Visual Tests in Storybook
    Open Storybook, switch to the Applitools Eyes tab, and trigger tests for a single story or an entire component. Results stream back in real time with automatic grouping by component.
  3. Review & Maintain
    Use Visual AI diffs, side-by-side views, and auto-maintenance to approve or reject changes in bulk. Prioritized sorting surfaces what needs attention first.
  4. Scale Across Browsers/Devices
    Turn on Ultrafast Grid to parallelize renders across Chrome, Firefox, Safari, Edge, and mobile sizes – without extra local setup.

Applitools Storybook Addon Use Case Playbook

Below are the three most common ways teams use the Eyes Storybook Addon, each with a quick, practical flow pulled right from the product.

Use Case: Guard Your Design System

As you refactor tokens or update themes, run visual tests on every component state. Spot unintended changes across the library instantly.

How to do it in Storybook

  1. Start Storybook and open your design‑system component in the Applitools Eyes tab.
  2. Click Run from the tab (or use Run in the left sidebar test module). The addon tests the stories and streams results inline for every browser/device in your applitools.config.js.
  3. In the sidebar, filter by Unresolved to zero in on changes across the library (Green = passed, Orange = unresolved, Red = failed).
  4. Open a story’s result and use Side‑by‑Side or the Slider to spot subtle spacing/typography diffs.
  5. Approve legit theme updates with Thumbs Up (or use ⋯ → Review actions to approve the whole story/batch). Reject regressions with Thumbs Down and fix.

Pro tip: Use the tab ⋯ → Configuration to confirm you’re validating the right browser matrix and server URL. See more options in the docs.
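If you prefer to pin that matrix in code, a minimal applitools.config.js sketch looks roughly like this; the keys shown (testConcurrency, browser, serverUrl) follow the Eyes Storybook configuration format, but verify the exact option names against the docs for your addon version.

  // A hedged applitools.config.js sketch for the Eyes Storybook integration.
  module.exports = {
    // Parallel renders in the Ultrafast Grid
    testConcurrency: 5,

    // The browser/device matrix every story is validated against
    browser: [
      { width: 1200, height: 800, name: 'chrome' },
      { width: 1200, height: 800, name: 'firefox' },
      { width: 1200, height: 800, name: 'safari' },
      { width: 1200, height: 800, name: 'edgechromium' },
      { deviceName: 'iPhone X' },
    ],

    // Optional: point the addon at a dedicated Eyes server
    // serverUrl: 'https://eyes.applitools.com',
  };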

Use Case: Fix Fast During Local Dev

Working on a feature branch? Validate your component in Storybook before you commit.

How to do it in Storybook

  1. Open your feature’s stories, then hit Run in the Applitools tab for the component you’re touching.
  2. Watch statuses update inline; click the status buttons to filter to Unresolved so you only look at what changed.
  3. Click into any row to open compare tools: Diff Image, Actual Image, Expected Image, Side‑by‑Side, or Slider.
  4. If the change is intended, Thumbs Up to approve; otherwise Thumbs Down to flag and keep iterating.
  5. When you’re happy locally, push your branch. You can scale the same setup in CI using your existing Storybook build/preview URL.

Heads‑up: To view baselines or approve/reject, sign in to your Applitools account in the same browser that’s running Storybook (you’ll be prompted if not).

Use Case: Ship Multi‑Browser Confidence

One click, many targets. Validate layout and responsive behavior across browsers and viewports – early.

How to do it in Storybook

  1. In ⋯ → Configuration, verify your browsers/devices list (Chrome, Firefox, Safari, Edge; add viewports you care about).
  2. Hit Run for representative stories (states, theming, interactive). Results come back grouped by each browser/device so differences are obvious.
  3. Filter the sidebar by Unresolved and scan. Use Side‑by‑Side or Slider to compare layout at different sizes.
  4. Approve good changes in bulk (⋯ → Review actions) to keep maintenance low.
  5. For broader coverage, run the same setup in CI and expand the matrix.

Why Visual AI > Pixel Diffs for Storybook

Pixel-to-pixel tools are fragile with dynamic content and minor rendering differences. Applitools Visual AI mimics human vision to highlight only meaningful UI changes (structure, layout, content) while ignoring the noise. You can still dial sensitivity up or down with match levels whenever needed. Less flake, more signal.

Try AI-Powered Visual Testing in Storybook Today

Run your first component tests in minutes, review diffs right in Storybook, and expand coverage with Ultrafast Grid – without slowing delivery.

Frequently Asked Questions

What does the Applitools Storybook Addon do?

It runs Applitools visual tests from inside Storybook. You can trigger tests per story or component, then review results and diffs inline with automatic grouping that mirrors your Storybook tree.

Do I need to write tests with the Applitools Storybook Addon?

With the Applitools Storybook Addon, your existing stories become the tests.

How is the Applitools Storybook Addon different from Chromatic visual tests?

Applitools’ Visual AI detects significant visual differences rather than relying only on pixel-to-pixel comparisons. This means you see fewer false positives and spend less time on maintenance.

Applitools also lets you auto-maintain hundreds of tests at once (when you do need to perform test maintenance), run them across multiple browsers and devices instantly, and manage everything in the same platform that’s also running your Playwright and Cypress end-to-end test flows. See our Applitools vs. Chromatic comparison page for a deeper breakdown.

What about performance and CI stability?

Validate locally in Storybook to prevent CI failures. When you’re ready, run the same tests in CI and render broadly with Ultrafast Grid – fast and consistent.

Do I need an Applitools account to use the Storybook Addon?

Yes. You’ll need an active Applitools Eyes account and an API key to use the Applitools Storybook Addon.

The post Test Your Components Where You Build with the Applitools Storybook Addon appeared first on AI-Powered End-to-End Testing | Applitools.

Validate Your Figma Designs Before Code Ships with the Applitools Eyes Plugin https://app14743.cloudwayssites.com/blog/figma-design-testing-applitools-plugin/ Mon, 13 Oct 2025 22:00:00 +0000 https://app14743.cloudwayssites.com/?p=61531 Use the Applitools Eyes Figma Plugin to test and compare designs against your live app. Catch visual changes early to confirm UI accuracy.

The post Validate Your Figma Designs Before Code Ships with the Applitools Eyes Plugin appeared first on AI-Powered End-to-End Testing | Applitools.

Applitools Eyes Figma plugin on top of a blurry Figma frame

Even the best design systems can fall short when a layout moves from Figma to code. Fonts shift, buttons resize, and colors look a little off. These small issues result in visual drift and long review cycles between design, development, and QA.

Figma design testing with Applitools Eyes closes that gap. Export Figma frames directly to Eyes to compare what you designed with what you built using the same visual testing tools your QA teams already trust.

Design-to-Code Testing in One Place

The plugin lets you send Figma frames, including individual components, pages, or entire prototypes, straight into Applitools Eyes. Each exported frame becomes a visual baseline, the same kind used in automated tests.

Developers can run their regular visual tests against these baselines to confirm that what they’ve built matches the approved design. Meanwhile, designers can export each new version of a design to see what changed between iterations. Everyone reviews results in the same Eyes dashboard, where visual differences appear side by side.

This shared view reduces guesswork and keeps teams aligned around what “correct” actually looks like.

How to Use the Applitools Eyes Figma Plugin

Getting started takes just a couple of minutes.

1. Install the Plugin

Open the plugin from the Figma Store, or open the Figma desktop app and select Plugins → Manage Plugins → Search “Applitools Eyes” → Install.

Applitools Eyes Figma plugin on top of a blurry Figma frame

2. Connect Your Applitools Account

Launch the plugin and enter your Applitools API key and server URL (default: https://eyes.applitools.com). These settings are saved for future use.

3. Select Figma Frames to Export

You can export a single frame, multiple frames, or a full design. The plugin automatically names them based on your Figma file, or you can customize names with dynamic parameters like {figma_filename}, {figma_page}, or {figma_frame}.

4. Adjust Settings

Optional configurations include:

  • Match level: strict, dynamic, layout, ignore colors, exact, or none
  • Contrast level: accessibility comparison thresholds
  • Auto-accept baselines: mark first exports as approved
  • And more…

5. Export and Review

Lastly, click Export to Eyes to send your selections to Applitools. Frames appear in the Eyes dashboard under the “Figma” environment. Designers and Devs can view differences directly and decide whether to accept or reject them.

Figma plugin overlaid on a screenshot of Applitools Eyes comparing a Figma frame and a visual test in Chrome

Three Use Cases for QA Teams

1. Design-to-Implementation Validation

Once designs are uploaded, developers can link automated tests to the same baseline using the “baseline environment name” provided by the plugin. When they run their tests, Eyes compares the live UI against the design reference.

Result: Teams catch spacing, text, or layout differences before they reach production.
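As a hedged sketch of how that link might look in code, the example below sets the baseline environment name on the Eyes configuration before running a normal Playwright check. The environment name string is a placeholder for the value the plugin reports after export, and the setBaselineEnvName call follows the Applitools JS SDK configuration API.

  import { test } from '@playwright/test';
  import { Eyes, Target, Configuration } from '@applitools/eyes-playwright';

  test('implemented checkout matches the approved design', async ({ page }) => {
    const eyes = new Eyes();

    const config = new Configuration();
    // Placeholder: use the baseline environment name the Figma plugin gives you
    config.setBaselineEnvName('Figma - Checkout v2');
    eyes.setConfiguration(config);

    await eyes.open(page, 'My App', 'Checkout design-to-code check');
    await page.goto('https://example.com/checkout');

    // Eyes compares the live UI against the Figma-exported baseline
    await eyes.check('Checkout', Target.window().fully());
    await eyes.close();
  });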

2. Design-to-Design Version Comparison

Designers often revisit earlier layouts or explore small variations. Exporting both versions to Eyes highlights the exact visual differences, making it easy to review and choose the preferred version.

Result: Faster review cycles and fewer overlooked design changes.

3. Shared Visual Baselines for Collaboration

Designers, developers, and QA teams can all access the same Eyes dashboard. Instead of passing screenshots or notes, they can comment on the same visual checkpoints.

Result: Clearer handoffs and fewer miscommunications between design and engineering.

Why Visual Testing from Design to Code Matters

Designs are often reviewed visually, while code is tested functionally. However, the Figma plugin connects these two disciplines by giving both teams a consistent, visual source of truth.

For designers, it’s a way to confirm that their layouts are faithfully implemented without manually comparing screenshots. The plugin provides a reference that removes ambiguity about spacing, colors, or typography for developers. For QA teams, it introduces an additional layer of confidence that each release matches approved specifications.

This integration fits naturally into existing workflows: designs are exported once, developers test as usual, and visual checks happen automatically. What was once a manual review step becomes part of the team’s regular quality process.

Try Design-to-Code Testing for Yourself

The Applitools Eyes Figma Plugin brings visual testing into the design process, helping teams maintain consistency from mockup to release. It’s a straightforward way for design and development to share one accurate reference for how an interface should look in order to reduce manual review and give everyone confidence that what’s shipped matches what was designed.

Install the Applitools Eyes Figma Plugin and start validating your designs before code ships.

Frequently Asked Questions

What is the Applitools Eyes Figma Plugin?

The Applitools Eyes Figma Plugin lets you export frames from Figma into Applitools Eyes for visual testing. It helps teams compare their designs against live implementations or across design versions, ensuring the final product matches what was originally designed.

Why should I use the Applitools Eyes Figma Plugin?

The main reasons teams like using the plugin include:
– Detecting visual differences early in development
– Maintaining design consistency from mockup to production
– Reducing manual screenshot comparisons
– Providing a shared visual reference for design, QA, and development teams

How does Figma design testing work with Applitools?

Figma design testing with Applitools works by turning design frames into visual baselines inside the Eyes dashboard. Developers then run automated tests that capture the built UI and compare it to those baselines, highlighting any visual differences between design and implementation.

Can I compare two Figma designs using the plugin?

Yes. You can export two or more design versions to Applitools Eyes and compare them visually. The dashboard highlights differences such as layout changes, spacing updates, or color tweaks, making it easier to review design revisions before sign-off.

Do I need an Applitools account to use the Figma Plugin?

Yes. You’ll need an active Applitools Eyes account and an API key to export Figma frames to Eyes. Once connected, you can reuse your credentials for future exports.

The post Validate Your Figma Designs Before Code Ships with the Applitools Eyes Plugin appeared first on AI-Powered End-to-End Testing | Applitools.

Test Where You Build with Eyes 10.22: Visual AI for Storybook & Figma https://app14743.cloudwayssites.com/blog/visual-testing-for-storybook-and-figma/ Thu, 09 Oct 2025 17:53:32 +0000 https://app14743.cloudwayssites.com/?p=61397 Applitools Eyes 10.22 brings Visual AI testing directly into Storybook and Figma—helping teams test where they build, align design and development, and release with greater speed and confidence.

The post Test Where You Build with Eyes 10.22: Visual AI for Storybook & Figma appeared first on AI-Powered End-to-End Testing | Applitools.

Applitools Eyes Figma plugin on top of a blurry Figma frame

The new Applitools Eyes 10.22 release brings Visual AI testing directly to the tools teams already use to design and develop digital experiences. This update strengthens how teams build, validate, and release with three major enhancements:

  • Scale quality without slowing down delivery using the new Storybook Addon.
  • Align development and design with the new Figma Plugin, bridging intent and implementation.
  • Improve traceability and accountability with smarter Dashboard Optimizations for QA and leadership.

Together, these updates make visual testing faster, sharper, and more connected across the product pipeline, helping teams fully embrace shift-left testing and collaborate around a shared baseline for visual quality.

Scale Quality Without Slowing Down Delivery: Storybook Addon

Test Where You Build with the Storybook Addon

The new Storybook Addon for Applitools Eyes brings storybook visual testing directly into the development workflow. Developers can now validate UI components where they build them—no CI/CD wait times, no switching between tools.

Core Capabilities

  • Run and review visual tests directly inside your component workspace.
  • Approve or reject diffs inline and validate changes locally before merging.
  • Leverage higher concurrency for faster feedback.
  • Catch visual issues early and prevent CI failures that slow releases.

Use Case Example:
A frontend developer refactoring a shared button component runs Eyes tests directly in Storybook to verify that the update didn’t alter its appearance across the design system. By catching issues immediately, they avoid regressions on the main branch and prevent unnecessary rework later in the release cycle.

The Applitools Eyes Storybook Addon allows teams to scale quality by expanding coverage while maintaining release velocity. By embedding visual validation where developers work, organizations can expand test coverage and reduce release timelines without compromising accuracy.

See the newest features in action during the Platform Pulse webinar, coming October 22.

Align Development and Design: Figma Plugin

Bridge Design & Code with the Figma Plugin

The new Applitools Figma Plugin connects design intent with implementation through seamless Figma design testing and design-to-code comparison. Designers and developers can now validate visual consistency earlier—and with far less manual effort.

What You Can Do

  • Export Figma frames directly into Eyes for automated design-to-code validation.
  • Compare design-to-design versions to manage iteration cycles.
  • Maintain shared baselines so designers and developers work from the same source of truth.
  • Designers can validate implementations without writing code, while developers see exactly what “done” looks like.

Use Case Example:
A designer exports a component from Figma and compares it with the implemented version in Eyes to confirm that spacing, colors, and typography match the original design. Any differences appear immediately, allowing the designer and developer to resolve them early instead of during QA or after release.

This plugin bridges design and development, reducing handoff friction and review cycles. Teams gain a single source of truth for UI quality and deliver products that match design intent from concept to code.

Improve Traceability and Accountability: Dashboard Optimizations

Context-Rich Results with Dashboard Optimizations

Eyes 10.22 introduces dashboard enhancements that bring clarity and accountability to every test result. These updates help QA engineers, SDETs, and technical leaders focus on what matters most.

Key Highlights

  • Commit SHA and branch info appear directly in batch details for instant traceability.
  • “Required Attention First” sorting automatically surfaces unresolved and failed tests.
  • Auto-grouping mirrors your Storybook structure, making results easier to navigate.
  • Streamlined triage reduces noise from passed tests and accelerates review.

Use Case Example:
A QA engineer reviewing test results spots a visual regression and uses the Eyes dashboard to trace it back to the specific commit that introduced the change. With commit and branch details in context, the engineer can alert the right developer and close the feedback loop quickly.

These context-rich results strengthen accountability and compliance across teams. Organizations gain the visibility they need for auditability—particularly valuable for industries with governance or regulatory requirements.

Explore all the updates and improvements in the Eyes 10.22 Release Notes.

Broader Strategic Impact

Eyes 10.22 delivers improvements that go beyond new capabilities. Together, these updates create a foundation for stronger collaboration and faster releases across the organization:

  • Align Development & Design Around a Single Source of Truth – With the Storybook addon and Figma plugin, teams collaborate around shared visual baselines.
  • Scale Quality Without Slowing Delivery – Native testing inside development tools ensures visual coverage grows with velocity.
  • Improve Traceability & Accountability – Git-linked context and prioritized review queues make it easier to understand, resolve, and communicate changes.
  • Drive Shift-Left Adoption Across the Organization – With validation embedded where work happens, teams can catch issues earlier and release with confidence.

Quality That Scales: Visual AI for Every Team

With Applitools Eyes 10.22, teams can test where they build—bringing Visual AI testing directly into their design and development workflows. By embedding validation inside Storybook, Figma, and the Eyes dashboard, organizations can scale software quality without adding maintenance or slowing delivery. Developers, designers, and QA now share one intelligent workflow powered by Visual AI—delivering expanded test coverage, faster feedback, and less maintenance across every stage of the testing lifecycle. Learn more about Applitools Eyes.

The post Test Where You Build with Eyes 10.22: Visual AI for Storybook & Figma appeared first on AI-Powered End-to-End Testing | Applitools.
