Engineering a Playwright-Native Developer Experience: One Flag, Three Strategies
Thu, 19 Mar 2026

Visual testing in Playwright often forces teams to choose between strict failures, snapshot maintenance, and CI pipeline complexity. This article explores how a single configuration flag introduces three different strategies for handling visual differences and improving the Playwright developer experience.


Hello everyone! I’m Noam, an SDK developer on the Applitools JS-SDKs team. While my day-to-day focus is on core engineering, I work closely with our field teams and occasionally join technical deep-dive sessions with customers.

In these conversations, we frequently encounter questions about performance and the engineering philosophy behind our integration. Specifically, there is often curiosity about how to make visual testing feel more “Playwright-native” and natural to developers.

In this post, I’ll share the design logic behind these architectural choices so you can apply these patterns in your own CI pipelines in a way that fits your organization’s needs.

Adding unresolved to Playwright

Integrating visual regression testing into Playwright requires combining two different status models: Playwright’s binary Pass/Fail and the visual testing concept of unresolved.

In visual testing, instead of having two (passed and failed) states, there’s an additional third state: unresolved. This state indicates a difference was detected, but a human decision is required to determine if it is a bug or a valid change that should be approved as a new baseline.
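The status-mapping problem can be sketched in a few lines of TypeScript. This is purely illustrative: the type and helper below are not the SDK's API, they just model what happens when three visual states must collapse into Playwright's binary verdict.

```typescript
// Illustrative sketch, not the fixture's actual API.
type VisualStatus = 'passed' | 'failed' | 'unresolved';

// Collapsing three visual states into a binary verdict loses information;
// something has to decide what 'unresolved' means for this run.
function toPlaywrightVerdict(
  status: VisualStatus,
  treatUnresolvedAsFailure: boolean,
): 'pass' | 'fail' {
  if (status === 'passed') return 'pass';
  if (status === 'failed') return 'fail';
  // 'unresolved' needs a human decision; the flag picks the interim verdict.
  return treatUnresolvedAsFailure ? 'fail' : 'pass';
}
```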

Playwright doesn’t support this third state out of the box. Visual test maintenance using Playwright’s native toHaveScreenshot API forces the developer into a cumbersome cycle requiring three separate test executions:

  1. First, the developer runs the test and watches it fail.
  2. Then, they run again with the --update-snapshots flag to create new baseline images.
  3. Finally, most developers run a third time to validate that everything works with the updated baseline as expected, which isn’t always the case, because Playwright’s native comparison method (pixelmatch) tends to be flaky, unlike Visual AI.

After this local cycle, the developer must commit the new baseline images to the repository—bloating the git history—and wait for a new CI execution to provide final feedback. For dev-centered organizations that focus on feedback loop velocity, this workflow is… suboptimal. Personally, I believe that’s one of the reasons visual testing isn’t as popular as it should be among Playwright users.

When we engineered the Applitools fixture, one of our goals was to support this Unresolved state natively, without disrupting Playwright’s core lifecycle—specifically its Worker Processes and Retry mechanisms.

The solution rests on two key engineering decisions: moving rendering to the background (async architecture) and giving developers control over the exit signal and performance tradeoffs (failTestsOnDiff).

We don’t block test execution when Applitools is rendering

The core value of visual testing lies in AI-based comparison to eliminate false positives and multi-platform rendering.

Architecturally, these processes are cloud-native services.

  • AI-as-a-Service: Just like massive LLMs or other generative models, the Visual AI engine runs on specialized cloud infrastructure optimized for heavy inference. It cannot simply be “installed” on a lightweight CI agent.
  • Platform Constraints: Authentic cross-platform rendering (e.g., iOS Safari on a Linux CI agent) is physically impossible on a single local machine.

Since these operations inherently occur remotely, performing them synchronously would force the local test runner to idle while waiting for network round-trips and cloud processing.

To solve this, we designed the fixture around an asynchronous architecture:

  • Instant Capture: When eyes.check() is called, we synchronously capture the DOM and CSS resources (instead of a rasterized image). This operation is extremely fast.
  • Immediate Release: We purposefully use soft assertions by design. We release the Playwright test thread immediately so the functional logic can proceed to the next step or test case without blocking.
  • Background Heavy Lifting: The heavy work—uploading assets, rendering across different browsers and operating systems, and performing the AI comparison in the Applitools cloud—starts immediately in the background, managed by the Worker process.
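The three steps above amount to a fire-and-forget pattern: capture synchronously, track the background work, and await it only at the end. A minimal sketch, with hypothetical check/drainQueue helpers standing in for the real SDK internals:

```typescript
// Hypothetical helpers that model the pattern, not the real SDK internals.
const pending: Promise<string>[] = [];

function check(name: string): void {
  // Capture is synchronous and cheap; the heavy rendering/comparison work
  // is kicked off in the background and only tracked, never awaited here.
  const result = new Promise<string>((resolve) =>
    setTimeout(() => resolve(`${name}: compared`), 10),
  );
  pending.push(result);
}

async function drainQueue(): Promise<string[]> {
  // Runs once at the end of the worker: it stays alive only to await
  // whatever background work is still outstanding.
  return Promise.all(pending);
}
```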

The “Draining Queue” Effect

This architecture explains why the Playwright Worker sometimes remains active after the final test completes.

The background tasks are limited only by your account’s concurrency settings and by the screenshot size. For example, when rendering a 10,000 px page on a small mobile device, the rendering infrastructure might need time for scrolling and stitching. If your functional tests execute faster than the background workers can process the queue (rendering and comparing), the Worker process stays alive at the end solely to “drain the queue” and ensure data integrity.

While this ensures your test logic runs at maximum speed by offloading the processing cost to the background, it can cause friction and frustration when developers see workers “hanging” after the tests have completed. When facing such issues, our support team is here to advise and assist: we can investigate execution logs and, if needed, make custom suggestions to tailor Eyes-Playwright to your needs.

Solving the Matrix Problem

Standard Playwright documentation recommends defining multiple projects in playwright.config.ts to cover different browsers (Chromium, Firefox, WebKit) and various viewport sizes.

While this ensures coverage, it introduces a linear performance penalty (O(N)). To test three browsers across two viewports, your CI must execute the functional logic (clicks, waits, navigation) six times. It’s 6x more load on the CI machine and the testing environment.

We recommend shifting this workload to the Ultrafast Grid (UFG).

In this mode, you execute the Playwright test once, typically on Chromium. We upload the DOM state, and our cloud infrastructure renders that state across all configured browsers and viewports in parallel.

This transforms an O(N) execution problem into an O(1) execution problem, significantly shortening the feedback loop.
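The arithmetic is simple but worth making explicit (illustrative helpers, not an Applitools API):

```typescript
// Back-of-the-envelope math for the matrix problem.
function nativeExecutions(browsers: number, viewports: number): number {
  // Every cell of the browser × viewport matrix re-runs the functional logic.
  return browsers * viewports;
}

function ufgExecutions(): number {
  // Run the functional logic once; rendering fans out across the matrix in the cloud.
  return 1;
}
```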

The Strategy: failTestsOnDiff

Since the actual comparison happens asynchronously and potentially completes after the test logic finishes, we need a mechanism to map the visual result back to the Playwright status.

This is controlled by the failTestsOnDiff flag. It’s not just a boolean; it’s a strategic choice for your CI pipeline.

Strategy A: failTestsOnDiff: false (PR-based review)

  • The Logic: This is the configuration our own Front-End team uses. We believe that Visual Change ≠ Test Failure.
  • Behavior: The Playwright test passes (Green). The unresolved status is reported externally via our SCM integration (GitHub/GitLab).
  • Why: Retrying a visual test is computationally wasteful: the pixels won’t change on the second run. By keeping the test “Green,” we avoid triggering Playwright’s retry mechanism. The decision is moved to the Pull Request, where it belongs.

Read more about SCM integration, or hop directly to our GitHub, Bitbucket, GitLab, or Azure DevOps articles.

Strategy B: failTestsOnDiff: 'afterAll' (suite-level gate)

  • The Logic: You need a “Red” pipeline to block deployment, but you want to avoid the noise of retries and gain a significant performance improvement.
  • Behavior: Individual tests pass, but the Worker process exits with a failure code if any diffs were found in the suite.
  • Why: This provides a hard gatekeeper for the build status. It allows the Eyes rendering farms to continue processing visual test results in the background without blocking the execution thread, so the worker can move on to handle more tests efficiently.

Strategy C: failTestsOnDiff: 'afterEach' (immediate feedback)

  • The Logic: Immediate feedback loop.
  • Behavior: Fails the test immediately in the afterEach hook.
  • Why: Best for local development, where you want to see the failure immediately in the console. It is also useful with Playwright’s trace: 'retain-on-failure' setting, as it ensures traces are preserved for unresolved visual assertions. Not recommended for CI due to the retry loops described above.
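The three strategies can be summarized as a type sketch. Only the three flag values mirror the real configuration; the helper itself is hypothetical:

```typescript
// The three legal flag values; the helper is illustrative only.
type FailTestsOnDiff = 'afterEach' | 'afterAll' | false;

function whereDiffsSurface(mode: FailTestsOnDiff): string {
  switch (mode) {
    case 'afterEach':
      return 'test'; // the individual test fails immediately
    case 'afterAll':
      return 'worker'; // tests stay green; the worker exits non-zero
    case false:
      return 'pull-request'; // everything green; SCM integration reports the status
  }
}
```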

TL;DR – When to use each setting

afterEach
  • Performance: Less performant. The Playwright worker waits after each test for all renders to complete and for the Visual AI to compare the results.
  • Observability: Best. The Applitools reporter shows all statuses correctly; other reporters will consider unresolved tests as failing.
  • Best fit: Local testing.

afterAll
  • Performance: Best. The Playwright workers collect the resources and manage the rendering and Visual AI comparisons in the background.
  • Observability: Good. The Applitools reporter shows all statuses correctly; other reporters will consider unresolved tests as passing. You will get a failure of the worker process, which other reporters won’t link to a specific test case.
  • Best fit: Local testing, and CI environments without SCM integration.

false
  • Performance: Best. Similar to afterAll.
  • Observability: Great in pull requests (if SCM integration is enabled). The Applitools reporter will reflect the tests perfectly; other reporters will consider unresolved tests as passing.
  • Best fit: CI environments with SCM integration.

Closing the Visibility Gap: The Custom Reporter

If you adopt Strategy A (false) or Strategy B (afterAll), you introduce a secondary challenge: visibility.

Since Playwright technically marks these tests as Passed to avoid retries, the standard Playwright HTML Report will show them as “Green,” potentially masking unresolved visual differences that require attention.

To bridge this gap without forcing developers to switch context, we developed a Custom Applitools Reporter.

This reporter extends the standard Playwright HTML report. It injects the actual visual status (Passed, Failed, or Unresolved) directly into the test results view.

  • True Status: You see which tests have visual diffs, even if the Playwright exit code was successful.
  • Direct Links: It provides a direct link from the test report to the specific batch results in the Applitools Dashboard.
  • Context: It enriches the report with UFG render status and batch information.

This ensures you get the best of both worlds: the optimization of a “Green” CI run (no retries), with the transparency of a report that highlights exactly where manual review is needed.
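Wiring any such reporter in uses Playwright's standard reporter mechanism. A sketch of what the config might look like; the local reporter path here is a placeholder, not the actual package name:

```typescript
// playwright.config.ts — illustrative config fragment.
// './applitools-reporter.ts' is a placeholder path, not the real package name.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['html'],                     // the standard Playwright HTML report
    ['./applitools-reporter.ts'], // a custom reporter layered on top of it
  ],
});
```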

Summary

The Applitools Playwright fixture is designed to be non-blocking and scalable. By leveraging an asynchronous architecture and the Applitools Ultrafast Grid, we offload the heavy lifting from your CI. By correctly configuring failTestsOnDiff, you ensure that your pipeline reflects your team’s engineering culture—whether that’s strict gating or modern, PR-based visual review.

Quick Answers

What is visual regression testing in Playwright?

Visual regression testing in Playwright verifies that changes to an application’s UI do not introduce unintended visual differences. Playwright can perform basic visual regression checks using screenshot comparisons like toHaveScreenshot, while dedicated visual testing tools (such as Applitools Eyes) extend this by detecting meaningful UI changes, managing baselines, and enabling review workflows for approving visual updates.

What is the best way to do visual testing in Playwright?

Playwright supports basic visual testing through screenshot comparisons such as toHaveScreenshot, but this approach can become difficult to maintain at scale. Dedicated visual testing tools, like Applitools Eyes, extend Playwright by adding Visual AI comparison, cross-browser rendering, and review workflows that allow teams to detect visual regressions without maintaining large sets of screenshot baselines.

How does Playwright screenshot testing (toHaveScreenshot) compare to visual regression testing tools?

Playwright’s toHaveScreenshot performs pixel-by-pixel image comparisons against stored baseline images. While this works for simple cases, it often requires updating and maintaining many snapshots. Visual regression testing tools like Applitools Eyes use Visual AI to detect meaningful UI changes while ignoring insignificant rendering differences, provide review workflows to approve or reject visual changes, and allow custom match levels for different regions of the screen.

Can Playwright run visual tests across multiple browsers and devices?

Yes, but with a limited scope. Natively, Playwright supports three browser engines (Chromium, Firefox, and WebKit), but it does not execute tests across different real operating systems or mobile devices. This lack of OS-level rendering limits coverage and risks missing platform-specific visual bugs. For example, see how a frontend team caught a visual bug specific to Mac Retina screens that a standard engine check would miss.

How can you run cross-browser visual tests in Playwright without running tests multiple times?

Normally, cross-browser testing requires executing the same tests separately for each browser configuration. Tools like Applitools Ultrafast Grid allow tests to run once while visual rendering is executed across multiple browsers and viewport combinations in parallel. This removes the need to multiply test execution across the full browser matrix.

Why is cross-browser testing in Playwright so slow?

Natively, cross-browser testing introduces a significant performance penalty. Playwright must execute the entire test logic (clicks, waits, network requests) separately for every browser and viewport configuration. Modern visual testing tools (e.g., Applitools Ultrafast Grid) eliminate this overhead by executing the test logic just once locally, performing the cross-browser rendering and visual comparison in parallel in the cloud.

How We Identified and Resolved a Bug Before Release Using Applitools Ultrafast Grid
Tue, 07 Feb 2023


This is a story about how standard tests were not able to identify a bug: the CSS and HTML files were valid, but when rendered in Chrome on a Mac, images were not displayed correctly. Applitools Ultrafast Grid (UFG) helped us identify the bug at an early stage of development, before deploying the change. These types of bugs are a regular occurrence in any organization, and without UFG they can easily make it to production and remain there undetected until a customer complains about the problem. Translation support from Michael Sedley.

Front-end development is complicated, drawing on a wide range of knowledge and tools. The sheer variety of systems, browsers, and devices makes it almost impossible to be sure that an application will display correctly on every system and that a minor code change has introduced no visual regressions.

A real-life example happened to me during my first week as an Applitools employee, when I fixed a minor bug using my Linux machine. In the process, I inadvertently created a more serious visual bug that was only visible on certain devices.

Had UFG not alerted me to the bug, the code would have gone to production and the result would have affected the usability of our flagship product on a Mac. This would have reflected badly on the professionalism of the company and would have affected the company representation, trust in our product, and sales.

Understanding the problem

In recent months, we improved Applitools Eyes’ ability to perform visual testing on semi-transparent images. In the past, Eyes would test an entire screen or a defined region, but now, using the Storybook SDK, users can automatically test each component separately, without needing to define a test for each component.

For example, when testing a gallery component, Eyes can identify visual bugs and regressions over all screen elements, including the appearance of buttons, controls, fonts, shadows, images, as well as backgrounds that include a transparency gradient.

Figure 1: Transparent Background

After implementing the transparency feature, a visual bug was reported. In a screen capture of a semi-transparent screen region, unexpected grid lines appeared on top of the tested image.

The root cause of these lines wasn’t clear, so as a first step, we developed a test plan to reproduce the issue. I created a semi-transparent image, all gray (rgb = 127,127,127), with a constant alpha (transparency) channel (alpha = 0.5). Fortunately, the bug reproduced immediately, with clearly identifiable grid lines:

As I experimented with different transparency settings, it was clear that the color of the grid lines matched the color of the image, and the lines grew stronger as the image’s transparency decreased.

After further investigation, I discovered that the image viewer component uses tiles to represent large images, and the tiles had a one-pixel overlap. In the past, when all images were RGB images with no transparency, the overlapping pixel was not visible to the human eye. When I added semi-transparency, the adjacent tiles were stacked on top of each other, and sampling the resulting color inside the grid line produced a grayscale value of about 160, which is the exact outcome of stacking two half-transparent gray layers over a white background:

(white · 0.5 + gray · 0.5) · 0.5 + gray · 0.5 ≈ 160
given white := 255, gray := 128
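The arithmetic can be checked numerically with the standard source-over compositing operator (illustrative helper, single gray channel):

```typescript
// Source-over compositing of one channel: top layer at the given alpha over bottom.
function over(top: number, alpha: number, bottom: number): number {
  return top * alpha + bottom * (1 - alpha);
}

const oneLayer = over(128, 0.5, 255);       // 191.5: the normal tile area
const twoLayers = over(128, 0.5, oneLayer); // 159.75: the darker overlapping grid line
```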

Solving the preliminary bug

To resolve this bug, I recalculated the position and scale of each region so that there would not be an overlap and the line between regions would not appear.
For example, if the first tile had width: 480px and left: 0, the next adjacent tile should be positioned using left: 480, so that there is a zero pixel overlap between tiles.
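The non-overlapping layout from the example can be expressed directly (the helper name is mine, not the product code):

```typescript
// Position tiles edge-to-edge: tile i starts exactly where tile i-1 ends,
// so adjacent tiles share no pixel column.
function tileLeft(index: number, tileWidth: number): number {
  return index * tileWidth; // 0, 480, 960, ... for 480px-wide tiles
}
```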

I tested the results on my local (Linux) machine and assumed that the issue was resolved.

I didn’t realize that when I fixed this bug, I had also created a new issue which would have been almost impossible to anticipate.

How UFG identified the bug I created before deployment

At Applitools, we understand the importance of quality visual testing across browsers, so before deployment, every code change that impacts the Eyes interface must be tested by Applitools Eyes using UFG.

We are proud to “eat our own dogfood.” We rely on our visual testing tools to make sure that our products are visually perfect before release.

Our integration pipeline is configured to use UFG to test the UI change on multiple devices and screen settings so that we can confirm that the interface is consistent on every browser, operating system, and screen size.

We discovered that fixing the one-pixel-overlap bug created a new bug on certain systems, where a gap was visible between tiles. Frustratingly, this bug was not reproducible on any of the devices used in development, and could not have been discovered with conventional manual visual testing.

The bug was only visible on screens with a Retina display, which uses HiDPI scaling.

What was interesting about this bug is that it highlighted an inconsistency in the way the same browser (Chrome) displays the same UI on different screen types.

What happened?

The bug and the solution

After some research, it turns out that there is (seemingly) a bug in the way Chrome behaves on Mac computers with Retina displays (see 1, 2, 3). Using percentages or fractions of pixels for positioning and scaling of elements can lead to unexpected results.

So, what is the solution?

The solution itself is very elegant: all we had to do was round the scale of each canvas so that the resulting canvas size would always be an integer:

scale = Math.round(scale * canvasSize) / canvasSize;

Thus, if the width of the canvas is 480 and our scale factor is 0.17, the width of our scaled canvas would not be 480 × 0.17 = 81.6, but 82. This way we maintain compatibility with Retina displays and prevent unwanted gaps from being created.
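A runnable version of that rounding step (the helper name is mine, not the product code):

```typescript
// Snap a scale factor so that scale * canvasSize is always a whole pixel count.
function snapScale(scale: number, canvasSize: number): number {
  return Math.round(scale * canvasSize) / canvasSize;
}

const snapped = snapScale(0.17, 480);
// 480 * 0.17 = 81.6 → rounds to 82 px → snapped ≈ 0.170833
```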

This bug was easy to resolve once we were aware of it, but without UFG we would never have identified it using any of our test computers.

Conclusion

Maintaining a quality front end for all configurations is an ongoing challenge in every company and every organization.

Solving a bug for one audience can create a bigger bug for a wider audience. In this article, we saw a classic example of a malfunction, where the initial solution we implemented only made things worse.

The number of users who use Applitools Eyes for testing semi-transparent components is significantly lower than the number of Eyes users who work with Retina displays (most Apple users), so the initial approach we took to solve the problem could have caused significantly more harm than good. Even worse, we could have caused significant damage to the user experience without knowing about it. No modern organization wants to rely on frustrated customer feedback to discover bugs in its applications or websites.

Using UFG reduces the likelihood that errors of this type slip under the radar, and it allows developers, product managers, and all stakeholders in the development process to deploy new features with far less fear. The UFG is insurance against platform-dependent visual bugs and provides the ability to perform true multi-platform coverage.

Don’t wait to discover your visual bugs from user reports. We invite you to try UFG: our team of experts is here to help with any questions or problems and to assist you in migrating Applitools Eyes and UFG into your integration pipeline. For more information, see Introduction to the Ultrafast Grid in the Applitools Knowledge Center.
