Published on Tue Mar 08 2022 in Tech

Writing a LocalFileComparator with Threshold for Flutter Golden Tests

Bernardo Belchior

At Rows, we use Flutter Desktop to build our native desktop app for macOS and Windows. To ensure everything looks as expected in Rows’s native desktop app, we leverage so-called “golden tests”. These are visual tests where a widget is compared to a screenshot of a previous run. If there is any difference — even if it’s only a pixel — the test will fail.

Recently, we found discrepancies between golden test results when running them locally and in our pipeline, so we wrote a LocalFileComparator with a configurable threshold that ignores irrelevant differences between these environments.

Here’s how we did it.

Testing our App

At Rows, developers use different operating systems and laptops: there’s Windows and macOS, M1 MacBooks and Intel MacBooks, and so on. Because these platforms render things slightly differently from each other, the same golden test can produce different screenshots on each of them.

As a first step to solve these inconsistencies, we moved to running golden tests in Docker. It wasn’t easy, but it did bear fruit: we could now run golden tests on macOS and Windows without having to worry about operating system differences.

This worked well until we ran the tests in the pipeline.

Pipeline Problems with Golden Tests

We use GitHub Actions for running tests, creating releases, and publishing new versions of our desktop app. After introducing Docker for our golden tests, we switched the platform our tests run on from macos-latest to ubuntu-latest. In theory, this would allow us to run visual tests consistently across engineers’ laptops and the pipeline, since they all run in Docker.

But… in practice, this didn’t happen. We started seeing some failing tests:

Screenshot: Golden test failing when running in our pipeline.

Notice how the diff percentage is very low: 0.01%.

We did some digging and eventually discovered that this is an architecture issue: GitHub Actions runs x86 Ubuntu machines, whereas most of our engineers use M1 MacBooks, which are ARM-based.

Since GitHub Actions’ hosted runners do not offer ARM machines, and the difference in our failing golden tests was never greater than 0.02%, we decided to compromise and add a threshold of 0.02% to golden tests.

We felt safe with this value: it is high enough to absorb these architecture-specific differences, but low enough that most visually significant changes still exceed it. This means tests should only fail when there are meaningful differences between screenshots.
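One detail worth making explicit is the scale of the threshold. As far as we can tell from flutter_test’s source, ComparisonResult.diffPercent is the number of differing pixels divided by the total number of pixels, a value between 0 and 1 (the failure message multiplies it by 100 to print a figure such as 0.01%). So a threshold of 0.02% has to be written as 0.0002. The small pure-Dart sketch below shows that arithmetic; goldenThreshold and isWithinThreshold are hypothetical names chosen for illustration, not part of flutter_test.

```dart
// Pure Dart sketch: no Flutter imports needed.

/// 0.02% written as a fraction. flutter_test's ComparisonResult.diffPercent
/// (assumed here) ranges from 0.0 to 1.0, so the threshold uses the same scale.
const double goldenThreshold = 0.0002;

/// Hypothetical helper: returns true when [differingPixels] out of
/// [totalPixels] stays within [threshold] (a fraction, not a percent).
bool isWithinThreshold(int differingPixels, int totalPixels, double threshold) {
  if (totalPixels <= 0) {
    throw ArgumentError.value(totalPixels, 'totalPixels', 'must be positive');
  }
  // The ratio diffPercent is assumed to hold: differing pixels / total pixels.
  return differingPixels / totalPixels <= threshold;
}
```

For a hypothetical 800×600 golden (480,000 pixels), isWithinThreshold(96, 800 * 600, goldenThreshold) returns true, while 97 differing pixels exceed the threshold. In other words, at that size 0.02% tolerates fewer than a hundred stray pixels.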

Writing a File Comparator with Threshold

As it turns out, flutter_test’s LocalFileComparator does not support setting a threshold, so we had to implement it ourselves.

To achieve that, we created a new LocalFileComparatorWithThreshold that extends LocalFileComparator and overrides its compare method to take the threshold into account:

Listing: LocalFileComparatorWithThreshold (source)
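For readers who want a starting point, here is a minimal sketch of such a comparator, written to match the description in this post rather than copied from the original source. It assumes a constructor that takes the test file URI plus a threshold expressed as a fraction, and it relies on members that LocalFileComparator exposes in the flutter_test versions we know of: basedir, the protected getGoldenBytes, and generateFailureOutput from the LocalComparisonOutput mixin. Check these against your Flutter SDK. The warning text is made up for illustration.

```dart
import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';

/// Behaves like [LocalFileComparator], but lets a golden test pass when the
/// fraction of differing pixels does not exceed [threshold].
class LocalFileComparatorWithThreshold extends LocalFileComparator {
  /// Maximum tolerated difference, as a fraction between 0 and 1
  /// (for example, 0.0002 for 0.02%).
  final double threshold;

  LocalFileComparatorWithThreshold(Uri testFile, this.threshold)
      : assert(threshold >= 0 && threshold <= 1),
        super(testFile);

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    // Same pixel-by-pixel comparison that LocalFileComparator performs.
    final ComparisonResult result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );

    // Not identical, but within tolerance: warn and report success.
    if (!result.passed && result.diffPercent <= threshold) {
      debugPrint(
        'Golden $golden differs by '
        '${(result.diffPercent * 100).toStringAsFixed(4)}%, which is within '
        'the ${(threshold * 100).toStringAsFixed(4)}% threshold. '
        'This test would have failed without the threshold.',
      );
      return true;
    }

    // Above the threshold: behave like LocalFileComparator, which writes the
    // failure images and throws with a descriptive message.
    if (!result.passed) {
      final String error = await generateFailureOutput(result, golden, basedir);
      throw FlutterError(error);
    }
    return true;
  }
}
```

Treating a difference equal to the threshold as acceptable (<=) matches the goal of tolerating differences of up to 0.02%. Because failures above the threshold still go through generateFailureOutput, the usual diff images are produced just as with the default comparator.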

The main part of this implementation is the compare method. Like LocalFileComparator, it uses GoldenFileComparator’s static compareLists method to compare the current image with the baseline.

compareLists returns a ComparisonResult, which includes, among other fields, a passed flag indicating whether the two images are exactly equal (i.e., pixel by pixel), and a diffPercent field containing the proportion of pixels in which the two images differ, expressed as a value between 0 and 1. The only difference from LocalFileComparator’s compare method is the added conditional that checks whether passed is false and diffPercent does not exceed the threshold we set. In that case, we override the default behavior and return true, meaning that the flutter_test framework considers the test successful.

In that case, we also print a message warning that the test passed only because of the threshold and would otherwise have failed. Finally, after creating our new file comparator, we need to configure our tests to use it.

Switching over to the new implementation

According to flutter_test’s documentation, we can use a file named flutter_test_config.dart to customize how tests are run. For each test file, flutter test uses the closest flutter_test_config.dart found in that file’s directory or one of its parent directories, so if you want to use the comparator in all tests, place this file at the root of your test directory. Reading flutter_test’s code, you can see that goldenFileComparator is a global variable, which means we can easily override it to use our comparator:

Listing: flutter_test_config.dart (source)
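Here is a minimal sketch of what such a flutter_test_config.dart can look like, again written from the description in this post rather than copied from the original listing. It assumes the LocalFileComparatorWithThreshold class described earlier lives in test/utils/local_file_comparator_with_threshold.dart (a hypothetical path) and takes the test file URI and a threshold fraction; goldenTestsThreshold is likewise a name chosen for this sketch.

```dart
// test/flutter_test_config.dart
import 'dart:async';

import 'package:flutter_test/flutter_test.dart';

// Hypothetical location of the comparator; adjust to your project layout.
import 'utils/local_file_comparator_with_threshold.dart';

/// Tolerated difference: 0.02%, expressed as a fraction of all pixels.
const double goldenTestsThreshold = 0.0002;

/// Called by `flutter test` with the test file's `main` as [testMain].
Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  final GoldenFileComparator comparator = goldenFileComparator;

  // We need LocalFileComparator.basedir, so abort if another comparator is set.
  if (comparator is! LocalFileComparator) {
    throw StateError(
      'Expected goldenFileComparator to be a LocalFileComparator, '
      'but got ${comparator.runtimeType}.',
    );
  }

  // LocalFileComparator only uses the test file URI to derive its directory,
  // so resolving any file name against basedir reproduces the same basedir.
  goldenFileComparator = LocalFileComparatorWithThreshold(
    comparator.basedir.resolve('flutter_test_config.dart'),
    goldenTestsThreshold,
  );

  // Run the test file's main, which registers and runs its tests.
  await testMain();
}
```

In our understanding, flutter test installs a LocalFileComparator for each test file before calling testExecutable, so basedir points at that test file’s directory and golden files keep resolving relative to each test.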

Here we check whether goldenFileComparator is a LocalFileComparator and throw an exception if it isn’t. This check is required because we need the basedir field, which LocalFileComparator provides. If it isn’t available, we abort, since the tests can’t run properly without it.

If this precondition is met, we instantiate our file comparator and replace the default goldenFileComparator with it. Afterward, we run the test file itself by calling testMain(), and everything else proceeds as normal.


And that’s it! 🎉 From now on, golden tests will use our new file comparator with a threshold, and we’ll be able to ignore minor, architecture-specific differences!

We continue building the spreadsheet with superpowers. Get started today for free at rows.com.