
Shopify Rollouts and SimGym: A Product Photo Testing Checklist for Merchants

How to test Shopify product photos with Rollouts (native theme A/B testing) and SimGym (AI shopper simulations) — what each tool can and can't do, a pre-flight checklist, which photo treatments to test at the theme level, an experiment-design workflow, what to measure, common mistakes, and an FAQ.

Prodofoto Team·June 10, 2026·11 min read

[Figure: checklist layout comparing two Shopify product-photo variants in an A/B test, with the winning variant marked by a check. Caption: Use this checklist to design product-photo experiments that Shopify Rollouts and SimGym can actually read: one variable at a time, with a metric chosen up front.]

Quick answer

To test product photos with Shopify Rollouts (native theme A/B testing) and SimGym (AI shopper simulations), start from one fact: both work at the theme level today. So test photo treatments your theme controls — gallery layout, crop, zoom, or whether a lifestyle section shows — and change one variable at a time. Decide a primary metric before you launch, pre-screen with SimGym to drop obvious losers, then validate the survivor with a staged Rollout on real traffic. For swapping the actual image files, use image-level A/B testing instead.

Rollouts and SimGym are recent Shopify features whose availability and capabilities are still evolving, so confirm what's enabled in your admin and in Shopify's documentation. Product details here are current as of June 10, 2026; the testing discipline itself is durable.

What Rollouts and SimGym are — and what they don't do

Rollouts is Shopify's native way to schedule theme changes and A/B test theme variations from the admin. Because it runs server-side, traffic is split between theme variants without an external script and without the flicker that bolt-on tools can cause. SimGym is a first-party app that sends AI shopper agents, trained on aggregate behavioral data from across Shopify, through your store to compare variants and surface recommendations in minutes, before any real customer sees them.

The honest caveat is the one that shapes everything else: both tools compare whole themes, and SimGym's results are directional, not a guarantee that your real shoppers will behave the same way. Availability is still rolling out and differs by store and plan. None of that makes them less useful — it just means you design photo tests around what a theme-level, partly-synthetic workflow can actually measure.

If you're new to photo experimentation, the fundamentals — what to test, how to size a test, and how to read significance — are covered in our guide to A/B testing Shopify product photos. This article stays on the Rollouts-and-SimGym workflow specifically rather than repeating those basics.

The pre-flight checklist

Each row is one thing to confirm before you launch a product-photo experiment. The goal is a test whose result you can trust and act on, not a dashboard you have to argue about afterward.

Check	Skip this	Do this	Why
One clear hypothesis (what the experiment is actually asking)	Swapping several photos and a layout tweak at once, so you can't tell what moved the metric	Changing one photo variable per experiment: the hero treatment, the gallery layout, or a single section	A confounded test can't be read. Isolating one variable is what turns a result into a decision you can act on.
A decision metric set up front (how you'll judge the winner)	Deciding after the fact and calling a winner on clicks or a vanity number	Picking one primary metric before launch: add-to-cart or conversion rate on the affected pages	A metric chosen after you've seen the data invites cherry-picking. Committing first keeps the test honest.
Comparable production quality (whether the comparison is fair)	Pitting a polished new shot against an old, low-resolution one	Holding technical quality constant so you test the creative idea, not the resolution	If one variant simply looks sharper or better-lit, you've measured production value, not the hypothesis you meant to test.
Enough traffic and time (whether the result can reach significance)	Stopping the moment one variant edges ahead	Sizing the test to your real traffic and letting it run a full cycle, weekends included	Early peeking and underpowered tests produce false winners that don't replicate when you ship them.
A synthetic pre-screen (catching obvious losers before live traffic)	Spending real-traffic days on a variant a simulation could have flagged in minutes	Using SimGym to pre-screen variants synthetically, then validating the survivors with a Rollout	Simulation is fast and costs no traffic, but it's directional. Real-traffic validation still decides the winner.
Honest, on-brand imagery (whether a win is safe to keep)	Winning with an exaggerated or misleading image that lifts clicks but raises returns	Keeping every variant truthful to the product so a conversion lift doesn't become a returns problem	A test that optimizes a misleading image can win in the short term and quietly lose on returns and trust.
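
The "Enough traffic and time" row is easier to act on with a rough number in hand. The sketch below uses the standard normal-approximation formula for comparing two proportions; it is not something Rollouts or SimGym provides. The baseline rate, the lift you hope to detect, the 5% significance level, the 80% power, and the daily traffic figure are all assumptions to replace with your own.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant for a two-sided test.

    baseline_rate: current add-to-cart or conversion rate (e.g. 0.05).
    relative_lift: smallest lift worth detecting (e.g. 0.10 for +10%).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def days_to_run(n_per_variant, daily_visitors, variants=2):
    """Days of traffic needed, rounded up to whole weeks so weekends count."""
    days = math.ceil(n_per_variant * variants / daily_visitors)
    return math.ceil(days / 7) * 7


# Assumed inputs: 5% baseline, +10% relative lift, 3,000 visitors a day
# reaching the affected pages.
n = visitors_per_variant(0.05, 0.10)
days = days_to_run(n, daily_visitors=3000)
print(f"{n} visitors per variant, about {days} days")
```

Small lifts on low baseline rates need far more traffic than most merchants expect, which is one reason to test on a high-traffic template and to pick a treatment bold enough to matter.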

For what a strong gallery should contain in the first place — how many photos and in what order — the product gallery sequence is the reference to test against.

What to test at the theme level

Because Rollouts compares theme variants, the cleanest photo tests are the ones your theme controls. You duplicate the theme, change one presentation setting in the copy, and let the experiment attribute any difference to that setting.

Photo treatment	Example variant	How to set it up
Gallery layout	a thumbnail rail vs a stacked scroll, or a square vs a taller crop	These are theme settings, so a duplicate-theme Rollout isolates them cleanly — the only difference between variants is the layout, not the photos themselves.
A lifestyle section on the PDP	a lifestyle or lookbook block shown vs hidden below the gallery	Toggle the section in the variant theme and compare add-to-cart. Because the section lives in the theme, the Rollout attributes the difference to it.
Aspect ratio and zoom behavior	a taller crop with hover-zoom vs a square image with no zoom	Both are controlled by the theme, so the variant difference is the crop and zoom experience — a fair test of presentation rather than of the image files.
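
Before launching, it helps to confirm that the duplicate theme really differs from the live one in a single setting. The sketch below compares two settings dictionaries, such as the values you might copy out of each theme's settings file. The setting names (`gallery_layout`, `image_zoom`, `media_ratio`) are hypothetical and will differ by theme.

```python
def flatten(settings, prefix=""):
    """Turn nested settings into a flat {"section.key": value} mapping."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def changed_settings(control, variant):
    """Return {path: (control_value, variant_value)} for every difference."""
    a, b = flatten(control), flatten(variant)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Hypothetical setting names, for illustration only.
control = {"product": {"gallery_layout": "thumbnail_rail",
                       "image_zoom": "hover",
                       "media_ratio": "square"}}
variant = {"product": {"gallery_layout": "stacked",
                       "image_zoom": "hover",
                       "media_ratio": "square"}}

diff = changed_settings(control, variant)
if len(diff) == 1:
    print("Clean test, one variable changed:", diff)
else:
    print("Confounded test, settings changed:", diff)
```

If the diff lists more than one path, the experiment is testing a bundle of changes and the result won't point at any one of them.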

The image-level caveat. Swapping the actual hero file or reordering a product's media changes what every visitor sees, no matter which theme variant they land in, so a theme-level Rollout can't isolate it. To test the images themselves, lean on the image-level methods in the product photo A/B testing guide. A classic example worth running there is lifestyle vs white-background heroes.

The experiment-design workflow

Five steps take a photo idea from hypothesis to a decision you can defend, using SimGym to triage and Rollouts to validate.

1. Start with one hypothesis worth testing. Pick a single photo treatment your theme controls (gallery layout, crop, or whether a lifestyle section shows) on a high-traffic template where a real difference would matter. Write down the metric you expect it to move.
2. Build the variant as a duplicate theme. Duplicate your live theme and change only the photo treatment in the copy. Keep everything else identical so the experiment isolates the one variable, and confirm both variants render correctly on mobile.
3. Pre-screen with SimGym before spending live traffic. Run the variant against your current theme with SimGym's simulated shoppers to catch obvious problems in minutes. Treat the result as directional triage, a reason to keep or drop a variant, not the final verdict.
4. Validate the survivor with a staged Rollout. Split real traffic between the current theme and the variant with Rollouts. Because it runs server-side there's no flicker, and a staged ramp limits exposure while the test gathers data. Let it run a full cycle before reading it.
5. Read the decision metric, then ship or iterate. Keep the winner only if it holds on the primary metric you committed to up front and doesn't raise returns. If the result is flat or noisy, refine the hypothesis and run the next isolated test rather than stacking changes.

Producing the variant imagery itself shouldn't be the bottleneck. The same single-image-to-many-variants approach behind generating AI lifestyle product photos lets you stand up a credible alternate treatment without booking a shoot for every test.

Measurement plan

A photo test is only as good as the metric you commit to before it runs:

One primary metric: usually add-to-cart or conversion rate on the affected pages, decided up front so the call isn't made after seeing the data.
Enough traffic over a full cycle: let the Rollout gather data across weekends and a normal demand cycle before reading it, rather than stopping at the first lead.
Returns and refunds: a conversion lift that comes with more returns isn't a real win; watch the post-purchase signal, not just the page metric.
Simulation vs reality gap: note where SimGym's prediction and the live Rollout disagree, and trust the real-traffic result when they do.
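
One way to turn the plan above into a repeatable call is a two-proportion z-test on the primary metric plus a simple returns guardrail. This is a sketch under stated assumptions, not how Rollouts reports results: the `Arm` fields, the 5% significance threshold, and the rule that the return rate must not rise at all are illustrative choices, and the guardrail is a plain comparison rather than a significance test.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Arm:
    visitors: int
    conversions: int  # orders (or add-to-carts) on the affected pages
    returns: int      # returned orders observed after the test

    @property
    def rate(self):
        return self.conversions / self.visitors

    @property
    def return_rate(self):
        return self.returns / self.conversions if self.conversions else 0.0


def p_value(control, variant):
    """Two-sided p-value for the difference between two conversion rates."""
    total = control.visitors + variant.visitors
    pooled = (control.conversions + variant.conversions) / total
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / control.visitors + 1 / variant.visitors))
    if se == 0:
        return 1.0
    z = (variant.rate - control.rate) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(control, variant, alpha=0.05, max_return_increase=0.0):
    """Ship only if the lift is significant AND returns did not rise."""
    lifted = variant.rate > control.rate and p_value(control, variant) < alpha
    returns_ok = (variant.return_rate - control.return_rate
                  <= max_return_increase)
    return "ship" if lifted and returns_ok else "iterate"


# Made-up numbers for illustration.
control = Arm(visitors=30000, conversions=1500, returns=120)
honest_win = Arm(visitors=30000, conversions=1680, returns=134)
risky_win = Arm(visitors=30000, conversions=1680, returns=200)

print(decide(control, honest_win))
print(decide(control, risky_win))
```

The second variant converts just as well as the first but fails the guardrail, which is the "lift that comes with more returns" case the list warns about.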

To fold photo testing into a broader image program — quality, gallery, SEO, and conversion — work from the Shopify product photography conversion checklist, and if you're preparing images for AI surfaces too, see how to prepare product photos for AI shopping agents.

Six common testing mistakes

Treating a SimGym result as the final verdict. Simulations are fast, directional triage trained on aggregate shopper behavior. They don't guarantee your store's real customers will respond the same way. Use them to screen out obvious losers, then validate the winner with real traffic.
Changing several things at once. A new hero plus a reordered gallery plus a layout tweak can't be untangled afterward — you'll know something moved but not what. Isolate one photo variable per experiment so the result points at a specific change.
Testing image files with a theme-level tool. Swapping the actual photo or reordering a product's media changes what every visitor sees regardless of which theme variant they're in, so a theme-level Rollout won't isolate it. Use image-level A/B testing methods for that kind of test instead.
Calling a winner too early. Peeking at the dashboard and stopping at the first lead is how underpowered tests manufacture false winners. Size and time the test before you launch, and let it finish even when an early result looks exciting.
Comparing unequal production quality. If one variant is simply a sharper, better-lit shot, you've tested production value rather than the creative idea. Hold technical quality constant across variants so the test measures the thing you actually want to learn.
Optimizing a misleading image. An exaggerated or over-styled shot can win clicks and even conversions in the short term while quietly raising returns. Keep every variant honest to the product so a win on the page doesn't become a loss after delivery.
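
The cost of calling a winner too early can be seen in a small simulation. The sketch below runs A/A tests, where both variants share the same true rate so any "winner" is false, and compares stopping at the first significant-looking dashboard check against reading the result once at the end. The traffic figures, the number of looks, and the 5% threshold are assumptions chosen only to keep the example quick.

```python
import math
import random
from statistics import NormalDist


def p_value(x1, n1, x2, n2):
    """Two-sided p-value for the difference between two conversion rates."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x2 / n2 - x1 / n1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_winner_rates(rate=0.05, looks=10, per_look=200,
                       trials=500, alpha=0.05, seed=1):
    """Return (rate when peeking at every look, rate when reading once)."""
    rng = random.Random(seed)
    peeked = final_only = 0
    for _ in range(trials):
        a = b = n = 0
        crossed = False
        last_p = 1.0
        for _ in range(looks):
            # Both arms draw from the SAME true rate: there is no real winner.
            a += sum(rng.random() < rate for _ in range(per_look))
            b += sum(rng.random() < rate for _ in range(per_look))
            n += per_look
            last_p = p_value(a, n, b, n)
            if last_p < alpha:
                crossed = True  # a peeker would have stopped here
        peeked += crossed
        final_only += last_p < alpha
    return peeked / trials, final_only / trials


peek_rate, final_rate = false_winner_rates()
print(f"false winners when peeking: {peek_rate:.1%}")
print(f"false winners when reading once at the end: {final_rate:.1%}")
```

Reading once at the end keeps the false-winner rate near the threshold you chose, while checking repeatedly and stopping at the first lead typically pushes it well above that.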

FAQ

What are Shopify Rollouts?

Rollouts is Shopify's native way to schedule theme changes and run A/B tests between theme variations, built into the admin. It runs server-side, so traffic is split between theme variants without an external script and without the flicker that bolt-on testing tools can cause. Availability is rolling out and may differ by store and plan, so confirm what's enabled in your own admin and in Shopify's documentation before planning a test.

What is Shopify SimGym?

SimGym is a first-party Shopify app that sends AI shopper agents — trained on aggregate behavioral data from across Shopify — through your store to compare variants and surface recommendations in minutes, before any real customer sees them. Its current focus is theme-to-theme comparison. Treat its results as directional rather than a guaranteed predictor of real conversions, and verify the app's current capabilities, since the product is evolving.

Can I A/B test individual product photos with Rollouts?

Rollouts compares theme variants, so it's best suited to photo treatments the theme controls — gallery layout, crop and aspect ratio, zoom behavior, and whether a lifestyle section appears. Swapping the actual image file or reordering a product's media changes what every visitor sees regardless of the variant they land in, so a theme-level Rollout won't isolate that change. For testing the images themselves, use image-level A/B testing methods rather than a theme experiment.

Do I still need real-traffic testing if SimGym is faster?

Yes. SimGym is a fast, low-cost pre-screen that catches obvious losers without spending live traffic, but a simulation is not the same as your real shoppers behaving a certain way. The reliable pattern is to use the two together: simulate to triage variants, then validate the survivor with a staged Rollout on real traffic before you commit to it.

How do I know whether a photo variant actually won?

Decide one primary metric before you launch — usually add-to-cart or conversion rate on the affected pages — and let the test gather enough traffic over a full cycle, including weekends. Then check that the lift holds on that metric and doesn't come with a rise in returns. Avoid judging a photo test on clicks alone, which can move without any change in sales.

For the current availability and setup of Rollouts and SimGym, consult the official Shopify Editions: Winter '26 and the Shopify Help Center.

Generate the photo variants your tests need

Rollouts and SimGym tell you which photo treatment wins — but you still need credible variants to test. Prodofoto turns a single clean product photo into multiple Shopify-ready variants, so you can stand up an alternate hero or lifestyle treatment for an experiment without booking a shoot. For a walkthrough, read how to generate AI lifestyle product photos in 60 seconds.
