
I've been wanting to write this post for a while, though I always had a slightly different format in mind. But after seeing this incredibly delightful and generous post by messaging app Drift, I knew I had to steal (ahem, borrow) their format (sharing is caring, right?).
If you're like me, you have multiple email accounts: one for work, one for pleasure and, admittedly, one for newsletters and other spam. As you can likely empathize, I get a buttload of emails every day, of which I read very few.
Here are some reasons why I actually read an email:
- I have to. It's either necessary for me to do my job or for me to function in society (e.g., “Your cell phone bill is overdue”).
- It's from a friend or family member. (My grandma is legit tech savvy.)
- The email subject line is so gosh darn intriguing that I literally cannot not click.
- It's from a source I trust, and/or the value is clearly indicated in the subject line or in the email opening.
That's pretty much it. And a quick poll of a few of my team members revealed they shared my sentiment.
^ Yes, my team members are Simpsons characters. Image via Giphy.
So, if you're writing an email on behalf of your business (or on behalf of your client), it better fall into one of the above categories. Otherwise, it'll end up in email heaven. Or if you don't believe in heaven… email nothingness.
Lucky for you, we've compiled our (Unbounce's) top-performing emails (based on open rate, click-through rate or other defined metric of engagement). AND WE WANT YOU TO STEAL THEM.
Here's what's included:
- Blog welcome email
- Re-engagement email
- Campaign follow-up email
- Holiday email
- Feature launch email
- Oops email
- Nurture track welcome email
Plus we've highlighted what made them so successful and provided actionable tips for translating what worked to your own emails.
And for your copy-and-paste pleasure, here they all are in a Google doc.
Get all of Unbounce's top-performing emails
Copy, paste and customize 'em for your own email campaigns.
1. Start a conversation with your blog welcome email
This email was just one of the amazing things to come out of our two-week publishing hiatus. It's the welcome email that is sent after someone subscribes to the Unbounce blog:
Subject line: Just another welcome email

We had three goals for re-working the blog welcome email: (1) inject personality, (2) get recipients to actually engage and (3) gain insights about what our readers actually struggle with as marketers.
Whereas our previous blog welcome email got at most one response per month, this email prompted 20 genuine responses in its first 30 days. Here's one:

This one, too:

If that wasn't enough, the email also sustained a ~60% open rate (multiple online sources state 50% is industry average for this type of email). We'd like to think it has something to do with our borderline self-deprecating subject line… but then, maybe we were just lucky.
Pro tip: Steer clear of generic subject lines such as, “Welcome to [company name]”. Instead, think about how to leverage pattern disruption to cut through the generic garbage and get noticed.
2. Revive dormant contacts with a re-engagement email
After the release of our web series The Landing Page Sessions, we sent this email to our unknown subscribers. (Note: We define unknown subscribers as people whose email is the only data we have, and thus cannot determine if they are market qualified.)
The goal of the email was to get recipients to check out the first season of the show, which showcases our product in a delightful and actionable way.
Subject line: Wanna binge watch the Netflix of marketing videos? 

Perhaps no industry is as wedded to the use of p-values as the pharmaceutical industry. As Ben Goldacre points out in his chilling book, Bad Pharma, there is terrible potential for the pharmaceutical industry to hoodwink doctors and patients with p-values. For example, a significance threshold of .05 means that, for a drug that is in actuality no better than placebo, roughly one trial in 20 will incorrectly show it to be effective. A dodgy pharmaceutical company could theoretically perform 20 trials of such a drug, bury the 19 trials showing it to be rubbish, and then proudly publish the one and only study that “proves” the drug works.
For the same probabilistic reasons, the online marketer who trawls through their Google AdWords/Facebook Ads/Google Analytics reports looking for patterns runs a big risk of detecting trends and tendencies which don't really exist. Every time said marketer filters their data one way or the other, they are essentially running an experiment. By sheer force of random chance, there will inevitably be anomalies, anomalies which the marketer will then falsely attribute to an underlying pattern. But these anomalies are often no more special than seeing a coin land “heads” five times in a row somewhere in 100 different experiments where you flipped five fair coins.
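If you want to see this for yourself, here's a quick simulation (a minimal Python sketch of my own, not anything from the original analysis): flip a fair coin five times, repeat the “experiment” 100 times, and count how often you get five heads in a row. Spoiler: it happens roughly three times, with perfectly fair coins.

```python
import random

random.seed(1)  # for reproducibility

all_heads_runs = 0
for _ in range(100):                                    # 100 small "experiments"
    flips = [random.random() < 0.5 for _ in range(5)]   # 5 fair coin flips each
    if all(flips):                                      # all heads: probability 1/32 per experiment
        all_heads_runs += 1

# We expect roughly 100/32, i.e. about 3, all-heads runs purely by chance,
# even though every single coin is fair.
print(all_heads_runs)
```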
Epiphany #3: Small differences in conversion rates are near impossible to detect. Large ones, trivial.
Imagine we observed the following advertising results:
- “Miniature Pony” photo: 1,000 impressions, 1 conversion (a .1% conversion rate)
- “Miniature Goat” photo: 1,000 impressions, 3 conversions (a .3% conversion rate)
Upon eyeballing the data, we see that the goat variant tripled its equine competitor's conversion rate. What's more, we see that there was a large number of impressions (1,000) in each arm of the experiment. Is this enough to satisfy the “Law of Large Numbers” and give us the certainty we need? Surely these data mean that the “Miniature Goat” is the better photo in a statistically significant way?
Not quite. Without going too deep into the math, these results fail to reach statistical significance (at p=.05). If we concluded that the goat was the better photo, we would have a 1 in 6 chance of being wrong. Our failure to reach statistical significance despite the large number of impressions shows us that impressions alone are insufficient in our quest for statistically significant results. This might surprise you. After all, if you saw a coin land “heads” 1,000 times in a row, you'd feel damn confident that it was rigged. The math of statistical significance supports this feeling: your chances of being wrong in calling this coin rigged would be about 1 in 1,000,000,000,000,000,000,000,000,000,000… (etc.)
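If you'd like to check the numbers rather than take my word for it, here's a rough sketch using a one-sided two-proportion z-test (one reasonable choice; the exact p-value depends on which test you run). With 3 conversions out of 1,000 impressions versus 1 out of 1,000, it lands at roughly 0.16, in line with the “1 in 6” figure above and well short of the .05 threshold.

```python
from math import sqrt
from scipy.stats import norm

def one_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Pooled one-sided z-test: how likely is a gap this big if variant A
    is NOT actually better than variant B?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return norm.sf((p_a - p_b) / se)  # upper-tail probability

# Goat: 3 conversions / 1,000 impressions vs. pony: 1 conversion / 1,000 impressions
print(one_sided_p_value(3, 1000, 1, 1000))  # roughly 0.16, i.e. about 1 in 6; above .05
```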
So why is it that the coin was statistically significant after 1,000 flips but the advert wasn't after 1,000 impressions? What explains this difference?
Before answering this question, I'd like to bring up a scary example that you've probably already encountered in the news: Does the use of a mobile phone increase the risk of malignant brain tumors? This is a fiendishly difficult question for researchers to answer, because the incidence of brain tumors in the general population is (mercifully) tiny to start off with (about 7 in 100,000). This low base incidence means that experimenters need to include absolutely epic numbers of people in order to detect even a modestly increased cancer risk (e.g., to detect that mobile phones double the tumor incidence to 14 cases per 100,000).
Suppose that we are brain cancer researchers. If our experiment only sampled 100 or even 1,000 people, then both the mobile-phone-using and the non-mobile-phone-using groups would probably contain 0 incidences of brain tumors. Given the tiny base rate, these sample sizes are both too small to give us even a modicum of information. Now suppose that we sampled 15,000 mobile phone users and 15,000 non-users (good luck finding those).
At the end of this experiment, we might count two cases of malignant brain cancer in the mobile-phone-using group and one case in the non-mobile-phone-using group. A simpleton's reading of these results would conclude that the incidence of cancer (or the “morbid conversion rate”) among mobile phone users is double that of non-users. But you and I know better, because intuitively this feels like too rash a conclusion; after all, it's not that difficult to imagine that the additional tumor victim in the mobile-phone-using group turned up there merely by random chance. (And indeed, the math backs this up: this result is not statistically significant at p=.05; we'd have to increase the sample size a whopping 8 times before we could detect this difference.)
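For the curious, here's one way to sanity-check the non-significance claim (a sketch using Fisher's exact test; the “8 times” figure depends on power assumptions that aren't spelled out here, so I'm only checking the significance part):

```python
from scipy.stats import fisher_exact

# Rows: mobile-phone users vs. non-users; columns: tumor cases vs. no tumor
table = [[2, 14_998],
         [1, 14_999]]

_, p_value = fisher_exact(table, alternative='greater')
print(p_value)  # roughly 0.5: nowhere near .05, so "double the cases" is
                # entirely compatible with random chance at this sample size
```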
Let's return to our coin-flipping example. Here we only considered two outcomes: that the coin was either fair (50% of the time it lands “heads”) or fully biased to “heads” (100% of the time it lands “heads”). Phrasing the same possibilities in terms of conversion rates (where “heads” counts as a conversion), the fair coin has a 50% conversion rate, whereas the biased coin has a 100% conversion rate. The absolute difference between these two conversion rates is 50% (100% – 50% = 50%). That's stonking huge! For comparison's sake, the (reported) difference between the miniature pony and miniature goat photo variants (from the example at the start of this section) was only .2%, and the suspected increase in cancer risk for mobile phone users was .01%.
Now we get to the point: It is easier to detect large differences in conversion rates. They display statistical significance “early” (i.e., after fewer flips or fewer impressions, or in studies relying on smaller sample sizes). To see why, imagine an alternative experiment where we tested a fair coin against one ever so slightly biased to “heads” (e.g., one that lands “heads” 51% of the time). This would require many, many coin flips before we would notice the slight tendency towards heads. After 100 flips we would expect to see 50 “heads” with a fair coin and 51 “heads” with the rigged one, but that extra “heads” could easily happen by random chance alone. We'd need about 15,000 flips to detect this difference in conversion rates with statistical significance. By contrast, imagine detecting the difference between a coin biased 0% to “heads” (i.e., one that always lands “tails”) and one biased 100% to “heads” (in other words, imagine detecting a 100% difference in conversion rates). After 10 coin flips we would notice that one coin came up ALL heads while the other came up ALL tails. Would there really be much point in continuing to flip 90 more times? No, there would not.
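A figure like the 15,000 flips above can be reproduced with a standard sample-size approximation. Here's a sketch assuming a one-sided test at α = .05 with 80% power (different assumptions will shift the exact number, but not the shape of the story):

```python
from scipy.stats import norm

def flips_needed(p0, p1, alpha=0.05, power=0.80):
    """Approximate flips needed for a one-sided test to tell a coin with
    heads-probability p1 apart from one with heads-probability p0."""
    z_alpha = norm.ppf(1 - alpha)   # one-sided significance threshold
    z_beta = norm.ppf(power)        # desired power
    spread = z_alpha * (p0 * (1 - p0)) ** 0.5 + z_beta * (p1 * (1 - p1)) ** 0.5
    return spread ** 2 / (p1 - p0) ** 2

print(flips_needed(0.50, 0.51))  # about 15,000 flips for a 1-point difference
print(flips_needed(0.50, 0.60))  # only about 150 flips for a 10-point difference
```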
This brings us to our next point, which is really just a corollary of the above: Small differences in conversion rates are near impossible to detect. The easiest way to understand this point is to consider what happens when we compare the results of two experimental variants with identical conversion rates: After a thousand, a million, or even a trillion impressions, you still won't be able to detect a difference in conversion rates, for the simple reason that there is none!
Bradd Libby, of Search Engine Land, calculated the rough number of impressions necessary in each arm of an experiment to reach statistical significance. He then reran this calculation for various click-through rate (CTR) differences, showing that the smaller the expected conversion rate difference, the harder it is to detect.
Notice how in the final row an infinite number of impressions are needed; as we said above, we will never detect a difference, because there is none to detect. The consequence of all this is that it's not worth your time, as a marketer, to pursue tiny expected gains; instead, you'd be better off going for a big win that you have a chance of actually noticing.
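These aren't Libby's exact figures (his assumptions aren't reproduced here), but you can recreate the shape of his table with the standard two-proportion approximation. Watch how the required impressions explode as the expected lift shrinks, and become infinite when there's no difference at all:

```python
from scipy.stats import norm

def impressions_per_arm(base_ctr, lift, alpha=0.05, power=0.80):
    """Rough impressions needed in EACH arm to detect a relative lift over
    base_ctr (two-sided test, standard two-proportion approximation)."""
    p1, p2 = base_ctr, base_ctr * (1 + lift)
    if p1 == p2:
        return float('inf')  # no difference at all means it is never detectable
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

for lift in (0.50, 0.20, 0.05, 0.01, 0.0):   # relative CTR lifts over a 1% baseline
    print(f"{lift:.0%} lift: {impressions_per_arm(0.01, lift):,.0f} impressions per arm")
```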
Epiphany #4: You destroy a test's validity by pulling the plug before its preordained test duration has passed
Anyone wedded to statistical rigor ought to think twice about shutting down an experiment after perceiving what appears to be initial promise or looming disaster.
Medical researchers, with heartstrings tugged by moral compassion, wish that every cancer sufferer in a trial could receive what's shaping up to be the better cure, notwithstanding that the supposed superiority of this cure has yet to be established with anything approaching statistical significance. But this sort of rash compassion can have terrible consequences, as happened in the history of cancer treatment. For far too long, surgeons subjected women to a horrifically painful and disfiguring procedure known as the “radical mastectomy”. Hoping to remove all traces of cancer, doctors removed the chest muscles and all axillary lymph nodes, along with the cancer-carrying breast; it later transpired that removing all this extra tissue brought no benefit whatsoever.
Generally speaking, we should not prematurely act upon the results of our tests. The earlier stages of an experiment are unstable. During this time, results may drift in and out of statistical significance. For all you know, two more impressions could cause a previous designation of “statistically significant” to be whisked out from under your feet. Moreover, statistical trends can completely switch direction during their run-up to stability. If you peep at results early instead of waiting until an experiment runs its course, you might leave with a conclusion completely at odds with reality.
For this reason, it's best practice not to peek at an experiment until it has run its course-this being defined in terms of a predetermined number of impressions or a preordained length of time (e.g., after 10,000 impressions or two weeks). It is crucial that these goalposts be established before starting your experiment. If you accidentally happen to view your results before these points have been passed, resist the urge to act upon what you see or even to designate these premature observations as “facts” in your own mind.
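To convince yourself of how badly peeking misbehaves, you can simulate an A/A test (two identical variants) and check for significance after every batch of impressions. This is only an illustrative sketch with made-up traffic numbers, but it shows how checking repeatedly produces “significant” results far more often than the nominal 5%:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def peeking_false_positive_rate(runs=2000, batches=20, batch_size=500, rate=0.01):
    """Simulate A/A tests (identical variants), peeking after every batch.
    Returns the share of runs that ever looked 'significant' at p < .05."""
    fooled = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        for _ in range(batches):
            conv_a += rng.binomial(batch_size, rate)
            conv_b += rng.binomial(batch_size, rate)
            n += batch_size
            p_pool = (conv_a + conv_b) / (2 * n)
            if p_pool in (0.0, 1.0):
                continue                      # no conversions yet, nothing to test
            se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
            z = (conv_a / n - conv_b / n) / se
            if 2 * norm.sf(abs(z)) < 0.05:    # looks "significant" at this peek
                fooled += 1
                break
    return fooled / runs

print(peeking_false_positive_rate())  # typically well above the nominal 5%
```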
Epiphany #5: “Relative” improvement matters, not “absolute” improvement
Look at the following data:
- 60s hippy photo: 14% conversion rate
- 80s rocker photo: 30% conversion rate
After applying a statistical significance test, we would see that the 80s rocker photo outperforms the 60s hippy photo in a statistically significant way. (The numerical details aren't relevant for my point so I've left them out.) But we need to be careful about what business benefit these results imply, lest we misinterpret our findings.
Our first instinct upon seeing the above data would be to interpret it as proving that the 80s rocker photo converted at a 16% higher rate than the 60s hippy photo, where 16% is the difference by subtraction between the two conversion rates (30% – 14% = 16%).
But calculating the conversion rate difference as an absolute change (rather than a relative change) would lead us to understate the magnitude of the improvement. In fact, if your business achieved the above results, a switch from the incumbent 60s hippy pic to the new 80s rocker pic would cause you to more than double your number of conversions, and, all things being equal, you would, as a result, also double your revenue. (Specifically, you would have a 114% improvement, which I calculated by dividing the improvement in conversion rates, 16%, by the old conversion rate, 14%.) Because relative changes in conversion rates are what matter most to our businesses, we should convert absolute changes to relative ones, then seek out the optimizations that provide the greatest improvements in these impactful terms.
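Here's the arithmetic spelled out (using the conversion rates from the example; the revenue claim assumes everything else stays equal):

```python
old_rate, new_rate = 0.14, 0.30   # 60s hippy vs. 80s rocker conversion rates

absolute_gain = new_rate - old_rate               # 0.16, i.e. 16 percentage points
relative_gain = (new_rate - old_rate) / old_rate  # about 1.14, i.e. a 114% improvement

print(f"Absolute: {absolute_gain:.0%}, relative: {relative_gain:.0%}")
# With the same traffic, conversions (and, all else being equal, revenue)
# end up at new_rate / old_rate, about 2.14 times what they were before.
```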
Epiphany #6: “Statistically insignificant” does not imply that the opposite result is true
What exactly does it mean when some result is statistically insignificant? Recall the goat-versus-pony example from earlier: it has a p-value of approximately .15 for the claim that the Miniature Goat photo is superior, making such a conclusion statistically insignificant.
Does the lack of statistical significance imply that there is a full reversal of what we have observed? In other words, does the statistical insignificance mean that the “Miniature Pony” variant is, despite its lower recorded conversion rate, actually better at converting than the “Miniature Goat” variant?
No, it does not-not in any way.
All that the failure to find statistical significance says here is that we cannot be confident that the goat variant is better than the pony one. In fact, our best guess is that the goat is better. Based on the data we've observed so far, there is an approximately 85% chance that this claim is true (1 minus the p-value of .15, i.e., .85). The issue is that we cannot be confident of this claim's truth to the degree dictated by our chosen p-value, that is, to the minimum level of certainty we wanted to have.
One way to intuitively understand this idea is to think of any recorded conversion rate as having its own margin of error. The pony variant was recorded as having a .1% conversion rate in our experiment, but its confidence interval might be (using made-up figures for clarity) .06% above or below this recorded rate (i.e., the true conversion rate would be between .04% and .16%). Similarly, the confidence interval of the goat variant might be .15% above or below the recorded .3% (i.e., the true value would be between .15% and .45%). Given these margins of error, there exists the possibility that the pony's true conversion rate lies at the high end (.16%) of its margin of error, whereas the goat's true conversion rate lies at its low end (.15%). This would cause a reversal in our conclusions, with the pony outperforming the goat. But in order for this reversal to happen, we would have had to take the most extreme possible values for our margins of error, and in opposite directions to boot. In reality, these extreme values would be fairly unlikely to turn up, which is why we say that it's more likely that the goat photo is better.
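If you want to put rough numbers on those margins of error, here's a sketch using a simple normal-approximation (Wald) interval. The exact widths differ from the made-up figures above, but the takeaway is the same: the two ranges overlap heavily, so we cannot yet be confident the goat really is better.

```python
from math import sqrt
from scipy.stats import norm

def wald_interval(conversions, impressions, confidence=0.95):
    """Rough (Wald) confidence interval around a recorded conversion rate."""
    p = conversions / impressions
    z = norm.ppf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / impressions)
    return max(0.0, p - margin), p + margin

print(wald_interval(1, 1000))  # pony: recorded 0.1%, plausibly anywhere from ~0% to ~0.3%
print(wald_interval(3, 1000))  # goat: recorded 0.3%, plausibly anywhere from ~0% to ~0.6%
```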
Epiphany #7: Any tests that are run consecutively rather than in parallel will give bogus results
Statistical significance requires that our samples (observations) be randomized such that they fairly represent the underlying reality. Imagine walking into a Republican convention and polling the attendees about who they will vote for in the next US presidential election. Nearly everyone in attendance is going to say “the Republican candidate”. But it's self-evident that the views of the people at that convention are hardly reflective of America as a whole. More abstractly, you could say that your sample doesn't reflect the overall group you are studying. The way around this conundrum is randomization in choosing your sample. In our example above, the experimenter should have polled a much broader section of American society (e.g., by questioning people on the street or by polling people listed in the telephone directory). This would cause the idiosyncrasies in voting patterns to even out.
If you ever catch yourself comparing the results of two advertising campaigns that ran one after the other (e.g., on consecutive days/weeks/months), stop right now. This is a really, really bad idea, one that will drain every last ounce of statistical validity from your analyses. This is because your experiments are no longer randomly sampled. Following this experimental procedure is the logical equivalent of extrapolating America's political preferences after only asking attendees of a Republican convention.
To see why, imagine you are a gift card retailer who observed that 4,000% as many people bought Christmas cards the week before Christmas compared to the week after. You would be a fool if you concluded that the dramatic difference in conversion rates between these two periods was because the dog photo you advertised with during the week preceding Christmas was 40 times better at converting than the cat photo used the following week. The real reason for the staggering difference is that people only buy Christmas cards before Christmas.
Put more generally, commercial markets contain periodic variation-ranging in granularity from full-blown seasonality to specific weekday or time of day shopping preferences. These periodic forces can sometimes fully account for observed differences in conversion rates between two consecutively run advertising campaigns, as happened with the Christmas card example above. The most reliable way to insulate against such contamination is to run your test variants at the same time as one another, as opposed to consecutively. This is the only way to ensure a fair fight and generate the data necessary to answer the question 'which advert variant is superior?' As far as implementation details go, you can stick your various variants into an A/B testing framework. This will randomly display your different ads, and once the experiment ends you simply tally up the results.
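By way of illustration, here's a toy sketch of the random, simultaneous assignment an A/B testing framework handles for you (the variant names and “true” rates are made up purely to simulate traffic):

```python
import random

random.seed(7)

# Toy sketch of what an A/B testing framework does for you: every impression is
# randomly assigned to a variant, so both variants face the same seasonality,
# competition and time-of-day effects.
results = {"pony": [0, 0], "goat": [0, 0]}   # [impressions, conversions] per variant
true_rates = {"pony": 0.001, "goat": 0.003}  # unknown in real life; used here only to simulate

for _ in range(10_000):                        # visitors arriving over time
    variant = random.choice(["pony", "goat"])  # the random, parallel assignment
    results[variant][0] += 1
    if random.random() < true_rates[variant]:
        results[variant][1] += 1

print(results)  # tally up at the END of the preordained run, then test for significance
```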
Perhaps you are thinking, “My market isn't affected by seasonality, so none of this applies to me”. I strongly doubt that you are immune to seasonality, but for argument's sake let's assume your conviction is correct. In this case, I would still argue that you have a blind spot in that you are underestimating the temporally varying effect of competition. There is no way for you to predict whether your competitors will switch on adverts for a massive sale during one week only to turn them off during the next, thereby skewing the hell out of your results. The only way to protect yourself against this (and other) time-dependent contaminants is to run your variants in parallel.
Conclusion
Having been enlightened by the seven big epiphanies for understanding statistical significance, you should now be better equipped to roll up your sleeves and dig into statistical significance testing from a place of comfortable understanding. Your days of opening up Google AdWords reports and trawling for results are over; instead, you'll methodically set up parallel experiments, let them run their course, choose your desired trade-off between certainty and experimental time, allow adequate sample sizes for your expected conversion rate differences, and calculate business impact in terms of relative revenue differences. You will no longer be fooled by randomness.
About the Author: Jack Kinsella, author of Entreprenerd: Marketing for Programmers.