emailtester.net FAQ
Methodology
- Q: Are you testing domains or servers?
A: We're testing servers. The atomic unit
of testing is an IP address corresponding to a mail exchanger.
See the paper for details.
- Q: Do you test all servers in the MX
records, or just the one that is most preferred (lowest
preference value)?
A: We test them all, but our analysis examines only those
that are most preferred (lowest preference value), i.e. those
that could potentially have been chosen by some system on the
Internet.
- Q: What about MXs that resolve to
several IP addresses for load-balancing or multi-homing?
A: We expand each MX record to all of its IP addresses and test each of them.
- Q: What about multiple MXs with the
same preference?
A: We handle these and mark all MXs that
were most preferred as being primary during our testing round
(see the sketch below).
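To make the MX handling above concrete, here is a minimal sketch (not
our actual harness) of expanding a domain's MX records into testable IP
addresses and marking every MX tied at the lowest preference value as
primary. It assumes the Python dnspython package; the function name is
illustrative.

    import dns.resolver

    def expand_mxes(domain):
        # Fetch the MX records; the lowest preference value is most preferred.
        mxes = list(dns.resolver.resolve(domain, "MX"))
        best = min(rr.preference for rr in mxes)
        servers = []
        for rr in mxes:
            # Expand each MX name to all of its A records (load-balancing,
            # multi-homing); each IP address is an atomic unit of testing.
            for a in dns.resolver.resolve(str(rr.exchange), "A"):
                servers.append({"mx": str(rr.exchange),
                                "ip": a.address,
                                "primary": rr.preference == best})
        return servers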
- Q: What if the server under test doesn't
accept your test email?
A: We record the sequence of response codes
from the SMTP transaction. Only if all of the codes indicate
successful receipt do we track the email in our database for
analysis (a minimal sketch follows this answer).
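As a rough illustration of this acceptance check, the sketch below
(hypothetical, using Python's standard smtplib rather than our actual
prober) records the reply code at each step of the SMTP transaction and
reports success only when every code is in the 2xx range.

    import smtplib

    def probe(host, sender, rcpt, body):
        codes = []
        try:
            s = smtplib.SMTP(host, 25, timeout=30)
            codes.append(s.helo()[0])
            codes.append(s.mail(sender)[0])
            codes.append(s.rcpt(rcpt)[0])
            codes.append(s.data(body)[0])  # raises if DATA is refused outright
            s.quit()
        except smtplib.SMTPResponseException as e:
            codes.append(e.smtp_code)      # record the refusal code as well
        # Track the email for analysis only if every step succeeded.
        return codes, all(200 <= c < 300 for c in codes)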
- Q: Did you have proper DNS PTR records such that
your emails would not be rejected?
A: Yes. 235.0.26.18.in-addr.arpa. 1800 IN PTR fyodor.emailtester.net.
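(For the curious, the mapping can be checked from any machine; this
snippet assumes the record is still live.)

    import socket

    # Reverse-resolve the prober's address; 235.0.26.18.in-addr.arpa
    # corresponds to the IP address 18.26.0.235.
    hostname, _, _ = socket.gethostbyaddr("18.26.0.235")
    print(hostname)  # expected: fyodor.emailtester.net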
Analysis and Open Questions
- Q: Your research relies on bounces, do you believe that
the data presented is representative of real email?
A: That depends on the metric you consider. In the paper, we present
three types of metrics:
- Errors: We believe most of the errors observed (especially unable-to-connect
errors) are likely representative of true error rates. In
particular, for many of these errors, we never even have a chance to
supply the invalid email address.
- Latency: While our results need not be representative of true
latency, we believe they likely are.
- Loss: Our loss rates seem abnormally high, and at this point we
have no satisfactory explanation for them. Because this loss pattern was
observed across many independent domains, we believe there is some
interesting email behavior that merits further
investigation. We are currently working to understand what causes
the strange patterns of loss presented in the paper.
To date, we have found nothing to suggest that our bounce
results are unrepresentative.
- Q: Are you certain your server wasn't the
source of lost emails?
A: Fairly. We performed stress testing with email traffic
one order of magnitude higher than any load we expected
to experience. In addition, we maintain extensive logging of
each process to verify consistency. Finally, we see time
correlation of lost emails within the same site, but not between
different sites, indicating that the loss was in fact a
phenomenon specific to the remote site (a rough sketch of this
check follows).
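The last check is roughly the following (an illustrative sketch;
`losses` stands in for our log of lost-email events): bucket loss
events by hour and see whether they cluster within a site rather
than across sites.

    from collections import defaultdict

    def loss_hours(losses, bucket=3600):
        # losses: iterable of (site, unix_timestamp) for emails never answered
        by_site = defaultdict(set)
        for site, ts in losses:
            by_site[site].add(ts // bucket)
        # Buckets that cluster per site but are disjoint across sites
        # point to the remote site rather than to our prober.
        return by_site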
- Q: Do virtual servers explain the loss? That is,
an email server farm sitting behind a load-balancer that appears as a single IP address, but where only
some of the servers are configured to reply to bounces?
A: We try to check for this phenomenon by
examining and recording the greeting banners. Indeed, we
see virtualized servers, but loss from these domains is
not correlated with a single server (identified by a unique
greeting banner).
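A minimal version of this banner check (illustrative; real banners can
span multiple lines, which we ignore here) simply records the 220
greeting each IP sends on connect; seeing several distinct banners from
one IP over time reveals a farm behind a load balancer.

    import socket

    def greeting_banner(ip, port=25, timeout=30):
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            return sock.recv(1024).decode("ascii", "replace").strip()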
- Q: Couldn't a server be configured to respond
to bounces differently? Or perhaps differently when under load?
A: This certainly could be true. However, we
do not know of a real-world system or architecture that would explain
this. Recall that the confusing results are the losses: emails
for which we never get a reply. To explain our results, the
system would have to be set up such that, under load, the server
probabilistically drops some fraction of bounce replies (or never
generates the bounce message). If you believe that a large number of
MXes are configured this way, we'd love to hear from you.
- Q: Could spam filters explain the losses?
A: Again, this could be true, but we have yet to
find a real architecture or system that would do this. For example,
the spam filter could drop emails more aggressively when under load.
But such an approach is odd, would create confusing user-visible
behavior, and we don't know of a system that does this. Alternatively, the spam
filter could label us as spammers and decide not to let our emails
through. But this does not explain why only some of the emails get
through, nor why this behavior goes away. Recall also that
our emails differ only in the To: line (and, of course, the timestamp and
other appropriate headers); the bodies are always exactly the same.
Thus, it's unclear why, during periods of loss, only some of
the emails fail to receive replies.
- Q: What was the longest delayed email?
A: Approximately 34 days (!). While this may seem
abnormally high, as mentioned in the paper, one of the authors had a
personal email delayed for over 42 days.
- Q: Is there a correlation between loss or latency and the time of day?
A: There appears to be. We present some of this in the paper and are currently doing some more analysis.
- Q: You should examine the headers to
determine where the loss or latency is occurring.
A: We present some of this in the paper and are doing more analysis now.
- Q: You didn't answer my question / your research
is bunk / let me buy you a beer
A: Please read our ACM SIGCOMM CCR paper and see
if the answer is detailed there. If not, please contact us - we'd
love to hear explanations of some of the strange behavior
we witnessed.