Amazing! How to eliminate spam.
Oct. 8th, 2006 03:19 am
I had a nap and woke wanting to go back to sleep, but my mind, operating in that weird twilight zone between full wakefulness and sleep, has hit upon an amazingly simple way to eliminate spam.
I don't know why nobody has thought of this before -- it is so incredibly simple.
The big barrier to blocking spam is the fact that headers in email can be so easily forged. This has been the "reason" behind the push to eliminate anonymous communications on the net. In fact anonymity has been largely removed because when any communications travel through the net a record is kept of the machines it has passed through on its way to you. Most spam comes from only a small number of sources. Weak laws keep them from the clutches of those of us who would gladly strangle their sociopathic necks.
But we don't need a big-brother net devoid of privacy in order to flawlessly block spam. All we need is a simple two-step process of reception and verification. We already do this in most other internet communications. For example when you receive a packet of data from a web server your computer checksums the data to make sure it is uncorrupted. Your computer then sends a signal back to ask for the next one or to resend a damaged packet. A single web page can require several packets. All this happens without you knowing it.
If we extend a variation of this to email then we can get rid of spam. It goes like this:
An email arrives at your email server. Before passing the email on to you, the server looks up the "from" address and posts a verification packet back to that address with some unique identifier of the email (a checksum or some special purpose header). If the sender's computer returns a verified "OK" signal then your email server knows the address wasn't forged.
Using this simple technique, spammers' forged "from" addresses will never work again. If a spammer uses their real address in an attempt to get beyond this, then they can be easily blocked by anti-spam software, which can very quickly learn those addresses.
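The reception-and-verification step described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the `SendingServer` class, the `accept` function and the `servers` dictionary (standing in for however the receiving server would locate the sender's machine) are invented names, and SHA-256 is used as the "unique identifier" simply because it is in the standard library.

```python
import hashlib


def fingerprint(message: str) -> str:
    """Unique identifier for one email (SHA-256 is an assumption; any checksum would do)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class SendingServer:
    """Hypothetical mail server that remembers what it has sent."""

    def __init__(self) -> None:
        self.sent = set()

    def send(self, message: str) -> str:
        # Record the identifier of every outgoing email.
        self.sent.add(fingerprint(message))
        return message

    def verify(self, message_id: str) -> bool:
        """Answer a verification request: True means 'OK, I sent that'."""
        return message_id in self.sent


def accept(message: str, from_address: str, servers: dict) -> bool:
    """Receiving side: ask the server for the From domain before delivering."""
    domain = from_address.rsplit("@", 1)[-1]
    origin = servers.get(domain)
    if origin is None:
        return False  # nobody to ask, so treat the address as unverified
    return origin.verify(fingerprint(message))


example = SendingServer()
servers = {"example.com": example}  # stand-in for looking up the domain's mail server
genuine = example.send("Lunch on Friday?")

print(accept(genuine, "bah@example.com", servers))            # True: the server sent it
print(accept("Buy cheap pills", "bah@example.com", servers))  # False: forged From address
```

In a real system the `verify` call would be a network round trip rather than a method call, which is where time-outs and delays would come in.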
no subject
Date: 2006-10-08 06:03 am (UTC)
Also, a domain's mail server doesn't currently even need to know what addresses are valid. It's currently impossible to tell what the real addresses might be. For example, I send email out to a site with from address "bah@example.com" and another email gets sent out later from "humbug@example.com". Those addresses are only known to my mail client software. When the reply hits my domain's (hosted) mail server, the mail then gets forwarded to me (as the default catch-all email address) from which it either gets downloaded directly or gets forwarded to my private mail server, which then delivers them to "bah" and "humbug" respectively. The administrative overhead of updating the mail server's knowledge of email addresses would be a real pain.
no subject
Date: 2006-10-08 09:47 am (UTC)
It may be impractical to upgrade all the current email systems. (Though I'm not convinced of that.) But this is something else. Adoption would proceed apace as soon as everybody realised it stopped spam in its tracks. It is a new kind of email that would work slightly differently. It would also give us the opportunity to get rid of the stupid uuencoded attachments that live on from the old days of 7-bit exchanges. (This is why an attached file increases in size.)
Even if current email servers were simply modified instead of changing over to the new protocol, it wouldn't mean mail would take 3 times as long. It would take the same amount of time to verify an email's source as it currently takes to do the same with web packets.
The server doesn't need to know any addresses in this scheme. All it needs is to have recorded a CRC check or a special-purpose header from each email. When a verification request comes in it is matched and the OK packet is fired off. No need for addresses at all. If an email is delivered there then it was either the source or not. That's all that is needed.
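As a rough sketch of the record-and-match idea in the paragraph above, assuming a CRC-32 over the raw message (via Python's `zlib.crc32`) and a simple in-memory set; the function names and the "OK" / "NOT-SENT" replies are made up for illustration:

```python
import zlib

sent_crcs = set()  # hypothetical per-server record of outgoing mail


def record_outgoing(raw_message: bytes) -> int:
    """Store the CRC of an email at the moment it is sent."""
    crc = zlib.crc32(raw_message)
    sent_crcs.add(crc)
    return crc


def handle_verification(crc: int) -> str:
    """Match an incoming verification request against the record."""
    return "OK" if crc in sent_crcs else "NOT-SENT"


crc = record_outgoing(b"From: bah@example.com\r\n\r\nhello")
print(handle_verification(crc))      # OK: this CRC was recorded when the mail was sent
print(handle_verification(crc ^ 1))  # NOT-SENT: a CRC that was never recorded
```

Note that no list of valid addresses appears anywhere; the only state is the set of CRCs.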
no subject
Date: 2006-10-08 02:31 pm (UTC)
Yes, I guess you could just check the CRC. This would produce interesting problems. Let's say I send email from the mail server on machine X and that machine falls over. The other end asks for verification, and that goes to the secondary mail server, which, of course, knows nothing about the email that was sent at all. What does it do? If it requires a shared database of messages sent and CRCs between every mail server which may respond to email for a certain domain, this introduces hellish update problems.
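The machine-X scenario can be made concrete with a small simulation. This is a sketch only: the `MailServer` class and `verify_via_domain` function are hypothetical, and each server keeps its own CRC record with no sharing, which is exactly the situation being described.

```python
import zlib


class MailServer:
    """Hypothetical server whose sent-mail record is local only."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.sent_crcs = set()  # not shared with any other server

    def send(self, raw):
        self.sent_crcs.add(zlib.crc32(raw))

    def verify(self, crc):
        return crc in self.sent_crcs


def verify_via_domain(servers, crc):
    """Ask the first server that is up, as a request to the domain would."""
    for server in servers:
        if server.up:
            return server.verify(crc)
    return None  # no server reachable at all


primary = MailServer("X")
secondary = MailServer("backup")
message = b"From: bah@example.com\r\n\r\nhello"
primary.send(message)
crc = zlib.crc32(message)

print(verify_via_domain([primary, secondary], crc))  # True: machine X answers
primary.up = False
print(verify_via_domain([primary, secondary], crc))  # False: the secondary has no record
```

The second result is a genuine email being rejected, which is why some form of shared record between the servers would be needed.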
I think you'll find that SPF will achieve most of what you're suggesting, while keeping the existing email architecture.
no subject
Date: 2006-10-09 02:54 am (UTC)
I was really hoping someone had come up with what I'd suggested. Unfortunately it looks to me that SPF is mainly intended to avoid someone forging my email address in spam. But it doesn't help with all the rest of spam with forged addresses.
I can't see where SPF avoids forged headers. It seems to me a spammer could easily find your signing policy and forge it and the other headers. Or they could simply avoid the problem altogether by using totally fictitious headers for each batch of spam. As far as I can see, SPF fixes neither of these situations... it just makes it a little bit harder for spammers... and inconveniences users while it's doing it. And let's face it, most users would never create a signing policy.
If, however, we use the simple trick of sending a verification request back to the original "From" address to get a yay or nay response then it solves the whole problem. If the sender sent it, you get a yay; if not, then you don't. You don't need trusted keys or special ID strings on your server. It is really simple.
On the point of what happens if a machine falls over: well, if the database isn't safe then of course there will be problems. If a machine falls over while delivering conventional email it would be catastrophic, but that isn't a reason to dismiss conventional email as unworkable. I can find special case calamities in anything. Any failure will cause problems, but fall-back positions can almost always be found.
I still think the principle behind what I said above is sound.
no subject
Date: 2006-10-09 05:26 pm (UTC)
Like all methods, it needs wide usage to make any dent in spam. Some major ISPs and some sites are using it and setting headers, so that users can block some spam this way. LJ is using it: for example, the notification of this reply of yours to my Gmail account included this header:
Received-SPF: pass (google.com: domain of lj_notify@livejournal.com designates 204.9.177.18 as permitted sender)
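For anyone wanting to filter on that header, here is a minimal sketch using Python's standard `email` module. The `spf_result` function name is made up, and the sample message around the header (the Subject line and body) is invented for the example; only the Received-SPF line is the real one quoted above.

```python
from email import message_from_string

raw = (
    "Received-SPF: pass (google.com: domain of lj_notify@livejournal.com "
    "designates 204.9.177.18 as permitted sender)\n"
    "Subject: Reply to your post\n"  # hypothetical, for illustration only
    "\n"
    "body\n"
)


def spf_result(raw_message: str) -> str:
    """Return the first word of the Received-SPF header, or 'none' if absent."""
    header = message_from_string(raw_message).get("Received-SPF", "")
    words = header.split()
    return words[0].lower() if words else "none"


print(spf_result(raw))  # pass
```

A mail filter could then treat anything other than "pass" with more suspicion.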
At the risk of going around in circles, sending a verification request back to the original "From" address to get a yay or nay response "solves" the whole problem in a way which could take days! Email is not necessarily single hop, nor instantaneous, and the only way to get back to the original From address is by email.
The part about a machine falling over while sending email is not terribly relevant. Either the mail was queued for sending or it wasn't, and the user's mail client will inform them of this. If mail server to mail server communications break down, the email gets tried again later when the machines or their networking links are back up again. Nothing catastrophic at all. It's a store-and-forward system and it copes. However, you can't expect the originating system to be up and running when you are trying to receive the email. All you can guarantee will be running is the mail server you are currently communicating with. While a fall-back position could be found, any that I can think of (eg. sharing the list of all sent mail between all valid sending mail servers for one domain) have network bandwidth and admin overheads, plus lots of new code to be written and a database to be managed. For a company with large numbers of mail servers, it could be messy.
no subject
Date: 2006-10-08 12:47 pm (UTC)
Thanks for the reply. :)
I like it when I'm forced to think about unforeseen complications.
no subject
Date: 2006-10-08 06:49 am (UTC)
Many email readers can already detect that the header is probably forged, based on it being poorly formatted or inconsistent. Your approach takes things a bit further and at the cost of a relatively small extra message (though time-outs could be a problem.) It could effectively double the size of some stacks on mail-servers because there'd be the original transmission plus an extra round-trip message to confirm things. I think the biggest obstacle to this scheme is that take-up would be an all-or-nothing thing and it would require some re-engineering of email protocols. It could, however, work initially as an optional "wrapper" for email to and from the same servers/ISPs (eg: spammer@hotmail.com --> joe@hotmail.com). From there, it could be adopted by more and more ISPs.
Also, it won't stop all spam, only spam from forged email addresses. And what's to stop a compromised server from relaying "Yes, that email I just sent was genuine" to every request? :(
Combined with whitelisting (and comparing a log or signature of previous headers from known senders), you could use your approach to cut some cases of phishing and identity-theft. eg: this email claims to have come from damien.wise@blah.com, yet parts of the header are different from the established pattern of other emails from him...so I'll flag it as probably spam.
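A very rough sketch of that header-pattern comparison, assuming the "established pattern" is just the values of a couple of headers remembered from the first email seen from each sender. The choice of `X-Mailer` and `Return-Path`, the function names and the sample messages are all assumptions made for illustration:

```python
from email import message_from_string

TRACKED = ("X-Mailer", "Return-Path")  # assumed choice of headers to compare

known_patterns = {}  # sender address -> header values seen before


def pattern(msg):
    """Collect the tracked header values of one message."""
    return tuple(msg.get(name, "") for name in TRACKED)


def check(raw_message: str) -> str:
    msg = message_from_string(raw_message)
    sender = msg.get("From", "")
    seen = known_patterns.get(sender)
    if seen is None:
        known_patterns[sender] = pattern(msg)  # first contact: remember the pattern
        return "new sender"
    return "looks normal" if pattern(msg) == seen else "probably spam"


usual = "From: damien.wise@blah.com\nX-Mailer: Thunderbird\nReturn-Path: <damien.wise@blah.com>\n\nhi\n"
odd = "From: damien.wise@blah.com\nX-Mailer: BulkSender 3000\nReturn-Path: <x@elsewhere.invalid>\n\nhi\n"

print(check(usual))  # new sender
print(check(usual))  # looks normal
print(check(odd))    # probably spam
```

A real filter would need to tolerate legitimate changes, such as the sender switching mail clients, so a mismatch is better treated as a flag than as proof.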
no subject
Date: 2006-10-08 10:00 am (UTC)
Forged "From" addresses are the biggest problem with spam. If we got rid of that then 90% of email would dry up overnight. A server could only send verifications for email sent from it (which is what it should do). In such cases it takes only a very short time for filtering software to block that spam. This is exactly why most spam forges the "From" address.
It would be a great step toward crushing crooks who swindle people with phishing emails. The only crooked emails to get through would be from sub-morons and would have a valid return address for the police.