I was wondering how email providers like Gmail and Yahoo detect spam and mark it as such. How do they decide which mail is spam and which is not?

I just want to know how it works in theory.

Any help is appreciated.


I may not be entirely accurate, but some of the signals are:

  • hidden text in the e-mail body (HTML)
  • missing or undefined mail headers (legitimate mail gives proper information such as the From, To and Reply-To addresses, Subject, etc.)
  • language, grammar, etc.
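To make the list above concrete, here is a minimal sketch of a rule-based scorer. It is loosely in the spirit of filters like SpamAssassin, which add up per-rule scores and compare the total against a threshold. The rule names, weights, threshold, regex and the "shouting subject" check (a crude stand-in for language analysis) are all hypothetical and chosen only for illustration, not what any real provider uses.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical weights and threshold, for illustration only.
WEIGHTS = {
    "hidden_html_text": 2.5,
    "missing_header": 1.0,
    "shouting_subject": 1.5,
}
THRESHOLD = 4.0

# Headers we (hypothetically) expect every legitimate message to carry.
REQUIRED_HEADERS = ("From", "To", "Subject", "Date", "Message-ID")

# Crude pattern for inline CSS that hides text from the human reader.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|font-size\s*:\s*0|visibility\s*:\s*hidden", re.I
)


def html_parts(msg: Message) -> list[str]:
    """Return the decoded text of every text/html part of the message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return parts


def spam_score(raw: str) -> tuple[float, list[str]]:
    """Score a raw RFC 822 message; return (score, names of rules that fired)."""
    msg = message_from_string(raw)
    score, hits = 0.0, []

    # Signal 1: hidden text in the HTML body.
    if any(HIDDEN_STYLE.search(html) for html in html_parts(msg)):
        score += WEIGHTS["hidden_html_text"]
        hits.append("hidden_html_text")

    # Signal 2: missing standard headers.
    for name in REQUIRED_HEADERS:
        if msg[name] is None:
            score += WEIGHTS["missing_header"]
            hits.append(f"missing_{name.lower()}")

    # Signal 3: stand-in for language checks, a mostly upper-case
    # or exclamation-heavy subject line.
    subject = msg["Subject"] or ""
    letters = [c for c in subject if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    if upper_ratio > 0.7 or subject.count("!") >= 3:
        score += WEIGHTS["shouting_subject"]
        hits.append("shouting_subject")

    return score, hits


sample = (
    "From: promo@example.com\n"
    "Subject: FREE MONEY!!!\n"
    "Content-Type: text/html\n"
    "\n"
    '<p>Hello</p><div style="display:none">cheap pills</div>\n'
)
score, hits = spam_score(sample)
print(score, hits, score >= THRESHOLD)
# → 7.0 ['hidden_html_text', 'missing_to', 'missing_date', 'missing_message-id', 'shouting_subject'] True
```

No single rule decides the outcome; it is the accumulated score that crosses the threshold, which makes the filter tolerant of one odd-looking but legitimate feature.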

Mailchimp has a couple of good articles about the topic:

I would search for research papers on this topic, for example,

Unfortunately, people usually don’t blog about it.

The workings of SpamAssassin are a good starting point for research. The big mail providers also have the advantage of being able to see when the same email has been sent to a large pool of their users.
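The last point, seeing the same message arrive for many users, can be sketched as fingerprint counting: normalize each body so trivial per-recipient variations disappear, hash it, and count distinct recipients per hash. The normalization rules, the `BulkDetector` class and the threshold of 3 are hypothetical; real providers' methods are not public. Also note that volume alone is not proof of spam (legitimate newsletters are bulk mail too), so a real system would presumably combine this with other signals.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical threshold: flag a body once this many distinct recipients got it.
BULK_THRESHOLD = 3


def fingerprint(body: str) -> str:
    """Hash a normalized body so trivial variations map to the same key."""
    text = body.lower()
    text = re.sub(r"\d+", "0", text)          # collapse numbers (IDs, tracking codes)
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class BulkDetector:
    """Counts distinct recipients per body fingerprint (in memory only)."""

    def __init__(self, threshold: int = BULK_THRESHOLD) -> None:
        self.threshold = threshold
        self.recipients: dict[str, set[str]] = defaultdict(set)

    def observe(self, recipient: str, body: str) -> bool:
        """Record one delivery; return True once the body looks like bulk mail."""
        key = fingerprint(body)
        self.recipients[key].add(recipient.lower())
        return len(self.recipients[key]) >= self.threshold


detector = BulkDetector()
body = "Hi there, claim your prize #12345 now!"
# Each recipient gets a slightly different number, which normalization removes.
results = [
    detector.observe(f"user{i}@example.com", body.replace("12345", str(i)))
    for i in range(4)
]
print(results)  # [False, False, True, True]
```

An exact hash is easy to evade by rewording, which is why fuzzier similarity measures would likely be needed in practice; the sketch only shows the counting idea.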