How to validate an email address?
January 28th, 2010
Having worked on various web projects, I often run into a well-known problem: finding an effective regular expression (regexp) to check the validity of email addresses submitted by users.
On his blog, Fighting for a lost cause, Ian Dunn has compiled various regular expressions that try to address this problem. His idea is a good one: by running a set of valid and invalid addresses through a simple unit test, he provides a useful comparison of some of the most widely used regexps.
His philosophy is simple: “It’s better to accept a few invalid addresses than reject any valid ones, so I’m looking for 0 false-positives and as few false-negatives as possible.”
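Dunn’s method boils down to counting two kinds of errors: valid addresses a regex rejects (which he wants at zero) and invalid addresses it accepts (which he tolerates but keeps low). Here is a minimal Python sketch of such a harness; the function name `score`, the sample lists and the naive pattern are hypothetical illustrations, not Dunn’s actual test code or data.

```python
import re


def score(pattern: str, valid: list[str], invalid: list[str]) -> dict[str, int]:
    """Count how a candidate regex performs against hand-picked address lists."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Valid addresses the regex rejects: the errors Dunn wants to keep at zero.
    valid_rejected = sum(1 for addr in valid if regex.fullmatch(addr) is None)
    # Invalid addresses the regex lets through: tolerated, but kept as low as possible.
    invalid_accepted = sum(1 for addr in invalid if regex.fullmatch(addr) is not None)
    return {"valid_rejected": valid_rejected, "invalid_accepted": invalid_accepted}


# Hypothetical sample lists; Ian Dunn's real test set is not reproduced here.
VALID = ["john.doe@example.com", "o'reilly@example.org"]
INVALID = ["john.doe@", "@example.com", "john..doe@example.com"]

# A deliberately naive pattern, used only to exercise the harness.
print(score(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", VALID, INVALID))
```

Here the naive pattern rejects no valid address but lets `john..doe@example.com` through, which is exactly the kind of trade-off Dunn’s philosophy accepts.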
However, I noticed two problems:
- His “best” regexp doesn’t work in JavaScript (at the time of writing, JavaScript regexes don’t support advanced features such as negative lookbehind).
- The method used to validate IP addresses is incorrect: it doesn’t enforce the 0-255 range for each of the four numbers.
So I decided to improve a different existing regex, originally created by Warren Gaebel and already enhanced by Guillaume Arluison, by adding one more test criterion: checking the “real” validity of IP addresses, so that each of the four numbers must be between 0 and 255.
Here is my solution:
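To see why the IP check matters, the Python sketch below compares a naive digits-only pattern with the octet alternatives used in the solution (`1?\d{1,2}`, `2[0-4]\d` and `25[0-5]`, with the redundant `{1}` quantifiers dropped). The names `OCTET`, `STRICT_IPV4` and `NAIVE_IPV4` are illustrative only.

```python
import re

# Octet alternatives taken from the regex: 0-199 (two-digit forms with a
# leading zero such as "09" also pass), 200-249, and 250-255.
OCTET = r"(?:1?\d{1,2}|2[0-4]\d|25[0-5])"
STRICT_IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

# A naive pattern that only counts digits, for comparison.
NAIVE_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

for ip in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "999.999.999.999"]:
    # fullmatch: the whole string must be an IP address, nothing more.
    print(ip, bool(NAIVE_IPV4.fullmatch(ip)), bool(STRICT_IPV4.fullmatch(ip)))
```

Both patterns accept `192.168.0.1` and `255.255.255.255`, but only the naive one accepts `256.1.1.1` or `999.999.999.999`.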
/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9]([-a-z0-9_]?[a-z0-9])*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z]{2})|([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3})(:[0-9]{1,5})?$/i
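For server-side use, the same pattern can be compiled with Python’s standard `re` module. This is a sketch: the names `EMAIL_RE` and `is_valid_email` are hypothetical, the regex is copied unchanged apart from removing the JavaScript `/.../i` delimiters, and `re.IGNORECASE` stands in for the `/i` flag.

```python
import re

# The first regex, split across adjacent raw strings for readability;
# concatenated, they reproduce the original pattern exactly.
EMAIL_RE = re.compile(
    r"^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*"
    r"@([a-z0-9]([-a-z0-9_]?[a-z0-9])*(\.[-a-z0-9_]+)*"
    r"\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z]{2})"
    r"|([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3})"
    r"(:[0-9]{1,5})?$",
    re.IGNORECASE,  # equivalent of the /i flag
)


def is_valid_email(address: str) -> bool:
    # fullmatch rather than match: Python's "$" also matches just before a
    # trailing newline, so match() would accept "john.doe@example.com\n".
    return EMAIL_RE.fullmatch(address) is not None


for address in ["john.doe@example.com", "user@192.168.0.1", "user@256.1.1.1"]:
    print(address, is_valid_email(address))
```

In JavaScript, `$` without the `m` flag only matches at the very end of the string, so the original literal needs no such adjustment there.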
This one works very well: it accepts all 18 valid addresses (with the deeper IP address check on top) and rejects 19 of the 20 invalid ones. The one it misses comes down to overall length, which the regex does not check.
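One way to cover the missing length check is to test it separately. The sketch below assumes the limits commonly cited from RFC 5321 (64 characters before the `@`, 254 for the whole address); the function and constant names are hypothetical, and these numbers are an assumption, not part of the original regex.

```python
# Assumed limits, as commonly cited from RFC 5321: at most 64 characters
# before the "@" and at most 254 characters for the whole address.
MAX_LOCAL_PART = 64
MAX_ADDRESS = 254


def within_length_limits(address: str) -> bool:
    """Length test meant to run alongside the regex match."""
    if len(address) > MAX_ADDRESS:
        return False
    local, sep, _domain = address.rpartition("@")
    # rpartition returns ("", "", address) when there is no "@" at all.
    return bool(sep) and 0 < len(local) <= MAX_LOCAL_PART


print(within_length_limits("john.doe@example.com"))
print(within_length_limits("a" * 65 + "@example.com"))
```

Call it alongside the regex and accept the address only if both pass. A lookahead such as `(?=.{1,254}$)` right after the leading `^` would be another option; keeping the check outside the regex keeps an already long pattern readable.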
There’s just one small problem: each time a new TLD longer than two characters is created, you’ll need to append it to the list in the regex. If you want a more generic solution, you can use this variant (note that it does not check whether the TLD actually exists):
/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9]([-a-z0-9_]?[a-z0-9])*(\.[-a-z0-9_]+)*\.([a-z]{2,6})|([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3})(:[0-9]{1,5})?$/i
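To illustrate the trade-off, the Python sketch below compiles the generic variant (copied unchanged, with `re.IGNORECASE` in place of `/i`; the name `GENERIC_EMAIL_RE` is hypothetical) and tries it on a few addresses.

```python
import re

# The generic variant: the TLD is any run of 2 to 6 letters.
GENERIC_EMAIL_RE = re.compile(
    r"^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*"
    r"@([a-z0-9]([-a-z0-9_]?[a-z0-9])*(\.[-a-z0-9_]+)*\.([a-z]{2,6})"
    r"|([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})(\.([1]?\d{1,2}|2[0-4]{1}\d{1}|25[0-5]{1})){3})"
    r"(:[0-9]{1,5})?$",
    re.IGNORECASE,
)

for address in ["user@example.com", "user@example.abc", "user@example.example"]:
    # fullmatch also rejects a trailing newline, which Python's "$" would allow.
    print(address, GENERIC_EMAIL_RE.fullmatch(address) is not None)
```

With these inputs, `user@example.com` and `user@example.abc` are accepted (the list-based regex would reject the latter, since `abc` is neither listed nor two letters long), while `user@example.example` is rejected because `{2,6}` caps the last label at six letters.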
Both solutions should work in any language whose regex engine supports Perl-style syntax, whether PCRE (Perl Compatible Regular Expressions) itself or a compatible flavor, on both the server and the client side (JavaScript, PHP, Perl, Python, Ruby, etc.).
Various solutions exist. I recently gave an internal presentation on the Prelude SIM (Security Information Management) system, a project I have contributed to. It’s an open-source solution that lets you monitor your infrastructure in real time by correlating events from deployed sensors such as Snort (IDS), Samhain (file system integrity checker) or Prelude-LML (log analyzer), helping you react quickly to a potential attack.