Validation and Formatting
This chapter presents recipes for validating and formatting common types of user input, and gives a practical way to validate and format the values we usually meet in real-world development.
Validate Email Addresses
Problem
How do you check whether a given email address is valid input?
Solution
Simple
/^\S+@\S+$/i
Simple, with restrictions on characters
/^[A-Z0-9+_.-]+@[A-Z0-9.-]+$/i
Simple, with all valid local part characters
/^[A-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[A-Z0-9.-]+$/i
No leading, trailing, or consecutive dots
/^[A-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\.[A-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@[A-Z0-9-]+(?:\.[A-Z0-9-]+)*$/i
Top-level domain has two to six letters
/^[\w!#$%&'*+/=?`{|}~^-]+(?:\.[\w!#$%&'*+/=?`{|}~^-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$/i
Discussion
If you thought something as conceptually simple as validating an email address would have a simple one-size-fits-all regex solution, you're quite wrong. This recipe is a prime example that before you can start writing a regular expression, you have to decide exactly what you want to match. There is no universally agreed-upon rule as to which email addresses are valid and which are not. It depends on your definition of valid.
Allowing invalid addresses to slip through may be preferable to annoying people by blocking valid addresses.
But if you want to avoid sending too many undeliverable emails while still blocking as few real email addresses as possible, the regex in "Top-level domain has two to six letters" is a good choice. Note that some newer top-level domains are longer than six letters, so you may need to raise the upper bound.
In short, the right regex depends on what you want to accept.
To build a complicated regex, work step by step: define a basic structure first, such as /^\S+@\S+$/i, and then refine each part.
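As a rough sketch of the step-by-step approach, the snippet below checks one address against the loosest and the strictest regex from the Solution. The names LOOSE_EMAIL, STRICT_EMAIL, and checkEmail are made up for this illustration and are not part of any library.

```javascript
// Loosest structure: something, an @ sign, something.
const LOOSE_EMAIL = /^\S+@\S+$/;

// Strictest variant from the Solution: no leading, trailing, or
// consecutive dots, and a top-level domain of two to six letters.
const STRICT_EMAIL = /^[\w!#$%&'*+\/=?`{|}~^-]+(?:\.[\w!#$%&'*+\/=?`{|}~^-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$/i;

// Hypothetical helper: report how an address fares against both regexes.
function checkEmail(address) {
  return {
    loose: LOOSE_EMAIL.test(address),
    strict: STRICT_EMAIL.test(address),
  };
}

console.log(checkEmail('john.doe@example.com'));
console.log(checkEmail('john..doe@example.com'));
```

Comparing the two results for the same input is a simple way to see what each refinement step actually rejects.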
Validate and Format Chinese Phone Numbers
Problem
Chinese phone numbers are written in several formats, including: 12345678901, 123-4567-8901, 123 4567 8901, +8612345678901, +86-123-4567-8901, (+86) 123 4567 8901, and so on. If the phone number is valid, you may want to convert it into a standard format: (+86) 123 4567 8901
Solution
subject.replace(/(?:\(?\+86\)?)?[\s-]*(\d{3})[\s-]*(\d{4})[\s-]*(\d{4})/g, '(+86) $1 $2 $3');
Discussion
In China, a phone number in this format always starts with the digit 1, so to validate more strictly, you can use a variation like /(?:\(?\+86\)?)?[\s-]*(1\d{2})[\s-]*(\d{4})[\s-]*(\d{4})/g. However, this regex also matches inside a run of more than 11 digits, which is not actually a Chinese phone number. To address this, restrict what may come before and after the number: /([^(\d]+)(?:\(?\+86\)?)?[\s-]*(1\d{2})[\s-]*(\d{4})[\s-]*(\d{4})(?=[^\d]+?)/g, with the corresponding replacement $1(+86) $2 $3 $4. Note that, as written, this version requires at least one non-digit character on each side of the number, so it does not match a number at the very start or end of the subject string.
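When the whole input is expected to be a single phone number, anchoring the regex is a simpler way to reject inputs with extra digits. The following is a minimal sketch under that assumption; CN_PHONE and formatChinesePhone are hypothetical names chosen for this example.

```javascript
// Anchored variant of the regex above: optional +86 prefix (with or
// without parentheses), then 3-4-4 digits starting with 1.
const CN_PHONE = /^(?:\(?\+86\)?)?[\s-]*(1\d{2})[\s-]*(\d{4})[\s-]*(\d{4})$/;

// Returns the number in the standard format, or null if it is invalid.
function formatChinesePhone(input) {
  const match = CN_PHONE.exec(input.trim());
  if (!match) return null;
  return `(+86) ${match[1]} ${match[2]} ${match[3]}`;
}

console.log(formatChinesePhone('+86-138-1234-5678'));
console.log(formatChinesePhone('138123456789')); // 12 digits
```

Returning null rather than the unchanged input makes it easy for the caller to tell a rejected value from one that was already in the standard format.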
Validate and Format North American Phone Numbers
Problem
What about North American phone numbers? Match 1234567890, 123-456-7890, 123.456.7890, 123 456 7890, (123) 456 7890, and convert them all into a standard format: (123) 456-7890.
Solution
subject.replace(/\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})/g, '($1) $2-$3');
Discussion
If you want to limit matches to valid phone numbers according to the North American Numbering Plan, here are the basic rules:
- Area codes start with a number 2–9, followed by 0–8, and then any third digit.
- The second group of three digits, known as the central office or exchange code, starts with a number 2–9, followed by any two digits.
- The final four digits, known as the station code, have no restrictions.
So the regex according to these rules should be: /\(?([2-9][0-8][0-9])\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})/g.
Besides, you can also allow an optional, leading "1" for the country code: /(?:\+?1[-. ]?)?\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})/g.
To allow matching phone numbers that omit the local area code, enclose the first group of digits together with its surrounding parentheses and following separator in an optional, non-capturing group: /(?:\(?([0-9]{3})\)?[-. ]?)?([0-9]{3})[-. ]?([0-9]{4})/g.
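The variations above can be combined. The sketch below, which assumes the whole input is one phone number, applies the numbering-plan rules together with the optional country code; NANP_PHONE and formatNanpPhone are illustrative names only.

```javascript
// Optional +1 / 1 country code, area code [2-9][0-8][0-9],
// exchange code [2-9][0-9]{2}, and an unrestricted station code.
const NANP_PHONE = /^(?:\+?1[-. ]?)?\(?([2-9][0-8][0-9])\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})$/;

// Returns "(123) 456-7890" style output, or null for invalid input.
function formatNanpPhone(input) {
  const m = NANP_PHONE.exec(input.trim());
  return m ? `(${m[1]}) ${m[2]}-${m[3]}` : null;
}

console.log(formatNanpPhone('+1 (415) 555-2671'));
console.log(formatNanpPhone('123-456-7890')); // area code starts with 1
```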
Validate International Phone Numbers
Problem
And what if we want to match international phone numbers, such as the Chinese number +86 123 4567 8901?
Solution
function isValidate(phone) { return /^\+(?:[0-9] ?){6,14}[0-9]$/.test(phone); }
Discussion
The rules and conventions used to print international phone numbers vary significantly around the world, so it's hard to provide meaningful validation for an international phone number unless you adopt a strict format. Fortunately, there is a simple, industry-standard notation specified by ITU-T E.123, and the regular expression above follows it. If you want to follow the notation specified by the Extensible Provisioning Protocol (EPP) instead, you can use the following regex: /^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$/
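To make the difference between the two notations concrete, here is a small sketch that reports which of them an input follows. The function name phoneNotation and its return values are assumptions made for this example.

```javascript
// E.123-style: "+" followed by 7 to 15 digits, optionally space-separated.
const E123_PHONE = /^\+(?:[0-9] ?){6,14}[0-9]$/;
// EPP-style: "+CCC.NNNNNNNNNN" with an optional "x" extension.
const EPP_PHONE = /^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$/;

// Returns 'E.123', 'EPP', or null when neither notation matches.
function phoneNotation(phone) {
  if (E123_PHONE.test(phone)) return 'E.123';
  if (EPP_PHONE.test(phone)) return 'EPP';
  return null;
}

console.log(phoneNotation('+86 123 4567 8901'));
console.log(phoneNotation('+86.12345678901x123'));
```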
Validate Traditional Date Formats
Problem
Create a regex to match dates in the traditional formats: mm/dd/yy, mm/dd/yyyy, dd/mm/yy, and dd/mm/yyyy.
Solution
Solution 1: Match any of these date formats, allowing leading zeros to be omitted:
/^[0-3]?[0-9]\/[0-3]?[0-9]\/(?:[0-9]{2})?[0-9]{2}$/
Solution 2: Match any of these date formats, requiring leading zeros:
/^[0-3][0-9]\/[0-3][0-9]\/(?:[0-9][0-9])?[0-9][0-9]$/
Solution 3: Match m/d/yy and mm/dd/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:
/^(1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2}$/
Solution 4: Match mm/dd/yyyy, requiring leading zeros:
/^(1[0-2]|0[1-9])\/(3[01]|[12][0-9]|0[1-9])\/[0-9]{4}$/
Solution 5: Match d/m/yy and dd/mm/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:
/^(3[01]|[12][0-9]|0?[1-9])\/(1[0-2]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2}$/
Solution 6: Match dd/mm/yyyy, requiring leading zeros:
/^(3[01]|[12][0-9]|0[1-9])\/(1[0-2]|0[1-9])\/[0-9]{4}$/
Solution 7: Match any of these date formats with greater accuracy, allowing leading zeros to be omitted:
/^(?:(1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])\/(1[0-2]|0?[1-9]))\/(?:[0-9]{2})?[0-9]{2}$/
Solution 8: Match any of these date formats with greater accuracy, requiring leading zeros:
/^(?:(1[0-2]|0[1-9])\/(3[01]|[12][0-9]|0[1-9])|(3[01]|[12][0-9]|0[1-9])\/(1[0-2]|0[1-9]))\/[0-9]{4}$/
Discussion
You might think that something as conceptually trivial as a date should be an easy job for a regular expression. But it isn't, for two reasons:
- Because dates are such an everyday thing, humans are very sloppy with them.
- Regular expressions work character by character rather than dealing directly with numbers.
The ^ and $ anchors make the regexes above suitable for validating a whole input. If you instead want to find dates inside a larger body of text, don't use ^ and $; use word boundaries in a variation like: /\b(1[0-2]|0[1-9])\/(3[01]|[12][0-9]|0[1-9])\/[0-9]{4}\b/.
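As an illustration of searching rather than validating, the following sketch uses the word-boundary variation with the g flag to collect every mm/dd/yyyy date in a string. The helper name findDates is hypothetical.

```javascript
// Word boundaries instead of ^ and $, plus the g flag to find all matches.
const DATE_IN_TEXT = /\b(1[0-2]|0[1-9])\/(3[01]|[12][0-9]|0[1-9])\/[0-9]{4}\b/g;

// Returns an array of every date found, or an empty array.
function findDates(text) {
  return text.match(DATE_IN_TEXT) || [];
}

console.log(findDates('Shipped 08/30/2008, due 13/01/2009 and 12/25/2008.'));
```

In this sample, 13/01/2009 is skipped because 13 is not a valid month in the mm/dd/yyyy reading.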
Validate Traditional Date Formats, Excluding Invalid Dates
Problem
How do you weed out invalid dates, such as February 31st?
Solution
One solution is to use code to validate what you have captured, but if you want to use only one regular expression, you can create a complex one like this:
/^(?:(0?2)\/([12][0-9]|0?[1-9])|(0?[469]|11)\/(30|[12][0-9]|0?[1-9])|(0?[13578]|1[02])\/(3[01]|[12][0-9]|0?[1-9]))\/((?:[0-9]{2})?[0-9]{2})$/
Its parts represent, in order:
February (up to 29 days, regardless of the year):
(0?2)\/([12][0-9]|0?[1-9])
30-day months:
(0?[469]|11)\/(30|[12][0-9]|0?[1-9])
31-day months:
(0?[13578]|1[02])\/(3[01]|[12][0-9]|0?[1-9])
Year:
((?:[0-9]{2})?[0-9]{2})
Discussion
Because the result is such a complex regular expression, it's recommended to filter out invalid dates in code rather than build a regex like this. If you do want to build it, analyze the cases first and use | to separate them.
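The code-based alternative might look like the sketch below: a simple regex captures month, day, and a four-digit year, and plain code checks the day against the length of the month, including leap years. The names MDY_DATE, daysInMonth, and isRealDate are assumptions for this example.

```javascript
// Capture month, day, and four-digit year from m/d/yyyy or mm/dd/yyyy.
const MDY_DATE = /^(1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/([0-9]{4})$/;

// Number of days in a month, using the Gregorian leap-year rule.
function daysInMonth(year, month) {
  if (month === 2) {
    const leap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
    return leap ? 29 : 28;
  }
  return [4, 6, 9, 11].includes(month) ? 30 : 31;
}

// True only if the text has the right shape and names a real date.
function isRealDate(text) {
  const m = MDY_DATE.exec(text);
  if (!m) return false;
  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = Number(m[3]);
  return day <= daysInMonth(year, month);
}

console.log(isRealDate('02/29/2008'));
console.log(isRealDate('02/31/2008'));
```

Unlike the single-regex solution, this version also rejects February 29th in non-leap years.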
Validate Traditional Time Formats
Problem
How do you validate times in various traditional time formats, such as hh:mm and hh:mm:ss in both 12-hour and 24-hour formats?
Solution
Hours and minutes, 12-hour clock:
/^(1[0-2]|0?[1-9]):([0-5]?[0-9])( ?[AP]M)?$/
Hours and minutes, 24-hour clock:
/^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$/
Hours, minutes and seconds, 12-hour clock:
/^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])( ?[AP]M)?$/
Hours, minutes and seconds, 24-hour clock:
/^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$/
Discussion
Validating times is considerably easier than validating dates. Every hour has 60 minutes, and every minute has 60 seconds. This means we don't need any complicated alternations in the regex.
If you want to find all the times in a larger body of text, you can use a regular expression like this: /\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b/g.
Validate ISO 8601 Dates and Times
Problem
Match dates and/or times in the official ISO 8601 format, which is the basis for many standardized date and time formats.
Solution
Dates
Match YYYY-MM-DD or YYYYMMDD but not YYYY-MMDD or YYYYMM-DD:
/^([0-9]{4})(-?)(1[0-2]|0[1-9])\2(3[01]|0[1-9]|[12][0-9])$/
Match ordinal dates like 2008-243:
/^([0-9]{4})-?(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$/
Weeks
Match weeks of the year such as 2008-W35:
/^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])$/
Match week dates like 2008-W35-6:
/^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])-?([1-7])$/
Times
Match hours and minutes, with an optional colon (:):
/^(2[0-3]|[01][0-9]):?([0-5][0-9])$/
Match hours, minutes, and seconds like 17:21:59, with optional colons (:):
/^(2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])$/
Time zone designator (e.g., Z, +07 or +07:00) with optional colons and minutes:
/^(Z|[+-](?:2[0-3]|[01][0-9])(?::?(?:[0-5][0-9]))?)$/
Hours, minutes, and seconds with time zone designator (e.g., 17:21:59+07:00) with optional colons and minutes:
/^(2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])(Z|[+-](?:2[0-3]|[01][0-9])(?::?(?:[0-5][0-9]))?)$/
Date and Times
Calendar date with hours, minutes, and seconds (e.g., 2008-08-30 17:21:59 or 20080830 172159) with required spaces between the date and the time, but optional hyphens and colons:
/^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])$/
A more complicated solution is needed if we want to match date and time values that specify either all of the hyphens and colons, or none of them. The first alternative requires every separator, and the second allows none:
/^(?:([0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])|([0-9]{4})(1[0-2]|0[1-9])(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9])([0-5][0-9])([0-5][0-9]))$/
XML Schema dates and times
Date, with optional time zone (e.g., 2008-08-30 or 2008-08-30+07:00) but required hyphens:
/^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$/
Time, with optional fractional seconds and time zone (e.g., 01:45:36 or 01:45:36.123+07:00):
/^(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$/
Date and time, with optional fractional seconds and time zone (e.g., 2008-08-30T01:45:36 or 2008-08-30T01:45:36.123Z):
/^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$/
Discussion
ISO 8601 defines a wide range of date and time formats. The regular expressions presented here cover the most common formats, but most systems that use ISO 8601 only use a subset. For example, in XML Schema dates and times, the hyphens and colons are mandatory. To make hyphens and colons mandatory, simply remove the question marks after them. To disallow hyphens and colons, remove the hyphens and colons along with the question mark that follows them.
None of the regexes here attempts to exclude invalid day and month combinations, such as February 31st. To do this, consider using code to filter the matches.
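The calendar date regex from the Solution relies on the \2 backreference so that the second separator must repeat whatever the first one matched: a hyphen or nothing. The sketch below shows that behavior; parseIsoDate is a hypothetical helper name.

```javascript
// Group 2 captures the first separator ("-" or ""); \2 forces the
// second separator to be the same.
const ISO_DATE = /^([0-9]{4})(-?)(1[0-2]|0[1-9])\2(3[01]|0[1-9]|[12][0-9])$/;

// Returns { year, month, day } as numbers, or null if the shape is wrong.
// It does not check that the day exists in the given month.
function parseIsoDate(text) {
  const m = ISO_DATE.exec(text);
  if (!m) return null;
  return { year: Number(m[1]), month: Number(m[3]), day: Number(m[4]) };
}

console.log(parseIsoDate('2008-08-30'));
console.log(parseIsoDate('20080830'));
console.log(parseIsoDate('2008-0830')); // mixed separators are rejected
```

Because the parts come back as numbers, a follow-up check in code for impossible dates such as February 31st is straightforward to add.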
Limit Input to Alphanumeric Characters
Problem
How do you limit users' responses to one or more alphanumeric English characters (letters A–Z and a–z, and digits 0–9)?
Solution
/^[A-Za-z0-9]+$/
Discussion
When we want to limit the input to ASCII characters, we can use regular expressions like /^[\x00-\x7F]+$/.
Or limit input to ASCII non-control characters and line breaks by using /^[\n\r\x20-\x7E]+$/.
Or limit input to shared ISO-8859-1 and Windows-1252 characters by using /^[\x00-\x7F\xA0-\xFF]+$/.
Limit the Length of Text
Problem
How do you test whether a string is composed of between 1 and 10 letters from A to Z?
Solution
/^[A-Z]{1,10}$/
Discussion
If you want to limit the length of an arbitrary pattern, consider using a positive lookahead at the beginning of the pattern to ensure that the string is within the target length range, like: /^(?=[\S\s]{1,10}$)[\S\s]*/. It is important that the $ anchor appears inside the lookahead, because the maximum length test works only if we ensure that there are no more characters after we've reached the limit.
If you want a regex to match any string that contains between 10 and 100 non-whitespace characters: /^\s*(?:\S\s*){10,100}$/. By default, \s matches all Unicode whitespace, and \S matches everything else.
Or if you want to limit the number of words: /^\W*(?:\w+\b\W*){10,100}$/. In JavaScript, \w only matches the ASCII characters A-Z, a-z, 0-9, and _, which means that it cannot correctly count words that contain non-ASCII letters and numbers. If you do want to count words that contain such characters, a possible workaround is to reframe the regex to count whitespace rather than word character sequences: /^\s*(?:\S+(?:\s+|$)){10,100}$/. In many cases, this will work the same as the previous solutions, although it's not exactly equivalent. For example, one difference is that compounds joined by a hyphen, such as "far-reaching", will now be counted as one word instead of two. The same applies to words with apostrophes, such as "don't".
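The difference between the two word-counting regexes is easiest to see on a sample. This sketch builds both patterns with adjustable limits; the function names and the min and max parameters are assumptions made for the example.

```javascript
// Lookahead-based length limit from the discussion (1 to 10 characters).
const ONE_TO_TEN_CHARS = /^(?=[\S\s]{1,10}$)[\S\s]*/;

// Counts runs of \w characters, so hyphens and apostrophes split words.
function wordCountByWordChars(text, min, max) {
  return new RegExp(`^\\W*(?:\\w+\\b\\W*){${min},${max}}$`).test(text);
}

// Counts whitespace-separated chunks, so "far-reaching" is one word.
function wordCountByWhitespace(text, min, max) {
  return new RegExp(`^\\s*(?:\\S+(?:\\s+|$)){${min},${max}}$`).test(text);
}

const sample = "far-reaching plans don't fail";
console.log(wordCountByWhitespace(sample, 4, 4)); // four chunks
console.log(wordCountByWordChars(sample, 6, 6));  // far, reaching, plans, don, t, fail
console.log(ONE_TO_TEN_CHARS.test('abcdefghijk')); // 11 characters
```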
Limit the Number of Lines in Text
Problem
How do you check whether a string is composed of five or fewer lines, without regard for how many total characters appear in the string?
Solution
/^(?:[^\r\n]*(?:\r\n?|\n)){0,4}[^\r\n]*$/
Discussion
We can't simply omit the trailing [^\r\n]* class and change the preceding quantifier to {0,5}, because then the text would have to end with a line break to match at all. So long as the last line was empty, it would also allow matching six lines, since six lines are separated by five line breaks. That's not good.
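A quick way to convince yourself of this is to run the regex against strings with different numbers of line breaks. The following is a small sketch; hasAtMostFiveLines is a name invented for the example.

```javascript
// Up to four "line + line break" pairs, then one final line without a break.
const MAX_FIVE_LINES = /^(?:[^\r\n]*(?:\r\n?|\n)){0,4}[^\r\n]*$/;

function hasAtMostFiveLines(text) {
  return MAX_FIVE_LINES.test(text);
}

console.log(hasAtMostFiveLines('a\nb\nc\nd\ne'));    // five lines
console.log(hasAtMostFiveLines('a\nb\nc\nd\ne\n'));  // fifth break starts a sixth, empty line
console.log(hasAtMostFiveLines('a\r\nb\rc'));        // mixed line-break styles
```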
Validate Affirmative Responses
Problem
How to check a configuration option or command-line response for a positive value? For example, you want to provide some flexibility in the accepted responses, so that true, t, yes, y, okay, ok, and 1 are all accepted in any combination of uppercase and lowercase.
Solution
/^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$/i
Validate ZIP Codes
Problem
How do you validate a ZIP code (U.S. postal code)? For example, match 12345 and 12345-6789.
Solution
/^[0-9]{5}(?:-[0-9]{4})?$/
Validate Canadian Postal Codes
Problem
What about Canadian postal codes?
Solution
/^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$/
Validate U.K. Postal Codes
Problem
What about U.K. postal codes?
Solution
/^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$/
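The three postal code regexes can be used from one lookup table, as in the sketch below. The country keys and the isValidPostalCode helper are assumptions for this example, and all three patterns expect uppercase input.

```javascript
// One pattern per country, taken from the three recipes above.
const POSTAL_PATTERNS = new Map([
  ['US', /^[0-9]{5}(?:-[0-9]{4})?$/],
  ['CA', /^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$/],
  ['UK', /^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$/],
]);

// Returns false for unknown countries as well as for invalid codes.
function isValidPostalCode(country, code) {
  const pattern = POSTAL_PATTERNS.get(country);
  return pattern ? pattern.test(code) : false;
}

console.log(isValidPostalCode('US', '12345-6789'));
console.log(isValidPostalCode('CA', 'K1A 0B1'));
console.log(isValidPostalCode('UK', 'SW1A 1AA'));
```

If lowercase input should be accepted, one option is to call toUpperCase() on the code before testing it.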
Reformat Names From "FirstName LastName" to "LastName, FirstName"
Problem
How to convert people's names from the "FirstName LastName" format to "LastName, FirstName" for use in an alphabetical listing? Besides, names may contain a suffix, which is one of the values "Jr", "Jr.", "Sr", "Sr.", "II", "III", or "IV", with an optional preceding comma.
Solution
function formatName(name) { return name.replace(/^(.+?) ([^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$/i, '$2, $1$3'); }
Validate Password Complexity
Problem
Suppose you're tasked with ensuring that any passwords chosen by users meet a set of complexity requirements.
Solution
Here are several code examples that show how to validate passwords against complexity requirements:
Length between 8 and 32 characters
function validate(password) { return /^[\s\S]{8,32}$/.test(password); }
ASCII visible and space characters only
function validate(password) { return /^[\x20-\x7E]+$/.test(password); }
One or more uppercase letters
function validate(password) { return /[A-Z]/.test(password); }
One or more lowercase letters
function validate(password) { return /[a-z]/.test(password); }
One or more numbers
function validate(password) { return /[0-9]/.test(password); }
One or more special characters
function validate(password) { return /[ !"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]/.test(password); }
Disallow three or more sequential identical characters like 111111
function validate(password) { return !/([\s\S])\1\1/.test(password); }
Discussion
Using JavaScript to validate passwords in a web browser can be very beneficial for users, but make sure to also implement validation on the server, because users can disable JavaScript or use custom scripts to circumvent client-side validation.
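In practice the individual checks are usually applied together. The sketch below is one possible way to combine them so that the same rule list could run in the browser and, assuming a JavaScript server such as Node.js, on the server as well. PASSWORD_RULES, passwordProblems, and the message texts are illustrative and not taken from any library.

```javascript
// Each rule pairs a user-facing message with one of the checks above.
const PASSWORD_RULES = [
  { message: 'must be 8 to 32 characters long', test: (p) => /^[\s\S]{8,32}$/.test(p) },
  { message: 'must use only visible ASCII characters and spaces', test: (p) => /^[\x20-\x7E]+$/.test(p) },
  { message: 'needs an uppercase letter', test: (p) => /[A-Z]/.test(p) },
  { message: 'needs a lowercase letter', test: (p) => /[a-z]/.test(p) },
  { message: 'needs a digit', test: (p) => /[0-9]/.test(p) },
  { message: 'must not repeat a character three times in a row', test: (p) => !/([\s\S])\1\1/.test(p) },
];

// Returns the messages of all failed rules; an empty array means valid.
function passwordProblems(password) {
  return PASSWORD_RULES
    .filter((rule) => !rule.test(password))
    .map((rule) => rule.message);
}

console.log(passwordProblems('Str0ng pass'));
console.log(passwordProblems('aaa'));
```

Returning every failed rule at once, rather than stopping at the first, lets a form show the user everything that needs fixing in a single pass.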