Tutorial 14 - Regular Expressions

Tutorial 13 - Getting The Bugs Out - Tutorial 15 - User-Defined Objects

Regular expressions are a shorthand notation for matching, extracting, sorting or formatting strings. Their most common use is to reduce the amount of work involved in validating data input. This tutorial covers the special syntax used, how to use a regular expression in form validation, and several useful examples.

Escape Characters

Escape characters are used to represent literally those characters that normally have a special meaning in regular expressions (i.e. meta-characters). They are also used to represent characters that cannot be typed directly.
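As a small illustration of escaping, the sketch below contrasts an unescaped period (a meta-character) with an escaped one, and uses \t for a tab. The variable names and sample strings are assumptions made for this example only.

```javascript
// Unescaped, the period is a meta-character that matches any single character.
const anyChar = /^a.c$/;
// Escaped with a backslash, the period matches only a literal period.
const literalDot = /^a\.c$/;
// \t stands for the tab character, which is awkward to type directly.
const hasTab = /\t/;

const results = {
  anyCharMatchesAbc: anyChar.test("abc"),
  literalDotMatchesAbc: literalDot.test("abc"),
  literalDotMatchesADotC: literalDot.test("a.c"),
  tabFound: hasTab.test("name\tvalue"),
};
console.log(results);
```

Only the escaped version rejects "abc", because the backslash removes the special meaning of the period.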
Character Classes

Character classes are a shorthand that reduces the amount of typing needed when creating a regular expression. For example, \w matches any letter, digit or the underscore character.
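To make the shorthand concrete, the sketch below compares \w with the explicit class it abbreviates in JavaScript (without the u flag). The sample strings and variable names are invented for illustration.

```javascript
// \w is shorthand for the explicit class [A-Za-z0-9_].
const shorthand = /^\w+$/;
const longhand = /^[A-Za-z0-9_]+$/;

// Hypothetical sample entries.
const samples = ["user_01", "user-01", "abc", "a b"];
const comparison = samples.map((s) => ({
  value: s,
  shorthand: shorthand.test(s),
  longhand: longhand.test(s),
}));
console.log(comparison);
```

Both forms accept and reject the same samples; the dash and the space fall outside the class.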
Boundary Matches and Greedy Quantifiers
Regular Expression Modifiers

Regular expression modifiers change how the entire expression is applied. They are placed at the end of the expression, outside the delimiting slashes, as in /[abc]+/i
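The effect of a modifier can be seen by running the same expression with and without it. This sketch uses the i modifier from the example above and also the g (global) modifier, which the text does not name explicitly; the sample string is made up.

```javascript
// Without a modifier, matching is case sensitive.
const plain = /[abc]+/;
// The i modifier makes the match case insensitive.
const ignoreCase = /[abc]+/i;
// Adding g applies the expression to every match, not just the first.
const everyMatch = /[abc]+/gi;

const text = "ABC xyz cab";
const plainFirst = plain.exec(text)[0]; // first lowercase run only
const ignoreCaseFirst = ignoreCase.exec(text)[0]; // capitals now count
const allMatches = text.match(everyMatch); // array of every run
console.log(plainFirst, ignoreCaseFirst, allMatches);
```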
Testing Regular Expressions

Several on-line sites let you test a regular expression before using it in your own scripts. A good one to use is provided by Locher.

Using Regular Expressions in Scripts

To use a regular expression for validating an entry in JavaScript, first set up a variable that contains the expression.

    var re = /whatever/;

Then apply the regular expression test method to the string to be tested.

    if (re.test(entryValue)) { return true; }
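A minimal sketch of the validation steps just described, wrapped in a function. The rule (one to five digits) and the function name isValidEntry are hypothetical, chosen only to give the test method something to check.

```javascript
// Hypothetical rule for illustration: the entry must be one to five digits.
const re = /^\d{1,5}$/;

// Returns true when the entry passes the pattern, false otherwise.
function isValidEntry(entryValue) {
  return re.test(entryValue);
}

console.log(isValidEntry("123"), isValidEntry("12a"));
```

Returning the result of test directly gives the caller both outcomes, so a form handler could use it to decide whether to show an error message.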
To use a regular expression to extract a matching string, first set up a regular expression variable as above. Next apply the regular expression exec method to the string. A match is returned as an array, and null indicates no match.

    var ar = re.exec(var_string);

To use a regular expression for modifying a string in JavaScript, first set up a regular expression variable as above. Next use the string replace method. Note that you can use back references such as $1 if required.

    var x = y.replace(re, "$1");

Example - Canadian Postal Codes

Canadian postal codes alternate between letters and digits, as in L0S 1E0. Some people choose to leave out the space. Not every letter is used as the first letter, which designates the region. The region letters are (from east to west, then north): A,B,C,E,G,H,J,K,L,M,N,P,R,S,T,V,X,Y. A 'first version' regular expression for Canadian postal codes is:

    /^([a-z]\d){3}$/i
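The exec and replace methods can be tried against the first-version postal code pattern. In this sketch the second pattern, halves, and the idea of reformatting a code by inserting a space are assumptions added to show a back reference at work; they are not part of the tutorial's own expressions.

```javascript
// The 'first version' postal code pattern from the text.
const re = /^([a-z]\d){3}$/i;

// exec returns an array on a match: element 0 is the whole match, and
// element 1 is the last repetition captured by the parenthesised group.
const ar = re.exec("L0S1E0");
// exec returns null when there is no match; this version rejects the space.
const noMatch = re.exec("L0S 1E0");

// Hypothetical formatting step: capture the two halves of the code and
// rebuild it with a space, using the back references $1 and $2.
const halves = /^([a-z]\d[a-z])(\d[a-z]\d)$/i;
const formatted = "l0s1e0".replace(halves, "$1 $2").toUpperCase();

console.log(ar[0], ar[1], noMatch, formatted);
```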
The first-version expression makes sure that there are exactly three groups ({3}) of a letter [a-z] followed by a digit \d. The i suffix makes the match case insensitive (i.e. capitals are allowed). The ^ and $ guarantee that no other data is provided. However, this easy-to-understand expression does not allow for an optional space after the third character, nor does it restrict the first letter to the valid subset. It also does not allow for leading or trailing whitespace. The solution is to write out the repetition explicitly, place a (\s)? to check for zero or one space after the third character, and reduce the matches on the first letter to the specific region letters.

    /^\s*[a-ceghj-npr-tvxy]\d[a-z](\s)?\d[a-z]\d\s*$/i

Example - URLs and Files

Often validation of a URL or filename requires a specific extension. One regular expression that accepts any name ending in a common Web image extension (and more!) is:

    /^\S+\.(gif|jpg|jpeg|png)$/

The above expression matches only names that end in a standard Web image extension. It is not foolproof, as it permits subfolders with null names such as a//b.gif and specs like a:/b:/c.gif

Example - E-mail Addresses

E-mail addresses are of the form xxx@yyy where xxx is the specific mailbox (and can contain underscores and periods) and yyy is the domain, which can end in a series of suffixes such as .com.uk. One regular expression that matches the vast majority of valid entries is:

    /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/
This is a very complex expression and deserves explanation. Regular expression literals start and end with forward slashes to differentiate them from ordinary strings. Most validation expressions anchor the match at the first character with ^ and at the last with $. First we match the mailbox name, which can include periods and dashes: \w+ states that one or more word characters (letters, digits or the underscore) must be at the start of the name. ([\.-]?\w+)* allows periods or dashes to be included in the mailbox name, with the trailing \w+ ensuring that those characters cannot finish the name. The @ is the mandatory separator. The domain name can have several .xx or .xyz suffixes such as .com.uk. Once again \w+ ensures that the domain starts with a word character and ([\.-]?\w+)* allows for the dashes and periods. Finally (\.\w{2,3})+ ensures that there is at least one suffix of two or three characters preceded by a period.

Note: This is not a completely foolproof validation, as it does not account for newer domain suffixes of four or more characters. Also, not all two- and three-letter combinations are legitimate domains!
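The three example patterns from this tutorial can be exercised side by side. In the sketch below the patterns are copied from the text, while the helper function check and all of the sample strings are assumptions made for illustration.

```javascript
// The three example patterns, copied from the tutorial text.
const postalCode = /^\s*[a-ceghj-npr-tvxy]\d[a-z](\s)?\d[a-z]\d\s*$/i;
const imageFile = /^\S+\.(gif|jpg|jpeg|png)$/;
const email = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// Hypothetical helper: apply one pattern to a list of sample strings and
// return an object mapping each sample to true (match) or false (no match).
function check(pattern, samples) {
  const outcome = {};
  for (const sample of samples) {
    outcome[sample] = pattern.test(sample);
  }
  return outcome;
}

const postalResults = check(postalCode, [
  "L0S 1E0",
  "l0s1e0",
  "D0S 1E0", // rejected: D is not a region letter
]);
const imageResults = check(imageFile, [
  "photo.jpg",
  "a//b.gif", // accepted, showing the null-subfolder weakness
  "notes.txt",
]);
const emailResults = check(email, [
  "john.smith@example.com",
  "user@example.info", // rejected: the suffix has four characters
  "john..smith@example.com", // rejected: two periods in a row
]);
console.log(postalResults, imageResults, emailResults);
```

The rejected .info address shows the limitation mentioned in the note: widening \w{2,3} would be one way to accept longer suffixes, at the cost of a looser check.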