Regular Expression Syntax

Regular expressions are a powerful (and sometimes complicated) tool, and a complete treatment of their capabilities and applications goes well beyond this primer. Instead, this primer covers the rules and some commonly used expressions that you must understand in order to read a regular expression, including:

Characters

The following parts of a regular expression let you match specific characters and character classes.

Matches

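To specify that a certain character or character set can be matched, use the following options: a literal character matches itself, a dot ( . ) matches any single character, and square brackets ( [ ] ) match any one character from the enclosed set or range. The Matches and Does Not Match lists in the examples treat a value as matched only when the entire value fits the expression.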

Example: 'a' matches the character 'a'.

Example: a..d

Matches: 'abcd', 'a12d', and 'aaad'

Does Not Match: 'abbb', 'abcde', or 'ba1d'

Example: [ae]

Matches: 'a' and 'e'

Does Not Match: 'b', 'c', or 'd'

Example: [12345]

Matches: '1', '2', '3', '4', and '5'

Does Not Match: '0', '6', '7', '8', or '9'

Example: [a-e]

Matches: 'a', 'b', 'c', 'd', and 'e'

Does Not Match: '1' or 'f'

Example: [1-5]

Matches: '1', '2', '3', '4', and '5'

Does Not Match: '0', '6', '7', '8', or '9'

Example: [a-z][a-z0-9][a-z0-9]

Matches a lowercase letter followed by two more characters, each of which is a lowercase letter or a digit

Matches: 'a11', 'bt9', and 'xyz'

Does Not Match: '1ab' or 'abc1'
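
The character examples above can be checked with a short script. This is a minimal sketch, assuming Python's standard `re` module and whole-value matching through `re.fullmatch`; the product this primer belongs to may use a different regex engine, and the helper name `matches` is hypothetical.

```python
import re

def matches(pattern, value):
    """Return True when the entire value fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, value) is not None

# Dot: each '.' stands for exactly one character of any kind.
print(matches(r"a..d", "a12d"))    # True
print(matches(r"a..d", "abcde"))   # False: one character too many

# Set and range: exactly one character from the bracketed list.
print(matches(r"[ae]", "e"))       # True
print(matches(r"[a-e]", "f"))      # False: 'f' is outside the range

# Three positions: a letter, then two letters or digits.
print(matches(r"[a-z][a-z0-9][a-z0-9]", "bt9"))   # True
print(matches(r"[a-z][a-z0-9][a-z0-9]", "1ab"))   # False: starts with a digit
```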

Classes

To specify that the text must be of a certain class, use the following options.

\d indicates a digit character is required.

\D indicates a character that is not a digit is required.

Example: '\d' matches the number '5' in the value '5 = V'.

Example: '\D' matches each of the spaces, the equal sign '=', and the letter 'V' in the value '5 = V'.

\w indicates a word character (a letter, digit, or underscore) is required.

\W indicates a character that is not a word character is required.

Example: '\w' matches the number '5' and the letter 'V' in the value '5 = V'.

Example: '\W' matches each of the spaces and the equal sign '=' in the value '5 = V'.

\s indicates a whitespace character (such as a space or tab) is required.

\S indicates a character that is not whitespace is required.

Example: '\s' matches the space between the two words in the value 'this is'.

Example: '\S' matches the character 't' in the value ' this is'.
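
To see which characters each class picks out, the values from the examples above can be run through a regex engine. The sketch below assumes Python's standard `re` module; other engines may treat non-ASCII characters differently, but the results for these plain values should be the same.

```python
import re

value = "5 = V"

# findall returns every character in the value that the class matches.
digits = re.findall(r"\d", value)        # ['5']
non_digits = re.findall(r"\D", value)    # [' ', '=', ' ', 'V']
word_chars = re.findall(r"\w", value)    # ['5', 'V']
non_word = re.findall(r"\W", value)      # [' ', '=', ' ']

# search returns the first match only.
first_space_index = re.search(r"\s", "this is").start()   # 4
first_non_space = re.search(r"\S", " this is").group()    # 't'

print(digits, non_digits, word_chars, non_word)
print(first_space_index, first_non_space)
```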

Repetition Modifiers

To specify that a certain character or character set can be matched more than once, you can add repetition modifiers after the character or character set. The common repetition modifiers used in the examples below are + (one or more), ? (zero or one), * (zero or more), and {n} (exactly n).

Example: [a-z]+[0-9]

Matches one or more lowercase letters, followed by a single digit

Matches: 'abc1', 'a1', and 'abcxyz9'

Does Not Match: '9a', 'a22', or 'a-4'

Example: [a-z]+[0-9]?

Matches one or more lowercase letters, optionally followed by a single digit

Matches: 'abc', 'abc2', 'x', 'y3', and 'abcdefgh8'

Does Not Match: '9', 'a99', or 'a-4'

Example: .*

Any number (0 or more, due to the '*') of any character (due to the '.')

Will match any text

Matches: 'a1b2c3', 'aaaa', and '111'

Example: [0-9]{5}-[0-9]{4}

Simple ZIP+4 code matcher (five digits, a dash, then four digits)

Matches: '12345-0123'

Does Not Match: '12345' or '6789012'
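
The repetition examples above can be verified in the same way. This is a sketch under the same assumptions as the lists themselves: whole-value matching, here done with `re.fullmatch` from Python's standard `re` module. The helper name `matches` is hypothetical.

```python
import re

def matches(pattern, value):
    """Return True when the entire value fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, value) is not None

# '+' requires one or more of the preceding set.
print(matches(r"[a-z]+[0-9]", "abcxyz9"))   # True
print(matches(r"[a-z]+[0-9]", "a22"))       # False: two digits, only one allowed

# '?' makes the preceding set optional (zero or one).
print(matches(r"[a-z]+[0-9]?", "abc"))      # True
print(matches(r"[a-z]+[0-9]?", "a99"))      # False

# '*' allows zero or more, so '.*' accepts even an empty value.
print(matches(r".*", ""))                   # True

# '{n}' requires exactly n repetitions.
print(matches(r"[0-9]{5}-[0-9]{4}", "12345-0123"))   # True
print(matches(r"[0-9]{5}-[0-9]{4}", "12345"))        # False
```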

Groups and Ranges

Grouping with parentheses does not change what the examples below match; it is used to store certain matched sets of characters for use later, typically in a substitution.

Example: ([0-9]{5})-([0-9]{4})

Stores the first five digits in group 1

Stores the last four digits in group 2

The dash ( - ) character is not stored because it is not inside parentheses

Example: $1

Use the content from group 1

With the expression '([0-9]{5})-([0-9]{4})' and value '12345-0123' the results are $1 = '12345' and $2 = '0123'
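
The group behavior described above can be sketched with Python's standard `re` module. One difference to note, stated here as an assumption about the reader's environment: Python writes a group reference in a replacement as `\1`, while some other engines (Java's, for example) use the `$1` form shown in this primer.

```python
import re

ZIP_PLUS_4 = r"([0-9]{5})-([0-9]{4})"

m = re.fullmatch(ZIP_PLUS_4, "12345-0123")
group_1 = m.group(1)   # first five digits
group_2 = m.group(2)   # last four digits; the dash is in neither group

# Substitution that reuses the stored groups in the opposite order.
swapped = re.sub(ZIP_PLUS_4, r"\2-\1", "12345-0123")

print(group_1, group_2, swapped)   # 12345 0123 0123-12345
```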

Example: [a-z]|[0-9]

Matches a single lowercase letter OR a single digit

Matches: 'c' and '8'

Does Not Match: 'X' or '22'
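
A short check of the 'or' example, again as a sketch that assumes whole-value matching with `re.fullmatch` from Python's standard `re` module:

```python
import re

LETTER_OR_DIGIT = r"[a-z]|[0-9]"

# Each value must fit one of the two alternatives in its entirety.
results = {
    value: re.fullmatch(LETTER_OR_DIGIT, value) is not None
    for value in ["c", "8", "X", "22"]
}

print(results)   # {'c': True, '8': True, 'X': False, '22': False}
```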

Escaping Special Characters

There are several characters that have special meanings, such as ?, . (dot), *, +, (, ), {, }, and more. This prompts the question: how can you match a literal instance of one of these characters? The key is to escape the special character by putting a backslash ( \ ) in front of it.

Example: \.

Matches a literal dot ( . ) instead of any character

Example: .*\..*

Matches any number of characters, followed by a period, followed by any number of characters

Used to match file names with extensions (e.g. abc123.pdf)

Also matches any other text that contains at least one period (e.g. 'Hello there. This is matched'), so it is not a strict file name check
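
The effect of escaping can be illustrated as follows. This is a sketch assuming Python's standard `re` module; `re.escape` is a Python convenience that adds the backslashes for you, and other engines may offer a differently named equivalent or none at all.

```python
import re

# Unescaped, the dot matches any character; escaped, only a literal dot.
any_char = re.fullmatch(r"a.c", "abc") is not None       # True
literal_miss = re.fullmatch(r"a\.c", "abc") is not None  # False
literal_hit = re.fullmatch(r"a\.c", "a.c") is not None   # True

# The file name expression requires at least one literal period.
FILE_WITH_EXT = r".*\..*"
has_ext = re.fullmatch(FILE_WITH_EXT, "abc123.pdf") is not None     # True
no_period = re.fullmatch(FILE_WITH_EXT, "no extension") is not None  # False

# re.escape builds a pattern that matches the given text literally.
literal_text = "1+1=2?"
escaped_ok = re.fullmatch(re.escape(literal_text), literal_text) is not None  # True

print(any_char, literal_miss, literal_hit, has_ext, no_period, escaped_ok)
```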

Anchors

The following characters anchor a match to the beginning or end of the value: the caret ( ^ ) marks the beginning, and the dollar sign ( $ ) marks the end.

Example: ^[1-5][0-9]$|^[0-9]$

Two alternatives are considered for matching, separated by the pipe ( | ) symbol, which indicates an 'or' condition: a value matches if it fits either alternative. The first alternative, ^[1-5][0-9]$, allows a first digit from 1 - 5 followed by a second digit from 0 - 9 (the numbers 10 through 59). The second alternative, ^[0-9]$, allows a single digit from 0 - 9. Together they match the whole numbers 0 through 59.
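
The anchored expression above can be tried out with a few values. This sketch assumes Python's standard `re` module and uses `re.search`, which looks for a match anywhere in the value, so the anchors are what force the whole value to fit. The helper name `is_0_to_59` is hypothetical.

```python
import re

ZERO_TO_59 = r"^[1-5][0-9]$|^[0-9]$"

def is_0_to_59(value):
    """Return True when the value is a whole number from 0 to 59 (hypothetical helper)."""
    return re.search(ZERO_TO_59, value) is not None

print(is_0_to_59("7"))     # True: second alternative
print(is_0_to_59("42"))    # True: first alternative
print(is_0_to_59("60"))    # False: first digit must be 1 - 5
print(is_0_to_59("123"))   # False: anchors reject the extra digit

# Without anchors, the same alternatives find a match inside '123'.
unanchored = re.search(r"[1-5][0-9]|[0-9]", "123") is not None
print(unanchored)          # True
```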

 

2019, Stibo Systems – Confidential