This article will help you to:

  • Understand what a regex is
  • Understand the special characters used in regex
  • Understand the advanced features of regex

Definition

A regex, or regular expression, is a sequence of characters that describes a pattern to search for in your data, such as a URL, a phone number or a date. Depending on how it is written, the pattern can be more or less restrictive.

Simple search

A regex engine compares the pattern with the text one character at a time. A regex matches a text if the whole text, or any part of it, satisfies every condition of the pattern.

Literal search

The regex Paris matches any text containing Paris, such as "Paris" or "France (Paris)".

The symbol | (pipe) creates an "OR" condition in a regex. Thus, France|EU Zone matches texts containing "France" or "EU Zone".
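The two patterns above can be tried out with a short sketch. It assumes Python's standard re module (the engine you actually use may differ in details), and the sample texts are made up for illustration.

```python
import re

# re.search scans the whole text and returns None when nothing matches.
literal = re.compile(r"Paris")
either = re.compile(r"France|EU Zone")

# Illustrative sample texts (assumptions, not taken from real data).
texts = ["Paris", "France (Paris)", "EU Zone", "Lyon"]

literal_hits = [t for t in texts if literal.search(t)]
either_hits = [t for t in texts if either.search(t)]

print(literal_hits)  # ['Paris', 'France (Paris)']
print(either_hits)   # ['France (Paris)', 'EU Zone']
```

Note that "France (Paris)" satisfies both patterns, because a regex only needs to match some part of the text.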

Grouped search

Parentheses group expressions in a block.

For example, (France|EU) Zone matches texts containing "France Zone" or "EU Zone".

Square brackets define a set of characters, any one of which may match. For example, [AbcZ] matches texts containing "A", "b", "c" or "Z". Conversely, [^AbcZ] matches any single character that is not in the set, so it matches "Abd" (because of the "d") but not "AbcZ".

Hyphens inside square brackets create character ranges. Thus, [0-5] matches a digit between 0 and 5. Similarly, [a-zA-Z0-9] matches any unaccented uppercase or lowercase letter or any digit.
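As a rough illustration of groups, sets and ranges, here is a sketch that assumes Python's re module. The sample strings are hypothetical.

```python
import re

# With parentheses, the alternation is limited to the group.
grouped_text = re.search(r"(France|EU) Zone", "France Zone").group(0)
# Without parentheses, the pattern means "France" OR "EU Zone".
ungrouped_text = re.search(r"France|EU Zone", "France Zone").group(0)

# A set matches one character from the list; ^ inside the brackets negates it.
in_set = re.findall(r"[AbcZ]", "AbcdeZ")
not_in_set = re.findall(r"[^AbcZ]", "AbcdeZ")

# Ranges: digits from 0 to 5, and a string made only of letters and digits.
low_digits = re.findall(r"[0-5]", "2024-06-17")
is_alnum = re.fullmatch(r"[a-zA-Z0-9]+", "Report2024") is not None

print(grouped_text)    # France Zone
print(ungrouped_text)  # France
print(in_set)          # ['A', 'b', 'c', 'Z']
print(not_in_set)      # ['d', 'e']
print(low_digits)      # ['2', '0', '2', '4', '0', '1']
print(is_alnum)        # True
```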

Metacharacters

Some characters, such as parentheses or question marks, have a special meaning for the regex engine and are not matched literally.

Note: To search for a special character literally, precede it with a backslash \. For example, \? matches a question mark.
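The effect of the backslash can be seen in the following sketch, which assumes Python's re module. The strings "3.5" and "305" are illustrative, and re.escape is a Python helper that may have no equivalent in other tools.

```python
import re

loose = re.compile(r"3.5")    # unescaped dot: any character between 3 and 5
strict = re.compile(r"3\.5")  # escaped dot: a literal "."

samples = ["3.5", "305"]
loose_hits = [t for t in samples if loose.search(t)]
strict_hits = [t for t in samples if strict.search(t)]

# re.escape adds the backslashes needed to search for a string literally.
escaped = re.escape("(Paris)")
found = re.search(escaped, "France (Paris)") is not None

print(loose_hits)   # ['3.5', '305']
print(strict_hits)  # ['3.5']
print(found)        # True
```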

Wildcards

  • The dot . matches any single character (in most engines, any character except a line break)
  • The expression \d matches a digit from 0 to 9
  • The expression \w matches a letter, a digit or an underscore _
  • The expression \s matches a whitespace character (space, tab or line break)
  • The expression \b matches a word boundary, that is, the beginning or the end of a word
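The wildcards listed above can be sketched as follows, assuming Python's re module. The sample sentences are made up for the example.

```python
import re

# Hypothetical sample text.
sample = "Order_42 shipped on 2024-06-17"

any_char = re.findall(r"c.t", "cat cot c-t ct")  # "." needs exactly one character
digits = re.findall(r"\d", "a1b22")
words = re.findall(r"\w+", sample)               # runs of letters, digits, "_"
pieces = re.split(r"\s", "one two\tthree")       # split on a space or a tab

# \b keeps "mail" from matching inside "email".
whole_word = re.search(r"\bmail\b", "send mail now") is not None
inside_word = re.search(r"\bmail\b", "send email now") is not None

print(any_char)     # ['cat', 'cot', 'c-t']
print(digits)       # ['1', '2', '2']
print(words)        # ['Order_42', 'shipped', 'on', '2024', '06', '17']
print(pieces)       # ['one', 'two', 'three']
print(whole_word)   # True
print(inside_word)  # False
```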

Quantifiers

  • The expression x{n} matches x repeated exactly n times
  • The expression x{n,} matches x repeated at least n times
  • The expression x{n,m} matches x repeated between n and m times (inclusive)
  • The expression x? matches x repeated 0 or 1 time
  • The expression x+ matches x repeated at least 1 time
  • The expression x* matches x repeated 0 or more times

Notes: Quantifiers also apply to groups, as in (cpc-){2}(EN)? which matches "cpc-cpc-EN" or "cpc-cpc-". In addition, the expression .* designates a string of any characters, of any length (possibly empty).
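Each quantifier can be checked against a few candidate strings. The sketch below assumes Python's re module. matches_fully is a hypothetical helper that requires the whole string to match, which makes the repetition counts easy to see.

```python
import re

def matches_fully(pattern, text):
    """Return True when the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

candidates = ["", "x", "xx", "xxx"]

exactly_two = [s for s in candidates if matches_fully(r"x{2}", s)]
at_least_two = [s for s in candidates if matches_fully(r"x{2,}", s)]
one_to_two = [s for s in candidates if matches_fully(r"x{1,2}", s)]
zero_or_one = [s for s in candidates if matches_fully(r"x?", s)]
one_or_more = [s for s in candidates if matches_fully(r"x+", s)]
zero_or_more = [s for s in candidates if matches_fully(r"x*", s)]

# Quantifiers applied to groups: "cpc-" twice, then an optional "EN".
group_pattern = r"(cpc-){2}(EN)?"
group_hits = [s for s in ["cpc-cpc-EN", "cpc-cpc-", "cpc-"]
              if matches_fully(group_pattern, s)]

print(exactly_two)   # ['xx']
print(at_least_two)  # ['xx', 'xxx']
print(one_to_two)    # ['x', 'xx']
print(zero_or_one)   # ['', 'x']
print(one_or_more)   # ['x', 'xx', 'xxx']
print(zero_or_more)  # ['', 'x', 'xx', 'xxx']
print(group_hits)    # ['cpc-cpc-EN', 'cpc-cpc-']
```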

Anchors

  • The symbol ^ indicates the beginning of the text being searched. Thus, the expression ^x matches a text that starts with x
  • The symbol $ indicates the end of the text being searched. Thus, the expression x$ matches a text that ends with x

Note: ^mail$ matches "mail" only (and does not match anything else, such as "email" or "mails").
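The anchors can be compared side by side with a small sketch, assuming Python's re module and a hypothetical list of candidate texts.

```python
import re

# Hypothetical candidate texts.
candidates = ["mail", "email", "mails", "mail box"]

starts_with = [t for t in candidates if re.search(r"^mail", t)]
ends_with = [t for t in candidates if re.search(r"mail$", t)]
exact = [t for t in candidates if re.search(r"^mail$", t)]

print(starts_with)  # ['mail', 'mails', 'mail box']
print(ends_with)    # ['mail', 'email']
print(exact)        # ['mail']
```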

Advanced features

Greedy matching

A regex quantifier is "greedy" by default: it matches as many characters as possible. Thus, in the text "abcc", abc* matches "abcc" rather than "abc" or "ab".

Adding a question mark after the quantifier makes it "lazy": it matches as few characters as possible. Thus, in the same text, abc*? matches "ab".
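The difference between greedy and lazy matching shows up clearly when the matched text is printed. This sketch assumes Python's re module. The tag-like string is an illustrative addition, not part of the original examples.

```python
import re

greedy = re.search(r"abc*", "abcc").group(0)
lazy = re.search(r"abc*?", "abcc").group(0)

# A common practical case (illustrative): stopping at the first ">" or the last.
tag_greedy = re.search(r"<.+>", "<b>bold</b>").group(0)
tag_lazy = re.search(r"<.+?>", "<b>bold</b>").group(0)

print(greedy)      # abcc
print(lazy)        # ab
print(tag_greedy)  # <b>bold</b>
print(tag_lazy)    # <b>
```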

Group capture

In addition to creating groups of characters, the parentheses capture these groups. The captured group is stored in memory for later use. Group captures are used to select a particular part of the regex to be extracted.

The captured groups are numbered from \1 to \9, and an expression can refer back to them. For example, (\d+)-(\w+)-\1 matches sequences such as "753-aby-753" or "223-pt4-223", where the last part repeats the first captured group.

Note: The expression (?:Google-)Ads matches "Google-Ads" without capturing "Google-".
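Captures, backreferences and non-capturing groups can be sketched as follows, assuming Python's re module. The pattern uses \d+ and \w+ so that multi-character segments such as "aby" can be captured, and the non-matching string "753-aby-123" is a made-up counterexample.

```python
import re

# \1 must repeat exactly what the first group captured.
pattern = re.compile(r"(\d+)-(\w+)-\1")

match = pattern.search("753-aby-753")
first_group = match.group(1)
second_group = match.group(2)

also_matches = pattern.search("223-pt4-223") is not None
mismatch = pattern.search("753-aby-123")  # the last part differs from group 1

# (?: ... ) groups without capturing, so no group is stored.
non_capturing = re.search(r"(?:Google-)Ads", "Google-Ads")
whole_match = non_capturing.group(0)
stored_groups = non_capturing.groups()

print(first_group)    # 753
print(second_group)   # aby
print(also_matches)   # True
print(mismatch)       # None
print(whole_match)    # Google-Ads
print(stored_groups)  # ()
```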

Lookahead and lookbehind

  • The expression abc(?=xyz) matches "abc" only if it is followed by "xyz", without including "xyz" in the match.
  • The expression (?<=abc)xyz matches "xyz" only if it is preceded by "abc", without including "abc" in the match.
  • To negate a lookahead or lookbehind, replace the = with a !. For example, \d+(?!€) finds a match in "3$" and "100£" but not in "5€".
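The lookarounds above can be tested with the sketch below, assuming Python's re module. The strings "abcdef", "defxyz" and "15€" are illustrative additions. The last part shows a subtlety worth knowing: on a multi-digit amount, the engine can backtrack and match a shorter number, so a stricter variant (an assumption of this example, not part of the original article) also forbids a digit after the match.

```python
import re

def found_text(pattern, text):
    """Return the matched text, or None when there is no match (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

ahead_ok = found_text(r"abc(?=xyz)", "abcxyz")     # "xyz" is checked, not matched
ahead_ko = found_text(r"abc(?=xyz)", "abcdef")
behind_ok = found_text(r"(?<=abc)xyz", "abcxyz")   # "abc" is checked, not matched
behind_ko = found_text(r"(?<=abc)xyz", "defxyz")

# Negative lookahead: digits not followed by a euro sign.
not_euro = [found_text(r"\d+(?!€)", t) for t in ["3$", "100£", "5€"]]

# Backtracking caveat: in "15€" the engine gives up the "5" and matches "1".
partial = found_text(r"\d+(?!€)", "15€")
# Stricter variant: the match may not be followed by a digit or a euro sign.
strict = found_text(r"\d+(?![\d€])", "15€")

print(ahead_ok, ahead_ko)    # abc None
print(behind_ok, behind_ko)  # xyz None
print(not_euro)              # ['3', '100', None]
print(partial)               # 1
print(strict)                # None
```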

Using regex in Reeport

To create a regex in Reeport, read the article on how to use regex in Reeport.
