The Ultimate Guide to Regular Expressions: Everything You Need to Know

Unlock the power of regular expressions with this ultimate guide. Whether you're a beginner or an experienced programmer, this guide will take your regex skills to the next level.

Introduction to Programming with C#

This article is part of the following series of articles:

  1. Learn to Program in C# - Full Introduction to Programming Course
  2. Introduction to Programming - C# Programming Fundamentals
  3. Introduction to Object Oriented Programming for Beginners
  4. Introduction to C# Object-Oriented Programming Part 2
  5. Application Flow Control and Control Structures in C#
  6. Guide to C# Data Types, Variables and Object Casting
  7. C# Collection Types (Array,List,Dictionary,HashTable and More)
  8. C# Operators: Arithmetic, Comparison, Logical and more
  9. Using Entity Framework & ADO.Net Data in C# 7
  10. What is LINQ? The .NET Language Integrated Query
  11. Error and Exception Handling in C#
  12. Advanced C# Programming Topics
  13. All About Reflection in C# To Read Metadata and Find Assemblies
  14. What Are ASP.Net WebForms
  15. Introduction to ASP.Net MVC Web Applications and C#
  16. Windows Application Development Using .Net and Windows Forms
  17. Assemblies and the Global Assembly Cache in C#
  18. Working with Resources Files, Culture & Regions in .Net
  19. The Ultimate Guide to Regular Expressions: Everything You Need to Know
  20. Introduction to XML and XmlDocument with C#
  21. Complete Guide to File Handling in C# - Reading and Writing Files

I love it when people say "A simple way to do XYZ is to use regular expressions" and then offer what amounts to a string of indecipherable hieroglyphics to answer the question. However, once you know how to leverage the power of regular expressions, they can be very useful tools.

Figure: Egyptian hieroglyphics. Regular expressions can look a lot like Egyptian hieroglyphs if you're not familiar with the syntax.

What are Regular Expressions?

A regular expression is a string of characters that forms what is known as a pattern. This pattern can then be used to match a part, or parts, of another string. In some languages, such as PHP, Perl and JavaScript, the pattern is wrapped in delimiter characters (usually forward slashes). These delimiters mark where it starts and stops, with a seemingly random bunch of characters in between. Those characters represent smaller patterns to match, for example letters, numbers, punctuation or whitespace.

Regular expressions are a concise and often fast way to manipulate strings, and they can save tens of lines of code for complex operations. For example, a basic email address format check takes just 4 lines of code, and that's with the lines split up. You could do it in one line!

One very frustrating thing is that, while regular expressions are fairly generic, each regex "engine" has its own implementation (often called a flavour). So it is rarely a simple case of copying a pattern from one language and expecting it to work in another. Examples of regex engines include those in Perl, PHP, .NET, Java, JavaScript, Python, Ruby and POSIX.

The term regular expression is often shortened to just regex; it's easier to say and type, so I'm going to use that from now on.

Basic RegEx Syntax and Patterns

The basic syntax of regular expressions consists of a combination of characters and special symbols that define a pattern to be matched. Some common symbols include:

  • Dot (.) - Matches any single character except a newline.
  • Asterisk (*) - Matches zero or more occurrences of the preceding character or group.
  • Plus (+) - Matches one or more occurrences of the preceding character or group.
  • Question mark (?) - Matches zero or one occurrence of the preceding character or group.
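  • Square brackets ([]) - Matches any single character listed within the brackets, for example [abc] matches a, b or c.
  • Caret (^) - Matches the start of the string (or the start of each line when multiline mode is enabled).
  • Dollar sign ($) - Matches the end of the string (or the end of each line when multiline mode is enabled).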

By combining these symbols and characters, you can create powerful patterns to search for specific text. For example, the pattern "cat" would match any occurrence of the letters "cat" in a text, while the pattern "c.t" would match "cat", "cut", "cot", and so on.
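To see these symbols in action, here is a minimal sketch in C#, the language used throughout this series. The patterns and sample words are made up for illustration. Each pair is passed to the static Regex.IsMatch method, which returns true if the pattern matches anywhere in the input.

```csharp
using System;
using System.Text.RegularExpressions;

// Each entry is a (pattern, input) pair; the comment says what it demonstrates.
var cases = new (string Pattern, string Input)[]
{
    ("c.t", "cut"),        // . matches any single character: True
    ("c.t", "ct"),         // ...but exactly one character, so "ct": False
    ("colou?r", "color"),  // ? makes the "u" optional: True
    ("go+gle", "gooogle"), // + needs one or more "o": True
    ("go*gle", "ggle"),    // * allows zero "o": True
    ("gr[ae]y", "grey"),   // [ae] matches either "a" or "e": True
    ("^cat", "concat"),    // ^ anchors to the start, "concat" starts with "con": False
    ("cat$", "concat"),    // $ anchors to the end, "concat" ends with "cat": True
};

foreach (var (pattern, input) in cases)
{
    bool isMatch = Regex.IsMatch(input, pattern);
    Console.WriteLine($"{pattern,-10} {input,-10} {isMatch}");
}
```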

Understanding the basic syntax and patterns of regular expressions is essential for effectively using them in your programming or text editing tasks. With practice, you can become proficient in creating complex patterns to match and manipulate text with ease.

Regular Expressions in C#

Regular expression support in C# is provided by the System.Text.RegularExpressions namespace, which contains the Regex class. When instantiating the class, you pass the pattern string to the constructor. Where a pattern contains backslashes, we use a verbatim string (prefixed with @). That way the backslashes don't have to be escaped a second time for the C# compiler.
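As a quick illustration of the verbatim-string point, the sketch below builds the same pattern, \d+ (one or more digits), both ways. The sample text is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Order 66 shipped in 3 boxes";

// In a normal C# string, "\\d" is needed to produce the two characters \d.
var escaped = new Regex("\\d+");

// In a verbatim string (prefixed with @), backslashes are taken literally.
var verbatim = new Regex(@"\d+");

Console.WriteLine(escaped.ToString() == verbatim.ToString()); // True: both patterns are \d+
Console.WriteLine(verbatim.Match(text).Value);                 // 66 (the first run of digits)

// Regex also has static helpers, handy when a pattern is only used once.
Console.WriteLine(Regex.IsMatch(text, @"\d+"));                // True
```

Because verbatim strings avoid doubled backslashes, they are the usual choice for patterns containing \d, \w, \s and similar escapes.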

Finding Values with RegEx

One of the basic methods of the Regex class is IsMatch. It simply returns true or false, depending on whether the pattern matches anywhere in the test string.

For our first RegEx example, we can use IsMatch to see if a string contains a number.

C#
using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
// [0-9]+ matches one or more digits
Regex regex = new Regex("[0-9]+");
if (regex.IsMatch(testString))
    Console.WriteLine("String contains numbers!");
else
    Console.WriteLine("String does NOT contain numbers!");

Capture Values with RegEx

In this example, we'll capture the number found in the test string and present it to the user, instead of just verifying that it's there.

C#
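using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Regex regex = new Regex("[0-9]+");
// Match returns the first match; Success is false if nothing was found
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Number found: " + match.Value);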

The match object's Index and Length properties give the zero-based position of the match within the string and the number of characters matched.
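For example, the following sketch reuses the Franklin Moyer sentence from above and reports where the age was found. Index is zero-based, so it equals the number of characters that come before the match.

```csharp
using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Match match = new Regex("[0-9]+").Match(testString);

if (match.Success)
{
    // Index: zero-based position of the first matched character.
    // Length: number of characters in the match.
    Console.WriteLine($"'{match.Value}' found at index {match.Index}, length {match.Length}");
    // prints: '42' found at index 16, length 2

    // Index and Length can be passed straight to Substring to get the same text back.
    Console.WriteLine(testString.Substring(match.Index, match.Length) == match.Value); // True
}
```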

Group Matching with RegEx

In the previous two examples we saw how to find and extract a number from a string, so now let's look at groups and see how to extract both the age and the name.

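This new pattern captures everything from the start of the string up to the first comma as the name (the first capture group). It then skips the comma and a single whitespace character and captures the number that follows (the second capture group).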

C#
using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
// ^([^,]+) = name up to the first comma, ,\s = comma and a space, ([0-9]+) = age
Regex regex = new Regex(@"^([^,]+),\s([0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups[1].Value + ". Age: " + match.Groups[2].Value);

The Groups property is used to access the matched groups. Index 0 contains the entire match, index 1 the name and index 2 the age.

Validation with RegEx

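Regular expressions can also be used for input validation. In this example, we test two strings to see if they contain a valid email address. The emailRegex pattern accepts most everyday email addresses. However, no practical regex matches every address the email standards allow, so treat it as a format check rather than proof that an address exists.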

C#
using System;
using System.Text.RegularExpressions;

string validEmailAddress = "somebody@somedomain.com";
string invalidEmailAddress = "invalid.email-address.com&somedomann..3";
string emailRegex = @"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$";

Regex regex = new Regex(emailRegex);

if (regex.IsMatch(validEmailAddress))
  Console.WriteLine("{0}: is Valid Email Address", validEmailAddress);
else
  Console.WriteLine("{0}: is NOT a Valid Email Address", validEmailAddress);

if (regex.IsMatch(invalidEmailAddress))
  Console.WriteLine("{0}: is Valid Email Address", invalidEmailAddress);
else
  Console.WriteLine("{0}: is NOT a Valid Email Address", invalidEmailAddress);

In ASP.NET Web Forms, you can use a RegularExpressionValidator control to validate user input in forms on both the client and the server side.

ASP.NET
<asp:TextBox ID="txtEmail" runat="server"></asp:TextBox>
<asp:RegularExpressionValidator ID="RegularExpressionValidator1" runat="server"
    ErrorMessage="Please enter a valid email address"
    ToolTip="Please enter a valid email address"
    ValidationExpression="^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"
    ControlToValidate="txtEmail" ForeColor="Red">Please enter a valid email address</asp:RegularExpressionValidator>

Search and Replace with RegEx

Another powerful feature of regular expressions is complex search and replace. We'll use the Replace() method to remove whitespace (spaces, tabs and line breaks) from a string.

C#
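using System.Text.RegularExpressions;

string stringValue = "Hello World 12345, Testing! - We are good!";
// \s+ matches one or more whitespace characters (spaces, tabs, line breaks)
Regex regex = new Regex(@"\s+");
stringValue = regex.Replace(stringValue, string.Empty);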

How about removing anything that is not alphanumeric (useful for input sanitisation)?

C#
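using System.Text.RegularExpressions;

string stringValue = "Hello World 12345, Testing! - We are good!";
// [^a-zA-Z0-9] matches any single character that is NOT a letter or digit
Regex regex = new Regex("[^a-zA-Z0-9]");
stringValue = regex.Replace(stringValue, string.Empty);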

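Next, we'll see how to strip HTML tags from a string using RegEx. This works for simple markup like the example below, but a regex is not a full HTML parser, so don't rely on it alone to make untrusted HTML safe.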

C#
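using System;
using System.Text.RegularExpressions;

string stringValue = "<b>Hello, <i>world</i></b>";
// <[^>]+> matches "<", then one or more characters that are not ">", then ">"
Regex regex = new Regex("<[^>]+>");
string cleanString = regex.Replace(stringValue, string.Empty);
Console.WriteLine(cleanString); // Hello, world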

In ASP.NET MVC, you can also apply the RegularExpression attribute from System.ComponentModel.DataAnnotations to your model properties. This validates user input on both the client and server sides.

C#
[RegularExpression(@"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", ErrorMessage = "Please enter a valid email address")]
public string EmailAddress { get; set; }

Formatting Data with RegEx

You can also use Regex.Replace to format strings. In this example, it takes a string of ten digits and formats it as a telephone number; in the replacement string, $1, $2 and $3 insert the three capture groups.

C#
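using System.Text.RegularExpressions;

string number = "0123456789";
// Three groups of digits, 3, 3 and 4 long, put back in order by $1, $2 and $3
string formatted = Regex.Replace(number, @"(\d{3})(\d{3})(\d{4})", "($1) $2-$3");
// formatted = "(012) 345-6789"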

Regular Expressions in PHP

PHP's built-in regular expression functions (preg_match, preg_match_all and friends) require the pattern to be wrapped in delimiters. A forward slash at the start and end is the most common choice, with the pattern in between. In this example, I'm going to use "The quick brown fox jumps over the lazy dog" as the string to search, and I want to find the word "the".

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/the/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);

The result is:

php
array (size=1)
  0 => 
    array (size=1)
      0 => string 'the' (length=3)

Only one instance of "the" is matched in our input string. "But hold on," I hear you cry, "there are TWO the's in the string!" And you are correct, there are, but one has a capital T and the other does not. If we change the pattern to "/The/", we match the other one instead. How do we get both? Easy. We can either make the whole pattern case-insensitive, so that upper and lower case match equally, or we can match just an upper or lower case T.

Making Regex Case Insensitive

To make our initial pattern case-insensitive, just place an i after the closing forward slash (the end delimiter), like so:

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/the/i"; // Note the position of i
preg_match_all($pattern, $string, $matches);
var_dump($matches);

The result is now:

php
array (size=1)
  0 => 
    array (size=2)
      0 => string 'The' (length=3)
      1 => string 'the' (length=3)
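For readers following the C# articles in this series: .NET patterns are plain strings with no delimiters, so there is no trailing i. The equivalent is the RegexOptions.IgnoreCase option, or the inline (?i) construct at the start of the pattern. A minimal sketch using the same sentence:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "The quick brown fox jumps over the lazy dog";

// Case-sensitive by default: only the lowercase "the" matches.
int caseSensitive = Regex.Matches(input, "the").Count; // 1

// RegexOptions.IgnoreCase does the job of PHP's /i modifier.
MatchCollection matches = Regex.Matches(input, "the", RegexOptions.IgnoreCase);
foreach (Match m in matches)
    Console.WriteLine($"{m.Value} at index {m.Index}"); // "The at index 0", then "the at index 31"

// The inline (?i) option turns on case-insensitivity from inside the pattern itself.
int inlineCount = Regex.Matches(input, "(?i)the").Count; // 2

Console.WriteLine($"{caseSensitive} case-sensitive match, {inlineCount} with (?i)");
```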

Matching Combinations of Characters with PHP Regular Expressions

Regex can also match sets of characters. Let's start simple and match an upper case or lower case T. Square brackets define a character class: a set of characters, numbers or symbols, any one of which can match at that position. In this example, we want to find The and the, so we change the pattern to look for T or t followed by "he".

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/[Tt]he/"; 
preg_match_all($pattern, $string, $matches);
var_dump($matches);

This will now match an uppercase T or a lowercase t followed by 'he'.

php
array (size=1)
  0 => 
    array (size=2)
      0 => string 'The' (length=3)
      1 => string 'the' (length=3)

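For a case-insensitive match of a whole word like this one, the i modifier from the previous section is simpler and clearer. A character class is handy when only some letters should accept either case.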

Matching Strings Starting With X in PHP Regular Expressions
You can also use a special character to indicate "at the start", and this character is a caret (^). Adding a caret to the start of the pattern means a match is found only if the pattern appears at the very start of the string.

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/^The/"; 
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=1)
  0 => 
    array (size=1)
      0 => string 'The' (length=3)

Searching for a lowercase t at the start of the string does not find any matches.

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/^the/"; 
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=1)
  0 => 
    array (size=0)
      empty

Matching Strings Ending With

As with matching patterns at the start, there is another special character to match a pattern at the end. This is the dollar symbol and it goes right before the end forward slash.

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/the$/"; 
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=1)
  0 => 
    array (size=0)
      empty

The word 'the' is not at the end of the string, so there are no matches. 'dog', however, is at the end of the string, so we can match that.

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/dog$/"; 
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=1)
  0 => 
    array (size=1)
      0 => string 'dog' (length=3)
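The same anchors work in C#. One detail the PHP examples above don't show is that, by default, ^ and $ match at the start and end of the whole string. The RegexOptions.Multiline option makes them match at the start and end of each line instead (PHP's equivalent is the m modifier). A sketch, with a made-up two-line string:

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "The quick brown fox jumps over the lazy dog";

bool startsWithThe = Regex.IsMatch(sentence, "^The");      // True
bool startsWithLowerThe = Regex.IsMatch(sentence, "^the"); // False: matching is case-sensitive
bool endsWithDog = Regex.IsMatch(sentence, "dog$");        // True

// Made-up input: two lines separated by "\n".
string twoLines = "the first line ends with cat\nthe second line ends with dog";

// By default, $ only matches at the end of the string (or just before a final "\n").
bool catAtEnd = Regex.IsMatch(twoLines, "cat$");                             // False
// With RegexOptions.Multiline, $ also matches just before each "\n".
bool catAtLineEnd = Regex.IsMatch(twoLines, "cat$", RegexOptions.Multiline); // True

Console.WriteLine($"{startsWithThe} {startsWithLowerThe} {endsWithDog} {catAtEnd} {catAtLineEnd}");
```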

Matching Strings Starting And Ending With (Exact Match)

We can combine both the start and end to form an exact pattern match.

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/^The quick brown fox jumps over the lazy dog$/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=1)
  0 => 
    array (size=1)
      0 => string 'The quick brown fox jumps over the lazy dog' (length=43)

Admittedly this is not a particularly useful example, as a plain string comparison would do the same job, but it does lead nicely onto...

Matching Strings Starting And Ending With (Wildcard Match)

What if you wanted to find a match between two known words? In this example, let's find the fox. In the pattern, we can replace the word fox with a wildcard pattern that will match anything in its place.

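A wildcard that matches anything is written .*, meaning zero or more of any character except a newline. Wrapping it in parentheses, as in '(.*)', captures whatever it matches into a group.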

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/^The quick brown (.*) jumps over the lazy dog$/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=2)
  0 => 
    array (size=1)
      0 => string 'The quick brown fox jumps over the lazy dog' (length=43)
  1 => 
    array (size=1)
      0 => string 'fox' (length=3)

Multiple Matches with PHP Regular Expressions

We can also match both animals in this example, just by adding another wildcard.

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/^The quick brown (.*) jumps over the lazy (.*)$/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=3)
  0 => 
    array (size=1)
      0 => string 'The quick brown fox jumps over the lazy dog' (length=43)
  1 => 
    array (size=1)
      0 => string 'fox' (length=3)
  2 => 
    array (size=1)
      0 => string 'dog' (length=3)

Being Specific with PHP Regular Expressions

Wildcards are all well and good, but they will match anything and everything, so it is best to be as specific as possible, especially when dealing with user input.

php
$string = 'The quick brown DELETE ALL MY DATA jumps over the lazy dog';
$pattern = "/^The quick brown (.*) jumps over the lazy (.*)$/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=3)
  0 => 
    array (size=1)
      0 => string 'The quick brown DELETE ALL MY DATA jumps over the lazy dog' (length=58)
  1 => 
    array (size=1)
      0 => string 'DELETE ALL MY DATA' (length=18)
  2 => 
    array (size=1)
      0 => string 'dog' (length=3)

By swapping the wildcard for a more accurate range of values or a limited set of characters, you can be sure of a better and safer match. If the value you are expecting is a single word, you can match the alphabet range a-z. If upper case letters could be involved, you can match A-Z as well. As with matching individual upper and lower case letters, ranges go inside square brackets. We then add a quantifier in curly brackets and wrap the whole thing in parentheses to capture it. Now we're starting to get Egyptian hieroglyphs!

OK, the pattern for matching one or more letters, upper case or lower case, is this:

php
([A-Za-z]{1,})

Let's break it down. The parentheses create a capture group, so whatever matches inside them is returned as a separate result. The square brackets match a single character from the set inside them: A-Z is the upper case alphabetical range and, likewise, a-z is the lower case range. Finally, the curly brackets are a quantifier. They say how many times the preceding element, here the square-bracket character class, must repeat. They take two numbers: the minimum number of repetitions and the maximum. If the maximum is omitted, as in our example, there is no upper limit, so {1,} means "one or more" and is equivalent to +. For example, to match a word between 3 and 5 characters long the quantifier would be {3,5}, and to match a word 6 characters or longer it would be {6,}.
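Here are those quantifiers in isolation, as a short C# sketch with made-up words. The ^ and $ anchors force the whole word to fit the quantifier. Without them, {3,5} would happily match the first five letters of a longer word.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ make the whole word fit the quantifier, not just part of it.
var threeToFive = new Regex("^[A-Za-z]{3,5}$"); // 3 to 5 letters
var sixOrMore = new Regex("^[A-Za-z]{6,}$");    // 6 or more letters (no maximum)
var oneOrMore = new Regex("^[A-Za-z]{1,}$");    // same meaning as ^[A-Za-z]+$

foreach (string word in new[] { "fox", "jumps", "quickly", "dog7" })
{
    Console.WriteLine(
        $"{word}: 3-5 letters = {threeToFive.IsMatch(word)}, " +
        $"6+ letters = {sixOrMore.IsMatch(word)}, " +
        $"letters only = {oneOrMore.IsMatch(word)}");
}
```

With these words, fox and jumps pass the {3,5} test, quickly passes {6,}, and dog7 fails all three because 7 is not a letter.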

Now our pattern looks like this:

php
$pattern = "/^The quick brown ([A-Za-z]{1,}) jumps over the lazy ([A-Za-z]{1,})$/";

The results are now much safer and contain only what we expect. If the input string does not fit the pattern, no matches are returned.

php
$string = 'The quick brown INSERT MALICIOUS CODE HERE jumps over the lazy dog';
$pattern = "/^The quick brown ([A-Za-z]{1,}) jumps over the lazy ([A-Za-z]{1,})$/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=3)
  0 => 
    array (size=0)
      empty
  1 => 
    array (size=0)
      empty
  2 => 
    array (size=0)
      empty

But for valid input:

php
$string = 'The quick brown fox jumps over the lazy dog';
$pattern = "/^The quick brown ([A-Za-z]{1,}) jumps over the lazy ([A-Za-z]{1,})$/";
preg_match_all($pattern, $string, $matches);
var_dump($matches);
php
array (size=3)
  0 => 
    array (size=1)
      0 => string 'The quick brown fox jumps over the lazy dog' (length=43)
  1 => 
    array (size=1)
      0 => string 'fox' (length=3)
  2 => 
    array (size=1)
      0 => string 'dog' (length=3)

Advanced RegEx Techniques and Tips

Once you have a solid understanding of the basics of regular expressions, you can start exploring more advanced techniques and tips to take your regex skills to the next level. Here are some advanced techniques that can help you become a regex master:

  1. Lookaheads and Lookbehinds: Lookaheads and lookbehinds let you match a pattern only when it is followed (lookahead) or preceded (lookbehind) by another pattern, without including that other part in the match. This can be useful for complex matching scenarios.
  2. Quantifiers: Quantifiers allow you to specify how many times a certain pattern should occur. For example, the quantifier "+" matches one or more occurrences of the preceding pattern, while the quantifier "*" matches zero or more occurrences.
  3. Grouping and Capturing: Grouping allows you to treat multiple characters as a single unit and apply quantifiers or other operations to that group. Capturing allows you to extract specific parts of a match and use them in replacement patterns.
  4. Backreferences: Backreferences allow you to refer back to captured groups within the same regular expression. This can be useful for finding repeated patterns or for performing complex replacements.
  5. Character Classes: Character classes allow you to match a set of characters. For example, the character class "[aeiou]" matches any vowel, while the character class "[0-9]" matches any digit.
  6. Greedy vs. Lazy Matching: By default, regular expressions are greedy, meaning they will match as much as possible. However, you can make them lazy by adding a "?" after a quantifier, which will make them match as little as possible.
  7. Anchors and Boundaries: Anchors such as ^ and $, covered earlier, and word boundaries such as \b let you specify where a pattern should match within a string or word. Mastering the use of anchors and boundaries can greatly enhance your regex skills.
  8. Escape Sequences: Some characters have special meanings in regular expressions, such as ".", "*", and "+". If you want to match these characters literally, you need to escape them with a backslash ("\").
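A compact C# sketch of several of these techniques: lookarounds, backreferences, greedy versus lazy matching, and escaping. The sample strings are invented for illustration, and the comment on each line shows the value it produces.

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead (?=...): digits only when followed by "px"; "px" is not part of the match.
string lookahead = Regex.Match("width: 120px", @"\d+(?=px)").Value;        // "120"

// Lookbehind (?<=...): digits only when preceded by "$"; "$" is not part of the match.
string lookbehind = Regex.Match("Total: $45", @"(?<=\$)\d+").Value;       // "45"

// Backreference \1 repeats whatever group 1 captured, so this finds a doubled word.
string doubled = Regex.Match("this is is a typo", @"\b(\w+) \1\b").Value; // "is is"

// Greedy .* takes as much as it can; lazy .*? takes as little as it can.
string html = "<b>bold</b>";
string greedy = Regex.Match(html, "<.*>").Value;                           // "<b>bold</b>"
string lazy = Regex.Match(html, "<.*?>").Value;                            // "<b>"

// Escaping: an unescaped . matches any character; Regex.Escape turns it into \.
bool unescaped = Regex.IsMatch("fileXtxt", "file.txt");                    // True
bool escaped = Regex.IsMatch("fileXtxt", Regex.Escape("file.txt"));       // False

Console.WriteLine(string.Join(" | ", lookahead, lookbehind, doubled, greedy, lazy, unescaped, escaped));
```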

By exploring these advanced techniques and tips, you can become a regex expert and unlock the full power of regular expressions in your programming projects. Remember to practice and experiment with different patterns to fully grasp their capabilities.

Common RegEx Match Patterns

You can use any of these patterns to match common input formats for validation and sanitisation.

Pattern     Description
abc         Letters
123         Digits
\d          Any digit
\D          Any non-digit character
.           Any character (except a newline)
\.          Period
[abc]       Only a, b, or c
[^abc]      Not a, b, nor c
[a-z]       Characters a to z
[0-9]       Numbers 0 to 9
\w          Any word character (letter, digit or underscore)
\W          Any non-word character
{m}         m repetitions
{m,n}       m to n repetitions
*           Zero or more repetitions
+           One or more repetitions
?           Optional (zero or one repetition)
\s          Any whitespace
\S          Any non-whitespace character
^...$       Starts and ends
(...)       Capture group
(a(bc))     Capture sub-group
(.*)        Capture all
(abc|def)   Matches abc or def
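A short sketch that exercises a few rows of the table with Regex.Matches. The sample string and the All helper are made up for this example. Note that \w matches the underscore as well as letters and digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sample = "ID_7: call 555-0100 (ext 42)";

// Hypothetical helper: every match of a pattern in the sample, as an array of strings.
string[] All(string pattern) =>
    Regex.Matches(sample, pattern).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(",", All(@"\d+")));            // 7,555,0100,42
Console.WriteLine(string.Join(",", All(@"\w+")));            // ID_7,call,555,0100,ext,42
Console.WriteLine(string.Join(",", All(@"\d{3}-\d{4}")));    // 555-0100
Console.WriteLine(string.Join(",", All(@"(call|ext) \d+"))); // call 555,ext 42
```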

Common RegEx Validation Patterns

Email address: ^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$
Website: /((www\.|(http|https|ftp|news|file)+:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|'|# |!|(|?|,| |>|<|;|)])/