
Regular Expressions in C#

Those undecipherable hieroglyphics called RegEx's



Introduction to Programming with C# Series
  1. Introduction to Programming with C# 7
  2. C# Programming Fundamentals
  3. Introduction to Object-Oriented Programming
  4. C# Object-Oriented Programming Part 2
  5. Flow Control and Control Structures in C#
  6. C# Data Types, Variables and Casting
  7. C# Collection Types (Array, List, Dictionary, Hash Table)
  8. C# Operators
  9. Using Data in C# 7 with ADO.Net & Entity Framework
  10. LINQ: .NET Language Integrated Query
  11. Error and Exception Handling in C#
  12. Advanced C# Programming Topics
  13. Reflection in C#
  14. What Are ASP.Net Webforms
  15. Introduction to ASP.Net MVC
  16. Windows Application Development
  17. Assemblies in C#
  18. Working with Resources Files, Culture & Regions
  19. Regular Expressions in C#
  20. Introduction to XML with C#
  21. Complete Guide to File Handling in C#

Regular expressions are special strings that are used to describe a search pattern. They can be used for data validation, data processing and pattern matching.

I love it when people say "a simple way to do XYZ is to use regular expressions" and then offer what amounts to a string of indecipherable hieroglyphics to answer the question. However, once you know how to leverage the power of regular expressions, they can be very useful tools.

Firstly, let's talk about what exactly a regular expression is. A regular expression is a string of characters that forms what is known as a pattern. This pattern can then be used to match a part, or parts, of another string. In some languages, such as Perl, PHP and JavaScript, delimiter characters indicate where the pattern starts and stops; in C# the pattern is simply the contents of a string. Either way, what you see is a seemingly random bunch of characters. These characters are in fact representations of smaller patterns that match, for example, letters, numbers, punctuation or whitespace.

Regular expressions are a fast and compact way to search and manipulate strings, and can save tens of lines of code for complex operations. For example, an email address can be validated in just four lines of code, and that's with the lines split up. You could do it in one line!

One thing that is very frustrating is that while regular expressions are fairly generic, each application "engine" has its own implementation, so it is rarely a simple case of copying and pasting a pattern and expecting it to work. Examples of languages and platforms with their own regular expression engines are Perl, PHP, .NET, Java, JavaScript, Python, Ruby and POSIX.

The term regular expression is often shortened to just regex. It's easier to say and type, so I'm going to use that from now on.

Regular expressions in C# are defined within the System.Text.RegularExpressions namespace, which provides a Regex class. When instantiating the class you need to pass the expression string to the constructor. A verbatim string (prefixed with @) is a good choice for the pattern, as it makes the regex easier to read if you don't have to escape backslashes.
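
As a minimal sketch of the difference between the two kinds of string literal, the snippet below builds the same pattern twice. The pattern and the variable names (escaped, verbatim) are only illustrative, and the snippet assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Both lines build the same pattern: one or more digits.
Regex escaped = new Regex("\\d+");   // regular string: the backslash must be doubled
Regex verbatim = new Regex(@"\d+");  // verbatim string: the backslash is written once

// ToString() returns the pattern that was passed to the constructor.
Console.WriteLine(escaped.ToString() == verbatim.ToString()); // True
Console.WriteLine(verbatim.IsMatch("Room 101"));               // True
```

Writing "\d+" in a regular string without doubling the backslash does not compile, because \d is not a recognised C# escape sequence.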

Finding Values with RegEx

One of the basic methods of the Regex class is called IsMatch. It simply returns true or false, depending on whether at least one match is found in the test string.

For our first RegEx example, we can use IsMatch to see if a string contains a number.

using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
// [0-9]+ matches one or more consecutive digits.
Regex regex = new Regex("[0-9]+");
if (regex.IsMatch(testString))
    Console.WriteLine("String contains numbers!");
else
    Console.WriteLine("String does NOT contain numbers!");

Capture Values with RegEx

In this example, we'll capture the number found in the test string and present it to the user, instead of just verifying that it's there.

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Regex regex = new Regex("[0-9]+");
// Match returns the first match; Success is false if nothing was found.
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Number found: " + match.Value);

The Index and Length properties of match can be used to find out the location of the match in the string and length of the match.
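
Here is a short sketch of those two properties in use, with the same test string as above. The variable name slice is just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Match match = new Regex("[0-9]+").Match(testString);

if (match.Success)
{
    // Index is the zero-based position of the first matched character.
    Console.WriteLine("Index: " + match.Index);   // Index: 16
    Console.WriteLine("Length: " + match.Length); // Length: 2

    // Index and Length together recover the same text as match.Value.
    string slice = testString.Substring(match.Index, match.Length);
    Console.WriteLine(slice == match.Value);      // True
}
```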

Group Matching with RegEx

In the previous two examples we saw how to find and extract a number from a string, so now let's look at groups and see how to extract both the age and the name.

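This new pattern captures everything up to the first comma in the first capture group (the name). It then looks for the separating comma and a whitespace character and, after that, a number, which is placed in the second capture group (the age).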

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Regex regex = new Regex(@"^([^,]+),\s([0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups[1].Value + ". Age: " + match.Groups[2].Value);

The Groups property is used to access the matched groups. Index 0 contains the entire match, while index 1 is for the name and index 2 for the age.
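
To see all three entries side by side, the following sketch loops over the Groups collection for the same pattern and test string. It uses the static Regex.Match method rather than an instance, purely to keep the snippet short.

```csharp
using System;
using System.Text.RegularExpressions;

string testString = "Franklin Moyer, 42 years old, born in Seattle";
Match match = Regex.Match(testString, @"^([^,]+),\s([0-9]+)");

// Groups[0] is the whole match; numbered groups follow in the order
// their opening parentheses appear in the pattern.
for (int i = 0; i < match.Groups.Count; i++)
    Console.WriteLine(i + ": " + match.Groups[i].Value);

// Prints:
// 0: Franklin Moyer, 42
// 1: Franklin Moyer
// 2: 42
```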

Validation with RegEx

Regular expressions can also be used for input validation. In this example, we test two strings to see if they contain a valid email address. The emailRegex pattern will match most everyday email addresses.

string validEmailAddress = "somebody@somedomain.com";
string invalidEmailAddress = "invalid.email-address.com&somedomann..3";
// Verbatim string, so \w, \. and \d reach the regex engine unchanged.
string emailRegex = @"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$";

Regex regularExpression = new Regex(emailRegex);

if (regularExpression.IsMatch(validEmailAddress))
  Console.WriteLine("{0}: is Valid Email Address", validEmailAddress);
else
  Console.WriteLine("{0}: is NOT a Valid Email Address", validEmailAddress);

if (regularExpression.IsMatch(invalidEmailAddress))
  Console.WriteLine("{0}: is Valid Email Address", invalidEmailAddress);
else
  Console.WriteLine("{0}: is NOT a Valid Email Address", invalidEmailAddress);
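
One caveat is worth a quick sketch: the domain part of this pattern only lists lowercase letters, so an address typed in capitals is rejected unless the regex is told to ignore case. The snippet below writes the email pattern out with its \w, \. and \d escapes in full and passes RegexOptions.IgnoreCase. The helper name IsValidEmail is hypothetical, not something from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: the email pattern from this article, with IgnoreCase added
// so that upper-case input is accepted too.
Regex emailRegex = new Regex(
    @"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$",
    RegexOptions.IgnoreCase);

// Hypothetical helper: rejects null or blank input before applying the pattern.
bool IsValidEmail(string input) =>
    !string.IsNullOrWhiteSpace(input) && emailRegex.IsMatch(input);

Console.WriteLine(IsValidEmail("somebody@somedomain.com"));                 // True
Console.WriteLine(IsValidEmail("Somebody@SomeDomain.COM"));                 // True
Console.WriteLine(IsValidEmail("invalid.email-address.com&somedomann..3")); // False
```

Wrapping the check in one helper keeps the pattern in a single place, which makes it easier to swap for a stricter or looser one later.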

In ASP.Net you can use a RegularExpressionValidator to validate user input in forms on the client and server side.

<asp:TextBox ID="txtEmail" runat="server" ></asp:TextBox>  
    <asp:RegularExpressionValidator ID="RegularExpressionValidator1" runat="server"   
        ErrorMessage="Please enter a valid email address"   
        ToolTip="Please enter a valid email address"   
        ValidationExpression="^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"   
        ControlToValidate="txtEmail" ForeColor="Red">Please enter a valid email address</asp:RegularExpressionValidator>

Search/Replace with RegEx

Another powerful feature of regular expressions is the ability to perform complex search and replace operations. We'll use the Replace() method to remove whitespace (spaces, tabs) from a string.

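string stringValue = "Hello World 12345, Testing! - We are good!";
// Verbatim string so that \s is passed to the regex engine; \s+ matches runs of whitespace.
Regex regex = new Regex(@"\s+");
stringValue = regex.Replace(stringValue, string.Empty);
// stringValue = "HelloWorld12345,Testing!-Wearegood!"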

How about removing anything that is not alphanumeric (useful for input sanitisation)?

string stringValue = "Hello World 12345, Testing! - We are good!";
Regex regex = new Regex("[^a-zA-Z0-9]");
stringValue = regex.Replace(stringValue, string.Empty);

Next, we'll see how to strip HTML tags from a string using RegEx.

string stringValue = "<b>Hello, <i>world</i></b>";
Regex regex = new Regex("<[^>]+>");
string cleanString = regex.Replace(stringValue, string.Empty);
Console.WriteLine(cleanString);

You can even use DataAnnotations on your models to validate user input on both the client and server sides.

[RegularExpression(@"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$", ErrorMessage = "Please enter a valid email address")]  
public string EmailAddress { get; set; }

Formatting Data with RegEx

You can also use Regex.Replace to format strings. In this example, we take a series of digits and format them as a telephone number.

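string number = "0123456789";
// Verbatim string so that \d is passed to the regex engine; $1, $2 and $3 refer to the three capture groups.
string formatted = Regex.Replace(number, @"(\d{3})(\d{3})(\d{4})", "($1) $2-$3");
// formatted = "(012) 345-6789"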

Common RegEx Patterns

You can use any of these patterns to match common input formats for validation and sanitisation.

Pattern     Description
abc         Letters
123         Digits
\d          Any digit
\D          Any non-digit character
.           Any character
\.          Period
[abc]       Only a, b, or c
[^abc]      Not a, b, nor c
[a-z]       Characters a to z
[0-9]       Numbers 0 to 9
\w          Any word character (letter, digit or underscore)
\W          Any non-word character
{m}         m repetitions
{m,n}       m to n repetitions
*           Zero or more repetitions
+           One or more repetitions
?           Optional character
\s          Any whitespace
\S          Any non-whitespace character
^...$       Starts and ends
(...)       Capture group
(a(bc))     Capture sub-group
(.*)        Capture all
(abc|def)   Matches abc or def
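
A few of these building blocks can be tried out with the rough sketch below. The sample inputs and variable names are made up for illustration, and each result is noted in a comment next to the call.

```csharp
using System;
using System.Text.RegularExpressions;

// {m}: exactly four digits between the start (^) and end ($) of the input.
bool fourDigits = Regex.IsMatch("2018", @"^\d{4}$");      // true
bool fiveDigits = Regex.IsMatch("20180", @"^\d{4}$");     // false

// ?: the preceding character is optional.
bool optionalU = Regex.IsMatch("color", @"^colou?r$");    // true

// (abc|def): either alternative is accepted.
bool eitherWord = Regex.IsMatch("def", @"^(abc|def)$");   // true

// \D: replace every non-digit with nothing, leaving only the digits.
string digitsOnly = Regex.Replace("a1b22c", @"\D", "");   // "122"

// (a(bc)): the outer group is number 1, the nested sub-group is number 2.
Match nested = Regex.Match("abc", "(a(bc))");
string outer = nested.Groups[1].Value;                    // "abc"
string inner = nested.Groups[2].Value;                    // "bc"

Console.WriteLine($"{fourDigits} {fiveDigits} {optionalU} {eitherWord} {digitsOnly} {outer} {inner}");
// True False True True 122 abc bc
```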

Common Validation Patterns

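Email address
^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$

Website (shown with the / delimiters used by some other engines; drop the leading and trailing / when passing the pattern to the .NET Regex class)
/((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])/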

Last updated on: Thursday 11th October 2018

 
