Regex expressions: a working sheet

Andrew Fletcher. Published: 18 November 2020. Updated: 25 February 2021. 7 minute read

Regular expressions (regex) are extremely useful in extracting information from any text by searching for one or more matches of a specific search pattern.

The basic anchors - ^ and $

expression	action
^The	matches any string that starts with The
end$	matches a string that ends with end
^The end$	exact string match (starts and ends with The end)
pragmatic	matches any string that has the text pragmatic in it
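As a quick check of the anchors above, here is a minimal sketch using Python's built-in re module. The choice of Python and the sample strings are assumptions made for illustration; the patterns are the ones from the table.

```python
import re

# Hypothetical sample strings, chosen to exercise each anchor
samples = ["The end", "The beginning", "In the end", "A pragmatic approach"]

# ^The : the string must start with "The"
starts_with_the = [s for s in samples if re.search(r"^The", s)]

# end$ : the string must finish with "end"
ends_with_end = [s for s in samples if re.search(r"end$", s)]

# ^The end$ : the whole string must be exactly "The end"
exact = [s for s in samples if re.search(r"^The end$", s)]

# pragmatic : no anchors, so the text may appear anywhere in the string
contains = [s for s in samples if re.search(r"pragmatic", s)]

print(starts_with_the, ends_with_end, exact, contains)
```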

The basic quantifiers — * + ? and {}

expression	action
alpha*	matches alph literally (case sensitive) followed by zero or more a characters: alph, alpha, alphaa
alpha+	matches alph literally (case sensitive) followed by one or more a characters: alpha, alphaa
alpha?	matches alph literally (case sensitive) followed by zero or one a: alph, alpha
alpha{2}	matches alph literally (case sensitive) followed by exactly 2 a characters: alphaa
alpha{2,}	matches alph literally (case sensitive) followed by 2 or more a characters
alpha{2,5}	matches alph literally (case sensitive) followed by between 2 and 5 a characters
alp(ha)*	matches alp literally (case sensitive) followed by zero or more repetitions of the 1st capturing group (ha): alp, alpha, alphaha
alp(ha){2,5}	matches alp literally (case sensitive) followed by between 2 and 5 repetitions of the 1st capturing group (ha): alphaha, alphahaha
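The quantifier rows can be checked with a short sketch. This assumes Python's re module and uses re.fullmatch so that the whole candidate has to match; the helper name full_matches and the candidate words are made up for illustration.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# "alph" followed by 0, 1, 2 and 6 copies of "a"
words = ["alph" + "a" * n for n in (0, 1, 2, 6)]

zero_or_more = full_matches(r"alpha*", words)     # all four words
one_or_more = full_matches(r"alpha+", words)      # everything except "alph"
optional = full_matches(r"alpha?", words)         # "alph" and "alpha"
exactly_two = full_matches(r"alpha{2}", words)    # "alphaa" only
two_or_more = full_matches(r"alpha{2,}", words)   # "alphaa" and the 6-a word
two_to_five = full_matches(r"alpha{2,5}", words)  # "alphaa" only

# Quantifying a group repeats the whole group, not just the last character
groups = ["alp", "alpha", "alphaha"]
group_any = full_matches(r"alp(ha)*", groups)            # all three
group_two_plus = full_matches(r"alp(ha){2,5}", groups)   # "alphaha" only
```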

The basic OR operators - | and []

expression	action
alp(h|a)	alp matches the characters alp literally (case sensitive); the 1st capturing group (h|a) then matches either h (1st alternative) or a (2nd alternative), so the expression matches alph or alpa
a[bc]	a matches the character a literally (case sensitive); [bc] matches a single character from the list bc (case sensitive), so the expression matches ab or ac
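A minimal sketch of the two OR forms, assuming Python's re module. The candidate strings are invented for the example.

```python
import re

# Alternation inside a group: "alp" followed by either "h" or "a"
alternation = [w for w in ["alph", "alpa", "alpx"]
               if re.fullmatch(r"alp(h|a)", w)]

# The group also captures which alternative was taken
captured = re.fullmatch(r"alp(h|a)", "alpa").group(1)

# Character list: "a" followed by exactly one of "b" or "c"
char_list = [w for w in ["ab", "ac", "ad", "abc"]
             if re.fullmatch(r"a[bc]", w)]
```

The group form can hold multi-character alternatives such as (ha|pha), whereas a bracket list always stands for a single character.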

The basic character classes - \d \w \s . \.

expression	action
\d	\d matches a digit (equal to [0-9])
\D	matches any non digit
\w	\w matches any word character (equal to [a-zA-Z0-9_])
\W	matches any non word character
\s	\s matches any whitespace character (equal to [\r\n\t\f\v ])
.	. matches any character (except for line terminators)
\.	\. matches the character . literally (case sensitive)
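The character classes can be tried out as follows. This is a sketch in Python with made-up input strings. One assumption worth flagging: for Python str patterns, \d, \w and \s are Unicode-aware by default, and only match the bracket equivalents listed above exactly when the re.ASCII flag is set.

```python
import re

# \d : digits. re.ASCII restricts the class to [0-9]
digits = re.findall(r"\d+", "Order 66 shipped v1.2 on 2021-02-25", re.ASCII)

# \w : word characters ([a-zA-Z0-9_] under re.ASCII)
words = re.findall(r"\w+", "hello, big_world 42!", re.ASCII)

# \s : whitespace, here used to split on runs of spaces, tabs and newlines
parts = re.split(r"\s+", "a  b\tc\nd")

# . matches any character (except a newline); \. matches only a literal dot
any_char = re.findall(r"1.2", "1.2 1x2")
literal_dot = re.findall(r"1\.2", "1.2 1x2")
```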

Bracket expressions - []

expression	action
[alpha]	matches a string that has either an a, l, p or h
[a-zA-Z]	matches a string that has a letter from a to z or from A to Z
[a-zA-Z0-9]	matches a string that has a letter from a to z or from A to Z, or a digit from 0 to 9
[^a-zA-Z]	matches a string that has a character that is not a letter from a to z or from A to Z. In this case the leading ^ negates the list
[0-9]%	matches a string that has a digit from 0 to 9 immediately before a % sign
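A bracket expression always stands for one character at a time. The sketch below, assuming Python's re module and invented input strings, lists the individual characters each expression picks out.

```python
import re

# [alpha] is the set {a, l, p, h}; the repeated "a" adds nothing
in_set = re.findall(r"[alpha]", "help")

# A leading ^ inside the brackets negates the set
non_letters = re.findall(r"[^a-zA-Z]", "abc-123")

# One digit directly followed by a percent sign
percent = re.findall(r"[0-9]%", "5% of 100% at 20 %")

# Common use of a negated set: strip everything that is not a letter or digit
alnum_only = re.sub(r"[^a-zA-Z0-9]", "", "Hello, World 2021!")
```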

Word boundaries

expression	action
\balpha\b	matches alpha as a whole word. \b asserts a position at a word boundary (^\w|\w$|\W\w|\w\W) on each side of alpha, which matches the characters alpha literally (case sensitive)
\Balpha\B	matches alpha only inside a longer word. \B asserts a position where \b does not match, on each side of alpha, which matches the characters alpha literally (case sensitive)
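The difference between \b and \B shows up clearly on a few test strings. This is a sketch assuming Python's re module; the sample text is made up.

```python
import re

text = "alpha alphabet an alpha."

# \balpha\b : "alpha" as a whole word ("alphabet" is skipped)
whole_words = re.findall(r"\balpha\b", text)

# \Balpha\B : "alpha" with word characters on both sides
inside_word = re.search(r"\Balpha\B", "xalphax") is not None
standalone = re.search(r"\Balpha\B", "alpha") is not None
prefix_only = re.search(r"\Balpha\B", "alphabet") is not None
```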

Tokens

expression	action
\n	newline
\r	return
\t	tab
\0	null character

References

expression	action
(...)	Parts of the regex enclosed in parentheses may be referred to later in the expression or extracted from the results of a successful match.
(alpha)	1st capturing group (alpha): matches the characters alpha literally (case sensitive) and captures them
([alpha])	1st capturing group ([alpha]): matches and captures a single character from the list alph (case sensitive)
a(?=l)	matches the character a literally (case sensitive) only when it is followed by l. The positive lookahead (?=l) asserts that l comes next without including it in the match
(?<=d)e	matches the character e literally (case sensitive) only when it is preceded by d. The positive lookbehind (?<=d) asserts that d comes before without including it in the match
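To illustrate capturing groups, a reference back to a group, and the two lookarounds, here is a minimal sketch in Python. The input strings are assumptions, and \1 is the backreference syntax used by Python's re module.

```python
import re

# Capturing groups: extract parts of a successful match
m = re.search(r"(alpha)-(\d+)", "build alpha-42")
name, number = m.group(1), m.group(2)

# Referring to a group later in the expression: \1 repeats what group 1 matched
repeated = re.search(r"(ha)\1", "alphaha").group(0)

# Lookahead: "a" only when followed by "l"; the "l" is not part of the match
lookahead_spans = [mt.span() for mt in re.finditer(r"a(?=l)", "alpha")]

# Lookbehind: "e" only when preceded by "d"; the "d" is not part of the match
lookbehind_starts = [mt.start() for mt in re.finditer(r"(?<=d)e", "made de e")]
```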

Examples

expression	action
/[a-z.\/:=_]{12,}/i	matches a run of 12 or more characters (greedy: as many as possible, giving back as needed), where each character is a letter from a to z (case insensitive because of the i flag), a full point (.), a forward slash (/), a colon, an equals sign or an underscore
/^[1-2][0-9\.]*$/	matches a string that starts with 1 or 2, followed by zero or more digits (0 - 9) or full points (.) through to the end of the string
/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/	simply, this is looking for a string shaped like an IPv4 address: ^ asserts position at the start of the string, then one to three digits followed by a full point (.) are matched three times, then a final group of one to three digits with no trailing full point, and $ asserts position at the end of the string. It does not check that each group is 255 or less
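The last two expressions can be exercised as below. This is a sketch assuming Python's re module. The helper is_valid_ipv4 is hypothetical and not part of the original sheet; it adds a range check on top of the regex, which only tests the shape.

```python
import re

STARTS_1_OR_2 = re.compile(r"^[1-2][0-9\.]*$")
IPV4_SHAPE = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")

version_like = [s for s in ["1.2.3", "2020", "3.1", "1a"]
                if STARTS_1_OR_2.match(s)]

shaped = [s for s in ["192.168.0.1", "192.168.0", "999.999.999.999"]
          if IPV4_SHAPE.match(s)]

def is_valid_ipv4(text):
    """Hypothetical helper: regex for the shape, int() for the 0-255 range."""
    if not IPV4_SHAPE.match(text):
        return False
    return all(0 <= int(part) <= 255 for part in text.split("."))
```

The split between shape check and range check keeps the regex readable; encoding the 0 to 255 limit in the pattern itself is possible but much longer.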

More examples

1. Removing tags and &nbsp;

Using regex, remove tags such as <p>, <ul>, <li>, <h1> and <h5>, and also remove any extra spaces (&nbsp;).

The text that we will test this on is going to be:

<p>Load testing verifies the system performance under the expected peak load.  The peak load needs to set by a series of parameters that you have benchmarked targets.  For example, these parameters could include:</p> <h5>Load testing:</h5> <ul> <li>20,000 concurrent users; and</li> <li>response time of under 4 seconds</li> </ul> <h5>Stress testing:</h5> <ul> <li>Verifies the server performance under extreme load.  Test this through examining how many users are required to bring your server</li> </ul> <h5>Endurance testing:</h5> <p>Load test over an extended period of time</p> <p> </p> <h4>Check with your hosting provider</h4>

I needed to remove the tags (<p>, <ul>, <li>, etc.) and &nbsp;. I could remove the tags using the PHP function strip_tags(). However, I do as much through regex as possible.

expression	action
/<[a-zA-Z\/][^>]*|&nbsp;|>/gi	1st alternative <[a-zA-Z\/][^>]*: < matches the character < literally; [a-zA-Z\/] matches a single character that is a letter (a-z, index 97 to 122, or A-Z, index 65 to 90) or the character /; [^>]* matches any character other than >, as many times as possible. 2nd alternative &nbsp; matches the characters &nbsp; literally. 3rd alternative > matches the character > literally. The g flag finds every match and the i flag makes the match case insensitive

Through using the above regex, the outcome is as follows:

Load testing verifies the system performance under the expected peak load. The peak load needs to set by a series of parameters that you have benchmarked targets. For example, these parameters could include: Load testing: 20,000 concurrent users; and response time of under 4 seconds Stress testing: Verifies the server performance under extreme load. Test this through examining how many users are required to bring your server Endurance testing: Load test over an extended period of time Check with your hosting provider
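For readers who want to try this outside PHP, here is a minimal sketch of the same replacement in Python. It makes three assumptions: the blank alternative in the pattern stands for the &nbsp; entity, the input is a shortened version of the sample text, and the final whitespace clean-up step is an addition that is not part of the regex itself.

```python
import re

# Pattern from the article; the middle alternative is assumed to be &nbsp;
TAGS_AND_NBSP = re.compile(r"<[a-zA-Z/][^>]*|&nbsp;|>", re.IGNORECASE)

html = (
    "<p>Load testing verifies the system performance.</p> "
    "<h5>Load testing:</h5> <ul> <li>20,000 concurrent users; and</li> "
    "<li>response time of under 4 seconds</li> </ul> <p>&nbsp;</p>"
)

# Remove every tag opening, every &nbsp; and every closing >
stripped = TAGS_AND_NBSP.sub("", html)

# Assumed extra step: collapse the runs of spaces left behind by removed tags
clean = re.sub(r"\s+", " ", stripped).strip()
print(clean)
```

Because > is its own alternative, a literal > in the text (for example in "a > b") would also be removed, so this approach suits content where angle brackets only appear in tags.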

See regex example to remove tags and space

2. Adding target, alt and title to a href

How do you add attributes such as target, alt and title to a url string (an a href link)?

Let's begin by setting out the url string that we will work with:

<a href="https://www.codebales.com/regex-expressions-a-working-sheet">Regex examples sheet</a>

The regex expression used for this is:

/(<a\b[^<>]*href=['"]?http[^<>]+)>/gi

expression	action
/(<a\b[^<>]*href=['"]?http[^<>]+)>/gi	1st capturing group (<a\b[^<>]*href=['"]?http[^<>]+): <a matches the characters <a literally (case insensitive); \b asserts a word boundary, so other tags that begin with a are skipped; [^<>]* matches any character other than < or >, as many times as possible; href= matches the characters href= literally (case insensitive); ['"]? matches an optional single or double quote (zero or one time); http matches the characters http literally; [^<>]+ matches one or more characters other than < or >, as many times as possible. After the group, > matches the character > literally

Applied using preg_replace. In PHP, preg_replace replaces every match by default, so the g flag is dropped and only i is kept. $1 holds the captured opening of the tag, including <a, so the replacement starts with $1:

preg_replace('/(<a\b[^<>]*href=[\'"]?http[^<>]+)>/i', '$1 target="_blank" alt="' . $alt . '" title="' . $alt . '">', $url)

Based on the following variable definitions:

$alt = 'Regex examples sheet';
$url = '<a href="https://www.codebales.com/regex-expressions-a-working-sheet">Regex examples sheet</a>';

the outcome of using the above regex is as follows:

<a href="https://www.codebales.com/regex-expressions-a-working-sheet" target="_blank" alt="Regex examples sheet" title="Regex examples sheet">Regex examples sheet</a>
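The same substitution can be sketched in Python for comparison. The pattern and the sample values come from the article; the use of a replacement function and html.escape for the attribute value are assumptions added for safety, not part of the original approach.

```python
import html
import re

# Group 1 captures the opening tag up to, but not including, the closing >
LINK_OPEN = re.compile(r"(<a\b[^<>]*href=['\"]?http[^<>]+)>", re.IGNORECASE)

alt = "Regex examples sheet"
url = ('<a href="https://www.codebales.com/regex-expressions-a-working-sheet">'
       'Regex examples sheet</a>')

def add_attributes(match):
    """Re-emit the captured opening tag with the extra attributes appended."""
    safe = html.escape(alt, quote=True)  # assumed precaution for quotes in alt
    return f'{match.group(1)} target="_blank" alt="{safe}" title="{safe}">'

result = LINK_OPEN.sub(add_attributes, url)
print(result)
```

Using a function as the replacement avoids any surprises if the attribute text ever contains backslashes or group references.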

see regex example add elements to url

3. Obfuscating an email

I wanted to partially hide a user's email address. By way of example, changing the email

sarah@example.com

to

sar**@e******.c**

To achieve this, the following regex expression can be used, replacing each match with *:

/(?<![^\w])(?<=...)[\w]/gi

expression	action
/(?<![^\w])(?<=...)[\w]/gi	Negative lookbehind (?<![^\w]): asserts that the preceding character is not a non-word character, so the first character after @ or . is left alone (\w matches any word character, equal to [a-zA-Z0-9_]). Positive lookbehind (?<=...): asserts that three characters come before the current position (. matches any character except for line terminators), so the first three characters of the string are left alone. [\w] matches a single word character, which is the character that gets replaced
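Running the expression is the easiest way to see what each lookbehind contributes. The sketch below assumes Python's re module, which supports these fixed-width lookbehinds. As written, the pattern keeps the first three characters of the string and the first character after each non-word separator. The second pattern is a variant that is not from the article, shown only for comparison: it keeps just the first character of each part.

```python
import re

email = "sarah@example.com"

# Pattern from the article: skip the first three characters of the string
# and any character that directly follows a non-word character
masked = re.sub(r"(?<![^\w])(?<=...)[\w]", "*", email)

# Assumed variant (not in the article): mask every word character
# that directly follows another word character
first_only = re.sub(r"(?<=\w)\w", "*", email)

print(masked)
print(first_only)
```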

see regex example add elements to url

Resources

Regex 101 (https://regex101.com/) – A fantastic playground for testing and experimenting with your expressions
