Regex Tutorial [Part – 4]

In my previous post I gave an example of .* and .+ and said that they are greedy in nature. So a question arises: what does it mean for these regexes to be greedy? In the regex context, greedy means that a quantifier matches as much as possible. In the example I showed previously, the regex ".+" applied to the string 12"abc"def"ghi returns "abc"def". This shows the greedy nature of .+: it matches as much as it can.
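A quick check in Python (one of the languages this series uses) reproduces the greedy result described above; the variable names are only for illustration.

```python
import re

text = '12"abc"def"ghi'

# ".+" : a quote, then one or more of any character (greedy), then a quote
match = re.search(r'".+"', text)
print(match.group())  # "abc"def"
```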

Why/How does this happen?

When we use .+ or .*, the quantifier first consumes the entire string from the current matching position to the end. From there it starts backtracking: it gives back one character at a time and tries to match the character(s) that come after the * or + in the pattern (in the above example, the closing "). This backtracking continues until the part after the quantifier matches or the match attempt fails.
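To make the backtracking concrete, here is a simplified model of how ".+" is tried at one starting position. It is a sketch, not how any real engine is implemented: `greedy_quoted` is a hypothetical helper, and the count it returns is the number of characters given back, not the "step" count a tool like regex101 reports.

```python
def greedy_quoted(text, start):
    # Hypothetical helper: a simplified model of how the regex ".+" is tried
    # at position `start`. Returns (match, characters_given_back) or None.
    if start >= len(text) or text[start] != '"':
        return None                    # the opening quote must match first
    end = len(text)                    # .+ first swallows the rest of the string
    given_back = 0
    while end >= start + 2:            # .+ must keep at least one character
        if end < len(text) and text[end] == '"':
            return text[start:end + 1], given_back  # closing quote found
        end -= 1                       # backtrack: give back one character
        given_back += 1
    return None                        # ran out of characters: this attempt fails


print(greedy_quoted('12"abc"def"ghi', 2))  # ('"abc"def"', 4)
```

On 12"abc"def"ghi the model gives back i, h, g and the last " before the closing quote can match, which is why the greedy result stops at the last quote rather than the first.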


Beginners often use a regex like ".+" expecting it to match "abc" in the string 12"abc"def"ghi, but it ends up matching "abc"def" instead. Then comes the famous question:

My regex consumes too much. How to make it non-greedy or lazy?

Answer 1. Put ? after the quantifier to make it lazy (non-greedy), so that it matches as little as possible. For the above example string, we can use

".+?"


Answer 2. Use a negated character class, which matches any character except a double quote. For the above example string, we can use

"[^"]+"
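Both answers can be checked side by side in Python. The second half of the sketch shows one behavioural difference worth knowing: a negated character class such as [^"] also matches a newline, while . does not by default.

```python
import re

text = '12"abc"def"ghi'

lazy = re.search(r'".+?"', text).group()        # Answer 1: lazy quantifier
negated = re.search(r'"[^"]+"', text).group()   # Answer 2: negated character class
print(lazy, negated)  # "abc" "abc"

# One behavioural difference: [^"] also matches a newline, while . does not
multiline = '"a\nb"'
print(re.search(r'".+?"', multiline))           # None
print(re.search(r'"[^"]+"', multiline).group() == multiline)  # True
```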


Final Words

A negated character class takes fewer steps than the lazy ? solution. I will explain the reason in a later post.

Examples for Regex Tutorial [Part – 2]


You can use any programming language of your choice; you just have to take into account the syntax it uses. I will be using Python, PHP, Java, etc. here.

Q1. Regex to check whether a string starts with abc.

^abc



Q2. Regex to check whether a string ends with abc.

abc$



Q3. Regex to check whether a string is exactly abc.

^abc$
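A minimal Python sketch of Q1 to Q3, with each answer wrapped in a small helper (the helper names are hypothetical). It also shows one Python-specific detail: $ matches just before a trailing newline, so re.fullmatch is the stricter choice when "exactly" must exclude that newline.

```python
import re

# Hypothetical helper names; each wraps one of the answers above
def starts_with_abc(s):
    return re.search(r'^abc', s) is not None   # Q1: ^ anchors to the start

def ends_with_abc(s):
    return re.search(r'abc$', s) is not None   # Q2: $ anchors to the end

def is_exactly_abc(s):
    return re.search(r'^abc$', s) is not None  # Q3: both anchors

print(starts_with_abc('abcxyz'), ends_with_abc('xyzabc'), is_exactly_abc('abc'))  # True True True
print(is_exactly_abc('abcabc'))                # False
# Python detail: $ also matches just before a final newline
print(is_exactly_abc('abc\n'), re.fullmatch(r'abc', 'abc\n'))  # True None
```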



Q4. Regex to match anything between double quotes.

Ans. If we assume that there can be zero characters within the double quotes, we can use *:

".*"


If we assume that there will be at least one character within the double quotes, we can use +:

".+"


NOTE :- .* and .+ are greedy in nature. They consume as much as possible. In the string 12"abc"def"ghi, both will return "abc"def"
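A short Python check of Q4: on the example string both patterns give the same greedy result, and an empty pair of quotes shows where * and + differ.

```python
import re

text = '12"abc"def"ghi'
print(re.findall(r'".*"', text))  # ['"abc"def"']
print(re.findall(r'".+"', text))  # ['"abc"def"']

empty = 'say ""'
print(re.findall(r'".*"', empty))  # ['""']: * accepts zero characters
print(re.findall(r'".+"', empty))  # []: + needs at least one character
```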

Regex Tutorial [Part – 3]

In my previous post, I talked about common metacharacters used in regex. In this part I will explain two more things: groups and alternation.

6. Groups

6.1 Capturing groups

Capturing groups are written as (..) in regex. Everything inside a group succeeds or fails as a single unit. Groups are one of the handiest features of regex. Properties of groups:

  • Groups are a way to remember anything that is matched.
  • These matched groups can be backreferenced later.


From the theory of finite automata, we know that both non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs) have only a fixed, finite set of states, so they cannot remember an arbitrary piece of matched text. So this feature goes beyond formal regular expressions: it is an extension provided by the regex engine of the programming language you are using.

Anything written inside () is remembered in a special variable such as \1, \2, etc. These variables can be thought of as memory slots that store a value: \1 stores the content of the first capturing group, \2 stores the content of the second capturing group, and so on. For example, the regex

(abc(de))

means: store the matched string abcde in the first capturing group (\1) and de in the second capturing group (\2). These capturing groups can be backreferenced when required.
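A minimal Python sketch of the two capturing groups in (abc(de)); the surrounding input text xx...yy is made up for illustration. The last line reuses the groups through backreferences in a replacement string.

```python
import re

m = re.search(r'(abc(de))', 'xxabcdeyy')
print(m.group(1))  # abcde  -> what \1 refers to
print(m.group(2))  # de     -> what \2 refers to

# Backreferences can be reused, e.g. in a replacement string
print(re.sub(r'(abc(de))', r'\2+\1', 'abcde'))  # de+abcde
```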

Groups are numbered from left to right, in the order in which their opening parentheses appear.


In a pattern such as ((abc(de))(fg)(hi(jkl))) matched against abcdefghijkl, the first capturing group is the entire content, the second capturing group is abcde, the third is de, the fourth is fg, the fifth is hijkl and the sixth is jkl.
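The numbering can be checked in Python, assuming a pattern of the shape ((abc(de))(fg)(hi(jkl))), which produces exactly the six groups listed above.

```python
import re

m = re.match(r'((abc(de))(fg)(hi(jkl)))', 'abcdefghijkl')
for number, captured in enumerate(m.groups(), start=1):
    print(number, captured)
# 1 abcdefghijkl
# 2 abcde
# 3 de
# 4 fg
# 5 hijkl
# 6 jkl
```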

Coming back to the example from my first post for finding the first repeating character, I used

(.)\1

For the string

abcdddef

the engine proceeds as follows:

  • Match a and store it in the first capturing group (\1). Then try to match the stored content again at the next position, which is what the \1 in the pattern asks for.
  • If that fails, move the starting position forward by one character and repeat the above step, until a match is found or we reach the end of the string.
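The two steps above can be traced by hand. This is a simplified sketch, not the engine itself: `trace_first_repeat` is a hypothetical helper that prints each attempt, and it ignores the rule that . does not match a newline.

```python
def trace_first_repeat(s):
    # Hypothetical helper: walks through the steps listed above for (.)\1.
    # (Simplified: it ignores the rule that . does not match a newline.)
    for start in range(len(s) - 1):
        captured = s[start]                # (.) stores one character as group 1
        ok = s[start + 1] == captured      # \1 must match that same character next
        print(f"start {start}: group 1 = {captured!r}, next = {s[start + 1]!r},",
              "match" if ok else "fail")
        if ok:
            return captured
    return None                            # reached the end without a match


print(trace_first_repeat('abcdddef'))  # d (after failing at starts 0, 1 and 2)
```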

A picture is worth a thousand words. [Figure: regex101 visualization of the matching steps]


6.2 Non capturing groups

Suppose you do not want to remember a particular piece of content. In that case you can use a non-capturing group, written as (?:..). It is used mainly where you want to repeat a whole chunk of content but do not want to store it. Non-capturing groups are not assigned to any numbered variable, so they do not shift the numbers of the other groups. They can also be slightly faster, because the engine does not need to remember anything.
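A small Python comparison of a capturing and a non-capturing group used to repeat the same chunk ab; the input string is made up for illustration.

```python
import re

capturing = re.match(r'(ab)+c', 'ababc')
non_capturing = re.match(r'(?:ab)+c', 'ababc')

print(capturing.groups())      # ('ab',)  the group remembers its last repetition
print(non_capturing.groups())  # ()       nothing is stored
print(non_capturing.group())   # ababc    the whole chunk was still repeated
```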

7. Alternation

Alternation is the OR condition familiar from any programming language. It is represented by |. Suppose you want to match cat or dog; then you can use

cat|dog


In the previous post, I discussed character classes, which are a simple form of alternation meant only for single characters. A character class cannot express an OR between words longer than one character.
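A Python sketch contrasting alternation with a character class; the input strings are made up for illustration. Alternation tries whole words, while the class [catdog] only ever matches one character from the set.

```python
import re

print(re.findall(r'cat|dog', 'a cat and a dog'))  # ['cat', 'dog']: whole words as alternatives
print(re.findall(r'[catdog]', 'cod'))             # ['c', 'o', 'd']: single characters only
```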


In this part we have discussed groups and alternation. Next we will see some examples based on what we have read so far.

Regex Tutorial [Part – 2]

In my previous post, I discussed the brief history and usefulness of regex. It's time to move on and understand some regex basics.

Before proceeding, let's see the definition of a metacharacter.

Metacharacters :- Characters that have special meaning in regex are known as metacharacters.

1. Starting and ending of string

Suppose you want to match the position at the start or at the end of a string. Regex provides the ^ and $ metacharacters, which indicate the start and the end of the string respectively. These metacharacters are anchors and are zero-width, meaning that they do not actually consume any character(s).
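The zero-width behaviour is easy to see in Python: the span of an anchor match has the same start and end, because no characters are consumed.

```python
import re

print(re.search(r'^', 'abc').span())  # (0, 0): a position, no characters consumed
print(re.search(r'$', 'abc').span())  # (3, 3): the position after the last character
print(re.findall(r'^a', 'abc bac'))   # ['a']: only the a at the very start qualifies
```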

2. Match any character

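The . metacharacter matches any single character except a line break. By default most flavors exclude only the newline character \n, while some (Java, for example) also exclude carriage return and other line terminators. Most engines offer a flag, such as re.DOTALL in Python, that lets . match line breaks too.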
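A Python sketch of the default behaviour of . and the effect of the re.DOTALL flag; the input string is made up for illustration.

```python
import re

text = 'abc a\nc'
print(re.findall(r'a.c', text))             # ['abc']: . does not match the newline
print(re.findall(r'a.c', text, re.DOTALL))  # ['abc', 'a\nc']
```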

3. Quantifiers

As the name suggests, quantifiers are about counting: they specify how many times the preceding character or group may occur. There are 4 types of quantifiers supported in regex.

NOTE :- The meaning of group will be explained in a later part.

  • ? – Makes the preceding character/group optional, i.e. it matches zero times or once. e.g. ba?c means match b, followed by an optional a, followed by c. So the regex can match the string bac as well as bc.
  • * – Matches the preceding character/group zero or more times (as many as possible). e.g. the regex a* can match a, aa, aaaaaaa, etc., and it can also match an empty string or a string with no a, because it is happy to match zero characters.
  • + – Matches the preceding character/group one or more times (as many as possible). e.g. the regex a+ can match a, aa, aaaaaaa, etc., but unlike a* it cannot match an empty string.
  • {min,max} – Matches the preceding character/group at least min times and at most max times. Depending on the requirement, the upper bound can be left open: {min,} means at least min times. {n} means exactly n times, so to say "at most max times" write {0,max}. The shorthand {,max} is not supported by every flavor.
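A quick Python check of each quantifier against whole strings. The helper `full` is hypothetical and simply wraps re.fullmatch so that the entire string has to match.

```python
import re

def full(pattern, s):
    # Hypothetical helper: True if the *whole* string s matches pattern
    return re.fullmatch(pattern, s) is not None

print(full(r'ba?c', 'bac'), full(r'ba?c', 'bc'))         # True True
print(full(r'a*', ''), full(r'a+', ''))                   # True False
print(full(r'a{2,3}', 'aaa'), full(r'a{2,3}', 'aaaa'))    # True False
print(full(r'a{2,}', 'aaaaaa'))                           # True: no upper bound
print(full(r'a{3}', 'aa'), full(r'a{0,3}', 'aa'))         # False True
```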

4. Character Classes and Related Constructs

4.1 Character Class

A character class is denoted by []. Each character inside a character class is treated as a separate single-character option. e.g. suppose we use

[12345]


In the example above, it means match 1 or 2 or 3 or 4 or 5. In simple words, it can be understood as an OR condition for single characters.

Word of caution

  • In a character class, there is no concept of matching a string. So if you use the character class [cat], it does not mean that it should match the word cat literally; it means that it should match either c or a or t. This is a very common misunderstanding among people who are new to regex.
  • Sometimes people use | (alternation) inside character class thinking it will act as OR condition which is wrong e.g. using [a|b] actually means match a or | (literally) or b.
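The points above can be checked in Python; the input strings are made up for illustration. Every result is a list of single characters, never a word.

```python
import re

print(re.findall(r'[12345]', '1a29'))  # ['1', '2']: one digit from the set at a time
print(re.findall(r'[cat]', 'act'))     # ['a', 'c', 't']: single characters, order irrelevant
print(re.findall(r'[a|b]', 'a|b'))     # ['a', '|', 'b']: | is just a literal inside []
```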

4.2 Range in character class

A range in a character class is denoted using the - sign. Suppose we want to match any uppercase English letter from A to Z. This can be done with the following character class:

[A-Z]


This can be done for any valid ASCII or Unicode range. The most commonly used ranges include [a-z] and [0-9]. Moreover, several ranges can be combined in one character class:

[A-Za-z0-9]


This means match any character in the range A to Z, a to z or 0 to 9. The order of the ranges does not matter, so the above is equivalent to [a-zA-Z0-9], as long as each range you define is correct.

Word of caution

  • Sometimes, when writing a range for A to Z, people mistakenly write [A-z]. This is wrong because it uses lowercase z instead of uppercase Z. It denotes any character from ASCII 65 (A) to 122 (z), which also includes six unintended characters between Z (90) and a (97): [ \ ] ^ _ and `.
  • The - character has a special meaning inside a character class: it denotes a range, as explained above. What if we want to match - literally? We can't put it just anywhere, otherwise it will be read as a range. In that case we have to put - at the start of the character class (like [-A-Z]), at the end (like [A-Z-]), or escape it if we want to use it in the middle (like [A-Z\-a-z]).
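A Python sketch of a combined range and of the two pitfalls above; the input strings are made up for illustration.

```python
import re

print(re.findall(r'[A-Za-z0-9]', 'x_9!'))  # ['x', '9']
print(re.findall(r'[A-z]', 'A_z^'))        # ['A', '_', 'z', '^']: _ and ^ slip in
print(re.findall(r'[A-Z-]', 'A-b'))        # ['A', '-']: a trailing - is literal
print(re.findall(r'[A-Z\-a-z]', '-x'))     # ['-', 'x']: an escaped - in the middle
```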

4.3 Negated character class

A negated character class is denoted by [^..]. The caret ^ means match any character except the ones listed in the character class. e.g.

[^cat]


means match any character except c or a or t.

Word of caution

The caret (^) means negation only if it is at the start of the character class. Anywhere else in a character class it is treated as a literal caret without any special meaning.
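A Python check of both points: [^cat] matches everything except c, a and t (including a newline), while a caret placed later in the class is just a literal character.

```python
import re

print(re.findall(r'[^cat]', 'cart\n'))  # ['r', '\n']: anything except c, a, t (even a newline)
print(re.findall(r'[c^at]', '2^3'))     # ['^']: a caret not at the start is literal
```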

5. The Great Escape

We talked about many metacharacters previously. The metacharacters we have covered so far are ^, $, ., ?, *, +, {, }, [ and ]. Now comes the question: what if we want to match these characters literally? To do that, we just need to escape them using \. So if we want to match $, we can use \$. The same works for every other metacharacter, including \ itself, which is written as \\.

In simple words, to match any metacharacter literally, we need to escape it.
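A Python sketch of escaping; the input strings are made up for illustration. The last part uses Python's re.escape, which escapes the metacharacters in a literal string for you.

```python
import re

print(re.search(r'\$5', 'costs $5').group())   # $5
print(re.search(r'\[x\]', 'list[x]').group())  # [x]
print(re.search(r'1+1', '1+1'))                # None: unescaped + means "one or more 1s"

# re.escape escapes every metacharacter in a literal string for you
pattern = re.escape('1+1')
print(re.search(pattern, '1+1').group())       # 1+1
```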


In this part, I explained the basic metacharacters commonly used in regex. In the next part, I will explain the concept of groups and alternation.

Regex Tutorial [Part – 1]

What is regex and what is its essence?

Regular expression (also known as regex, which is what we will call it from now on) is one of the most powerful tools for manipulating text. If you know regex, you can do many things that would otherwise require a lot of effort, and make yourself look cool. Text manipulation that would otherwise take multiple lines of normal code can often be done in a single line using regex.

At first sight, regex seems cryptic, but once you understand its concepts, you will appreciate its beauty, simplicity and power, and trust me, you will fall in love with it.

A brief History

Perl was one of the first programming languages to include support for regex, and it continues to do so. Perl is widely regarded as having one of the most powerful regex engines of any programming language. Later on, people from other programming language communities realized the value of regular expressions and started developing regex engines for their own languages. Many of them adopted Perl's syntax; such flavors are often described as Perl-compatible. PCRE (read :- Perl Compatible Regular Expressions) is a library of functions that implements regular expression pattern matching using the same syntax and semantics as Perl 5. PHP uses the PCRE library directly, while languages such as Java and Python ship their own engines with largely Perl-style syntax.

One (of many) example where regex is handy

I don’t expect you to understand what I am going to show (if you are a complete beginner). It is just to demonstrate the usefulness of regex, and believe me, this is only a basic thing that can be done using regex. There are much more advanced uses of regex, which you will understand as time progresses.


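Input
s = "abcdddef"
Output (the first repeating character, i.e. the first character immediately followed by itself: d in the above example)

Normal Solution (Python)
def first_repeating(s):
    # compare each character with the one right after it
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            return s[i]
    return None  # no character repeats

Regex Solution
(.)\1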
Using the above regex and some functions provided by the regex library of your programming language, you can find the desired result.
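In Python, for example, the regex solution can be wrapped with re.search. The function name `first_repeating_regex` is hypothetical; like the normal solution, it returns None when no character is immediately repeated.

```python
import re

def first_repeating_regex(s):
    # (.) captures any single character; \1 requires that same character next
    m = re.search(r'(.)\1', s)
    return m.group(1) if m else None

print(first_repeating_regex('abcdddef'))  # d
```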


I believe regular expressions are a tool that every programmer should know. This part was just an introduction to the history of regex and places where regex can be used. We will see regex in action starting from the next part.

Useful Resource

[1] regex101 :- For visualization of regex