Java Regular Expression Tutorial with Examples (RegEx)

This Java regular expression tutorial with examples will help you understand how to use regular expressions in Java. Java regular expressions, often called Java regex, are a powerful way to find, match, and extract data from a character sequence.

The java.util.regex package has two main classes: Pattern and Matcher. A Pattern represents a compiled regular expression, while a Matcher is the engine that matches a character sequence against the pattern. If the regular expression has a syntax error, a PatternSyntaxException is thrown when it is compiled.

The Pattern class

The Pattern represents a compiled regular expression. A string containing a regular expression must first be compiled into an instance of the Pattern class. Once we have the Pattern instance, we can create a Matcher object to match a character sequence against the pattern.

How to compile a regular expression pattern using the compile method?

The static compile method of the Pattern class compiles the given string regular expression into a pattern.
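As a minimal sketch of the compile method, the example below compiles a simple digits pattern and also shows what happens with an invalid one. The class name CompileExample, the helper isValidRegex, and the sample patterns are illustrative assumptions, not part of the original tutorial.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileExample {

    // Returns true if the regex compiles, false if it has a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Compile the regular expression once; the Pattern object can be reused.
        Pattern pattern = Pattern.compile("[0-9]+");
        System.out.println(pattern.matcher("12345").matches()); // true

        System.out.println(isValidRegex("[0-9]+")); // true
        System.out.println(isValidRegex("[0-9"));   // false: unclosed character class
    }
}
```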

The above code compiles a regular expression into a pattern.

There is an overloaded compile method which accepts special flags along with the regex.

Here are some of the important static int fields defined by the Pattern class that can be passed as flags. Multiple flags can be combined using the bitwise OR (|) operator.

Pattern.UNIX_LINES        Enables Unix lines mode, where only ‘\n’ is recognized as a line terminator.
Pattern.CASE_INSENSITIVE  Enables case-insensitive matching. By default, pattern matching is case sensitive.
Pattern.COMMENTS          Permits whitespace and comments in the pattern.
Pattern.MULTILINE         Enables multiline mode, where ^ and $ also match just after and just before a line terminator, not only at the start and end of the whole input.
Pattern.LITERAL           Enables literal parsing of the pattern. Any metacharacters or escape sequences are treated as literal characters and lose their special meanings.
Pattern.DOTALL            Enables dotall mode, where the . (dot) character matches any character, including a line terminator. By default, the dot does not match line terminators.

The Pattern flags method returns this pattern’s match flags.
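The sketch below shows one possible way to pass flags to the overloaded compile method and read them back with the flags method. The class name FlagsExample, its helper methods, and the sample inputs are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class FlagsExample {

    // Compiles the regex with two flags combined using bitwise OR.
    static Pattern compileLenient(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    }

    // Checks whether the CASE_INSENSITIVE bit is set in the pattern's flags.
    static boolean isCaseInsensitive(Pattern pattern) {
        return (pattern.flags() & Pattern.CASE_INSENSITIVE) != 0;
    }

    public static void main(String[] args) {
        Pattern pattern = compileLenient("^hello$");

        // CASE_INSENSITIVE: "HELLO" matches the lower-case pattern.
        System.out.println(pattern.matcher("HELLO").matches()); // true

        // MULTILINE: ^ and $ match at the boundaries of the middle line.
        System.out.println(pattern.matcher("first\nHeLLo\nlast").find()); // true

        System.out.println(isCaseInsensitive(pattern)); // true
        System.out.println((pattern.flags() & Pattern.DOTALL) != 0); // false
    }
}
```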

How to get the string pattern used to create the Pattern object?

The pattern method of the Pattern class returns the regular expression string from which the Pattern object was compiled. The toString method returns the same string.
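A short sketch of reading the regular expression back from a compiled Pattern follows. The class name PatternStringExample and the helper roundTrip are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class PatternStringExample {

    // Compiles the regex and returns the string reported by pattern().
    static String roundTrip(String regex) {
        return Pattern.compile(regex).pattern();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("a*b");
        System.out.println(pattern.pattern());  // a*b
        System.out.println(pattern.toString()); // a*b
    }
}
```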

How to create a matcher from the pattern?

The matcher method of the Pattern class creates and returns a Matcher object that will match the specified character sequence against this pattern.
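As a rough sketch, one compiled Pattern can create a separate Matcher for each input. The class name MatcherCreationExample, the DIGITS constant, and the isNumber helper are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherCreationExample {

    // Compiled once and shared; each call below creates its own Matcher.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns true if the entire input consists of one or more digits.
    static boolean isNumber(CharSequence input) {
        Matcher matcher = DIGITS.matcher(input);
        return matcher.matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("2024")); // true
        System.out.println(isNumber("20x4")); // false
    }
}
```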

How to match a pattern in a literal way?

Consider matching the string “|” against the pattern “|”.
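The original listing is not available here, so the following is only a sketch of that situation, assuming the static Pattern.matches method is used. The class name PipeExample and the helper matches are illustrative.

```java
import java.util.regex.Pattern;

class PipeExample {

    // Thin wrapper around Pattern.matches(regex, input).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The pipe is treated as the OR operator, not as a literal character.
        System.out.println(matches("|", "|")); // false

        // "Nothing OR nothing" matches only the empty string.
        System.out.println(matches("|", ""));  // true
    }
}
```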

The string “|” does not match the pattern “|”, even though it looks like it should. This is because several characters have special meanings in regular expressions. These characters are called metacharacters. The pipe character (|) is a metacharacter that means OR, so the pattern “|” is read as “nothing OR nothing” and matches only the empty string.

If you want to match the pattern literally you can use the quote method of the Pattern class. It returns a literal pattern string from the specified regular expression string.
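A minimal sketch of Pattern.quote applied to the pipe character is shown below. The class name QuoteExample and the helper matchesLiterally are assumed names for illustration.

```java
import java.util.regex.Pattern;

class QuoteExample {

    // Matches the input against the text taken literally, not as a regex.
    static boolean matchesLiterally(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        // quote wraps the text in \Q ... \E so metacharacters lose their meaning.
        System.out.println(Pattern.quote("|"));         // \Q|\E
        System.out.println(matchesLiterally("|", "|")); // true
        System.out.println(matchesLiterally("a.c", "abc")); // false: the dot is literal
    }
}
```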

How to split a string using the split method?

The split method splits the given character sequence around the matches of this pattern. It returns a String array containing the parts.
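The following sketch splits a comma-separated string while ignoring whitespace around the commas. The class name SplitExample, the helper splitOnCommas, and the sample input are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitExample {

    // A comma with optional whitespace on either side.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits the input around every match of the COMMA pattern.
    static String[] splitOnCommas(CharSequence input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        String[] parts = splitOnCommas("one, two ,three");
        System.out.println(Arrays.toString(parts)); // [one, two, three]
    }
}
```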

Please visit the full how to split a string example to know more.

The Matcher class

As we have seen earlier, a matcher object can be created using the matcher method of the Pattern class. Once created, the matcher object can be used to match the input string with the pattern.
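As a hedged sketch of what a matcher object can do, the example below uses the matches, lookingAt, find, group, start, and end methods of the Matcher class. The class name MatcherMethodsExample, the helper describeMatches, and the sample input are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsExample {

    // Collects every match as "text@start-end" using find, group, start, end.
    static List<String> describeMatches(String regex, CharSequence input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        String input = "123abc456";

        // matches: the whole input must match the pattern.
        System.out.println(digits.matcher(input).matches());   // false

        // lookingAt: only the beginning of the input must match.
        System.out.println(digits.matcher(input).lookingAt()); // true

        // find: scans the input for each successive match.
        System.out.println(describeMatches("[0-9]+", input));  // [123@0-3, 456@6-9]
    }
}
```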

 

 

Understanding the Regular Expression Patterns

In the first two parts, I have shown you how to create a Pattern and obtain a matcher object to do various types of matching. In this part, we will dive deep into how to create different types of patterns to find and extract the data from the input string.

How to match a string literal?

The most basic type of matching is matching a string literally against another string.

Remember that the matches method matches the whole string against the pattern, so a pattern that covers only part of the input will not match.
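A small sketch of both cases follows, along with the find method for partial matches. The class name LiteralMatchExample, its helpers, and the sample strings are assumptions for illustration.

```java
import java.util.regex.Pattern;

class LiteralMatchExample {

    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches anywhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("Java", "Java"));       // true
        System.out.println(matchesWhole("Java", "Java regex")); // false: extra text
        System.out.println(foundIn("Java", "Java regex"));      // true
    }
}
```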

What are the metacharacters?

The metacharacters are the characters with a special meaning in regular expressions. The metacharacters supported by the Java regex API are given below.

^             Matches the start of the input string
$             Matches the end of the input string
.             Matches any character except a line terminator
*             Matches the previous character or group zero or more times
+             Matches the previous character or group one or more times
?             Matches the previous character or group zero or one time
()            Groups part of a pattern
[]            Defines a character class
\             Escapes a metacharacter so that it is matched as a literal character
|             OR operator
{}            Specifies the number of times the previous character or group must match
(?=pattern)   Positive lookahead
(?!pattern)   Negative lookahead
(?<=pattern)  Positive lookbehind
(?<!pattern)  Negative lookbehind

How to match at the start and end of the input string?

The ^ metacharacter matches at the start of the input string, while the $ metacharacter matches at the end of the input string.
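The anchors can be sketched as follows, using find so that only the anchored part has to match. The class name AnchorExample, the helper found, and the sample input are illustrative assumptions.

```java
import java.util.regex.Pattern;

class AnchorExample {

    // True if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Java regex";
        System.out.println(found("^Java", input));  // true: input starts with Java
        System.out.println(found("^regex", input)); // false
        System.out.println(found("regex$", input)); // true: input ends with regex
        System.out.println(found("Java$", input));  // false
    }
}
```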

How to match any number of characters?

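There are 4 metacharacters that control which characters are matched and how many times: ., +, ? and *. The dot metacharacter matches any single character in the input string (except a line terminator by default), while ?, + and * are quantifiers that control how many times the previous character may repeat.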

The ? metacharacter matches the previous character zero or one time.

The + metacharacter matches the previous character one or more times.

The * metacharacter matches the previous character zero or more times.

You can also combine . with the ?, + and * metacharacters to match any character a variable number of times. For example, .* matches any sequence of characters, including an empty one.
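The four metacharacters described above can be tried out with a sketch like the one below. The class name QuantifierExample, the helper matches, and the sample words are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class QuantifierExample {

    // True only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // . matches exactly one character.
        System.out.println(matches("a.c", "abc"));       // true
        System.out.println(matches("a.c", "ac"));        // false

        // ? makes the previous character optional.
        System.out.println(matches("colou?r", "color")); // true
        System.out.println(matches("colou?r", "colour"));// true

        // + requires the previous character at least once.
        System.out.println(matches("go+gle", "google")); // true
        System.out.println(matches("go+gle", "ggle"));   // false

        // * allows the previous character any number of times, including zero.
        System.out.println(matches("go*gle", "ggle"));   // true

        // Combinations with the dot.
        System.out.println(matches("a.*c", "ac"));       // true
        System.out.println(matches("a.+c", "ac"));       // false
        System.out.println(matches("a.?c", "abbc"));     // false
    }
}
```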

How to match using the OR (|) operator?

The | (pipe) metacharacter is used to denote the OR condition in the regular expressions.
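A brief sketch of the OR operator, both at the top level and inside a group, is given below. The class name AlternationExample, the helper matches, and the sample words are illustrative assumptions.

```java
import java.util.regex.Pattern;

class AlternationExample {

    // True only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("cat|dog", "cat"));  // true
        System.out.println(matches("cat|dog", "dog"));  // true
        System.out.println(matches("cat|dog", "bird")); // false

        // Inside a group, the OR applies only to the grouped part.
        System.out.println(matches("gr(a|e)y", "gray")); // true
        System.out.println(matches("gr(a|e)y", "grey")); // true
    }
}
```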

How to escape metacharacters in the regular expression?

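The backslash (\) is used to escape metacharacters in a regular expression so that they are matched as regular characters. Because the backslash is also the escape character in Java string literals, it must be written as a double backslash (\\) in Java source code.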
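The difference between an escaped and an unescaped metacharacter can be sketched as below. The class name EscapeExample, the helper matches, and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

class EscapeExample {

    // True only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped: + is a quantifier, so "1+1" means one or more 1s followed by 1.
        System.out.println(matches("1+1", "1+1"));   // false
        System.out.println(matches("1+1", "111"));   // true

        // Escaped: "\\+" in Java source is the regex \+, a literal plus sign.
        System.out.println(matches("1\\+1", "1+1")); // true

        // The same applies to the dot.
        System.out.println(matches("a.b", "axb"));   // true
        System.out.println(matches("a\\.b", "axb")); // false
        System.out.println(matches("a\\.b", "a.b")); // true
    }
}
```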

How to control the number of times a pattern should match?

You can use the {} to control how many times a pattern should match with a given input string.

{x}    The previous character or pattern must match exactly x times.
{x,}   The previous character or pattern must match at least x times.
{x,y}  The previous character or pattern must match at least x times and at most y times.

Tip: You cannot omit the minimum before the comma (,) when specifying a range. If you do, a PatternSyntaxException is thrown. For example, {,4} is an invalid pattern. If you just want to specify the maximum number of matches, specify the minimum as 0, as in {0,4}.
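The three forms of {} and the tip about the missing minimum can be sketched as follows. The class name RepetitionExample, its helpers, and the digit patterns are assumptions made for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RepetitionExample {

    // True only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex compiles without a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("[0-9]{3}", "123"));     // true: exactly 3
        System.out.println(matches("[0-9]{3}", "1234"));    // false
        System.out.println(matches("[0-9]{2,}", "12345"));  // true: at least 2
        System.out.println(matches("[0-9]{2,}", "1"));      // false
        System.out.println(matches("[0-9]{2,4}", "123"));   // true: 2 to 4
        System.out.println(matches("[0-9]{2,4}", "12345")); // false

        // Maximum only: use 0 as the minimum.
        System.out.println(matches("[0-9]{0,4}", ""));      // true
        System.out.println(matches("[0-9]{0,4}", "12345")); // false

        // Omitting the minimum is a syntax error.
        System.out.println(compiles("[0-9]{,4}"));          // false
    }
}
```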

I will explain the positive and negative look ahead and look behind patterns at the end of this tutorial.

What are the character classes?

You can define a character class using the square brackets [ and ]. You can use any of the below given syntaxes to define a character class as per your requirement.

[123]          Matches 1, 2, or 3.
[^123]         Matches any character except 1, 2, or 3.
[0-9]          Matches any digit from 0 to 9.
[a-z]          Matches any character from a to z.
[a-zA-Z]       Matches any character from a to z or from A to Z.
[a-eF-H]       Matches any character from a to e or from F to H.
[1-3[5-8]]     Matches any digit from 1 to 3 or from 5 to 8 (union). Same as [1-35-8].
[a-e&&[bc]]    Matches only the b or c character (intersection).
[a-e&&[^bc]]   Matches any character from a to e except for b and c.
[a-z&&[^b-e]]  Matches any character from a to z except for the characters from b to e.
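A few of the character classes listed above can be checked with a sketch like this one. The class name CharacterClassExample, the helper matches, and the single-character inputs are illustrative assumptions.

```java
import java.util.regex.Pattern;

class CharacterClassExample {

    // True only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[123]", "2"));         // true
        System.out.println(matches("[^123]", "2"));        // false
        System.out.println(matches("[a-eF-H]", "G"));      // true
        System.out.println(matches("[a-eF-H]", "f"));      // false

        // Union: 1 to 3 or 5 to 8.
        System.out.println(matches("[1-3[5-8]]", "6"));    // true
        System.out.println(matches("[1-3[5-8]]", "4"));    // false

        // Intersection: only b or c.
        System.out.println(matches("[a-e&&[bc]]", "b"));   // true
        System.out.println(matches("[a-e&&[bc]]", "a"));   // false

        // Subtraction: a to z except b to e.
        System.out.println(matches("[a-z&&[^b-e]]", "x")); // true
        System.out.println(matches("[a-z&&[^b-e]]", "c")); // false
    }
}
```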

Putting it all together

Now you have a basic understanding of how regular expressions work. Let’s create a slightly more complex pattern using what we have learned so far. The example validates the syntax of a date in dd-mm-yyyy format.

The first part of the pattern is dd, i.e. the day part. The day of the month should be two digits and between 01 and 31. Breaking it up gives us the days from 01 to 09, 10 to 19, 20 to 29, 30, and 31. So the pattern will be “0[1-9]|[12][0-9]|3[01]”. The whole pattern means 0 followed by any digit from 1 to 9, OR 1 or 2 followed by any digit from 0 to 9, OR 3 followed by either 0 or 1. It should cover all the days from 01 to 31. Let’s test this part.
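A possible test of the day part is sketched below. The class name DayPatternExample, the helper isValidDay, and the sample values are assumptions for illustration.

```java
import java.util.regex.Pattern;

class DayPatternExample {

    private static final Pattern DAY = Pattern.compile("0[1-9]|[12][0-9]|3[01]");

    // True if the input is a two-digit day from 01 to 31.
    static boolean isValidDay(String day) {
        return DAY.matcher(day).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidDay("01")); // true
        System.out.println(isValidDay("19")); // true
        System.out.println(isValidDay("31")); // true
        System.out.println(isValidDay("00")); // false
        System.out.println(isValidDay("32")); // false
        System.out.println(isValidDay("1"));  // false: must be two digits
    }
}
```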

The second part is the month part, which should be two digits and must be between 01 and 12. So the pattern will be “0[1-9]|1[012]”, which means 0 followed by any digit from 1 to 9 (to cover months 01 to 09) OR 1 followed by 0, 1 or 2 (to cover months 10 to 12). Let’s test it.
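The month part can be tested in the same way. The class name MonthPatternExample, the helper isValidMonth, and the sample values are illustrative assumptions.

```java
import java.util.regex.Pattern;

class MonthPatternExample {

    private static final Pattern MONTH = Pattern.compile("0[1-9]|1[012]");

    // True if the input is a two-digit month from 01 to 12.
    static boolean isValidMonth(String month) {
        return MONTH.matcher(month).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidMonth("01")); // true
        System.out.println(isValidMonth("12")); // true
        System.out.println(isValidMonth("00")); // false
        System.out.println(isValidMonth("13")); // false
    }
}
```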

Now let’s see the year part. The year could be anything, but it must be exactly 4 digits long, which is fairly easy to convert to a pattern. The pattern to check the year will be “[0-9]{4}”. Let’s test it.
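A short sketch for the year part follows. The class name YearPatternExample, the helper isValidYear, and the sample values are assumptions for illustration.

```java
import java.util.regex.Pattern;

class YearPatternExample {

    private static final Pattern YEAR = Pattern.compile("[0-9]{4}");

    // True if the input is exactly four digits.
    static boolean isValidYear(String year) {
        return YEAR.matcher(year).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidYear("2004"));  // true
        System.out.println(isValidYear("204"));   // false
        System.out.println(isValidYear("20045")); // false
    }
}
```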

Now let’s put them all together in one pattern to validate date syntax in the “dd-mm-yyyy” format. The whole pattern will be “(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012])-([0-9]{4})”. We simply combined the individual parts using groups and put a “-” between them. Let’s try out the whole pattern against some example dates.
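The combined pattern can be exercised with a sketch like the following, which also reads the day, month, and year back through the capturing groups. The class name DateSyntaxExample, the helper hasValidSyntax, and the sample dates are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateSyntaxExample {

    private static final Pattern DATE =
            Pattern.compile("(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012])-([0-9]{4})");

    // Checks the dd-mm-yyyy syntax only, not whether the date exists.
    static boolean hasValidSyntax(String date) {
        return DATE.matcher(date).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasValidSyntax("25-12-2020")); // true
        System.out.println(hasValidSyntax("32-12-2020")); // false: day out of range
        System.out.println(hasValidSyntax("25-13-2020")); // false: month out of range
        System.out.println(hasValidSyntax("25-12-20"));   // false: year is not 4 digits
        System.out.println(hasValidSyntax("31-02-2004")); // true: syntax is valid

        // The groups give access to the individual parts.
        Matcher matcher = DATE.matcher("25-12-2020");
        if (matcher.matches()) {
            System.out.println(matcher.group(1)); // 25
            System.out.println(matcher.group(2)); // 12
            System.out.println(matcher.group(3)); // 2020
        }
    }
}
```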

Please note that the regex can only validate the date syntax, not the actual date. For example, our pattern will say valid date to “31-02-2004” while it is not (because February does not have 31 days). Please visit how to validate a date example to understand the right way.

 

 

About the author

RahimV

My name is RahimV and I have over 16 years of experience in designing and developing Java applications. Over the years I have worked with many fortune 500 companies as an eCommerce Architect. My goal is to provide high quality but simple to understand Java tutorials and examples for free. If you like my website, follow me on Facebook and Twitter.
