Java RegEx – Extract Text Between XML Tags

This Java regex example shows how to extract the text between XML tags using a regular expression pattern.

How to extract all text between XML tags using regex pattern in Java?

We often need to process XML content and extract the text between a specific pair of XML tags in Java. This can be done easily using a regex pattern.

I am going to use the following XML for this example.

In this example, I want to extract the text between the <price> and </price> tags. For this, I am going to use the pattern “<price>(.*)</price>”.

Here, “(.*)” is a capturing group that matches any character (except line terminators, by default) zero or more times. Let’s try to extract the text using this pattern.
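A minimal sketch of this step could look like the code below. The sample XML string, the class name ExtractPriceExample, and the helper method extractPrice are assumptions made for illustration; only the price value 700 and the pattern “<price>(.*)</price>” come from this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractPriceExample {

    // Returns the text between <price> and </price>, or null if there is no match.
    static String extractPrice(String xml) {
        Pattern pattern = Pattern.compile("<price>(.*)</price>");
        Matcher matcher = pattern.matcher(xml);

        // find() looks for the next occurrence of the pattern in the input
        if (matcher.find()) {
            // group(1) is the text captured by the first pair of parentheses
            return matcher.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample XML with a single book element
        String xml = "<book><name>Sample Book</name><price>700</price></book>";
        System.out.println(extractPrice(xml)); // prints 700
    }
}
```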

Output

As you can see from the output, the pattern successfully extracted the text between the price tags. Since the price is captured by the first group, we need to pass 1 to the group method. The group method without any argument returns the whole match, which would be “<price>700</price>”, not just “700”. Refer to the regex matching groups example to learn more about it.
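To make the difference between the two calls concrete, here is a small sketch that returns both values for the same match. The class name GroupComparisonExample, the helper wholeMatchAndGroup, and the sample XML are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupComparisonExample {

    // Returns { whole match, first group }, or null if the pattern is not found.
    static String[] wholeMatchAndGroup(String xml) {
        Matcher matcher = Pattern.compile("<price>(.*)</price>").matcher(xml);
        if (!matcher.find()) {
            return null;
        }
        // group() is the same as group(0): the entire matched text, tags included
        // group(1) is only the part captured by the first parentheses
        return new String[] { matcher.group(), matcher.group(1) };
    }

    public static void main(String[] args) {
        // Hypothetical sample XML
        String xml = "<book><price>700</price></book>";
        String[] result = wholeMatchAndGroup(xml);
        System.out.println(result[0]); // prints <price>700</price>
        System.out.println(result[1]); // prints 700
    }
}
```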

The XML I used had only one book tag, but what if we have multiple book tags? Let’s find out.

Output

As you can see from the output, the pattern extracted everything from the first opening <price> tag to the last closing </price> tag. This happened because the “.*” expression is greedy: it tries to match as many characters as possible.
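The greedy behavior can be reproduced with a sketch like the one below. The two-book XML string and the second price (500) are assumptions made for illustration, as are the names GreedyPriceExample and extractGreedy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyPriceExample {

    // Applies the greedy pattern and returns group 1 of the first match, or null.
    static String extractGreedy(String xml) {
        Matcher matcher = Pattern.compile("<price>(.*)</price>").matcher(xml);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical XML with two book elements, written on a single line
        String xml = "<books>"
                + "<book><price>700</price></book>"
                + "<book><price>500</price></book>"
                + "</books>";

        // The greedy ".*" runs on to the LAST </price>, so the capture
        // swallows the markup between the two prices as well.
        System.out.println(extractGreedy(xml));
        // prints 700</price></book><book><price>500
    }
}
```

Note that the over-match spans both book elements here only because the sample XML is on one line. By default, “.” does not match line terminators, so the result would differ if each price sat on its own line.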

To fix this, we need to use the reluctant “.*?” expression instead of the greedy “.*” expression. We also need to replace the if statement with a while loop to find multiple matches, since we now have two book tags.
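A sketch of the corrected version, using the reluctant quantifier and a while loop, might look like this. As before, the two-book XML and the second price (500) are assumed sample data, and ReluctantPriceExample and extractPrices are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantPriceExample {

    // Collects the text of every <price>...</price> pair found in the input.
    static List<String> extractPrices(String xml) {
        // ".*?" is reluctant: it stops at the first </price> it can reach
        Matcher matcher = Pattern.compile("<price>(.*?)</price>").matcher(xml);

        List<String> prices = new ArrayList<>();
        // Each call to find() resumes searching after the previous match
        while (matcher.find()) {
            prices.add(matcher.group(1));
        }
        return prices;
    }

    public static void main(String[] args) {
        // Hypothetical XML with two book elements
        String xml = "<books>"
                + "<book><price>700</price></book>"
                + "<book><price>500</price></book>"
                + "</books>";
        System.out.println(extractPrices(xml)); // prints [700, 500]
    }
}
```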

Output

If you want to learn more about regular expressions, please visit the Java regex tutorial.
