This Java RegEx – Match Word Boundary example shows how to match a word boundary in a Java regular expression. It also shows how to use the word boundary matcher \b properly.
How to match a word boundary in Java regular expressions?
Let’s first see an example of why it is important to learn boundary matching in regex. The code below tries to find the word “at” in the source string.
```java
package com.javacodeexamples.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExWordBoundaryMatchingExample {

    public static void main(String[] args) {

        String sourceString = "There is a cat at station eating kit-kat";

        Pattern p = Pattern.compile("at");
        Matcher m = p.matcher(sourceString);

        while( m.find() ) {
            System.out.println("Matched \"" + m.group() + "\"");
            System.out.println("Starting from index " + m.start()
                + " to index " + (m.end() - 1));
        }
    }
}
```
Output
```
Matched "at"
Starting from index 12 to index 13
Matched "at"
Starting from index 15 to index 16
Matched "at"
Starting from index 20 to index 21
Matched "at"
Starting from index 27 to index 28
Matched "at"
Starting from index 38 to index 39
```
What we wanted to do was search for the word “at” in the source string. As you can see from the output, the code not only found the exact word “at”, but it also matched the substring “at” inside the words “cat”, “station”, “eating”, and “kit-kat” (partial matches).
To find only the exact word and ignore all the partial matches, we need to use word boundaries. In regex, a word boundary is denoted by “\b”. Because the backslash must itself be escaped inside a Java string literal, it is written as "\\b" in the code. By using “\b” on both sides of the pattern, we can get rid of the matches inside other words, as given below.
```java
package com.javacodeexamples.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExWordBoundaryMatchingExample {

    public static void main(String[] args) {

        String sourceString = "There is a cat at station eating kit-kat";

        Pattern p = Pattern.compile("\\bat\\b");
        Matcher m = p.matcher(sourceString);

        while( m.find() ) {
            System.out.println("Matched \"" + m.group() + "\"");
            System.out.println("Starting from index " + m.start()
                + " to index " + (m.end() - 1));
        }
    }
}
```
Output
```
Matched "at"
Starting from index 15 to index 16
```
As you can see from the output, the code worked as intended this time and found the exact word we were looking for.
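If the word to search for is not hard-coded, the same pattern can be built at runtime. The sketch below is only an illustration: the class name `WholeWordCounter` and the method `countWholeWord` are made up for this example. It wraps the word in `Pattern.quote` so that any regex metacharacters in it are treated literally, and it assumes the word itself starts and ends with a word character (otherwise the \b on that side would not behave as a delimiter).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordCounter {

    // Hypothetical helper: counts the occurrences of `word` in `text`
    // that have a word boundary on both sides.
    static int countWholeWord(String text, String word) {
        // Pattern.quote makes the word a literal, so characters such as
        // '.' or '+' inside it are not interpreted as regex syntax.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);

        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sourceString = "There is a cat at station eating kit-kat";

        System.out.println(countWholeWord(sourceString, "at"));  // prints 1
        System.out.println(countWholeWord(sourceString, "kat")); // prints 1
        System.out.println(countWholeWord(sourceString, "eat")); // prints 0
    }
}
```

Note that “kat” is counted as a whole word here because the hyphen in “kit-kat” is not a word character, so there is a boundary between “-” and “k”.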
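The regex “\b” is a zero-width assertion: it matches a position and does not consume any characters. It matches in between two characters where one is a word character (i.e. \w in regex) and the other is not (i.e. \W in regex). It also matches at the start of the string if the first character is a word character, and at the end of the string if the last character is a word character.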
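To see where the boundaries actually fall, we can search for “\b” on its own and print the index of each match. This is a minimal sketch; the class name `WordBoundaryPositions` and the method `boundaryPositions` are made up for this example. Since every match is zero-width, `m.start()` and `m.end()` are equal, so only the start index is collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryPositions {

    // Hypothetical helper: returns every index in `input` at which
    // the zero-width \b assertion matches.
    static List<Integer> boundaryPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(input);

        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Boundaries before 'k', around the hyphen, and after the last 't'
        System.out.println(boundaryPositions("kit-kat")); // prints [0, 3, 4, 7]

        // The string starts and ends with a non-word character,
        // so there is no boundary at index 0 or at index 5
        System.out.println(boundaryPositions("-kit-"));   // prints [1, 4]
    }
}
```

The second call shows that the start and the end of the string count as boundaries only when the adjacent character is a word character.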
Visit Java regex tutorial to learn more about regex.