Java split String by words example

This example shows how to split a String into words in Java. It also shows how to break a sentence into words using the split method of the String class.

How to split String by words?

The simple code given below breaks a sentence String into words.
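A minimal sketch of such code follows. The class name, the helper method splitBySpace, and the sample sentence are assumptions chosen for illustration.

```java
public class SplitBySpaceExample {

    // Splits a sentence into words on single space characters.
    static String[] splitBySpace(String sentence) {
        return sentence.split(" ");
    }

    public static void main(String[] args) {
        // Illustrative test sentence (an assumption, not a fixed input)
        String sentence = "Java split String by words example";

        for (String word : splitBySpace(sentence)) {
            System.out.println(word);
        }
        // Prints:
        // Java
        // split
        // String
        // by
        // words
        // example
    }
}
```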


As you can see from the output, it worked for the test sentence. The sentence is broken down into words by splitting it on the space character.

Let’s try some other not-so-simple sentences.
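The sketch below runs a space-only split on a sentence that contains punctuation. The class name and the sentence are hypothetical and were picked only to expose the problem.

```java
public class SplitBySpaceLimitsExample {

    // Splits on a single space character, with no handling of punctuation.
    static String[] splitBySpace(String sentence) {
        return sentence.split(" ");
    }

    public static void main(String[] args) {
        // Hypothetical test sentence containing punctuation marks
        String sentence = "Hello, world! How are you?";

        for (String word : splitBySpace(sentence)) {
            System.out.println(word);
        }
        // Prints (punctuation stays attached to the words):
        // Hello,
        // world!
        // How
        // are
        // you?
    }
}
```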


As you can see from the output, our code did not work as expected. The reason is that a simple split on the space character is not enough to separate the words in a string. Words in a sentence may also be separated by punctuation marks such as periods, commas, and question marks, and these stay attached to the neighboring words.

To make the code handle all these punctuation marks and symbols, we will change our regular expression pattern from a single space to a character class containing the space plus the punctuation marks and symbols, as given below.
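A sketch of the changed code, assuming the same hypothetical test sentence as a plain punctuation example. The constant name SEPARATORS and the method name splitIntoWords are illustrative.

```java
public class SplitByPunctuationExample {

    // Character class listing the space and the ASCII punctuation marks
    // and symbols; the trailing + matches one or more of them in a row.
    static final String SEPARATORS =
            "[ !\"\\#$%&'()*+,-./:;<=>?@\\[\\]^_`{|}~]+";

    static String[] splitIntoWords(String sentence) {
        return sentence.split(SEPARATORS);
    }

    public static void main(String[] args) {
        // Hypothetical test sentence containing punctuation marks
        String sentence = "Hello, world! How are you?";

        for (String word : splitIntoWords(sentence)) {
            System.out.println(word);
        }
        // Prints:
        // Hello
        // world
        // How
        // are
        // you
    }
}
```

Note that split drops trailing empty strings, so the final question mark does not produce an extra element. A sentence that starts with a separator would still yield an empty first element.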


This time we got the output we wanted. The pattern [ !\"\\#$%&'()*+,-./:;<=>?@\\[\\]^_`{|}~]+ includes the space and almost all the punctuation marks and symbols that can be used in a sentence. The + at the end matches one or more of these characters in a row, so consecutive separators (for example, a comma followed by a space) are treated as a single separator and do not produce empty words.

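Instead of this pattern, you can also use the \\P{L} pattern to extract words from a sentence. Here \\p{L} matches any character in the Unicode category L (letters), and the uppercase \\P negates it, so \\P{L} matches any character that is not a letter. This means digits and underscores are treated as separators too. You need to change the line with the split method as given below.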
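A sketch of the changed split call, wrapped in a small class so it can be run on its own. The + quantifier is kept here for the same reason as before, which is an assumption about the intended line; the class and method names are illustrative.

```java
public class SplitByNonLetterExample {

    // Splits wherever one or more non-letter characters occur.
    static String[] splitIntoWords(String sentence) {
        return sentence.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // Hypothetical test sentence containing punctuation marks
        String sentence = "Hello, world! How are you?";

        for (String word : splitIntoWords(sentence)) {
            System.out.println(word);
        }
        // Prints:
        // Hello
        // world
        // How
        // are
        // you
    }
}
```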

Please note that the \\P{L} expression works for both ASCII and non-ASCII letters (for example, the accented characters in “café” or “kākā”).
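To illustrate the difference, the sketch below compares \\P{L}+ with \\W+, which by default treats only ASCII letters, digits, and underscore as word characters. The \\W+ comparison is an addition made here for contrast and is not part of the approach described above. The accented letters are written as Unicode escapes so that the source file encoding does not matter.

```java
public class UnicodeWordsExample {

    // Splits on runs of characters that are not Unicode letters.
    static String[] splitByNonLetter(String text) {
        return text.split("\\P{L}+");
    }

    // Splits on runs of characters outside [a-zA-Z_0-9] (default \W).
    static String[] splitByNonWord(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        // "café, kākā!" with precomposed accented letters
        String text = "caf\u00e9, k\u0101k\u0101!";

        // Two words: the accented letters stay inside the words
        System.out.println(splitByNonLetter(text).length); // 2

        // Three fragments: caf, k, k (accented letters act as separators)
        System.out.println(splitByNonWord(text).length);   // 3
    }
}
```

This holds for precomposed accented letters. Text that stores accents as separate combining marks would be split at those marks, because combining marks are not in the letter category.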