
Remove non-ASCII characters from String in Java example

This example shows how to remove non-ASCII characters from a String in Java using various regular expression patterns and the String replaceAll method.

How to remove non-ASCII characters from a String in Java?

Often you need to remove non-ASCII characters from a string. Consider the string given below, which contains non-ASCII characters.

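To remove them, we are going to use the “[^\\x00-\\x7F]” regular expression pattern, where “[ ]” defines a character class, the “^” at the start of the class negates it (the class then matches any character not listed), and “\\x00-\\x7F” is the range of hexadecimal character codes 0x00 to 0x7F, that is, 0 to 127.

So our pattern “[^\\x00-\\x7F]” means “not in 0 to 127”, and 0 to 127 is the range of the ASCII characters. Here is the example program using this pattern.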
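A minimal sketch of such a program follows. The class name, the removeNonAscii helper, and the sample string are illustrative assumptions, not the original listing. The sample is written with Unicode escapes (\u00e4 is “ä”, \u00e9 is “é”, \u20ac is “€”) so that the source file encoding does not matter.

```java
public class RemoveNonAsciiExample {

    // Hypothetical helper: deletes every character outside the ASCII range 0x00-0x7F (0 to 127)
    static String removeNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample string, written with Unicode escapes
        String str = "J\u00e4va \u00e9x\u00e4mple \u20ac10";

        System.out.println(removeNonAscii(str)); // prints: Jva xmple 10
    }
}
```

Note that the accented letters are dropped entirely rather than converted, which is why “Jäva” becomes “Jva”.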

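Alternatively, you can use the “\\P{InBasic_Latin}” pattern, which matches any character outside the Unicode Basic Latin block (U+0000 to U+007F, the same range as ASCII), as given below.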
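A minimal sketch of the same idea using the Unicode block pattern is shown below; the class name, helper name, and sample string are hypothetical. Java's Pattern class resolves “InBasic_Latin” through the Unicode block name, and the uppercase “\\P” negates the property.

```java
public class RemoveNonAsciiBasicLatin {

    // \P{InBasic_Latin} matches any character NOT in the Basic Latin block (U+0000 to U+007F)
    static String removeNonAscii(String input) {
        return input.replaceAll("\\P{InBasic_Latin}", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample string, written with Unicode escapes
        String str = "J\u00e4va \u00e9x\u00e4mple \u20ac10";

        System.out.println(removeNonAscii(str)); // prints: Jva xmple 10
    }
}
```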

How to replace non-ASCII characters with their ASCII equivalents?

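What if you want to replace “ä” with “a” instead of removing it? You can do that by first normalizing the string to the decomposed form (NFD), which splits an accented character such as “ä” into the base letter “a” followed by a separate combining accent mark, and then removing the non-ASCII accent marks as given below.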
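Here is a minimal sketch, assuming java.text.Normalizer with the NFD form combined with the “[^\\x00-\\x7F]” pattern from earlier; the class name, helper name, and sample string are hypothetical.

```java
import java.text.Normalizer;

public class ReplaceWithAsciiEquivalent {

    static String toAsciiEquivalent(String input) {
        // NFD decomposes an accented letter into its base letter plus a combining mark,
        // e.g. U+00E4 becomes U+0061 ("a") followed by U+0308 (combining diaeresis)
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);

        // The combining marks lie outside 0x00-0x7F, so removing non-ASCII characters
        // keeps only the base letters; characters with no decomposition are simply dropped
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String str = "J\u00e4va \u00e9x\u00e4mple"; // hypothetical sample
        System.out.println(toAsciiEquivalent(str)); // prints: Java example
    }
}
```

Characters that have no decomposition, such as the euro sign, are still removed rather than converted.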

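Alternatively, you can use the “[^\\p{ASCII}]” pattern, where “\\p{ASCII}” is Java's predefined character class for all ASCII characters (equivalent to “[\\x00-\\x7F]”), as given below.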
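A minimal sketch with the predefined class follows; it assumes the same NFD normalization step, and the class name, helper name, and sample string are hypothetical.

```java
import java.text.Normalizer;

public class ReplaceWithAsciiPosixClass {

    static String toAsciiEquivalent(String input) {
        // Decompose accented letters into base letter + combining mark (NFD)
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);

        // \p{ASCII} is Java's predefined class for [\x00-\x7F]; the negated class removes everything else
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String str = "\u00c5ngstr\u00f6m"; // hypothetical sample
        System.out.println(toAsciiEquivalent(str)); // prints: Angstrom
    }
}
```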

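If the text contains letters from non-Latin scripts (for example Greek or Cyrillic) that you want to keep, use the “[\\p{M}]” pattern instead of the “[^\\p{ASCII}]” pattern as given below. It removes only the accent marks and leaves the base letters intact, whereas “[^\\p{ASCII}]” also removes every letter outside the ASCII range.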
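The sketch below contrasts the two patterns on a hypothetical sample that mixes an accented Latin letter with a Greek word; the class and method names are illustrative, not from the original example.

```java
import java.text.Normalizer;

public class RemoveAccentsOnly {

    // Removes only combining marks (Unicode category M), so non-Latin letters survive
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[\\p{M}]", "");
    }

    // For comparison: removes every non-ASCII character after decomposition
    static String stripNonAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Cafe" with an acute e, then the Greek word "alpha" with a tonos accent
        String str = "Caf\u00e9 \u03ac\u03bb\u03c6\u03b1";

        // "Cafe" followed by the unaccented Greek letters
        // (the console must support Greek to display them correctly)
        System.out.println(stripMarks(str));

        // "Cafe " only, because the Greek letters themselves are outside the ASCII range
        System.out.println(stripNonAscii(str));
    }
}
```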

In a regular expression, the “\\p{M}” pattern matches any character in the Unicode Mark category (combining marks such as accents), while the “\\P{M}” pattern matches any character that is not a mark, such as the base letter.
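A small sketch of the difference, using a decomposed “é”, is given below; the class and method names are hypothetical.

```java
import java.text.Normalizer;

public class MarkCategoryDemo {

    // Decomposes accented characters into base letter + combining mark(s)
    static String decompose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD);
    }

    // Deletes everything that is NOT a mark, leaving only the accents
    static String keepMarks(String input) {
        return input.replaceAll("\\P{M}", "");
    }

    // Deletes every mark, leaving only the base characters
    static String dropMarks(String input) {
        return input.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String decomposed = decompose("\u00e9"); // e with acute accent

        System.out.println(decomposed.length());                   // prints: 2
        System.out.println(dropMarks(decomposed));                 // prints: e
        System.out.println((int) keepMarks(decomposed).charAt(0)); // prints: 769 (U+0301, combining acute accent)
    }
}
```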

Finally, if you are using the Apache Commons Lang library, you can use the stripAccents method of its StringUtils class to remove accents from Unicode characters as given below.

How to remove only non-printable characters?

If you want to keep only the printable characters and remove all non-printable characters from the string, you can use the code given below.
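One possible approach, not necessarily the original code, is sketched below. It assumes that “non-printable” means the Unicode Other category “\\p{C}” (control, format, private-use, unassigned and unpaired surrogate code points); the class name, helper name, and sample string are hypothetical. Unlike the ASCII patterns above, this keeps accented and non-Latin letters.

```java
public class RemoveNonPrintable {

    // \p{C} is the Unicode "Other" category: control characters (including tab, newline
    // and carriage return), format characters such as the zero-width space,
    // private-use, unassigned and unpaired surrogate code points
    static String removeNonPrintable(String input) {
        return input.replaceAll("\\p{C}", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample with a BEL control character, a zero-width space, a tab and a newline
        String str = "Hello\u0007 World\u200B!\tTab\nLine";

        System.out.println(removeNonPrintable(str)); // prints: Hello World!TabLine
    }
}
```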

Please note that the above code also removes the \t (tab), \n (new line), and \r (carriage return) characters.

This example is a part of the Java String tutorial and Java RegEx tutorial.

Please let me know your views in the comments section below.
