Jsoup clean HTML example

This example shows how to clean HTML using the clean method of Jsoup. It also shows how to remove all HTML tags from a String and how to retain specific tags by using a whitelist while cleaning.

How to remove HTML tags by cleaning the HTML using Jsoup?

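You can remove HTML tags from a String using the static clean method of the Jsoup class. It takes the HTML string and a whitelist as arguments and returns the cleaned HTML as a String.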

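The clean method removes all HTML tags from the HTML string except those allowed by the specified whitelist. Jsoup provides the following whitelists out of the box. Note that jsoup 1.14.1 renamed the Whitelist class to Safelist; the whitelists listed below have the same names in both classes.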

1) none
All HTML tags are removed and only the text nodes are kept.

2) simpleText
This whitelist allows only text formatting HTML tags b, em, i, strong and u. All other tags are removed.

3) basic
The basic whitelist allows the a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, span, strike, strong, sub, sup, u and ul tags. All other tags are removed. It does not allow images.

4) basicWithImages
As the name suggests, this whitelist allows all the tags included in the basic whitelist plus the img tag.

5) relaxed
This is the most accommodating whitelist. It allows the a, b, blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u and ul tags. All other tags are removed.
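
To compare the five built-in whitelists, the sketch below cleans the same input with each of them. It assumes jsoup 1.14.1 or later, where the class is named Safelist (on older versions, replace Safelist with Whitelist). The class name BuiltInSafelistsDemo, the cleanWith helper and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class BuiltInSafelistsDemo {

    // Hypothetical sample input mixing block, formatting and image tags.
    static final String HTML =
            "<div><h1>Title</h1><p>Some <b>bold</b> text</p>"
            + "<img src=\"http://example.com/a.png\" alt=\"pic\"></div>";

    // Cleans the sample input with the given built-in list.
    static String cleanWith(Safelist safelist) {
        return Jsoup.clean(HTML, safelist);
    }

    public static void main(String[] args) {
        // none: every tag is stripped, only the text remains
        System.out.println("none: " + cleanWith(Safelist.none()));
        // simpleText: only b, em, i, strong and u survive
        System.out.println("simpleText: " + cleanWith(Safelist.simpleText()));
        // basic: p and b survive, but div, h1 and img do not
        System.out.println("basic: " + cleanWith(Safelist.basic()));
        // basicWithImages: same as basic, plus img
        System.out.println("basicWithImages: " + cleanWith(Safelist.basicWithImages()));
        // relaxed: div, h1, p, b and img all survive
        System.out.println("relaxed: " + cleanWith(Safelist.relaxed()));
    }
}
```

The exact whitespace of each result depends on Jsoup's output formatting, so compare which tags survive rather than the literal strings.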

How to clean HTML using whitelist?

Create the appropriate whitelist object and pass it to the clean method along with the HTML. The method returns the cleaned HTML, which retains only the tags allowed by the whitelist.
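
A minimal sketch of this, using the basic whitelist, could look like the code below. It assumes jsoup 1.14.1 or later, where the class is named Safelist (on older versions, use Whitelist.basic() instead). The class name, the cleanBasic helper and the input string are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanWithSafelistDemo {

    // Cleans the given HTML, keeping only what the basic list allows.
    static String cleanBasic(String html) {
        return Jsoup.clean(html, Safelist.basic());
    }

    public static void main(String[] args) {
        // Hypothetical input with an event handler attribute and a script tag.
        String html = "<p onclick=\"track()\">Read the "
                + "<a href=\"http://example.com/\">docs</a>"
                + "<script>alert(1)</script></p>";

        System.out.println(cleanBasic(html));
    }
}
```

In this sketch the p and a tags are retained because the basic whitelist allows them, while the script tag and the onclick attribute are removed because it does not.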


How to retain specific tags while cleaning the HTML document?

The default whitelists come with pre-configured tags. What if you want to retain only particular tags and remove all other HTML tags? The Whitelist class provides the addTags method, which accepts any number of tag names and adds them to the whitelist. It returns the same whitelist object, so the call can be chained.

To retain only <div> tags and remove all other HTML tags from the HTML String, start with the none whitelist and add the div tag to it.
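
One way to write this is sketched below. It assumes jsoup 1.14.1 or later, where the class is named Safelist (on older versions, use Whitelist.none().addTags("div")). The class name, the keepOnlyDivs helper and the input string are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class RetainDivOnlyDemo {

    // Starts from the empty list (none) and allows only the div tag.
    static String keepOnlyDivs(String html) {
        Safelist divOnly = Safelist.none().addTags("div");
        return Jsoup.clean(html, divOnly);
    }

    public static void main(String[] args) {
        // Hypothetical input: a div wrapping other tags, followed by a span.
        String html = "<div class=\"box\"><p>Hello <b>World</b></p></div>"
                + "<span>bye</span>";

        // The div tag is kept; p, b and span are removed but their text stays.
        System.out.println(keepOnlyDivs(html));
    }
}
```

Note that addTags allows only the tag itself, so the class attribute of the div is dropped as well. To keep attributes, the whitelist also offers an addAttributes method.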


This example is a part of the Jsoup tutorial with examples.

Please let me know your views in the comments section below.

About the author



My name is RahimV and I have over 16 years of experience in designing and developing Java applications. Over the years I have worked with many Fortune 500 companies as an eCommerce Architect. My goal is to provide high-quality but simple-to-understand Java tutorials and examples for free. If you like my website, follow me on Facebook and Twitter.

1 Comment

  • Hi,
    Is there a way to remove elements in a given context, for example bold nested inside bold?

    For example, if I have:
    <b>text <b>1</b><b> text</b> <b>2</b></b>

    The result after cleaning should be:
    <b>text 1 text 2</b>
