
Jsoup clean HTML example

This example shows how to clean HTML using Jsoup. It also shows how to remove HTML tags from a String, and how to retain specific tags by using a whitelist while cleaning the HTML.

How to remove HTML tags by cleaning the HTML using Jsoup?

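You can remove HTML tags from a String using the static clean method of the Jsoup class, which takes the HTML string and a whitelist: Jsoup.clean(String bodyHtml, Whitelist whitelist).

This method removes all HTML tags from the HTML string except those included in the specified whitelist, and returns the cleaned HTML. Jsoup provides the following whitelists out of the box.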

1) none
All HTML tags are removed; only the text nodes are retained.

2) simpleText
This whitelist allows only the text formatting tags b, em, i, strong and u. All other tags are removed.

3) basic
The basic whitelist allows the a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, span, strike, strong, sub, sup, u and ul tags. All other tags are removed. It does not allow images.

4) basicWithImages
As the name suggests, this whitelist allows all tags included in the basic whitelist plus the img tag.

5) relaxed
This is the most accommodating whitelist. It allows the a, b, blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u and ul tags. All other tags are removed.
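
As a rough illustration of how these presets differ, the sketch below runs one HTML fragment through each of them and prints the result. It assumes jsoup 1.14.1 or later on the classpath, where the class is named Safelist (older releases call it Whitelist, with the same factory methods). The class name BuiltInSafelistsDemo, the helper cleanWithEachPreset and the sample HTML are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class BuiltInSafelistsDemo {

    // Hypothetical helper: cleans the same HTML with every built-in preset
    // and returns the results keyed by preset name, in declaration order.
    public static Map<String, String> cleanWithEachPreset(String html) {
        Map<String, Safelist> presets = new LinkedHashMap<>();
        presets.put("none", Safelist.none());
        presets.put("simpleText", Safelist.simpleText());
        presets.put("basic", Safelist.basic());
        presets.put("basicWithImages", Safelist.basicWithImages());
        presets.put("relaxed", Safelist.relaxed());

        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Safelist> preset : presets.entrySet()) {
            results.put(preset.getKey(), Jsoup.clean(html, preset.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample input (an assumption for this sketch).
        String html = "<div><p>Hello <b>World</b></p>"
                + "<img src=\"http://example.com/logo.png\">"
                + "<script>alert('x')</script></div>";

        for (Map.Entry<String, String> result : cleanWithEachPreset(html).entrySet()) {
            System.out.println("--- " + result.getKey() + " ---");
            System.out.println(result.getValue());
        }
    }
}
```

Line breaks and indentation in the printed HTML may vary between jsoup versions, so it is safer to compare which tags survive than to compare exact strings.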

How to clean HTML using whitelist?

Create the appropriate whitelist object and pass it to the clean method together with the HTML. Tags specified in the whitelist are retained and all other tags are removed.
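
A minimal sketch of that call follows, assuming jsoup 1.14.1 or later (Safelist is the newer name for Whitelist; on older versions substitute Whitelist.basic()). The class name, the cleanUntrusted helper and the input string are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanWithSafelistExample {

    // Hypothetical helper: keeps only the tags and attributes
    // permitted by the built-in basic list.
    public static String cleanUntrusted(String html) {
        return Jsoup.clean(html, Safelist.basic());
    }

    public static void main(String[] args) {
        // Sample input (an assumption): a link carrying an event handler,
        // followed by a script element.
        String html = "<p>Read the <a href=\"http://example.com/\" onclick=\"steal()\">docs</a></p>"
                + "<script>alert('x')</script>";

        System.out.println("Original: " + html);
        System.out.println("Cleaned:  " + cleanUntrusted(html));
    }
}
```

Because the basic list permits only the href attribute on a tags, the onclick handler in this input should be dropped along with the script element, while the p and a tags are kept.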


How to retain specific tags while cleaning the HTML document?

The default whitelists come with preconfigured tags. What if you want to retain only particular tags and remove all other HTML tags? The Whitelist class provides the addTags(String... tags) method, which lets you add as many tags to a whitelist as you want to retain.

This method adds HTML tags to the whitelist.

For example, to retain only <div> tags and remove all other HTML tags from the HTML String, start from an empty whitelist and add the div tag to it.
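
One way this could look is sketched below, assuming jsoup 1.14.1 or later, where the class is called Safelist (Whitelist in older releases, with the same none and addTags methods). The class name, the keepOnlyDivs helper and the sample HTML are invented for the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class RetainDivTagsExample {

    // Hypothetical helper: starts from the empty list and allows only div.
    public static String keepOnlyDivs(String html) {
        Safelist onlyDiv = Safelist.none().addTags("div");
        return Jsoup.clean(html, onlyDiv);
    }

    public static void main(String[] args) {
        // Sample input (an assumption): div tags wrapping other markup.
        String html = "<div><p>First <b>paragraph</b></p></div>"
                + "<div>Second <a href=\"http://example.com\">link</a></div>";

        System.out.println(keepOnlyDivs(html));
    }
}
```

The text inside the removed p, b and a tags is kept; only the tags themselves are stripped. Note that addTags allows the tag but none of its attributes, so something like class on a div is still removed unless it is allowed separately with addAttributes.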



About the author

rahimv

rahimv has over 15 years of experience in designing and developing Java applications. His areas of expertise are J2EE and eCommerce.
