Jsoup set character encoding example

This example shows how to set the character encoding when parsing a page with Jsoup, for instance to ISO-8859-1 or UTF-8.

How to set character encoding using Jsoup?

Jsoup automatically detects the charset of the webpage being crawled. However, many websites do not declare a charset in the Content-Type response header. When you crawl such a page, Jsoup has to guess: it looks for a charset declared in the page's meta tags and otherwise falls back to a default (UTF-8 in current Jsoup versions), which may differ from the encoding the page was actually written in. In that case you might not get the expected results, because characters can be lost or parsed and printed incorrectly.
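To see why a wrong guess matters, here is a minimal sketch that uses only the Java standard library, not Jsoup. The class name CharsetMismatchDemo and the sample word are made up for illustration. It encodes a string as ISO-8859-1 and then decodes the same bytes once with the matching charset and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decodes raw bytes into text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e,
        // written as an escape so the source file encoding does not matter.
        byte[] pageBytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: the accented character survives.
        System.out.println(decode(pageBytes, StandardCharsets.ISO_8859_1));

        // Wrong charset: the single byte 0xE9 is not valid UTF-8 on its own,
        // so the decoder substitutes the replacement character U+FFFD.
        System.out.println(decode(pageBytes, StandardCharsets.UTF_8));
    }
}
```

The same kind of substitution can happen inside an HTML parser when the bytes of a page are decoded with a charset other than the one the page was saved in.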

How to set character encoding (charset) if the response does not specify it?

You can get the input stream from the connection and pass it, together with your desired character set, to the parse method of Jsoup that accepts an InputStream.
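The following is a minimal sketch of that approach, not necessarily the exact code the article originally showed. It relies on Jsoup.parse(InputStream, String charsetName, String baseUri). The class name JsoupCharsetExample, the helper parseWithCharset, and the URL are placeholders chosen for illustration, and ISO-8859-1 is only an assumed encoding for the target page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupCharsetExample {

    // Hypothetical helper: parses the stream as HTML, decoding its bytes
    // with the explicitly supplied charset instead of letting Jsoup guess.
    static Document parseWithCharset(InputStream in, String charsetName, String baseUri) {
        try {
            return Jsoup.parse(in, charsetName, baseUri);
        } catch (IOException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; replace it with the page you want to crawl.
        String url = "http://www.example.com/";

        // Open the raw byte stream of the page and close it when done.
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // Use "UTF-8" here instead if that is the page's real encoding.
            Document doc = parseWithCharset(in, "ISO-8859-1", url);
            System.out.println(doc.title());
        }
    }
}
```

The third argument is the base URI, which Jsoup uses to resolve relative links in the parsed document. Passing the page URL is the usual choice.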

About the author



rahimv has over 15 years of experience in designing and developing Java applications. His areas of expertise are J2EE and eCommerce. If you like the website, follow him on Facebook, Twitter or Google Plus.
