Jsoup set character encoding example

This example shows how to set the character encoding when parsing a page with Jsoup, including how to set it to ISO-8859-1 or UTF-8.

How to set character encoding using Jsoup?

Jsoup tries to detect the charset of the page being crawled automatically. It first looks for the charset parameter of the Content-Type response header and, if that is missing, for a charset declared in the page's meta tags. However, many websites send a Content-Type header without a charset and do not declare one in the markup either. If you crawl such a page, Jsoup falls back to UTF-8.

That means you might not get the expected results when the page is actually encoded in a different charset, such as ISO-8859-1. Characters may be lost, or parsed and printed incorrectly.
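To see why a charset mismatch damages the text, here is a minimal sketch that uses only the Java standard library (no Jsoup involved). The class and method names are made up for illustration. It encodes a string as ISO-8859-1 and then decodes the same bytes once with the right charset and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encodes the text as ISO-8859-1 bytes, then decodes those bytes
    // with the charset passed in (hypothetical helper for this demo).
    static String decodeLatin1BytesAs(String text, Charset decodeWith) {
        byte[] raw = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Same charset on both sides: the text survives the round trip.
        String matched = decodeLatin1BytesAs(original, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(matched));

        // The single byte 0xE9 is not a valid UTF-8 sequence on its own,
        // so the decoder substitutes the replacement character U+FFFD.
        String mismatched = decodeLatin1BytesAs(original, StandardCharsets.UTF_8);
        System.out.println(original.equals(mismatched));
    }
}
```

The same thing happens to a whole web page when its bytes are decoded with a charset other than the one the server used to encode them.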

How to set character encoding (charset) if the response does not specify it?

You can open an InputStream on the connection and pass it, together with the name of your desired character set, to the parse(InputStream, String, String) method of the Jsoup class. Jsoup then decodes the bytes of the stream using that charset.
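The original listing for this step is not available, so the following is only a sketch of the approach described above. It assumes Jsoup is on the classpath. The class name, the helper method, and the example.com URL are placeholders, and ISO-8859-1 is simply one of the two charsets mentioned in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupCharsetExample {

    // Parses the stream as HTML, decoding its bytes with the given charset.
    // baseUri is used by Jsoup to resolve relative links in the document.
    static Document parseWithCharset(InputStream in, String charsetName, String baseUri)
            throws IOException {
        return Jsoup.parse(in, charsetName, baseUri);
    }

    public static void main(String[] args) throws IOException {
        String url = "https://www.example.com/"; // placeholder URL

        // Open the raw byte stream ourselves so that we control the charset.
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Document doc = parseWithCharset(in, "ISO-8859-1", url);
            System.out.println(doc.title());
        }
    }
}
```

To parse the page as UTF-8 instead, pass "UTF-8" as the charset name.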

Please also make sure that you set the proper user agent and referer headers.
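One possible way to combine the headers with an explicit charset is sketched below. It assumes Jsoup 1.11.1 or later, since it relies on Connection.Response.bodyStream(). The class name, helper method, URL, user agent string, and referrer value are all placeholders.

```java
import java.io.IOException;
import java.io.InputStream;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupHeadersExample {

    // Builds a connection with explicit User-Agent and Referer headers.
    // No request is sent until execute() is called on the result.
    static Connection buildConnection(String url, String userAgent, String referrer) {
        return Jsoup.connect(url)
                .userAgent(userAgent)
                .referrer(referrer);
    }

    public static void main(String[] args) throws IOException {
        String url = "https://www.example.com/"; // placeholder URL

        Connection.Response response = buildConnection(
                url,
                "Mozilla/5.0 (compatible; ExampleBot/1.0)", // placeholder user agent
                "https://www.google.com/")                  // placeholder referrer
                .execute();

        // Read the raw response body and decode it with the charset we choose.
        try (InputStream in = response.bodyStream()) {
            Document doc = Jsoup.parse(in, "UTF-8", url);
            System.out.println(doc.title());
        }
    }
}
```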

This example is a part of the Jsoup tutorial with examples.
