Jsoup set character encoding example

This Jsoup example shows how to set the character encoding used to parse a page, for instance to ISO-8859-1 or UTF-8.

How to set character encoding using Jsoup?

Jsoup tries to detect the charset of the webpage being crawled automatically, for example from the charset given in the Content-Type response header. However, many websites send a Content-Type header without a charset. If you crawl such a webpage, Jsoup has to fall back to a default character set (UTF-8 in current Jsoup versions, rather than the platform’s default), and that default might be different from the encoding the webpage actually uses. When the two differ, characters may be lost or parsed and printed incorrectly.
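To see why a wrong charset corrupts text, here is a minimal sketch that uses only the JDK, not Jsoup. The class name CharsetMismatchDemo and the helper decode are made up for this illustration. It encodes a word as ISO-8859-1 bytes and then decodes the same bytes once correctly and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: decodes raw bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café", written with a Unicode escape so the source file encoding does not matter.
        String original = "caf\u00e9";

        // Pretend these bytes came from a server that uses ISO-8859-1.
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println(right.equals(original)); // prints true
        System.out.println(wrong.equals(original)); // prints false, the last character was replaced by U+FFFD
    }
}
```

The byte for the accented character is not valid UTF-8 on its own, so the decoder substitutes the replacement character. The same kind of damage can happen when a page is parsed with a charset other than the one it was written in.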

How to set character encoding (charset) if response does not specify it?

You can get the response body as an InputStream from the connection and pass it, together with the name of your desired character set, to the parse method of Jsoup.
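A minimal sketch of this approach is given here. It assumes the Jsoup library is on the classpath and uses the Jsoup.parse(InputStream, String, String) overload. The class name JsoupCharsetExample, the helper parseWithCharset, and the URL are placeholders chosen for this example, and opening the stream with java.net.URL is just one possible way to get it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupCharsetExample {

    // Hypothetical helper: parses the stream using the given charset name
    // instead of relying on what the response declares.
    static Document parseWithCharset(InputStream in, String charsetName, String baseUri)
            throws IOException {
        return Jsoup.parse(in, charsetName, baseUri);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL, replace it with the page you want to crawl.
        String url = "https://www.example.com/";

        // Open the raw byte stream of the response; try-with-resources closes it.
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // Use "UTF-8" here instead if that is the encoding of the page.
            Document doc = parseWithCharset(in, "ISO-8859-1", url);
            System.out.println(doc.title());
        }
    }
}
```

The third argument is the base URI, which Jsoup uses to resolve relative links in the parsed document.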

Please let us know your views in the comments section below.
