Adam Warski

3 Oct 2007

UTF-8 in JBoss/Tomcat + MySQL + Hibernate + JavaMail

java
jboss

While most web applications communicate with the end user in English, many use native languages, which often have special characters (to take an example close to home, the Polish alphabet has ą, ę, ś, etc.). A widely accepted standard for encoding such characters is UTF-8. However, it is not quite trivial to get full UTF-8 support in a Tomcat + MySQL + Hibernate + JavaMail combination: in the database, web forms, JSPs and e-mails.

Part I. Preliminaries

On every request, you have to set the encoding of characters manually; it is best to create a filter, with the following body:

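public void doFilter(ServletRequest request,
        ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
    // Encoding used for the response body written by servlets and JSPs.
    response.setCharacterEncoding("UTF-8");
    // Must be set before any request parameter is read.
    request.setCharacterEncoding("UTF-8");
    // Pass the request on to the rest of the filter chain.
    chain.doFilter(request, response);
}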

Almost all of the following parts rely on this filter.

Part II. JSPs

If you want to display native characters on a JSP page, you have to:

  • at the top of the page, add <%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
  • in the head section, add <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  • and, of course, edit the .jsp file using the UTF-8 encoding (to set this in Eclipse, right-click the project, choose “Properties”, open the “Resource” page, and set the “Text file encoding” value to “UTF-8”)

Part III. Java Strings

You may also have strings in your code that contain native characters and that you would like, for example, to pass to a .jsp page using request.setAttribute(String, Object) or send as an e-mail subject/body. To have them handled properly:

  • set the encoding of the Java source files to UTF-8 (just as with .jsp files) in your favorite editor
  • compile the sources using the -encoding UTF-8 option
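To see why both steps matter, here is a minimal sketch (the class and method names are made up for this illustration) of what happens when text written as UTF-8 is read back with a different charset, which is roughly the situation when the -encoding option does not match the actual file encoding. The characters are written as Unicode escapes, so the example itself does not depend on the source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {

    // Encodes the text with one charset and decodes the resulting bytes with another.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Three Polish letters: a-ogonek, e-ogonek, s-acute.
        String original = "\u0105\u0119\u015b";

        // Matching charsets: the text survives the round trip.
        String same = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(original.equals(same)); // true

        // Mismatched charsets: every 2-byte UTF-8 sequence turns into 2 wrong characters.
        String garbled = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(garbled)); // false
        System.out.println(garbled.length());         // 6
    }
}
```

The garbled variant is the familiar "Ä…"-style output you get on a page or in an e-mail when one link in the chain uses the wrong encoding.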

Part IV. Forms

Once native characters display correctly, you may want forms where users can enter text containing them. To have these values handled properly by Tomcat, you need to edit the server.xml file, which is located:

  • in JBoss 4.0.x: $JBOSS_HOME/server/<conf>/deploy/jbossweb-tomcat55.sar/server.xml
  • in JBoss 4.2: $JBOSS_HOME/server/<conf>/deploy/jboss-web.deployer/server.xml

and add to the appropriate <Connector ...> (usually the first one) the following attribute: URIEncoding="UTF-8".
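The effect of this attribute can be imitated with plain JDK classes. The sketch below is only an illustration, not what Tomcat does internally; the class name, method names and sample text are made up. It percent-encodes a value the way a browser submits it from a UTF-8 page, then decodes it once as UTF-8 and once as ISO-8859-1, which was the default for URIs in the Tomcat versions mentioned above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class FormDecodingDemo {

    // Percent-encodes a value using the given charset (what the browser does on submit).
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Percent-decodes a value using the given charset (what the server does on receipt).
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "zazolc" with Polish diacritics, written as escapes to stay encoding-independent.
        String typed = "za\u017c\u00f3\u0142\u0107";
        String submitted = encode(typed, "UTF-8");

        // Prints only ASCII: the percent-encoded UTF-8 bytes.
        System.out.println(submitted);
        System.out.println(typed.equals(decode(submitted, "UTF-8")));      // true
        System.out.println(typed.equals(decode(submitted, "ISO-8859-1"))); // false
    }
}
```

The same mismatch applies to POST bodies, which is why the filter from Part I sets the request encoding as well.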

Part V. MySQL and Hibernate

Storing strings in a database in UTF-8 is a bit trickier. First of all, you have to tell MySQL that your varchar/text fields will be using UTF-8.

If you already have a database, or if your database was created by Hibernate (using hibernate.hbm2ddl.auto), you will have to run this statement for each text column:

ALTER TABLE `<database>`.`<table_name>` MODIFY COLUMN `<column_name>` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;

(MySQL Administrator can help you with that).

If you are creating the tables yourself, you can set a default encoding for all text fields of a table:

CREATE TABLE `<database>`.`<table_name>` (<column_list>) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

There are other possibilities as well, for example compiling MySQL with UTF-8 support set as default. For the complete list of options, see here.

But configuring your database is not all; you also have to tell Hibernate that your connection to MySQL will be using the UTF-8 encoding. To do this:

  • if you are using a data source, add the parameters to the connection URL as in this example (inside the XML file, & has to be written as &amp;): <connection-url>jdbc:mysql://localhost:3306/<my_database>?useUnicode=true&amp;characterEncoding=UTF-8</connection-url>
  • if you are using EJB3/JPA, add to persistence.xml the following properties (in the appropriate <persistence-unit>):
    <property name="hibernate.connection.useUnicode" value="true" />
    <property name="hibernate.connection.characterEncoding" value="UTF-8" />
  • in case of “plain” Hibernate, just specify the above properties in your configuration file (hibernate.properties or hibernate.cfg.xml)

Part VI. Java Mail

Finally comes the easiest part: sending e-mails with the subject and body in UTF-8. The only thing you have to do here is use MimeMessage and pass the charset as an additional parameter when setting the subject and text of your message:

(...)
MimeMessage msg = new MimeMessage(session);
msg.setFrom(InternetAddress.parse(from, false)[0]);
msg.setSentDate(new Date());
msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse(to, false));
msg.setSubject(subject, "UTF-8");
msg.setText(body, "UTF-8");
transport.sendMessage(msg, msg.getAllRecipients());
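To get a feeling for what the charset parameter of setSubject changes, recall that mail headers have to be ASCII, so a non-ASCII subject is sent as an RFC 2047 "encoded word". The sketch below builds such a word by hand using only the JDK. It is an illustration under simplifying assumptions, not JavaMail's implementation: the class and method names are made up, it always uses Base64 (JavaMail may choose the Q encoding instead), and it ignores the length limit and line folding that a real mail library handles for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class EncodedSubjectDemo {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wraps the UTF-8 bytes of the text in an RFC 2047 Base64 encoded word.
    static String encodedWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverses encodedWord; only understands the exact form produced above.
    static String decodeWord(String word) {
        if (!word.startsWith(PREFIX) || !word.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a UTF-8 Base64 encoded word: " + word);
        }
        String payload = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A subject with Polish diacritics, written as escapes.
        String subject = "Za\u017c\u00f3\u0142\u0107";
        String header = encodedWord(subject);

        // The header value is pure ASCII and names its charset explicitly.
        System.out.println("Subject: " + header);
        System.out.println(subject.equals(decodeWord(header))); // true
    }
}
```

Because the charset travels inside the header itself, the receiving mail client does not have to guess how to decode the subject.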

Do you know of any other areas of Java that have to be configured for full UTF-8 support?

Thanks to Tomek Szymański for helping me in finding the above information.
