Setting charset in htaccess

Question: How do I use .htaccess directives on an Apache server to serve files with a specific encoding?


.htaccess Charset Answer

You can try these, one at a time (see also Using FilesMatch and Files in htaccess):

AddDefaultCharset UTF-8

ForceType 'text/html; charset=UTF-8'


AddDefaultCharset UTF-8

AddCharset UTF-8 .html
AddType 'text/html; charset=UTF-8' html

The method I personally use on all my sites is to use the AddDefaultCharset directive in the web root .htaccess file:

AddDefaultCharset UTF-8

This adds the following header to the server output of your text/html pages:

Content-Type: text/html; charset=UTF-8

NOTE: The meta tag <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> is commonly used to do the same thing, so if you use this .htaccess method you no longer need to include that meta tag. That means less code, which is always a good thing.

Background

It is important to ensure that any information about character encoding sent by the server is correct, since information in the HTTP header overrides information in the document itself.

Many Apache servers are configured to send files using the ISO-8859-1 (Latin-1) encoding. In the examples in this FAQ, we'll assume that you want to serve your file or files using a different encoding than that specified in the default configuration. (For advice on choosing an encoding see the tutorial Character sets & encodings in XHTML, HTML and CSS.)

The following shows an example of an HTTP header that accompanies a file sent to a user agent. In this case the character encoding information is included in the Content-Type header on the second line from the bottom. Example:

HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: CSS2-REC.en.html
Cache-Control: max-age=21600
Expires: Wed, 05 Nov 2003 16:46:04 GMT
Last-Modified: Tue, 12 May 1998 22:18:49 GMT
ETag: "3558cac9;36f99e2b"
Accept-Ranges: bytes
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=utf-8
Content-Language: en

In the example the Content-Type header expresses both the MIME type of the file and the character encoding. The MIME type describes the format of the file being served. HTML files are typically served as text/html. The character encoding (or 'charset') of this file is UTF-8.

To learn how to view the HTTP header for a file see the article Checking HTTP Headers.

Files on an Apache server may be served with a default character encoding declaration in the HTTP header that conflicts with the actual encoding of the file. The character encoding sent by the server may be the out-of-the-box default, a default set by the system administrator, or a result of implementing various Apache directives. In other cases no character encoding information is sent by the server when it is actually desired.

If the server is set up to allow users or administrators to change information in .htaccess files, these can provide a way to override default settings. This FAQ shows you how.

Answer

There are a couple of different scenarios to bear in mind. In the first instance, you may want to change the default for all the files in a directory with the same extension. Alternatively, you may want to change the default for a single file or small number of files. We will explore these in turn.

In our examples we will assume that the default server configuration serves files as ISO-8859-1, but that you want to serve your file or files using UTF-8 (a very sensible strategy!).

Is this answer relevant to you?

This article is written for content authors, rather than system administrators. Setting the server's default encoding is beyond the scope of this article.

This advice is only relevant if you are happy to declare the character encoding of your document via the HTTP header. In some cases you may not want that.

Note that this FAQ also assumes that your server is set up to let you use .htaccess files. It is also assumed that it is not appropriate to simply change the default configuration of the server. If you are not sure, contact your server administrator.

You should also be aware of the conventions in use on your server for associating character encoding information with extensions. In some cases the server may be set up in the expectation that character encodings are indicated by encoding-specific extensions, e.g. example.html.utf8, where it is the .utf8 that needs to be associated with a character encoding, rather than the .html (which may be associated with the file type).

If these approaches fail, you should consult the Apache manuals (see attached links) or your server administrator.

Specifying by extension

Use the AddCharset directive to associate the character encoding with all files having a particular extension in the current directory and its subdirectories. For example, to serve all files with the extension .html as UTF-8, open the .htaccess file in a plain text editor and type the following line:

AddCharset UTF-8 .html

The extension can be specified with or without a leading dot. You can add multiple extensions to the same line. This will still work if you have file names such as example.en.html or example.html.en.
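As a minimal sketch of the multiple-extension form, the line below associates UTF-8 with two extensions at once. The choice of .html and .htm is only an assumption for illustration; substitute whichever extensions your own files use.

```apache
# Serve files ending in .html or .htm as UTF-8.
# (.htm is a hypothetical second extension, added for illustration.)
AddCharset UTF-8 .html .htm
```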

The example will cause all files with the extension .html to be served as UTF-8. The HTTP header will contain a Content-Type line that ends with the 'charset' information, as shown in the example that follows.

Content-Type: text/html; charset=UTF-8

Note: All files with this extension in all subdirectories of the current location will also be served as UTF-8. If, for some reason, you need to serve the odd file with a different encoding you will need to override this using additional directives.
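One possible way to write such an override, assuming the older files live in a subdirectory (here given the hypothetical name legacy/), is to place a second .htaccess file in that subdirectory that re-associates the extension with the old encoding:

```apache
# legacy/.htaccess  (hypothetical subdirectory holding older Latin-1 files)
# Overrides the "AddCharset UTF-8 .html" rule inherited from the parent directory.
AddCharset ISO-8859-1 .html
```

This only helps when the exceptions are grouped by directory; for a single file in the same directory, a per-file wrapper is likely the better fit.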

Note: You can associate the character encoding with any extension attached to your file. For example, suppose you do language negotiation and you have pages in two languages that follow the model example.en.html and example.ja.html. Let's also suppose that you are happy to serve English pages using your server's ISO-8859-1 default, but want to serve Japanese files in UTF-8. To do this, you can associate the character encoding with the language extension, as follows:

AddCharset UTF-8 .ja

Note, however, that if you are able to, it may be a better solution to change the server default to UTF-8, or to serve all files in new directories as UTF-8.

Note: It is also possible to achieve the same result using the AddType directive, although this declares both the character encoding and the MIME type at the same time. The decision as to which is most appropriate will depend in part on how you are using extensions for content negotiation. If you are using different extensions to express the document type and the character encoding, this is less likely to be appropriate.

AddType 'text/html; charset=UTF-8' html

Changing the occasional file

Let's now assume that you want to serve only one file as UTF-8 in a large directory where all the other older files are correctly served as ISO-8859-1. The file you want to serve as UTF-8 is called example.html. Open the .htaccess file in a plain text editor and type the following:

<Files "example.html">
AddCharset UTF-8 .html
</Files>

What we did here was wrap the directive discussed in the previous section in some markup that identifies the specific file we are concerned with. If you have the need, there is also a slightly different syntax that allows you to specify a number of file names using a regular expression.
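A sketch of that regular-expression form might use the FilesMatch container. The two file names here (example.html and news.html) are assumptions chosen purely for illustration:

```apache
# Serve only example.html and news.html as UTF-8;
# other .html files in the directory keep the server default.
<FilesMatch "^(example|news)\.html$">
    AddCharset UTF-8 .html
</FilesMatch>
```

The pattern is anchored with ^ and $ so that it matches only those two exact file names rather than any name that merely contains them.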

Note: It is also possible to achieve the same result using the AddType directive shown above, or, in this case, the ForceType directive, although these declare both the character encoding and the MIME type at the same time.

<Files "example.html">
ForceType 'text/html; charset=UTF-8'
</Files>

Note: Any files with the same name in a subdirectory of the current location will also be served as UTF-8, unless you create a counter directive in the relevant directory.
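Such a counter directive could look like the following sketch, placed in the .htaccess file of the subdirectory concerned. It assumes that the same-named file there is encoded as ISO-8859-1 and that the parent rule was written with AddCharset rather than ForceType:

```apache
# .htaccess in the subdirectory: undo the inherited rule for this one file name.
<Files "example.html">
    AddCharset ISO-8859-1 .html
</Files>
```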

More complex scenarios

When two extension rules apply to the same document, the order of the extensions is important. Thus, given the following directives:

AddCharset UTF-8 .utf8
AddCharset windows-1252 .html

the file 'example.utf8.html' will be served as windows-1252 and 'example.html.utf8' as UTF-8, because the rightmost matching extension takes precedence.

November 27th, 2006

Comments Welcome

  • Binny V A

    Will it work for .php? Will the output - after the PHP file is parsed be set as UTF 8?

  • Brent Lagerman

    great article, but it didn't work for me using dreamhost for some reason. I'm still getting iso pages instead of utf after taking out my meta tag. Any reasons this could be happening?

    brent

  • AskApache

    @ Brent, Binny V A

    No, this will not work for PHP, only for static files. The reason is that this simply tells Apache to handle these files using their own handler, while PHP files have a separate handler: the PHP interpreter or CGI.

    So to get the charset correct for PHP files you need to modify your php.ini like so:

    default_mimetype = "text/html"
    default_charset = "UTF-8"
  • AskApache

    @ Brent

    Another possibility is a small one, but maybe it's the problem. By default, Apache serves ErrorDocuments using ISO-8859-1, and the same is true of Apache-generated directory index pages. Depending on the version of Apache you are using, you can change this.

  • Brent Lagerman

    Thanks so much for the followup

    The problem is that I can't get to php.ini since I'm on a shared host (dreamhost) they run php as cgi.

    I put these lines in the beginning of all the files and it seems to have worked,

    header('Content-type: text/html; charset=UTF-8');

    but it'd be much nicer to use .htaccess or just have it declared once somewhere instead of on every file.

    brent

  • AskApache

    @ Brent

    How in the world do you survive without a custom php.ini? I run dreamhost as well, and here's how to get running in 3 minutes. First, log in via ssh, cd to your website's document root, and then run these commands.

    1. mkdir -p cgi-bin
    2. cp -rp /dh/cgi-system/php5.cgi .
    3. cp -rp /etc/php5/cgi/php.ini .

    Now add this to your root .htaccess

    AddHandler php-cgi .php .htm
    Action php-cgi /cgi-bin/php5.cgi

    Ok? Good now you can do everything I talk about.

  • brent

    wow this is huge, thanks!! I didn't think they allowed me to have my own custom php.ini on dreamhost, -- it's not very well publicized

    one question, you mentioned making the cgi-bin directory in my website's document root with this command:

    mkdir -p cgi-bin

    I've already made a cgi-bin directory one directory before my website directory, this is where I have the python script I run to make my google sitemap. Is this where I should be putting the files, or does it go into my website root?

    I remember hearing somewhere that putting certain files in a webspace would be a security risk.

    brent

  • Ram

    Thanks, this was very helpful. The CPAN module Sendmail.pm uses 'text/plain; charset="iso-8859-1"', so overriding that helps our emails with Chinese characters in them be read properly.

  • lavinya

    Hello all. Thanks for this article.

    my php.ini:

    default_mimetype = "text/html"
    default_charset = "utf-8"

    my apache httpd.conf:

    AddDefaultCharset UTF-8

    my page encoding: utf-8 without bom.

    after i check my page server headers result:

    content-type:   text/html; charset=UTF-8, text/html; charset=UTF-8

    Why have two values ???


    after disabled this line to:

    #AddDefaultCharset UTF-8 and
    #default_charset = "utf-8"

    restart my apache:

    refresh my page... wrong character encodings on this my page.... please help me.. (sorry my english not enough)

  • dharmendra

    is there a way to conditionally enable htaccess authentication when browser (http accept language) is like ^zh (Chinese)? something like this -

    SetEnvIf HTTP_ACCEPT_LANGUAGE ^zh.* I_AM_CHINESE
     
    Order allow,deny
    Allow from all
    Deny from I_AM_CHINESE
    AuthName "Authorized Users Only."
    AuthType Basic
    AuthUserFile .htpasswd
    require valid user
    satisfy any

    Any help is appreciated :)

  • Brian Stegner

    I love this site. But i'm having a very rough year, and my server people, Racksopace/Mosso, have their Apache configured by a Windows IIS guy, from the looks of it,. Sending text/html out, disregarding 100% validated XHTML with the only appropriate content-type: application/xhtml+xml and UTF-8, so we end up with ISO-8859-1 and text/html. If i wanted to deal html, I'd write html 4.01 and let their 1994-style defaults have their way.

    But I associated application/xhtml+xml with .html files, and these guys are acting like 'content-negotiation'. is terrorism or something. This site's the best, of all the ones I've seen, but I am not going to redo web sites as PHP, just to feed Explorer some nice 1994 quirks mode bs. So, I am still looking for an htaccess or mod_rewrite idea that can simply deal some nice html to Windows, and let real deal pass through to everybody else.

    I'm about ready to just put a portal up with a click here or click there 'choice' for Win IE here, everybody else over there. if the guys back at CECOM even knew I was thinking of this idea, I'd be mud. Why are all these other guys writing xhtml and then stripping it of any possible benefits to go out as html? That seems irrational. I heard that with xhtml 1.1 that text/xml won't even validate, so what are these xhtml guys planning on next? But the site is terrific, I wish I understood the context, a lot more, of some of these principles and methods.

  • משקפיים

    what about hebrew font?

  • Yosef Boudra

    Hi

    I'm looking for a solution -
    I have an Apache server (shared) and the default charset is utf-8, but when I write things in French the chars are showed as little cubes.

    So when I put in my meta tags I get weird chars showing up because I'm not using UTF.

    So either I'm using UTF and get french cubes or ISO-8859 and get cubes but in both ways i get errors in my text... Please help me

  • Iverson

    thanks bro

  • bikini

    thank's for the help!

  • Linards

    Hello!
    I have massive problem with these bloody doctypes and charsets. The validator keeps putting out error messages, the code seems to be ok, i tried to fix it for an hour. Can someone tell me what is the problem behind this?
    thanks.

  • Biz Doktor

    Hi - This is very usefull....

    My problem is my blog in /blog likes UTF-8, so my .htaccess has

    AddDefaultCharset UTF-8

    But my shop in /shop does NOT like UTF-8.
    So all my Danish letters look wrong! More like Yin Tang letters :(

    Is there a way to avoid UTF-8 in /shop?

    In advance - Thanks

  • Bonatoc

    Thanks a lot, it helped me setting a UTF-8 charset for a site subfolder only.

  • Jenifer

    Tnx, good article work great for me

  • malkavian

    Thanks, that little addon to the htaccess-file gave me anotgher point of google pagespeed :)

  • R

    I have only one problem with UTF8. BOM. I encoded files in UTF with BOM and some characters where add in source code and even before Doctype. So I chose using UTF without BOM.
    Info for everyone: You can convert files php html etc in Notepad Plus Plus using encoding tab.
    Also you need have database with tables in the same encoding.
    And when using php you need mysql_set_charset to UTF8.


Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.