« Get the Most from Search Engines | FastCGI on DreamHost »
How do I use .htaccess directives on an Apache server to serve files with a specific encoding?
You can try these, one at a time :) The examples use FilesMatch and Files in .htaccess.
AddDefaultCharset UTF-8
# ForceType also sets the MIME type, so match only HTML files here (not .css or .js)
<FilesMatch "\.(htm|html)$">
ForceType 'text/html; charset=UTF-8'
</FilesMatch>
<FilesMatch "\.(htm|html|css|js)$">
AddDefaultCharset UTF-8
</FilesMatch>
AddCharset UTF-8 .html
AddType 'text/html; charset=UTF-8' html
The method I personally use on all my sites is to use the AddDefaultCharset directive in the web root .htaccess file:
AddDefaultCharset UTF-8
This will add the following header to the server output of your text/html pages:
Content-Type: text/html; charset=UTF-8
NOTE: The following meta tag is commonly used to do the same thing, so if you use this .htaccess method you no longer need to include it. That means less code, and that's always a good thing.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
It is important to ensure that any information about character encoding sent by the server is correct, since information in the HTTP header overrides information in the document itself.
Many Apache servers are configured to send files using the ISO-8859-1 (Latin-1) encoding. In the examples in this FAQ, we’ll assume that you want to serve your file or files using a different encoding than that specified in the default configuration. (For advice on choosing an encoding see the tutorial Character sets & encodings in XHTML, HTML and CSS.)
The following shows an example of an HTTP header that accompanies a file sent to a user agent. In this case the character encoding information is included in the Content-Type header on the second line from the bottom.
Example:
HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: CSS2-REC.en.html
Cache-Control: max-age=21600
Expires: Wed, 05 Nov 2003 16:46:04 GMT
Last-Modified: Tue, 12 May 1998 22:18:49 GMT
ETag: "3558cac9;36f99e2b"
Accept-Ranges: bytes
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=utf-8
Content-Language: en
In the example the Content-Type header expresses both the MIME type of the file and the character encoding. The MIME type describes the format of the file being served. HTML files are typically served as text/html. The character encoding (or ‘charset’) of this file is UTF-8.
To learn how to view the HTTP header for a file see the article Checking HTTP Headers.
Files on an Apache server may be served with a default character encoding declaration in the HTTP header that conflicts with the actual encoding of the file. The character encoding sent by the server may be the out-of-the-box default, a default set by the system administrator, or a result of implementing various Apache directives. In other cases no character encoding information is sent by the server when it is actually desired.
If the server is set up to allow users or administrators to change information in .htaccess files, these can provide a way to override default settings. This FAQ shows you how.
There are a couple of different scenarios to bear in mind. In the first instance, you may want to change the default for all the files in a directory with the same extension. Alternatively, you may want to change the default for a single file or small number of files. We will explore these in turn.
In our examples we will assume that the default server configuration serves files as ISO-8859-1, but that you want to serve your file or files using UTF-8 (a very sensible strategy!).
This article is written for content authors, rather than system administrators. Setting the server’s default encoding is beyond the scope of this article.
This advice is only relevant if you are happy to declare the character encoding of your document via the HTTP header. In some cases you may not want that.
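If you would rather rely on the declaration inside the document, one possible approach is to stop the server from adding a charset at all. The sketch below assumes the charset currently comes from an AddDefaultCharset setting or from an AddCharset mapping on .html; if it is set some other way (for example by a script), these lines will not remove it.

```apache
# Do not append a default charset to text/html or text/plain responses
AddDefaultCharset Off
# Drop any charset association inherited for the .html extension
RemoveCharset .html
```

With both lines in place the Content-Type header should carry only the MIME type, leaving the encoding to the document itself.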
Note that this FAQ also assumes that your server is set up to let you use .htaccess files. It is also assumed that it is not appropriate to simply change the default configuration of the server. If you are not sure, contact your server administrator.
You should also be aware of the conventions in use on your server for association of character encoding information with extensions. In some cases the server may be set up in the expectation that character encodings are indicated by encoding-specific extensions, e.g. example.html.utf8, where it is the .utf8 that needs to be associated with a character encoding, rather than the .html (which may be associated with the file type).
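As a rough sketch of the convention described above, the following lines map the file type and the encoding to separate extensions. The .latin1 extension is only an illustrative assumption; check which extensions your own server actually uses.

```apache
# The .html extension carries the file type only
AddType text/html .html
# Encoding-specific extensions carry the character encoding
AddCharset UTF-8 .utf8
AddCharset ISO-8859-1 .latin1
```

Under these assumptions, example.html.utf8 would be sent as text/html with charset=UTF-8, and example.html.latin1 as text/html with charset=ISO-8859-1.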
If these approaches fail, you should consult the Apache manuals (see attached links) or your server administrator.
Use the AddCharset directive to associate the character encoding with all files having a particular extension in the current directory and its subdirectories. For example, to serve all files with the extension .html as UTF-8, open the .htaccess file in a plain text editor and type the following line:
AddCharset UTF-8 .html
The extension can be specified with or without a leading dot. You can add multiple extensions to the same line. This will still work if you have file names such as example.en.html or example.html.en.
The example will cause all files with the extension .html to be served as UTF-8. The HTTP Content-Type header will contain a line that ends with the ‘charset’ information as shown in the example that follows.
Content-Type: text/html; charset=UTF-8
Note: All files with this extension in all subdirectories of the current location will also be served as UTF-8. If, for some reason, you need to serve the odd file with a different encoding you will need to override this using additional directives.
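A minimal sketch of such an override, which also shows several extensions sharing one AddCharset line, might look like this. The file name legacy.html is hypothetical and stands for whichever file still needs the old encoding.

```apache
# Serve .html and .htm files in this directory tree as UTF-8
AddCharset UTF-8 .html .htm

# Exception: one older file that is still encoded as ISO-8859-1
<Files "legacy.html">
    AddCharset ISO-8859-1 .html
</Files>
```

The Files section applies only to the named file, so every other .html or .htm file keeps the UTF-8 mapping.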
Note: You can associate the character encoding with any extension attached to your file. For example, suppose you do language negotiation and you have pages in two languages that follow the model example.en.html and example.ja.html. Let’s also suppose that you are happy to serve English pages using your server’s ISO-8859-1 default, but want to serve Japanese files in UTF-8. To do this, you can associate the character encoding with the language extension, as follows:
AddCharset UTF-8 .ja
Take note, however, that, if you can, it might be a better solution to change the server default to UTF-8, or serve all files in new directories as UTF-8.
Note: It is also possible to achieve the same result using the AddType directive, although this declares both the character encoding and the MIME type at the same time. The decision as to which is most appropriate will depend in part on how you are using extensions for content negotiation. If you are using different extensions to express the document type and the character encoding, this is less likely to be appropriate.
AddType 'text/html; charset=UTF-8' html
Let’s now assume that you want to serve only one file as UTF-8 in a large directory where all the other older files are correctly served as ISO-8859-1. The file you want to serve as UTF-8 is called example.html. Open the .htaccess file in a plain text editor and type the following:
<Files "example.html">
AddCharset UTF-8 .html
</Files>
What we did here was wrap the directive discussed in the previous section in some markup that identifies the specific file we are concerned with. If you have the need, there is also a slightly different syntax that allows you to specify a number of file names using a regular expression.
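The regular-expression form mentioned above is the FilesMatch section. As a hedged example, assuming two hypothetical files named example.html and news.html are the only ones that should be served as UTF-8:

```apache
# Match exactly example.html or news.html (the pattern is tested against the file name)
<FilesMatch "^(example|news)\.html$">
    AddCharset UTF-8 .html
</FilesMatch>
```

Adjust the pattern to your own file names; a pattern that is too loose will silently change the encoding declared for other files.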
Note: It is also possible to achieve the same result using the AddType directive shown above, or, in this case, the ForceType directive, although these declare both the character encoding and the MIME type at the same time.
<Files "example.html">
ForceType 'text/html; charset=UTF-8'
</Files>
Note: Any files with the same name in a subdirectory of the current location will also be served as UTF-8, unless you create a counter directive in the relevant directory.
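One way to write such a counter directive, assuming the subdirectory has its own .htaccess file and its copy of example.html really is encoded as ISO-8859-1, is sketched below.

```apache
# Placed in the subdirectory's own .htaccess file
<Files "example.html">
    AddCharset ISO-8859-1 .html
</Files>
```

Because a subdirectory's .htaccess is applied after its parent's, this mapping should take precedence for that directory; it is worth confirming the result by checking the HTTP header.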
When two extension rules apply to the same document the order of extensions is important. Thus, in the following example
AddCharset UTF-8 .utf8
AddCharset windows-1252 .html
the file ‘example.utf8.html’ will be served as “windows-1252” and ‘example.html.utf8’ as UTF-8, because the rightmost extension that maps to a character encoding wins.
Hi
I’m looking for a solution –
I have an Apache server (shared) and the default charset is utf-8, but when I write things in French the chars are shown as little cubes.
So when I put in my meta tags I get weird chars showing up because I’m not using UTF.
So either I’m using UTF and get French cubes or ISO-8859 and get cubes, but either way I get errors in my text… Please help me
what about hebrew font?
I love this site. But I’m having a very rough year, and my server people, Rackspace/Mosso, have their Apache configured by a Windows IIS guy, from the looks of it. Sending text/html out, disregarding 100% validated XHTML with the only appropriate content-type: application/xhtml+xml and UTF-8, so we end up with ISO-8859-1 and text/html. If I wanted to deal html, I’d write html 4.01 and let their 1994-style defaults have their way.
But I associated application/xhtml+xml with .html files, and these guys are acting like ‘content-negotiation’ is terrorism or something. This site’s the best, of all the ones I’ve seen, but I am not going to redo web sites as PHP, just to feed Explorer some nice 1994 quirks mode bs. So, I am still looking for an htaccess or mod_rewrite idea that can simply deal some nice html to Windows, and let the real deal pass through to everybody else.
I’m about ready to just put a portal up with a click here or click there ‘choice’ for Win IE here, everybody else over there. if the guys back at CECOM even knew I was thinking of this idea, I’d be mud. Why are all these other guys writing xhtml and then stripping it of any possible benefits to go out as html? That seems irrational. I heard that with xhtml 1.1 that text/xml won’t even validate, so what are these xhtml guys planning on next? But the site is terrific, I wish I understood the context, a lot more, of some of these principles and methods.
is there a way to conditionally enable htaccess authentication when browser (http accept language) is like ^zh (Chinese)? something like this -
SetEnvIf HTTP_ACCEPT_LANGUAGE ^zh.* I_AM_CHINESE
Order allow,deny
Allow from all
Deny from I_AM_CHINESE
AuthName "Authorized Users Only."
AuthType Basic
AuthUserFile .htpasswd
require valid user
satisfy any
Any help is appreciated :)
Hello all. Thanks for this article.
my php.ini:
default_mimetype = "text/html"
default_charset = "utf-8"
my apache httpd.conf:
AddDefaultCharset UTF-8
my page encoding: utf-8 without bom.
after i check my page server headers result:
content-type: text/html; charset=UTF-8, text/html; charset=UTF-8
Why have two values ???
after disabled this line to:
#AddDefaultCharset UTF-8 and #default_charset = "utf-8"
restart my apache:
refresh my page… wrong character encodings on this my page…. please help me.. (sorry my english not enough)
Thanks, this was very helpful. The CPAN module Sendmail.pm uses 'text/plain; charset="iso-8859-1"', so overriding that helps our emails with Chinese characters in them be read properly.
wow this is huge, thanks!! I didn’t think they allowed me to have my own custom php.ini on dreamhost, — it’s not very well publicized
one question, you mentioned making the cgi-bin directory in my website’s document root with this command:
mkdir -p cgi-bin
I’ve already made a cgi-bin directory one directory before my website directory, this is where I have the python script I run to make my google sitemap. Is this where I should be putting the files, or does it go into my website root?
I remember hearing somewhere that putting certain files in a webspace would be a security risk.
brent
Thanks so much for the followup
The problem is that I can’t get to php.ini since I’m on a shared host (dreamhost) they run php as cgi.
I put these lines in the beginning of all the files and it seems to have worked,
header('Content-type: text/html; charset=UTF-8');
but it’d be much nicer to use .htaccess or just have it declared once somewhere instead of on every file.
brent
great article, but it didn’t work for me using dreamhost for some reason, I’m still getting iso pages instead of utf after taking out my meta tag. Any reasons this could be happening?
brent
Will it work for .php? Will the output – after the PHP file is parsed be set as UTF 8?
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.
This site is not supported or endorsed by The Apache Software Foundation (ASF). All software and documentation produced by The ASF is licensed. "Apache" is a trademark of The ASF. HTTPD based on NCSA HTTPd
@ Brent
Another possibility is a small one, but maybe it’s the problem. Apache serves ErrorDocuments by default using ISO-8859-1, and the same can be said for Apache-generated directory index pages. Depending on the version of Apache you are using, you can change this.