Character Sets Negotiation via Apache .var files

Apache charset negotiation via .var files

Choosing the right document charset according to the client's Accept-Charset header field is a pressing problem for countries with several character encodings in active use (such as Russia or Japan).

Apache .var files are a good place to handle such cases; Apache has understood charset negotiation in them since v1.2b1.

Here is an example: the a.var file (try it). If you use a proxy, reload the result to be sure; for simplicity, I don't use any anti-caching directives here. It works with the following .htaccess settings on this server:

AddType "text/html; charset=koi8-r" .html8
AddType "text/html; charset=windows-1251" .htmlw

If your browser generates a proper Accept-Charset field, this example automatically chooses the document in the correct character set. When your browser accepts both KOI8-R and CP1251, the KOI8-R document will be chosen with 10% comprehension (the qc=0.1 value in the map below).

URI: a; vary="type"

URI: a.html8
Content-Type: text/html; charset=koi8-r; qc=0.1

URI: a.htmlw
Content-Type: text/html; charset=windows-1251

It is convenient to store documents in a single character set and convert them on the fly. Sometimes conversion modules can be loaded directly into HTTPD, but this is very implementation-dependent and may require rebuilding the server, so CGI scripts look like a more general solution here. In the previous example, instead of two files in different character sets there can be one CGI script that takes the character set as an argument and converts a single file accordingly. For example, you can use the trans Character Encoding Converter Generator Package to convert between various Russian character sets via Unicode (trans has a slightly incorrect koi8-r table; download the right one instead).
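
The single-file approach can be sketched in Python as follows. This is only an illustration under stated assumptions, not the trans package and not a complete CGI script: it assumes the master copy is stored in KOI8-R, uses Python's built-in codecs (which go through Unicode internally), and the names convert, cgi_response and CODECS are hypothetical.

```python
# Assumption: the single master document is stored in KOI8-R.
SOURCE_CHARSET = "koi8-r"

# Charsets the script agrees to serve, mapped to Python codec names.
CODECS = {"koi8-r": "koi8_r", "windows-1251": "cp1251"}


def convert(data, target, source=SOURCE_CHARSET):
    """Re-encode the bytes in data from the source charset to the target."""
    source, target = source.lower(), target.lower()
    if source not in CODECS or target not in CODECS:
        raise ValueError("unsupported charset")
    text = data.decode(CODECS[source])  # bytes -> Unicode text
    return text.encode(CODECS[target], errors="replace")


def cgi_response(data, target):
    """Build a CGI response: Content-Type header, blank line, converted body."""
    body = convert(data, target)
    header = "Content-Type: text/html; charset=%s\r\n\r\n" % target.lower()
    return header.encode("ascii") + body
```

A real script would read the requested charset from its argument, read the stored file as bytes, and write the result of cgi_response to standard output in binary mode. Restricting targets to a fixed table keeps arbitrary codec names from being passed in by the client.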

WARNING: This method requires correct Accept-... fields coming from the browsers.