Below is my "test code", which I saved in 4 different file formats (the additional one is Unicode).
<html> <head> </head> <body> 一 </body> </html>
Note that no character set or doctype is specified in the HTML, so the page is rendered in quirks mode.
I tested in 2 different browsers, IE 9 and Firefox 20, and surprisingly got 2 different results. To check the document's character encoding, I used document.charset in IE and document.characterSet in FF.
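To make the difference between the saved files concrete, here is a minimal sketch of the bytes that the test character 一 (U+4E00) turns into under the Unicode encodings. It assumes the editor's "Unicode" option means UTF-16 little endian with a byte order mark and "Unicode (big endian)" means UTF-16 big endian with a byte order mark; the helper names `utf8Bytes`, `utf16Bytes` and `hex` are made up for this illustration.

```javascript
// Hypothetical helpers: show how U+4E00 is stored in each Unicode file format.

// UTF-8 bytes via the standard TextEncoder (always encodes to UTF-8, no BOM).
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// UTF-16 bytes with a leading byte order mark (assumption: the editor writes one).
function utf16Bytes(text, bigEndian) {
  const out = bigEndian ? [0xfe, 0xff] : [0xff, 0xfe];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one 16-bit code unit
    const hi = unit >> 8;
    const lo = unit & 0xff;
    if (bigEndian) out.push(hi, lo);
    else out.push(lo, hi);
  }
  return out;
}

// Format a byte array as space-separated hex.
function hex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

const ch = "\u4E00"; // the character 一 used in the test page
console.log(hex(utf8Bytes(ch)));         // e4 b8 80
console.log(hex(utf16Bytes(ch, true)));  // fe ff 4e 00
console.log(hex(utf16Bytes(ch, false))); // ff fe 00 4e
```

The same character produces three different byte sequences, so a browser that guesses the wrong encoding has no chance of showing it correctly.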
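The check described above can be wrapped in one small function so the same snippet works in both browsers. This is only a sketch: `reportedCharset` is a name invented here, and it takes the document as a parameter so it can also be tried outside a browser.

```javascript
// Hypothetical helper: return the encoding a browser reports for a document.
// `doc` is any object shaped like a DOM Document.
function reportedCharset(doc) {
  // Prefer characterSet (used for FF above), fall back to charset (used for IE above).
  return doc.characterSet || doc.charset || null;
}

// In a browser console this prints the encoding of the current page.
if (typeof document !== "undefined") {
  console.log(reportedCharset(document));
}
```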
| File format | IE9 (document.charset) | FF20 (document.characterSet) |
| --- | --- | --- |
| Unicode (big endian) | unicodeFEFF | UTF-16 |
FF20 shows mojibake; the text only displays correctly when the browser's encoding is changed to Chinese Traditional (Big5). The rest render fine (readable) by default.
So I added a charset declaration in a meta tag, along with the HTML 4.01 strict doctype.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
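| File format | META charset | IE9 (document.charset) | FF20 (document.characterSet) |
| --- | --- | --- | --- |
| Unicode (big endian) | UTF-8 | unicodeFEFF | UTF-16 |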
Except for the ANSI file in FF20, whose reported encoding changed to Big5, the rest kept the same encoding. However, for the Unicode (big endian) file in IE9, the following warning message appears:
HTML1114: Codepage unicodeFEFF from (UNICODE byte order mark) overrides conflicting codepage utf-8 from (META tag)
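The warning says that the byte order mark wins over the META tag. A rough sketch of that precedence, not IE's actual implementation: `sniffBom` and `effectiveEncoding` are hypothetical names, and the encoding labels they return are my own choice rather than the names either browser reports.

```javascript
// Hypothetical helper: identify a byte order mark at the start of a file.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE"; // big endian, the case from the warning above
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null; // no BOM found
}

// Assumed precedence, matching the warning: BOM first, then the META charset.
function effectiveEncoding(bytes, metaCharset) {
  return sniffBom(bytes) || metaCharset || null;
}

// A big endian file that declares utf-8 in its META tag is still treated as UTF-16BE.
console.log(effectiveEncoding([0xfe, 0xff, 0x00, 0x3c], "utf-8")); // UTF-16BE
```

For a file without a byte order mark, the same function falls back to whatever the META tag declares.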
There is no option to change the page encoding to UTF-16 in FF, and there is no unicodeFEFF option in IE. I have no idea (yet?) why the document character set returns those results.
Based on the above results, the recommended format for saving an HTML document is UTF-8, at least when it uses characters outside the US-ASCII character set.