
Showing posts from April, 2013

Sending picture file in an email

From the user's perspective, attaching a picture to an email is easy. But what is actually sent over the Internet to the recipient's server? A picture can be transmitted in an email in two ways: as an inline picture or as an attachment. Before the picture is transferred, it has to go through a binary-to-text encoding called Base64. Say the binary data of the image starts with FF D8 FF; the translation then works as follows:

Original bytes (hex)          FF D8 FF
Regrouped into 6-bit groups   111111 111101 100011 111111
Decimal value of each group   63 61 35 63
From the Base64 index table   / 9 j /

Inline image

To send the picture inline, the email message needs the following attributes.

Content-Type: image/gif; name=" file name "
Content-Transfer-Encoding: base64
X-Attachment-Id: image-id
Content-ID: < image-id >

These are followed by the image binaries in Base64. The attributes and the image binaries are placed withi
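The regrouping steps for FF D8 FF can be reproduced with a short Python sketch. The helper name `base64_steps` is made up for illustration, and it assumes the input length is a multiple of three bytes so that no `=` padding is needed. The result is checked against the standard library's `base64.b64encode`.

```python
import base64

# The standard Base64 index table: value 0 is 'A', value 63 is '/'.
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def base64_steps(data: bytes):
    """Return (6-bit groups, decimal values, Base64 text) for data.

    Hypothetical helper for illustration only. It assumes len(data) is a
    multiple of 3, so padding is not handled.
    """
    # Write every byte as 8 bits and join them into one long bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Cut the bit string into groups of 6 bits.
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Each group is an index into the Base64 table.
    values = [int(group, 2) for group in groups]
    text = "".join(BASE64_ALPHABET[value] for value in values)
    return groups, values, text


sample = bytes([0xFF, 0xD8, 0xFF])
groups, values, text = base64_steps(sample)
print(groups)  # ['111111', '111101', '100011', '111111']
print(values)  # [63, 61, 35, 63]
print(text)    # /9j/

# Cross-check against the standard library encoder.
assert text == base64.b64encode(sample).decode("ascii")
```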

Character set of a html document

To continue with the previous post on character encoding, my actual topic of interest is how the browser detects which character set is used before rendering the page when none is specified in the html header. Below is my "test code", which I saved in 4 different file formats (the additional one is Unicode big endian).

<html>
<head>
</head>
<body>
一
</body>
</html>

Note that no character set or doctype is specified in the html header, so the page is rendered in quirks mode. I tested in 2 different browsers, IE 9 and Firefox 20, and, surprisingly, got 2 different results. I used document.charset in IE and document.characterSet in FF to check the character encoding of the document.

File format            IE9           FF20
ANSI                   big5          windows-1252
Unicode                unicode       UTF-16
Unicode (big endian)   unicodeFEFF   UTF-16
UTF-8                  utf-8         UTF-8

FF20 is giving Mojibake characters, i
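One plausible reading of the table is that both browsers recognise the byte order mark (BOM) at the start of the Unicode and UTF-8 files, and differ only in how they guess when there is no BOM. The Python sketch below illustrates that idea; it is not either browser's actual detection algorithm. The helper `sniff_bom` is hypothetical, and the sketch assumes that Notepad writes a BOM for its Unicode and UTF-8 formats and that "ANSI" on a Traditional Chinese system means Big5.

```python
import codecs


def sniff_bom(data: bytes):
    """Guess an encoding from a leading byte order mark, or return None.

    Hypothetical helper for illustration; real browsers do much more.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    return None


html = "<html><head></head><body>一</body></html>"

# Assumed byte layout of the four files saved from Notepad.
files = {
    "ANSI": html.encode("big5"),
    "Unicode": codecs.BOM_UTF16_LE + html.encode("utf-16-le"),
    "Unicode (big endian)": codecs.BOM_UTF16_BE + html.encode("utf-16-be"),
    "UTF-8": codecs.BOM_UTF8 + html.encode("utf-8"),
}

for name, data in files.items():
    print(name, "->", sniff_bom(data))

# Without a BOM the decoder has to guess. Reading the Big5 bytes of the
# character as windows-1252 gives mojibake instead of the original character.
mojibake = "一".encode("big5").decode("cp1252")
print(repr(mojibake))
```

Only the ANSI file comes back as `None`, which matches the one row of the table where the two browsers disagree.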

Character Encoding

My study object is the Chinese character "一". I am using Notepad in Windows 7 to save it in different formats, namely ANSI, Unicode and UTF-8. My system locale is Chinese (Traditional, Taiwan), and I am using the Traditional Chinese Google IME as the input method.

In simple words, encoding is representing "something" in some "notation", and decoding is recovering that "something" from the "notation". The notation I am referring to here is the binary representation. I downloaded Hex Edit to observe the differences between the "encodings", or formats (as Notepad calls them).

ANSI

ANSI usually refers to the ASCII character set plus an extended character set. See http://ascii-table.com/ansi-table.php . However, this character set contains only 256 characters, so it does not, and cannot, cover Chinese characters. Sometimes, when saving Chinese text, Notepad prompts me to save in another format, but at other times I am able to save it in ANSI format.
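The bytes a hex editor would show for "一" in each format can be approximated with Python's codecs. This is a sketch under assumptions, not a dump of the actual files: it assumes Notepad's "ANSI" on a Traditional Chinese system is code page 950 (Big5), and that Notepad prefixes its Unicode and UTF-8 files with a byte order mark. The helper `hex_dump` is made up for illustration.

```python
import codecs

CHAR = "一"  # U+4E00


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{byte:02X}" for byte in data)


# Assumed byte layout of each Notepad format.
formats = {
    "ANSI (cp950)": CHAR.encode("cp950"),
    "Unicode": codecs.BOM_UTF16_LE + CHAR.encode("utf-16-le"),
    "Unicode (big endian)": codecs.BOM_UTF16_BE + CHAR.encode("utf-16-be"),
    "UTF-8": codecs.BOM_UTF8 + CHAR.encode("utf-8"),
}

for name, data in formats.items():
    print(f"{name:22} {hex_dump(data)}")

# A 256-character set such as windows-1252 has no slot for the character,
# so encoding it fails.
try:
    CHAR.encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False
print("fits in windows-1252:", fits_in_cp1252)
```

Under these assumptions, the ANSI save succeeds only because the system code page is a double-byte Chinese one rather than the 256-character table linked above.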