Posts

Showing posts from April, 2013

Sending picture file in an email

It is easy to attach a picture in an email, from user's perspective, of course. But, what is actually sent over the Internet to the recipient server?

There are 2 types how a picture can be transmitted over an email, as inline picture, or as an attachement. Before they are transfer over, they need to go through some conversion, that is binary-to-text encoding called Base64.

Say, the binaries of the image starts with FF D8 FF, and the translation process can be as below :

Original binariesFF   D8   FFRegroup the binaries111111111101100011111111Associate decimal62613563From the Base64 index table/9j/

Inline image

To send as an inline, the email message shall have the following attributes.

Content-Type: image/gif; name="file name"
Content-Transfer-Encoding: base64
X-Attachment-Id: image-id
Content-ID: <image-id>Followed by the image binaries in Base64. The attributes and the image binaries is placed within the content type boundary.

In the email content, to refer to the imag…

Character set of a html document

To continue with the previous post on character encoding, my actual topic of interest is how the browser detect what is the character set is used before rendering the page when it is not specified in html header.

Below is my "test code", and I saved them into 4 different file formats (the additional one is Unicode).

<html><head> </head> <body> 一 </body> </html>
Note, there's no character set or doctype being specified in the html header. It is rendered in quirks mode.

I tested in 2 different browsers, IE 9 and FireFox 20, surprisingly, I got 2 different results. I am using document.charset and document.characterSet for IE and FF respectively to check for the character encoding of the document.

File formatIE9FF20ANSIbig5windows-1252UnicodeunicodeUTF-16Unicode (big endian)unicodeFEFFUTF-16UTF-8utf-8UTF-8
FF20 is giving Mojibake characters, it only show correctly when the encoding of the browser is changed to Chinese Traditional (Big5). T…

Character Encoding

My study object is the Chinese character "一".

I am using Notepad in Window 7 to save in different formats, namely ANSI, Unicode and UTF-8. My system locale is Chinese (Traditional, Taiwan). I am using Traditional Chinese Google IME as input method.

In simple words, encoding is to represent "something" in some "notation", decoding is to return the "something" from some "notation". Notation that I am refering here, is the binary representation.

I downloaded Hex Edit to observe the differences of the "encoding" or format (used in Notepad).

ANSI

ANSI, which always refer to ASCII character set plus an extended character set. See http://ascii-table.com/ansi-table.php. However, this character set contains only 256 characters, it does not and unable to cover Chinese character. Sometimes, when saving Chinese text in notepad, it will prompt to save in other format, but sometimes, however, I am able to save it in ANSI format. As I search…