Charset Property

Description

Gets or sets the preferred character set label used for converting non-ASCII characters in header field strings to and from a byte-encoded representation.

Property type

Read-write property

Syntax

Visual Basic
Public Property Charset As String

Remarks

When communicating with international mail clients, you may need to be aware of Unicode character encoding and decoding issues. At the highest level, encoding and decoding can be viewed as taking a visual string representation displayable in a UI, translating that string into multi-byte values for transmission between systems, then translating the multi-byte values back into the original visual string (Unicode) representation. For this round trip to succeed, the correct preferred character set must be specified so that the correct codepage is used.

Header fields, by RFC specification, are required to use the ASCII character set. This creates a challenge when working with multi-national character sets, which need characters outside the 0-127 range. To accommodate these non-ASCII requirements, a system of encoding and decoding was created that converts non-ASCII characters into a string of ASCII characters. This technique uses a character set designation that maps to a codepage. A codepage is a mapping between Unicode characters and multi-byte sequences, and each codepage defines its own mapping.
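The idea that each codepage defines its own mapping can be illustrated with a short sketch. It uses Python's built-in codecs purely as a neutral illustration (it is not this component's API), and the sample strings are arbitrary choices:

```python
# The same Unicode text becomes different bytes under different charsets.
text = "é"
latin1_bytes = text.encode("iso-8859-1")   # one byte
utf8_bytes = text.encode("utf-8")          # two bytes

# Decoding with the wrong charset does not fail here; it silently
# produces the wrong characters.
wrong = utf8_bytes.decode("iso-8859-1")

# Japanese text under two Japanese charsets.
japanese = "日本語"
jis_bytes = japanese.encode("iso-2022-jp")   # 7-bit, uses escape sequences
sjis_bytes = japanese.encode("shift_jis")    # 8-bit

# Decoding with the charset used for encoding restores the original text.
restored = jis_bytes.decode("iso-2022-jp")

print(latin1_bytes, utf8_bytes, repr(wrong))
print(jis_bytes, sjis_bytes, restored)
```

Decoding is only reliable when the receiver knows which charset the sender used, which is why the charset label travels with the encoded header.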

When creating a message (encoding), this property defaults to the current system codepage and its preferred character set label. Set this property to override that default and specify a different preferred character set. For example, suppose your operating system is using CodePage 1252, which is Western European (ISO), so the preferred character set defaults to "iso-8859-1". If you wish to use the Japanese preferred character set "iso-2022-jp", which maps to CodePage 932, set this property to "iso-2022-jp".
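As a rough illustration of what choosing a charset does to an outgoing header, the sketch below encodes a Japanese subject with "iso-2022-jp" using Python's standard email.header module. This is not the component's own API, and the subject text is a made-up sample:

```python
from email.header import Header, decode_header, make_header

# Hypothetical subject text containing non-ASCII characters.
subject = "日本語"

# Encode the header value with an explicitly chosen charset
# instead of relying on a system default.
encoded = Header(subject, charset="iso-2022-jp").encode()

# The result is pure ASCII and carries the charset label inside it,
# so a receiver can reverse the process.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)
```

The encoded value contains only ASCII characters, which is what the header rules require.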

When reading a message (decoding), if the header contains non-ASCII data that is properly encoded into an ASCII representation using the RFC 2047 charset/encoding/content syntax, the preferred character set is read from the header data. If the header erroneously contains non-encoded non-ASCII characters, a conversion from multi-byte to Unicode strings must still be made, so the Message object first checks whether the Charset property has been set. If it has not been set, the Content-Type header is checked for a character set label. If no label is found there, the message is scanned for other encoded words (words that follow the encoding rule specified below) that contain character set information. If character set information is still not found, the system default value is used. Please note that raw non-ASCII data in a header field is not allowed by RFC standards; this fallback exists only for compatibility with non-compliant mail clients.
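The lookup order described above can be sketched as a small function. This is only an illustration in Python of the order as documented, not the component's implementation; the function names, parameters, and the "iso-8859-1" default are all hypothetical:

```python
import re
from typing import Iterable, Optional

# Matches one encoded word of the form =?charset?encoding?content?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def charset_from_encoded_word(value: str) -> Optional[str]:
    """Return the charset label of the first encoded word, if any."""
    match = ENCODED_WORD.search(value)
    return match.group(1).lower() if match else None


def resolve_charset(
    header_value: str,
    charset_property: Optional[str] = None,
    content_type_charset: Optional[str] = None,
    other_headers: Iterable[str] = (),
    system_default: str = "iso-8859-1",  # assumed default, for illustration
) -> str:
    # 1. A properly encoded header names its own charset.
    found = charset_from_encoded_word(header_value)
    if found:
        return found
    # 2. Otherwise use the Charset property, if it was set.
    if charset_property:
        return charset_property
    # 3. Then the charset label from the Content-Type header.
    if content_type_charset:
        return content_type_charset
    # 4. Then any other encoded word found elsewhere in the message.
    for other in other_headers:
        found = charset_from_encoded_word(other)
        if found:
            return found
    # 5. Finally fall back to the system default.
    return system_default


print(resolve_charset("raw text", content_type_charset="utf-8"))
```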

In order to properly decode the encoded data back to its original form, you must know the character set that was used to encode the data. If a header field has been encoded, the content of the header will use a special syntax which looks like the following:

=?[charset]?[encoding]?[header content]?=

The =? and ?= delimiters at the beginning and end of the encoded text indicate that this header was encoded. The value of [charset] indicates the preferred character set to use to correctly interpret this header. The character set maps to an ANSI code page, which defines the multi-byte representation for a given character; for example, a preferred character set for Japanese CodePage 932 is "iso-2022-jp". The value of [encoding] indicates the encoding type ("B" for Base64, "Q" for Quoted-Printable); both represent data that could be 8-bit using only 7-bit characters. The value of [header content] contains the encoded content. An example of a Base64 encoded iso-2022-jp header line is below.

Subject: =?iso-2022-jp?B?UmU6IA0KUmVsZWFzZSBkYXRlIHV0ayB2ZXIgMi4wID8=?=
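To make the syntax concrete, the following sketch splits the encoded value from the example line into its three parts and decodes it. It is written in Python with standard library modules only and is not the component's API; the helper name parse_encoded_word is hypothetical:

```python
import base64
import quopri
import re


def parse_encoded_word(word: str):
    """Split =?charset?encoding?content?= and decode the content."""
    match = re.fullmatch(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=", word)
    if match is None:
        raise ValueError("not an encoded word: %r" % word)
    charset, encoding, content = match.groups()
    if encoding.upper() == "B":
        # "B": the content is Base64.
        raw = base64.b64decode(content)
    else:
        # "Q": Quoted-Printable style, where "_" stands for a space.
        raw = quopri.decodestring(content.encode("ascii"), header=True)
    # The charset label says how to turn the raw bytes back into text.
    return charset, encoding.upper(), raw, raw.decode(charset)


example = "=?iso-2022-jp?B?UmU6IA0KUmVsZWFzZSBkYXRlIHV0ayB2ZXIgMi4wID8=?="
charset, encoding, raw, text = parse_encoded_word(example)
print(charset, encoding, repr(text))
```

In this particular example the decoded bytes happen to be plain ASCII, so the charset label does not change the result, but the label is still required by the syntax.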

When reading messages, this property contains the character set string that was used to encode the headers (if one was used).

This property applies to header fields only. To get/set the character set used to encode message body or body parts, see Message.Charset or Part.Charset. For more information on message encoding see Message Encoding.