DotNetNewsgroup.com  
web access to complete list of Microsoft.NET newsgroups
start date: Mon, 20 Aug 2007 14:08:31 -0000,    posted on: microsoft.public.dotnet.framework

Thread Index
  1    Shan Plourde
          2    UL-Tomten
          3    Shan Plourde
          4    Shan Plourde
          5    Shan Plourde
          6    UL-Tomten
          7    Shan Plourde
          8    UL-Tomten
          9    UL-Tomten
          10    Shan Plourde
          11    UL-Tomten
          12    UL-Tomten


.NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
Hi there,
I have an e-commerce website that sends automated emails that contain
an automatically generated PDF attachment. It's similar to this email
sample:

------
Email Sample:
Subject: <Name> just completed their assessment
Attachment: <Name> assessment results.PDF
Body: This is a friendly alert that <Name> just completed their
assessment...
------

The system correctly stores <Name> in a SQL Server nvarchar database
column. Some names stored use Latin characters, while others use
Chinese characters. Notably the issue that I've just recently found is
that the automatically generated emails are not showing <Name>
correctly if the person's name happens to have Chinese characters in
it.

Here is the code that is used to create and send the emails:

------
            SmtpClient client = new SmtpClient();
            MailAddress from = new MailAddress(this.emailFrom);

            MailAddress to = new MailAddress(this.emailTo,
                String.Format("{0} {1}", this.firstName, this.lastName));

            MailMessage msg = new MailMessage(from, to);

            if (this.attachment != null)
            {
                this.attachment.NameEncoding = System.Text.Encoding.UTF8;
                msg.Attachments.Add(this.attachment);
            }

            msg.SubjectEncoding = System.Text.Encoding.UTF8;

            // The line below sets the subject to something like:
            // "<Name> just completed their assessment"
            msg.Subject = templateManager.ProcessFileTemplate(
                "AssessmentComplete.Subject.vm", context);

            msg.Body = templateManager.ProcessFileTemplate(
                "AssessmentComplete.vm", context);

            msg.BodyEncoding = System.Text.Encoding.UTF8;
            client.Send(msg);
------

I built this a while back assuming that since I was setting all email
encodings to UTF8, the email system would work for various languages.
Unfortunately that's not the case. When an automatically generated
email is sent where the person's name contains Chinese characters, the
subject will typically read:

"??? ?? ??? just completed their assessment"

The email body however correctly displays the Chinese name. The PDF
file attachment name will be named:
"??? ?? ??? assessment results.PDF"

msg.Subject does write to console and shows in the debugger correctly
with the Chinese characters, as does the body and file attachment
name. So the storage of the Chinese characters is correct, as is the
retrieval into .NET String objects.

The error seems to be happening during the transport of the email I
suspect. When I hard-code the subject and attachment encodings to
something such as System.Text.Encoding.GetEncoding("gb2312"), which is
Simplified Chinese, the Chinese characters display correctly.

Why wouldn't UTF8 encoding work though? Also, I'm not able to simply
hardcode an encoding of "gb2312" as the subject and file attachment
encoding - what happens if names are stored in other languages? This
would clearly fail. Maybe there's a way to guess at the encoding of
a .NET string, but I'm not aware of one. Shouldn't UTF8 work in the
first place?

What do people normally do to handle this? Should I create email
subject lines with Unicode escape codes for all characters? If so, is
there an out of the box approach to do this?

Confused!

Thanks for your help,
Shan
Date: Mon, 20 Aug 2007 14:08:31 -0000   Author: Shan Plourde

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 20, 4:08 pm, Shan Plourde  wrote:


> The error seems to be happening during the transport of the email I
> suspect.


Can you first verify that the text is in fact corrupt (by, for
example, sending mail elsewhere) and that it's not just your e-mail
reader that's breaking things?
Date: Mon, 20 Aug 2007 13:41:07 -0700   Author: UL-Tomten

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 20, 4:41 pm, UL-Tomten  wrote:

> On Aug 20, 4:08 pm, Shan Plourde  wrote:
>
> > The error seems to be happening during the transport of the email I
> > suspect.
>
> Can you first verify that the text is in fact corrupt (by, for
> example, sending mail elsewhere) and that it's not just your e-mail
> reader that's breaking things?


Hi UL-Tomten - I verified this issue with the following email clients:
Gmail, Outlook 2007, Outlook 2003. I also tried forwarding emails back
and forth to these clients and the same issue was occurring. Each
client demonstrates the same issue - the Chinese characters display as
question marks, and the rest of the email's Latin characters display
without any issues. And again, it's the email subject and file
attachment name that demonstrate this issue. Chinese characters in the
email body are fine.
Date: Mon, 20 Aug 2007 20:47:35 -0000   Author: Shan Plourde

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
Here are some sample Chinese Characters that can cause this issue to
occur (just random characters that I'm using for testing purposes):
Date: Tue, 21 Aug 2007 00:20:16 -0000   Author: Shan Plourde

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 20, 8:20 pm, Shan Plourde  wrote:

> Here are some sample Chinese Characters that can cause this issue to
> occur (just random characters that I'm using for testing purposes):


The Chinese characters I tried to show just now are not appearing here
on this newsgroup when I post through Google Groups. Anyhow, it doesn't
seem to matter what the Characters are that are used, if they are any
Chinese characters then the issue occurs.
Date: Tue, 21 Aug 2007 00:41:22 -0000   Author: Shan Plourde

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 21, 2:41 am, Shan Plourde  wrote:


> The Chinese characters I tried to show just now are not appearing here
> on this newsgroup when I post through Google Groups. Anyhow, it doesn't
> seem to matter what the Characters are that are used, if they are any
> Chinese characters then the issue occurs.


Your sample code works as it should when I run it. Could you try the
following:

1. Instead of using the templateManager, try hard-coding a string with
Chinese characters into your code
2. Send the mail to gmail, and click "options" and then "show
original" and tell us what the "subject" line says (it should be
something along the lines of "=?utf-8?B?....==?=");
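
For reference, the header form mentioned in step 2 is an RFC 2047 "encoded-word": the charset name, the letter B for Base64, and the Base64 of the encoded bytes. The following is only a minimal sketch of that shape; the class and method names are made up for illustration, and what System.Net.Mail actually puts on the wire may differ in details such as casing or line folding.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of System.Net.Mail.
public static class EncodedWordSketch
{
    // Builds one RFC 2047 "B" (Base64) encoded-word for a header value.
    // Simplification: a real encoder splits long values into several
    // encoded-words of at most 75 characters each; this sketch does not.
    public static string EncodeUtf8(string headerValue)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(headerValue);
        return "=?utf-8?B?" + Convert.ToBase64String(utf8Bytes) + "?=";
    }
}
```

If the raw Subject line in "show original" has this shape, the sender did its job and any question marks are being introduced later.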
Date: Tue, 21 Aug 2007 08:06:27 -0000   Author: UL-Tomten

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 21, 4:06 am, UL-Tomten  wrote:

> On Aug 21, 2:41 am, Shan Plourde  wrote:
>
> > The Chinese characters I tried to show just now are not appearing here
> > on this newsgroup when I post through Google Groups. Anyhow, it doesn't
> > seem to matter what the Characters are that are used, if they are any
> > Chinese characters then the issue occurs.
>
> Your sample code works as it should when I run it. Could you try the
> following:
>
> 1. Instead of using the templateManager, try hard-coding a string with
> Chinese characters into your code
> 2. Send the mail to gmail, and click "options" and then "show
> original" and tell us what the "subject" line says (it should be
> something along the lines of "=?utf-8?B?....==?=");


Hi UL-Tomten, thanks for following up. Actually I lied! I only tested
this with Outlook 2007 at first. After testing --- and I did try
everything that you suggested to isolate stuff, and the subject did
indeed only contain Chinese characters during a debugging session
where I was using the debugger to change message properties and send
to various email addresses --- I have found the following with emails
with Chinese characters in the subject, body and file attachment name:

1. When the email is sent to Gmail and Yahoo email addresses and
viewed with their web viewers, all Chinese characters show correctly
2. When the email is sent to a Microsoft Exchange Server email address
at my company and viewed in my Outlook 2007 client, the subject and
file attachment name show question marks, but the body shows Chinese
Characters
3. When the email is sent to a Gmail email address and viewed with an
Outlook 2007 configured email client that retrieves Gmail mail using
pop3, all Chinese characters show correctly!
4. When the email is sent to a Microsoft Exchange Server email address
at my company and viewed in my company's Microsoft webmail interface,
the same issue - the question marks - again happens
5. Here's an interesting one - when the email is sent to my wife's
work email, which is also Microsoft Exchange Server, her Outlook 2007
correctly shows all Chinese characters!
6. A colleague from another company, which also uses Microsoft
Exchange Server, received the email and saw the same issue
that I have

I'm not sure what that means then. To summarize though, it's only my
work email that the issue is happening with. Could it be possible that
perhaps then there's some sort of Exchange Server problem? I am not
really sure right now, but interested to know if you may have any
ideas.

Thanks
Shan
Date: Wed, 22 Aug 2007 05:09:09 -0000   Author: Shan Plourde

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 22, 7:09 am, Shan Plourde  wrote:


> Could it be possible that perhaps then there's some sort of Exchange
> Server problem? I am not really sure right now, but interested to
> know if you may have any ideas.


I've had these problems myself, in which case I think Outlook 2000 was
the problem. My guess at the time was that the UI control that
rendered the subject didn't support Unicode and/or Uniscribe. To
render the Chinese characters, a different font has to be used, so it
may even boil down to a client setting problem (if you choose a weird
font in Outlook, it could potentially break things, I'm not sure). But
since it works for you on the same computer using the same Outlook
2007 with POP3, but not Exchange (right?), that's not likely to be the
problem anymore.

Either way, the e-mails produced by your code are perfectly valid, and
they should work. Perhaps changing some settings in regard to e.g. the
transport encoding might help, but that sounds like an unreliable
workaround.

I haven't been able to reproduce these problems on XPSP2 using any
combination of Gmail, IE7, Firefox 2, OWA2007 and Thunderbird 2, and I
don't have an Outlook handy, so I'm not sure how to help you further.
But what I would do is try to track down the exact location of the
problem, both in the Web case and the Outlook case.

So, in the Web case: can you verify it's not an encoding problem in
the browser? Could you try a different browser? If you use Fiddler 2
to inspect the traffic from the web server, can you verify that the
Chinese characters are question marks before they reach your browser?
Or can you perhaps "View source" using a reliable source viewer?

In the Outlook case: If you open a corrupt message and go to View ->
Options -> Internet headers (I think that's where it was in Outlook
2000 at least), you should be able to see the un-decoded headers,
including the Subject field. Can you see what it says there?
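
Once you have the raw Subject value, decoding it by hand shows whether the bytes were already damaged before the client rendered them. Below is a rough sketch of such a check. The class name is invented, it handles only the "B" (Base64) form, and it ignores the RFC 2047 rule about dropping whitespace between adjacent encoded-words.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical diagnostic helper for pasted raw header values.
public static class EncodedWordDecoderSketch
{
    // Matches =?charset?B?base64?= ; the "Q" form is not handled here.
    private static readonly Regex BEncodedWord =
        new Regex(@"=\?([^?]+)\?[Bb]\?([^?]*)\?=");

    public static string Decode(string rawHeaderValue)
    {
        return BEncodedWord.Replace(rawHeaderValue, m =>
        {
            // Group 1 is the charset label, group 2 the Base64 payload.
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            byte[] bytes = Convert.FromBase64String(m.Groups[2].Value);
            return charset.GetString(bytes);
        });
    }
}
```

If the decoded text contains literal question marks, the characters were replaced somewhere upstream of the client; if it contains the Chinese characters, the client's rendering is at fault.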
Date: Wed, 22 Aug 2007 00:08:11 -0700   Author: UL-Tomten

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 22, 9:08 am, UL-Tomten  wrote:


> I haven't been able to reproduce these problems [...]


Regarding the Web case; I've tested one OWA 2003 installation, and the
simple client renders its pages as Western European, which means
Chinese characters are lost. The rich client works as expected, and
renders its pages as Unicode. I've also tested an OWA 2007
installation, where both the simple and rich clients worked. This
might be a server-side setting; I have a feeling a Chinese Windows
Server installation would not render as Western European by default. I
also have a feeling the simple client in OWA 2003 doesn't render as
Unicode because UTF-8 support in browsers wasn't as good five years
ago as it is now. Either way, not related to the Framework or your
code.
Date: 22 Aug 2007 23:40:28 -0700   Author: UL-Tomten

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 23, 2:40 am, UL-Tomten  wrote:

> On Aug 22, 9:08 am, UL-Tomten  wrote:
>
> > I haven't been able to reproduce these problems [...]
>
> Regarding the Web case; I've tested one OWA 2003 installation, and the
> simple client renders its pages as Western European, which means
> Chinese characters are lost. The rich client works as expected, and
> renders its pages as Unicode. I've also tested an OWA 2007
> installation, where both the simple and rich clients worked. This
> might be a server-side setting; I have a feeling a Chinese Windows
> Server installation would not render as Western European by default. I
> also have a feeling the simple client in OWA 2003 doesn't render as
> Unicode because UTF-8 support in browsers wasn't as good five years
> ago as it is now. Either way, not related to the Framework or your
> code.


Hi UL - You're absolutely right. I was testing yesterday with our IT
director and he was analyzing the raw incoming data into our Microsoft
Exchange Server mail server - the raw message subject coming in was
indeed UTF-8 encoded, but something in the email server processing
pipeline was stripping out some of the stream's characters, and
leaving the encoding as UTF-8. The net result is whatever that server
side process in the pipeline is destroying the subject, specifically
if the encoding is UTF-8. If I set the encoding to a specific
encoding, i.e. such as "gb2312" to handle Chinese characters within a
subject line, then the server side process keeps the subject stream
intact. As of now he wasn't sure what in the processing pipeline was
causing the issue but he was still investigating.

I'm guessing that if I set the encoding specifically like this, it
will also help to decrease the chances of other users using this
service experiencing the same issue - some of their mail servers might
also be destroying the subject in their processing pipelines.

Unfortunately that means that I have to do a bit of refactoring to the
message sending code to not simply make it UTF-8. Since the website
operates in a finite number of languages, it should be somewhat safe
to set the encoding based on the language that the website was used by
a given user at the time that they completed a self-assessment. Of
course it won't work if someone enters their name containing say
Chinese and Japanese characters, but in reality this should never be
the case.

I wish I could just use UTF-8 for everything, it is my preference, but
unfortunately it won't work as reliably as setting the encoding.
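
Roughly what I have in mind is sketched below. The language codes, the charset table and the class name are all placeholders rather than what the site really uses, and it falls back to UTF-8 whenever the language-specific charset is unavailable or cannot represent the text (for example a name mixing Chinese and Japanese characters).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the table below is an assumption, not a recommendation.
public static class HeaderEncodingChooser
{
    private static readonly Dictionary<string, string> CharsetByLanguage =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "zh-CN", "gb2312" },
            { "ja", "iso-2022-jp" },
            { "en", "iso-8859-1" },
        };

    // Picks the charset for the site language if it can represent the text
    // without loss; otherwise returns UTF-8.
    public static Encoding Choose(string siteLanguage, string text)
    {
        string charsetName;
        if (CharsetByLanguage.TryGetValue(siteLanguage, out charsetName))
        {
            try
            {
                Encoding candidate = Encoding.GetEncoding(charsetName);
                string roundTripped = candidate.GetString(candidate.GetBytes(text));
                if (string.Equals(roundTripped, text, StringComparison.Ordinal))
                {
                    return candidate;
                }
            }
            catch (ArgumentException)
            {
                // Charset not available on this runtime: fall through to UTF-8.
            }
            catch (NotSupportedException)
            {
                // Same handling for platforms that report it this way.
            }
        }
        return Encoding.UTF8;
    }
}
```

The result would then be assigned to both msg.SubjectEncoding and attachment.NameEncoding before setting the subject and adding the attachment. Anything that falls back to UTF-8 would still hit the problem on my company's server.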

Interested to hear if you may have found the same issue with Exchange
servers or other mail servers.

Thanks again,
Shan
Date: Thu, 23 Aug 2007 12:26:06 -0000   Author: Shan Plourde

Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 23, 2:26 pm, Shan Plourde  wrote:


> intact. As of now he wasn't sure what in the processing pipeline was
> causing the issue but he was still investigating.


Please post the findings here eventually. It will bring some nice
closure to my bad experiences of yore.


> message sending code to not simply make it UTF-8. Since the website
> operates in a finite number of languages, it should be somewhat safe
> to set the encoding based on the language that the website was used by


I think you'll have two challenges here:

1. Finding a suitable encoding for each language that is compatible
with .NET as well as users' e-mail clients and web browsers.

2. Coping with only having access to the encoding-specific characters
and ASCII.

You might want to make a note of which, if any, characters are lost
when you encode your messages as non-Unicode encodings: decode the
encoded string and compare it with the original string, or write your
own encoder fallback class that informs you of character fallbacks.
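
A rough sketch of both checks follows. Instead of a hand-written fallback class it uses the framework's built-in exception fallback, which is a simpler stand-in for the same idea; the class and method names are invented for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper for spotting characters a charset cannot represent.
public static class EncodingLossCheck
{
    // True when every character survives encoding followed by decoding.
    public static bool RoundTrips(string text, Encoding encoding)
    {
        string decoded = encoding.GetString(encoding.GetBytes(text));
        return string.Equals(decoded, text, StringComparison.Ordinal);
    }

    // Returns the position reported for the first character the charset
    // cannot encode, or -1 if the whole string is representable.
    public static int FirstUnencodableIndex(string text, string charsetName)
    {
        // With the exception fallback, GetBytes throws instead of silently
        // substituting '?' or a best-fit character.
        Encoding strict = Encoding.GetEncoding(
            charsetName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return -1;
        }
        catch (EncoderFallbackException ex)
        {
            return ex.Index;
        }
    }
}
```

Logging the names that fail either check would tell you how often the per-language encoding actually loses characters in practice.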
Date: Thu, 23 Aug 2007 13:53:17 -0000   Author: UL-Tomten

OT: Re: .NET 2.0 - Sending Emails - Subject and Attachment Name Encoding Issues   
On Aug 23, 2:26 pm, Shan Plourde  wrote:


> indeed UTF-8 encoded, but something in the email server processing
> pipeline was stripping out some of the stream's characters, and
> leaving the encoding as UTF-8. The net result is whatever that server


I'm guessing that Exchange is configured to somehow alter and/or
inspect subject lines, and does not support all the characters in
UTF-8, thus silently mangling subject lines upon re-encoding. Perhaps
there is a missing service pack somewhere, or there is a requirement
on "asian text support" being installed which is not satisfied.

I'm guessing there is a better newsgroup for this discussion now than
this one...
Date: Thu, 23 Aug 2007 13:57:40 -0000   Author: UL-Tomten
