DotNetNewsgroup.com  
web access to complete list of Microsoft.NET newsgroups
start date: Fri, 17 Aug 2007 07:39:10 -0700, posted on: microsoft.public.dotnet.framework.aspnet


Best practice for translating web page character data so that page will be scrapable/e-mailable   
I have web pages that I periodically want to a) programmatically "scrape" 
and b) programmatically send in e-mail. These web pages are built via 
content management systems and occasionally have Word "curly quotation 
marks" and other unusual characters or entities embedded in them.

If you fail to translate these characters properly, you get the familiar 
problem of some characters turning into question marks when the page is 
scraped and/or sent in e-mail. You see this problem all the time in 
web-based newsletters and the like.
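
To make the symptom concrete, here is a minimal sketch of how the question marks can arise. It is written in Python purely for illustration; the sample text and the choice of ASCII as the target character set are assumptions, not details from the pages described above.

```python
# Illustrative sample: curly quotes (U+201C, U+201D) and an em dash (U+2014),
# written as escapes so the source itself stays plain ASCII.
text = "Word \u201ccurly quotes\u201d and an em dash \u2014 here"

# Encoding to a character set that lacks these characters, with
# errors="replace", substitutes "?" for each character it cannot represent.
mangled = text.encode("ascii", errors="replace").decode("ascii")

print(mangled)
```

Each character with no ASCII representation comes out as a single "?", which matches the symptom seen in scraped or e-mailed pages.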

When I was working in classic ASP, I wrote "translate" functions that would 
render weird characters into their safe equivalents using a simple string 
"replace". This was a limited solution because it was premised on my ability 
to identify all of the problematic characters myself and translate them.
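
For reference, the replace-based approach looks roughly like the sketch below. It is shown in Python rather than classic ASP, and the mapping table and the function name `translate` are hypothetical, chosen only to illustrate the idea and its limitation.

```python
# Hypothetical mapping table: only the characters someone thought to list
# here are handled. Anything else passes through untouched.
SAFE_EQUIVALENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

def translate(text):
    """Replace each listed character with its plain-ASCII equivalent."""
    for bad, safe in SAFE_EQUIVALENTS.items():
        text = text.replace(bad, safe)
    return text

sample = "\u201cHi\u201d \u2014 it\u2019s\u2026"
print(translate(sample))
```

A character that is missing from the table, such as an accented letter, is returned unchanged, which is exactly the weakness described above.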

I am wondering if there is an all-in-one solution to this problem inside or 
outside of the .NET Framework. I have read a bit about the character 
encoding classes and I'm hoping that one of them represents a complete 
solution to my problem.

Can anyone offer any guidance?

Thanks,
-KF
Date: Fri, 17 Aug 2007 07:39:10 -0700
