DotNetNewsgroup.com  
web access to complete list of Microsoft.NET newsgroups
start date: Tue, 14 Aug 2007 11:01:26 +0500, posted on: microsoft.public.dotnet.framework.aspnet

Thread Index
  1    Enigma Boy
          2    Alexey Smirnov
          3    Jesse Houwing


Retrieving hyperlinks from a web page in code
Hi folks,

I am retrieving a web page from a site using HttpWebRequest.  What I want to 
do with the retrieved page is list all the hyperlinks in it.  If I 
do a simple regex search for <a then I also get links that are commented out in 
the markup, and I don't want that.  I want only links that are actually active.  This is 
for a reciprocal link check.

Can someone please point me in the right direction?

Thanks.

Date: Tue, 14 Aug 2007 11:01:26 +0500   Author: Enigma Boy

Re: Retrieving hyperlinks from a web page in code
On Aug 14, 8:01 am, "Enigma Boy" wrote:

> Hi folks,
>
> I am retrieving a website for a site using httpWebRequest.  What I want to
> do with the retrieved webpage is list all the hyperlinks in the page.  If I
> do a simple regex search for <a then I get links that are commented out in
> code and I don't want that.  I want links that are actually active.  This is
> to do with reciprocal link check.


Hi, I think you can try to clean the text before you get the links.
For example:

html_code = Regex.Replace(html_code, "<!--((.|\n)*?)-->", "");

This will replace all commented-out code with an empty string, and then you
can extract the links from what is left.
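
To make the two steps concrete, here is a minimal sketch of stripping comments first and then collecting the href values. The class name LinkExtractor, the method name GetActiveLinks and the anchor pattern are only illustrative assumptions, not anything from a library. It also swaps the (.|\n)*? form for .*? with RegexOptions.Singleline, which should match the same comments with less backtracking. A regex like this will not cope with every malformed page, so treat it as a starting point.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the class and method names are illustrative only.
public static class LinkExtractor
{
    // Matches an HTML comment; Singleline lets '.' match newlines too,
    // so comments that span several lines are removed as well.
    private static readonly Regex CommentPattern =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Matches the quoted href value of an <a> tag (double or single quotes).
    // \1 refers to the opening quote, so the value ends at the matching quote.
    private static readonly Regex AnchorPattern =
        new Regex(@"<a\b[^>]*?\bhref\s*=\s*([""'])(?<url>.*?)\1",
                  RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> GetActiveLinks(string html)
    {
        // Step 1: drop everything inside <!-- ... -->.
        string withoutComments = CommentPattern.Replace(html, "");

        // Step 2: collect the href of every remaining anchor tag.
        List<string> links = new List<string>();
        foreach (Match m in AnchorPattern.Matches(withoutComments))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }
}
```

You would call it as LinkExtractor.GetActiveLinks(html_code) on the text you read from the response. Unquoted href values are not matched by this pattern, so extend it if the pages you check use them.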
Date: Tue, 14 Aug 2007 01:17:02 -0700   Author: Alexey Smirnov

Re: Retrieving hyperlinks from a web page in code
Hello Enigma,


> Hi folks,
> 
> I am retrieving a website for a site using httpWebRequest.  What I
> want to do with the retrieved webpage is list all the hyperlinks in
> the page.  If I do a simple regex search for <a then I get links that
> are commented out in code and I don't want that.  I want links that
> are actually active.  This is to do with reciprocal link check.
> 
> Can someone please point me in the right direction.
> 
> Thanks.


Have a look at the HTML Agility Pack. It allows you to treat the HTML as 
if it were XML, so you can query it with XPath.

http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack
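
As a rough sketch of how that could look, assuming the HtmlAgilityPack assembly is referenced in your project: the parser keeps comments as separate comment nodes, so anchors inside them should not show up as elements in the XPath result. The class name AgilityLinkExtractor and the method name GetActiveLinks are just illustrative.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HTML Agility Pack library is referenced

// Hypothetical helper: the class and method names are illustrative only.
public static class AgilityLinkExtractor
{
    public static List<string> GetActiveLinks(string html)
    {
        // Parse the downloaded markup into a DOM-like tree.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        List<string> links = new List<string>();

        // Select every <a> element that carries an href attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the default used when the attribute is missing.
            links.Add(anchor.GetAttributeValue("href", ""));
        }
        return links;
    }
}
```

Pass it the string you read from the HttpWebRequest response, then compare the returned URLs against the link you expect to find.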

--
Jesse Houwing
jesse.houwing at sogeti.nl
Date: Tue, 14 Aug 2007 10:56:55 +0000 (UTC)   Author: Jesse Houwing
