start date: Tue, 14 Aug 2007 11:01:02 +0500,
posted on: microsoft.public.dotnet.framework
Thread Index
1 Enigma Boy
2 Jesse Houwing
3 Alvin Bruney [MVP]
4 Enigma Boy
5 Jesse Houwing
Parsing a webpage
Hi folks,
I am retrieving a web page from a site using HttpWebRequest. What I want to
do with the retrieved page is list all the hyperlinks in it. If I
do a simple regex search for <a then I also get links that are commented out
in the markup, and I don't want that. I want only links that are actually
active. This is for a reciprocal link check.
Can someone please point me in the right direction?
Thanks.
--
<a href="http://1pakistangifts.com">Send Gifts to Pakisan at #Pakistan Gifts
Store</a> | <a href="http://dotspecialists.com">Leading Software offshoring
and outsourcing service provider</a> | <a
href="http://websitedesignersrus.com">Professional Websites at affordable
prices</a>
Date:Tue, 14 Aug 2007 11:01:02 +0500
|
Re: Parsing a webpage
Hello Enigma,
> Hi folks,
>
> I am retrieving a web page from a site using HttpWebRequest. What I
> want to do with the retrieved page is list all the hyperlinks in
> it. If I do a simple regex search for <a then I also get links that
> are commented out in the markup, and I don't want that. I want only
> links that are actually active. This is for a reciprocal link check.
>
> Can someone please point me in the right direction?
>
> Thanks.
>
Have a look at the HTML Agility Pack. It allows you to parse HTML as if it
were XML.
http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack
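
A minimal sketch of the parse-then-walk idea, for illustration only. It uses
Python's standard-library html.parser rather than the HTML Agility Pack
itself, so it shows the technique and not that library's API. The names
LinkCollector and extract_links and the sample markup are made up for this
example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags the parser reports as elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Commented-out markup is delivered to handle_comment instead,
        # so anchors inside <!-- ... --> never reach this method.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every active (non-commented) anchor in html."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample page: one live link, one commented-out link,
# and one anchor without an href.
sample = """
<p><a href="http://example.com/live">live</a></p>
<!-- <a href="http://example.com/hidden">hidden</a> -->
<a name="anchor-only">no href</a>
"""
print(extract_links(sample))
```

Because the parser tracks comments as their own node type, nothing extra is
needed to skip them. A real HTML parser for .NET should give the same benefit
for the reciprocal link check.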
--
Jesse Houwing
jesse.houwing at sogeti.nl
Date:Tue, 14 Aug 2007 10:55:34 +0000 (UTC)
|
Re: Parsing a webpage
You can also throw a regex at it. This site has a collection of ready-made
patterns: regexlib.com
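
A rough sketch of what the regex route might look like, assuming the comments
are stripped first so that commented-out anchors cannot match. The patterns
and the regex_links name are illustrative guesses, not taken from
regexlib.com, and regexes like these are easy to fool with unusual markup
(unquoted attributes, comments inside scripts, and so on).

```python
import re

# Matches an HTML comment, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Matches an <a ...> tag with a quoted href and captures the href value.
HREF_RE = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)


def regex_links(html):
    """Return quoted href values of anchors that are not inside comments."""
    without_comments = COMMENT_RE.sub("", html)
    return [match.group(2) for match in HREF_RE.finditer(without_comments)]


# Hypothetical input: one live link and one commented-out link.
page = (
    '<a href="http://a.example/">x</a>'
    '<!-- <a href="http://b.example/">y</a> -->'
)
print(regex_links(page))
```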
--
Regards,
Alvin Bruney
------------------------------------------------------
Shameless author plug
Excel Services for .NET - MS Press
Professional VSTO 2005 - Wrox/Wiley
OWC Black Book www.lulu.com/owc
Date:Tue, 14 Aug 2007 19:07:26 -0400
|
Re: Parsing a webpage
Jesse, you are a lifesaver.
Thanks,
--
<a href="http://pakistan-gifts.com">Send Gifts to Pakistan</a> | <a
href="http://dotspecialists.com">Affordable software offshoring services</a>
| <a href="http://websitedesignersrus.com">Professional Website Design and
Development Services</a>
Date:Thu, 16 Aug 2007 12:51:55 +0500
|
Re: Parsing a webpage
Hello Enigma,
> Jesse, you are a lifesaver.
>
> Thanks,
You're welcome. I suppose it worked like a charm ;)
--
Jesse Houwing
jesse.houwing at sogeti.nl
Date:Thu, 16 Aug 2007 10:40:26 +0000 (UTC)