DotNetNewsgroup.com  
web access to complete list of Microsoft.NET newsgroups
start date: Mon, 30 Jul 2007 03:22:13 -0700,    posted on: microsoft.public.dotnet.framework

Thread Index
  1    Barry
          2    Jesse Houwing
                 3    Barry
                        4    Jesse Houwing
                        5    Barry
                        6    Jesse Houwing


Regular Expression problem   
Hello

Regex regex = new
        Regex("((?<field>[^\",\r\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");


The above regular expression returns 24 fields instead of 42 for the record 
below; it ignores empty fields like  ,,,"Hello World",,,,,

1002,Jack,Wack,23772 East Rd,,Georgestown Twp,VA,48183,United States,999-966-9735,Jack,Wack,23772 East Rd,,Georgtown Twp,VA,12283,United States,519-966-9735,501,,Y,0,4/27/2007 15:04,5/10/2007 12:50,Shipped,,,,Regular Processing,,,,,,,,,,,

Can some Regex expert help?

TIA
Barry
Date:Mon, 30 Jul 2007 03:22:13 -0700   Author:  

Re: Regular Expression problem   
[^\",\r\n]+ in your field definition requires at least one character to 
match (+ means one or more). Change this to * (zero or more) and 
things should start working.

Regex regex = new
	Regex("((?<field>[^\",\r\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
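
To see the difference between + and * on a small record, here is a minimal C# sketch. The class name FieldCountDemo, the helper Fields and the shortened sample record are made up for illustration; the two patterns are the ones quoted above, copied unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class FieldCountDemo
{
    // Pattern from the first post: "+" demands at least one character per field.
    public const string PlusPattern =
        "((?<field>[^\",\r\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))";

    // Same pattern with "*", so a field may be empty.
    public const string StarPattern =
        "((?<field>[^\",\r\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))";

    // Collects the "field" group of every match found in the record.
    public static List<string> Fields(string pattern, string record)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(record, pattern))
        {
            fields.Add(m.Groups["field"].Value);
        }
        return fields;
    }
}
```

Calling FieldCountDemo.Fields(FieldCountDemo.PlusPattern, "1002,Jack,,VA,,") should give 3 fields, while the same call with StarPattern should give 6, including the empty ones. One caveat: since the * version can match an empty field followed by $, a record that does not end in a comma should report one extra empty field at the very end of the input, so that one may need to be dropped.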


From the regex it looks like you're trying to read multiple lines with 
one Regex.Match call. This could become very expensive.

You could also give this a try:
(?:(?:^|,)(?:"(?<field>(?:[^"]|"")*)"|(?<field>[^"\r\n,]*)))*\r?$

(\r? at the end is needed because, with RegexOptions.Multiline, $ in .NET 
matches only directly before \n, so the \r of a \r\n line ending has to be 
consumed explicitly.)

It will match a whole line in one match object. You can extract the 
values with the following code:

Regex rx = new Regex("...", RegexOptions.Multiline); // "..." stands for the pattern above

Match m = rx.Match(input);
if (m.Success) // while (m.Success)
{
	// each field of the matched line is one capture of the "field" group
	foreach (Capture c in m.Groups["field"].Captures)
	{
		string extracted = c.Value;
	}
	// m = m.NextMatch();
}

If you have an input string that contains multiple lines you can use the 
while(m.Success) in combination with m = m.NextMatch(); which I've 
commented out in the code above to loop through all the results in the 
input.
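
As a rough sketch of that loop over a multi-line input, the following C# collects one list of fields per line. The names CsvLineDemo and ParseRows are hypothetical; the pattern is the whole-line one from above, written as a verbatim string. Two details are my own additions and not part of the original suggestion: matches without any "field" capture are skipped (the pattern can also match the empty string at a line end), and doubled quotes inside quoted fields are collapsed to a single quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CsvLineDemo
{
    // Whole-line pattern from the post; "" inside a verbatim string is one quote.
    private static readonly Regex LineRx = new Regex(
        @"(?:(?:^|,)(?:""(?<field>(?:[^""]|"""")*)""|(?<field>[^""\r\n,]*)))*\r?$",
        RegexOptions.Multiline);

    // Returns one list of field values per matched line.
    public static List<List<string>> ParseRows(string input)
    {
        var rows = new List<List<string>>();
        for (Match m = LineRx.Match(input); m.Success; m = m.NextMatch())
        {
            CaptureCollection captures = m.Groups["field"].Captures;

            // Empty matches at a line end carry no captures; skip them.
            if (captures.Count == 0) continue;

            var row = new List<string>();
            foreach (Capture c in captures)
            {
                // Assumption: "" inside a quoted field stands for a literal quote.
                row.Add(c.Value.Replace("\"\"", "\""));
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

For an input such as "a,,\"x,y\"\r\nb,c" this should return two rows, the first with the fields a, an empty string and x,y, the second with b and c. It has only been reasoned through for records whose first field is not empty, so treat it as a starting point rather than a complete CSV parser.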

I'm not sure exactly what you're doing with the values you've captured, 
but you might also want to have a look at the OleDB delimited text 
driver which allows you to load the contents of a comma delimited text 
file into a dataset, saving you the trouble. Or you could try and use 
SQL Server Integration Services to load the data directly into a database.

Jesse
Date:Mon, 30 Jul 2007 13:02:07 +0200   Author:  

Re: Regular Expression problem   
Hi

Thanks for your quick reply.

My project is to read a file of comma-separated records and process them to 
create an XML file.

I am keen to explore this OleDB delimited text thing. Can you give me a 
clue about what to search for in MSDN?

Barry
Date:Mon, 30 Jul 2007 05:34:26 -0700   Author:  

Re: Regular Expression problem   

> Thanks for your quick reply.


You're welcome :)


> My project is to read a file of comma-separated records and process them to 
> create an XML file.
> 
> I am keen to explore this OleDB delimited text thing. Can you give me a 
> clue about what to search for in MSDN?


Hey Barry,

Have a look at:
http://www.aurigma.com/Support/DocViewer/5/AddingDatafromTextFile.htm.aspx

Jesse


Date:Mon, 30 Jul 2007 16:14:31 +0200   Author:  

Re: Regular Expression problem   
Hi Jesse,

I searched the internet yesterday after posting my previous message and 
found some code snippets.

I must say that you provided me with an excellent tip. All this time I have 
been processing record by record and field by field, with a whole lot of 
parsing problems. All of that has been solved by the tip you provided; I 
have even rewritten my code to use OleDb delimited text.

You deserve a big thank you.
Barry
Date:Tue, 31 Jul 2007 13:13:26 -0700   Author:  

Re: Regular Expression problem   
Thank you :) You're very welcome.

Jesse



Date:Tue, 31 Jul 2007 13:30:34 +0200   Author:  

Copyright © 2005, Eurofront Worldwide Ltd. All rights reserved.