Multiple lines in Perl regex

arsalancheema · October 14, 2009, 7:57am

Hi, I am trying to write a parser that extracts the required text from an m-script.

In that m-script I have different kinds of declarations, e.g.

1: par_1.1_anything = min_value;

2: any_parameter = [ 0 2 3 5;
6 9 min_v_1 10;
12 15 30 35];

3: diff.name = [ 5
min
16
max];

4: not_req = [ 11 13 15;
19 24 30;
31 33 39];

So I want to extract all declarations, single-line as well as multi-line, that have any alphabetic character between “=” and the end, i.e. “;” or “];”.

What I am doing is using while(<>) so that it reads any input file, and then print if $_ =~ /regular expression/;
and then I save the output in an output file.

I am quite successful with single lines, but I am stuck with multiple lines.
I want something which, when it sees “[”, searches all following lines for any character [a-zA-Z] until it finds “];”, and if it finds such a character in between, prints the whole declaration.

Please help me in this matter.

Thanks a lot.

wafonso · October 19, 2009, 11:02pm

The option “s” to a regexp will tell the parser to match multi-line inputs. So,

if($_ =~ /regular expression/s)

should do the trick for you.

disgracian · October 20, 2009, 9:50am

The ‘s’ option is actually single-line; use ‘m’ for multi-line.

‘s’ will cause the ‘.’ character to match newlines in addition to everything else. ‘m’ will cause the ^ and $ anchors to match the beginning and end of lines instead of the entire string.

‘s’ and ‘m’ can be used together in some circumstances too.
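
To make the difference concrete, here is a minimal sketch. The sample string in $text is made up for illustration and is assumed to hold several lines at once; with a plain while(<>) loop each $_ holds only one line, so neither flag would help there.

```perl
use strict;
use warnings;

# Made-up sample: one multi-line declaration and one single-line declaration.
my $text = "a = [ 1\nmin\n3];\nb = 2;\n";

# Without /s, '.' stops at a newline, so the bracketed block cannot match.
my $plain = $text =~ /\[.*\];/ ? 1 : 0;

# With /s, '.' also matches newlines, so the match can span lines.
my $dot_all = $text =~ /\[.*?\];/s ? 1 : 0;

# With /m, ^ matches at the start of every line, not only at the string start.
my @line_starts = $text =~ /^(\w+)/mg;

print "plain=$plain dot_all=$dot_all starts=@line_starts\n";
# prints: plain=0 dot_all=1 starts=a min 3 b
```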

Cheers,
D.

arsalancheema · October 20, 2009, 2:43pm

Hi again,

I need a little more help, please.

How can I separate or split a string, e.g.

rpm_max3=[0,20,x_min;3,25,min/2;6,35,max];

I want to split it and then put two counters on it, one on the comma “,” and a second on the semicolon “;”. I want output like

rpm_max3 (0,0) = (0);
rpm_max3 (0,1) = (20);
rpm_max3 (0,2) = (x_min);
rpm_max3 (1,0) = (3);
rpm_max3 (1,1) = (25);
rpm_max3 (1,2) = (min/2);
rpm_max3 (2,0) = (6);
rpm_max3 (2,1) = (35);
rpm_max3 (2,2) = (max);

That is: $1 (counter 1, which counts semicolons, counter 2, which counts commas) = (the content, which will be $ something)

Thanks a lot for your help.

Keep rocking,

Arsalan

disgracian · October 21, 2009, 12:44am

Split on the outermost grouping, in your case the semi-colon, and work your way inwards. You can store the results in a multi-dimensional array or some other structure of your preference.
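
As a rough sketch of that approach: split on the semicolons first to get the rows, then on the commas to get the cells, counting as you go. The helper name expand_matrix is hypothetical, and the pattern assumes the whole declaration is already in one string with no nested brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: expand "name=[a,b;c,d];" into one line per element.
sub expand_matrix {
    my ($decl) = @_;

    # Capture the name and everything between the outer brackets.
    my ($name, $body) = $decl =~ /^\s*([\w.]+)\s*=\s*\[(.*)\]\s*;\s*$/s
        or return;

    my @out;
    my $row = 0;
    for my $row_text (split /;/, $body) {        # outer grouping: rows
        my $col = 0;
        for my $cell (split /,/, $row_text) {    # inner grouping: cells
            $cell =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
            push @out, "$name ($row,$col) = ($cell);";
            $col++;
        }
        $row++;
    }
    return @out;
}

print "$_\n" for expand_matrix('rpm_max3=[0,20,x_min;3,25,min/2;6,35,max];');
```

For the sample line this prints the nine lines from the question, from rpm_max3 (0,0) = (0); through rpm_max3 (2,2) = (max);.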

Cheers,
D.

Bompa · October 21, 2009, 9:08am

The ‘s’ flag means to treat the input string as a single line, but only regarding the . wildcard character.

The ‘m’ flag operates on the ^ and $ characters.

However, what if the regex does not use a . or a ^ or a $?

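One other option would be to set the input record separator $/ to undef, so that the whole input is read as a single string and a regex can see across line breaks.

$/ = undef;
READ INPUT AND APPLY REGEX HERE
$/ = "\n"; # restore the input record separator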
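
A minimal sketch of how that could look for the original question, assuming declarations have the form name = value; or name = [ ... ]; with no nested brackets. The helper name declarations_with_names and the sample text are made up for illustration. Writing local $/ inside a do block is a common variation that restores the separator automatically.

```perl
use strict;
use warnings;

# Hypothetical helper: return every declaration whose right-hand side
# contains at least one letter.
sub declarations_with_names {
    my ($source) = @_;
    my @found;
    # The value is either a bracketed block "[ ... ]" (which may span
    # lines, since [^\]] also matches newlines) or a plain value.
    while ($source =~ /^\s*([\w.]+\s*=\s*(\[[^\]]*\]|[^;\[]*);)/mg) {
        my ($decl, $value) = ($1, $2);
        push @found, $decl if $value =~ /[a-zA-Z]/;
    }
    return @found;
}

# To run on a real file, slurp it first instead of using the sample:
#   my $source = do { local $/; <> };
my $sample = <<'END';
par_1.1_anything = min_value;
any_parameter = [ 0 2 3 5;
6 9 min_v_1 10;
12 15 30 35];
not_req = [ 11 13 15;
19 24 30;
31 33 39];
END

print "$_\n\n" for declarations_with_names($sample);
```

With the sample above, the first two declarations are printed and not_req is skipped because its value contains only numbers.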

Bompa

disgracian · October 22, 2009, 12:43am

Apologies for sounding blunt, Bompa, but what exactly was the point of your post? It looks like you were correcting me, but saying exactly the same thing.

Cheers,
D.

Bompa · October 22, 2009, 3:20am

Hi Disg,

It’s ok to be blunt. I need to walk carefully when correcting others’ code; usually they know more than me.

My point was to restate what the s and m flags do, then pose a question.

I don’t know what is in the regex for this thread since we are just using
=~ /regular expression/ in examples, but since those two flags operate ONLY
on the . ^ and $, I am asking what if none of those special characters are
within the regex?

What if we want to match “your account is active”, but that phrase is split
over several lines in the input string?

your account
is active

In my understanding, the s and m flags do not help here, am I off?

Maybe I’m just having a senior moment.

Bompa

disgracian · October 22, 2009, 10:13am

If the phrase we’re looking to match is separated by newlines then that’s not too bad because it’s all just whitespace. You could actually just specify /your\s+account\s+is\s+active/. No need for any switches at all, because \s matches any whitespace, including newlines. You could even use \W (any non-word character, i.e. anything other than a letter, digit or underscore).
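
A quick sketch to show it in action. The sample string is made up, and it assumes the whole input is already in one scalar; when reading line by line, the phrase is never in $_ all at once.

```perl
use strict;
use warnings;

# Made-up input with the phrase broken across two lines.
my $input = "Dear user,\nyour account\nis   active\nsince Monday.\n";

# \s+ matches any run of whitespace, newlines included, so no /s or /m is needed.
my $matched = $input =~ /your\s+account\s+is\s+active/ ? 1 : 0;

print $matched ? "phrase found\n" : "phrase not found\n";
# prints: phrase found
```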

Cheers,
D.

sureshvisu6 · October 31, 2009, 12:44pm

The three operators:

m// match (the m is optional)
tr/// translate
s/// substitute

“Meta characters” and their meaning

\ escapes the following character in a regular expression.
^ match at the beginning of the string (line if /m)
$ match at the end of the string (line if /m)
| logical or
. match any single character except newline (including newline if /s)

Quantifiers

? Match 0 or 1 times.
* Match 0 or more times.
+ Match 1 or more times.

Modifiers

c Do not reset search position on a failed match when /g is in effect.
g Match globally, i.e., find all occurrences.
i Do case-insensitive pattern matching.
m Treat string as multiple lines (^ and $ match at line boundaries).
o Compile pattern only once.
s Treat string as single line (. also matches newline).
x Use extended regular expressions (whitespace and comments allowed in the pattern).

Basic examples of matching

if ($text =~ /string/) {
    # The variable $text contains the text 'string'.
}

This would execute if the variable $text contained

  'This is a string of text'
  'The phaser left a blastring'

but not

  'String is an important part...'

For this last example to match we add an i (ignore case).

if ($text =~ /string/i) {
    # The variable $text contains 'string' in any mix of upper and lower case.
}

You can also test whether a pattern doesn’t match a string with

if ($text !~ /string/) {
    # Code for the non-matching case goes here.
}
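
To check the three cases above in one go, here is a small sketch that runs the sample strings through each test. The array names are made up for illustration, and the numbers printed are indexes into @samples.

```perl
use strict;
use warnings;

my @samples = (
    'This is a string of text',
    'The phaser left a blastring',
    'String is an important part...',
);

# Collect the indexes of the samples that pass each test.
my @case_sensitive   = grep { $samples[$_] =~ /string/  } 0 .. $#samples;
my @case_insensitive = grep { $samples[$_] =~ /string/i } 0 .. $#samples;
my @no_match         = grep { $samples[$_] !~ /string/  } 0 .. $#samples;

print "=~ /string/  matches samples: @case_sensitive\n";
print "=~ /string/i matches samples: @case_insensitive\n";
print "!~ /string/  matches samples: @no_match\n";
```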