Why/how is an additional variable needed in matching repeated arbitrary character with capture groups?
I'm matching a sequence of a repeating arbitrary character, with a minimum length, using a perl6 regex.
After reading through https://docs.perl6.org/language/regexes#Capture_numbers and tweaking the example given, I've come up with this code using an 'external variable':
#uses an additional variable $c
perl6 -e '$_="bbaaaaawer"; /((.) {} :my $c=$0; ($c)**2..*)/ && print $0';
#Output: aaaaa
To aid in illustrating my question only, a similar regex in perl5:
#No additional variable needed
perl -e ' $_="bbaaaaawer"; /((.)\2{2,})/ && print $1';
Could someone enlighten me on the need/benefit of 'saving' $0 into $c and the requirement of the empty {}? Is there an alternative (better/golfed) perl6 regex that will match?
Thanks in advance.
regex perl6
asked May 31 at 11:03
drclaw
Shortest answer: /((.)$0**2..*)/
– Brad Gilbert May 31 at 14:26
Note that the whole point of the example was to create a short example which showed code that needed a lexical variable, but it did not explain why it needed it.
– Brad Gilbert May 31 at 14:36
Thank you all for the great answers! It's a shame I can only 'accept' one! Reading the detail in all of them (multiple times!) I have a clearer understanding and realise I really need to look into perl6 Grammars. The link provided by @raiph was a particularly interesting read.
– drclaw Jun 1 at 3:56
3 Answers
Omitting the capture around the $0 works:
$_="bbaaaaawer"; / (.) $0**2..* / && print $/; # aaaaa
(And then you can also omit the {}.)
But perhaps you wrote the capture around the $0 for a good reason.
Perhaps you want a way to be able to count the number of repeats. If so, you could instead write:
$_="bbaaaaawer";
/ (.) $0**2..* /;
print $/.chars div $0.chars; # 5
Job done, and without unnecessary complications.
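As a small sketch of the counting approach above (the names $input, $char and $count are only illustrative, not from the original), the same match can also report which character repeated and where the run starts:

```raku
my $input = 'bbaaaaawer';

# (.) captures one character; $0**2..* then requires at least two more copies of it.
if $input ~~ / (.) $0**2..* / {
    my $char  = ~$0;                     # the repeated character
    my $count = $/.chars div $0.chars;   # length of the whole run
    say "$char repeated $count times, starting at offset { $/.from }";
}
```

Because the back-reference sits at the same nesting level as the capture it refers to, no extra variable is needed here.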
But maybe you really need to have a pattern that must be captured and must include a copy of whatever was matched by an earlier pattern. In that case I think you have the golf'dest solution and all that's left is to explain why the various things you mention are necessary.
First, you have to take into account that matches are nested. Once you type parens, you've inserted a new level in a tree. Second, you have to take into account that the automatically generated match names like $0 refer to matches within the current level of the tree. See jnthn's answer for why and Brad's for further discussion.
What I'll add to those is an analogy to filesystem paths. If one compares a parse tree with a filesystem, P6 makes it easy to refer to the current directory and its sub-directories but does not include the equivalent of specifying the root directory or the .. parent directory and its parents.
In the following, it's easy to display 'bc' or 'c' with a code block inserted at the second level of parens:
$_="abc";
print m/ ( ( . ) ( . ( . ) { print $/, $0 } ) ) /; # the code block prints: bcc
The $/ in the code block refers to "the current match object". Directly inside the second level of parens the current match object corresponds to the second level of parens, which capture 'bc'.
The $0 at the same level refers to the first inner parens of the second level, i.e. the third level, which captures 'c'.
But there's no built-in way to refer to the captured 'a', or the full 'abc' capture, from that code block.
Hence you have to do something like what you've done.
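A minimal sketch of that workaround on a different input, assuming an illustrative lexical $pair and a made-up test string (neither is from the original): the outer capture is copied into a variable, which the nested capture can then see even though it has its own numbered captures.

```raku
my $input = 'xx=ab:ab=yy';

# (\w\w)            capture at the top level, available there as $0
# {}                empty code block so that $0 is up to date for the next step
# :my $pair = ~$0;  copy the capture into a lexical variable
# ( $pair )         nested capture; inside it $0 would mean its own sub-captures,
#                   but the lexical $pair is still visible
if $input ~~ / (\w\w) {} :my $pair = ~$0; ':' ( $pair ) / {
    say ~$/;   # the whole match
    say ~$0;   # first top-level capture
    say ~$1;   # second top-level capture, matched via $pair
}
```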
{} forces "publication" of match results thus far
The {} is necessary to force the :my $c=$0; to update after the first 'a' is matched. Otherwise it would be stuck on the 'b'.
Please read "Publication" of match variables by Rakudo.
edited May 31 at 21:07
answered May 31 at 14:49
raiph
Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture - named or positional - is either a Match object or, if quantified, an array of Match objects.
This is in general good, but does involve making the trade-off you have observed: once you are on the inside of a nested capturing element, then you are populating a new Match object, with its own set of positional and named captures. For example, if we do:
say "abab" ~~ /((a)(b))+/
Then the result is:
「abab」
 0 => 「ab」
  0 => 「a」
  1 => 「b」
 0 => 「ab」
  0 => 「a」
  1 => 「b」
And we can then index:
say $0; # The array of the top-level capture, which was quantified
say $0[1]; # The second Match
say $0[1][0]; # The first Match within that Match object (the (a))
It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.
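To make the tree shape concrete, here is a small sketch (the names $m and $pair are illustrative) that walks the quantified capture from the example above rather than indexing it by hand:

```raku
my $m = 'abab' ~~ / ( (a) (b) )+ /;

# $m[0] holds one Match per repetition of the quantified group.
say $m[0].elems;

for $m[0].list -> $pair {
    # Each repetition is a Match with its own positional captures 0 and 1.
    say "at { $pair.from }: { ~$pair[0] } then { ~$pair[1] }";
}
```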
answered May 31 at 13:26
Jonathan Worthington
Does using a ':g' modifier potentially generate an array of Match objects similar to a quantifier also?
– drclaw Jun 1 at 3:15
Yes; actually a List since it makes no sense to mutate it.
– Jonathan Worthington Jun 1 at 11:19
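A quick sketch of that last point, assuming a plain array @all to hold the result (the name is illustrative): with :g every match is returned, and each element is an ordinary Match with its own captures.

```raku
given 'abab' {
    my @all = m:g/ (a) (b) /;

    say @all.elems;                      # number of matches found
    say @all.map({ .from }).join(',');   # where each match starts
    say ~@all[1][0];                     # first capture of the second match
}
```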
The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.
So the $0 inside of ($0) can never refer to anything, because you didn't set $0 inside of the ().
(The named captures $<foo> are also affected by this.)
The following has 3 separate $0 “variables”, and one $1 “variable”:
'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /
'aabbaabb' ~~ /
  ^
  # $0 = 'aabb'
  (
    # $0 = 'a'
    (.) $0
    # $1 = 'bb'
    (
      # $0 = 'b'
      (.) $0
    )
  )
  $0
  $
/
「aabbaabb」
 0 => 「aabb」
  0 => 「a」
  1 => 「bb」
   0 => 「b」
Basically the () in the regex DSL act a bit like {} in normal Perl6.
A fairly direct if simplified translation of the above regex to “regular” Perl6 code follows.
(Pay attention to the 3 lines with my $/ = [];)
(Also the / ^ / style comments refer to the regex code for ^ and such above)
given 'aabbaabb' {
  my $/ = []; # give assignable storage for $0, $1 etc.
  my $pos = 0; # position counter
  my $init = $pos; # initial position
  # / ^ /
  fail unless $pos == 0;
  # / ( /
  $0 = do {
    my $/ = [];
    my $init = $pos;
    # / (.) $0 /
    $0 = .substr($pos,1); # / (.) /
    $pos += $0.chars;
    fail unless .substr($pos,$0.chars) eq $0; # / $0 /
    $pos += $0.chars;
    # / ( /
    $1 = do {
      my $/ = [];
      my $init = $pos;
      # / (.) $0 /
      $0 = .substr($pos,1); # / (.) /
      $pos += $0.chars;
      fail unless .substr($pos,$0.chars) eq $0; # / $0 /
      $pos += $0.chars;
      # / ) /
      # the returned value (becomes $1 in outer scope)
      .substr($init, $pos - $init)
    }
    # / ) /
    # the returned value (becomes $0 in outer scope)
    .substr($init, $pos - $init)
  }
  # / $0 /
  fail unless .substr($pos,$0.chars) eq $0;
  $pos += $0.chars;
  # / $ / (a comparison, not an assignment)
  fail unless $pos == .chars;
  # the returned value
  .substr($init, $pos - $init)
}
TLDR;
Just remove the () surrounding ($c) / ($0).
(Assuming you didn't need the capture for something else.)
/((.) $0**2..*)/
perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';
Note that currently /((.) ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of.) At the very least it is surprising/confusing behavior.
– Brad Gilbert May 31 at 14:42
answered May 31 at 14:21
Brad Gilbert
Note that currently /((.) ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of.) At the very least it is surprising/confusing behavior.
– Brad Gilbert
May 31 at 14:42
Shortest answer: /((.)$0**2..*)/
– Brad Gilbert
May 31 at 14:26
Note that the whole point of the example was to create a short example which showed code that needed a lexical variable, but it did not explain why it needed it.
– Brad Gilbert
May 31 at 14:36
Thank you all for the great answers! It's a shame I can only 'accept' one! Reading the detail in all of them (multiple times!) I have a clearer understanding and realise I really need to look into perl6 Grammars. The link provided by @raiph was a particularly interesting read.
– drclaw
Jun 1 at 3:56