Why/how is an additional variable needed in matching repeated arbitrary character with capture groups?

































I'm matching a sequence of a repeating arbitrary character, with a minimum length, using a perl6 regex.



After reading through https://docs.perl6.org/language/regexes#Capture_numbers and tweaking the example given, I've come up with this code using an 'external variable':



#uses an additional variable $c
perl6 -e '$_="bbaaaaawer"; /((.) {} :my $c=$0; ($c)**2..*)/ && print $0';

#Output: aaaaa


To aid in illustrating my question only, a similar regex in perl5:



#No additional variable needed
perl -e ' $_="bbaaaaawer"; /((.)\2{2,})/ && print $1';


Could someone enlighten me on the need/benefit of 'saving' $0 into $c and the requirement of the empty {}? Is there an alternative (better/golfed) perl6 regex that will match?



Thanks in advance.







regex perl6
















asked May 31 at 11:03









drclaw








  • Shortest answer /((.)$0**2..*)/

    – Brad Gilbert
    May 31 at 14:26












  • Note that the whole point of the example was to create a short example which showed code that needed a lexical variable, but it did not explain why it needed it.

    – Brad Gilbert
    May 31 at 14:36











  • Thank you all for the great answers! It's a shame I can only 'accept' one! Reading the detail in all of them (multiple times!) I have a clearer understanding and realise I really need to look into perl6 Grammars. The link provided by @raiph was a particularly interesting read.

    – drclaw
    Jun 1 at 3:56


























3 Answers


































































    The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.



    So the $0 inside of ($0) can never refer to anything, because you didn't set $0 inside of the ().



    (The named captures $<foo> are also affected by this.)




    The following has 3 separate $0 “variables”, and one $1 “variable”:



    'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /

    'aabbaabb' ~~ /
      ^

      # $0 = 'aabb'
      (

        # $0 = 'a'
        (.) $0

        # $1 = 'bb'
        (

          # $0 = 'b'
          (.) $0
        )
      )

      $0

      $
    /


    「aabbaabb」
     0 => 「aabb」
      0 => 「a」
      1 => 「bb」
       0 => 「b」
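One way to see the three separate $0 levels is to index the resulting Match from outside the regex, where each extra subscript steps one level down the tree shown above. A minimal sketch, assuming a Rakudo that behaves as the outputs on this page show; $m is just an illustrative name:

```raku
# Match the same pattern and keep the resulting Match object.
my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

say ~$m[0];        # aabb   (the top-level $0)
say ~$m[0][0];     # a      ($0 inside the first group)
say ~$m[0][1];     # bb     ($1 inside the first group)
say ~$m[0][1][0];  # b      ($0 inside the innermost group)
```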


    Basically the () in the regex DSL act a bit like {} in normal Perl6: each pair opens a new scope with its own $/.



    A fairly direct if simplified translation of the above regex to “regular” Perl6 code follows.

    (Pay attention to the 3 lines with my $/ = [];)

    (Also the / ^ / style comments refer to the regex code for ^ and such above)



    given 'aabbaabb' {
      my $/ = [];      # give assignable storage for $0, $1 etc.
      my $pos = 0;     # position counter
      my $init = $pos; # initial position

      # / ^ /
      fail unless $pos == 0;

      # / ( /
      $0 = do {
        my $/ = [];
        my $init = $pos;

        # / (.) $0 /
        $0 = .substr($pos,1); # / (.) /
        $pos += $0.chars;
        fail unless .substr($pos,$0.chars) eq $0; # / $0 /
        $pos += $0.chars;

        # / ( /
        $1 = do {
          my $/ = [];
          my $init = $pos;

          # / (.) $0 /
          $0 = .substr($pos,1); # / (.) /
          $pos += $0.chars;
          fail unless .substr($pos,$0.chars) eq $0; # / $0 /
          $pos += $0.chars;

          # / ) /
          # the returned value (becomes $1 in outer scope)
          .substr($init, $pos - $init)
        }

        # / ) /
        # the returned value (becomes $0 in outer scope)
        .substr($init, $pos - $init)
      }

      # / $0 /
      fail unless .substr($pos,$0.chars) eq $0;
      $pos += $0.chars;

      # / $ /
      fail unless $pos == .chars;

      # the returned value
      .substr($init, $pos - $init)
    }




    TLDR;



    Just remove the () surrounding ($c) / ($0).

    (Assuming you didn't need the capture for something else.)



    /((.) $0**2..*)/


    perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';
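If you need to group the repeated part but not capture it, square brackets are the non-capturing group, and they do not start a new Match level, so a $0 inside them still means the (.) beside it. The brackets add nothing in this particular pattern; this sketch (variable names illustrative) only shows that grouping need not change what $0 refers to:

```raku
my $s = "bbaaaaawer";

# [ ] groups without capturing, so this $0 still refers to the (.) beside it.
my $m = $s ~~ / ((.) [ $0 ]**2..*) /;

say ~$m[0];      # aaaaa
say ~$m[0][0];   # a
```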




























    • Note that currently /((.) {} ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of {}.) At the very least it is surprising/confusing behavior.

      – Brad Gilbert
      May 31 at 14:42


























        Omitting the capture around the $0 works:



        $_="bbaaaaawer"; / (.) $0**2..* / && print $/; # aaaaa


        (And then you can also omit the {}.)



        But perhaps you wrote the capture around the $0 for a good reason.



        Perhaps you want a way to be able to count the number of repeats. If so, you could instead write:



        $_="bbaaaaawer";
        / (.) $0**2..* /;
        print $/.chars div $0.chars; # 5


        Job done, and without unnecessary complications.



        But maybe you really need to have a pattern that must be captured and must include a copy of whatever was matched by an earlier pattern. In that case I think you have the golf'dest solution and all that's left is to explain why the various things you mention are necessary.



        First, you have to take into account that matches are nested. Once you type parens, you've inserted a new level in a tree. Second, you have to take into account that the automatically generated match names like $0 refer to matches within the current level of the tree. See jnthn's answer for why and Brad's for further discussion.



        What I'll add to those is an analogy to filesystem paths. If one compares a parse tree with a filesystem, P6 makes it easy to refer to the current directory and its sub-directories but does not include the equivalent of specifying the root directory or the .. parent directory and its parents.



        In the following, it's easy to display 'bc' or 'c' with a code block inserted at the second level of parens:



        $_="abc";
        print m/ ( ( . ) ( . ( . ) { print $/, $0 } ) ) /; # bccabc (bcc from the code block, then the whole match abc)


        The $/ in the code block refers to "the current match object". Directly inside the second level of parens the current match object corresponds to the second level of parens, which captures 'bc'.



        The $0 at the same level refers to the first inner parens of the second level, i.e. the third level, which captures 'c'.



        But there's no built-in way to refer to the captured 'a', or the full 'abc' capture, from that code block.



        Hence you have to do something like what you've done.
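If the copy really must be captured, the question's own pattern does the job. This sketch reuses it and shows what keeping the capture buys: the repeats end up as a list of Match objects inside the outer capture. Variable names are illustrative, and the expected values assume the behaviour reported in the question:

```raku
my $s = "bbaaaaawer";

# {} publishes the match so far, so $0 here is the (.) just matched;
# :my copies it into a lexical that is still visible inside ($c).
my $m = $s ~~ / ((.) {} :my $c = $0; ($c)**2..*) /;

say ~$m[0];          # aaaaa
say ~$m[0][0];       # a
say $m[0][1].elems;  # 4   (one Match per repeat of ($c))
```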




        {} forces "publication" of match results thus far



        The {} is necessary to force the :my $c=$0; to update after the first 'a' is matched. Otherwise it would be stuck on the 'b'.



        Please read "Publication" of match variables by Rakudo.















        edited May 31 at 21:07

























        answered May 31 at 14:49









        raiph





































            Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture - named or positional - is either a Match object or, if quantified, an array of Match objects.



            This is in general good, but does involve making the trade-off you have observed: once you are on the inside of a nested capturing element, then you are populating a new Match object, with its own set of positional and named captures. For example, if we do:



            say "abab" ~~ /((a)(b))+/


            Then the result is:



            「abab」
            0 => 「ab」
            0 => 「a」
            1 => 「b」
            0 => 「ab」
            0 => 「a」
            1 => 「b」


            And we can then index:



            say $0; # The array of the top-level capture, which was quantified
            say $0[1]; # The second Match
            say $0[1][0]; # The first Match within that Match object (the (a))


            It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered May 31 at 13:26









            Jonathan Worthington








































            The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.



            So a $0 written inside a capturing group, as in ($0), never refers to the outer $0: inside those parentheses, $0 means the group's own first capture, and nothing has been captured there.



            (The named captures $<foo> are also affected by this.)
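Here is a quick sketch of the named-capture case. The name x is purely illustrative. Because $<x> is declared inside the outer (), it is recorded on the Match stored in $0, not on the top-level Match.

```perl6
# x is declared inside the unnamed outer group, so it belongs to $0
'ab' ~~ / ( $<x>=(a) b ) /;

say $/<x>.defined;   # False: there is no x at the top level
say ~$0<x>;          # a: x lives one level down, inside $0
```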




            The following has 3 separate $0 “variables”, and one $1 “variable”:



            'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /

            'aabbaabb' ~~ /
              ^

              # $0 = 'aabb'
              (

                # $0 = 'a'
                (.) $0

                # $1 = 'bb'
                (

                  # $0 = 'b'
                  (.) $0
                )
              )

              $0

              $
            /


            「aabbaabb」
             0 => 「aabb」
              0 => 「a」
              1 => 「bb」
               0 => 「b」
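The same tree can also be walked from code. This is a short sketch that assumes the Match is kept in a variable instead of only being printed; the name $m is just for illustration. Each level of indexing steps into the next set of numbered captures.

```perl6
my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

say ~$m[0];         # aabb: the outer group (the top-level $0)
say ~$m[0][0];      # a: the $0 seen inside that group
say ~$m[0][1];      # bb: the $1 inside that group
say ~$m[0][1][0];   # b: the $0 inside the innermost group
```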


            Basically, the () in the regex DSL act a bit like a {} block in normal Perl6: each capturing group gets its own $/, and therefore its own $0, $1, and so on.



            A fairly direct, if simplified, translation of the above regex into “regular” Perl6 code follows.

            (Pay attention to the 3 lines with my $/ = [];.)

            (The / ^ /-style comments show which part of the regex above each line corresponds to.)



            given 'aabbaabb' {
              my $/ = [];       # give assignable storage for $0, $1 etc.
              my $pos = 0;      # position counter
              my $init = $pos;  # initial position

              # / ^ /
              fail unless $pos == 0;

              # / ( /
              $0 = do {
                my $/ = [];
                my $init = $pos;

                # / (.) $0 /
                $0 = .substr($pos,1);                      # / (.) /
                $pos += $0.chars;
                fail unless .substr($pos,$0.chars) eq $0;  # / $0 /
                $pos += $0.chars;

                # / ( /
                $1 = do {
                  my $/ = [];
                  my $init = $pos;

                  # / (.) $0 /
                  $0 = .substr($pos,1);                      # / (.) /
                  $pos += $0.chars;
                  fail unless .substr($pos,$0.chars) eq $0;  # / $0 /
                  $pos += $0.chars;

                  # / ) /
                  # the returned value (becomes $1 in outer scope)
                  .substr($init, $pos - $init)
                }

                # / ) /
                # the returned value (becomes $0 in outer scope)
                .substr($init, $pos - $init)
              }

              # / $0 /
              fail unless .substr($pos,$0.chars) eq $0;
              $pos += $0.chars;

              # / $ /  (a comparison, not an assignment)
              fail unless $pos == .chars;

              # the returned value
              .substr($init, $pos - $init)
            }




            TL;DR



            Just remove the () surrounding ($c) / ($0).

            (Assuming you didn't need that capture for something else.)



            /((.) $0**2..*)/


            perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';
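Here is a sketch of the TL;DR pattern in use, assuming a current Rakudo. The second part adds the :g adverb, which is not part of the original answer, to collect every run of three or more identical characters. Note that .match also sets the caller's $/, so $0 is read before it runs.

```perl6
# First run of at least 3 identical characters
'bbaaaaawer' ~~ /((.) $0**2..*)/;
say ~$0;    # aaaaa

# Every such run, using :g (each element is a whole-match Match)
my @runs = 'aaabbcccc'.match(/((.) $0**2..*)/, :g).map(~*);
say @runs;  # [aaa cccc]
```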




























            • Note that currently /((.) ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of .) At the very least it is surprising/confusing behavior.

              – Brad Gilbert
              May 31 at 14:42

























            answered May 31 at 14:21









            Brad Gilbert





























