Why/how is an additional variable needed in matching repeated arbitrary character with capture groups?









I'm matching a sequence of a repeating arbitrary character, with a minimum length, using a perl6 regex.



After reading through https://docs.perl6.org/language/regexes#Capture_numbers and tweaking the example given, I've come up with this code using an 'external variable':



#uses an additional variable $c
perl6 -e '$_="bbaaaaawer"; /((.) {} :my $c=$0; ($c)**2..*)/ && print $0';

#Output: aaaaa


To aid in illustrating my question only, a similar regex in perl5:



#No additional variable needed
perl -e ' $_="bbaaaaawer"; /((.)\2{2,})/ && print $1';


Could someone enlighten me on the need/benefit of 'saving' $0 into $c and the requirement of the empty {}? Is there an alternative (better/golfed) perl6 regex that will match?



Thanks in advance.







regex perl6
















asked May 31 at 11:03









drclaw







  • Shortest answer /((.)$0**2..*)/

    – Brad Gilbert
    May 31 at 14:26












  • Note that the whole point of the example was to create a short example which showed code that needed a lexical variable, but it did not explain why it needed it.

    – Brad Gilbert
    May 31 at 14:36











  • Thank you all for the great answers! It's a shame I can only 'accept' one! Reading the detail in all of them (multiple times!) I have a clearer understanding and realise I really need to look into perl6 Grammars. The link provided by @raiph was a particularly interesting read.

    – drclaw
    Jun 1 at 3:56













3 Answers
    The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.



    So the $0 inside of ($0) can never refer to anything, because you didn't set $0 inside of the ().



    (The named captures $<foo> are also affected by this.)




    The following has 3 separate $0 “variables”, and one $1 “variable”:



    'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /

    'aabbaabb' ~~ /
      ^

      # $0 = 'aabb'
      (

        # $0 = 'a'
        (.) $0

        # $1 = 'bb'
        (

          # $0 = 'b'
          (.) $0
        )
      )

      $0

      $
    /


    「aabbaabb」
     0 => 「aabb」
      0 => 「a」
      1 => 「bb」
       0 => 「b」
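
The nesting shown in that match tree can be checked by indexing into the result. This is only a small sketch: the variable name `$m` is an illustrative choice, not something from the answer.

```raku
# Each pair of capturing parens starts a fresh Match with its own $0, $1, ...
my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

say ~$m[0];        # aabb  (the top-level $0)
say ~$m[0][0];     # a     ($0 inside the outer group)
say ~$m[0][1];     # bb    ($1 inside the outer group)
say ~$m[0][1][0];  # b     ($0 inside the innermost group)
```

Reading the indices left to right is the same as walking down the tree one level of parens at a time.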


    Basically the () in the regex DSL act a bit like {} in normal Perl6.



    A fairly direct if simplified translation of the above regex to “regular” Perl6 code follows.

    (Pay attention to the 3 lines with my $/ = [];)

    (Also the / ^ / style comments refer to the regex code for ^ and such above)



    given 'aabbaabb' {
      my $/ = []; # give assignable storage for $0, $1 etc.
      my $pos = 0; # position counter
      my $init = $pos; # initial position

      # / ^ /
      fail unless $pos == 0;

      # / ( /
      $0 = do {
        my $/ = [];
        my $init = $pos;

        # / (.) $0 /
        $0 = .substr($pos,1); # / (.) /
        $pos += $0.chars;
        fail unless .substr($pos,$0.chars) eq $0; # / $0 /
        $pos += $0.chars;

        # / ( /
        $1 = do {
          my $/ = [];
          my $init = $pos;

          # / (.) $0 /
          $0 = .substr($pos,1); # / (.) /
          $pos += $0.chars;
          fail unless .substr($pos,$0.chars) eq $0; # / $0 /
          $pos += $0.chars;

          # / ) /
          # the returned value (becomes $1 in outer scope)
          .substr($init, $pos - $init)
        }

        # / ) /
        # the returned value (becomes $0 in outer scope)
        .substr($init, $pos - $init)
      }

      # / $0 /
      fail unless .substr($pos,$0.chars) eq $0;
      $pos += $0.chars;

      # / $ /
      fail unless $pos == .chars;

      # the returned value
      .substr($init, $pos - $init)
    }




    TLDR;



    Just remove the () surrounding ($c) / ($0).

    (Assuming you didn't need the capture for something else.)



    /((.) $0**2..*)/


    perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';




























    • Note that currently /((.) {} ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of {}.) At the very least it is surprising/confusing behavior.

      – Brad Gilbert
      May 31 at 14:42











        Omitting the capture around the $0 works:



        $_="bbaaaaawer"; / (.) $0**2..* / && print $/; # aaaaa


        (And then you can also omit the {}.)



        But perhaps you wrote the capture around the $0 for a good reason.



        Perhaps you want a way to be able to count the number of repeats. If so, you could instead write:



        $_="bbaaaaawer";
        / (.) $0**2..* /;
        print $/.chars div $0.chars; # 5


        Job done, and without unnecessary complications.



        But maybe you really need to have a pattern that must be captured and must include a copy of whatever was matched by an earlier pattern. In that case I think you have the golf'dest solution and all that's left is to explain why the various things you mention are necessary.



        First, you have to take into account that matches are nested. Once you type parens, you've inserted a new level in a tree. Second, you have to take into account that the automatically generated match names like $0 refer to matches within the current level of the tree. See jnthn's answer for why and Brad's for further discussion.



        What I'll add to those is an analogy to filesystem paths. If one compares a parse tree with a filesystem, P6 makes it easy to refer to the current directory and its sub-directories but does not include the equivalent of specifying the root directory or the .. parent directory and its parents.



        In the following, it's easy to display 'bc' or 'c' with a code block inserted at the second level of parens:



        $_="abc";
        print m/ ( ( . ) ( . ( . ) { print $/, $0 } ) ) /; # the code block prints bcc


        The $/ in the code block refers to "the current match object". Directly inside the second level of parens the current match object corresponds to the second level of parens, which captures 'bc'.

        The $0 at the same level refers to the first inner parens of the second level, i.e. the third level, which captures 'c'.

        But there's no built-in way to refer to the captured 'a', or the full 'abc' capture, from that code block.



        Hence you have to do something like what you've done.




        {} forces "publication" of match results thus far

        The {} is necessary to force the :my $c=$0; to update after the first 'a' is matched. Otherwise it would be stuck on the 'b'.



        Please read "Publication" of match variables by Rakudo.
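
To make the role of the empty block concrete, here is a minimal sketch built on the regex from the question. It assumes the behaviour described above: the empty `{}` publishes the match so far, so `$c` is set from the character that was just captured, and `($c)` then captures each further copy.

```raku
$_ = "bbaaaaawer";

# Outer group: (.) captures one character; the empty block {} makes that
# capture visible as $0 to the declaration that follows, so $c holds it.
# ($c) ** 2..* then requires two or more further copies, each one captured.
if / ( (.) {} :my $c = $0; ($c) ** 2..* ) / {
    say ~$0;          # aaaaa  (the whole run)
    say ~$0[0];       # a      (the character that repeats)
    say $0[1].elems;  # 4      (further copies captured by ($c))
}
```

Because `($c)` is quantified, its captures arrive as a list, which is what makes the repeat count available without dividing lengths.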















        edited May 31 at 21:07

























        answered May 31 at 14:49









        raiph























            Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture - named or positional - is either a Match object or, if quantified, an array of Match objects.



            This is in general good, but does involve making the trade-off you have observed: once you are on the inside of a nested capturing element, then you are populating a new Match object, with its own set of positional and named captures. For example, if we do:



            say "abab" ~~ /((a)(b))+/


            Then the result is:



            「abab」
            0 => 「ab」
            0 => 「a」
            1 => 「b」
            0 => 「ab」
            0 => 「a」
            1 => 「b」


            And we can then index:



            say $0; # The array of the top-level capture, which was quantified
            say $0[1]; # The second Match
            say $0[1][0]; # The first Match within that Match object (the (a))


            It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered May 31 at 13:26









            Jonathan Worthington

            • Does using the ':g' modifier also potentially generate an array of Match objects, as a quantifier does?

              – drclaw
              Jun 1 at 3:15

            • Yes; actually a List since it makes no sense to mutate it.

              – Jonathan Worthington
              Jun 1 at 11:19
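A minimal sketch of the point made in these comments. It uses the method form `.match(..., :g)` instead of `m:g//`, on the assumption that both yield the same kind of result; the variable name `$all` is illustrative:

```perl6
# One Match object per successful match, collected in a list.
my $all = "abab".match(/ (a)(b) /, :g);

say $all.elems;    # 2
say ~$all[1];      # ab
say ~$all[1][0];   # a
```

Each element is a complete Match with its own `$0` and `$1`, so the indexing looks the same as for a quantified capture.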

















            The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.



            So the $0 inside of ($0) can never refer to anything, because you didn't set $0 inside of the ().



            (The named captures $<foo> are also affected by this.)




            The following has 3 separate $0 “variables”, and one $1 “variable”:



            'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /

            'aabbaabb' ~~ /
              ^

              # $0 = 'aabb'
              (

                # $0 = 'a'
                (.) $0

                # $1 = 'bb'
                (

                  # $0 = 'b'
                  (.) $0
                )
              )

              $0

              $
            /


            「aabbaabb」
             0 => 「aabb」
              0 => 「a」
              1 => 「bb」
               0 => 「b」
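To make the separate `$0` "variables" concrete, here is a short sketch that reads each of them back from the resulting Match tree (the name `$m` is illustrative). Every extra index steps into the capture set of one more `()`:

```perl6
my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

say ~$m[0];         # aabb  (the outermost $0)
say ~$m[0][0];      # a     ($0 inside the first group)
say ~$m[0][1];      # bb    ($1 inside the first group)
say ~$m[0][1][0];   # b     ($0 inside the innermost group)
```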


            Basically, the () in the regex DSL act a bit like {} blocks in normal Perl 6: each one opens a new scope with its own $/.



            A fairly direct if simplified translation of the above regex to “regular” Perl6 code follows.

            (Pay attention to the 3 lines with my $/ = [];)

            (Also the / ^ / style comments refer to the regex code for ^ and such above)



            given 'aabbaabb' {
              my $/ = []; # give assignable storage for $0, $1 etc.
              my $pos = 0; # position counter
              my $init = $pos; # initial position

              # / ^ /
              fail unless $pos == 0;

              # / ( /
              $0 = do {
                my $/ = []; # a fresh set of captures for this group
                my $init = $pos;

                # / (.) $0 /
                $0 = .substr($pos,1); # / (.) /
                $pos += $0.chars;
                fail unless .substr($pos,$0.chars) eq $0; # / $0 /
                $pos += $0.chars;

                # / ( /
                $1 = do {
                  my $/ = []; # and another fresh set for the nested group
                  my $init = $pos;

                  # / (.) $0 /
                  $0 = .substr($pos,1); # / (.) /
                  $pos += $0.chars;
                  fail unless .substr($pos,$0.chars) eq $0; # / $0 /
                  $pos += $0.chars;

                  # / ) /
                  # the returned value (becomes $1 in outer scope)
                  .substr($init, $pos - $init)
                }

                # / ) /
                # the returned value (becomes $0 in outer scope)
                .substr($init, $pos - $init)
              }

              # / $0 /
              fail unless .substr($pos,$0.chars) eq $0;
              $pos += $0.chars;

              # / $ /
              fail unless $pos == .chars; # comparison, not assignment

              # the returned value
              .substr($init, $pos - $init)
            }




            TL;DR



            Just remove the () surrounding ($c) / ($0).

            (Assuming you didn't need the capture for something else.)



            /((.) $0**2..*)/


            perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';
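A slightly longer sketch of the same fix, run over a few made-up inputs. The hash name `%run` and the sample strings other than `bbaaaaawer` are assumptions for illustration. Inside the outer group, `$0` is the single character captured by `(.)`, so `$0 ** 2..*` asks for at least two more copies of it, giving a run of three or more:

```perl6
my %run;
for <bbaaaaawer abcabc xxyyyzz> -> $text {
    # Store the matched run, or an empty string when there is none.
    %run{$text} = $text ~~ / ( (.) $0 ** 2..* ) / ?? ~$0 !! '';
}

say %run<bbaaaaawer>;     # aaaaa
say %run<xxyyyzz>;        # yyy
say %run<abcabc>.chars;   # 0
```

After the match, the outer `$0` holds the whole run and `$0[0]` holds the repeated character, so no external variable is needed.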





            answered May 31 at 14:21









            Brad Gilbert

            • Note that currently /((.) ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of .) At the very least it is surprising/confusing behavior.

              – Brad Gilbert
              May 31 at 14:42
















