Why/how is an additional variable needed in matching repeated arbitrary character with capture groups?

I'm matching a sequence of a repeating arbitrary character, with a minimum length, using a Perl 6 regex.

After reading through https://docs.perl6.org/language/regexes#Capture_numbers and tweaking the example given, I've come up with this code using an 'external variable':

# uses an additional variable $c
perl6 -e '$_="bbaaaaawer"; /((.) {} :my $c=$0; ($c)**2..*)/ && print $0';

# Output: aaaaa

To aid in illustrating my question only, a similar regex in Perl 5:

# No additional variable needed
perl -e '$_="bbaaaaawer"; /((.)\2{2,})/ && print $1';

Could someone enlighten me on the need/benefit of 'saving' $0 into $c and the requirement of the empty {}? Is there an alternative (better/golfed) Perl 6 regex that will match?

Thanks in advance.







regex perl6







asked May 31 at 11:03
drclaw

  • Shortest answer: /((.)$0**2..*)/

    – Brad Gilbert
    May 31 at 14:26

  • Note that the whole point of the example was to create a short example which showed code that needed a lexical variable, but it did not explain why it needed it.

    – Brad Gilbert
    May 31 at 14:36

  • Thank you all for the great answers! It's a shame I can only 'accept' one! Reading the detail in all of them (multiple times!) I have a clearer understanding and realise I really need to look into Perl 6 grammars. The link provided by @raiph was a particularly interesting read.

    – drclaw
    Jun 1 at 3:56


3 Answers

The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.

So the $0 inside of ($0) can never refer to anything, because you didn't set $0 inside of the ().

(The named captures $<foo> are also affected by this.)

The following has 3 separate $0 “variables”, and one $1 “variable”:

'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /

'aabbaabb' ~~ /
  ^

  # $0 = 'aabb'
  (

    # $0 = 'a'
    (.) $0

    # $1 = 'bb'
    (

      # $0 = 'b'
      (.) $0
    )
  )

  $0

  $
/

「aabbaabb」
 0 => 「aabb」
  0 => 「a」
  1 => 「bb」
   0 => 「b」
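To see that these really are separate $0 slots, the match result can be walked level by level. A minimal sketch, assuming the match is kept in a variable named $m (a name chosen here purely for illustration); the indices follow the tree printed above:

```raku
# Same regex as above; $m is an illustrative name for the resulting Match.
my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

say ~$m;           # aabbaabb  (the whole match)
say ~$m[0];        # aabb      (the top-level $0)
say ~$m[0][0];     # a         ($0 inside the outer parens)
say ~$m[0][1];     # bb        ($1 inside the outer parens)
say ~$m[0][1][0];  # b         ($0 inside the innermost capture)
```

Each extra subscript steps one level down the tree, which is why the same name $0 can mean three different things depending on where in the regex it is written.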


Basically the () in the regex DSL act a bit like {} in normal Perl 6.

A fairly direct, if simplified, translation of the above regex to “regular” Perl 6 code follows.
(Pay attention to the 3 lines with my $/ = [];)
(Also, the / ^ / style comments refer to the regex code for ^ and such above.)

given 'aabbaabb' {
  my $/ = []; # give assignable storage for $0, $1 etc.
  my $pos = 0; # position counter
  my $init = $pos; # initial position

  # / ^ /
  fail unless $pos == 0;

  # / ( /
  $0 = do {
    my $/ = [];
    my $init = $pos;

    # / (.) $0 /
    $0 = .substr($pos,1); # / (.) /
    $pos += $0.chars;
    fail unless .substr($pos,$0.chars) eq $0; # / $0 /
    $pos += $0.chars;

    # / ( /
    $1 = do {
      my $/ = [];
      my $init = $pos;

      # / (.) $0 /
      $0 = .substr($pos,1); # / (.) /
      $pos += $0.chars;
      fail unless .substr($pos,$0.chars) eq $0; # / $0 /
      $pos += $0.chars;

      # / ) /
      # the returned value (becomes $1 in outer scope)
      .substr($init, $pos - $init)
    }

    # / ) /
    # the returned value (becomes $0 in outer scope)
    .substr($init, $pos - $init)
  }

  # / $0 /
  fail unless .substr($pos,$0.chars) eq $0;
  $pos += $0.chars;

  # / $ /
  fail unless $pos == .chars;

  # the returned value
  .substr($init, $pos - $init)
}

TL;DR

Just remove the () surrounding ($c) / ($0).
(Assuming you didn't need the capture for something else.)

/((.) $0**2..*)/

perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';





  • Note that currently /((.) {} ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of {}.) At the very least it is surprising/confusing behavior.

    – Brad Gilbert
    May 31 at 14:42


Omitting the capture around the $0 works:

$_="bbaaaaawer"; / (.) $0**2..* / && print $/; # aaaaa

(And then you can also omit the {}.)

But perhaps you wrote the capture around the $0 for a good reason.

Perhaps you want a way to be able to count the number of repeats. If so, you could instead write:

$_="bbaaaaawer";
/ (.) $0**2..* /;
print $/.chars div $0.chars; # 5

Job done, and without unnecessary complications.

But maybe you really need to have a pattern that must be captured and must include a copy of whatever was matched by an earlier pattern. In that case I think you have the golf'dest solution and all that's left is to explain why the various things you mention are necessary.

First, you have to take into account that matches are nested. Once you type parens, you've inserted a new level in a tree. Second, you have to take into account that the automatically generated match names like $0 refer to matches within the current level of the tree. See jnthn's answer for why and Brad's for further discussion.

What I'll add to those is an analogy to filesystem paths. If one compares a parse tree with a filesystem, P6 makes it easy to refer to the current directory and its sub-directories but does not include the equivalent of specifying the root directory or the .. parent directory and its parents.

In the following, it's easy to display 'bc' or 'c' with a code block inserted at the second level of parens:

$_="abc";
print m/ ( ( . ) ( . ( . ) { print $/, $0 } ) ) /; # bcc

The $/ in the code block refers to "the current match object". Directly inside the second level of parens the current match object corresponds to the second level of parens, which capture 'bc'.

The $0 at the same level refers to the first inner parens of the second level, i.e. the third level, which captures 'c'.

But there's no built-in way to refer to the captured 'a', or the full 'abc' capture, from that code block.

Hence you have to do something like what you've done.
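One way to read "something like what you've done" is to copy the outer capture into a lexical before descending a level. A rough sketch under that assumption; @seen and $first are illustrative names, not anything built in:

```raku
my @seen;   # illustrative collector, only here so the result can be inspected

my $m = 'abc' ~~ /
    (                        # level 1: ends up capturing abc
        ( . )                # $0 of level 1: a
        {}                   # publish the captures made so far
        :my $first = ~$0;    # copy a into a lexical the inner levels can see
        (                    # level 2: ends up capturing bc
            .
            ( . )            # $0 of level 2: c
            { @seen.push: ($first, ~$/, ~$0).join('|') }
        )
    )
/;

say @seen;   # [a|bc|c]
```

Inside the code block, $/ and $0 still only know about level 2, while $first carries the 'a' down from the parent level, a bit like saving a parent directory's path before changing into a sub-directory.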




{} forces "publication" of match results thus far

The {} is necessary to force the :my $c=$0; to update after the first 'a' is matched. Otherwise it would be stuck on the 'b'.

Please read "Publication" of match variables by Rakudo.
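Spread over several lines, the pattern from the question might be read as follows. This is only a sketch of the idea described here; $text, $with-var and $plain are hypothetical names, and the second pattern is the capture-free alternative, shown for comparison:

```raku
my $text = 'bbaaaaawer';

# The pattern from the question, one piece per line.
my $with-var = $text ~~ /
    (
        (.)            # the inner $0: one arbitrary character
        {}             # empty block: makes the captures so far visible to code
        :my $c = $0;   # remember that character outside the numbered captures
        ($c) ** 2..*   # a nested capture group gets a fresh $0, so use $c here
    )
/;

# Without the extra capture group the backreference can be used directly.
my $plain = $text ~~ / ( (.) $0 ** 2..* ) /;

say ~$with-var[0];   # aaaaa
say ~$plain[0];      # aaaaa
```

Both forms find the same run; the lexical only becomes necessary once the repeated part has to sit inside its own capturing group.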







edited May 31 at 21:07
answered May 31 at 14:49
raiph


Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture - named or positional - is either a Match object or, if quantified, an array of Match objects.

This is in general good, but does involve making the trade-off you have observed: once you are on the inside of a nested capturing element, then you are populating a new Match object, with its own set of positional and named captures. For example, if we do:

say "abab" ~~ /((a)(b))+/

Then the result is:

「abab」
 0 => 「ab」
  0 => 「a」
  1 => 「b」
 0 => 「ab」
  0 => 「a」
  1 => 「b」

And we can then index:

say $0; # The array of the top-level capture, which was quantified
say $0[1]; # The second Match
say $0[1][0]; # The first Match within that Match object (the (a))

It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered May 31 at 13:26 by Jonathan Worthington












            • Does using the ':g' modifier also potentially generate an array of Match objects, similar to a quantifier?
              – drclaw, Jun 1 at 3:15

            • Yes; actually a List since it makes no sense to mutate it.
              – Jonathan Worthington, Jun 1 at 11:19
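The exchange above can be checked with a small sketch. It is only an illustration: the sample string and the variable names `$quantified` and `$global` are made up here, and the type names printed may depend on the implementation in use.

```raku
# A quantified capture: the capture slot holds one Match per repetition.
my $quantified = "abab" ~~ / (ab)+ /;
say $quantified[0].elems;       # how many repetitions were captured
say $quantified[0].^name;       # container type of the quantified capture

# The :g adverb: the overall result is a sequence of Match objects.
my $global = "abab".match(/ab/, :g);
say $global.elems;              # one element per match found
say $global.^name;              # container type of the :g result
say $global.map(~*).join(",");  # the matched substrings, stringified
```

In both cases each element is an ordinary Match object, so it can be indexed further in the same way as `$0[1][0]` in the answer.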

































            The reason you have to store the capture into something other than $0 is that every capturing () creates a new set of numbered captures.



            So the $0 inside of ($0) can never refer to anything, because you didn't set $0 inside of the ().



            (The named captures $<foo> are also affected by this.)




            The following has 3 separate $0 “variables”, and one $1 “variable”:



            'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /

            'aabbaabb' ~~ /
              ^

              # $0 = 'aabb'
              (

                # $0 = 'a'
                (.) $0

                # $1 = 'bb'
                (

                  # $0 = 'b'
                  (.) $0
                )
              )

              $0

              $
            /


            「aabbaabb」
             0 => 「aabb」
              0 => 「a」
              1 => 「bb」
               0 => 「b」
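As a sketch of how that tree is navigated from ordinary code, the snippet below stores the result in a variable (`$m` is a name introduced here, not part of the original answer) and indexes one level at a time. Each index step enters one pair of capturing parentheses, which is why the same number 0 appears at three different depths.

```raku
my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

say ~$m[0];        # aabb  (the outermost $0)
say ~$m[0][0];     # a     ($0 inside the first group)
say ~$m[0][1];     # bb    ($1 inside the first group)
say ~$m[0][1][0];  # b     ($0 inside the innermost group)
```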


            Basically the () in the regex DSL act a bit like {} blocks in normal Perl6: each pair opens a new scope with its own $/.



            A fairly direct if simplified translation of the above regex to “regular” Perl6 code follows.

            (Pay attention to the 3 lines with my $/ = [];)

            (Also the / ^ / style comments refer to the regex code for ^ and such above)



            given 'aabbaabb' {
              my $/ = [];       # give assignable storage for $0, $1 etc.
              my $pos = 0;      # position counter
              my $init = $pos;  # initial position

              # / ^ /
              fail unless $pos == 0;

              # / ( /
              $0 = do {
                my $/ = [];
                my $init = $pos;

                # / (.) $0 /
                $0 = .substr($pos,1);                      # / (.) /
                $pos += $0.chars;
                fail unless .substr($pos,$0.chars) eq $0;  # / $0 /
                $pos += $0.chars;

                # / ( /
                $1 = do {
                  my $/ = [];
                  my $init = $pos;

                  # / (.) $0 /
                  $0 = .substr($pos,1);                      # / (.) /
                  $pos += $0.chars;
                  fail unless .substr($pos,$0.chars) eq $0;  # / $0 /
                  $pos += $0.chars;

                  # / ) /
                  # the returned value (becomes $1 in outer scope)
                  .substr($init, $pos - $init)
                }

                # / ) /
                # the returned value (becomes $0 in outer scope)
                .substr($init, $pos - $init)
              }

              # / $0 /
              fail unless .substr($pos,$0.chars) eq $0;
              $pos += $0.chars;

              # / $ /
              fail unless $pos == .chars;

              # the returned value
              .substr($init, $pos - $init)
            }




            TLDR;



            Just remove the () surrounding ($c) / ($0).

            (Assuming you didn't need the capture for something else.)



            /((.) $0**2..*)/


            perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';
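The same regex can be tried as a small script rather than a one-liner. This is a minimal sketch: `$re` and `$hit` are names chosen for the illustration, and the second input string is an invented example with no run of three or more.

```raku
# The TLDR regex: one character, then that same character at least twice more.
my $re = / ( (.) $0**2..* ) /;

my $hit = "bbaaaaawer" ~~ $re;
put $hit[0];       # aaaaa  (the whole run)
put $hit[0][0];    # a      (the repeated character, captured by the inner group)

# Runs of only two characters do not satisfy the 2..* quantifier.
say so "bbaawer" ~~ $re;   # False
```

Because the backreference sits directly inside the outer group, it sees the `$0` set by `(.)` in that same group, so no extra variable is involved.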





            answered May 31 at 14:21 by Brad Gilbert












            • Note that currently /((.) ("$0")**2..*)/ works, but I suspect that may be a bug. (The $0 in ("$0") refers to whatever was in $0 at the time of .) At the very least it is surprising/confusing behavior.

              – Brad Gilbert
              May 31 at 14:42
















