Why are ambiguous grammars bad?
I understand that a grammar is ambiguous if some string has two or more leftmost (or rightmost) derivations, i.e. distinct parse trees, but I am unable to understand why that is so bad that everyone wants to get rid of it.
Tags: compilers, ambiguity
asked Jun 9 at 12:52 by HIRAK MONDAL (edited Jun 10 at 3:42 by user2357112)
Related but not identical: softwareengineering.stackexchange.com/q/343872/206652 (disclaimer: I wrote the accepted answer)
– marstato, Jun 9 at 21:45

See also: "Finding an unambiguous grammar".
– Rob, Jun 10 at 6:08

Indeed, unambiguous forms are better for practical use: an unambiguous form needs fewer production-rule applications and builds a smaller parse tree (hence a more efficient compiler that takes less time to parse). Most tools also provide a way to resolve ambiguity explicitly, outside the grammar.
– Grijesh Chauhan, Jun 10 at 8:54

"everyone wants to get rid of it". Well, that's just not true. In commercially relevant languages, it's common to see ambiguity added as languages evolve. E.g. C++ intentionally added the ambiguity std::vector<std::vector<int>> in 2011, which previously required a space between the two closing > characters. The key insight is that these languages have many more users than vendors, so fixing a minor annoyance for users justifies a lot of work by implementors.
– MSalters, Jun 11 at 7:16
6 Answers
Consider the following grammar for arithmetic expressions:
$$
X \to X + X \mid X - X \mid X * X \mid X / X \mid \texttt{var} \mid \texttt{const}
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
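For concreteness, here is one standard unambiguous version of the grammar above, shown only as a sketch: one nonterminal per precedence level, with left recursion encoding left associativity.
$$
\begin{aligned}
E &\to E + T \mid E - T \mid T \\
T &\to T * F \mid T / F \mid F \\
F &\to \texttt{var} \mid \texttt{const}
\end{aligned}
$$
Under this grammar, $a - b - c$ has exactly one parse tree, the one corresponding to $(a-b)-c$.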
Parse trees generated using syntax tree generator.
answered Jun 9 at 13:11 by Yuval Filmus
@HIRAKMONDAL The fact that the syntax is ambiguous is not the real issue. The problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent, then that wouldn't be a problem (e.g. take Yuval's example and consider the case where your only operator is +).
– Bakuriu, Jun 9 at 21:39

@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally, even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true, but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
– Richard Rast, Jun 10 at 2:03

Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
– gnasher729, Jun 10 at 7:30

Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
– OrangeDog, Jun 10 at 12:55
In contrast to the other existing answers [1, 2], there is indeed a field of application where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:

Syntactic ambiguity:
Peter chased the man in the red sports car
Was Peter in the red sports car, or was the man?

Semantic ambiguity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?

Pragmatic ambiguity:
Two men carried two bags
Did they carry the bags together, or did each man carry two bags?

Different NLP approaches handle processing in general, and these ambiguities in particular, quite differently. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You run this pipeline for every sentence. The more text you process, say from the same book, the more superfluous models from previous sentences (ones that survived until step 3 but are actually impossible) you can rule out.
As opposed to programming languages, we can let go of the requirement that every NL sentence have precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout the parsing of larger texts. From time to time, later insights help us rule out previous ambiguities.
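A very rough sketch of that bookkeeping (the hard-coded readings below are invented stand-ins for the output of an ambiguity-preserving parser plus model generation; no real NLP toolkit is used):

#include <iostream>
#include <set>
#include <string>

int main() {
    // Candidate semantic models for the ambiguous sentence "Peter went to the bank".
    std::set<std::string> models = {"bank = bench", "bank = financial institution"};

    // A later sentence ("He withdrew fifty euros.") is only compatible with some
    // of the models, so we keep just the intersection of the two sets.
    std::set<std::string> compatibleWithLaterSentence = {"bank = financial institution"};

    std::set<std::string> surviving;
    for (const std::string& m : models)
        if (compatibleWithLaterSentence.count(m))
            surviving.insert(m);

    for (const std::string& m : surviving)
        std::cout << "still possible: " << m << '\n';
}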
If you want to get your hands dirty with parsers that can output multiple derivations for ambiguous grammars, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note, though, that since [Koh18] are lecture notes, they might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4]: See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
answered Jun 10 at 8:50 by ComFreek (edited Jun 15 at 7:03)
Thanks a ton! I had the same doubt and you cleared it. :)
– HIRAK MONDAL, Jun 10 at 9:25

Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo.
– Hagen von Eitzen, Jun 10 at 11:13

You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
– Davislor, Jun 10 at 16:46

@ComFreek I should be more precise here. A brief look at GF (thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
– Davislor, Jun 11 at 17:07

@ComFreek So it’s not that computers can’t handle CFGs (although natural languages are not really context-free, and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
– Davislor, Jun 11 at 17:12
Even if there’s a well-defined way to handle ambiguity (ambiguous expressions are syntax errors, for example), these grammars still cause trouble. As soon as you introduce ambiguity into a grammar, a parser can no longer be sure that the first match it gets is definitive. It needs to keep trying all the other ways to parse a statement, to rule out any ambiguity. You’re also not dealing with something simple like an LL(1) language, so you can’t use a simple, small, fast parser. Your grammar has symbols that can be read multiple ways, so you have to be prepared to backtrack a lot.
In some restricted domains, you might be able to get away with proving that all possible ways to parse an expression are equivalent (for example, because they represent an associative operation). (a+b) + c = a + (b+c).
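To make the cost concrete, here is a small illustrative sketch (not tied to any particular parser) that counts how many distinct parse trees the ambiguous grammar X -> X - X | a assigns to a string of n operands. The counts grow as the Catalan numbers, which is why a parser that has to rule out ambiguity cannot simply stop at its first successful match:

#include <cstdint>
#include <iostream>
#include <vector>

// Number of parse trees of "a - a - ... - a" (n operands) under the
// ambiguous grammar  X -> X '-' X | 'a':
// every choice of the top-level '-' splits the string into two sub-parses.
std::uint64_t countParses(int n, std::vector<std::uint64_t>& memo) {
    if (memo[n] != 0) return memo[n];
    if (n == 1) return memo[n] = 1;              // a single 'a' has exactly one parse
    std::uint64_t total = 0;
    for (int k = 1; k < n; ++k)                  // k operands to the left of the root '-'
        total += countParses(k, memo) * countParses(n - k, memo);
    return memo[n] = total;
}

int main() {
    std::vector<std::uint64_t> memo(11, 0);
    for (int n = 1; n <= 10; ++n)
        std::cout << n << " operands: " << countParses(n, memo)
                  << " parse trees\n";           // 1, 1, 2, 5, 14, 42, ... (Catalan numbers)
}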
answered Jun 9 at 21:28 by Davislor
Does IF a THEN IF b THEN x ELSE y mean
IF a THEN
    IF b THEN
        x
    ELSE
        y
or
IF a THEN
    IF b THEN x
ELSE
    y
? AKA the dangling else problem.
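One standard way to make this unambiguous in the grammar itself is the usual matched/unmatched construction (shown here only as a sketch); it forces every ELSE to attach to the nearest unmatched IF, i.e. the first reading above:

STMT      -> MATCHED | UNMATCHED
MATCHED   -> IF cond THEN MATCHED ELSE MATCHED | other
UNMATCHED -> IF cond THEN STMT
           | IF cond THEN MATCHED ELSE UNMATCHED

Parser generators typically get the same effect more cheaply by resolving the resulting shift/reduce conflict in favour of shifting the ELSE.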
answered Jun 10 at 13:58 by David Richerby
That's a good example showing that even a non-ambiguous grammar (as in Java, C, C++, ...) allows apparent (!) ambiguities from a human perspective. Even though we are formally and computationally fine, we now have more of a UX / bug-free-development problem.
– ComFreek, Jun 10 at 14:29
Take the most vexing parse in C++ for example:
bar foo(foobar());
Is this a function declaration foo of type bar(foobar()) (the parameter is a function pointer returning a foobar), or a variable declaration foo of type bar, initialized with a default-initialized foobar?
Compilers differentiate these by assuming the first (a function declaration) unless what is inside the parameter list cannot be interpreted as a type.
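From the user's side, the usual workarounds make the intended reading explicit; a small sketch (the types bar and foobar here are stand-ins defined only for illustration):

struct foobar {};
struct bar { bar(foobar) {} };

bar f1(foobar());    // most vexing parse: declares a *function* f1, not a variable
bar f2((foobar()));  // extra parentheses: unambiguously a variable of type bar
bar f3{foobar{}};    // C++11 brace initialization: also unambiguously a variable

int main() {}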
When you get such an ambiguous construct, the compiler has two options:
assume that the construct is one particular derivation, and add some disambiguator to the grammar so that the other derivation can still be expressed;
error out and require explicit disambiguation either way.
The first can fall out naturally; the second requires that the compiler writer knows about the ambiguity.
If this ambiguity stays undetected, it is possible that two different compilers default to different derivations for that ambiguous construct, leading to code being non-portable for non-obvious reasons. That in turn leads people to assume it's a bug in one of the compilers, while it's actually a fault in the language specification.
I think the question contains an assumption that's only borderline correct at best.
In real life it's pretty common to simply live with ambiguous grammars, as long as they aren't (so to speak) too ambiguous.
For example, if you look around at grammars compiled with yacc (or similar, such as bison or byacc) you'll find that quite a few produce warnings about "N shift/reduce conflicts" when you compile them. When yacc encounters a shift/reduce conflict, that signals an ambiguity in the grammar.
A shift/reduce conflict, however, is usually a fairly minor problem. The parser generator will resolve the conflict in favor of the "shift" rather than the reduce. The grammar is perfectly fine if that's what you want (and it does seem to work out perfectly well in practice).
A shift/reduce conflict typically arises in a case on this general order (using caps for non-terminals and lower-case for terminals):
A -> B | c
B -> a | c
When we encounter a c, there's an ambiguity: should we parse the c directly as an A, or should we parse it as a B, which in turn is an A? In a case like this, yacc and such will choose the simpler/shorter route, and parse the c directly as an A, rather than going the c -> B -> A route. This can be wrong, but if so, it probably means you have a really simple error in your grammar, and you shouldn't allow the c option as a possibility for A at all.
Now, by contrast, we could have something more like this:
A -> B | C
B -> a | c
C -> b | c
Now when we encounter a c, we have a conflict between whether to treat the c as a B or a C. There's a lot less chance that an automatic conflict resolution strategy is going to choose what we really want. Neither of these is a "shift"--both are "reductions", so this is a "reduce/reduce conflict" (which those accustomed to yacc and such generally recognize as a much bigger problem than a shift/reduce conflict).
So, although I'm not sure I'd go quite so far as to say that anybody really welcomes ambiguity in their grammar, in at least some cases it's minor enough that nobody really cares a whole lot about it. In the abstract they might like the idea of removing all ambiguity--but not enough to always actually do it. For example, a small, simple grammar that contains a minor ambiguity can be preferable to a larger, more complex grammar that eliminates the ambiguity (especially when you get into the practical realm of actually generating a parser from the grammar, and finding that the unambiguous grammar produces a parser that won't run on your target machine).
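As a concrete illustration of resolving ambiguity outside the grammar, yacc and bison also let you keep an ambiguous expression grammar and settle its conflicts with precedence declarations; a minimal sketch (token names are illustrative):

%token VAR CONST
%left '+' '-'        /* lower precedence, left associative  */
%left '*' '/'        /* higher precedence, left associative */
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | VAR
     | CONST
     ;

Without the %left lines, yacc reports a shift/reduce conflict for every operator pair and resolves each one by shifting; with them, the conflicts are resolved according to the declared precedence and associativity.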
Man, wish I'd had this excellent explanation of shift-reduce conflicts 5 months ago! ^^; +1
– HotelCalifornia, Jun 12 at 8:23
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "419"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f110402%2fwhy-are-ambiguous-grammars-bad%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
$endgroup$
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
$begingroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
$endgroup$
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
$begingroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
$endgroup$
Consider the following grammar for arithmetic expressions:
$$
X to X + X mid X - X mid X * X mid X / X mid textttvar mid textttconst
$$
Consider the following expression:
$$
a - b - c
$$
What is its value? Here are two possible parse trees:
According to the one on the left, we should interpret $a-b-c$ as $(a-b)-c$, which is the usual interpretation. According to the one on the right, we should interpret it as $a-(b-c) = a-b+c$, which is probably not what was intended.
When compiling a program, we want the interpretation of the syntax to be unambiguous. The easiest way to enforce this is using an unambiguous grammar. If the grammar is ambiguous, we can provide tie-breaking rules, like operator precedence and associativity. These rules can equivalently be expressed by making the grammar unambiguous in a particular way.
Parse trees generated using syntax tree generator.
answered Jun 9 at 13:11
Yuval FilmusYuval Filmus
202k15 gold badges197 silver badges360 bronze badges
202k15 gold badges197 silver badges360 bronze badges
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator+
).
$endgroup$
– Bakuriu
Jun 9 at 21:39
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
12
12
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator
+
).$endgroup$
– Bakuriu
Jun 9 at 21:39
$begingroup$
@HIRAKMONDAL The fact that the syntax is ambiguous is not real issue. the problem is that the two different parse trees have different behaviour. If your language has an ambiguous grammar but all parse trees for an expression are semantically equivalent then that wouldn't be a problem (e.g. take Yuval example and consider the case where your only operator
+
).$endgroup$
– Bakuriu
Jun 9 at 21:39
13
13
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
@Bakuriu What you said is true, but "semantically equivalent" is a tall order. For example, floating point arithmetic is actually not associative (so the two "+" trees would not be equivalent). Additionally even if the answer came out the same way, undefined evaluation order matters a lot in languages where expressions can have side effects. So what you said is technically true but in practice it would be very unusual for a grammar's ambiguity to have no repercussions to the use of that grammar.
$endgroup$
– Richard Rast
Jun 10 at 2:03
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
$begingroup$
Some languages nowadays check for integer overflow in additions, so even a+b+c for integers depends on the order of evaluation.
$endgroup$
– gnasher729
Jun 10 at 7:30
3
3
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
$begingroup$
Even worse, in some cases the grammar doesn't provide any way to achieve the alternate meaning. I've seen this in query languages, where the choice of escape grammar (e.g. double the special character to escape it) makes certain queries impossible to express.
$endgroup$
– OrangeDog
Jun 10 at 12:55
add a comment |
$begingroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
$endgroup$
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
$endgroup$
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
$endgroup$
In contrast to the other existing answers [1, 2], there is indeed a field of application, where ambiguous grammars are useful. In the field of natural language processing (NLP), when you want to parse natural language (NL) with formal grammars, you've got the problem that NL is inherently ambiguous on different levels [adapted from Koh18, ch. 6.4]:
Syntactic ambuigity:
Peter chased the man in the red sports car
Was Peter or the man in the red sports car?
Semantic ambuigity:
Peter went to the bank
A bank to sit on or a bank to withdraw money from?
Pragmatic ambuigity:
Two men carried two bags
Did they carry the bags together or did each man carry two bags?
Different approaches for NLP deal differently with processing in general and in particular these ambuigities. For example, your pipeline might look as follows:
- Parse NL with ambiguous grammar
- For every resulting AST: run model generation to generate ambiguous semantic meanings and to rule out impossible syntactic ambiguities from step 1
- For every resulting model: save it in your cache.
You do this pipeline for every sentence. The more text, say, from the same book you process, the more you can rule out impossible superfluous models, which survived until step 3, from previous sentences.
As opposed to programming language, we can let go of the requirement that every NL sentence has precise semantics. Instead, we can just bookkeep multiple possible semantic models throughout parsing of larger texts. From while to while, later insights help us to rule out previous ambiguities.
If you want to get your hands dirty with parsers being able to output multiple derivations for ambiguous grammar, have a look at the Grammatical Framework. Also, [Koh18, ch. 5] has an introduction to it showing something similar to my pipeline above. Note though that since [Koh18] are lecture notes, the notes might not be that easy to understand on their own without the lectures.
References
[Koh18]: Michael Kohlhase. "Logic-Based Natural Language Processing. Winter Semester 2018/19. Lecture Notes." URL: https://kwarc.info/teaching/LBS/notes.pdf. URL of course description: https://kwarc.info/courses/lbs/ (in German)
[Koh18, ch. 5]: See chapter 5, "Implementing Fragments: Grammatical and Logical Frameworks", in [Koh18]
[Koh18, ch. 6.4] See chapter 6.4, "The computational Role of Ambiguities", in [Koh18]
edited Jun 15 at 7:03
answered Jun 10 at 8:50
ComFreekComFreek
2511 silver badge7 bronze badges
2511 silver badge7 bronze badges
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
$begingroup$
Thanks a ton.. I had the same doubt and u cleared it.. :)
$endgroup$
– HIRAK MONDAL
Jun 10 at 9:25
1
1
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
Not to mention problems with Buffalo buffalo buffalo Buffalo buffalo buffalo ... for a suitable number of buffalo
$endgroup$
– Hagen von Eitzen
Jun 10 at 11:13
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
$begingroup$
You write, “in contrast,” but I’d call this the other side of the coin from what I answered. Parsing natural languages with their ambiguous grammars is so hard that traditional parsers can’t do it!
$endgroup$
– Davislor
Jun 10 at 16:46
1
1
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
$begingroup$
@ComFreek I should be more precise here. A brief look at GF (Thanks for the link!) shows that it reads context-free grammars with three extensions (such as allowing reduplication) and returns a list of all possible derivations. Algorithms to do that have been around since the ’50s. However, being able to handle fully-general CFGs means your worst-case runtime blows up, and in practice, even when using a general parser such as GLL, software engineers try to use a subset of CFGs, such as LL grammars, that can be parsed more efficiently.
$endgroup$
– Davislor
Jun 11 at 17:07
1
1
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
$begingroup$
@ComFreek So it’s not that computers can’t handle CFG (although natural languages are not really context-free and actually-useful machine translation uses completely different techniques). It’s that, if you require your parser to handle ambiguity, that rules out certain shortcuts that would have made it more efficient.
$endgroup$
– Davislor
Jun 11 at 17:12
|
show 1 more comment
$begingroup$
Even if there’s a well-defined way to handle ambiguity (ambiguous expressions are syntax errors, for example), these grammars still cause trouble. As soon as you introduce ambiguity into a grammar, a parser can no longer be sure that the first match it gets is definitive. It needs to keep trying all the other ways to parse a statement, to rule out any ambiguity. You’re also not dealing with something simple like a LL(1) language, so you can’t use a simple, small, fast parser. Your grammar has symbols that can be read multiple ways, so you have to be prepared to backtrack a lot.
In some restricted domains, you might be able to get away with proving that all possible ways to parse an expression are equivalent (for example, because they represent an associative operation). (a+b) + c = a + (b+c).
$endgroup$
answered Jun 9 at 21:28
Davislor
add a comment |
$begingroup$
Does IF a THEN IF b THEN x ELSE y mean

IF a THEN
    IF b THEN
        x
    ELSE
        y

or

IF a THEN
    IF b THEN x
ELSE
    y

? AKA the dangling else problem.
$endgroup$
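For a concrete feel of how C-family languages resolve this in practice, here is a small sketch of my own (not from the answer): C and C++ bind each else to the nearest unmatched if, and braces force the other reading.

#include <iostream>

int main() {
    bool a = true, b = false;

    // The else binds to the nearest if, i.e. to `if (b)`:
    // with a == true and b == false this prints "y".
    if (a)
        if (b)
            std::cout << "x\n";
        else
            std::cout << "y\n";

    // Braces force the other reading: the else now belongs to `if (a)`,
    // so nothing is printed while a is true.
    if (a) {
        if (b)
            std::cout << "x\n";
    } else {
        std::cout << "y\n";
    }
    return 0;
}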
answered Jun 10 at 13:58
David Richerby
1
$begingroup$
That's a good example showing that even a non-ambiguous grammar (as in Java, C, C++, ...) allows apparent (!) ambiguities from a human perspective. Even though we are formally and computationally fine, we now have more of a UX / bug-free-development problem.
$endgroup$
– ComFreek
Jun 10 at 14:29
add a comment |
$begingroup$
Take the most vexing parse in C++ for example:
bar foo(foobar());
Is this the declaration of a function foo returning a bar, whose parameter has type foobar() (a pointer to a function returning a foobar), or the declaration of a variable foo of type bar, initialized with a default-initialized foobar?
Compilers differentiate the two by assuming the first reading unless the expression inside the parameter list cannot be interpreted as a type.
When a compiler gets such an ambiguous expression, it has two options:
assume that the expression is a particular derivation, and add some disambiguator to the grammar so that the other derivation can still be expressed;
error out and require explicit disambiguation either way.
The first can fall out naturally; the second requires that the compiler writer knows about the ambiguity.
If this ambiguity stays undetected, it is possible that two different compilers default to different derivations for the ambiguous expression, leading to code that is non-portable for non-obvious reasons. That in turn leads people to assume it's a bug in one of the compilers, when it's actually a fault in the language specification.
$endgroup$
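Here is a compilable sketch of the situation described above; the types bar and foobar are stand-ins invented for illustration. The first declaration is taken as a function, and the usual workarounds force the variable reading.

struct foobar {};
struct bar {
    bar() = default;
    bar(foobar) {}
};

int main() {
    bar f1(foobar());    // most vexing parse: declares a function, not a variable
    bar f2((foobar()));  // extra parentheses: a bar variable, built from a temporary foobar
    bar f3{foobar{}};    // C++11 brace initialization: also a bar variable
    (void)f2;
    (void)f3;
    return 0;
}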
answered Jun 11 at 9:31
ratchet freak
add a comment |
$begingroup$
I think the question contains an assumption that's only borderline correct at best.
In real life it's pretty common to simply live with ambiguous grammars, as long as they aren't (so to speak) too ambiguous.
For example, if you look around at grammars compiled with yacc (or similar tools, such as bison or byacc), you'll find that quite a few produce warnings about "N shift/reduce conflicts" when you compile them. When yacc encounters a shift/reduce conflict, that signals an ambiguity in the grammar.
A shift/reduce conflict, however, is usually a fairly minor problem. The parser generator will resolve the conflict in favor of the "shift" rather than the "reduce". The grammar is perfectly fine if that's what you want (and it does seem to work out perfectly well in practice).
A shift/reduce conflict typically arises in a case of this general order (using caps for non-terminals and lower-case for terminals):
A -> B | c
B -> a | c
When we encounter a c, there's an ambiguity: should we parse the c directly as an A, or should we parse it as a B, which in turn is an A? In a case like this, yacc and such will choose the simpler/shorter route and parse the c directly as an A, rather than going the c -> B -> A route. This can be wrong, but if so, it probably means you have a really simple error in your grammar, and you shouldn't allow the c option as a possibility for A at all.
Now, by contrast, we could have something more like this:
A -> B | C
B -> a | c
C -> b | c
Now when we encounter a c, we have a conflict over whether to treat the c as a B or a C. There's a lot less chance that an automatic conflict-resolution strategy will choose what we really want. Neither of these is a "shift"--both are "reductions"--so this is a "reduce/reduce conflict" (which those accustomed to yacc and such generally recognize as a much bigger problem than a shift/reduce conflict).
So, although I'm not sure I'd go quite so far as to say that anybody really welcomes ambiguity in their grammar, in at least some cases it's minor enough that nobody really cares a whole lot about it. In the abstract they might like the idea of removing all ambiguity--but not enough to always actually do it. For example, a small, simple grammar that contains a minor ambiguity can be preferable to a larger, more complex grammar that eliminates the ambiguity (especially when you get into the practical realm of actually generating a parser from the grammar, and finding that the unambiguous grammar produces a parser that won't run on your target machine).
$endgroup$
edited Jun 12 at 20:24
answered Jun 11 at 20:18
Jerry Coffin
$begingroup$
man, wish i'd had this excellent explanation of shift-reduce conflicts 5 months ago! ^^; +1
$endgroup$
– HotelCalifornia
Jun 12 at 8:23
add a comment |
3
$begingroup$
"everyone wants to get rid of it". Well, that's just not true. In commercially relevant languages, it's common to see ambiguity added as languages evolve. E.g. C++ intentionally added the ambiguity
std::vector<std::vector<int>>
in 2011, which used to require a space between>>
before. The key insight is that these languages have many more users than vendors, so fixing a minor annoyance for users justifies a lot of work by implementors.$endgroup$
– MSalters
Jun 11 at 7:16
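A minimal sketch of the change this comment refers to (only standard containers involved): before C++11 the two closing angle brackets had to be separated by a space, while later compilers accept them written together.

#include <vector>

// Pre-C++11, "> >" needed the space because ">>" was always lexed as the
// right-shift operator; since C++11 both spellings are accepted, at the
// price of extra work for the parser to untangle the closing brackets.
std::vector<std::vector<int> > old_style;
std::vector<std::vector<int>>  new_style;

int main() { return 0; }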