Otdfbt

Question

I’m trying to wrap my head around zero knowledge proofs, but I’m having trouble understanding it.

In my current understanding, zero-knowledge proofs prove to the recipient that the sender has a certain knowledge without disclosing it. Like trying to say your password without actually giving it.

Many sources go at it with a convoluted method like a tunnel where there’s a hidden gate, and Bob is trying to know if Alice can go through the hidden gate. But then it starts saying that Bob should not see Alice enter. Wouldn’t it be easier for Bob to just see Alice go in one tunnel and come out the other? That would not disclose the secret way of opening the tunnel, right?

In that line of thinking, given a one-way function like a hash, couldn’t the other person just hash their secret and let the other see that the hashes compare?

I must be missing something (as there’s a lot of research going into it and hashes are well-known), but I can’t wrap my head around it.

Can someone tell me what’s wrong with my understanding of zero knowledge proofs?

One way of looking at it is this: if Alice just goes into one tunnel and comes out of the other, Bob can videotape this and convince Charlie that Alice knows the secret. The method as usually given in the literature doesn’t allow Bob to prove anything to any third parties. — May 29 at 14:11
An approachable discussion of zero knowledge proofs: Zero Knowledge Proofs: An illustrated primer. It is an interesting read - especially the part about time machines and how information could be leaked whether time runs forwards or backwards. — May 30 at 2:20

score 27 · Accepted Answer · 2019-06-01 22:51:58Z

There are three issues in your proposal, which I'll go over one by one; I hope this will clarify the concept.

The first issue is that the purpose of a zero-knowledge proof is not only to prove knowledge of some information without disclosing it, but something much, much more powerful: the goal is to prove that you know some information$^1$ without disclosing anything at all beyond the fact that you know this information. The point here is that you have no idea what your opponent is going to do with the information you leak about your secret value. It might well be that even some apparently harmless leakage can allow your opponent to do something unexpected and bad. Consider the authentication protocol you suggest, with a hash function $H$: here, given a secret value $v$, you leak $H(v)$. This is clearly not "no information", and you have no guarantee that this $H(v)$ cannot be used to do something bad. So, the way we define zero-knowledge proofs aims at anticipating every possible scenario, as follows: we say that the protocol is zero-knowledge if there exist an efficient algorithm that could produce an interaction with the verifier which is indistinguishable from an honest interaction, but without knowing the secret value. The intuition between this definition is that if the transcript of the interaction cannot be distinguished from something that could have been produced without even knowing the secret value, then it cannot convey anything useful about this value to the verifier (of course, defining that formally requires some care). Your protocol clearly does not satisfy this.

The second issue is the generality of the functionalities we want to consider in general. In your example, you actually assumed that you were only proving that you know the same value as the one already held by your opponent (since she must hash the value herself to compare). But this is not a very useful situation in general. A much more general setting would be for example the following: some public ciphertext $c$ is known, and you want to prove to an opponent that you know its plaintext, but without disclosing the actual plaintext. Here, your hash-then-send simple approach does obviously not work at all. In fact, zero-knowledge proofs deal with even much more general situations than this, allowing to prove wide classes of statements about secret values, which can be hashed, committed, encrypted, signed, or whatever you like.

Eventually, the third issue is that a zero-knowledge proof of knowledge must, well, prove knowledge. A protocol proves knowledge of a value if given the code of the prover, it is actually possible to extract this value efficiently (this is the formal definition of "knowledge" in cryptography). This is also clearly not possible in your example. In fact, your protocol does not prove knowledge of the secret value $v$ at all, since the prover would just have to know $H(v)$ to complete it - which, as you pointed out yourself, does not imply that he knows $v$, since $H$ is one-way.

Let's go over an actual example, to make all of that more concrete.

Take a finite group $G$ of prime order $p$ (I assume some familiarity with basic algebra here). Fix a generator $g$. Consider now another group element $h$. The goal of the prover is to demonstrate to the verifier that he knows a secret value $x$ such that $h = g^x$ (note that such an $x$ exists since $g$ is a generator, but finding it given $h$ might be hard - it's the discrete logarithm problem). This is very useful for authentication: $h$ will be the "public identity" of the prover, and demonstrating knowledge of $x$ such that $g^x = h$ can be used to authenticate him as the owner of the "private identity" associated to $h$ (its exponent in base $g$). Take one second to convince yourself that no simple "hash based" solution does work here.

Here is a protocol that actually works:

The prover picks a random exponent $r$ and sends $R = g^r$ to the verifier.

The verifier picks a random exponent $e$ (the challenge) and sends it to the prover.

The prover computes and sends $d = ex+r$ to the verifier.

The verifier checks that $h^ecdot R = g^d$.

Take a few seconds to check that the protocol is correct, meaning, if the prover plays honestly, the check of the verifier will succeed.

Why is this zero-knowledge$^2$? Because one can generate a transcript that is perfectly indistinguishable from an honest transcript of this protocol, but without knowing anything about $x$: pick $(e,d)$ at random, then set $R gets g^d/h^e$, and output the transcript $(R,e,d)$. Note that this can be easily checked to give a transcript uniformly distributed over all transcripts satisfying $h^ecdot R = g^d$ - i.e., uniformly distributed accross all honest transcripts. Therefore, interacting with the prover in this protocol does not convey any information at all about $x$ (since a transcript following the exact same distribution could have been generated without knowing $x$).

Why does this prove knowledge of $x$? To show this, I must show that given the code of a prover that succeeds in this proof, I can efficiently recover $x$ - hence showing that this prover actually knows $x$. I do it as follows: I run the prover code, to get $R$. Then, I put a breakpoint in the code, fork it, and run it twice on two random different challenges $(e_0,e_1)$ that I choose. The first copy of the code outputs $d_0$, and the second copy of the code outputs $d_1$. Since this is the code of a successful prover, I know that the check passes$^3$, hence I have $(R, e_0, e_1, d_0, d_1)$ such that:

$h^e_0cdot R = g^d_0$

$h^e_1cdot R = g^d_1$

which gives after a few easy manipulations $g^(d_0-d_1)cdot(e_0-e_1)^-1 = h$

Therefore, the value $x$ we are looking for is just $(d_0-d_1)cdot(e_0-e_1)^-1$, and we have successfully extracted it. This concludes the proof.

(1) actually, that's only for the specific case of zero-knowledge proofs of knowledge; in full generality, there are two variants of zero-knowledge proofs, which can be used either to show that some statement is true (existential proof), or that you know a proof of some statement (proof of knowledge). I focus on the latter since it seems to be the one you read about.

(2) in fact it's not truly zero-knowledge, formally it only satisfies a weaker definition known as honest-verifier zero-knowledge, but I omitted this point to simplify.

(3) I'm again hiding some technicalities here, since usually we can only assume that the prover succeeds with some noticeable probability, but it does not make a major difference.

EDIT - Answering questions from the comments

From MechMK1:

Perhaps my understanding of the matter is not good enough, but I am intrigued by the question asked "Why can Alice not simply enter one end of the tunnel and come out the other?". As far as I can tell, this is not directly answered.

First, note that this cave illustration is not meant to be a real zero-knowledge proof, but is rather a scenario given for illustration purpose that conveys some intuition about a zero-knowledge proofs. There will always be some way in which the intuitive scenario does not properly explain all of the concept.

That being said, let's answer "Why can Alice not simply enter one end of the tunnel and come out the other?" (edit: as I noticed afterward, my explanation below basically expands upon the comment to OP's question made by Roman). Recall that to prove that a zero-knowledge proof convey no knowledge, we have to simulate a valid-looking transcript without knowing the actual secret witness. How can it be done with the cave experiment? An answer is given in the actual paper this illustration is taken from: How to Explain Zero-Knowledge Protocols to Your Children, which I encourage you to read for more discussions about this. Basically, you can record a video tape of someone being asked a random side of the tunnel to show up from; the person doing it, who cannot pass through the door, will just have initially picked a side at random and went there. When the person is lucky and just get out from the right side, keep the recorded video; when he is not, delete the video and try again. In the end, what you have is a recording which is perfectly indistinguishable from an actual recording of people doing the real zero-knowledge experiment.

Now, you could argue that this is a bit fishy, perhaps we could craft a valid-looking video of someone entering from one side and leaving from the other side using video editing, hence the alternative proposal can be "simulated" as well. This is were we reach the limit of this illustration. Actually, the original protocol which inspired this illustration is the zero-knowledge protocol for graph isomorphism. It goes as follows: you are given two graphs, $G_0,G_1$ (the "entries of the cave"), and you claim that they are isomorphic (i.e. "you are able to walk from one to the other"). The protocol works as follows:

The prover knows a secret permutation of the vertices of $G_0$ that maps to the vertices of $G_1$, that's his witness. He picks a bit $b$ at random, and a random permutation $pi$, and sends $G = pi(G_b)$ (i.e., he "enters the cave through a side picked at random")

The verifier picks a bit $b'$ and sends it (i.e., she "asks the prover to come out of some random side that she picks")

The prover must reveal a permutation $pi'$ that maps $G_b'$ to $G$ (this is either $pi$, or $pi$ composed with the secret permutation). I.e., he uses his secret witness ("the key of the door") to "arrive" at the side chosen by the verifier.

Now, this video-recording illustration is actually an intuitive explanation of how to prove that the above is zero-knowledge - you can create a valid-looking transcript by replaying the protocol many times, and discarding the runs where $b' neq b$. At the same time, "Why can Alice not simply enter one end of the tunnel and come out the other?" is clear here: that would correspond to revealing the path from one side to the other -- i.e., giving away the secret permutation. But again, it's obviously much less clear that this is not a valid solution in the illustrative example, which is a limitation of this example (and one of the reasons why I do not like it much).

From NieDzejkob:

"The intuition between this definition is that if the transcript of the interaction cannot be distinguished from something that could have been produced without even knowing the secret value, then it cannot convey anything useful about this value to the verifier". The prover establishes a TLS tunnel with the verifier, and sends the secret through it. Nobody can learn anything from the transcript, and yet this will clearly let the verifier learn the secret. Am I missing something?

A comment related to my footnote (2): actually, the transcript guarantees that nothing leaks not only to external people, but also to the verifier himself, if it is indistinguishable from a transcript that could have been produced as the result of an interaction with this verifier. In the proof I've given, the transcript was simulated assuming that the verifier samples the challenge $e$ honestly, which he might not do in reality. Hence, the proof I've given does in fact only show that the protocol is zero-knowledge against verifier that honestly sample $e$ at random. But general techniques exist to transform this protocol into one that can actually be proven zero-knowledge against arbitrary verifiers, even cheating ones.

Now, back to your question: when doing so, the simulated transcript would not look indistinguishable from an honest transcript from the viewpoint of the verifier himself, since the simulated transcript would not include this TLS channel transmitting the secret value. An actual proof of zero-knowledge property must produce a simulated transcript that the verifier himself cannot distinguish from a transcript that could have been produced as a result of an interaction with himself. Hence my previous comment on the fact that, for simplicity, what I had in fact proven previously does only guarantee zero-knowledge against verifiers who sample their challenge $e$ honestly, since it's how it's done in the simulated transcript. But as I said, there are methods to simulate against arbitrary verifiers.

Wow! I get it now :D Thank you for explaining what was wrong with my line of thinking, I couldn’t figure it out. In your example about “why does this prove knowledge of x?”, I think you meant to say “given the code of a prover”, since it is there that you are putting breakpoints, unless I am looking at it wrong. — May 29 at 6:03
Perhaps my understanding of the matter is not good enough, but I am intrigued by the question asked "Why can Alice not simply enter one end of the tunnel and come out the other?". As far as I can tell, this is not directly answered. — May 29 at 12:51
Right, I did not address this point, I will edit in this regard later on today. — May 29 at 12:54
@MechMK1 The comment by Roman Odaisky on the question answers this. It's also slightly related to the sentence Therefore, interacting with the prover in this protocol does not convey any information at all about 𝑥 in the answer. The way I understand it, is that a zero-knowledge proof is only usable for the verifier, and cannot be passed to other people. But if that is correct, how can there be non-interactive zero-knowledge proofs? :thinking: — May 29 at 14:37

score 27 · Accepted Answer · 2019-06-01 22:51:58Z

There are three issues in your proposal, which I'll go over one by one; I hope this will clarify the concept.

The first issue is that the purpose of a zero-knowledge proof is not only to prove knowledge of some information without disclosing it, but something much, much more powerful: the goal is to prove that you know some information$^1$ without disclosing anything at all beyond the fact that you know this information. The point here is that you have no idea what your opponent is going to do with the information you leak about your secret value. It might well be that even some apparently harmless leakage can allow your opponent to do something unexpected and bad. Consider the authentication protocol you suggest, with a hash function $H$: here, given a secret value $v$, you leak $H(v)$. This is clearly not "no information", and you have no guarantee that this $H(v)$ cannot be used to do something bad. So, the way we define zero-knowledge proofs aims at anticipating every possible scenario, as follows: we say that the protocol is zero-knowledge if there exist an efficient algorithm that could produce an interaction with the verifier which is indistinguishable from an honest interaction, but without knowing the secret value. The intuition between this definition is that if the transcript of the interaction cannot be distinguished from something that could have been produced without even knowing the secret value, then it cannot convey anything useful about this value to the verifier (of course, defining that formally requires some care). Your protocol clearly does not satisfy this.

The second issue is the generality of the functionalities we want to consider in general. In your example, you actually assumed that you were only proving that you know the same value as the one already held by your opponent (since she must hash the value herself to compare). But this is not a very useful situation in general. A much more general setting would be for example the following: some public ciphertext $c$ is known, and you want to prove to an opponent that you know its plaintext, but without disclosing the actual plaintext. Here, your hash-then-send simple approach does obviously not work at all. In fact, zero-knowledge proofs deal with even much more general situations than this, allowing to prove wide classes of statements about secret values, which can be hashed, committed, encrypted, signed, or whatever you like.

Eventually, the third issue is that a zero-knowledge proof of knowledge must, well, prove knowledge. A protocol proves knowledge of a value if given the code of the prover, it is actually possible to extract this value efficiently (this is the formal definition of "knowledge" in cryptography). This is also clearly not possible in your example. In fact, your protocol does not prove knowledge of the secret value $v$ at all, since the prover would just have to know $H(v)$ to complete it - which, as you pointed out yourself, does not imply that he knows $v$, since $H$ is one-way.

Let's go over an actual example, to make all of that more concrete.

Take a finite group $G$ of prime order $p$ (I assume some familiarity with basic algebra here). Fix a generator $g$. Consider now another group element $h$. The goal of the prover is to demonstrate to the verifier that he knows a secret value $x$ such that $h = g^x$ (note that such an $x$ exists since $g$ is a generator, but finding it given $h$ might be hard - it's the discrete logarithm problem). This is very useful for authentication: $h$ will be the "public identity" of the prover, and demonstrating knowledge of $x$ such that $g^x = h$ can be used to authenticate him as the owner of the "private identity" associated to $h$ (its exponent in base $g$). Take one second to convince yourself that no simple "hash based" solution does work here.

Here is a protocol that actually works:

The prover picks a random exponent $r$ and sends $R = g^r$ to the verifier.

The verifier picks a random exponent $e$ (the challenge) and sends it to the prover.

The prover computes and sends $d = ex+r$ to the verifier.

The verifier checks that $h^ecdot R = g^d$.

Take a few seconds to check that the protocol is correct, meaning, if the prover plays honestly, the check of the verifier will succeed.

Why is this zero-knowledge$^2$? Because one can generate a transcript that is perfectly indistinguishable from an honest transcript of this protocol, but without knowing anything about $x$: pick $(e,d)$ at random, then set $R gets g^d/h^e$, and output the transcript $(R,e,d)$. Note that this can be easily checked to give a transcript uniformly distributed over all transcripts satisfying $h^ecdot R = g^d$ - i.e., uniformly distributed accross all honest transcripts. Therefore, interacting with the prover in this protocol does not convey any information at all about $x$ (since a transcript following the exact same distribution could have been generated without knowing $x$).

Why does this prove knowledge of $x$? To show this, I must show that given the code of a prover that succeeds in this proof, I can efficiently recover $x$ - hence showing that this prover actually knows $x$. I do it as follows: I run the prover code, to get $R$. Then, I put a breakpoint in the code, fork it, and run it twice on two random different challenges $(e_0,e_1)$ that I choose. The first copy of the code outputs $d_0$, and the second copy of the code outputs $d_1$. Since this is the code of a successful prover, I know that the check passes$^3$, hence I have $(R, e_0, e_1, d_0, d_1)$ such that:

$h^e_0cdot R = g^d_0$

$h^e_1cdot R = g^d_1$

which gives after a few easy manipulations $g^(d_0-d_1)cdot(e_0-e_1)^-1 = h$

Therefore, the value $x$ we are looking for is just $(d_0-d_1)cdot(e_0-e_1)^-1$, and we have successfully extracted it. This concludes the proof.

(1) actually, that's only for the specific case of zero-knowledge proofs of knowledge; in full generality, there are two variants of zero-knowledge proofs, which can be used either to show that some statement is true (existential proof), or that you know a proof of some statement (proof of knowledge). I focus on the latter since it seems to be the one you read about.

(2) in fact it's not truly zero-knowledge, formally it only satisfies a weaker definition known as honest-verifier zero-knowledge, but I omitted this point to simplify.

(3) I'm again hiding some technicalities here, since usually we can only assume that the prover succeeds with some noticeable probability, but it does not make a major difference.

EDIT - Answering questions from the comments

From MechMK1:

Perhaps my understanding of the matter is not good enough, but I am intrigued by the question asked "Why can Alice not simply enter one end of the tunnel and come out the other?". As far as I can tell, this is not directly answered.

First, note that this cave illustration is not meant to be a real zero-knowledge proof, but is rather a scenario given for illustration purpose that conveys some intuition about a zero-knowledge proofs. There will always be some way in which the intuitive scenario does not properly explain all of the concept.

That being said, let's answer "Why can Alice not simply enter one end of the tunnel and come out the other?" (edit: as I noticed afterward, my explanation below basically expands upon the comment to OP's question made by Roman). Recall that to prove that a zero-knowledge proof convey no knowledge, we have to simulate a valid-looking transcript without knowing the actual secret witness. How can it be done with the cave experiment? An answer is given in the actual paper this illustration is taken from: How to Explain Zero-Knowledge Protocols to Your Children, which I encourage you to read for more discussions about this. Basically, you can record a video tape of someone being asked a random side of the tunnel to show up from; the person doing it, who cannot pass through the door, will just have initially picked a side at random and went there. When the person is lucky and just get out from the right side, keep the recorded video; when he is not, delete the video and try again. In the end, what you have is a recording which is perfectly indistinguishable from an actual recording of people doing the real zero-knowledge experiment.

Now, you could argue that this is a bit fishy, perhaps we could craft a valid-looking video of someone entering from one side and leaving from the other side using video editing, hence the alternative proposal can be "simulated" as well. This is were we reach the limit of this illustration. Actually, the original protocol which inspired this illustration is the zero-knowledge protocol for graph isomorphism. It goes as follows: you are given two graphs, $G_0,G_1$ (the "entries of the cave"), and you claim that they are isomorphic (i.e. "you are able to walk from one to the other"). The protocol works as follows:

The prover knows a secret permutation of the vertices of $G_0$ that maps to the vertices of $G_1$, that's his witness. He picks a bit $b$ at random, and a random permutation $pi$, and sends $G = pi(G_b)$ (i.e., he "enters the cave through a side picked at random")

The verifier picks a bit $b'$ and sends it (i.e., she "asks the prover to come out of some random side that she picks")

The prover must reveal a permutation $pi'$ that maps $G_b'$ to $G$ (this is either $pi$, or $pi$ composed with the secret permutation). I.e., he uses his secret witness ("the key of the door") to "arrive" at the side chosen by the verifier.

Now, this video-recording illustration is actually an intuitive explanation of how to prove that the above is zero-knowledge - you can create a valid-looking transcript by replaying the protocol many times, and discarding the runs where $b' neq b$. At the same time, "Why can Alice not simply enter one end of the tunnel and come out the other?" is clear here: that would correspond to revealing the path from one side to the other -- i.e., giving away the secret permutation. But again, it's obviously much less clear that this is not a valid solution in the illustrative example, which is a limitation of this example (and one of the reasons why I do not like it much).

From NieDzejkob:

"The intuition between this definition is that if the transcript of the interaction cannot be distinguished from something that could have been produced without even knowing the secret value, then it cannot convey anything useful about this value to the verifier". The prover establishes a TLS tunnel with the verifier, and sends the secret through it. Nobody can learn anything from the transcript, and yet this will clearly let the verifier learn the secret. Am I missing something?

A comment related to my footnote (2): actually, the transcript guarantees that nothing leaks not only to external people, but also to the verifier himself, if it is indistinguishable from a transcript that could have been produced as the result of an interaction with this verifier. In the proof I've given, the transcript was simulated assuming that the verifier samples the challenge $e$ honestly, which he might not do in reality. Hence, the proof I've given does in fact only show that the protocol is zero-knowledge against verifier that honestly sample $e$ at random. But general techniques exist to transform this protocol into one that can actually be proven zero-knowledge against arbitrary verifiers, even cheating ones.

Now, back to your question: when doing so, the simulated transcript would not look indistinguishable from an honest transcript from the viewpoint of the verifier himself, since the simulated transcript would not include this TLS channel transmitting the secret value. An actual proof of zero-knowledge property must produce a simulated transcript that the verifier himself cannot distinguish from a transcript that could have been produced as a result of an interaction with himself. Hence my previous comment on the fact that, for simplicity, what I had in fact proven previously does only guarantee zero-knowledge against verifiers who sample their challenge $e$ honestly, since it's how it's done in the simulated transcript. But as I said, there are methods to simulate against arbitrary verifiers.

Wow! I get it now :D Thank you for explaining what was wrong with my line of thinking, I couldn’t figure it out. In your example about “why does this prove knowledge of x?”, I think you meant to say “given the code of a prover”, since it is there that you are putting breakpoints, unless I am looking at it wrong. — May 29 at 6:03
Perhaps my understanding of the matter is not good enough, but I am intrigued by the question asked "Why can Alice not simply enter one end of the tunnel and come out the other?". As far as I can tell, this is not directly answered. — May 29 at 12:51
Right, I did not address this point, I will edit in this regard later on today. — May 29 at 12:54
@MechMK1 The comment by Roman Odaisky on the question answers this. It's also slightly related to the sentence Therefore, interacting with the prover in this protocol does not convey any information at all about 𝑥 in the answer. The way I understand it, is that a zero-knowledge proof is only usable for the verifier, and cannot be passed to other people. But if that is correct, how can there be non-interactive zero-knowledge proofs? :thinking: — May 29 at 14:37

搜尋此網誌

Otdfbt

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

1 Answer
1

1 Answer
1

1 Answer
1