Cannot create nested network namespace
It seems that one cannot create a network namespace from within another network namespace. The attempt results in "Error: Peer netns reference is invalid".
Is this a bug or is there some kind of limitation that I am not aware of?
Below is my command trace of the error.
# ip netns add foo1
# ip netns exec foo1 ip netns add foo2
# ip netns
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
foo2
foo1
# ip netns exec foo2 /bin/bash
setting the network namespace "foo2" failed: Invalid argument
linux ip linux-networking namespaces network-namespace
asked Apr 4 at 13:34
user98651
1 Answer
TL;DR: As weird as it seems, this is not a network namespace issue but a mount namespace issue, and it is to be expected.
You should create all new "ip netns namespaces" (see below for the meaning) from the initial (host) "ip netns namespace", i.e. run all ip netns add ... commands there, not from inside an "ip netns namespace" entered with ip netns exec .... As long as you don't create new ones from inside, you are free to switch between them at will with ip netns exec ..., including nesting such commands from one namespace to another.
Detailed explanation with step-by-step examples following...
ip netns specializes in network namespaces, but to support all its features it also has to deal with mount namespaces, for at least two reasons that I know of:
- bind mounting /etc/netns/FOO/SOMESERVICE to /etc/SOMESERVICE to manage alternate service/daemon configurations. This feature is handy for running some (network-related) daemons in another network namespace while otherwise remaining part of the "host". You can check my answer at UL on a question about it: Namespace management with ip netns (iproute2). Its use requires the same treatment as the following feature, so I won't discuss it further.
- remounting /sys to expose the new network namespace's network devices in its hierarchy. This one is a mandatory feature. Example exposing the problem:
From "initial host":
# ip link add dev dummy9 type dummy
# ip -br link show dummy9
dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
# ls -l /sys/class/net/dummy9
lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9
Using a lower-level tool to change to another (ephemeral) network namespace:
# unshare --net ip -br link show dummy9
Device "dummy9" does not exist.
# unshare --net ls -l /sys/class/net/dummy9
lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9
And that's the issue: /sys still exposes the initial host's interfaces instead of the new network namespace's interfaces. This is where network namespaces interact with mounting /sys: if /sys is mounted from within the new network namespace, it switches to exposing the new namespace's network interfaces in select directory hierarchies (e.g. /sys/class/net and /sys/devices/virtual/net). This is done at mount time only, not dynamically. Some advanced network settings are easily available just by reading or writing there, so they have to be provided; conversely, the isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.
So ip netns exec FOO ... (but not ip netns add FOO) solves this by also unsharing the mount namespace and remounting /sys inside it, to avoid disrupting the initial host. What matters is that this mount namespace is itself ephemeral: when you run two separate ip netns exec FOO ... commands, they don't end up in the same mount namespace. Each gets its own, with /sys remounted there and pointing to the same network namespace.
Up to this point, no problem. From now on I'll call this combination an "ip netns namespace", since two types of namespaces are involved. We have so far:
term1:
# ip netns add FOO
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
# ip netns exec FOO bash
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]
term2:
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
# ip netns exec FOO bash
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]
Note how, after changing ip netns namespaces, the new network namespace is the same for term1 and term2, while the new mount namespaces differ from each other (and from the initial host's).
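The same comparison can be scripted instead of read off ls output. This is only a minimal sketch, assuming a Linux /proc and a POSIX shell; the helper names ns_id and same_ns are hypothetical and not part of iproute2:

```shell
#!/bin/sh
# Hypothetical helpers (not part of iproute2): compare the namespaces of
# two processes by reading the /proc/PID/ns/TYPE symlinks.

ns_id() {
    # $1 = PID, $2 = namespace type (net, mnt, ...)
    # Prints something like "net:[4026531992]"; fails if it cannot be read.
    readlink "/proc/$1/ns/$2"
}

same_ns() {
    # $1, $2 = PIDs, $3 = namespace type
    # Returns 0 if both PIDs share the namespace, 1 if not, 2 on read error.
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}
```

With the PIDs from the transcript above, same_ns 1864 1866 net would be expected to succeed while same_ns 1864 1866 mnt would fail. Reading the ns entries of another user's process generally requires root.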
Now what happens when in term1 you create a new ip netns namespace? Let's see:
term1:
# ip netns add BAR
# ip netns ls
BAR
FOO
term2:
# ip netns ls
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
BAR
FOO
That's because the newer namespace BAR, in order to keep existing without a process, is, like the others, mounted on the newly created empty file /var/run/netns/BAR (again, see the previous link for examples). While the mount namespaces are different, they have the same root directory: the initial host's root. So of course this newly created empty file /var/run/netns/BAR could be seen everywhere (initial, term1's mount ns, term2's mount ns) when it was created.
Alas, the mount over it, having been done in term1's FOO mount namespace, can only be seen in term1, not in term2 nor anywhere else, because those are different mount namespaces. So in term1 (in its FOO ip netns namespace), /var/run/netns/BAR is a pseudo-file belonging to the nsfs pseudo-filesystem:
term1:
# stat -f -c %T /var/run/netns/BAR
nsfs
Anywhere else, it's an empty file on tmpfs (from the actual /run mount):
term2:
# stat -f -c %T /var/run/netns/BAR
tmpfs
Any other terminal:
$ stat -f -c %T /var/run/netns/BAR
tmpfs
It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". If one switches ip netns namespaces again from term1, it will still be fine, because the new unshared ephemeral mount namespace is a copy of the previous one, including all its mounts.
Once exited, that mount point is lost (which means that if no processes or file descriptors are using it anymore, BAR's corresponding network namespace disappears, because it was held only by this mount point). After this, any ip netns ls command will complain, anywhere. You can just remove the stale and now useless file /run/netns/BAR to fix it.
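Finding such stale files can be sketched with the same stat call used above. This is an illustration only: the function name list_stale_netns is hypothetical, and it assumes a GNU stat that reports nsfs for a live namespace mount, as in the transcripts above:

```shell
#!/bin/sh
# Hypothetical helper: print the names under DIR (default /run/netns) that
# are NOT backed by an nsfs mount in the current mount namespace, i.e. the
# entries "ip netns ls" is likely to complain about.
list_stale_netns() {
    dir=${1:-/run/netns}
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$f" ] || continue
        fstype=$(stat -f -c %T "$f" 2>/dev/null) || continue
        [ "$fstype" = nsfs ] || printf '%s\n' "${f##*/}"
    done
}
```

Run from the initial namespace after the scenario above, list_stale_netns should print BAR, which can then be removed with rm /run/netns/BAR. Run from inside term1's FOO environment it should print nothing, since the mount is still visible there.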
After this step-by-step explanation, the thing to remember is that you shouldn't create new namespaces with ip netns add inside a namespace currently entered with ip netns exec. Create them all from the initial (host) namespace; then you can switch between them at will from any ip netns namespace.
Of course, if /var/run/netns/ (i.e. the mount point /run) is distinct between (staying fuzzy) namespaces, then there is no interaction, and each ip netns invocation is isolated from the others, neither seeing nor interacting with them. Where does this usually happen? In full containers, where both the mount and the network namespaces are separate and point to distinct resources from the start.
UPDATE: as asked in the comments, I looked into how to "repair" this problem, but couldn't find any easy solution.
First, there's a prerequisite: as explained above, once the new "ip netns" namespace BAR is created inside FOO and FOO is left, the only reference to BAR disappears, making BAR disappear too. Something more is needed.
Actually there are three ways to keep a reference to a namespace:
- process: that's the main method, and most of the time that's how a namespace is used at all
- mount point (that's the method used by ip netns): allows keeping a namespace without any process, which is fine for a namespace that only holds network settings (interfaces, bridges, tc rules, firewall rules, ...)
- open file descriptor: rare; used when creating namespaces but seldom kept, except by applications dealing with multiple namespaces at the same time and switching some of their threads between them, using the file descriptor for easy reference.
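The open file descriptor method can be illustrated from a shell. This is a minimal sketch, assuming a Linux /proc; the function names hold_netns_fd and release_netns_fd are hypothetical, and fd 3 is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helpers: keep a reference to a network namespace by holding
# an open file descriptor on its /proc/PID/ns/net entry.

hold_netns_fd() {
    # $1 = PID whose network namespace should be pinned (default: this shell).
    # The namespace stays referenced for as long as fd 3 remains open.
    exec 3< "/proc/${1:-$$}/ns/net"
}

release_netns_fd() {
    # Close fd 3, dropping the reference taken by hold_netns_fd.
    exec 3<&-
}
```

While the descriptor is open, the namespace should remain reachable through /proc/PID/fd/3 of the holding shell, for example with nsenter --net=/proc/PID/fd/3, even if no process runs inside it and nothing is mounted on it. The reference goes away as soon as the holding process exits or closes the descriptor.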
We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...
As explained before, this won't work:
# ip netns add FOO
# ip netns exec FOO ip netns add BAR
The idea is to leave a process running temporarily in the first "ip netns" namespace, so that its ephemeral mount namespace keeps the needed reference to the new "ip netns" namespace's network namespace, and to reuse it later from outside (from the initial namespace).
This won't work either:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
28344
# strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
readlink("/var/run", "/run", 4095) = 4
mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
+++ exited with 0 +++
# stat -f -c %T /run/netns/BAR
tmpfs
As seen with strace, the mount command followed the symlink when it shouldn't have for this use case (note: the mount is still somehow linked to the sleep process, which has to be killed in order to unmount it).
This (entering sleep's mount namespace to access BAR's mounted network namespace hidden there) works, but relies on the continued existence of sleep, or of any process, for continued use:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
12916
# nsenter --target=12916 --mount ip -n BAR -brief link show
lo DOWN 00:00:00:00:00:00 <LOOPBACK>
dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>
Strangely, this (using the mount namespace shortcut /proc/pid/root/) doesn't work (I don't really know why):
# stat -f -c %T /proc/12916/root/var/run/netns/BAR
tmpfs
Finally, what does work:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '\''sleep 999 & echo $!'\'
14124
# mount --bind /proc/14124/ns/net /var/run/netns/BAR
# ip -n BAR -brief link show
lo DOWN 00:00:00:00:00:00 <LOOPBACK>
dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>
So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.
# ip netns add FOO
# mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '\''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'\')/ns/net /var/run/netns/BAR
How could such a construct be used? I have no idea, because the original problem that led to the nested "ip netns" problem was not given. Easier solutions may be available without ever trying to create "a nested network namespace".
Great answer, thanks. Is there a way to create a new netns safely while inside a netns, i.e. ip netns exec foo1 /bin/bash ... ip netns exec <something> ip netns add foo2?
– user98651
2 days ago
It appears much more difficult than it seemed, and I don't see how to use the result in an actual use case. Perhaps you should ask another question, about the original problem which forced you to try creating "nested network namespaces". Anyway, I'm updating the answer.
– A.B
2 days ago
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "2"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f961504%2fcannot-create-nested-network-namespace%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
TL;DR: As weird as it seems, this is actually not a network namespace issue, but a mount namespace issue and is to be expected.
You should create all new "ip netns namespaces" (see later for the meaning), i.e. run all ip netns add ...
commands from the initial (host) "ip netns namespace", not from inside an "ip netns namespace" having been entered with ip netns exec ...
. As long as you don't create them you're then free to switch between them at will including nesting commands from one to an other, with ip netns exec ...
.
Detailed explanation with step-by-step examples following...
ip netns
is specialized on network namespaces, but to handle all features, has also to mingle with mount namespaces for two reasons (at least, that I know of):
bind mounting
/etc/netns/FOO/SOMESERVICE
to/etc/SOMESERVICE
to manage alternate service/daemon configurationsA feature which can be handy to easily run some (network related) daemons in an other network namespace but beside this being still part of the "host". You can check my answer at UL on a question about it there: Namespace management with ip netns (iproute2). Its use requires the same treatment as the following feature, so I won't talk about it anymore.
remounting
/sys
to expose new network namespace's network devices in its hierarchyThis one is a mandatory feature. Example exposing the problem:
From "initial host":
# ip link add dev dummy9 type dummy
# ip -br link show dummy9
dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
# ls -l /sys/class/net/dummy9
lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9Using a lower level tool to change to an other (ephemeral) network namespace:
# unshare --net ip -br link show dummy9
Device "dummy9" does not exist.
# unshare --net ls -l /sys/class/net/dummy9
lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9And that's the issue:
/sys
still exposes initial host's interfaces instead of the new network namespace's interface. That's where there is an interaction between network namespace and with mounting/sys
: if/sys
is mounted from the new network namespace, it will switch to exposing the new network interfaces in select directory hierarchies (eg/sys/class/net
and/sys/devices/virtual/net
). This is done at mount time only, not dynamically. Some advanced network settings are easily available by just reading or writing there, so they have to be provided, and the reverse is true: the isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.
So ip netns exec FOO ...
(but not ip netns add FOO
) solves this by also unsharing the mount namespace and remounting /sys/
inside it, to not disrupt initial host's network namespace. But what is important is that this mount namespace is itself ephemeral: when you run separately two ip netns exec FOO ...
commands, they don't end up in the same mount namespace. They each have their own, with /sys
remounted there pointing to the same network namespace.
Until now, no problem. I'll call this an "ip netns namespace" when this happened since there are now two types of namespaces involved. We have so far:
term1:
# ip netns add FOO
# ls -l /proc/$$/ns/mnt,net
lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
# ip netns exec FOO bash
# ls -l /proc/$$/ns/mnt,net
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]
term2:
# ls -l /proc/$$/ns/mnt,net
lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
# ip netns exec FOO bash
# ls -l /proc/$$/ns/mnt,net
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]
Note how after changing ip netns namespaces, while the new network namespace is the same for term1 and term2, the new mount namespaces are different from each others (and from initial host).
Now what happens when in term1 you create a new ip netns namespace? Let's see:
term1:
# ip netns add BAR
# ip netns ls
BAR
FOO
term2:
# ip netns ls
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
BAR
FOO
That's because the newer namespace BAR, to be kept existing without a process, is, as others, mounted on (the newly created empty file) /var/run/netns/BAR
(again, see previous link for examples). While the mount namespaces are different, they have the same root directory: initial host's root. So of course this newly created empty file /var/run/netns/BAR
could be seen everywhere (initial, term1's mount ns, term2's mount ns) when it was created.
Alas, the mount over it, being done on term1's FOO's mount namespace, can only be seen on term1, not on term2 nor anywhere else, because it's a different mount namespace. So while in term1 ('s FOO ip netns namespace) /var/run/netns/BAR
is a pseudo-file belonging to the nsfs
pseudo-filesystem:
term1:
# stat -f -c %T /var/run/netns/BAR
nsfs
It's an empty file on tmpfs
(from the actual /run
mount) anywhere else:
term2:
# stat -f -c %T /var/run/netns/BAR
tmpfs
Any other terminal:
$ stat -f -c %T /var/run/netns/BAR
tmpfs
It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". If from term1 one still switches ip netns namespaces , it will still be fine, because the new unshared ephemeral mount namespace is a copy of the previous, including all the mounts.
If exited, that mount point is lost (and that means if there are no processes or file descriptors using it anymore, BAR's corresponding network namespace will disappear because it was held only by this mount point). After this any ip netns ls
command will complain, anywhere. You can just remove the stale and now useless file /run/netns/BAR
to fix it.
After this step-by-step explanation, what to remember is that you shouldn't create new namespaces with ip netns add
inside a namespace currently entered with ip netns exec
. You should create them all from the initial (host) namespace, then you can switch at will between them from any ip netns namespace.
Of course, if /var/run/netns/
(i.e. the mount point /run
) is distinct between (staying fuzzy) namespaces, then there is no interaction, and each ip netns
invocation will be isolated from others, not seing nor interacting with others. Where does this usually happen? In full containers, where both the mount and the network namespaces are separated and point to distinct resources from the start.
UPDATE: as asked in comments, I checked how to "repair" this problem, but couldn't find any easy solution.
First there's a prerequisite: as told above, once the new "ip netns" namespace BAR is created inside FOO, and FOO is left, the only reference to BAR will disappear, thus making BAR also disappear. Something more is needed.
Actually there are three ways to keep a reference to a namespace:
- process: that's the main method, and most of the time that's how the namespace is used at all
- mount point (that's the method used by
ip netns
): allows to keep a namespace without any process, fine to have a namespace with only network settings inside (interfaces, bridges, tc rules, firewall rules, ...) - open file descriptor: rare, used when creating the namespaces, but seldom kept, except for applications dealing with multiple namespaces at the same time and switching some of their threads using the file descriptor for easy reference.
We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...
As told before, won't work:
# ip netns add FOO
# ip netns exec FOO ip netns add BAR
Just leave a process running temporarily in the first "ip netns" namespace, for its ephemeral mount namespace part, to keep the needed reference to the new "ip netns" namespace's network namespace and reuse it later from outside (from the initial namespace).
Won't work either:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
28344
# strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
readlink("/var/run", "/run", 4095) = 4
mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
+++ exited with 0 +++
# stat -f -c %T /run/netns/BAR
tmpfs
As seen with strace
the mount
command followed the symlink when it shouldn't have for this use case (note: the mount is still linked to the sleep process somehow which has to be killed to unmount it).
This (entering sleep
's mount namespace, to access the BAR's mounted network namespace hidden there) works but relies on the continued existence of sleep
or any process for continued use:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
12916
# nsenter --target=12916 --mount ip -n -brief BAR link show
lo DOWN 00:00:00:00:00:00 <LOOPBACK>
dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>
strangely this (using the mount namespace shortcut /proc/pid/root/
) doesn't work (I don't really know why):
# stat -f -c %T /proc/12916/root/var/run/netns/BAR
tmpfs
Finally what will work:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '''sleep 999 & echo $!''
14124
# mount --bind /proc/14124/ns/net /var/run/netns/BAR
# ip -n BAR -brief link show
lo DOWN 00:00:00:00:00:00 <LOOPBACK>
dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>
So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.
# ip netns add FOO
# mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'')/ns/net /var/run/netns/BAR
How could such a construct be used? I have no idea because the original problem before encountering the nested "ip netns" problem was not given. Maybe easier solutions are available without ever trying to create "a nested network namespace".
Great answer, thanks. Is there a way to create a new netns safely while inside a netfs. i.e ip netfs exec foo1 /bin/bash.... ip netns exec <something> ip netns add foo2?
– user98651
2 days ago
It appears much more difficult than it seemed, and I don't see how to use the result in an actual use case. Perhaps you should ask an other question, about the original problem which forced you to try creating "nested network namespaces". Anyway I'm updating the answer.
– A.B
2 days ago
add a comment |
TL;DR: As weird as it seems, this is actually not a network namespace issue, but a mount namespace issue and is to be expected.
You should create all new "ip netns namespaces" (see later for the meaning), i.e. run all ip netns add ...
commands from the initial (host) "ip netns namespace", not from inside an "ip netns namespace" having been entered with ip netns exec ...
. As long as you don't create them you're then free to switch between them at will including nesting commands from one to an other, with ip netns exec ...
.
Detailed explanation with step-by-step examples following...
ip netns
is specialized on network namespaces, but to handle all features, has also to mingle with mount namespaces for two reasons (at least, that I know of):
bind mounting
/etc/netns/FOO/SOMESERVICE
to/etc/SOMESERVICE
to manage alternate service/daemon configurationsA feature which can be handy to easily run some (network related) daemons in an other network namespace but beside this being still part of the "host". You can check my answer at UL on a question about it there: Namespace management with ip netns (iproute2). Its use requires the same treatment as the following feature, so I won't talk about it anymore.
remounting
/sys
to expose new network namespace's network devices in its hierarchyThis one is a mandatory feature. Example exposing the problem:
From "initial host":
# ip link add dev dummy9 type dummy
# ip -br link show dummy9
dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
# ls -l /sys/class/net/dummy9
lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9Using a lower level tool to change to an other (ephemeral) network namespace:
# unshare --net ip -br link show dummy9
Device "dummy9" does not exist.
# unshare --net ls -l /sys/class/net/dummy9
lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9And that's the issue:
/sys
still exposes initial host's interfaces instead of the new network namespace's interface. That's where there is an interaction between network namespace and with mounting/sys
: if/sys
is mounted from the new network namespace, it will switch to exposing the new network interfaces in select directory hierarchies (eg/sys/class/net
and/sys/devices/virtual/net
). This is done at mount time only, not dynamically. Some advanced network settings are easily available by just reading or writing there, so they have to be provided, and the reverse is true: the isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.
So ip netns exec FOO ...
(but not ip netns add FOO
) solves this by also unsharing the mount namespace and remounting /sys/
inside it, to not disrupt initial host's network namespace. But what is important is that this mount namespace is itself ephemeral: when you run separately two ip netns exec FOO ...
commands, they don't end up in the same mount namespace. They each have their own, with /sys
remounted there pointing to the same network namespace.
Until now, no problem. I'll call this an "ip netns namespace" when this happened since there are now two types of namespaces involved. We have so far:
term1:
# ip netns add FOO
# ls -l /proc/$$/ns/mnt,net
lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
# ip netns exec FOO bash
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]
term2:
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
# ip netns exec FOO bash
# ls -l /proc/$$/ns/{mnt,net}
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]
Note how, after changing ip netns namespaces, the new network namespace is the same for term1 and term2, while the new mount namespaces differ from each other (and from the initial host's).
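The comparison done by eye above can also be scripted. A minimal sketch, where same_ns is a hypothetical helper name; reading the /proc/PID/ns entries of other users' processes may require root.

```shell
# Hypothetical helper: same_ns TYPE PID1 PID2
# Returns 0 if both processes share the namespace of the given type
# (e.g. mnt or net), 1 if they differ, 2 if a link could not be read.
same_ns() {
    a=$(readlink "/proc/$2/ns/$1") || return 2
    b=$(readlink "/proc/$3/ns/$1") || return 2
    [ "$a" = "$b" ]
}
```

With the PIDs from the transcript, same_ns net 1864 1866 would succeed while same_ns mnt 1864 1866 would fail.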
Now what happens when in term1 you create a new ip netns namespace? Let's see:
term1:
# ip netns add BAR
# ip netns ls
BAR
FOO
term2:
# ip netns ls
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
BAR
FOO
That's because the newer namespace BAR, to be kept in existence without a process, is, like the others, mounted on the newly created empty file /var/run/netns/BAR (again, see the previous link for examples). While the mount namespaces are different, they have the same root directory: the initial host's root. So of course this newly created empty file /var/run/netns/BAR could be seen everywhere (initial, term1's mount ns, term2's mount ns) as soon as it was created.
Alas, the mount over it, being done in term1's FOO mount namespace, can only be seen in term1, not in term2 nor anywhere else, because those are different mount namespaces. So in term1 (in its FOO ip netns namespace), /var/run/netns/BAR is a pseudo-file belonging to the nsfs pseudo-filesystem:
term1:
# stat -f -c %T /var/run/netns/BAR
nsfs
Anywhere else, it's just an empty file on tmpfs (from the actual /run mount):
term2:
# stat -f -c %T /var/run/netns/BAR
tmpfs
Any other terminal:
$ stat -f -c %T /var/run/netns/BAR
tmpfs
It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". If one switches ip netns namespaces again from term1, it will still be fine, because the new unshared ephemeral mount namespace is a copy of the previous one, including all its mounts.
Once exited, that mount point is lost (and if no processes or file descriptors are using it anymore, BAR's corresponding network namespace disappears, because it was held only by this mount point). After this, any ip netns ls command will complain, anywhere. You can just remove the stale and now useless file /run/netns/BAR to fix it.
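Finding such stale entries can be automated with the same stat check used earlier. This is a sketch under assumptions: list_stale_netns is a hypothetical name, stat -f -c %T is the GNU coreutils syntax, and the result reflects only the mount namespace it is run from, so it is meant to be run from the initial (host) namespace.

```shell
# Hypothetical helper: print entries of /run/netns (or the directory given
# as $1) that are not backed by nsfs, i.e. plain files with no namespace
# mounted on them in the current mount namespace.
list_stale_netns() {
    dir=${1:-/run/netns}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue                      # empty directory
        fstype=$(stat -f -c %T "$f") || continue
        [ "$fstype" = nsfs ] || printf '%s\n' "$f"
    done
}
```

The printed paths are candidates for removal; it only lists them and deletes nothing.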
After this step-by-step explanation, the thing to remember is that you shouldn't create new namespaces with ip netns add inside a namespace currently entered with ip netns exec. Create them all from the initial (host) namespace; then you can switch at will between them from any ip netns namespace.
Of course, if /var/run/netns/ (i.e. the mount point /run) is distinct between (staying fuzzy) namespaces, then there is no interaction, and each ip netns invocation will be isolated from the others, neither seeing nor interacting with them. Where does this usually happen? In full containers, where both the mount and the network namespaces are separated and point to distinct resources from the start.
UPDATE: as asked in the comments, I checked how to "repair" this problem, but couldn't find any easy solution.
First there's a prerequisite: as explained above, once the new "ip netns" namespace BAR is created inside FOO, and FOO is left, the only reference to BAR will disappear, thus making BAR also disappear. Something more is needed.
Actually there are three ways to keep a reference to a namespace:
- process: that's the main method, and most of the time that's how the namespace is used at all
- mount point (that's the method used by ip netns): allows keeping a namespace without any process, fine for a namespace with only network settings inside (interfaces, bridges, tc rules, firewall rules, ...)
- open file descriptor: rare, used when creating the namespaces, but seldom kept, except by applications dealing with multiple namespaces at the same time and switching some of their threads using the file descriptor for easy reference.
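The third method in the list above can be illustrated from a plain shell. A minimal sketch, assuming a Linux /proc; the function name pin_netns_fd and the choice of file descriptor 9 are arbitrary assumptions for the example.

```shell
# Hypothetical helper: keep the network namespace of PID (default: this
# shell) referenced through file descriptor 9 of the current shell.
pin_netns_fd() {
    exec 9< "/proc/${1:-$$}/ns/net"
}
```

After calling pin_netns_fd, readlink /proc/$$/fd/9 shows which namespace is being held; it stays referenced until the descriptor is closed with exec 9<&- or the shell exits.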
We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...
As explained before, this won't work:
# ip netns add FOO
# ip netns exec FOO ip netns add BAR
The idea is to leave a process running temporarily in the first "ip netns" namespace, for the sake of its ephemeral mount namespace, to keep the needed reference to the new "ip netns" namespace's network namespace and reuse it later from outside (from the initial namespace).
This won't work either:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
28344
# strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
readlink("/var/run", "/run", 4095) = 4
mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
+++ exited with 0 +++
# stat -f -c %T /run/netns/BAR
tmpfs
As seen with strace, the mount command followed the symlink when it shouldn't have for this use case (note: the mount is still somehow linked to the sleep process, which has to be killed to unmount it).
This (entering sleep's mount namespace to access BAR's mounted network namespace hidden there) works, but relies on the continued existence of sleep or some other process for continued use:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
12916
# nsenter --target=12916 --mount ip -n BAR -brief link show
lo DOWN 00:00:00:00:00:00 <LOOPBACK>
dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>
Strangely, this (using the mount namespace shortcut /proc/pid/root/) doesn't work (I don't really know why):
# stat -f -c %T /proc/12916/root/var/run/netns/BAR
tmpfs
Finally what will work:
# ip netns add FOO
# ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '\''sleep 999 & echo $!'\'
14124
# mount --bind /proc/14124/ns/net /var/run/netns/BAR
# ip -n BAR -brief link show
lo DOWN 00:00:00:00:00:00 <LOOPBACK>
dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>
So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.
# ip netns add FOO
# mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '\''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'\')/ns/net /var/run/netns/BAR
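For readability, the one-liner can be unrolled into a small function. This is only a sketch under the same assumptions as above: it must be run as root from the initial namespace, the outer namespace must already exist, and the name create_nested_netns is made up for the example. It uses sh instead of bash, relying on the background sleep surviving its non-interactive parent shell.

```shell
# Hypothetical wrapper around the one-liner: create_nested_netns OUTER INNER
create_nested_netns() {
    outer=$1 inner=$2
    # Inside OUTER: create INNER, then start a short-lived process inside
    # INNER and print its PID so the host can reach /proc/PID/ns/net.
    pid=$(ip netns exec "$outer" sh -c '
        ip netns add "$1" || exit 1
        ip netns exec "$1" sh -c "sleep 5 </dev/null >/dev/null 2>&1 & echo \$!"
    ' sh "$inner") || return 1
    case $pid in ''|*[!0-9]*) return 1 ;; esac   # expect a bare PID
    # Bind the still-referenced network namespace over the empty file
    # that "ip netns add" left visible in the host's mount namespace.
    mount --bind "/proc/$pid/ns/net" "/var/run/netns/$inner"
}
```

The bind mount has to happen before the sleep ends, otherwise the last reference to the inner namespace is gone.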
How could such a construct be used? I have no idea, because the original problem that led to the nested "ip netns" problem was not given. Maybe easier solutions are available without ever trying to create "a nested network namespace".
Great answer, thanks. Is there a way to create a new netns safely while inside a netns? i.e. ip netns exec foo1 /bin/bash .... ip netns exec <something> ip netns add foo2?
– user98651
2 days ago
It appears much more difficult than it seemed, and I don't see how to use the result in an actual use case. Perhaps you should ask another question, about the original problem which forced you to try creating "nested network namespaces". Anyway, I'm updating the answer.
– A.B
2 days ago
edited 2 days ago
answered Apr 4 at 21:43
A.B