


Cannot create nested network namespace


It seems that one cannot create a network namespace from inside a network namespace. Doing so results in "Error: Peer netns reference is invalid.".



Is this a bug or is there some kind of limitation that I am not aware of?



Below is my cmd trace of the error.



# ip netns add foo1
# ip netns exec foo1 ip netns add foo2
# ip netns
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
foo2
foo1
# ip netns exec foo2 /bin/bash
setting the network namespace "foo2" failed: Invalid argument

















      linux ip linux-networking namespaces network-namespace
















      asked Apr 4 at 13:34









      user98651





















          1 Answer














          TL;DR: As weird as it seems, this is actually not a network namespace issue, but a mount namespace issue and is to be expected.



          Create all new "ip netns namespaces" (the term is explained below) from the initial (host) "ip netns namespace": that is, run every ip netns add ... command from there, not from inside an "ip netns namespace" entered with ip netns exec .... As long as you are only using them, not creating them, you are free to switch between them at will with ip netns exec ..., including nesting commands from one into another.



          A detailed explanation with step-by-step examples follows.




          ip netns specializes in network namespaces, but to support all of its features it also has to deal with mount namespaces, for at least two reasons that I know of:




          • bind mounting /etc/netns/FOO/SOMESERVICE to /etc/SOMESERVICE to manage alternate service/daemon configurations



            This feature is handy for running some (network-related) daemons in another network namespace while otherwise remaining part of the "host". See my answer on Unix & Linux (UL) to a question about it: Namespace management with ip netns (iproute2). Using it requires the same treatment as the next feature, so I won't discuss it further.




          • remounting /sys to expose new network namespace's network devices in its hierarchy



            This one is a mandatory feature. Example exposing the problem:



            From "initial host":



            # ip link add dev dummy9 type dummy
            # ip -br link show dummy9
            dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
            # ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            Using a lower-level tool to switch to another (ephemeral) network namespace:



            # unshare --net ip -br link show dummy9 
            Device "dummy9" does not exist.
            # unshare --net ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            And that's the issue: /sys still exposes the initial host's interfaces instead of the new network namespace's interfaces. This is where network namespaces interact with mounting /sys: if /sys is mounted from within the new network namespace, it exposes the new namespace's network interfaces in select directory hierarchies (e.g. /sys/class/net and /sys/devices/virtual/net). This is decided at mount time only, not dynamically. Some advanced network settings are easily available just by reading or writing there, so they have to be provided. The reverse also holds: isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.



          So ip netns exec FOO ... (but not ip netns add FOO) solves this by also unsharing the mount namespace and remounting /sys/ inside it, so as not to disrupt the initial host. What matters here is that this mount namespace is itself ephemeral: when you run two separate ip netns exec FOO ... commands, they don't end up in the same mount namespace. Each gets its own, with /sys remounted there and pointing to the same network namespace.



          So far, no problem. Since two types of namespaces are now involved, I'll call the result an "ip netns namespace". Here is what we have so far:



          term1:



          # ip netns add FOO
          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]


          term2:



          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]


          Note how, after changing ip netns namespaces, the new network namespace is the same for term1 and term2, while the new mount namespaces differ from each other (and from the initial host's).
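The comparison above can be scripted. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names ns_id and same_ns are made up for this illustration and are not part of iproute2. Reading another process's /proc/PID/ns/ links generally requires root or ownership of that process.

```shell
#!/bin/sh
# ns_id TYPE PID: print the namespace identifier (for example
# "net:[4026531992]") of the given type ("mnt", "net", ...) for process PID.
ns_id() {
    readlink "/proc/$2/ns/$1"
}

# same_ns TYPE PID1 PID2: succeed if both processes are in the same
# namespace of the given type; return 2 if either link cannot be read.
same_ns() {
    a=$(ns_id "$1" "$2") || return 2
    b=$(ns_id "$1" "$3") || return 2
    [ "$a" = "$b" ]
}
```

With the PIDs from the traces above, same_ns net 1864 1866 would succeed, while same_ns mnt 1864 1866 would fail.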



          Now what happens when in term1 you create a new ip netns namespace? Let's see:



          term1:



          # ip netns add BAR
          # ip netns ls
          BAR
          FOO


          term2:



          # ip netns ls
          Error: Peer netns reference is invalid.
          Error: Peer netns reference is invalid.
          BAR
          FOO


          That's because the newer namespace BAR, like the others, is kept alive without a process by being mounted on the newly created empty file /var/run/netns/BAR (again, see the previous link for examples). While the mount namespaces are different, they share the same root directory: the initial host's root. So this newly created empty file /var/run/netns/BAR was visible everywhere (initial host, term1's mount namespace, term2's mount namespace) as soon as it was created.



          Alas, the mount over it was done in the mount namespace of term1's FOO, so it can only be seen in term1, not in term2 or anywhere else, because those are different mount namespaces. So in term1 (in its FOO ip netns namespace), /var/run/netns/BAR is a pseudo-file belonging to the nsfs pseudo-filesystem:



          term1:



          # stat -f -c %T /var/run/netns/BAR
          nsfs


          It's an empty file on tmpfs (from the actual /run mount) anywhere else:



          term2:



          # stat -f -c %T /var/run/netns/BAR
          tmpfs


          Any other terminal:



          $ stat -f -c %T /var/run/netns/BAR
          tmpfs


          It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". Switching to yet another ip netns namespace from term1 is also fine, because the newly unshared ephemeral mount namespace is a copy of the previous one, including all its mounts.



          Once it is exited, that mount point is lost (which means that if no processes or file descriptors are using it anymore, BAR's network namespace disappears, because it was held only by this mount point). After this, any ip netns ls command will complain, anywhere. You can just remove the stale and now useless file /run/netns/BAR to fix it.
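Finding such stale entries can be automated with the same stat check used above. This is only a sketch under assumptions: the function name list_stale_netns is hypothetical, the directory defaults to /run/netns, and stat is assumed to be the GNU coreutils version (for -f -c %T). It prints names and removes nothing.

```shell
#!/bin/sh
# list_stale_netns [DIR]: print the name of every entry in DIR (default
# /run/netns) that is NOT a mounted namespace, i.e. whose filesystem type,
# as seen from the current mount namespace, is something other than nsfs.
list_stale_netns() {
    dir=${1:-/run/netns}
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$f" ] || continue
        fstype=$(stat -f -c %T "$f" 2>/dev/null) || continue
        if [ "$fstype" != "nsfs" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}
```

Keep in mind that the result depends on the mount namespace it runs in: from inside term1's FOO, BAR would not be listed, whereas from the initial host it would.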



          After this step-by-step explanation, what to remember is that you shouldn't create new namespaces with ip netns add inside a namespace currently entered with ip netns exec. You should create them all from the initial (host) namespace, then you can switch at will between them from any ip netns namespace.



          Of course, if /var/run/netns/ (i.e. the mount point /run) is distinct between (staying fuzzy) namespaces, then there is no interaction, and each ip netns invocation is isolated from the others, neither seeing nor interacting with them. Where does this usually happen? In full containers, where both the mount and the network namespaces are separated and point to distinct resources from the start.




          UPDATE: as asked in comments, I checked how to "repair" this problem, but couldn't find any easy solution.



          First there's a prerequisite: as explained above, once the new "ip netns" namespace BAR is created inside FOO and FOO is left, the only reference to BAR disappears, making BAR disappear as well. Something more is needed.



          Actually there are three ways to keep a reference to a namespace:



          • process: that's the main method, and most of the time that's how the namespace is used at all

          • mount point (that's the method used by ip netns): allows keeping a namespace without any process, which is fine for a namespace with only network settings inside (interfaces, bridges, tc rules, firewall rules, ...)

          • open file descriptor: rare, used when creating the namespaces, but seldom kept, except for applications dealing with multiple namespaces at the same time and switching some of their threads using the file descriptor for easy reference.
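The third method can be observed without root by holding a descriptor on the shell's own network namespace. A minimal sketch, assuming a Linux /proc filesystem; the function names hold_netns, held_netns_id and release_netns, and the choice of descriptor 9, are arbitrary illustrations.

```shell
#!/bin/sh
# hold_netns PID: open descriptor 9 on PID's network namespace and keep it
# open in the current shell. While it stays open, it holds a reference to
# that namespace, independently of any process or mount point.
hold_netns() {
    exec 9< "/proc/$1/ns/net"
}

# held_netns_id: print which namespace descriptor 9 refers to, in the same
# "net:[inode]" form as the /proc/PID/ns/net link.
held_netns_id() {
    readlink "/proc/$$/fd/9"
}

# release_netns: close descriptor 9, dropping the reference.
release_netns() {
    exec 9<&-
}
```

Calling hold_netns "$$" followed by held_netns_id should print the same identifier as readlink /proc/$$/ns/net.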

          We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...



          As explained before, this won't work:



          # ip netns add FOO
          # ip netns exec FOO ip netns add BAR


          The idea: leave a process running temporarily in the first "ip netns" namespace, so that its ephemeral mount namespace keeps the needed reference to the new "ip netns" namespace's network namespace, and reuse it later from outside (from the initial namespace).



          This won't work either:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
          28344
          # strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
          readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
          readlink("/var/run", "/run", 4095) = 4
          mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
          +++ exited with 0 +++
          # stat -f -c %T /run/netns/BAR
          tmpfs


          As seen with strace the mount command followed the symlink when it shouldn't have for this use case (note: the mount is still linked to the sleep process somehow which has to be killed to unmount it).



          This (entering sleep's mount namespace, to access the BAR's mounted network namespace hidden there) works but relies on the continued existence of sleep or any process for continued use:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
          12916
          # nsenter --target=12916 --mount ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>


          Strangely, this (using the mount namespace shortcut /proc/pid/root/) doesn't work (I don't really know why):



          # stat -f -c %T /proc/12916/root/var/run/netns/BAR 
          tmpfs


          Finally what will work:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '\''sleep 999 & echo $!'\'
          14124
          # mount --bind /proc/14124/ns/net /var/run/netns/BAR
          # ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>


          So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.



          # ip netns add FOO
          # mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '\''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'\')/ns/net /var/run/netns/BAR


          How could such a construct be used? I have no idea because the original problem before encountering the nested "ip netns" problem was not given. Maybe easier solutions are available without ever trying to create "a nested network namespace".































          • Great answer, thanks. Is there a way to create a new netns safely while inside a netns, i.e. ip netns exec foo1 /bin/bash.... ip netns exec <something> ip netns add foo2?

            – user98651
            2 days ago












          • It appears much more difficult than it seemed, and I don't see how to use the result in an actual use case. Perhaps you should ask another question about the original problem which led you to try creating "nested network namespaces". Anyway, I'm updating the answer.

            – A.B
            2 days ago












          Your Answer








          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "2"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f961504%2fcannot-create-nested-network-namespace%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          TL;DR: As weird as it seems, this is actually not a network namespace issue, but a mount namespace issue and is to be expected.



          You should create all new "ip netns namespaces" (see later for the meaning), i.e. run all ip netns add ... commands from the initial (host) "ip netns namespace", not from inside an "ip netns namespace" having been entered with ip netns exec .... As long as you don't create them you're then free to switch between them at will including nesting commands from one to an other, with ip netns exec ....



          Detailed explanation with step-by-step examples following...




          ip netns is specialized on network namespaces, but to handle all features, has also to mingle with mount namespaces for two reasons (at least, that I know of):




          • bind mounting /etc/netns/FOO/SOMESERVICE to /etc/SOMESERVICE to manage alternate service/daemon configurations



            A feature which can be handy to easily run some (network related) daemons in an other network namespace but beside this being still part of the "host". You can check my answer at UL on a question about it there: Namespace management with ip netns (iproute2). Its use requires the same treatment as the following feature, so I won't talk about it anymore.




          • remounting /sys to expose new network namespace's network devices in its hierarchy



            This one is a mandatory feature. Example exposing the problem:



            From "initial host":



            # ip link add dev dummy9 type dummy
            # ip -br link show dummy9
            dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
            # ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            Using a lower level tool to change to an other (ephemeral) network namespace:



            # unshare --net ip -br link show dummy9 
            Device "dummy9" does not exist.
            # unshare --net ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            And that's the issue: /sys still exposes initial host's interfaces instead of the new network namespace's interface. That's where there is an interaction between network namespace and with mounting /sys: if /sys is mounted from the new network namespace, it will switch to exposing the new network interfaces in select directory hierarchies (eg /sys/class/net and /sys/devices/virtual/net). This is done at mount time only, not dynamically. Some advanced network settings are easily available by just reading or writing there, so they have to be provided, and the reverse is true: the isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.



          So ip netns exec FOO ... (but not ip netns add FOO) solves this by also unsharing the mount namespace and remounting /sys/ inside it, to not disrupt initial host's network namespace. But what is important is that this mount namespace is itself ephemeral: when you run separately two ip netns exec FOO ... commands, they don't end up in the same mount namespace. They each have their own, with /sys remounted there pointing to the same network namespace.



          Until now, no problem. I'll call this an "ip netns namespace" when this happened since there are now two types of namespaces involved. We have so far:



          term1:



          # ip netns add FOO
          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]


          term2:



          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]


          Note how after changing ip netns namespaces, while the new network namespace is the same for term1 and term2, the new mount namespaces are different from each others (and from initial host).



          Now what happens when in term1 you create a new ip netns namespace? Let's see:



          term1:



          # ip netns add BAR
          # ip netns ls
          BAR
          FOO


          term2:



          # ip netns ls
          Error: Peer netns reference is invalid.
          Error: Peer netns reference is invalid.
          BAR
          FOO


          That's because the newer namespace BAR, to be kept existing without a process, is, as others, mounted on (the newly created empty file) /var/run/netns/BAR (again, see previous link for examples). While the mount namespaces are different, they have the same root directory: initial host's root. So of course this newly created empty file /var/run/netns/BAR could be seen everywhere (initial, term1's mount ns, term2's mount ns) when it was created.



          Alas, the mount over it, being done on term1's FOO's mount namespace, can only be seen on term1, not on term2 nor anywhere else, because it's a different mount namespace. So while in term1 ('s FOO ip netns namespace) /var/run/netns/BAR is a pseudo-file belonging to the nsfs pseudo-filesystem:



          term1:



          # stat -f -c %T /var/run/netns/BAR
          nsfs


          It's an empty file on tmpfs (from the actual /run mount) anywhere else:



          term2:



          # stat -f -c %T /var/run/netns/BAR
          tmpfs


          Any other terminal:



          $ stat -f -c %T /var/run/netns/BAR
          tmpfs


          It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". If from term1 one still switches ip netns namespaces , it will still be fine, because the new unshared ephemeral mount namespace is a copy of the previous, including all the mounts.



          If exited, that mount point is lost (and that means if there are no processes or file descriptors using it anymore, BAR's corresponding network namespace will disappear because it was held only by this mount point). After this any ip netns ls command will complain, anywhere. You can just remove the stale and now useless file /run/netns/BAR to fix it.



          After this step-by-step explanation, what to remember is that you shouldn't create new namespaces with ip netns add inside a namespace currently entered with ip netns exec. You should create them all from the initial (host) namespace, then you can switch at will between them from any ip netns namespace.



          Of course, if /var/run/netns/ (i.e. the mount point /run) is distinct between (staying fuzzy) namespaces, then there is no interaction, and each ip netns invocation will be isolated from others, not seing nor interacting with others. Where does this usually happen? In full containers, where both the mount and the network namespaces are separated and point to distinct resources from the start.




          UPDATE: as asked in comments, I checked how to "repair" this problem, but couldn't find any easy solution.



          First there's a prerequisite: as told above, once the new "ip netns" namespace BAR is created inside FOO, and FOO is left, the only reference to BAR will disappear, thus making BAR also disappear. Something more is needed.



          Actually there are three ways to keep a reference to a namespace:



          • process: that's the main method, and most of the time that's how the namespace is used at all

          • mount point (that's the method used by ip netns): allows to keep a namespace without any process, fine to have a namespace with only network settings inside (interfaces, bridges, tc rules, firewall rules, ...)

          • open file descriptor: rare, used when creating the namespaces, but seldom kept, except for applications dealing with multiple namespaces at the same time and switching some of their threads using the file descriptor for easy reference.

          We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...



          As told before, won't work:



          # ip netns add FOO
          # ip netns exec FOO ip netns add BAR


          Just leave a process running temporarily in the first "ip netns" namespace, for its ephemeral mount namespace part, to keep the needed reference to the new "ip netns" namespace's network namespace and reuse it later from outside (from the initial namespace).



          Won't work either:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
          28344
          # strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
          readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
          readlink("/var/run", "/run", 4095) = 4
          mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
          +++ exited with 0 +++
          # stat -f -c %T /run/netns/BAR
          tmpfs


          As seen with strace the mount command followed the symlink when it shouldn't have for this use case (note: the mount is still linked to the sleep process somehow which has to be killed to unmount it).



          This (entering sleep's mount namespace, to access the BAR's mounted network namespace hidden there) works but relies on the continued existence of sleep or any process for continued use:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
          12916
          # nsenter --target=12916 --mount ip -n -brief BAR link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>


          strangely this (using the mount namespace shortcut /proc/pid/root/) doesn't work (I don't really know why):



          # stat -f -c %T /proc/12916/root/var/run/netns/BAR 
          tmpfs


          Finally what will work:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '''sleep 999 & echo $!''
          14124
          # mount --bind /proc/14124/ns/net /var/run/netns/BAR
          # ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>


          So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.



          # ip netns add FOO
          # mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'')/ns/net /var/run/netns/BAR


          How could such a construct be used? I have no idea because the original problem before encountering the nested "ip netns" problem was not given. Maybe easier solutions are available without ever trying to create "a nested network namespace".































          • Great answer, thanks. Is there a way to create a new netns safely while inside a netns, i.e. ip netns exec foo1 /bin/bash ... ip netns exec <something> ip netns add foo2?

            – user98651
            2 days ago












          • It appears much more difficult than it seemed, and I don't see how to use the result in an actual use case. Perhaps you should ask another question about the original problem that led you to try creating "nested network namespaces". Anyway, I'm updating the answer.

            – A.B
            2 days ago






























          TL;DR: As weird as it seems, this is actually not a network namespace issue, but a mount namespace issue and is to be expected.



          You should create all new "ip netns namespaces" (the term is explained below) from the initial (host) "ip netns namespace": run every ip netns add ... command there, not from inside an "ip netns namespace" that was entered with ip netns exec .... As long as you aren't creating namespaces, you're free to switch between them at will with ip netns exec ..., including nesting commands from one to another.



          A detailed explanation with step-by-step examples follows.




          ip netns specializes in network namespaces, but to provide all of its features it also has to deal with mount namespaces, for (at least) two reasons that I know of:




          • bind mounting /etc/netns/FOO/SOMESERVICE to /etc/SOMESERVICE to manage alternate service/daemon configurations



            This feature is handy for running some (network-related) daemons in another network namespace while otherwise remaining part of the "host". See my answer on UL to a question about it: Namespace management with ip netns (iproute2). Its use requires the same treatment as the following feature, so I won't discuss it further.




          • remounting /sys to expose new network namespace's network devices in its hierarchy



            This one is a mandatory feature. Example exposing the problem:



            From "initial host":



            # ip link add dev dummy9 type dummy
            # ip -br link show dummy9
            dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
            # ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            Using a lower-level tool to switch to another (ephemeral) network namespace:



            # unshare --net ip -br link show dummy9 
            Device "dummy9" does not exist.
            # unshare --net ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            And that's the issue: /sys still exposes the initial host's interfaces instead of the new network namespace's interfaces. This is where network namespaces interact with mounting /sys: when /sys is mounted from within the new network namespace, it exposes that namespace's interfaces in select directory hierarchies (e.g. /sys/class/net and /sys/devices/virtual/net). This is decided at mount time only, not dynamically. Some advanced network settings are easily available just by reading or writing there, so they have to be provided; conversely, isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.



          So ip netns exec FOO ... (but not ip netns add FOO) solves this by also unsharing the mount namespace and remounting /sys/ inside it, so that the initial host's namespace is not disrupted. What matters is that this mount namespace is itself ephemeral: when you run two separate ip netns exec FOO ... commands, they don't end up in the same mount namespace. Each gets its own, with /sys remounted there and pointing to the same network namespace.
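The behaviour just described can be approximated with lower-level tools. The following is only a sketch under assumptions, not iproute2's actual code: the function name netns_exec_sketch is hypothetical, it assumes util-linux's nsenter and unshare, it must run as root, and it leaves out the /etc/netns/NAME bind mounts mentioned earlier.

```shell
#!/bin/sh
# Hypothetical approximation of `ip netns exec NAME CMD...` (illustration only).
# Usage (as root): netns_exec_sketch FOO ip -br link show
netns_exec_sketch() {
    name=$1
    shift
    # 1. nsenter joins the network namespace pinned at /run/netns/NAME.
    # 2. unshare creates a fresh mount namespace just for this command.
    # 3. /sys is detached and sysfs mounted again, so that it reflects
    #    the network namespace joined in step 1.
    nsenter --net="/run/netns/$name" \
        unshare --mount --propagation slave \
        sh -c 'umount -l /sys 2>/dev/null; mount -t sysfs sysfs /sys && exec "$@"' sh "$@"
}
```

Because the mount namespace is created anew on every call, two invocations share the network namespace but never the mounts made inside either of them, which is the property the rest of this answer relies on.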



          So far, no problem. Since two types of namespaces are now involved, I'll call the combination of a network namespace and the ephemeral mount namespace entered along with it an "ip netns namespace". This is what we have so far:



          term1:



          # ip netns add FOO
          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]


          term2:



          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/{mnt,net}
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]


          Note how, after changing ip netns namespaces, the new network namespace is the same for term1 and term2, while the new mount namespaces differ from each other (and from the initial host's).



          Now what happens when in term1 you create a new ip netns namespace? Let's see:



          term1:



          # ip netns add BAR
          # ip netns ls
          BAR
          FOO


          term2:



          # ip netns ls
          Error: Peer netns reference is invalid.
          Error: Peer netns reference is invalid.
          BAR
          FOO


          That's because the new namespace BAR, in order to keep existing without a process, is mounted (like the others) on the newly created empty file /var/run/netns/BAR (again, see the previous link for examples). Although the mount namespaces are different, they share the same root directory: the initial host's root. So this newly created empty file /var/run/netns/BAR became visible everywhere (initial host, term1's mount namespace, term2's mount namespace) as soon as it was created.



          Alas, the mount over it was done in term1's FOO mount namespace, so it can only be seen in term1, not in term2 or anywhere else, because those are different mount namespaces. So while in term1 (in its FOO ip netns namespace) /var/run/netns/BAR is a pseudo-file belonging to the nsfs pseudo-filesystem:



          term1:



          # stat -f -c %T /var/run/netns/BAR
          nsfs


          It's an empty file on tmpfs (from the actual /run mount) anywhere else:



          term2:



          # stat -f -c %T /var/run/netns/BAR
          tmpfs


          Any other terminal:



          $ stat -f -c %T /var/run/netns/BAR
          tmpfs


          It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". Switching ip netns namespaces again from term1 is still fine, because the newly unshared ephemeral mount namespace is a copy of the previous one, including all its mounts.



          Once it is exited, that mount point is lost. If no process or file descriptor is using it anymore, BAR's network namespace then disappears, because it was held only by this mount point. After this, any ip netns ls command will complain, anywhere. You can simply remove the stale and now useless file /run/netns/BAR to fix it.
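To spot such leftovers, the stat check used above can be put in a loop. This is a minimal sketch: list_stale_netns is a hypothetical helper name, and it assumes GNU stat and a kernel that exposes namespace files through nsfs, as in the transcripts above.

```shell
#!/bin/sh
# list_stale_netns [DIR]: print the entries of DIR (default /run/netns) that
# are NOT nsfs mounts as seen from the current mount namespace.
list_stale_netns() {
    dir=${1:-/run/netns}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue                   # empty directory: glob did not match
        fstype=$(stat -f -c %T "$f") || continue  # filesystem type of the entry
        [ "$fstype" = nsfs ] || printf '%s\n' "$f"
    done
}
```

An entry reported here is only stale from the point of view of the mount namespace where the function runs; as shown with term1 and term2, the same file can still be a valid nsfs mount elsewhere, so check before removing anything.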



          The takeaway from this step-by-step explanation is that you shouldn't create new namespaces with ip netns add inside a namespace currently entered with ip netns exec. Create them all from the initial (host) namespace; after that you can switch between them at will from any ip netns namespace.
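A script can enforce this rule by comparing mount namespace identifiers before calling ip netns add. The sketch below rests on assumptions: both function names are hypothetical, and PID 1 is assumed to live in the initial mount namespace (true on a plain host, not inside a container). Reading /proc/1/ns/mnt normally requires root.

```shell
#!/bin/sh
# in_initial_mntns [REF_PID]: succeed if this shell shares its mount namespace
# with REF_PID (default: 1). Returns 2 if a namespace link cannot be read.
in_initial_mntns() {
    ref=$(readlink "/proc/${1:-1}/ns/mnt") || return 2
    cur=$(readlink "/proc/$$/ns/mnt") || return 2
    [ "$ref" = "$cur" ]
}

# Hypothetical wrapper: create the namespace only from the initial mount namespace.
netns_add_checked() {
    if in_initial_mntns; then
        ip netns add "$1"
    else
        echo "refusing: not in (or cannot verify) the initial mount namespace" >&2
        return 1
    fi
}
```

Inside a shell started with ip netns exec FOO bash the two identifiers differ, as in the term1 and term2 listings, so netns_add_checked BAR would refuse instead of leaving an empty /run/netns/BAR behind for everyone else.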



          Of course, if /var/run/netns/ (i.e. the mount point /run) is distinct between (staying fuzzy) namespaces, then there is no interaction: each ip netns invocation is isolated from the others, neither seeing nor interacting with them. Where does this usually happen? In full containers, where both the mount and the network namespaces are separated and point to distinct resources from the start.




          UPDATE: as asked in comments, I checked how to "repair" this problem, but couldn't find any easy solution.



          First there's a prerequisite: as explained above, once the new "ip netns" namespace BAR has been created inside FOO and FOO has been left, the only reference to BAR disappears, and BAR disappears with it. Something more is needed.



          Actually there are three ways to keep a reference to a namespace:



          • process: that's the main method, and most of the time it's how the namespace gets used at all

          • mount point (that's the method used by ip netns): allows keeping a namespace without any process, which is fine for a namespace holding only network settings (interfaces, bridges, tc rules, firewall rules, ...)

          • open file descriptor: rare, used when creating namespaces but seldom kept, except by applications that deal with multiple namespaces at the same time and switch some of their threads between them, using the file descriptor as an easy reference.

          We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...



          As explained before, this won't work:



          # ip netns add FOO
          # ip netns exec FOO ip netns add BAR


          The idea is to leave a process running temporarily in the first "ip netns" namespace, for the sake of its ephemeral mount namespace, so that it keeps the needed reference to the new "ip netns" namespace's network namespace, and to reuse that reference later from outside (from the initial namespace).



          This won't work either:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
          28344
          # strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
          readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
          readlink("/var/run", "/run", 4095) = 4
          mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
          +++ exited with 0 +++
          # stat -f -c %T /run/netns/BAR
          tmpfs


          As strace shows, the mount command followed the symlink, which is not what this use case needs. (Note: the mount is still somehow tied to the sleep process, which has to be killed before it can be unmounted.)



          This (entering sleep's mount namespace, to access the BAR's mounted network namespace hidden there) works but relies on the continued existence of sleep or any process for continued use:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
          12916
          # nsenter --target=12916 --mount ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>


          Strangely, this (using the mount namespace shortcut /proc/pid/root/) doesn't work, and I don't really know why:



          # stat -f -c %T /proc/12916/root/var/run/netns/BAR 
          tmpfs


          Finally what will work:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '\''sleep 999 & echo $!'\'
          14124
          # mount --bind /proc/14124/ns/net /var/run/netns/BAR
          # ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>


          So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.



          # ip netns add FOO
          # mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '\''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'\')/ns/net /var/run/netns/BAR


          How could such a construct be used? I have no idea because the original problem before encountering the nested "ip netns" problem was not given. Maybe easier solutions are available without ever trying to create "a nested network namespace".





















          TL;DR: As weird as it seems, this is actually not a network namespace issue, but a mount namespace issue and is to be expected.



          You should create all new "ip netns namespaces" (see later for the meaning), i.e. run all ip netns add ... commands from the initial (host) "ip netns namespace", not from inside an "ip netns namespace" having been entered with ip netns exec .... As long as you don't create them you're then free to switch between them at will including nesting commands from one to an other, with ip netns exec ....



          Detailed explanation with step-by-step examples following...




          ip netns is specialized on network namespaces, but to handle all features, has also to mingle with mount namespaces for two reasons (at least, that I know of):




          • bind mounting /etc/netns/FOO/SOMESERVICE to /etc/SOMESERVICE to manage alternate service/daemon configurations



            A feature which can be handy to easily run some (network related) daemons in an other network namespace but beside this being still part of the "host". You can check my answer at UL on a question about it there: Namespace management with ip netns (iproute2). Its use requires the same treatment as the following feature, so I won't talk about it anymore.




          • remounting /sys to expose new network namespace's network devices in its hierarchy



            This one is a mandatory feature. Example exposing the problem:



            From "initial host":



            # ip link add dev dummy9 type dummy
            # ip -br link show dummy9
            dummy9 DOWN f6:f6:48:9c:12:b9 <BROADCAST,NOARP>
            # ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:09 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            Using a lower level tool to change to an other (ephemeral) network namespace:



            # unshare --net ip -br link show dummy9 
            Device "dummy9" does not exist.
            # unshare --net ls -l /sys/class/net/dummy9
            lrwxrwxrwx. 1 root root 0 Apr 4 22:13 /sys/class/net/dummy9 -> ../../devices/virtual/net/dummy9


            And that's the issue: /sys still exposes initial host's interfaces instead of the new network namespace's interface. That's where there is an interaction between network namespace and with mounting /sys: if /sys is mounted from the new network namespace, it will switch to exposing the new network interfaces in select directory hierarchies (eg /sys/class/net and /sys/devices/virtual/net). This is done at mount time only, not dynamically. Some advanced network settings are easily available by just reading or writing there, so they have to be provided, and the reverse is true: the isolated processes running in the new network environment shouldn't be able to see or alter the initial host's interfaces.



          So ip netns exec FOO ... (but not ip netns add FOO) solves this by also unsharing the mount namespace and remounting /sys/ inside it, to not disrupt initial host's network namespace. But what is important is that this mount namespace is itself ephemeral: when you run separately two ip netns exec FOO ... commands, they don't end up in the same mount namespace. They each have their own, with /sys remounted there pointing to the same network namespace.



          Until now, no problem. I'll call this an "ip netns namespace" when this happened since there are now two types of namespaces involved. We have so far:



          term1:



          # ip netns add FOO
          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:28 /proc/1712/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/mnt -> mnt:[4026532618]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1864/ns/net -> net:[4026532520]


          term2:



          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/mnt -> mnt:[4026531840]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:32 /proc/1761/ns/net -> net:[4026531992]
          # ip netns exec FOO bash
          # ls -l /proc/$$/ns/mnt,net
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/mnt -> mnt:[4026532821]
          lrwxrwxrwx. 1 root root 0 Apr 4 22:33 /proc/1866/ns/net -> net:[4026532520]


          Note how after changing ip netns namespaces, while the new network namespace is the same for term1 and term2, the new mount namespaces are different from each others (and from initial host).



          Now what happens when in term1 you create a new ip netns namespace? Let's see:



          term1:



          # ip netns add BAR
          # ip netns ls
          BAR
          FOO


          term2:



          # ip netns ls
          Error: Peer netns reference is invalid.
          Error: Peer netns reference is invalid.
          BAR
          FOO


          That's because the newer namespace BAR, to be kept existing without a process, is, as others, mounted on (the newly created empty file) /var/run/netns/BAR (again, see previous link for examples). While the mount namespaces are different, they have the same root directory: initial host's root. So of course this newly created empty file /var/run/netns/BAR could be seen everywhere (initial, term1's mount ns, term2's mount ns) when it was created.



          Alas, the mount over it, being done on term1's FOO's mount namespace, can only be seen on term1, not on term2 nor anywhere else, because it's a different mount namespace. So while in term1 ('s FOO ip netns namespace) /var/run/netns/BAR is a pseudo-file belonging to the nsfs pseudo-filesystem:



          term1:



          # stat -f -c %T /var/run/netns/BAR
          nsfs


          It's an empty file on tmpfs (from the actual /run mount) anywhere else:



          term2:



          # stat -f -c %T /var/run/netns/BAR
          tmpfs


          Any other terminal:



          $ stat -f -c %T /var/run/netns/BAR
          tmpfs


          It can still be seen in term1 as long as one doesn't exit the current "ip netns namespace". If from term1 one still switches ip netns namespaces , it will still be fine, because the new unshared ephemeral mount namespace is a copy of the previous, including all the mounts.



          If exited, that mount point is lost (and that means if there are no processes or file descriptors using it anymore, BAR's corresponding network namespace will disappear because it was held only by this mount point). After this any ip netns ls command will complain, anywhere. You can just remove the stale and now useless file /run/netns/BAR to fix it.



          After this step-by-step explanation, what to remember is that you shouldn't create new namespaces with ip netns add inside a namespace currently entered with ip netns exec. You should create them all from the initial (host) namespace, then you can switch at will between them from any ip netns namespace.



          Of course, if /var/run/netns/ (i.e. the mount point /run) is distinct between (staying fuzzy) namespaces, then there is no interaction, and each ip netns invocation will be isolated from others, not seing nor interacting with others. Where does this usually happen? In full containers, where both the mount and the network namespaces are separated and point to distinct resources from the start.




          UPDATE: as asked in comments, I checked how to "repair" this problem, but couldn't find any easy solution.



          First there's a prerequisite: as explained above, once the new "ip netns" namespace BAR is created inside FOO and FOO is left, the only reference to BAR disappears, making BAR disappear too. Something more is needed.



          Actually there are three ways to keep a reference to a namespace:



          • process: that's the main method, and most of the time that's how the namespace is used at all

          • mount point (that's the method used by ip netns): allows keeping a namespace alive without any process, which is fine for a namespace with only network settings inside (interfaces, bridges, tc rules, firewall rules, ...)

          • open file descriptor: rare, used when creating the namespaces, but seldom kept, except for applications dealing with multiple namespaces at the same time and switching some of their threads using the file descriptor for easy reference.
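The third method can be illustrated with a short sketch. It assumes a Linux system with /proc mounted; the descriptor number 9 and the variable name `ns_id` are arbitrary choices for the example. As long as the descriptor stays open, the namespace it refers to keeps a reference, even if no mount point holds it.

```shell
#!/bin/sh
# Open this shell's own network namespace file on descriptor 9 and keep it open.
exec 9< /proc/self/ns/net

# The descriptor resolves to the namespace identity, of the form net:[<inode>].
ns_id=$(readlink "/proc/$$/fd/9")
printf 'holding %s on fd 9\n' "$ns_id"

# The reference is dropped when the descriptor is closed:
#   exec 9<&-
```

Other processes can reach the same namespace through /proc/PID/fd/9 for as long as this shell lives.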

          We can use the 1st or 3rd method. Here are various failed attempts before finding something that works...



          As explained before, this won't work:



          # ip netns add FOO
          # ip netns exec FOO ip netns add BAR


          The idea is then to leave a process running temporarily in the first "ip netns" namespace, so that its ephemeral mount namespace keeps the needed reference to the new "ip netns" namespace's network namespace, and to reuse that reference later from outside (from the initial namespace).



          Won't work either:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; sleep 999 < /var/run/netns/BAR & echo $!'
          6295
          # strace -e trace=readlink,mount mount --bind /proc/6295/fd/0 /var/run/netns/BAR
          readlink("/proc/6295/fd/0", "/run/netns/BAR", 4095) = 14
          readlink("/var/run", "/run", 4095) = 4
          mount("/run/netns/BAR", "/run/netns/BAR", 0x55c88c9cccb0, MS_BIND, NULL) = 0
          +++ exited with 0 +++
          # stat -f -c %T /run/netns/BAR
          tmpfs


          As seen with strace, the mount command followed the symlink when it shouldn't have for this use case (note: the mount is still somehow linked to the sleep process, which has to be killed before it can be unmounted).



          This (entering sleep's mount namespace, to access the BAR's mounted network namespace hidden there) works but relies on the continued existence of sleep or any process for continued use:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; sleep 999 & echo $!'
          12916
          # nsenter --target=12916 --mount ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 8e:ce:b3:d1:9c:bb <BROADCAST,NOARP>


          Strangely, this (using the mount namespace shortcut /proc/pid/root/) doesn't work (I don't really know why):



          # stat -f -c %T /proc/12916/root/var/run/netns/BAR 
          tmpfs


          Finally what will work:



          # ip netns add FOO
          # ip netns exec FOO sh -c 'ip netns add BAR; ip -n BAR link add dummy8 type dummy; ip netns exec BAR sh -c '\''sleep 999 & echo $!'\'
          14124
          # mount --bind /proc/14124/ns/net /var/run/netns/BAR
          # ip -n BAR -brief link show
          lo DOWN 00:00:00:00:00:00 <LOOPBACK>
          dummy8 DOWN 3a:48:65:20:68:c1 <BROADCAST,NOARP>


          So something like this could be used in the end. There might be race conditions if you attempt to delete them right after, before the sleep command ends.



          # ip netns add FOO
          # mount --bind /proc/$(ip netns exec FOO sh -c 'ip netns add BAR; ip netns exec BAR bash -c '\''sleep 5 </dev/null >/dev/null 2>&1 & echo $!; disown'\')/ns/net /var/run/netns/BAR


          How could such a construct be used? I have no idea because the original problem before encountering the nested "ip netns" problem was not given. Maybe easier solutions are available without ever trying to create "a nested network namespace".















          edited 2 days ago

























          answered Apr 4 at 21:43









          A.B













          • Great answer, thanks. Is there a way to create a new netns safely while inside a netns, i.e. ip netns exec foo1 /bin/bash ... ip netns exec <something> ip netns add foo2?

            – user98651
            2 days ago












          • It appears much more difficult than it seemed, and I don't see how to use the result in an actual use case. Perhaps you should ask another question, about the original problem which forced you to try creating "nested network namespaces". Anyway, I'm updating the answer.

            – A.B
            2 days ago

















