Python Pandas: Expand a Column of List of Lists to Two New Columns


I have a DataFrame that looks like this:

    name   id  apps
    john   1   [[app1, v1], [app2, v2], [app3, v3]]
    smith  2   [[app1, v1], [app4, v4]]

I want to expand the apps column so that the result looks like this:

    name   id  app_name  app_version
    john   1   app1      v1
    john   1   app2      v2
    john   1   app3      v3
    smith  2   app1      v1
    smith  2   app4      v4

Any help is appreciated.










Tags: python, pandas, list






asked May 11 at 23:52 by Imsa






















5 Answers


















You can use .apply(pd.Series) twice to build an intermediate frame, then merge it back onto the original DataFrame.

    import pandas as pd

    df = pd.DataFrame({
        'name': ['john', 'smith'],
        'id': [1, 2],
        'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
                 [['app1', 'v1'], ['app4', 'v4']]]
    })

    # First pass: one column per list position, then melt to long form.
    # 'variable' holds the original row label, 'value' one [name, version] pair.
    dftmp = df.apps.apply(pd.Series).T.melt().dropna()
    # Second pass: split each pair into two columns, indexed by the original row label.
    dfapp = (dftmp.value
             .apply(pd.Series)
             .set_index(dftmp.variable)
             .rename(columns={0: 'app_name', 1: 'app_version'})
             )

    df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)
    # returns:
    #     name  id app_name app_version
    # 0   john   1     app1          v1
    # 0   john   1     app2          v2
    # 0   john   1     app3          v3
    # 1  smith   2     app1          v1
    # 1  smith   2     app4          v4





• Instead of .apply(pd.Series) (which is awfully slow), use pd.DataFrame(df.apps.tolist()). – rafaelc, May 12 at 1:13

• Either way, you are pulling the data out of the C-backed API into Python. .apply hides a for loop, while tolist pushes the encapsulated objects back to Python. I have not run any tests to see which is faster. – James, May 12 at 1:21

• I have; that's why I commented. – rafaelc, May 12 at 2:19

• Wow, thanks. That is like 30% faster. – James, May 12 at 2:22

• @James It's 1.1 s vs. 900 microseconds, so it's more like 1000 times faster, which is amazing. – Quang Hoang, May 12 at 2:35
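The suggestion in these comments can be sketched roughly as follows. This is one possible way to apply it, not the commenters' own code: both .apply(pd.Series) calls are swapped for pd.DataFrame(... .tolist()). The names wide, long_form, and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# One column per list position; shorter lists are padded with missing values.
wide = pd.DataFrame(df['apps'].tolist(), index=df.index)

# Transpose and melt so that 'variable' is the original row label and
# 'value' is one [app_name, app_version] pair; then drop the padding.
long_form = wide.T.melt().dropna()

# Split each pair into two columns without .apply(pd.Series).
dfapp = pd.DataFrame(long_form['value'].tolist(),
                     index=long_form['variable'].to_numpy(),
                     columns=['app_name', 'app_version'])

result = df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)
print(result)
```

How much faster this is will depend on the data and the pandas version, so it is worth timing on your own frame.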


















You can always fall back on a brute-force solution. Something like:

    names, ids, app_names, app_versions = [], [], [], []
    # Assumes the default RangeIndex, so the labels 0..len(df)-1 exist.
    for i in range(len(df)):
        for v in df.loc[i, 'apps']:
            app_names.append(v[0])
            app_versions.append(v[1])
            names.append(df.loc[i, 'name'])
            ids.append(df.loc[i, 'id'])
    df = pd.DataFrame({'name': names, 'id': ids,
                       'app_name': app_names, 'app_version': app_versions})

will do the job.

Note that I assumed each cell of df['apps'] holds lists of strings. If the cells are strings instead, you need eval(df.loc[i, 'apps']) in place of df.loc[i, 'apps'].
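For the case where the cells are strings, a more cautious alternative to eval is ast.literal_eval from the standard library, which only evaluates Python literals. A minimal sketch, assuming each cell is a valid literal with quoted items; the frame df_str and the name parsed are hypothetical:

```python
import ast

import pandas as pd

# Hypothetical input: the nested lists arrive as strings, e.g. from a CSV file.
df_str = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': ["[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']]",
             "[['app1', 'v1'], ['app4', 'v4']]"],
})

# Parse every cell into a real list of [app_name, app_version] pairs.
parsed = df_str.assign(apps=df_str['apps'].apply(ast.literal_eval))

print(type(parsed.loc[0, 'apps']))
```

Unquoted cells such as [[app1, v1]] are not valid literals and would need custom parsing instead.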






• Even though this works, it is probably infeasible for large DataFrames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration! – rafaelc, May 12 at 1:15


















Another approach (which should be quite fast too):

    import numpy as np

    # Repeat the non-list columns once per element of each row's list
    # (columns= replaces the positional axis argument, which newer pandas rejects)
    m = df.drop(columns='apps').loc[df.index.repeat(df.apps.str.len())].reset_index(drop=True)
    # Build a frame that spreads the [name, version] pairs over two columns
    n = pd.DataFrame(np.concatenate(df.apps.values), columns=['app_name', 'app_version'])
    # Concatenate them side by side
    df_new = pd.concat([m, n], axis=1)

        name  id app_name app_version
    0   john   1     app1          v1
    1   john   1     app2          v2
    2   john   1     app3          v3
    3  smith   2     app1          v1
    4  smith   2     app4          v4
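To see why the repeat step lines up with the flattened pairs, it may help to look at the two intermediate values on their own. A small sketch using the sample data from the question; the names lengths and repeated are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# .str.len() also works on list cells: it counts the pairs in each row.
lengths = df['apps'].str.len()          # 3 for john, 2 for smith

# Repeating each row label by that count gives one label per output row.
repeated = df.index.repeat(lengths)     # labels 0, 0, 0, 1, 1

print(list(lengths), list(repeated))
```

Because the labels are repeated in row order, they match the order in which the pairs are concatenated.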





A chain of pd.Series calls is easy to understand. If you would like to see more methods, check the question on unnesting.

    # .dropna() is harmless where stack() already drops the padding NaNs
    (df.set_index(['name', 'id']).apps.apply(pd.Series)
       .stack().dropna().apply(pd.Series)
       .reset_index(level=[0, 1])
       .rename(columns={0: 'app_name', 1: 'app_version'}))
    Out[541]:
        name  id app_name app_version
    0   john   1     app1          v1
    1   john   1     app2          v2
    2   john   1     app3          v3
    0  smith   2     app1          v1
    1  smith   2     app4          v4

Method two: a slightly modified version of a function I wrote:

    def unnesting(df, explode):
        # Repeat each row label once per element of its list
        idx = df.index.repeat(df[explode[0]].str.len())
        # Flatten each listed column by one level
        df1 = pd.concat([
            pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(columns=explode), how='left')

Then:

    yourdf = unnesting(df, ['apps'])

    yourdf['app_name'], yourdf['app_version'] = yourdf.apps.str[0], yourdf.apps.str[1]
    yourdf
    Out[548]:
             apps  id   name app_name app_version
    0  [app1, v1]   1   john     app1          v1
    0  [app2, v2]   1   john     app2          v2
    0  [app3, v3]   1   john     app3          v3
    1  [app1, v1]   2  smith     app1          v1
    1  [app4, v4]   2  smith     app4          v4

Or:

    yourdf = unnesting(df, ['apps']).reindex(columns=df.columns.tolist() + ['app_name', 'app_version'])
    yourdf[['app_name', 'app_version']] = yourdf.apps.tolist()
    yourdf
    Out[567]:
             apps  id   name app_name app_version
    0  [app1, v1]   1   john     app1          v1
    0  [app2, v2]   1   john     app2          v2
    0  [app3, v3]   1   john     app3          v3
    1  [app1, v1]   2  smith     app1          v1
    1  [app4, v4]   2  smith     app4          v4

(The column order in the outputs depends on how df was built.)
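On pandas 0.25 or later, the built-in DataFrame.explode covers the unnesting step, so a helper function may not be needed. The following is a sketch under that version assumption, not part of the answer above; the names exploded, pairs, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# explode() gives each inner [app_name, app_version] pair its own row and
# repeats the other columns; it flattens one level of nesting only.
exploded = df.explode('apps').reset_index(drop=True)

# Split the remaining pairs into two named columns.
pairs = pd.DataFrame(exploded['apps'].tolist(),
                     columns=['app_name', 'app_version'])

result = pd.concat([exploded[['name', 'id']], pairs], axis=1)
print(result)
```

Resetting the index before the concat keeps both pieces on the same 0..n-1 labels, so the rows align one to one.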





My suggestion (there may be easier ways) is to use DataFrame.apply alongside pd.concat:

    def expand_row(row):
        return pd.DataFrame({
            'name': row['name'],  # not row.name, which is the Series' own name (the row label)
            'id': row['id'],
            'app_name': [app[0] for app in row.apps],
            'app_version': [app[1] for app in row.apps]
        })

    temp_dfs = df.apply(expand_row, axis=1).tolist()
    expanded = pd.concat(temp_dfs)
    expanded = expanded.reset_index(drop=True)  # renumber the rows 0..n-1

    print(expanded)

    #     name  id app_name app_version
    # 0   john   1     app1          v1
    # 1   john   1     app2          v2
    # 2   john   1     app3          v3
    # 3  smith   2     app1          v1
    # 4  smith   2     app4          v4

Also, here is a solution using plain Python only, which, if my intuition is correct, should be fast:

    rows = df.values.tolist()
    # One output row per (name, id, app) combination
    expanded = [[row[0], row[1], app[0], app[1]]
                for row in rows
                for app in row[2]]
    df = pd.DataFrame(
        expanded, columns=['name', 'id', 'app_name', 'app_version'])

    #     name  id app_name app_version
    # 0   john   1     app1          v1
    # 1   john   1     app2          v2
    # 2   john   1     app3          v3
    # 3  smith   2     app1          v1
    # 4  smith   2     app4          v4
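Whether the plain-Python version is actually fast can be checked locally with timeit. Below is a rough harness, assuming a hypothetical larger frame (big) built by repeating the sample rows; it compares the list comprehension with the repeat-and-concatenate approach from this thread. Timings vary by machine and pandas version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})
# Hypothetical larger input: the two sample rows repeated 200 times.
big = pd.concat([df] * 200, ignore_index=True)

def python_only(frame):
    # Flatten with a nested list comprehension, then build the frame once.
    rows = frame.values.tolist()
    flat = [[r[0], r[1], app[0], app[1]] for r in rows for app in r[2]]
    return pd.DataFrame(flat, columns=['name', 'id', 'app_name', 'app_version'])

def repeat_concat(frame):
    # Repeat the scalar columns and concatenate the pairs with NumPy.
    m = (frame.drop(columns='apps')
              .loc[frame.index.repeat(frame['apps'].str.len())]
              .reset_index(drop=True))
    n = pd.DataFrame(np.concatenate(frame['apps'].values),
                     columns=['app_name', 'app_version'])
    return pd.concat([m, n], axis=1)

for func in (python_only, repeat_concat):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(func.__name__, round(seconds, 4))
```

Checking that both functions return the same rows before comparing timings is a sensible first step.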





The .apply(pd.Series) answer: answered May 12 at 1:06 by James












The brute-force answer: answered May 12 at 1:05 by MaPy, edited May 12 at 1:14







                • 2





                  Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                  – rafaelc
                  May 12 at 1:15












                • 2





                  Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                  – rafaelc
                  May 12 at 1:15







                2




                2





                Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                – rafaelc
                May 12 at 1:15





                Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                – rafaelc
                May 12 at 1:15











                3














                Another approach would be (should be quite fast too):



                #Repeat the columns without the list by the str length of the list
                m=df.drop('apps',1).loc[df.index.repeat(df.apps.str.len())].reset_index(drop=True)
                #creating a df exploding the list to 2 columns
                n=pd.DataFrame(np.concatenate(df.apps.values),columns=['app_name','app_version'])
                #concat them together
                df_new=pd.concat([m,n],axis=1)



                 name id app_name app_version
                0 john 1 app1 v1
                1 john 1 app2 v2
                2 john 1 app3 v3
                3 smith 2 app1 v1
                4 smith 2 app4 v4













                    edited May 12 at 4:25

























                    answered May 12 at 4:10









                    anky_91





















                        3














                        A chain of pd.Series calls is easy to understand. If you would like to know more methods, see the related question on unnesting.

                        (df.set_index(['name', 'id']).apps.apply(pd.Series)
                           .stack().apply(pd.Series)
                           .reset_index(level=[0, 1])
                           .rename(columns={0: 'app_name', 1: 'app_version'}))
                        Out[541]:
                            name  id app_name app_version
                        0   john   1     app1          v1
                        1   john   1     app2          v2
                        2   john   1     app3          v3
                        0  smith   2     app1          v1
                        1  smith   2     app4          v4

                        Method two: a slight modification of a function I wrote.

                        def unnesting(df, explode):
                            # Repeat each row label once per element of that row's list
                            idx = df.index.repeat(df[explode[0]].str.len())
                            # Flatten each listed column by one level
                            df1 = pd.concat([
                                pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
                            df1.index = idx
                            # Bring back the remaining columns, aligned on the repeated index
                            return df1.join(df.drop(columns=explode), how='left')

                        Then

                        yourdf = unnesting(df, ['apps'])

                        yourdf['app_name'], yourdf['app_version'] = yourdf.apps.str[0], yourdf.apps.str[1]
                        yourdf
                        Out[548]:
                                 apps  id   name app_name app_version
                        0  [app1, v1]   1   john     app1          v1
                        0  [app2, v2]   1   john     app2          v2
                        0  [app3, v3]   1   john     app3          v3
                        1  [app1, v1]   2  smith     app1          v1
                        1  [app4, v4]   2  smith     app4          v4

                        Or

                        yourdf = unnesting(df, ['apps']).reindex(columns=df.columns.tolist() + ['app_name', 'app_version'])
                        yourdf[['app_name', 'app_version']] = yourdf.apps.tolist()
                        yourdf
                        Out[567]:
                                 apps  id   name app_name app_version
                        0  [app1, v1]   1   john     app1          v1
                        0  [app2, v2]   1   john     app2          v2
                        0  [app3, v3]   1   john     app3          v3
                        1  [app1, v1]   2  smith     app1          v1
                        1  [app4, v4]   2  smith     app4          v4
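
The two building blocks of `unnesting`, repeating the index and flattening one level of nesting, can be sketched on their own. This is only an illustration under assumptions: the sample `df` is reconstructed from the outputs in this thread, and `itertools.chain.from_iterable` is swapped in for `sum(..., [])` because `sum` copies the growing list at every step, which becomes slow for many rows.

```python
from itertools import chain

import pandas as pd

# Sample data reconstructed from the outputs in this thread (an assumption).
df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# Row label of the parent row, repeated once per inner [name, version] pair.
idx = df.index.repeat(df['apps'].str.len())

# Flatten one level of nesting in linear time.
flat = list(chain.from_iterable(df['apps']))

# Build the two new columns directly and align them with the parent rows.
pairs = pd.DataFrame(flat, columns=['app_name', 'app_version'], index=idx)
result = pairs.join(df.drop(columns='apps'))
print(result)
```

The result keeps the repeated parent labels (0, 0, 0, 1, 1) as its index; call `reset_index(drop=True)` if a fresh 0..n-1 index is preferred.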













                            edited May 12 at 4:43

























                            answered May 12 at 4:29









                            WeNYoBen





















                                1














                                My suggestion (there may be easier ways) is to use DataFrame.apply alongside pd.concat:

                                def expand_row(row):
                                    # Build one small DataFrame per input row; scalars are broadcast
                                    return pd.DataFrame({
                                        'name': row['name'],  # not row.name, which is the name of the Series itself
                                        'id': row['id'],
                                        'app_name': [app[0] for app in row.apps],
                                        'app_version': [app[1] for app in row.apps]
                                    })

                                temp_dfs = df.apply(expand_row, axis=1).tolist()
                                expanded = pd.concat(temp_dfs)
                                expanded = expanded.reset_index(drop=True)  # renumber the rows 0..n-1

                                print(expanded)

                                #     name  id app_name app_version
                                # 0   john   1     app1          v1
                                # 1   john   1     app2          v2
                                # 2   john   1     app3          v3
                                # 3  smith   2     app1          v1
                                # 4  smith   2     app4          v4

                                Also, here is a solution that uses plain Python, which, if my intuition is correct, should be fast:

                                # Assumes the columns are ordered name, id, apps
                                rows = df.values.tolist()
                                expanded = [[row[0], row[1], app[0], app[1]]
                                            for row in rows
                                            for app in row[2]]
                                df = pd.DataFrame(
                                    expanded, columns=['name', 'id', 'app_name', 'app_version'])

                                #     name  id app_name app_version
                                # 0   john   1     app1          v1
                                # 1   john   1     app2          v2
                                # 2   john   1     app3          v3
                                # 3  smith   2     app1          v1
                                # 4  smith   2     app4          v4
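
The plain-Python version indexes each row by position, so it depends on the column order. A hedged variant that reads the fields by name with `DataFrame.itertuples` might look like the sketch below. The sample `df` is an assumption reconstructed from the outputs in this thread, and the deliberately shuffled column order is only there to show that the order no longer matters.

```python
import pandas as pd

# Sample data reconstructed from the outputs in this thread (an assumption).
# The column order is shuffled on purpose.
df = pd.DataFrame({
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
    'id': [1, 2],
    'name': ['john', 'smith'],
})

# itertuples yields namedtuples, so each field is read by column name.
expanded = [
    (row.name, row.id, app_name, app_version)
    for row in df.itertuples(index=False)
    for app_name, app_version in row.apps
]

out = pd.DataFrame(
    expanded, columns=['name', 'id', 'app_name', 'app_version'])
print(out)
```

Unpacking `app_name, app_version` in the inner loop also fails loudly if an inner list does not have exactly two elements, rather than silently dropping data.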













                                    edited May 12 at 13:11

























                                    answered May 12 at 1:14









                                    araraonline


























