


Python Pandas: Expand a Column of List of Lists to Two New Columns

















I have a DataFrame that looks like this:

name   id  apps
john   1   [[app1, v1], [app2, v2], [app3, v3]]
smith  2   [[app1, v1], [app4, v4]]

I want to expand the apps column so that each inner [app_name, app_version] pair gets its own row, split across two new columns, like this:

name   id  app_name  app_version
john   1   app1      v1
john   1   app2      v2
john   1   app3      v3
smith  2   app1      v1
smith  2   app4      v4

Any help is appreciated.










Tags: python, pandas, list






asked May 11 at 23:52 by Imsa






















          5 Answers
You can use .apply(pd.Series) twice to build the intermediate frame you need, then merge it back into the original DataFrame on the index.



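import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]]
})

# Spread each row's list of pairs across columns, transpose, and melt to long form:
# 'variable' holds the original row label, 'value' holds one [app, version] pair.
# dropna() removes the padding added for rows with fewer apps.
dftmp = df.apps.apply(pd.Series).T.melt().dropna()
# Split each pair into two columns, indexed by the original row label
dfapp = (dftmp.value
         .apply(pd.Series)
         .set_index(dftmp.variable)
         .rename(columns={0: 'app_name', 1: 'app_version'})
         )

df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)
# returns:
#     name  id app_name app_version
# 0   john   1     app1          v1
# 0   john   1     app2          v2
# 0   john   1     app3          v3
# 1  smith   2     app1          v1
# 1  smith   2     app4          v4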
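To make the "intermediate step" concrete, here is a small sketch on the same sample data that inspects dftmp before the second .apply(pd.Series). The names wide and dftmp are only for illustration; the point is that each surviving row pairs an original row label with one [app_name, app_version] list, and the NaN padding for smith's missing third app is gone.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# Wide form: one column per list position; smith's missing third app becomes NaN.
wide = df.apps.apply(pd.Series)
# Long form: 'variable' is the original row label, 'value' is one pair.
dftmp = wide.T.melt().dropna()

print(wide.shape)                  # (2, 3)
print(dftmp['variable'].tolist())  # [0, 0, 0, 1, 1]
print(dftmp['value'].tolist())
```

Because 'variable' carries the original row label, setting it as the index is what lets the final merge line each pair up with the right name and id.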





          • Instead of .apply(pd.Series) (which is awfully slow), use pd.DataFrame(df.apps.tolist())

            – rafaelc
            May 12 at 1:13











          • Either way you are pulling it out of the C-backed API into Python. .apply hides a for loop, while tolist pushes the encapsulated object back to Python. I have not done any tests to see which is faster.

            – James
            May 12 at 1:21











          • I have, that's why I commented.

            – rafaelc
            May 12 at 2:19






• Wow, thanks. That is like 30% faster.

            – James
            May 12 at 2:22






• @James it's 1.1 s vs 900 microseconds, so it's roughly 1000 times faster, which is amazing.

            – Quang Hoang
            May 12 at 2:35
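If you want to check the speed claims in this thread on your own machine, a small timeit harness like the one below compares the two ways of splitting a column of [app_name, app_version] pairs. The input size is an arbitrary assumption, and the timings depend on your pandas version and hardware, so no figures are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical input size: 3,000 [app_name, app_version] pairs.
pairs = pd.Series([['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']] * 1_000)


def split_with_apply(s):
    # Builds one pd.Series per element, then assembles them into a DataFrame.
    return s.apply(pd.Series)


def split_with_tolist(s):
    # Passes a plain list of lists to a single DataFrame constructor call.
    return pd.DataFrame(s.tolist(), index=s.index)


if __name__ == '__main__':
    for fn in (split_with_apply, split_with_tolist):
        per_call = timeit.timeit(lambda: fn(pairs), number=3) / 3
        print(f'{fn.__name__}: {per_call:.4f} s per call')
```

Both functions return the same two columns of values; only the construction path differs.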


















You can always fall back on a brute-force solution, something like:



# id_ avoids shadowing the built-in id()
name, id_, app_name, app_version = [], [], [], []
# df.loc[i, ...] looks rows up by label, so this assumes the default 0..n-1 index
for i in range(len(df)):
    for v in df.loc[i, 'apps']:
        app_name.append(v[0])
        app_version.append(v[1])
        name.append(df.loc[i, 'name'])
        id_.append(df.loc[i, 'id'])
df = pd.DataFrame({'name': name, 'id': id_, 'app_name': app_name, 'app_version': app_version})


          will do the work.



Note that I assumed df['apps'] contains lists of strings. If it contains strings instead, you need eval(df.loc[i, 'apps']) in place of df.loc[i, 'apps'].
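If apps was loaded from a file such as a CSV, each cell is a string rather than a list. A minimal sketch, assuming the strings are valid Python literals with quoted items (e.g. "[['app1', 'v1']]"): ast.literal_eval parses them without eval's risk of running arbitrary code, and the same nested loop can then be written as one comprehension over zip, which does not depend on a 0..n-1 index.

```python
import ast

import pandas as pd

# Hypothetical input: 'apps' holds string representations of the lists.
df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': ["[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']]",
             "[['app1', 'v1'], ['app4', 'v4']]"],
})

# literal_eval only accepts Python literals (lists, strings, numbers, ...),
# so unlike eval it will not run arbitrary expressions found in the data.
df['apps'] = df['apps'].apply(ast.literal_eval)

# Same nested loop as above, iterating with zip instead of df.loc[i, ...].
rows = [(name, id_, app_name, app_version)
        for name, id_, apps in zip(df['name'], df['id'], df['apps'])
        for app_name, app_version in apps]
result = pd.DataFrame(rows, columns=['name', 'id', 'app_name', 'app_version'])
print(result)
```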






• Even though this works, it is probably infeasible for large DataFrames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration!

            – rafaelc
            May 12 at 1:15


















Another approach, which should also be quite fast, repeats the non-list columns and flattens the lists with NumPy:



import numpy as np
import pandas as pd

# Repeat the non-list columns once per element of each row's list
m = df.drop(columns='apps').loc[df.index.repeat(df.apps.str.len())].reset_index(drop=True)
# Build a two-column frame from all inner [app_name, app_version] pairs
n = pd.DataFrame(np.concatenate(df.apps.values), columns=['app_name', 'app_version'])
# Both frames now share a 0..n-1 index, so concatenate them side by side
df_new = pd.concat([m, n], axis=1)



           name id app_name app_version
          0 john 1 app1 v1
          1 john 1 app2 v2
          2 john 1 app3 v3
          3 smith 2 app1 v1
          4 smith 2 app4 v4
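The two halves line up only because both are built in the same row order. A small sketch of the two building blocks on the sample data: Index.repeat repeats each original label once per app, and np.concatenate flattens the per-row lists of pairs into one array with one row per pair. This assumes a unique index and that every pair has exactly two elements.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

lengths = df['apps'].str.len()            # apps per row: 3 and 2
repeated = df.index.repeat(lengths)       # labels 0, 0, 0, 1, 1
flat = np.concatenate(df['apps'].values)  # 5 x 2 array of strings

# Row k of `flat` belongs to original row repeated[k]; this is the invariant
# that makes pd.concat([m, n], axis=1) line up correctly.
print(list(repeated))
print(flat.shape)
```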





A chain of pd.Series calls is easy to understand. If you would like to know more methods, check the question on how to unnest (explode) a column in a pandas DataFrame.



(df.set_index(['name', 'id']).apps.apply(pd.Series)
   .stack()
   .dropna()  # guards against stack() versions that keep the NaN padding
   .apply(pd.Series)
   .reset_index(level=[0, 1])
   .rename(columns={0: 'app_name', 1: 'app_version'}))
            Out[541]:
            name id app_name app_version
            0 john 1 app1 v1
            1 john 1 app2 v2
            2 john 1 app3 v3
            0 smith 2 app1 v1
            1 smith 2 app4 v4



Method two: a slightly modified version of the unnesting function I wrote:



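def unnesting(df, explode):
    # Repeat each index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every column in `explode` by one level;
    # sum(list_of_lists, []) concatenates the per-row lists
    df1 = pd.concat([
        pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
    df1.index = idx
    # Join the remaining columns back onto the repeated index
    return df1.join(df.drop(columns=explode), how='left')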



            Then



            yourdf=unnesting(df,['apps'])

            yourdf['app_name'],yourdf['app_version']=yourdf.apps.str[0],yourdf.apps.str[1]
            yourdf
            Out[548]:
            apps id name app_name app_version
            0 [app1, v1] 1 john app1 v1
            0 [app2, v2] 1 john app2 v2
            0 [app3, v3] 1 john app3 v3
            1 [app1, v1] 2 smith app1 v1
            1 [app4, v4] 2 smith app4 v4


            Or



            yourdf=unnesting(df,['apps']).reindex(columns=df.columns.tolist()+['app_name','app_version'])
            yourdf[['app_name','app_version']]=yourdf.apps.tolist()
            yourdf
            Out[567]:
            apps id name app_name app_version
            0 [app1, v1] 1 john app1 v1
            0 [app2, v2] 1 john app2 v2
            0 [app3, v3] 1 john app3 v3
            1 [app1, v1] 2 smith app1 v1
            1 [app4, v4] 2 smith app4 v4
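In pandas 0.25 and later, DataFrame.explode does the unnesting step directly. A minimal sketch, assuming pandas 0.25+ and that every apps list is non-empty (an empty list explodes to a NaN row, which the pair-splitting step below does not handle):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# One row per [app_name, app_version] pair; the original row label is repeated,
# so reset it to a fresh 0..n-1 index for the positional concat below.
exploded = df.explode('apps').reset_index(drop=True)

# Split each pair into two columns in a single constructor call.
pairs = pd.DataFrame(exploded['apps'].tolist(),
                     columns=['app_name', 'app_version'])

result = pd.concat([exploded.drop(columns='apps'), pairs], axis=1)
print(result)
```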





My suggestion (there may be easier ways) is to use DataFrame.apply together with pd.concat:



def expand_row(row):
    return pd.DataFrame({
        'name': row['name'],  # not row.name: that is the Series' own name (the row label)
        'id': row['id'],
        'app_name': [app[0] for app in row.apps],
        'app_version': [app[1] for app in row.apps]
    })

temp_dfs = df.apply(expand_row, axis=1).tolist()
expanded = pd.concat(temp_dfs)
expanded = expanded.reset_index(drop=True)  # replace the repeated per-row index with 0..n-1

              print(expanded)

              # name id app_name app_version
              # 0 john 1 app1 v1
              # 1 john 1 app2 v2
              # 2 john 1 app3 v3
              # 3 smith 2 app1 v1
              # 4 smith 2 app4 v4


Also, here is a pure-Python solution which, if my intuition is correct, should be fast:



# Assumes the columns are in the order name, id, apps
rows = df.values.tolist()
expanded = [[row[0], row[1], app[0], app[1]]
            for row in rows
            for app in row[2]]
df = pd.DataFrame(
    expanded, columns=['name', 'id', 'app_name', 'app_version'])

              # name id app_name app_version
              # 0 john 1 app1 v1
              # 1 john 1 app2 v2
              # 2 john 1 app3 v3
              # 3 smith 2 app1 v1
              # 4 smith 2 app4 v4
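The comprehension above relies on the column order being name, id, apps. Below is a sketch of a hypothetical helper, expand_pairs (the name and parameters are illustrative, not a pandas API), that keeps whatever other columns the frame has and looks the list column up by name, using the same pure-Python idea.

```python
import pandas as pd


def expand_pairs(df, col, new_cols):
    """Hypothetical helper: one output row per inner pair in df[col]."""
    keep = [c for c in df.columns if c != col]
    records = []
    # itertuples(index=False, name=None) yields plain tuples in column order
    for row in df[keep + [col]].itertuples(index=False, name=None):
        *others, items = row
        for item in items:
            records.append((*others, *item))
    return pd.DataFrame(records, columns=keep + list(new_cols))


df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})
print(expand_pairs(df, 'apps', ['app_name', 'app_version']))
```

Extra columns (for example one added with df.assign) are carried through unchanged, in their original order.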





The .apply(pd.Series) answer above was answered May 12 at 1:06 by James.












The brute-force loop answer above was answered May 12 at 1:05 (edited May 12 at 1:14) by MaPy.







                • 2





                  Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                  – rafaelc
                  May 12 at 1:15












                • 2





                  Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                  – rafaelc
                  May 12 at 1:15







                2




                2





                Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                – rafaelc
                May 12 at 1:15





                Even though this works, it is probably unfeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration !

                – rafaelc
                May 12 at 1:15











                3














                Another approach would be (should be quite fast too):



                #Repeat the columns without the list by the str length of the list
                m=df.drop('apps',1).loc[df.index.repeat(df.apps.str.len())].reset_index(drop=True)
                #creating a df exploding the list to 2 columns
                n=pd.DataFrame(np.concatenate(df.apps.values),columns=['app_name','app_version'])
                #concat them together
                df_new=pd.concat([m,n],axis=1)



                 name id app_name app_version
                0 john 1 app1 v1
                1 john 1 app2 v2
                2 john 1 app3 v3
                3 smith 2 app1 v1
                4 smith 2 app4 v4





                    edited May 12 at 4:25

























                    answered May 12 at 4:10









                    anky_91

                        Chaining pd.Series calls is easy to understand. If you would like to know more methods, check the unnesting question (How to unnest (explode) a column in a pandas DataFrame?).



                        (df.set_index(['name', 'id']).apps.apply(pd.Series)
                           .stack().apply(pd.Series)
                           .reset_index(level=[0, 1])
                           .rename(columns={0: 'app_name', 1: 'app_version'}))
                        Out[541]:
                        name id app_name app_version
                        0 john 1 app1 v1
                        1 john 1 app2 v2
                        2 john 1 app3 v3
                        0 smith 2 app1 v1
                        1 smith 2 app4 v4



                        Method two: a slightly modified version of the unnesting function I wrote:



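                        def unnesting(df, explode):
                            # Repeat each row label once per element of its list
                            idx = df.index.repeat(df[explode[0]].str.len())
                            # Flatten each list column into one long column
                            df1 = pd.concat([
                                pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
                            df1.index = idx
                            # Attach the remaining columns back by index label
                            return df1.join(df.drop(columns=explode), how='left')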



                        Then



                        yourdf=unnesting(df,['apps'])

                        yourdf['app_name'],yourdf['app_version']=yourdf.apps.str[0],yourdf.apps.str[1]
                        yourdf
                        Out[548]:
                        apps id name app_name app_version
                        0 [app1, v1] 1 john app1 v1
                        0 [app2, v2] 1 john app2 v2
                        0 [app3, v3] 1 john app3 v3
                        1 [app1, v1] 2 smith app1 v1
                        1 [app4, v4] 2 smith app4 v4


                        Or



                        yourdf=unnesting(df,['apps']).reindex(columns=df.columns.tolist()+['app_name','app_version'])
                        yourdf[['app_name','app_version']]=yourdf.apps.tolist()
                        yourdf
                        Out[567]:
                        apps id name app_name app_version
                        0 [app1, v1] 1 john app1 v1
                        0 [app2, v2] 1 john app2 v2
                        0 [app3, v3] 1 john app3 v3
                        1 [app1, v1] 2 smith app1 v1
                        1 [app4, v4] 2 smith app4 v4
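The unnesting helper above flattens with sum(df[x].tolist(), []). That call builds a new list at every addition, so its cost grows quadratically with the number of rows. The sketch below is a hypothetical variant, unnesting_chain, that performs the same flattening in one linear pass with itertools.chain.from_iterable. Apart from that, it follows the helper line for line. The input frame is an assumption, reconstructed from the output tables in this thread.

```python
from itertools import chain

import pandas as pd


def unnesting_chain(df, explode):
    """Hypothetical variant of the unnesting helper using itertools.chain."""
    # Each original row label is repeated once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # chain.from_iterable concatenates the per-row lists in a single pass,
    # unlike sum(..., []), which copies the growing list at every step
    df1 = pd.concat(
        [pd.DataFrame({x: list(chain.from_iterable(df[x]))}) for x in explode],
        axis=1,
    )
    df1.index = idx
    # Attach the remaining columns by index label (labels repeat on the left)
    return df1.join(df.drop(columns=explode), how='left')


# Assumed input, reconstructed from the output tables in this thread
df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

out = unnesting_chain(df, ['apps'])
out['app_name'] = out['apps'].str[0]
out['app_version'] = out['apps'].str[1]
print(out)
```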





                            edited May 12 at 4:43

























                            answered May 12 at 4:29









                            WeNYoBen

                                My suggestion (there may be easier ways) is to use DataFrame.apply together with pd.concat:



                                def expand_row(row):
                                    return pd.DataFrame({
                                        # use row['name'], not row.name: row.name is the Series' own name (the row label)
                                        'name': row['name'],
                                        'id': row['id'],
                                        'app_name': [app[0] for app in row.apps],
                                        'app_version': [app[1] for app in row.apps],
                                    })

                                temp_dfs = df.apply(expand_row, axis=1).tolist()
                                expanded = pd.concat(temp_dfs)
                                expanded = expanded.reset_index(drop=True)  # renumber the index 0..n-1

                                print(expanded)

                                # name id app_name app_version
                                # 0 john 1 app1 v1
                                # 1 john 1 app2 v2
                                # 2 john 1 app3 v3
                                # 3 smith 2 app1 v1
                                # 4 smith 2 app4 v4
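The reset_index step is needed because pd.concat keeps each piece's own index. The per-row frames therefore come back labelled 0, 1, 2, 0, 1. The minimal illustration below uses two hypothetical pieces. Passing ignore_index=True to pd.concat is an equivalent way to renumber. Without drop=True, reset_index would also insert the old labels as a column named index.

```python
import pandas as pd

# Two hypothetical per-row pieces, each built with its own 0-based index
first = pd.DataFrame({'app_name': ['app1', 'app2', 'app3']})
second = pd.DataFrame({'app_name': ['app1', 'app4']})

kept = pd.concat([first, second])                           # keeps the pieces' labels
renumbered = pd.concat([first, second], ignore_index=True)  # fresh 0..n-1 labels

print(kept.index.tolist())        # [0, 1, 2, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```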


                                Here is also a pure-Python solution which, if my intuition is correct, should be fast:



                                rows = df.values.tolist()
                                # assumes the column order is name, id, apps
                                expanded = [[row[0], row[1], app[0], app[1]]
                                            for row in rows
                                            for app in row[2]]
                                df = pd.DataFrame(
                                    expanded, columns=['name', 'id', 'app_name', 'app_version'])

                                # name id app_name app_version
                                # 0 john 1 app1 v1
                                # 1 john 1 app2 v2
                                # 2 john 1 app3 v3
                                # 3 smith 2 app1 v1
                                # 4 smith 2 app4 v4
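Whether the pure-Python version really is fast is easier to measure than to guess. The timing sketch below compares it with the index.repeat plus np.concatenate approach from earlier in the thread. It makes several assumptions: the benchmark frame is synthetic (the two sample rows repeated 10,000 times), and the function names are hypothetical. No timings are claimed here, because results depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark input: the two sample rows repeated 10,000 times
base = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})
big = pd.concat([base] * 10_000, ignore_index=True)


def with_comprehension(df):
    # The pure-Python version above; assumes column order name, id, apps
    expanded = [[row[0], row[1], app[0], app[1]]
                for row in df.values.tolist()
                for app in row[2]]
    return pd.DataFrame(expanded, columns=['name', 'id', 'app_name', 'app_version'])


def with_repeat(df):
    # The index.repeat + np.concatenate approach from earlier in the thread
    m = (df.drop(columns='apps')
           .loc[df.index.repeat(df['apps'].str.len())]
           .reset_index(drop=True))
    n = pd.DataFrame(np.concatenate(df['apps'].to_numpy()),
                     columns=['app_name', 'app_version'])
    return pd.concat([m, n], axis=1)


for fn in (with_comprehension, with_repeat):
    # Mean wall-clock seconds over 3 calls on the synthetic frame
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{fn.__name__}: {seconds:.4f} s per call')
```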





                                    edited May 12 at 13:11

























                                    answered May 12 at 1:14









                                    araraonline
