Python Pandas Expand a Column of List of Lists to Two New Columns

I have a DF which looks like this.

name   id  apps
john   1   [[app1, v1], [app2, v2], [app3, v3]]
smith  2   [[app1, v1], [app4, v4]]

I want to expand the apps column such that it looks like this.

name   id  app_name  app_version
john   1   app1      v1
john   1   app2      v2
john   1   app3      v3
smith  2   app1      v1
smith  2   app4      v4

Any help is appreciated

asked May 11 at 23:52

Imsa

python pandas list

5 Answers

You can .apply(pd.Series) twice to get what you need as an intermediate step, then merge back to the original dataframe.

import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]]
})

# Expand each row's list into columns, transpose, and melt to long form:
# `variable` holds the original row index, `value` holds one [app, version] pair
dftmp = df.apps.apply(pd.Series).T.melt().dropna()
# Split each pair into two columns and index by the original row index
dfapp = (dftmp.value
         .apply(pd.Series)
         .set_index(dftmp.variable)
         .rename(columns={0: 'app_name', 1: 'app_version'})
         )

df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)
# returns:
    name  id app_name app_version
0   john   1     app1          v1
0   john   1     app2          v2
0   john   1     app3          v3
1  smith   2     app1          v1
1  smith   2     app4          v4

answered May 12 at 1:06

James

Instead of .apply(pd.Series) (which is awfully slow), use pd.DataFrame(df.apps.tolist())

– rafaelc
May 12 at 1:13

Either way you are pulling it out of the C-backed API into Python. .apply hides a for loop, while tolist pushes the encapsulated object back to Python. I have not done any tests to see which is faster.

– James
May 12 at 1:21

I have, that's why I commented.

– rafaelc
May 12 at 2:19

Wow, thanks. That is like 30% faster.

– James
May 12 at 2:22

@James it's 1.1s vs 900 microseconds, so it's like 1000 times faster, which is amazing.

– Quang Hoang
May 12 at 2:35
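
The suggestion in the comments can be applied to both expansion steps of the answer. The following is a minimal sketch, assuming the same sample df as in the answer; the names wide, long_df and pairs are illustrative and do not come from the original posts.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# Step 1: build the wide frame from plain Python lists instead of .apply(pd.Series)
wide = pd.DataFrame(df['apps'].tolist(), index=df.index)

# Same reshaping as in the answer: `variable` is the original row index,
# `value` is one [app, version] pair (missing slots are dropped)
long_df = wide.T.melt().dropna()

# Step 2: split each pair into two columns, again without .apply(pd.Series)
pairs = pd.DataFrame(long_df['value'].tolist(),
                     index=long_df['variable'].to_numpy(),
                     columns=['app_name', 'app_version'])

result = df[['name', 'id']].merge(pairs, left_index=True, right_index=True)
print(result)
```

Any speed-up depends on the data, so it is worth timing both variants on a realistic frame before switching.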

You can always have a brute force solution. Something like:

name, id, app_name, app_version = [], [], [], []
# Assumes df has the default 0..n-1 index, so df.loc[i, ...] addresses row i
for i in range(len(df)):
    for v in df.loc[i, 'apps']:
        app_name.append(v[0])
        app_version.append(v[1])
        # Repeat the row's name and id once per app
        name.append(df.loc[i, 'name'])
        id.append(df.loc[i, 'id'])
df = pd.DataFrame({'name': name, 'id': id, 'app_name': app_name, 'app_version': app_version})

This will do the work.

Note that I assumed df['apps'] holds lists of strings. If df['apps'] holds strings instead, you need eval(df.loc[i,'apps']) instead of df.loc[i,'apps'].
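
If the apps column does arrive as strings (for example after a round trip through CSV), ast.literal_eval from the standard library is a safer substitute for eval, because it only accepts Python literals. This is a different call from the one named in the answer. A minimal sketch, assuming every string is a valid Python literal with quoted items; the frame raw is made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical frame where each cell of 'apps' is the string form of a list of lists
raw = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': ["[['app1', 'v1'], ['app2', 'v2']]",
             "[['app1', 'v1'], ['app4', 'v4']]"],
})

# Parse each string into a real list of [app, version] pairs
raw['apps'] = raw['apps'].apply(ast.literal_eval)

print(raw.loc[0, 'apps'])
```

After this step the column holds real lists, so any of the approaches in this thread can be applied unchanged.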

edited May 12 at 1:14

answered May 12 at 1:05

MaPy

Even though this works, it is probably infeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration!

– rafaelc
May 12 at 1:15

Another approach would be (should be quite fast too):

import numpy as np

# Repeat the non-list columns once per element of each row's list
m = df.drop(columns='apps').loc[df.index.repeat(df.apps.str.len())].reset_index(drop=True)
# Build a DataFrame that splits the flattened pairs into two columns
n = pd.DataFrame(np.concatenate(df.apps.values), columns=['app_name', 'app_version'])
# Concatenate them side by side
df_new = pd.concat([m, n], axis=1)

    name  id app_name app_version
0   john   1     app1          v1
1   john   1     app2          v2
2   john   1     app3          v3
3  smith   2     app1          v1
4  smith   2     app4          v4
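
The key step in this answer is df.index.repeat(df.apps.str.len()). A small sketch of what it produces, assuming the sample df from the first answer; the names lengths and row_labels are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# .str.len() also works on list cells: it gives the number of pairs per row
lengths = df['apps'].str.len()
print(lengths.tolist())      # [3, 2]

# Repeat each row label as many times as that row has pairs
row_labels = df.index.repeat(lengths)
print(row_labels.tolist())   # [0, 0, 0, 1, 1]

# Selecting with the repeated labels duplicates the non-list columns accordingly
m = df.drop(columns='apps').loc[row_labels].reset_index(drop=True)
print(m)
```

Because the repeated rows come out in the same order as the flattened pairs, the two pieces can be concatenated side by side.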

edited May 12 at 4:25

answered May 12 at 4:10

anky_91

A chain of pd.Series calls is easy to understand. If you would like to know more methods, check the question on unnesting.

(df.set_index(['name', 'id']).apps.apply(pd.Series)
   .stack().apply(pd.Series)
   .reset_index(level=[0, 1])
   .rename(columns={0: 'app_name', 1: 'app_version'}))
Out[541]: 
    name  id app_name app_version
0   john   1     app1          v1
1   john   1     app2          v2
2   john   1     app3          v3
0  smith   2     app1          v1
1  smith   2     app4          v4

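Method two: slightly modify the function I wrote

def unnesting(df, explode):
    # Repeat each row label once per element of that row's list
    idx = df.index.repeat(df[explode[0]].str.len())
    # sum(..., []) flattens one level, so each element stays an [app, version] pair
    df1 = pd.concat([
        pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')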

Then

yourdf = unnesting(df, ['apps'])

yourdf['app_name'], yourdf['app_version'] = yourdf.apps.str[0], yourdf.apps.str[1]
yourdf
Out[548]: 
         apps  id   name app_name app_version
0  [app1, v1]   1   john     app1          v1
0  [app2, v2]   1   john     app2          v2
0  [app3, v3]   1   john     app3          v3
1  [app1, v1]   2  smith     app1          v1
1  [app4, v4]   2  smith     app4          v4

yourdf = unnesting(df, ['apps']).reindex(columns=df.columns.tolist() + ['app_name', 'app_version'])
yourdf[['app_name', 'app_version']] = yourdf.apps.tolist()
yourdf
Out[567]: 
         apps  id   name app_name app_version
0  [app1, v1]   1   john     app1          v1
0  [app2, v2]   1   john     app2          v2
0  [app3, v3]   1   john     app3          v3
1  [app1, v1]   2  smith     app1          v1
1  [app4, v4]   2  smith     app4          v4

edited May 12 at 4:43

answered May 12 at 4:29

WeNYoBen

My suggestion (there may be easier ways) is using DataFrame.apply alongside pd.concat:

def expand_row(row):
    return pd.DataFrame({
        'name': row['name'],  # row.name would be the Series' name (the row label), not the 'name' column
        'id': row['id'],
        'app_name': [app[0] for app in row.apps],
        'app_version': [app[1] for app in row.apps]
    })

temp_dfs = df.apply(expand_row, axis=1).tolist()
expanded = pd.concat(temp_dfs)
expanded = expanded.reset_index(drop=True)  # renumber the index from 0 without adding an 'index' column

print(expanded)

#     name  id app_name app_version
# 0   john   1     app1          v1
# 1   john   1     app2          v2
# 2   john   1     app3          v3
# 3  smith   2     app1          v1
# 4  smith   2     app4          v4

Also, here is a solution using Python only, which, if my intuition is correct, should be fast:

# Each row is [name, id, apps]; emit one output row per [app, version] pair
rows = df.values.tolist()
expanded = [[row[0], row[1], app[0], app[1]]
            for row in rows
            for app in row[2]]
df = pd.DataFrame(
    expanded, columns=['name', 'id', 'app_name', 'app_version'])

#     name  id app_name app_version
# 0   john   1     app1          v1
# 1   john   1     app2          v2
# 2   john   1     app3          v3
# 3  smith   2     app1          v1
# 4  smith   2     app4          v4
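
The speed claim can be checked with timeit. The following is only a rough harness under stated assumptions: the synthetic frame from make_df, its size of 200 rows, and the function names are all made up for illustration, and the numbers it prints will vary by machine and pandas version.

```python
import timeit

import pandas as pd

COLUMNS = ['name', 'id', 'app_name', 'app_version']


def make_df(n_rows):
    """Build a synthetic frame shaped like the question's data (size is arbitrary)."""
    return pd.DataFrame({
        'name': [f'user{i}' for i in range(n_rows)],
        'id': list(range(n_rows)),
        'apps': [[['app1', 'v1'], ['app2', 'v2']] for _ in range(n_rows)],
    })


def expand_python(df):
    """Pure-Python list comprehension, as in the answer above."""
    rows = df.values.tolist()
    expanded = [[row[0], row[1], app[0], app[1]]
                for row in rows
                for app in row[2]]
    return pd.DataFrame(expanded, columns=COLUMNS)


def expand_apply(df):
    """The double .apply(pd.Series) approach from the first answer, for comparison."""
    tmp = df['apps'].apply(pd.Series).T.melt().dropna()
    apps = (tmp['value'].apply(pd.Series)
            .set_index(tmp['variable'])
            .rename(columns={0: 'app_name', 1: 'app_version'}))
    return df[['name', 'id']].merge(apps, left_index=True, right_index=True)


sample = make_df(200)
for func in (expand_python, expand_apply):
    seconds = timeit.timeit(lambda: func(sample), number=3)
    print(f'{func.__name__}: {seconds / 3:.4f} s per call')
```

Increasing n_rows gives a better picture of how each approach scales.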

edited May 12 at 13:11

answered May 12 at 1:14

araraonline
