Python Pandas Expand a Column of List of Lists to Two New Column
I have a DataFrame that looks like this:
    name   id  apps
    john   1   [[app1, v1], [app2, v2], [app3, v3]]
    smith  2   [[app1, v1], [app4, v4]]
I want to expand the apps column so that it looks like this:
    name   id  app_name  app_version
    john   1   app1      v1
    john   1   app2      v2
    john   1   app3      v3
    smith  2   app1      v1
    smith  2   app4      v4
Any help is appreciated.
Tags: python, pandas, list
asked May 11 at 23:52 by Imsa
5 Answers
You can .apply(pd.Series) twice to get what you need as an intermediate step, then merge back to the original dataframe.
    import pandas as pd

    df = pd.DataFrame({
        'name': ['john', 'smith'],
        'id': [1, 2],
        'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
                 [['app1', 'v1'], ['app4', 'v4']]]
    })
    # Spread each row's pairs across columns, then melt to one pair per row;
    # 'variable' holds the original row label and 'value' the [name, version] pair
    dftmp = df.apps.apply(pd.Series).T.melt().dropna()
    # Split each pair into two columns, indexed by the original row label
    dfapp = (dftmp.value
             .apply(pd.Series)
             .set_index(dftmp.variable)
             .rename(columns={0: 'app_name', 1: 'app_version'})
             )
    df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)
    # returns:
        name  id app_name app_version
    0   john   1     app1          v1
    0   john   1     app2          v2
    0   john   1     app3          v3
    1  smith   2     app1          v1
    1  smith   2     app4          v4
answered May 12 at 1:06 by James
Comments:
Instead of .apply(pd.Series) (which is awfully slow), use pd.DataFrame(df.apps.tolist()). – rafaelc, May 12 at 1:13
Either way you are pulling it out of the C-backed API into Python. .apply hides a for loop, while tolist pushes the encapsulated object back to Python. I have not done any tests to see which is faster. – James, May 12 at 1:21
I have, that's why I commented. – rafaelc, May 12 at 2:19
Wow, thanks. That is like 30% faster. – James, May 12 at 2:22
@James it's 1.1 s vs 900 microseconds, so it's like 1000 times faster, which is amazing. – Quang Hoang, May 12 at 2:35
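The suggestion in the comments can be sketched roughly as follows. This is one possible way to swap both .apply(pd.Series) calls for pd.DataFrame(...tolist()), not necessarily what the commenter had in mind; the variable names wide, pairs and result are made up for illustration, and the frame is the sample from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "smith"],
    "id": [1, 2],
    "apps": [[["app1", "v1"], ["app2", "v2"], ["app3", "v3"]],
             [["app1", "v1"], ["app4", "v4"]]],
})

# First level: one column per list position; shorter rows are padded
# with missing values.
wide = pd.DataFrame(df["apps"].tolist(), index=df.index)

# Melt to one [app_name, app_version] pair per row; "variable" keeps the
# original row label, and dropna() discards the padding.
pairs = wide.T.melt().dropna()

# Second level: split each pair into two named columns.
dfapp = pd.DataFrame(
    pairs["value"].tolist(),
    index=pairs["variable"].to_numpy(),
    columns=["app_name", "app_version"],
)

result = df[["name", "id"]].merge(dfapp, left_index=True, right_index=True)
print(result)
```

The timings quoted in the comments come from the commenters' own machines; this sketch makes no claim about speed on other data.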
You can always have a brute force solution. Something like:
    names, ids, app_names, app_versions = [], [], [], []
    # df.loc[i, ...] assumes the default 0..n-1 index
    for i in range(len(df)):
        for v in df.loc[i, 'apps']:
            app_names.append(v[0])
            app_versions.append(v[1])
            # repeat the parent row's values once per app
            names.append(df.loc[i, 'name'])
            ids.append(df.loc[i, 'id'])
    df = pd.DataFrame({'name': names, 'id': ids, 'app_name': app_names, 'app_version': app_versions})
will do the work.
Note that I assumed df['apps'] holds lists of strings. If df['apps'] holds strings instead, then you need eval(df.loc[i, 'apps']) instead of df.loc[i, 'apps'].
edited May 12 at 1:14, answered May 12 at 1:05 by MaPy
Comments:
Even though this works, it is probably infeasible for large data frames. In pandas, one for loop is already bad enough, so imagine two nested for loops ;} Always try to avoid direct iteration! – rafaelc, May 12 at 1:15
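For the case mentioned in the answer's note, where the apps column holds strings rather than real lists, a minimal sketch follows. It swaps eval for ast.literal_eval from the standard library, which only parses Python literals; that substitution is a suggestion here, not part of the original answer. The string-valued input frame is a hypothetical example, and the loops are written as a single comprehension.

```python
import ast

import pandas as pd

# Hypothetical input: the nested lists arrive as strings,
# for example after a round trip through a CSV file.
df = pd.DataFrame({
    "name": ["john", "smith"],
    "id": [1, 2],
    "apps": ["[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']]",
             "[['app1', 'v1'], ['app4', 'v4']]"],
})

# literal_eval accepts only Python literals, so it does not execute code.
df["apps"] = df["apps"].apply(ast.literal_eval)

# One output row per (app_name, app_version) pair.
rows = [
    (name, id_, app_name, app_version)
    for name, id_, apps in zip(df["name"], df["id"], df["apps"])
    for app_name, app_version in apps
]
expanded = pd.DataFrame(rows, columns=["name", "id", "app_name", "app_version"])
print(expanded)
```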
Another approach would be (should be quite fast too):
    import numpy as np
    import pandas as pd

    # Repeat the non-list columns once per element of each row's list
    m = df.drop(columns='apps').loc[df.index.repeat(df.apps.str.len())].reset_index(drop=True)
    # Build a frame that spreads the flattened pairs over 2 columns
    n = pd.DataFrame(np.concatenate(df.apps.values), columns=['app_name', 'app_version'])
    # Concatenate them side by side
    df_new = pd.concat([m, n], axis=1)
Output:
        name  id app_name app_version
    0   john   1     app1          v1
    1   john   1     app2          v2
    2   john   1     app3          v3
    3  smith   2     app1          v1
    4  smith   2     app4          v4
edited May 12 at 4:25, answered May 12 at 4:10 by anky_91
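To see what the two building blocks in this answer produce, here is a small sketch that prints the intermediate values for the sample frame from the question. The names lengths, repeated and flat are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "smith"],
    "id": [1, 2],
    "apps": [[["app1", "v1"], ["app2", "v2"], ["app3", "v3"]],
             [["app1", "v1"], ["app4", "v4"]]],
})

# Number of [app_name, app_version] pairs in each row.
lengths = df["apps"].str.len()
print(lengths.tolist())      # [3, 2]

# Each row label repeated once per pair; .loc with this index
# duplicates the parent rows.
repeated = df.index.repeat(lengths)
print(repeated.tolist())     # [0, 0, 0, 1, 1]

# All pairs stacked into a single 2-D array, one pair per row.
flat = np.concatenate(df["apps"].values)
print(flat.shape)            # (5, 2)
```

Because np.concatenate needs every pair to have the same length, this approach assumes each inner list has exactly two elements.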
A chain of pd.Series calls is easy to understand. If you would like to know more methods, check unnesting.
    (df.set_index(['name', 'id'])
       .apps.apply(pd.Series)
       .stack()
       .apply(pd.Series)
       .reset_index(level=[0, 1])
       .rename(columns={0: 'app_name', 1: 'app_version'}))
    Out[541]:
        name  id app_name app_version
    0   john   1     app1          v1
    1   john   1     app2          v2
    2   john   1     app3          v3
    0  smith   2     app1          v1
    1  smith   2     app4          v4
Method two: slightly modify the function I wrote.
    def unnesting(df, explode):
        # Repeat each row label once per element of its list
        idx = df.index.repeat(df[explode[0]].str.len())
        # Flatten every listed column by one level
        df1 = pd.concat([
            pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
        df1.index = idx
        # Re-attach the untouched columns by row label
        return df1.join(df.drop(columns=explode), how='left')
Then
    yourdf = unnesting(df, ['apps'])
    yourdf['app_name'], yourdf['app_version'] = yourdf.apps.str[0], yourdf.apps.str[1]
    yourdf
    Out[548]:
             apps  id   name app_name app_version
    0  [app1, v1]   1   john     app1          v1
    0  [app2, v2]   1   john     app2          v2
    0  [app3, v3]   1   john     app3          v3
    1  [app1, v1]   2  smith     app1          v1
    1  [app4, v4]   2  smith     app4          v4
Or
    yourdf = unnesting(df, ['apps']).reindex(columns=df.columns.tolist() + ['app_name', 'app_version'])
    yourdf[['app_name', 'app_version']] = yourdf.apps.tolist()
    yourdf
    Out[567]:
             apps  id   name app_name app_version
    0  [app1, v1]   1   john     app1          v1
    0  [app2, v2]   1   john     app2          v2
    0  [app3, v3]   1   john     app3          v3
    1  [app1, v1]   2  smith     app1          v1
    1  [app4, v4]   2  smith     app4          v4
edited May 12 at 4:43, answered May 12 at 4:29 by WeNYoBen
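As an alternative to a hand-written unnesting helper, pandas 0.25 and later provide a built-in DataFrame.explode. The sketch below is not part of this answer; it assumes such a pandas version and that every row has at least one [app_name, app_version] pair (an empty list would explode to a missing value and break the second step). The names exploded, pairs and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "smith"],
    "id": [1, 2],
    "apps": [[["app1", "v1"], ["app2", "v2"], ["app3", "v3"]],
             [["app1", "v1"], ["app4", "v4"]]],
})

# explode() unnests one level: one row per [app_name, app_version] pair,
# with the other columns repeated.
exploded = df.explode("apps").reset_index(drop=True)

# Split each pair into two named columns and drop the original column.
pairs = pd.DataFrame(
    exploded.pop("apps").tolist(),
    columns=["app_name", "app_version"],
    index=exploded.index,
)
result = exploded.join(pairs)
print(result)
```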
My suggestion (there may be easier ways) is using DataFrame.apply alongside pd.concat:
    def expand_row(row):
        # Scalars are broadcast to the length of the list-valued columns
        return pd.DataFrame({
            'name': row['name'],  # row.name would be the row label, not the 'name' column
            'id': row['id'],
            'app_name': [app[0] for app in row.apps],
            'app_version': [app[1] for app in row.apps]
        })
    temp_dfs = df.apply(expand_row, axis=1).tolist()
    expanded = pd.concat(temp_dfs)
    expanded = expanded.reset_index(drop=True)  # renumber the rows 0..n-1
    print(expanded)
    #     name  id app_name app_version
    # 0   john   1     app1          v1
    # 1   john   1     app2          v2
    # 2   john   1     app3          v3
    # 3  smith   2     app1          v1
    # 4  smith   2     app4          v4
Also, here is a solution using Python only, which, if my intuition is correct, should be fast:
    # Relies on the column order name, id, apps
    rows = df.values.tolist()
    expanded = [[row[0], row[1], app[0], app[1]]
                for row in rows
                for app in row[2]]
    df = pd.DataFrame(
        expanded, columns=['name', 'id', 'app_name', 'app_version'])
    #     name  id app_name app_version
    # 0   john   1     app1          v1
    # 1   john   1     app2          v2
    # 2   john   1     app3          v3
    # 3  smith   2     app1          v1
    # 4  smith   2     app4          v4
edited May 12 at 13:11, answered May 12 at 1:14 by araraonline
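To check the speed intuition rather than guess, the two variants from this answer can be timed side by side. The following is a rough sketch: make_frame, the frame size of 300 rows and the function names are all hypothetical choices, and the numbers it prints will depend on the machine and pandas version.

```python
import timeit

import pandas as pd


def make_frame(n_rows):
    """Build a synthetic frame shaped like the one in the question."""
    return pd.DataFrame({
        "name": ["user%d" % i for i in range(n_rows)],
        "id": list(range(n_rows)),
        # Between one and three [app_name, app_version] pairs per row.
        "apps": [[["app%d" % j, "v%d" % j] for j in range(1 + i % 3)]
                 for i in range(n_rows)],
    })


def expand_with_concat(df):
    """One small DataFrame per row, glued together with pd.concat."""
    def expand_row(row):
        return pd.DataFrame({
            "name": row["name"],
            "id": row["id"],
            "app_name": [app[0] for app in row["apps"]],
            "app_version": [app[1] for app in row["apps"]],
        })
    frames = df.apply(expand_row, axis=1).tolist()
    return pd.concat(frames).reset_index(drop=True)


def expand_with_comprehension(df):
    """Plain Python lists, converted to a DataFrame once at the end."""
    expanded = [[row[0], row[1], app[0], app[1]]
                for row in df.values.tolist()
                for app in row[2]]
    return pd.DataFrame(
        expanded, columns=["name", "id", "app_name", "app_version"])


big = make_frame(300)
for func in (expand_with_concat, expand_with_comprehension):
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print("%s: %.4f s per call" % (func.__name__, seconds))
```

Timing on a frame that resembles the real data in size and list lengths gives a more useful comparison than the two-row sample.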