Fastest way to perform complex search on pandas dataframe
I am trying to figure out the fastest way to perform search and sort on a pandas dataframe. Below are before and after dataframes of what I am trying to accomplish.
Before:
flightTo flightFrom toNum fromNum toCode fromCode
ABC DEF 123 456 8000 8000
DEF XYZ 456 893 9999 9999
AAA BBB 473 917 5555 5555
BBB CCC 917 341 5555 5555
After search/sort:
flightTo flightFrom toNum fromNum toCode fromCode
ABC XYZ 123 893 8000 9999
AAA CCC 473 341 5555 5555
In this example I am essentially trying to filter out 'flights' that lie between the end destinations. I suspect this should be done with some sort of drop-duplicates method, but what leaves me confused is how to handle all of the columns. Would a binary search be the best way to accomplish this? Hints appreciated; I'm trying hard to figure this out.
Possible edge case: what if the data is switched up and our end connections are in the same column?
flight1 flight2 1Num 2Num 1Code 2Code
ABC DEF 123 456 8000 8000
XYZ DEF 893 456 9999 9999
After search/sort:
flight1 flight2 1Num 2Num 1Code 2Code
ABC XYZ 123 893 8000 9999
This case logically shouldn't happen. After all, how can you go DEF-ABC and DEF-XYZ? You can't, but the 'endpoints' would still be ABC-XYZ.
python pandas binary-search-tree
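For reproducibility, here is a minimal snippet that builds the 'Before' frame shown above (a sketch; the values are copied straight from the table):

import pandas as pd

df = pd.DataFrame({
    'flightTo':   ['ABC', 'DEF', 'AAA', 'BBB'],
    'flightFrom': ['DEF', 'XYZ', 'BBB', 'CCC'],
    'toNum':      [123, 456, 473, 917],
    'fromNum':    [456, 893, 917, 341],
    'toCode':     [8000, 9999, 5555, 5555],
    'fromCode':   [8000, 9999, 5555, 5555],
})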
Are the connecting flights always adjacent in the data frame? – Mike, May 28 at 14:14
np.where(condition) – Dadu Khan, May 28 at 14:14
How about df['flightFrom'].shift() != df['flightTo']? – IanS, May 28 at 14:17
@Mike the information can be completely random in the DataFrame – MaxB, May 28 at 14:18
@IanS check the values in fromNum, fromCode in the expected output, that's what makes this question complex imo. – Erfan, May 28 at 14:26
2 Answers
This is a network problem, so we use networkx. Notice that here you can have more than two stops, which means you can have a case like NY-DC-WA-NC.
import networkx as nx

# create the nx graph object from the pandas DataFrame
G = nx.from_pandas_edgelist(df, 'flightTo', 'flightFrom')

# get the connected components: flights that are tied to each other
# are linked in the network graph
l = list(nx.connected_components(G))

# from the components build a mapping dict: since every node in a
# component is connected to the others, each node maps to its component id
L = [dict.fromkeys(y, x) for x, y in enumerate(l)]
d = {k: v for d in L for k, v in d.items()}

# dict for groupby: the _to columns take the value from the first row
# of each group and the _from columns take the value from the last row
grouppd = dict(zip(df.columns.tolist(), ['first', 'last'] * 3))

df.groupby(df.flightTo.map(d)).agg(grouppd)  # agg with a dict yields your output
Out[22]:
flightTo flightFrom toNum fromNum toCode fromCode
flightTo
0 ABC XYZ 123 893 8000 9999
1 AAA CCC 473 341 5555 5555
Installing networkx:
Pip: pip install networkx
Anaconda: conda install -c anaconda networkx
Great answer! Looked into networkx a couple of times, will do more now! – Erfan, May 28 at 14:21
@Erfan love the enthusiasm ;) same here (for networkx) – anky_91, May 28 at 14:22
This answer deserves to be broken down with more explanation :) (so I can learn from it hehe) – Erfan, May 28 at 14:24
@Erfan ok, let me work on it – WeNYoBen, May 28 at 14:24
Best answer I have read. Would it be possible to rename the variables with informative names instead of letters, and expand the solution? Or better, write a post/article on Medium (or elsewhere) explaining this methodology – Prayson W. Daniel, May 28 at 15:40
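Picking up on that last comment, here is the same connected-components idea sketched with more descriptive names (assuming the example DataFrame df from the question; the names below are illustrative, not from the original answer):

import networkx as nx

# build the flight graph: each row of df is an edge between two airports
graph = nx.from_pandas_edgelist(df, 'flightTo', 'flightFrom')

# airports reachable from one another form one end-to-end itinerary,
# so map every airport to the id of its connected component
airport_to_trip = {
    airport: trip_id
    for trip_id, component in enumerate(nx.connected_components(graph))
    for airport in component
}

# per itinerary, take the *_to columns from the first row of the group
# and the *_from columns from the last row (rows stay in frame order)
agg_spec = dict(zip(df.columns, ['first', 'last'] * 3))

result = df.groupby(df['flightTo'].map(airport_to_trip)).agg(agg_spec)
print(result)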
Here's a NumPy solution, which might be convenient when performance is relevant:
import numpy as np

def remove_middle_dest(df):
    x = df.to_numpy()
    # obtain a flat numpy array from both airport columns
    b = x[:,0:2].ravel()
    _, ix, inv = np.unique(b, return_index=True, return_inverse=True)
    # indices in b of duplicate (repeated) values
    ixs_drop = np.setdiff1d(np.arange(len(b)), ix)
    # indices to be used to replace the content in the columns
    replace_at = (inv[:,None] == inv[ixs_drop]).argmax(0)
    # column index of where the duplicate value is, 0 or 1
    col = (ixs_drop % 2) ^ 1
    # 2d array to index and replace values in the df
    # index to obtain values with which to replace
    keep_cols = np.broadcast_to([3,5], (len(col),2))
    ixs = np.concatenate([col[:,None], keep_cols], 1)
    # translate flat indices to row indices
    rows_drop, rows_replace = (ixs_drop // 2), (replace_at // 2)
    c = np.empty((len(col), 5), dtype=x.dtype)
    c[:,::2] = x[rows_drop[:,None], ixs]
    c[:,1::2] = x[rows_replace[:,None], [2,4]]
    # update the dataframe and drop the middle rows
    df.iloc[rows_replace, 1:] = c
    return df.drop(rows_drop)
For the proposed dataframe this yields the expected output:
print(df)
flightTo flightFrom toNum fromNum toCode fromCode
0 ABC DEF 123 456 8000 8000
1 DEF XYZ 456 893 9999 9999
2 AAA BBB 473 917 5555 5555
3 BBB CCC 917 341 5555 5555
remove_middle_dest(df)
flightTo flightFrom toNum fromNum toCode fromCode
0 ABC XYZ 123 893 8000 9999
2 AAA CCC 473 341 5555 5555
This approach does not assume any particular order for the rows in which the duplicate appears, and the same applies to the columns (covering the edge case described in the question). If we use, for instance, the following dataframe:
flightTo flightFrom toNum fromNum toCode fromCode
0 ABC DEF 123 456 8000 8000
1 XYZ DEF 893 456 9999 9999
2 AAA BBB 473 917 5555 5555
3 BBB CCC 917 341 5555 5555
remove_middle_dest(df)
flightTo flightFrom toNum fromNum toCode fromCode
0 ABC XYZ 123 456 8000 9999
2 AAA CCC 473 341 5555 5555
Would this generalize to the case where the flights are randomly distributed over the dataframe? – Erfan, May 28 at 14:38
I think the only problem is the //2 – WeNYoBen, May 28 at 14:48
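As a quick sanity check, one could run both answers on the example data and compare the results (a sketch assuming df and remove_middle_dest as defined above; nx_result and np_result are just illustrative names):

import networkx as nx

# networkx/groupby approach from the first answer
G = nx.from_pandas_edgelist(df, 'flightTo', 'flightFrom')
d = {k: i for i, comp in enumerate(nx.connected_components(G)) for k in comp}
grouppd = dict(zip(df.columns, ['first', 'last'] * 3))
nx_result = df.groupby(df['flightTo'].map(d)).agg(grouppd).reset_index(drop=True)

# NumPy approach from the second answer; pass a copy because
# remove_middle_dest modifies the frame it is given
np_result = remove_middle_dest(df.copy()).reset_index(drop=True)

print(nx_result)
print(np_result)  # both should show only the end-to-end itineraries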