Improve Performance of Comparing two Numpy Arrays
$begingroup$
I had a code challenge for a class I'm taking, in which we built a NN algorithm. I got it to work, but I used really basic methods for solving it. There are two 1D NumPy arrays of equal length that contain the values 0-2. They represent two different sets of train and test data. The output is a confusion matrix that shows which received the right predictions and which received the wrong ones (doesn't matter ;).
This code is correct - I just feel I took the lazy way out by working with lists and then turning those lists into an ndarray. I would love to see if people have some tips on utilizing NumPy for this. Anything clever?

    import numpy as np

    x = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0]
    y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

    testy = np.array(x)
    testy_fit = np.array(y)

    row_no = [0,0,0]
    row_dh = [0,0,0]
    row_sl = [0,0,0]

    # Code for the first row - NO
    for i in range(len(testy)):
        if testy.item(i) == 0 and testy_fit.item(i) == 0:
            row_no[0] += 1
        elif testy.item(i) == 0 and testy_fit.item(i) == 1:
            row_no[1] += 1
        elif testy.item(i) == 0 and testy_fit.item(i) == 2:
            row_no[2] += 1

    # Code for the second row - DH
    for i in range(len(testy)):
        if testy.item(i) == 1 and testy_fit.item(i) == 0:
            row_dh[0] += 1
        elif testy.item(i) == 1 and testy_fit.item(i) == 1:
            row_dh[1] += 1
        elif testy.item(i) == 1 and testy_fit.item(i) == 2:
            row_dh[2] += 1

    # Code for the third row - SL
    for i in range(len(testy)):
        if testy.item(i) == 2 and testy_fit.item(i) == 0:
            row_sl[0] += 1
        elif testy.item(i) == 2 and testy_fit.item(i) == 1:
            row_sl[1] += 1
        elif testy.item(i) == 2 and testy_fit.item(i) == 2:
            row_sl[2] += 1

    confusion = np.array([row_no,row_dh,row_sl])
    print(confusion)

The result of the print is correct, as follows:

    [[16 10  0]
     [ 2 10  0]
     [ 2  0 22]]

python numpy
$endgroup$
migrated from stackoverflow.com May 5 at 23:52
This question came from our site for professional and enthusiast programmers.
1
$begingroup$
Good thing this got an answer on SO before it was moved. Performance questions for numpy are routine on SO.
$endgroup$
– hpaulj
May 6 at 0:15
asked May 5 at 23:36 by broepke
2 Answers
$begingroup$
This can be implemented concisely by using numpy.add.at:

    In [2]: c = np.zeros((3, 3), dtype=int)

    In [3]: np.add.at(c, (x, y), 1)

    In [4]: c
    Out[4]:
    array([[16, 10,  0],
           [ 2, 10,  0],
           [ 2,  0, 22]])
$endgroup$
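A minimal sketch of why numpy.add.at fits this problem, using small made-up label arrays (not the data from the question). The names `counts` and `buffered` are illustrative only. `np.add.at` performs an unbuffered in-place add, so a (row, column) pair that occurs several times is counted every time, whereas plain fancy-indexed `+=` applies only one increment per distinct pair.

```python
import numpy as np

# Illustrative labels only; the pair (2, 2) occurs twice on purpose.
x = np.array([0, 0, 1, 2, 2, 2])
y = np.array([0, 1, 1, 2, 2, 0])

# Unbuffered add: every occurrence of an index pair is accumulated.
counts = np.zeros((3, 3), dtype=int)
np.add.at(counts, (x, y), 1)

# Buffered fancy indexing: repeated index pairs are incremented only once.
buffered = np.zeros((3, 3), dtype=int)
buffered[x, y] += 1

print(counts)
print(buffered)
```

The two results differ exactly in the cell for the repeated pair, which is why the plain `+=` form would undercount in a confusion matrix.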
$begingroup$
Oh my! I thought there would be something better, but I didn't think of 1 line of code! Wow. So glad I asked, and thank you!
$endgroup$
– broepke
May 6 at 2:04
2
$begingroup$
Rule #1 of numpy is if you want to do something, check the docs first to check for a 1 line solution.
$endgroup$
– Oscar Smith
May 6 at 5:39
$begingroup$
For now disregarding that there is a (way) better numpy solution to this, as explained in the answer by @WarrenWeckesser, here is a short code review of your actual code.

testy.item(i) is a very unusual way to say testy[i]. It is probably also slower as it involves an attribute lookup.

Don't repeat yourself. You test e.g. if testy.item(i) == 0 three times, each time with a different second condition. Just nest them in an if block:

    for i in range(len(testy)):
        if testy[i] == 0:
            if testy_fit[i] == 0:
                row_no[0] += 1
            elif testy_fit[i] == 1:
                row_no[1] += 1
            elif testy_fit[i] == 2:
                row_no[2] += 1

Loop like a native. Don't iterate over the indices of iterables, iterate over the iterable(s)! You can also use the fact that the value encodes the position you want to increment:

    for test, fit in zip(testy, testy_fit):
        if test == 0 and fit in {0, 1, 2}:
            row_no[fit] += 1

You can even use the fact that the first value encodes the list you want to use and iterate only once. Or even better, make it a list of lists right away:

    n = 3
    confusion_matrix = [[0] * n for _ in range(n)]
    for test, fit in zip(testy, testy_fit):
        confusion_matrix[test][fit] += 1

    print(np.array(confusion_matrix))

Don't put everything into the global space, to be run whenever you interact with the script at all. Put your code into functions, document them with a docstring, and execute them under an if __name__ == "__main__": guard, which allows you to import from this script from another script without your code running:

    import numpy as np

    def confusion_matrix(x, y):
        """Return the confusion matrix for two vectors `x` and `y`.
        x and y must only have values from 0 to n and 0 to m, respectively.
        """
        n, m = np.max(x) + 1, np.max(y) + 1
        matrix = [[0] * m for _ in range(n)]
        for a, b in zip(x, y):
            matrix[a][b] += 1
        return matrix

    if __name__ == "__main__":
        x = ...
        y = ...
        print(np.array(confusion_matrix(x, y)))

Once you have come this far, you can just swap the implementation of this function to the faster numpy one without changing anything (except that it then directly returns a numpy.array instead of a list of lists).
$endgroup$
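The swap described in the last paragraph could look roughly like the sketch below. It keeps the same function name and signature as the review's version but fills the matrix with numpy.add.at, as suggested in the other answer. The input lists under the guard are small illustrative values, not the data from the question, and the `np.asarray` conversion is an added assumption so that plain lists are accepted.

```python
import numpy as np


def confusion_matrix(x, y):
    """Return the confusion matrix for two label vectors `x` and `y`.

    Assumes both hold non-negative integer labels and have equal length.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n, m = x.max() + 1, y.max() + 1
    matrix = np.zeros((n, m), dtype=int)
    # Unbuffered add, so repeated (row, column) pairs are each counted.
    np.add.at(matrix, (x, y), 1)
    return matrix


if __name__ == "__main__":
    # Small illustrative inputs (hypothetical, not the question's data).
    actual = [0, 0, 1, 2, 2]
    predicted = [0, 1, 1, 2, 2]
    print(confusion_matrix(actual, predicted))
```

Callers that previously wrapped the result in np.array(...) keep working, since wrapping an existing array is harmless.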
answered May 5 at 23:41 by Warren Weckesser (the numpy.add.at answer)
answered May 6 at 6:58 by Graipher (the code review answer)