
Improve Performance of Comparing two Numpy Arrays


I had a code challenge for a class I'm taking, which involved building a NN algorithm. I got it to work, but I used really basic methods to solve it. There are two 1D NumPy arrays of equal length containing the values 0-2. They represent two different sets of train and test data. The output is a confusion matrix that shows which received the right predictions and which received the wrong ones (which is which doesn't matter ;).

This code is correct. I just feel I took the lazy way out by working with lists and then turning those lists into an ndarray. I would love to see if people have some tips on utilizing NumPy for this. Anything clever?

    import numpy as np

    x = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0]
    y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

    testy = np.array(x)
    testy_fit = np.array(y)

    row_no = [0,0,0]
    row_dh = [0,0,0]
    row_sl = [0,0,0]

    # Code for the first row - NO
    for i in range(len(testy)):
        if testy.item(i) == 0 and testy_fit.item(i) == 0:
            row_no[0] += 1
        elif testy.item(i) == 0 and testy_fit.item(i) == 1:
            row_no[1] += 1
        elif testy.item(i) == 0 and testy_fit.item(i) == 2:
            row_no[2] += 1

    # Code for the second row - DH
    for i in range(len(testy)):
        if testy.item(i) == 1 and testy_fit.item(i) == 0:
            row_dh[0] += 1
        elif testy.item(i) == 1 and testy_fit.item(i) == 1:
            row_dh[1] += 1
        elif testy.item(i) == 1 and testy_fit.item(i) == 2:
            row_dh[2] += 1

    # Code for the third row - SL
    for i in range(len(testy)):
        if testy.item(i) == 2 and testy_fit.item(i) == 0:
            row_sl[0] += 1
        elif testy.item(i) == 2 and testy_fit.item(i) == 1:
            row_sl[1] += 1
        elif testy.item(i) == 2 and testy_fit.item(i) == 2:
            row_sl[2] += 1

    confusion = np.array([row_no,row_dh,row_sl])

    print(confusion)

The printed result is correct:

    [[16 10  0]
     [ 2 10  0]
     [ 2  0 22]]






python numpy






asked May 5 at 23:36 by broepke




migrated from stackoverflow.com May 5 at 23:52


This question came from our site for professional and enthusiast programmers.


















  • Good thing this got an answer on SO before it was moved. Performance questions for numpy are routine on SO. – hpaulj, May 6 at 0:15






















2 Answers
This can be implemented concisely by using numpy.add.at:

    In [2]: c = np.zeros((3, 3), dtype=int)

    In [3]: np.add.at(c, (x, y), 1)

    In [4]: c
    Out[4]:
    array([[16, 10,  0],
           [ 2, 10,  0],
           [ 2,  0, 22]])






answered May 5 at 23:41 by Warren Weckesser
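As an editorial aside (not part of the answer above), here is a minimal sketch of why `np.add.at` is used rather than plain fancy-index assignment. The short `x` and `y` arrays are made-up illustrative labels, not the data from the question, and the names `counts` and `buffered` are hypothetical.

```python
import numpy as np

# Made-up labels for illustration; several (x, y) pairs repeat on purpose.
x = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y = np.array([0, 1, 1, 1, 2, 2, 0, 0])

# Unbuffered accumulation: every (x[i], y[i]) pair adds 1, even when repeated.
counts = np.zeros((3, 3), dtype=int)
np.add.at(counts, (x, y), 1)

# Buffered fancy-index update: a repeated pair is only incremented once,
# so this does NOT produce a confusion matrix.
buffered = np.zeros((3, 3), dtype=int)
buffered[x, y] += 1

print(counts)
print(buffered)
```

The total of `counts` equals the number of samples, which is a quick sanity check that no pair was dropped.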


















  • Oh my! I thought there would be something better, but I didn't think it would be 1 line of code! Wow. So glad I asked, and thank you! – broepke, May 6 at 2:04

  • Rule #1 of numpy: if you want to do something, check the docs first for a 1-line solution. – Oscar Smith, May 6 at 5:39
















Setting aside for now that there is a (way) better numpy solution to this, as explained in the answer by @WarrenWeckesser, here is a short code review of your actual code.

  • testy.item(i) is a very unusual way to say testy[i]. It is not obviously faster either, since it involves an attribute lookup and a method call, although it does return a plain Python int instead of a NumPy scalar; time both if it matters.

  • Don't repeat yourself. You test e.g. if testy.item(i) == 0 three times, each time with a different second condition. Just nest them in an if block:

        for i in range(len(testy)):
            if testy[i] == 0:
                if testy_fit[i] == 0:
                    row_no[0] += 1
                elif testy_fit[i] == 1:
                    row_no[1] += 1
                elif testy_fit[i] == 2:
                    row_no[2] += 1

  • Loop like a native. Don't iterate over the indices of iterables, iterate over the iterable(s)! You can also use the fact that the value encodes the position you want to increment:

        for test, fit in zip(testy, testy_fit):
            if test == 0 and fit in {0, 1, 2}:
                row_no[fit] += 1

  • You can even use the fact that the first value encodes the list you want to use and iterate only once. Or even better, make it a list of lists right away:

        n = 3
        confusion_matrix = [[0] * n for _ in range(n)]
        for test, fit in zip(testy, testy_fit):
            confusion_matrix[test][fit] += 1

        print(np.array(confusion_matrix))

  • Don't put everything into the global space, to be run whenever you interact with the script at all. Put your code into functions, document them with a docstring, and execute them under an if __name__ == "__main__": guard, which allows you to import from this script in another script without your code running:

        def confusion_matrix(x, y):
            """Return the confusion matrix for two vectors `x` and `y`.
            x and y must only have integer values from 0 to n - 1 and
            0 to m - 1, respectively.
            """
            # One row per value in x, one column per value in y.
            n, m = np.max(x) + 1, np.max(y) + 1
            matrix = [[0] * m for _ in range(n)]
            for a, b in zip(x, y):
                matrix[a][b] += 1
            return matrix

        if __name__ == "__main__":
            x = ...
            y = ...
            print(np.array(confusion_matrix(x, y)))

Once you have come this far, you can just swap the implementation of this function for the faster numpy one without changing anything else (except that it then directly returns a numpy.array instead of a list of lists).







answered May 6 at 6:58 by Graipher
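As an editorial illustration of that last point (not code from either answer), here is one way the swap might look: the same `confusion_matrix(x, y)` signature, with the loop replaced by `np.add.at`. The shape check and the sample inputs in the guard are assumptions added for the sketch, and it assumes non-empty inputs of non-negative integers.

```python
import numpy as np


def confusion_matrix(x, y):
    """Return the confusion matrix for two label vectors `x` and `y`.

    Assumes both are non-empty, equal in length, and contain only
    non-negative integers (values 0 to n - 1 and 0 to m - 1).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    # One row per value in x, one column per value in y.
    n, m = x.max() + 1, y.max() + 1
    matrix = np.zeros((n, m), dtype=int)
    # Unbuffered add, so repeated (x, y) pairs are all counted.
    np.add.at(matrix, (x, y), 1)
    return matrix


if __name__ == "__main__":
    # Small illustrative inputs, not the data from the question.
    x = [0, 0, 1, 2, 2]
    y = [0, 1, 1, 2, 0]
    print(confusion_matrix(x, y))
```

Because the function now returns an ndarray, the caller no longer needs to wrap the result in `np.array(...)` before printing.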


























