Is numpy.corrcoef() enough to find correlation?


I am currently working through Kaggle's Titanic competition and I'm trying to figure out the correlation between the Survived column and the other columns. I am using numpy.corrcoef() to compute the correlation matrix for each pair of columns, and here is what I have:

The correlation between pClass & Survived is: [[ 1. -0.33848104]
[-0.33848104 1. ]]

The correlation between Sex & Survived is: [[ 1. -0.54335138]
[-0.54335138 1. ]]

The correlation between Age & Survived is: [[ 1. -0.07065723]
[-0.07065723 1. ]]

The correlation between Fare & Survived is: [[1. 0.25730652]
[0.25730652 1. ]]

The correlation between Parent-Children & Survived is: [[1. 0.08162941]
[0.08162941 1. ]]

The correlation between Sibling-Spouse & Survived is: [[ 1. -0.0353225]
[-0.0353225 1. ]]

The correlation between Embarked & Survived is: [[ 1. -0.16767531]
[-0.16767531 1. ]]

I expected a higher correlation between Survived and [pClass, Sex, Sibling-Spouse], yet the values are really low. I'm new to this, so I understand that such a simple method may not be the best way to find correlations, but at the moment this doesn't add up.

This is my full code (without the print() calls):

import numpy as np
import pandas as pd

train = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/train.csv")
test = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv")

survived = train['Survived']
pClass = train['Pclass']
# Encode sex numerically: female -> 0, male -> 1
sex = train['Sex'].replace(['female', 'male'], [0, 1])
# Fill missing ages with the rounded mean age
age = train['Age'].fillna(round(train['Age'].mean()))
fare = train['Fare']
parch = train['Parch']  # parents/children aboard
sibSp = train['SibSp']  # siblings/spouses aboard
# Encode port of embarkation: C -> 1, Q -> 2, S -> 3
embarked = train['Embarked'].replace(['C', 'Q', 'S'], [1, 2, 3])
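A note on reading the output above: numpy.corrcoef() returns the full symmetric 2x2 matrix for two inputs, so the number of interest is the off-diagonal entry. The sketch below shows this on a small made-up sample (the arrays are illustrative assumptions, not the Titanic data).

```python
import numpy as np

# Made-up toy columns for illustration only (not the Titanic data).
survived = np.array([1, 1, 1, 0, 0, 0, 0, 1])
sex = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = female, 1 = male, as in the question

# corrcoef returns a symmetric 2x2 matrix with ones on the diagonal.
matrix = np.corrcoef(sex, survived)

# The coefficient between the two inputs is the off-diagonal entry.
r = matrix[0, 1]
print(f"The correlation between Sex & Survived is: {r:.3f}")
```

Printing only `matrix[0, 1]` makes the per-column results easier to compare than the repeated 2x2 matrices.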






machine-learning python feature-selection numpy kaggle






edited May 14 at 13:45 by Juan Esteban de la Calle
asked May 14 at 9:45 by Atilla Adrianopolos











  • Why do you think the values should be higher? – nairboon, May 14 at 9:56

  • Because there is a strong correlation between sex, class and survival. Women and rich passengers were most likely to survive. – Atilla Adrianopolos, May 14 at 9:59


























2 Answers
On a side note, I don't think correlation is the right measure of association for you to be using, since Survived is technically a binary categorical variable.

The "correlation" measure you use should depend on the types of the variables being investigated:

  1. continuous variable vs. continuous variable: use "traditional" correlation, e.g. Pearson's linear correlation or Spearman's rank correlation.

  2. continuous variable vs. categorical variable: use an ANOVA F-test / difference of means.

  3. categorical variable vs. categorical variable: use a Chi-square test / Cramér's V.
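To make items 2 and 3 concrete, here is a minimal sketch that computes Cramér's V and a one-way ANOVA F statistic by hand with NumPy. The helper names `cramers_v` and `anova_f` and the toy arrays are assumptions introduced for illustration; they are not part of the original post or of any library.

```python
import numpy as np

def cramers_v(x, y):
    """Cramér's V between two categorical arrays (0 = none, 1 = perfect)."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    # Contingency table of observed counts.
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    n = table.sum()
    # Expected counts if the two variables were independent.
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))

def anova_f(values, groups):
    """One-way ANOVA F statistic of a continuous variable across categories."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        member = values[groups == g]
        ss_between += member.size * (member.mean() - grand_mean) ** 2
        ss_within += ((member - member.mean()) ** 2).sum()
    df_between = labels.size - 1
    df_within = values.size - labels.size
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up toy data for illustration only (not the Titanic set).
sex = ["f", "f", "f", "f", "m", "m", "m", "m"]
survived = [1, 1, 1, 0, 0, 0, 0, 1]
fare = [80, 70, 60, 30, 10, 20, 15, 50]

v = cramers_v(sex, survived)      # categorical vs. categorical
f_stat = anova_f(fare, survived)  # continuous vs. categorical
print(f"Cramér's V = {v:.3f}, F = {f_stat:.3f}")
```

For two binary variables, Cramér's V equals the absolute value of the Pearson correlation of their 0/1 codes, so on this toy sample it matches the magnitude that numpy.corrcoef() would report. If SciPy is available, `scipy.stats.chi2_contingency` and `scipy.stats.f_oneway` provide the corresponding tests together with p-values.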






answered May 14 at 11:07 by bradS







  • Here is a closely related old post. – Esmailian, May 18 at 15:29

  • @bradS When you say ANOVA F-test/difference of means, do you mean dividing the ANOVA F-test by the difference of means? – Atilla Adrianopolos, May 19 at 17:50

  • @AtillaAdrianopolos, no, I mean "/" as "or". Using item 3 above as an example: use the Chi-square test of independence or Cramér's V. – bradS, May 20 at 8:09












You probably encoded women as 0 and men as 1, which is why you get a negative correlation of -0.54: Survived is 0 for No and 1 for Yes. Your calculation actually shows what you expected. The negative sign only reflects the direction implied by your encoding; the strength of the relationship between being a woman and surviving is 0.54.

Similarly, pClass is negatively correlated (-0.34) because the highest class (1st class) is encoded as 1 and the lowest as 3, so the direction is negative.

You could make the relationships more intuitive by creating separate columns for men and women, each holding 0 or 1 depending on the sex; the correlations will then have the intuitive direction (sign). The same holds for pClass.
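The effect of the encoding on the sign can be checked on a small made-up sample. In this sketch the arrays and the `pearson` helper are illustrative assumptions, not the Titanic data or the asker's code.

```python
import numpy as np

# Made-up toy data for illustration only (not the Titanic set).
survived = np.array([1, 1, 1, 0, 0, 0, 0, 1])
is_male = np.array([0, 0, 0, 0, 1, 1, 1, 1])
is_female = 1 - is_male                       # separate indicator column for women
pclass = np.array([1, 1, 2, 3, 3, 3, 2, 1])   # 1 = first class, 3 = third class

def pearson(x, y):
    """Pearson coefficient: the off-diagonal entry of the 2x2 corrcoef matrix."""
    return np.corrcoef(x, y)[0, 1]

r_male = pearson(is_male, survived)
r_female = pearson(is_female, survived)

r_class = pearson(pclass, survived)
# Reverse the class codes so that a larger number means a higher class.
r_class_reversed = pearson(4 - pclass, survived)

print(f"male: {r_male:.3f}, female: {r_female:.3f}")
print(f"class: {r_class:.3f}, reversed class: {r_class_reversed:.3f}")
```

In both cases the magnitude stays the same and only the sign flips, which is the point made above: the sign says which code goes with survival, not how strong the relationship is.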







edited May 14 at 13:16 by Stephen Rauch
answered May 14 at 10:10 by nairboon











  • I've added my code. – Atilla Adrianopolos, May 14 at 10:14

  • What if I encode male/female with 3/4 instead? They're still binary values, and that just might solve the problem you're raising. – Atilla Adrianopolos, May 14 at 10:15
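On the 3/4 idea: Pearson's coefficient is unchanged by shifting or positively rescaling a variable, so swapping 0/1 for 3/4 cannot change its magnitude. Only which category gets the larger code decides the sign. A minimal sketch on made-up data (the arrays are assumptions for illustration, not the Titanic set):

```python
import numpy as np

# Made-up toy data for illustration only.
survived = np.array([1, 1, 1, 0, 0, 0, 0, 1])
is_male = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # female -> 0, male -> 1

codes_34 = np.where(is_male == 1, 3, 4)  # male -> 3, female -> 4
codes_43 = np.where(is_male == 1, 4, 3)  # male -> 4, female -> 3

r_01 = np.corrcoef(is_male, survived)[0, 1]
r_34 = np.corrcoef(codes_34, survived)[0, 1]
r_43 = np.corrcoef(codes_43, survived)[0, 1]

print(f"0/1: {r_01:.3f}, male=3/female=4: {r_34:.3f}, male=4/female=3: {r_43:.3f}")
```

All three values share the same absolute value; the male=3/female=4 coding merely flips the sign relative to female=0/male=1 because women now carry the larger code.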















