

Is the Keras Embedding layer dependent on the target label?


I have learned how to use the Keras Embedding layer, but I cannot find more specific information about its actual behavior and training process. So far, I understand that the Keras Embedding layer maps distinct categorical features to n-dimensional vectors, which lets us ask, for example, how similar two features are.



What I do not understand is how the vectors in the embedding layer are trained. There is an explanation stating that these vectors are not computed by any operation and that the layer works only as a lookup table, but I always thought they were somehow "trained" to capture similarities between distinct features.



If they are trained, are they trained from the target labels, from the order in which the features appear (as in GloVe, word2vec, etc.), or from both?



I have the following example of two pairs of rows in a dataset, where y is the model's target label and X holds the features encoded as integers for the embedding layer:



#pair 1 
dataset_y_row1 = [1]
dataset_y_row2 = [0]
dataset_X_row1 = [3,5,8,45,2]
dataset_X_row2 = [3,5,8,45,2]

#pair 2
dataset_y_row3 = [1]
dataset_y_row4 = [1]
dataset_X_row3 = [3,5,8,45,2]
dataset_X_row4 = [3,5,45,8,2]


My questions are the following:



  1. Will the embedding layer see any difference between rows 1 and 2 (i.e. is it 'target-label-sensitive')?

  2. Will the embedding layer see any difference between rows 3 and 4 (i.e. is it sensitive to the order of features, like word2vec, GloVe, etc.)?
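For reference, the lookup these encoded rows go through can be sketched in NumPy (the weight matrix size here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 4))  # hypothetical embedding matrix: 50 ids, 4 dims

X_row1 = [3, 5, 8, 45, 2]
vectors = W[X_row1]           # layer output: one row of W per token id
assert vectors.shape == (5, 4)
```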









      neural-networks keras word-embeddings embeddings






      edited Jun 9 at 19:49









      Mihai Chelaru











      asked Jun 9 at 16:20









Jan Musil





















1 Answer







An embedding layer for a vocabulary of size $m$ that encodes each word into an embedding vector of size $k$ is shorthand for one-hot encoding the words into $m$ features and then putting a dense layer with $k$ units on top of them. Word2vec and GloVe are specialized algorithms for learning the embeddings, but the end product is a matrix of weights that is multiplied by the one-hot encoded words.
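A minimal NumPy sketch of this equivalence (sizes and data are hypothetical): multiplying a one-hot row by the weight matrix simply selects a row of that matrix, which is exactly the lookup the embedding layer performs.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 50, 4                 # vocabulary size, embedding dimension
W = rng.normal(size=(m, k))  # the layer's weight matrix

ids = np.array([3, 5, 8, 45, 2])
one_hot = np.eye(m)[ids]     # (5, m) one-hot rows

# one-hot matmul == row lookup
assert np.allclose(one_hot @ W, W[ids])
```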



If you are interested in a detailed yet accessible introduction to word embeddings, check the series of blog posts by Sebastian Ruder.



To answer your question, one needs to consider your network architecture and data. Algorithms like word2vec and GloVe are trained on language data to predict things like the next word in a sequence. On the other hand, if you use an embedding layer that is trained from scratch as part of a larger network with some utilitarian purpose (e.g. spam detection, sentiment classification), then the layer works like any other dense layer, serving the purpose of automatic feature engineering. In the latter case, you would expect to see more specialized embeddings that learn features related to the objective of your network.
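As a sketch of the latter case: one gradient step on a tiny, hypothetical network (mean-pooled embeddings feeding a single logistic unit) shows that the gradient of the label's loss reaches only the embedding rows that were looked up, i.e. the embeddings do learn from the target.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 50, 4
W = rng.normal(size=(m, k))   # embedding matrix
w_out = rng.normal(size=k)    # hypothetical classifier head: mean-pool -> logit

ids, y = np.array([3, 5, 8, 45, 2]), 1.0

# forward: lookup, mean-pool, sigmoid
h = W[ids].mean(axis=0)
p = 1.0 / (1.0 + np.exp(-h @ w_out))

# backward: cross-entropy gradient flows from the label into the used rows
grad_h = (p - y) * w_out
W_new = W.copy()
W_new[ids] -= 0.1 * grad_h / len(ids)

changed = ~np.all(W_new == W, axis=1)
assert set(np.flatnonzero(changed)) <= set(ids)  # only looked-up ids move
```

Note that the mean-pooling here is order-insensitive; whether the network as a whole cares about feature order depends on the layers placed after the embedding lookup, not on the lookup itself.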






Okay, thanks. Just to ask about "but the end product is a matrix of weights that is multiplied by the one-hot encoded words": is this related only to word2vec and GloVe, or also to the first part of the paragraph (the Keras Embedding layer)? Does it mean that an embedding vector of size m can be simulated by using a one-hot encoded layer as input and a dense layer with m neurons, so that the vector for each one-hot encoded feature is just its m weights going from that input feature to the dense-layer neurons?
– Jan Musil
Jun 9 at 18:29










@JanMusil As I said, embedding layers are dense layers, so they are matrices of weights to be multiplied by the features; this applies to all embeddings.
– Tim
Jun 9 at 19:38













          edited Jun 9 at 19:39

























          answered Jun 9 at 17:19









Tim








          Thanks for contributing an answer to Cross Validated!

