Why are word embeddings important for natural language processing?

Image from https://www.tensorflow.org/get_started/embedding_viz

What is Distributed Representation of Words?

In this section, I will explain what the distributed representation of words is. For comparison, I will first describe the one-hot representation of words and its problem, and then introduce the distributed representation and its merits.

One-hot Representation

One way to represent a word as a vector is the one-hot representation. In a one-hot vector, exactly one element is 1 and all the other elements are 0. Each dimension simply answers the question "is this that particular word or not?".

Example of one-hot representation.
Inner product of two one-hot vectors.
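The following is a minimal sketch of what the two figures above show, using a tiny made-up vocabulary. It illustrates that the inner product of any two different one-hot vectors is always zero, so this representation cannot express how similar two words are:

```python
import numpy as np

# A tiny hypothetical vocabulary; each word gets an index.
vocab = {"king": 0, "queen": 1, "apple": 2}

def one_hot(word, vocab):
    """Return a one-hot vector: 1 at the word's index, 0 elsewhere."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

king = one_hot("king", vocab)
queen = one_hot("queen", vocab)

# The inner product of two different one-hot vectors is always 0,
# so this representation cannot express that "king" and "queen" are related.
print(np.dot(king, queen))  # 0.0
print(np.dot(king, king))   # 1.0
```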

Distributed Representation

Distributed representation, on the other hand, represents a word as a low-dimensional real-valued vector, typically with about 50 to 300 dimensions. For example, the words above can be expressed as distributed representations as follows:

Example of distributed representation.
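As a minimal sketch of the idea, the vectors below are made up and only 4-dimensional (real models such as word2vec or GloVe produce 50 to 300 dimensions). Unlike one-hot vectors, dense vectors can express graded similarity, for example via cosine similarity:

```python
import numpy as np

# Hypothetical low-dimensional vectors; in practice they come from
# models such as word2vec or GloVe.
king  = np.array([0.7, 0.3, 0.9, 0.1])
queen = np.array([0.6, 0.4, 0.8, 0.2])
apple = np.array([0.1, 0.9, 0.0, 0.7])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 means similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Dense vectors can express that "king" is closer to "queen" than to "apple".
print(cosine_similarity(king, queen))  # relatively high
print(cosine_similarity(king, apple))  # relatively low
```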

Why is Distributed Representation of Words Important?

In this section, I will explain why the distributed representation of words matters in NLP. First, I will describe the input to NLP tasks, then how distributed representations are used as that input, and finally how they affect task performance.
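As a minimal sketch of what "using distributed representations as input" can look like, the following uses TensorFlow/Keras; the vocabulary size, dimensionality, and random matrix are illustrative stand-ins for pre-trained vectors such as word2vec or GloVe:

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes: a 10,000-word vocabulary and 100-dimensional vectors.
vocab_size, embed_dim = 10000, 100

# In practice this matrix would be filled with pre-trained vectors;
# here it is random purely for illustration.
embedding_matrix = np.random.rand(vocab_size, embed_dim).astype("float32")

# A small text classifier whose first layer maps word indices to
# their distributed representations.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),  # keep the pre-trained vectors fixed
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```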

Problems of Distributed Representation

Distributed representation of words is not a silver bullet for NLP. Many studies have shown that it has various problems. Here I will pick two of them and introduce them.

Problem 1: Performance falls short of expectations

The first problem is that when distributed representations are used in an actual task (document classification, etc.), the performance often falls short of expectations. In the first place, distributed representations of words are usually evaluated intrinsically, by how well the similarities they produce correlate with human-annotated word-similarity datasets (Schnabel, Tobias, et al., 2015). In other words, a model whose representations correlate well with human similarity judgments does not necessarily improve performance when those representations are used in an actual task.
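The following is a minimal sketch of this intrinsic evaluation, assuming SciPy is available; the word pairs and scores are made up for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical word-similarity evaluation data: human judgments for word
# pairs (as in datasets like WordSim-353) and cosine similarities from a model.
human_scores = np.array([9.0, 8.5, 2.0, 1.5])
model_scores = np.array([0.82, 0.75, 0.30, 0.35])

# Intrinsic evaluation: rank correlation between model and human similarities.
# A high correlation here does not guarantee better downstream task performance.
rho, _ = spearmanr(human_scores, model_scores)
print(rho)
```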

Personally, I hope that models that are currently overlooked will be re-evaluated on new datasets and tasks.

Problem 2: Word ambiguity

The second problem is that current distributed representations do not take word ambiguity into account. Words have multiple meanings; for example, the word “bank” means “sloping land” in addition to “a financial institution”. Representing a word as a single vector without considering this ambiguity therefore has limits.
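As an illustrative sketch, assuming gensim and its downloadable “glove-wiki-gigaword-100” vectors are available (the exact neighbor list will vary), querying the nearest neighbors of “bank” shows how a single vector conflates its senses:

```python
import gensim.downloader as api

# Downloads the pre-trained GloVe vectors on first use.
vectors = api.load("glove-wiki-gigaword-100")

# There is only one vector for "bank", regardless of which sense is meant,
# so its neighbor list may mix finance-related and river-related words.
for word, score in vectors.most_similar("bank", topn=10):
    print(f"{word}\t{score:.3f}")
```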

More detail

In the following repositories, I have collected information on distributed representations of words and sentences, pre-trained vectors, and implementations.

Conclusion

Distributed representation of words is an interesting and actively studied field. I hope this article helps you understand it.

References

  1. Mikolov, Tomas, et al. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).
  2. Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “GloVe: Global Vectors for Word Representation.” EMNLP. Vol. 14. 2014.
  3. Schnabel, Tobias, et al. “Evaluation methods for unsupervised word embeddings.” EMNLP. 2015.
  4. Chiu, Billy, Anna Korhonen, and Sampo Pyysalo. “Intrinsic evaluation of word vectors fails to predict extrinsic performance.” ACL 2016 (2016): 1.
  5. Avraham, Oded, and Yoav Goldberg. “Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure.” arXiv preprint arXiv:1611.03641 (2016).
  6. Nayak, Neha, Gabor Angeli, and Christopher D. Manning. “Evaluating Word Embeddings Using a Representative Suite of Practical Tasks.” ACL 2016 (2016): 19.
  7. Trask, Andrew, Phil Michalak, and John Liu. “sense2vec-A fast and accurate method for word sense disambiguation in neural word embeddings.” arXiv preprint arXiv:1511.06388 (2015).
  8. Iacobacci, Ignacio, Mohammad Taher Pilehvar, and Roberto Navigli. “SensEmbed: Learning Sense Embeddings for Word and Relational Similarity.” ACL. 2015.
  9. Reisinger, Joseph, and Raymond J. Mooney. “Multi-prototype vector-space models of word meaning.” Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010.
  10. Huang, Eric H., et al. “Improving word representations via global context and multiple word prototypes.” Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 2012.
