This won't be covered in the news because it's cutting-edge AI research in Natural Language Processing that has only happened in the past two years. And I need to stress that it could potentially turn out to be the wrong approach, but I don't think so, which is why I'm typing this out.
The short version is this: This is the first time we have figured out a way to represent the meaning behind human language sentences as a mathematical matrix. A matrix that contains the meaning of a sentence. One could almost say this matrix represents a single thought or idea.
Right now we are not doing a lot with these sentence-level matrices of meaning, but as an artificial intelligence and natural language researcher, I can't wait to see how researchers will build upon this work. I'm pretty sure that in the short term we will be able to further combine these sentence-matrices into a larger matrix/tensor that represents the meaning of an entire paragraph or document.
Perhaps we can manipulate these meaning-matrices, these "thoughts". Perhaps we can alter these thoughts in a consistent way. If we can combine small thoughts into bigger thoughts, perhaps we can do the reverse and split a big thought into smaller thoughts, then transform those thoughts back into words. This would be a huge development, as it would be the first time we have an algorithm that works on artificial thoughts, and perhaps our first foray into artificial thinking.
Let me backtrack a little. Up until about 3-4 years ago, the AI technology that handles human language could only do the simpler tasks, such as predicting what you're going to type, identifying whether the tone of a paragraph is positive or negative, or identifying potential answers to a question. These tasks may seem complex, but in reality they can be achieved simply by counting words and using some relatively simple probability computations.
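To give a feel for what I mean by "counting words", here is a minimal sketch in Python of that style of sentiment detection. The word lists are made up for illustration; real systems learn per-word weights or probabilities from labelled data, but the core idea really is this simple.

```python
# Minimal sketch of word-counting sentiment detection. The word lists below
# are invented for illustration only; real systems learn per-word statistics
# from labelled examples.
POSITIVE = {"great", "good", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(paragraph: str) -> str:
    words = paragraph.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone"))    # -> positive
print(sentiment("The battery life is awful"))  # -> negative
```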
For example, in my own research work from over a decade ago, my AI algorithm could answer a question like "Name opponents who George Foreman defeated" simply by searching for names that appear in articles near mentions of "George Foreman" AND different permutations of "defeat". This requires nothing more than a big search engine and some relatively advanced word probability computation.
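To make that concrete, here is a toy Python version of this kind of "search plus counting" question answering. The three-sentence mini-corpus and the crude name-matching regex are my own inventions for illustration; a real system runs over a huge document collection with proper named-entity recognition, but the principle of counting co-occurrences is the same.

```python
import re
from collections import Counter

# A made-up mini-corpus standing in for a large article collection.
corpus = [
    "George Foreman defeated Joe Frazier in two rounds.",
    "In 1973 George Foreman defeated Ken Norton.",
    "Muhammad Ali defeated George Foreman in Zaire.",
]

def candidate_answers(question_entity: str, relation_stem: str):
    """Count names that co-occur with the question entity and the relation word."""
    counts = Counter()
    for sentence in corpus:
        if question_entity in sentence and relation_stem in sentence.lower():
            # Crude heuristic: any other capitalised two-word phrase is a candidate name.
            for name in re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", sentence):
                if name != question_entity:
                    counts[name] += 1
    return counts.most_common()

print(candidate_answers("George Foreman", "defeat"))
# [('Joe Frazier', 1), ('Ken Norton', 1), ('Muhammad Ali', 1)]
```

Notice that this toy version happily returns Muhammad Ali as someone Foreman "defeated", which is exactly the failure mode described next.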
The problem with this approach is that the algorithm doesn't actually understand human language, and it would produce the same set of answers to the question "Name opponents who defeated George Foreman". The algorithm only does a word-by-word analysis of the question, which allows it to identify "George Foreman" as a name and "defeat" as somehow related to that name. However, the algorithm does not realize there's a huge difference between "George Foreman defeated" and "defeated George Foreman".
This is why there are human language problems that we currently cannot solve well, such as translating between languages. To solve these more advanced problems, we needed a way to go beyond processing word by word. Somehow, we needed to transform individual words in a sentence/passage into meaning.
Three years ago, we began making inroads, starting with a technique called word2vec. Word2vec uses a relatively simple neural network to convert every unique word into a unique mathematical vector, like [0.1234, -0.0013, 0.7532, ..., -0.24612]. The numbers in the vector represent a coordinate in a high-dimensional space, and this space is structured by word similarity. So the coordinates for "water" and "liquid" would be located geometrically close to each other, whereas "fire" and "ice" would be located farther apart. Such a representation also embeds quite a bit of common-sense knowledge, so "dog" and "fetch" would lie closer together than "dog" and "think". Every single word in the language gets its own unique coordinate, and we can easily compute how similar or different pairs of words are. However, as advanced as word2vec is, it still operates on a word-by-word basis, and so it still cannot do complex things like translating languages.
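For the curious, here is a minimal sketch of how "closeness" in that space is measured. The 4-dimensional vectors below are made up for illustration (real word2vec vectors typically have 100-300 dimensions and are learned from huge text corpora); cosine similarity is the standard way to compare them.

```python
import numpy as np

# Made-up 4-dimensional "word vectors" for illustration; real word2vec vectors
# are learned from large corpora and have far more dimensions.
vectors = {
    "water":  np.array([0.81, 0.10, 0.05, 0.40]),
    "liquid": np.array([0.78, 0.15, 0.02, 0.45]),
    "fire":   np.array([-0.60, 0.70, 0.30, 0.05]),
    "ice":    np.array([0.55, -0.65, 0.20, 0.35]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means 'pointing the same way'."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["water"], vectors["liquid"]))  # close to 1.0
print(cosine_similarity(vectors["fire"], vectors["ice"]))      # much lower (negative here)
```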
However, things started to change over the past two years, when someone figured out how to use a more specialized type of Recurrent Neural Network built from LSTMs (Long Short-Term Memory units) to mathematically combine the individual word2vec word-vectors of a sentence into a single matrix.
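Here is a rough sketch, in PyTorch, of what that encoding step looks like. The dimensions and the random stand-in "word vectors" are placeholders of my own choosing; in a real system the inputs would be the pretrained word2vec vectors of each word in the sentence, and the LSTM's weights would be trained on an actual task such as translation.

```python
import torch
import torch.nn as nn

embedding_dim = 300   # size of each word vector (placeholder value)
hidden_dim = 512      # size of the LSTM's internal state (placeholder value)
sentence_length = 7   # number of words in the sentence

# Random stand-in for the sentence's word2vec vectors, one per word,
# shaped (sequence length, batch size, embedding dimension).
word_vectors = torch.randn(sentence_length, 1, embedding_dim)

encoder = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim)

# `outputs` holds one hidden vector per word; stacked together they form the
# sentence-level "matrix of meaning" discussed above. (h_n, c_n) is the final
# state, a single fixed-size summary of the whole sentence.
outputs, (h_n, c_n) = encoder(word_vectors)

print(outputs.shape)  # torch.Size([7, 1, 512])
print(h_n.shape)      # torch.Size([1, 1, 512])
```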
This is as far as the current research has gotten. It's not much, but like I said, this is the first time we've ever been able to convert a sentence into a mathematical matrix that still contains most of the meaning of the original sentence. If we can learn to manipulate this matrix - by massaging its meaning, by combining meanings and by separating big meanings into smaller ones - it could potentially be the first time we have achieved something akin to artificial thoughts, and the ability to artificially alter thoughts.