
Natural Language Processor (more comprehensive explanation)


TITLE: Natural Language Processing System


BACKGROUND

[0001] The problem to be solved is enabling machines to process information given to them in natural language and to derive new information from it. Traditionally this involves representing the meaning of words in some way and then manipulating that representation.

BRIEF SUMMARY OF THE INVENTION

[0002] This invention is a natural language processing system that essentially maps sentences, or groups of sentences, to other sentences. It also introduces algorithms for speeding up the system's performance and a novel way of representing relationships between words.

DETAILED DESCRIPTION AND BEST MODE OF IMPLEMENTATION

[0003] This invention is a natural language processing system that essentially maps sentences, or groups of sentences, to other sentences. For example, in one embodiment it takes a sentence written in one language, performs the mapping transformation on it, and returns the equivalent statement in another language. The system must be trained with supervised methods before it can be put into operation. There can be many mapping functions, each representing a different way to process or extract information from the input sentences, and each requires its own type of training data.
Currently most natural language processing systems rely on a database that places words in a multidimensional space, where each dimension represents a nuance of meaning and the proximity of two words along a dimension indicates their similarity in that sense. To create these relationships, a system is trained to analyse how words relate to one another in the sentences of the corpus it is made to read. As it reads more and more sentences, the relationships between words, as represented by word vectors, are adjusted to reflect new, additional and hopefully more accurate information.

These word vectors collectively encode what it means for a certain group of words, placed in a finite sequence in a specific order, to make sense or have meaning; in other words, what it means for a sentence to be meaningful. A minimal sketch of this conventional picture follows.
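For reference only, the conventional word-vector picture described above can be illustrated as below. The vectors, words and dimensions are invented purely for illustration, and cosine similarity stands in for "proximity" between words; none of this is prescribed by the invention.

```python
import numpy as np

# Toy word vectors: each dimension is meant to stand for some nuance of meaning.
# The values are invented purely for illustration.
vectors = {
    "cat": np.array([0.9, 0.1, 0.3]),
    "dog": np.array([0.8, 0.2, 0.35]),
    "not": np.array([0.05, 0.9, 0.1]),
}

def cosine_similarity(a, b):
    """Proximity of two word vectors: values near 1.0 mean very similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["cat"], vectors["dog"]))  # high: related words
print(cosine_similarity(vectors["cat"], vectors["not"]))  # low: unrelated words
```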
What is really happening during training is that the system is asked: given that n different input sentences are each meaningful, how can the words/elements that make up each sentence be represented so as to capture this shared meaningfulness, while also taking care that elements/words which are identical across sentences, i.e. have the same meaning, are represented similarly? In short, the system must accurately model two structures with different types of similarity: words may share a similarity in meaning, and sentences must share a similarity in meaningfulness. As the system keeps receiving new sentences during training, there are more and more constraints on the possible values of the word vectors encountered, so that they maintain their status as solutions to the problem of giving the same property (meaningfulness) to the many sentences they are distributed amongst.
For example, if the system encounters the two sentences "I am not fine" and "cats do not swim", it knows that they are alike in one respect: both are sensible (meaningful) sentences. So it looks for a way to represent each word in each sentence such that, together, the words express this sensibility. There are many solutions given just these two sentences, but whatever solution we choose will have the property that the word "not" is represented the same way in both sentences, while we have more flexibility in the values chosen for the remaining words. A sketch of this shared-word constraint follows.
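The shared-word constraint in this example can be made concrete with a toy training loop. The scoring rule (a projection of the averaged word vectors onto a fixed "meaningfulness" direction), the learning rate and the target value of 1.0 are all assumptions made for illustration; the only point carried over from the text is that the single vector for "not" is shared by, and updated from, both sentences.

```python
import numpy as np

rng = np.random.default_rng(0)
sentences = [["i", "am", "not", "fine"], ["cats", "do", "not", "swim"]]

# One vector per distinct word: "not" is a single vector shared by both sentences.
vocab = sorted({w for s in sentences for w in s})
vecs = {w: rng.normal(size=4) for w in vocab}
meaningful_direction = rng.normal(size=4)  # hypothetical "meaningfulness" probe

def score(sentence):
    """Toy meaningfulness score: mean word vector projected onto a fixed direction."""
    return np.mean([vecs[w] for w in sentence], axis=0) @ meaningful_direction

# Nudge every word of every (meaningful) training sentence toward a score of 1.0.
for _ in range(200):
    for s in sentences:
        err = 1.0 - score(s)
        for w in s:
            vecs[w] += 0.05 * err * meaningful_direction / len(s)

print(score(sentences[0]), score(sentences[1]))  # both approach 1.0
```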
Given many training examples, the words encountered will recur across multiple sentences, so we become increasingly less flexible in the values we may assign to them, i.e. the solution space for the word vectors becomes narrower. This narrowing captures the rigour of the logic that defines the sensibility of sentences; put differently, information defining what it means for a sentence to be meaningful is contained in the word vectors of the database.

All of this is a method of encoding the rules (logic) for making sentences into the properties of their words, by assigning values to those properties.
The purpose of this invention can now be illustrated, for clarity, using one embodiment: a machine translation application that takes as input a sentence in one language, e.g. English, and translates it into another, e.g. French.

There is a certain logic needed to form sentences; that is what is encoded in word vectors using the method above.
There is another logic needed to relate sentences within a language, and it expresses even more. If we only had words and no sentences, the representation of words would have literally no constraints; when we force words to make meaningful sentences, the representation of the words becomes more restricted; and when we move from individual sentences to groups of sentences, the representation of the words in each sentence becomes more restricted still. By restriction I mean that the number of possible ways to set up the values of every word in relation to each other becomes smaller.

The logic needed to manipulate sentences contains more information than can be deduced from what was encoded in word vectors by the training method above, because those word vectors only encoded logic up to the sentence level, not sentence-manipulation logic. In this invention our task therefore becomes different. We will follow reasoning similar to that used to create the word vectors above, but instead of encoding into word vectors the logic required to build sentences, we will encode into them the logic required to manipulate, i.e. translate, sentences.
Just as we observed a similarity between two statements in one language, namely that both were sensible or meaningful, and then induced the encoding of that similarity onto the individual words, we will now use the similarity in meaning between two sentences in two different languages and, given this similarity, induce its encoding onto the individual words. A new method for doing this is described, and it can be extended to other embodiments of this invention.
Also, a new method of representing the relationships between words is presented which, together with an additional algorithm, will speed up the system's convergence considerably during training. We choose to represent words as waveforms, and sentences as composite waveforms built from the waveforms of their individual words. The information contained in the order of the words within a sentence is represented by adding each word's wave to the sentence's wave at a different phase, where a rightward phase shift of a certain amount signifies the word's position in the sentence. During training, two sentences are given (one from each language) and, after some computation, the words in the sentences are assigned the particular waveforms they will keep.

As training continues, the frequencies contained in each word's waveform are adjusted to reflect an increase in information, where this information represents the disposition of the sentences formed by the words in one language to be mapped to a particular sentence in the target language. A small sketch of this representation follows.
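One way to make the waveform representation concrete is sketched below. The use of sinusoids, the sample grid, and the rule that a word's position is encoded by circularly shifting its waveform are all assumptions chosen for illustration; the text does not fix these details.

```python
import numpy as np

SAMPLES = 1024
t = np.linspace(0.0, 1.0, SAMPLES, endpoint=False)

def word_waveform(frequencies, amplitudes):
    """A word as a waveform: here, a sum of sinusoids at its assigned frequencies."""
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(frequencies, amplitudes))

def sentence_waveform(word_waves):
    """A sentence as the sum of its words' waveforms, each shifted in phase
    (implemented here as a circular shift) according to its position."""
    shift_per_position = SAMPLES // 16  # assumed step size; not specified in the text
    total = np.zeros(SAMPLES)
    for position, wave in enumerate(word_waves):
        total += np.roll(wave, position * shift_per_position)
    return total

# Hypothetical word waveforms for "cats do not swim".
cats = word_waveform([3, 7], [1.0, 0.5])
do = word_waveform([5], [1.0])
not_ = word_waveform([11], [0.8])
swim = word_waveform([13, 2], [0.6, 0.4])
sentence = sentence_waveform([cats, do, not_, swim])
```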
The main plan takes the following form. We arbitrarily start with some waveform and call it "theta". (Theta could be any waveform; it is only important that, once we have chosen a particular theta, we stick with it throughout training and operation of the system. Its purpose will be demonstrated shortly.)
Then we are given the first pair of statements from the training set: one in the source language, the other in the target language. We arbitrarily assign waveforms to the words in the target sentence, taking care that if the same word appears more than once in the target it receives the exact same waveform, just at a different phase. Now we have two waveforms: theta and that of the target sentence. We subtract theta from the target waveform and assign the resulting waveform to the sentence in the other language, i.e. the one we are supposed to translate. The system then distributes the frequencies of the waveform assigned to this initial sentence amongst its words, taking care of all considerations such as the phases of the words and multiple instances of the same word. Apart from those considerations, this assignment is random.
This is what we have done so far: we now have a waveform (theta) which, if added to the waveform of the initial sentence we are translating, transforms it into the waveform of the target sentence. A sketch of this first training step follows.
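The first training step can be sketched with plain arrays, as below. The example sentences, the equal split of the initial waveform among its words, and the omission of phase are assumptions for brevity; the relation carried over from the text is target = initial + theta, i.e. initial = target - theta.

```python
import numpy as np

SAMPLES = 1024
rng = np.random.default_rng(0)

theta = rng.normal(size=SAMPLES)  # chosen once and kept for all of training

# Target sentence (e.g. French): each distinct word receives a random waveform.
target_words = ["les", "chats", "ne", "nagent", "pas"]
target_vecs = {w: rng.normal(size=SAMPLES) for w in target_words}
target_wave = sum(target_vecs[w] for w in target_words)

# The initial (source) sentence's waveform is fixed by: initial = target - theta.
initial_words = ["cats", "do", "not", "swim"]
initial_wave = target_wave - theta

# Distribute the initial waveform among its words. The equal split below is an
# arbitrary choice; the text only requires that the words sum back to the sentence.
initial_vecs = {w: initial_wave / len(initial_words) for w in initial_words}

# Check: adding theta to the initial sentence's waveform recovers the target.
assert np.allclose(sum(initial_vecs.values()) + theta, target_wave)
```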
Let us add the second training example. We perform exactly what we did for the first example, except that it will be less random, because of the considerations mentioned: we only assign a random value to a word if it was not already assigned a value by a previous training pair, and when we distribute the frequencies of the new target wave minus theta over the words of the initial sentence, we keep the previously assigned words the same.

We carry on this training over multiple translation examples, adjusting the values of the words as we go. But there are more considerations.
Once we have encountered all or most of the words in the language, they will already have assigned frequencies, and there will be less room to redistribute frequencies and so continue learning. To cater for this, we redistribute the frequencies differently: when redistributing the frequencies of the initial sentence we are translating over its words, we apply more change to words that have been encountered less often and less change to words that have been encountered more often.

The effect is that the more often a word has been encountered (i.e. the more times we have adjusted it), the more accurately we assume it has encoded the information we want it to encode, and the less we should change it. One way this weighting could be realised is sketched below.
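The "change rare words more, frequent words less" rule could be realised with a per-word counter, roughly as below. The inverse-count weighting is one plausible reading; the text does not give an explicit formula.

```python
import numpy as np

def distribute_error(words, word_vecs, counts, error_wave):
    """Spread a correction over a sentence's words, changing rarely seen words
    more and frequently seen words less (weights proportional to 1 / count)."""
    weights = np.array([1.0 / counts[w] for w in words])
    weights /= weights.sum()
    for w, weight in zip(words, weights):
        word_vecs[w] = word_vecs[w] + weight * error_wave
        counts[w] += 1

# error_wave would typically be (target_wave - theta) - current_initial_wave,
# i.e. whatever is still missing for "initial + theta = target" to hold.
```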
When we change the frequencies in one word, previous training pairs are affected. The change is redistributed among the other words of each affected (previous) sentence in such a way as to preserve the original shape of that sentence's waveform, so that theta still does its job of mapping the initial statement to the target. The newly changed values in turn affect other sentences that contain those words, and so on, producing a ripple effect throughout the whole system. This ripple would quickly degenerate into noise if we did not have the rule that the more often a word has been encountered, the less we change it. Because of this rule the system converges: the net magnitude of the changes becomes smaller and smaller over time. A sketch of this propagation follows.
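The ripple could be implemented roughly as below: when one word's waveform changes by some delta, every previously trained sentence containing it spreads the opposite change over its other words (rare words absorbing more), so each sentence's total waveform, and hence theta's mapping, is preserved; those compensations are then propagated in turn until they fall below a tolerance. The queue, the damping via counters and the step limit are assumptions for illustration.

```python
import numpy as np

def ripple(changed_word, delta, sentences, word_vecs, counts,
           tolerance=1e-3, max_steps=10_000):
    """Propagate a change to one word through every sentence that contains it,
    compensating on the other words so each sentence's total waveform stays
    the same. The ripple is cut off once changes shrink below `tolerance`
    or after `max_steps` hops."""
    queue = [(changed_word, delta)]
    steps = 0
    while queue and steps < max_steps:
        word, change = queue.pop(0)
        if float(np.linalg.norm(change)) < tolerance:  # ripple has become noise
            continue
        for sentence in sentences:
            if word not in sentence:
                continue
            others = [w for w in sentence if w != word]
            if not others:
                continue
            weights = np.array([1.0 / counts[w] for w in others])
            weights /= weights.sum()
            for other, weight in zip(others, weights):
                compensation = -weight * change  # keeps the sentence's sum unchanged
                word_vecs[other] = word_vecs[other] + compensation
                counts[other] += 1  # adjusted more often => changed less later
                queue.append((other, compensation))
        steps += 1
```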
We may need a pause between training pairs to establish convergence, during which we typically cut the ripple off early, either after some time or once the net change in a word's magnitude falls below an acceptable value. Also, to speed up convergence, a neural net can be employed to detect patterns, such as which groups of words get adjusted in similar ways at similar times; the system would then adapt the frequency-redistribution rule to accentuate these patterns according to how similar the changes are. This accentuation could benefit from reinforcement techniques, where the net identifies patterns, extrapolates them, and explores whether the system then converges faster or more stably.
One very important point: since both sides of theta (the initial and target waveforms) need adjusting, simply letting them adjust simultaneously means we never reach convergence; the whole system of waveforms either oscillates or drifts randomly. The solution is to subtract the target waveform from theta and also subtract the initial waveform from theta, compare the two results, and use whichever is smaller to determine which sentence's waveform to change: the waveform to change is the remaining one, the one not involved in the smaller result. A sketch of this selection rule follows.
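One plausible reading of this rule is sketched below: compute both residuals against theta, keep the side that is already closer, and adjust the other. The use of the Euclidean norm as the measure of "smaller" is an assumption.

```python
import numpy as np

def choose_side_to_update(theta, initial_wave, target_wave):
    """Decide which sentence's waveform to adjust for this training pair.
    The side whose residual against theta is smaller stays fixed; the
    remaining side is the one that gets changed."""
    target_residual = np.linalg.norm(theta - target_wave)
    initial_residual = np.linalg.norm(theta - initial_wave)
    if target_residual < initial_residual:
        return "initial"  # target side left alone; adjust the initial sentence
    return "target"       # initial side left alone; adjust the target sentence
```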
There may be disjoint domains in which multiple types of translation exist, i.e. where we could choose to translate a given sentence one way rather than another. This represents noise in the learning process and could delay convergence. To handle this case, a neural net detects the oscillations or patterns produced by this noise, and this information is used to select different versions of theta that cater for the different types of translation.
There are other embodiments of this invention in which the system is given two sentences in the same language and has a theta for extracting the information implied by both statements. Given the method of generating new thetas for different ways of representing the processed information, such a system could be fed many statements and rapidly apply different thetas to their waveforms, pairwise and individually, to generate new sentences that it then operates on again; it can handle hidden Markov state reasonably well. Note that with each new training session and type we may have to change the waveforms from previous training sessions and types, so a ripple will move through the whole combined system as the words' waveforms change. This could be mitigated by keeping a separate database of words, with different waveforms, for each version of theta.

Note also that during operation the system produces a target waveform from an initial one, then analyses it and derives words from it by separating out the frequencies that make up the new waveform, as sketched below.
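The operational phase could be sketched as below: add theta to the initial sentence's waveform, then recover target words by greedily matching known word waveforms against the result. The greedy least-squares matching is an assumption, and word order (phase) is ignored for brevity; the text only says the output waveform is analysed by separating out its constituent frequencies.

```python
import numpy as np

def translate(initial_wave, theta, target_vocab_waves, max_words=20, tolerance=1e-2):
    """Operational phase, sketched: produce the target waveform as initial + theta,
    then peel known target-word waveforms out of it one at a time."""
    residual = initial_wave + theta  # target = initial + theta
    output_words = []
    for _ in range(max_words):
        # Pick the vocabulary waveform that best explains the remaining residual.
        best_word, best_coeff = None, 0.0
        for word, wave in target_vocab_waves.items():
            coeff = float(residual @ wave) / float(wave @ wave)
            if abs(coeff) > abs(best_coeff):
                best_word, best_coeff = word, coeff
        if best_word is None or abs(best_coeff) < tolerance:
            break
        output_words.append(best_word)
        residual = residual - best_coeff * target_vocab_waves[best_word]
    return output_words
```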
Ultimately this system moves us closer to the creation of a literate machine.


ABSTRACT:

[0004] This invention specifies a novel method of representing words to a machine using waves. Sentences become composite waves, and so do groups of sentences. Transforming or extracting information to obtain one sentence from another becomes the process of adding an additional wave, here called theta, that was determined during the training phase for a particular role. Additionally, as the machine is trained, information propagates through the system, adjusting the disposition of the words to perform the role they were trained for. During the operational phase we may then apply theta to any wave-represented sentence and derive new information from it.







