Goal
Simplest embedding problem using TensorFlow. Inspired by this: http://matpalm.com/blog/2015/03/28/theano_word_embeddings/
The Setup
- Predict whether two objects (o and p) are "compatible" / should occur together or not.
- Embed N arbitrary objects identified by integers into 2d (x,y) (so I can plot them).
- Try learning a simple linear-ordering "compatibility" function:
  - Make adjacent pairs compatible, Prob(o,p)=1.0: (o,p) = (0,1), (1,2), (4,5), ...
  - Make all other pairs incompatible, Prob(o,p)=0.0: (o,p) = (0,2), (3,5), ...
  - Assume pairs are ordered, so (o,p) implies p>o.
- Try a very simple learning network:
  - Input: (o, p) (object integers)
  - Embed: embedding(o) = 2d vector
  - Computation: f(o,p) = a single linear layer. input = [embedding(o), embedding(p)]; output = logits for [1-P, P], where P is the probability that o and p are compatible
  - Output: softmax of the network output

The key point is that during learning, gradients can be pushed back to the embedding vectors so they are updated along with everything else. This is _automatic_ in frameworks like TensorFlow.

Source Code: https://github.com/michaelbrownid/simplest-embedding or simply simplest-embedding.python.txt

The relevant parts are quite simple (a sketch of the data generation and training loop follows the snippet):
import tensorflow as tf

# hyperparameters (example values for illustration; the repository sets its own)
numObjects = 10
embeddingSize = 2
learningRate = 0.01

# input: batchSize by two integer object numbers
inputData = tf.placeholder(tf.int32, [None,2])
# target is batchSize probability of compatible: [1-Prob(compat), Prob(compat)]
target = tf.placeholder(tf.float32, [None,2])
# this is the matrix of embedding vectors. rows=objects. cols=embedding. random [-1,1]
embedding = tf.Variable(tf.random_uniform([numObjects, embeddingSize], -1.0, 1.0))
# pull out the embedding vectors from integer ids
emap = tf.nn.embedding_lookup(embedding, inputData)
# reshape to feed pair of embedding points as single vector per datapoint
inputs = tf.reshape(emap , [-1, 2*embeddingSize])
# model: a single linear layer mapping the concatenated embeddings to the 2d [1-P,P] logit vector
softmaxW = tf.Variable(tf.random_normal([2*embeddingSize,2],stddev=0.1), name="softmaxW")
softmaxB = tf.Variable(tf.random_normal([1,2], stddev=0.1), name="softmaxB")
logits = tf.matmul(inputs, softmaxW ) + softmaxB
probs = tf.nn.softmax(logits)
# cross-entropy loss = sum over the batch of -log(predicted prob of the correct label), divided by N.
# (Jensen's inequality: E[log X] <= log E[X].)
cross = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=target, logits=logits))
# optimize
optimizer = tf.train.AdamOptimizer(learningRate).minimize(cross)
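The snippet omits the data generation and the training loop. A minimal sketch, assuming the setup above (adjacent ordered pairs are the positives, all other ordered pairs negatives) and standard TF1 session mechanics; the actual script in the repository may differ:

import numpy as np

# build all ordered pairs (o,p) with p>o; adjacent pairs are compatible
pairs, labels = [], []
for o in range(numObjects):
    for p in range(o + 1, numObjects):
        compat = 1.0 if p == o + 1 else 0.0
        pairs.append([o, p])
        labels.append([1.0 - compat, compat])   # target is [1-P, P]
pairs = np.array(pairs, dtype=np.int32)
labels = np.array(labels, dtype=np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(1000):
        _, loss = sess.run([optimizer, cross],
                           feed_dict={inputData: pairs, target: labels})
        if epoch % 100 == 0:
            print(epoch, loss)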
Note that this is interesting because we estimate both the embedding
and the linear discriminant function on top of the embedding simultaneously.
TODO: eliminate the linear function and score pairs with a plain dot
product that has no learned parameters (a sketch of this variant follows).
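One way to realize that TODO: score a pair by the dot product of its two embedding vectors and train with a sigmoid cross-entropy, so nothing is learned beyond the embedding itself. This variant is not in the repository; the names (embO, embP, targetCompat) and the loss choice are assumptions, a sketch only:

# emap has shape [batch, 2, embeddingSize]; split into the two points
embO, embP = tf.unstack(emap, num=2, axis=1)
# parameter-free compatibility score: dot product of the two embeddings
score = tf.reduce_sum(embO * embP, axis=1)
# scalar target: P(compatible) per pair, instead of the 2d [1-P,P] target
targetCompat = tf.placeholder(tf.float32, [None])
crossDot = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=targetCompat, logits=score))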
================================
RESULT
Animate the embeddings and the decision boundaries for the first 4 points (colors = black, red, green, blue) every 3 epochs of optimization, for 1000 epochs total.
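To reproduce one frame of that animation, pull the learned values out of the session and plot them. A hypothetical matplotlib sketch (the colors and the 4-point subset follow the description above; the boundary algebra is derived from the linear layer, everything else is an assumption):

import numpy as np
import matplotlib.pyplot as plt

emb = sess.run(embedding)                 # [numObjects, 2]
W, b = sess.run([softmaxW, softmaxB])
colors = ['black', 'red', 'green', 'blue']
d = embeddingSize
# weights of the logit difference (compatible minus incompatible)
wO = W[:d, 1] - W[:d, 0]                  # acts on the first embedding
wP = W[d:, 1] - W[d:, 0]                  # acts on the second embedding
xs = np.linspace(emb[:, 0].min(), emb[:, 0].max(), 2)
for i, col in enumerate(colors):          # first 4 points
    plt.scatter(emb[i, 0], emb[i, 1], color=col)
    plt.annotate(str(i), (emb[i, 0], emb[i, 1]))
    # with o=i fixed, the boundary over the second point (x,y) is
    # wP[0]*x + wP[1]*y + c = 0
    c = emb[i].dot(wO) + (b[0, 1] - b[0, 0])
    plt.plot(xs, -(wP[0] * xs + c) / wP[1], color=col, linestyle='--')
plt.savefig('frame.png')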
(Interactive animation on the original page: two preset experiments, Converged1NotQuite and Converged2, with frame-step and frame-delay controls.)


