More on fastText for product embedding
In the previous post, I introduced fastText as a method for product embedding and discussed its advantages. Today, let's go over some background and a few interesting corollaries of using fastText this way.
What’s the matrix that fasttext is factorizing ?
We now know that word2vec is equivalent to matrix factorization: with negative sampling, it implicitly factorizes a matrix of (shifted) pointwise mutual information between each word and its contexts (the surrounding words), as shown by Levy and Goldberg. The analogy in market basket analysis is users and items, so the matrix becomes a user-by-item matrix. Word2vec factorizes this matrix stochastically.
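To make this concrete, here is a minimal numpy sketch of the PMI matrix that word2vec-style models implicitly factorize. The co-occurrence counts are made up for illustration, and the truncated SVD at the end stands in as the non-stochastic analogue of the embedding training:

```python
import numpy as np

# Toy user-by-item co-occurrence counts (made up for illustration).
counts = np.array([
    [10., 2., 0.],
    [3., 8., 1.],
    [0., 1., 12.],
])

total = counts.sum()
p_uv = counts / total                              # joint P(u, v)
p_u = counts.sum(axis=1, keepdims=True) / total    # marginal P(u)
p_v = counts.sum(axis=0, keepdims=True) / total    # marginal P(v)

# PMI(u, v) = log P(u, v) / (P(u) P(v)); zero-count cells give -inf,
# so in practice one clips to positive PMI (PPMI).
with np.errstate(divide="ignore"):
    pmi = np.log(p_uv / (p_u * p_v))
ppmi = np.maximum(pmi, 0.0)

# A truncated SVD of this matrix is the batch analogue of what
# word2vec's SGD does stochastically; with k negative samples the
# implicit target is shifted PMI, i.e. PMI - log(k).
U, S, Vt = np.linalg.svd(ppmi)
user_vecs = U[:, :2] * np.sqrt(S[:2])
item_vecs = Vt[:2, :].T * np.sqrt(S[:2])
print(np.round(user_vecs @ item_vecs.T, 2))  # rank-2 approximation of ppmi
```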
So what’s the matrix that fasttext is factoring ? cue Bojanowski et al.
"Each word w is represented as a bag of character n-grams. We add special boundary symbols < and > at the beginning and end of words, allowing us to distinguish prefixes and suffixes from other character sequences. We also include the word w itself in the set of its n-grams, to learn a representation for each word (in addition to character n-grams)."
Seen this way, fastText word embedding is word2vec with extra rows added to the matrix, one for each character n-gram appearing in a word. Each cell contains the pointwise mutual information (PMI) between the context and that particular n-gram.
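The n-gram decomposition the quote describes is easy to sketch. This is an illustrative reimplementation, not fastText's actual code (the real library hashes n-grams into buckets rather than storing them as strings):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary symbols, following Bojanowski et al.
    The whole word (with boundaries) is also included in the set."""
    w = f"<{word}>"
    grams = {w[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)  # the word itself, in addition to its n-grams
    return grams

print(sorted(char_ngrams("where", n_min=3, n_max=3)))
# ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
```

fastText then represents "where" as the sum of the vectors of these six units, which is exactly the "extra rows" view above.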
Example: Product_12123 ~ T-shirt + Blue Color + brand Nike + Dry Fit + ...
It gets interesting when we apply fastText to users and items, modeling an item as the sum of its attributes plus itself, as in the example above. In this setting, a cell contains information about a user and an attribute: for example, a user's interest in a particular brand such as Nike, or in a particular product category.
We can later extract these user-category or user-brand associations directly from the embedding, without having to train a separate model for those relationships.
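A sketch of that extraction, assuming such a model has already been trained so that users and item attributes share one vector space. The names (user_42, brand_nike, ...) are hypothetical and the vectors are random stand-ins for trained embeddings:

```python
import numpy as np

# Hypothetical trained vectors; random stand-ins for illustration.
rng = np.random.default_rng(0)
emb = {name: rng.normal(size=8)
       for name in ["user_42", "brand_nike", "brand_adidas", "cat_tshirt"]}

def affinity(a, b):
    """Cosine similarity between two entities in the shared space."""
    va, vb = emb[a], emb[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Rank brands for a user straight from the embedding, with no
# separate user-brand model.
brands = ["brand_nike", "brand_adidas"]
ranked = sorted(brands, key=lambda b: affinity("user_42", b), reverse=True)
print(ranked)
```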
How about user attributes?
fastText supplies item attributes to enrich items. This helps solve the cold-start problem for new items (given that we know their attributes). What if we could supply user attributes too, things like declared interests, and factorize everything jointly? That would help solve cold start for users as well (think of how Spotify asks for your favorite music genres when you sign up).
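The cold-start side of this is simple to sketch: an unseen item (or user) gets a vector built from its known attributes alone, the same way fastText builds a vector for an out-of-vocabulary word from its n-grams. The attribute names and vectors below are hypothetical:

```python
import numpy as np

# Hypothetical attribute vectors learned during training.
rng = np.random.default_rng(1)
attr = {a: rng.normal(size=8)
        for a in ["cat_tshirt", "color_blue", "brand_nike", "fit_dry"]}

def cold_start_vector(attributes):
    """Embed an unseen item (or user) as the mean of its known attribute
    vectors -- the fastText subword trick, applied to attributes
    instead of character n-grams."""
    return np.mean([attr[a] for a in attributes], axis=0)

# A brand-new product gets a usable vector before any
# interaction data exists.
v_new = cold_start_vector(["cat_tshirt", "color_blue", "brand_nike"])
print(v_new.shape)  # (8,)
```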
We can even extract and understand the association between generic user attributes and generic item attributes.