Commit c39283d8 authored by Valentin Pelloin

Merge branch 'master' of git-lium.univ-lemans.fr:vpelloin/svd2vec

parents b3c591ca bfd71bb0
@@ -11,19 +11,54 @@ wget http://mattmahoney.net/dc/text8.zip -O text8.gz
gzip -d text8.gz -f
```
```python
# Building
>>> from svd2vec import svd2vec
>>> documents = [open("text8", "r").read().split(" ")]
>>> svd = svd2vec(documents, window=2, min_count=100)
```
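The `window` and `min_count` arguments are not described in this section. As a rough illustration, and assuming they carry the usual word2vec-style meaning (context radius in tokens and minimum corpus frequency of a word), the sketch below counts co-occurrences in plain Python. The helper `cooccurrence_counts` is hypothetical and is not part of `svd2vec`; the library's own counting may differ.

```python
from collections import Counter


def cooccurrence_counts(documents, window=2, min_count=1):
    """Count (word, context) pairs that lie within `window` tokens of each other.

    Hypothetical helper for illustration only; not the svd2vec implementation.
    """
    # Keep only the words that occur at least `min_count` times in the corpus.
    frequencies = Counter(token for doc in documents for token in doc)
    vocabulary = {word for word, count in frequencies.items() if count >= min_count}

    pairs = Counter()
    for doc in documents:
        # Assumption: rare words are dropped before the window is applied.
        tokens = [token for token in doc if token in vocabulary]
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs


documents = [["the", "cat", "sat", "on", "the", "mat"]]
counts = cooccurrence_counts(documents, window=1, min_count=1)
print(counts[("the", "cat")])
```

A larger `window` captures looser, more topical associations, while a higher `min_count` shrinks the vocabulary and removes noisy rare words.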
```python
# I/O
>>> svd.save("svd.bin")
>>> svd = svd2vec.load("svd.bin")
```
```python
# Similarities
>>> svd.similarity("bad", "good")
# 0.4156516999158368
>>> svd.similarity("monday", "friday")
# 0.839529117681973
```
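The scores above look like cosine similarities, which is the conventional measure for word embeddings, although this section does not say so explicitly. Under that assumption, a minimal sketch of the computation on made-up toy vectors (the function name `cosine_similarity` is illustrative, not the library's API):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, invented for this example.
bad = [1.0, 0.0, 1.0]
good = [1.0, 1.0, 0.0]
print(cosine_similarity(bad, good))  # close to 0.5
```

A value near 1 means the two vectors point in almost the same direction, and a value near 0 means they are nearly orthogonal.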
```python
# Most similar words
>>> svd.most_similar(positive=["january"], topn=2)
# [('february', 0.6854849518368631), ('october', 0.6653385092683669)]
>>> svd.most_similar(positive=['moscow', 'france'], negative=['paris'], topn=4)
# [('russia', 0.6221746629754187), ('ussr', 0.6024809889985986), ('soviet', 0.5794180517326273), ('bolsheviks', 0.5365123080505297)]
```
```python
# Analogies
>>> svd.analogy("paris", "france", "berlin")
# [('germany', 0.6977716641680641), ...]
>>> svd.analogy("road", "cars", "rail")
# [('trains', 0.7532519174901262), ...]
>>> svd.analogy("cow", "cows", "pig")
# [('pigs', 0.6944101149919422), ...]
>>> svd.analogy("man", "men", "woman")
# [('women', 0.7471792753875327), ...]
```
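Analogy queries of this kind are commonly answered with vector arithmetic: for "a is to b as c is to ?", rank the other words by cosine similarity to `b - a + c`. Whether `svd2vec` uses exactly this rule is an assumption here; the sketch below only illustrates the idea on invented toy vectors, and the `analogy` function is a hypothetical stand-in for the method shown above.

```python
import math


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c, topn=1):
    """Rank words d for 'a is to b as c is to d' using the target b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [
        (word, cosine(vec, target))
        for word, vec in vectors.items()
        if word not in (a, b, c)  # the query words themselves are excluded
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:topn]


# Toy vectors, invented for this example.
vectors = {
    "paris": [1.0, 0.0, 1.0],
    "france": [1.0, 0.0, 0.0],
    "berlin": [0.0, 1.0, 1.0],
    "germany": [0.0, 1.0, 0.0],
    "cow": [0.5, 0.5, 0.5],
}
best = analogy(vectors, "paris", "france", "berlin")
print(best[0][0])
```

This is the same arithmetic as the `most_similar(positive=[...], negative=[...])` form, with `b` and `c` as positive words and `a` as the negative one.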
With [Gensim](https://pypi.org/project/gensim/), you can load an `svd2vec` model through its `word2vec` representation:
```python
>>> from gensim.models.keyedvectors import Word2VecKeyedVectors
>>> svd.save_word2vec_format("svd_word2vec_format.txt")
>>> keyed_vector = Word2VecKeyedVectors.load_word2vec_format("svd_word2vec_format.txt")
>>> keyed_vector.similarity("good", "bad")
# 0.54922897
```
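For reference, the word2vec text format is a header line with the vocabulary size and the vector dimension, followed by one line per word with its space-separated components. Assuming `save_word2vec_format` writes that standard layout, the round trip can be sketched in plain Python; the two helper functions below are hypothetical and belong to neither `svd2vec` nor Gensim.

```python
import io


def write_word2vec_text(vectors, stream):
    """Write a {word: [floats]} mapping in word2vec text format."""
    dim = len(next(iter(vectors.values())))
    stream.write(f"{len(vectors)} {dim}\n")  # header: vocabulary size, dimension
    for word, vec in vectors.items():
        stream.write(word + " " + " ".join(repr(x) for x in vec) + "\n")


def read_word2vec_text(stream):
    """Read word2vec text format back into a {word: [floats]} mapping."""
    count, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(count):
        parts = stream.readline().rstrip("\n").split(" ")
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        vectors[parts[0]] = values
    return vectors


# Toy vectors, invented for this example.
toy = {"good": [0.25, -1.5, 3.0], "bad": [0.5, 0.0, -2.0]}
buffer = io.StringIO()
write_word2vec_text(toy, buffer)
buffer.seek(0)
restored = read_word2vec_text(buffer)
print(restored == toy)
```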
---