Valentin Pelloin / svd2vec
Commit 13e5878a, authored Jun 03, 2019 by Valentin Pelloin
confidence interval for similarity evaluation
parent 66ada874
Changes: 2
svd2vec/core.py
@@ -627,8 +627,8 @@ class svd2vec:
             msim = self.similarity(w1, w2)
             x.append(hsim)
             y.append(msim)
-        pearson = pearsonr(np.array(x), np.array(y))
-        return pearson
+        pearson, p_value, low, high = Utils.confidence_pearson(np.array(x), np.array(y))
+        return pearson, p_value, (low, high)

     def evaluate_word_analogies(self, analogies, section_separator=":"):
svd2vec/utils.py
@@ -2,6 +2,7 @@
 import sys
 import random
 import numpy as np
+from scipy import stats
 import cProfile
 import pstats
@@ -85,3 +86,13 @@ class Utils:
             return result
         return profiled_func

+    def confidence_pearson(x, y, alpha=0.05):
+        # thanks to https://zhiyzuo.github.io/Pearson-Correlation-CI-in-Python/
+        r, p = stats.pearsonr(x, y)
+        r_z = np.arctanh(r)
+        se = 1 / np.sqrt(x.size - 3)
+        z = stats.norm.ppf(1 - alpha / 2)
+        lo_z, hi_z = r_z - z * se, r_z + z * se
+        lo, hi = np.tanh((lo_z, hi_z))
+        return lo, hi
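
Note: as shown in this diff, the call site in svd2vec/core.py unpacks four values (pearson, p_value, low, high) from Utils.confidence_pearson, while the function in svd2vec/utils.py returns only lo, hi. The sketch below assumes the intent is to return the coefficient and p-value together with the interval. It is a standalone function rather than the Utils method, and the input validation and sample data are illustrative additions, not part of the commit.

```python
import numpy as np
from scipy import stats


def confidence_pearson(x, y, alpha=0.05):
    """Pearson r, its p-value, and a (1 - alpha) confidence interval for r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: the Fisher interval needs n > 3, so reject smaller samples.
    if x.size != y.size or x.size < 4:
        raise ValueError("need two samples of equal length with at least 4 points")
    r, p = stats.pearsonr(x, y)
    r_z = np.arctanh(r)                    # Fisher z-transform of r
    se = 1 / np.sqrt(x.size - 3)           # standard error in z-space
    z = stats.norm.ppf(1 - alpha / 2)      # two-sided normal critical value
    lo, hi = np.tanh((r_z - z * se, r_z + z * se))  # map bounds back to the r scale
    return r, p, lo, hi


# Made-up similarity scores, for illustration only.
human = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
model = np.array([1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.4])
pearson, p_value, low, high = confidence_pearson(human, model)
```

With four return values, the unpacking in core.py works as written and the caller can still return pearson, p_value, (low, high).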