Is there a metric with wildcard?
up vote
2
down vote
favorite
Is it possible to define a metric over a set of elements $e=(x,y)$ where $x,yin {*,0,1}$, $*$ being the wildcard symbol?
For simplicity, assume all words of length 2, i.e. $0*$, $11$ and $**$.
First try was to redefine Hamming distance $d$ from
the number of positions at which the corresponding symbols are different
to
the number of positions at which the corresponding symbols are contradictory
But then for example $d(0*,11)=1$ while $d(0*,**)+d(**,11)=0$, which contradicts the triangle inequality and therefore it is not a metric.
For later generalization: I would keep fixed-length words, but use a finite alphabet that is larger than just 2. The wildcard can replace any character and I want to capture somehow the notion of $e_1$ "contradicts" $e_2$.
More formally, I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$". If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
EDIT:
It was suggested to use $max_i(d(a_i,b_i))$ for the distance between two words $a,b$, where
$d(a_i,b_i)=0$ iff $a_i=b_i$
$d(a_i,b_i)=frac{1}{2}$ iff $a_ineq b_i wedge(a_i= * vee b_i=*)$
$d(a_i,b_i)=1$ iff $a_ineq b_i wedge a_ineq * wedge b_ineq *$
Following the intuition of an edit distance with intermediate wildcard, one could also say it is possible to either change a symbol directly at cost of $1$ or first change it to the wildcard for $frac{1}{2}$ and then again from wildcard to the other symbol for another $frac{1}{2}$.
Then $d(a,b)=sum_i(d(a_i,b_i))$ seems nicer, but as pointed out, we end up with the problem that i.e. the distance between non-contradictory $***$ and $111$ is larger than the distance between contradictory $111$ and $110$.
Does this mean it is impossible to combine the symbol distances by summation?
At least not entirely:
$d(a,b)=0$ iff $a=b$
$d(a,b)=1$ iff $aneq b wedge (exists i: a_ineq b_i wedge a_ineq*wedge b_ineq*)$
$d(a,b)=(frac{1}{2}-frac{1}{2n})+sum_i^n frac{1}{2n}[a_i=*oplus b_i=*]$ else
general-topology metric-spaces regular-language regular-expressions
add a comment |
up vote
2
down vote
favorite
Is it possible to define a metric over a set of elements $e=(x,y)$ where $x,yin {*,0,1}$, $*$ being the wildcard symbol?
For simplicity, assume all words of length 2, i.e. $0*$, $11$ and $**$.
First try was to redefine Hamming distance $d$ from
the number of positions at which the corresponding symbols are different
to
the number of positions at which the corresponding symbols are contradictory
But then for example $d(0*,11)=1$ while $d(0*,**)+d(**,11)=0$, which contradicts the triangle inequality and therefore it is not a metric.
For later generalization: I would keep fixed-length words, but use a finite alphabet that is larger than just 2. The wildcard can replace any character and I want to capture somehow the notion of $e_1$ "contradicts" $e_2$.
More formally, I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$". If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
EDIT:
It was suggested to use $max_i(d(a_i,b_i))$ for the distance between two words $a,b$, where
$d(a_i,b_i)=0$ iff $a_i=b_i$
$d(a_i,b_i)=frac{1}{2}$ iff $a_ineq b_i wedge(a_i= * vee b_i=*)$
$d(a_i,b_i)=1$ iff $a_ineq b_i wedge a_ineq * wedge b_ineq *$
Following the intuition of an edit distance with intermediate wildcard, one could also say it is possible to either change a symbol directly at cost of $1$ or first change it to the wildcard for $frac{1}{2}$ and then again from wildcard to the other symbol for another $frac{1}{2}$.
Then $d(a,b)=sum_i(d(a_i,b_i))$ seems nicer, but as pointed out, we end up with the problem that i.e. the distance between non-contradictory $***$ and $111$ is larger than the distance between contradictory $111$ and $110$.
Does this mean it is impossible to combine the symbol distances by summation?
At least not entirely:
$d(a,b)=0$ iff $a=b$
$d(a,b)=1$ iff $aneq b wedge (exists i: a_ineq b_i wedge a_ineq*wedge b_ineq*)$
$d(a,b)=(frac{1}{2}-frac{1}{2n})+sum_i^n frac{1}{2n}[a_i=*oplus b_i=*]$ else
general-topology metric-spaces regular-language regular-expressions
2
can you formally list the requirements that you want for your metric? it is not clear what you want to do.
– supinf
Nov 21 at 16:07
You will always have problems with the triangle inequality, because for any two words $e_1,e_2$ with $d(e_1,e_2)>0$, you have $d(e_1,e_*)+d(e_*,e_2)=0$ where $e_*$ is the word of all $*$'s.
– Rahul
Nov 21 at 16:24
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
Is it possible to define a metric over a set of elements $e=(x,y)$ where $x,yin {*,0,1}$, $*$ being the wildcard symbol?
For simplicity, assume all words of length 2, i.e. $0*$, $11$ and $**$.
First try was to redefine Hamming distance $d$ from
the number of positions at which the corresponding symbols are different
to
the number of positions at which the corresponding symbols are contradictory
But then for example $d(0*,11)=1$ while $d(0*,**)+d(**,11)=0$, which contradicts the triangle inequality and therefore it is not a metric.
For later generalization: I would keep fixed-length words, but use a finite alphabet that is larger than just 2. The wildcard can replace any character and I want to capture somehow the notion of $e_1$ "contradicts" $e_2$.
More formally, I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$". If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
EDIT:
It was suggested to use $max_i(d(a_i,b_i))$ for the distance between two words $a,b$, where
$d(a_i,b_i)=0$ iff $a_i=b_i$
$d(a_i,b_i)=frac{1}{2}$ iff $a_ineq b_i wedge(a_i= * vee b_i=*)$
$d(a_i,b_i)=1$ iff $a_ineq b_i wedge a_ineq * wedge b_ineq *$
Following the intuition of an edit distance with intermediate wildcard, one could also say it is possible to either change a symbol directly at cost of $1$ or first change it to the wildcard for $frac{1}{2}$ and then again from wildcard to the other symbol for another $frac{1}{2}$.
Then $d(a,b)=sum_i(d(a_i,b_i))$ seems nicer, but as pointed out, we end up with the problem that i.e. the distance between non-contradictory $***$ and $111$ is larger than the distance between contradictory $111$ and $110$.
Does this mean it is impossible to combine the symbol distances by summation?
At least not entirely:
$d(a,b)=0$ iff $a=b$
$d(a,b)=1$ iff $aneq b wedge (exists i: a_ineq b_i wedge a_ineq*wedge b_ineq*)$
$d(a,b)=(frac{1}{2}-frac{1}{2n})+sum_i^n frac{1}{2n}[a_i=*oplus b_i=*]$ else
general-topology metric-spaces regular-language regular-expressions
Is it possible to define a metric over a set of elements $e=(x,y)$ where $x,yin {*,0,1}$, $*$ being the wildcard symbol?
For simplicity, assume all words of length 2, i.e. $0*$, $11$ and $**$.
First try was to redefine Hamming distance $d$ from
the number of positions at which the corresponding symbols are different
to
the number of positions at which the corresponding symbols are contradictory
But then for example $d(0*,11)=1$ while $d(0*,**)+d(**,11)=0$, which contradicts the triangle inequality and therefore it is not a metric.
For later generalization: I would keep fixed-length words, but use a finite alphabet that is larger than just 2. The wildcard can replace any character and I want to capture somehow the notion of $e_1$ "contradicts" $e_2$.
More formally, I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$". If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
EDIT:
It was suggested to use $max_i(d(a_i,b_i))$ for the distance between two words $a,b$, where
$d(a_i,b_i)=0$ iff $a_i=b_i$
$d(a_i,b_i)=frac{1}{2}$ iff $a_ineq b_i wedge(a_i= * vee b_i=*)$
$d(a_i,b_i)=1$ iff $a_ineq b_i wedge a_ineq * wedge b_ineq *$
Following the intuition of an edit distance with intermediate wildcard, one could also say it is possible to either change a symbol directly at cost of $1$ or first change it to the wildcard for $frac{1}{2}$ and then again from wildcard to the other symbol for another $frac{1}{2}$.
Then $d(a,b)=sum_i(d(a_i,b_i))$ seems nicer, but as pointed out, we end up with the problem that i.e. the distance between non-contradictory $***$ and $111$ is larger than the distance between contradictory $111$ and $110$.
Does this mean it is impossible to combine the symbol distances by summation?
At least not entirely:
$d(a,b)=0$ iff $a=b$
$d(a,b)=1$ iff $aneq b wedge (exists i: a_ineq b_i wedge a_ineq*wedge b_ineq*)$
$d(a,b)=(frac{1}{2}-frac{1}{2n})+sum_i^n frac{1}{2n}[a_i=*oplus b_i=*]$ else
general-topology metric-spaces regular-language regular-expressions
general-topology metric-spaces regular-language regular-expressions
edited Nov 28 at 10:39
asked Nov 21 at 15:59
Radio Controlled
153
153
2
can you formally list the requirements that you want for your metric? it is not clear what you want to do.
– supinf
Nov 21 at 16:07
You will always have problems with the triangle inequality, because for any two words $e_1,e_2$ with $d(e_1,e_2)>0$, you have $d(e_1,e_*)+d(e_*,e_2)=0$ where $e_*$ is the word of all $*$'s.
– Rahul
Nov 21 at 16:24
add a comment |
2
can you formally list the requirements that you want for your metric? it is not clear what you want to do.
– supinf
Nov 21 at 16:07
You will always have problems with the triangle inequality, because for any two words $e_1,e_2$ with $d(e_1,e_2)>0$, you have $d(e_1,e_*)+d(e_*,e_2)=0$ where $e_*$ is the word of all $*$'s.
– Rahul
Nov 21 at 16:24
2
2
can you formally list the requirements that you want for your metric? it is not clear what you want to do.
– supinf
Nov 21 at 16:07
can you formally list the requirements that you want for your metric? it is not clear what you want to do.
– supinf
Nov 21 at 16:07
You will always have problems with the triangle inequality, because for any two words $e_1,e_2$ with $d(e_1,e_2)>0$, you have $d(e_1,e_*)+d(e_*,e_2)=0$ where $e_*$ is the word of all $*$'s.
– Rahul
Nov 21 at 16:24
You will always have problems with the triangle inequality, because for any two words $e_1,e_2$ with $d(e_1,e_2)>0$, you have $d(e_1,e_*)+d(e_*,e_2)=0$ where $e_*$ is the word of all $*$'s.
– Rahul
Nov 21 at 16:24
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
accepted
I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$"
As Rahul mentioned, this is not possible due to the triangle inequality.
If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
This however should be possible.
We can define the metric on the set ${0,1,*}$ via
$$
d(0,1)=1,d(0,*)=frac12,d(1,*)=frac12.
$$
This can even be generalized to words of fixed length $n$:
Let $w_x = x_1x_2dots x_n$ and $w_y = y_1y_2dots y_n$ be words.
Then we can define
$$
d'(w_x,w_y) = max_{i=1,dots n} d(x_i,y_i)
$$
using the metric from above.
This does satisfy the requirements, and words $w_x,w_y$ are "equal" (in the sense that you formulated) if and only if $d'(x,y)leqfrac12$.
1
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
good point! I corrected it.
– supinf
Nov 21 at 16:54
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
accepted
I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$"
As Rahul mentioned, this is not possible due to the triangle inequality.
If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
This however should be possible.
We can define the metric on the set ${0,1,*}$ via
$$
d(0,1)=1,d(0,*)=frac12,d(1,*)=frac12.
$$
This can even be generalized to words of fixed length $n$:
Let $w_x = x_1x_2dots x_n$ and $w_y = y_1y_2dots y_n$ be words.
Then we can define
$$
d'(w_x,w_y) = max_{i=1,dots n} d(x_i,y_i)
$$
using the metric from above.
This does satisfy the requirements, and words $w_x,w_y$ are "equal" (in the sense that you formulated) if and only if $d'(x,y)leqfrac12$.
1
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
good point! I corrected it.
– supinf
Nov 21 at 16:54
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
add a comment |
up vote
0
down vote
accepted
I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$"
As Rahul mentioned, this is not possible due to the triangle inequality.
If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
This however should be possible.
We can define the metric on the set ${0,1,*}$ via
$$
d(0,1)=1,d(0,*)=frac12,d(1,*)=frac12.
$$
This can even be generalized to words of fixed length $n$:
Let $w_x = x_1x_2dots x_n$ and $w_y = y_1y_2dots y_n$ be words.
Then we can define
$$
d'(w_x,w_y) = max_{i=1,dots n} d(x_i,y_i)
$$
using the metric from above.
This does satisfy the requirements, and words $w_x,w_y$ are "equal" (in the sense that you formulated) if and only if $d'(x,y)leqfrac12$.
1
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
good point! I corrected it.
– supinf
Nov 21 at 16:54
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
add a comment |
up vote
0
down vote
accepted
up vote
0
down vote
accepted
I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$"
As Rahul mentioned, this is not possible due to the triangle inequality.
If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
This however should be possible.
We can define the metric on the set ${0,1,*}$ via
$$
d(0,1)=1,d(0,*)=frac12,d(1,*)=frac12.
$$
This can even be generalized to words of fixed length $n$:
Let $w_x = x_1x_2dots x_n$ and $w_y = y_1y_2dots y_n$ be words.
Then we can define
$$
d'(w_x,w_y) = max_{i=1,dots n} d(x_i,y_i)
$$
using the metric from above.
This does satisfy the requirements, and words $w_x,w_y$ are "equal" (in the sense that you formulated) if and only if $d'(x,y)leqfrac12$.
I want to define "equality", that is $d(e_1,e_2)=0$ as "in all positions, the symbols are either equal, or at least one of them is $*$"
As Rahul mentioned, this is not possible due to the triangle inequality.
If this is not possible, perhaps at least have that the distance between any two words where "in all positions, the symbols are either equal, or at least one of them is $*$" is always lower than the distance between any two words where this is not the case.
This however should be possible.
We can define the metric on the set ${0,1,*}$ via
$$
d(0,1)=1,d(0,*)=frac12,d(1,*)=frac12.
$$
This can even be generalized to words of fixed length $n$:
Let $w_x = x_1x_2dots x_n$ and $w_y = y_1y_2dots y_n$ be words.
Then we can define
$$
d'(w_x,w_y) = max_{i=1,dots n} d(x_i,y_i)
$$
using the metric from above.
This does satisfy the requirements, and words $w_x,w_y$ are "equal" (in the sense that you formulated) if and only if $d'(x,y)leqfrac12$.
edited Nov 21 at 16:53
answered Nov 21 at 16:40
supinf
5,8481027
5,8481027
1
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
good point! I corrected it.
– supinf
Nov 21 at 16:54
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
add a comment |
1
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
good point! I corrected it.
– supinf
Nov 21 at 16:54
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
1
1
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
I'm not sure $d'$ satisfies the requirements as I understand them. Let $w_1=0000,w_2={*}{*}{*}{*},w_3=0001$. Then $w_1$ and $w_2$ are two words where "in all positions, the symbols are either equal, or at least one of them is $*$", and $w_1$ and $w_3$ are two words where this is not the case. So we want $d'(w_1,w_2) < d'(w_1,w_3)$, but that is false.
– Rahul
Nov 21 at 16:50
good point! I corrected it.
– supinf
Nov 21 at 16:54
good point! I corrected it.
– supinf
Nov 21 at 16:54
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
Thanks guys, let's see where this leads!
– Radio Controlled
Nov 22 at 11:05
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3007919%2fis-there-a-metric-with-wildcard%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
can you formally list the requirements that you want for your metric? it is not clear what you want to do.
– supinf
Nov 21 at 16:07
You will always have problems with the triangle inequality, because for any two words $e_1,e_2$ with $d(e_1,e_2)>0$, you have $d(e_1,e_*)+d(e_*,e_2)=0$ where $e_*$ is the word of all $*$'s.
– Rahul
Nov 21 at 16:24