
Computing the edit distance between sequences in TensorFlow

Updated 2018-10-12 16:38

tf.edit_distance

edit_distance(
    hypothesis,
    truth,
    normalize=True,
    name='edit_distance'
)

Defined in: tensorflow/python/ops/array_ops.py.

See the guide: Math > Sequence Comparison and Indexing

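Computes the edit distance (Levenshtein distance) between sequences.

This operation takes variable-length sequences (hypothesis and truth), each provided as a SparseTensor, and computes the edit distance between them. Setting normalize to True divides each edit distance by the length of the corresponding truth sequence.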
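The distance and its normalization can be illustrated for a single pair of sequences with a minimal pure-Python sketch. This is not the TensorFlow implementation; the names `levenshtein` and `edit_distance_pair` are hypothetical, and the handling of an empty truth sequence (inf when the hypothesis is non-empty, 0.0 when both are empty) is an assumption chosen to agree with the example output shown on this page.

```python
import math


def levenshtein(hypothesis, truth):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hypothesis` into `truth`."""
    # previous[j] is the distance between the hypothesis prefix seen so far
    # and the first j items of truth.
    previous = list(range(len(truth) + 1))
    for i, h in enumerate(hypothesis, start=1):
        current = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if h == t else 1
            current.append(min(previous[j] + 1,          # delete h
                               current[j - 1] + 1,       # insert t
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def edit_distance_pair(hypothesis, truth, normalize=True):
    """Edit distance for one pair, optionally divided by len(truth)."""
    distance = float(levenshtein(hypothesis, truth))
    if not normalize:
        return distance
    if len(truth) == 0:
        # Assumption of this sketch: an empty truth gives inf unless the
        # hypothesis is empty as well.
        return math.inf if distance > 0 else 0.0
    return distance / len(truth)
```

For instance, `edit_distance_pair(["b"], ["b", "c"])` needs one insertion against a truth of length 2, so the normalized distance is 0.5.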

For example, given the following input:

import tensorflow as tf

# 'hypothesis' is a tensor of shape `[2, 1]` with variable-length values:
#   (0,0) = ["a"]
#   (1,0) = ["b"]
hypothesis = tf.SparseTensor(
    [[0, 0, 0],
     [1, 0, 0]],
    ["a", "b"],
    (2, 1, 1))

# 'truth' is a tensor of shape `[2, 2]` with variable-length values:
#   (0,0) = []
#   (0,1) = ["a"]
#   (1,0) = ["b", "c"]
#   (1,1) = ["a"]
truth = tf.SparseTensor(
    [[0, 1, 0],
     [1, 0, 0],
     [1, 0, 1],
     [1, 1, 0]],
    ["a", "b", "c", "a"],
    (2, 2, 2))

normalize = True

This operation would return the following:

# 'output' is a tensor of shape `[2, 2]` with edit distances normalized
# by 'truth' lengths.
output ==> [[inf, 1.0],  # (0,0): no truth, (0,1): no hypothesis
           [0.5, 1.0]]  # (1,0): addition, (1,1): no hypothesis
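To see where these four numbers come from, the following pure-Python sketch reproduces the example without TensorFlow. It is an illustration only: the helpers `sequences_from_sparse`, `levenshtein` and `sparse_edit_distance` are hypothetical names, the sketch handles rank-3 inputs only, and it assumes the output grid is the element-wise maximum of the leading dimensions of the two dense shapes, which matches the `[2, 2]` output above.

```python
import math
from collections import defaultdict


def sequences_from_sparse(indices, values):
    """Group sparse values into sequences keyed by all but the last index."""
    sequences = defaultdict(list)
    for index, value in sorted(zip(indices, values), key=lambda pair: pair[0]):
        sequences[tuple(index[:-1])].append(value)
    return sequences


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    row = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        new_row = [i]
        for j, y in enumerate(b, start=1):
            new_row.append(min(row[j] + 1, new_row[j - 1] + 1,
                               row[j - 1] + (x != y)))
        row = new_row
    return row[-1]


def sparse_edit_distance(hyp, hyp_shape, truth, truth_shape, normalize=True):
    """`hyp` and `truth` are (indices, values) pairs of rank-3 sparse data."""
    hyp_seqs = sequences_from_sparse(*hyp)
    truth_seqs = sequences_from_sparse(*truth)
    # Assumption: output dims are the max of the inputs' leading dims.
    rows = max(hyp_shape[0], truth_shape[0])
    cols = max(hyp_shape[1], truth_shape[1])
    output = []
    for r in range(rows):
        output_row = []
        for c in range(cols):
            h = hyp_seqs.get((r, c), [])    # missing entry = empty sequence
            t = truth_seqs.get((r, c), [])
            d = float(levenshtein(h, t))
            if normalize:
                # Assumption: empty truth with non-empty hypothesis gives inf.
                d = d / len(t) if t else (math.inf if d > 0 else 0.0)
            output_row.append(d)
        output.append(output_row)
    return output


hypothesis = ([[0, 0, 0], [1, 0, 0]], ["a", "b"])
truth = ([[0, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]], ["a", "b", "c", "a"])
output = sparse_edit_distance(hypothesis, (2, 1, 1), truth, (2, 2, 2))
print(output)
```

Entry (1,0) compares ["b"] with ["b", "c"]: one insertion divided by a truth length of 2 gives 0.5. Entries (0,1) and (1,1) have no hypothesis, so the single truth item must be inserted, giving 1.0.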

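Args:

  • hypothesis: A SparseTensor containing the hypothesis sequences.
  • truth: A SparseTensor containing the truth sequences.
  • normalize: A bool. If True, normalizes the edit distance by the length of truth.
  • name: A name for the operation (optional).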

Returns:

A dense Tensor with rank R - 1, where R is the rank of the SparseTensor inputs hypothesis and truth.

Raises:

  • TypeError: If either hypothesis or truth is not a SparseTensor.