
Creating a BigQueryReader in TensorFlow

Updated 2018-08-30 10:58

tf.contrib.cloud.BigQueryReader

Class tf.contrib.cloud.BigQueryReader

Defined in tensorflow/contrib/cloud/python/ops/bigquery_reader_ops.py.

A reader that outputs keys and tf.Example values from a BigQuery table.

Example use:

import tensorflow as tf
from tensorflow.contrib.cloud.python.ops import bigquery_reader_ops

# Assume a BigQuery table has the following schema:
#     name      STRING,
#     age       INT,
#     state     STRING
# Create the parse_example dict of features.
# tf.FixedLenFeature accepts only tf.float32, tf.int64 and tf.string,
# so the integer column is declared as tf.int64.
features = dict(
    name=tf.FixedLenFeature([1], tf.string),
    age=tf.FixedLenFeature([1], tf.int64),
    state=tf.FixedLenFeature([1], dtype=tf.string, default_value="UNK"))
# Create a Reader. PROJECT, DATASET, TABLE, TIME and NUM_PARTITIONS are
# placeholders that the caller must define.
reader = bigquery_reader_ops.BigQueryReader(project_id=PROJECT,
                                            dataset_id=DATASET,
                                            table_id=TABLE,
                                            timestamp_millis=TIME,
                                            num_partitions=NUM_PARTITIONS,
                                            features=features)
# Populate a queue with the BigQuery table partitions.
queue = tf.train.string_input_producer(reader.partitions())
# Read and parse examples.
row_id, examples_serialized = reader.read(queue)
examples = tf.parse_example(examples_serialized, features=features)
# Process the Tensors examples["name"], examples["age"], etc...

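Note that a snapshot timestamp is required to create the reader. It lets the reader see a consistent snapshot of the table. For more information, see "Table Decorators" in the BigQuery documentation.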
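The snapshot timestamp has to be an absolute time in milliseconds, since relative (zero or negative) values are not allowed. A minimal sketch of one way to compute such a value with the Python standard library is shown here. The helper name snapshot_timestamp_millis and its seconds_ago parameter are hypothetical and are not part of TensorFlow.

```python
import time


def snapshot_timestamp_millis(seconds_ago=0.0):
    """Return an absolute snapshot time in milliseconds since the Unix epoch.

    Hypothetical helper: `seconds_ago` shifts the snapshot into the past.
    """
    if seconds_ago < 0:
        raise ValueError("seconds_ago must be non-negative")
    millis = int((time.time() - seconds_ago) * 1000)
    if millis <= 0:
        # Relative (zero or negative) snapshot times are rejected by the reader.
        raise ValueError("snapshot time must be a positive absolute timestamp")
    return millis


# A value suitable for the timestamp_millis argument of the reader.
TIME = snapshot_timestamp_millis()
```

The resulting TIME can be passed as timestamp_millis when constructing the reader, as in the usage example above.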

See ReaderBase for the supported methods.

Properties

reader_ref

The op that implements the reader.

supports_serialize

Whether the Reader implementation can serialize its state.

Methods

__init__

__init__(
    project_id,
    dataset_id,
    table_id,
    timestamp_millis,
    num_partitions,
    features=None,
    columns=None,
    test_end_point=None,
    name=None
)

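Creates a BigQueryReader.

Args:

project_id: GCP project ID.
dataset_id: BigQuery dataset ID.
table_id: BigQuery table ID.
timestamp_millis: Timestamp, in milliseconds since the epoch, at which to snapshot the table. Relative (negative or zero) snapshot times are not allowed. For details, see "Table Decorators" in the BigQuery documentation.
num_partitions: Number of non-overlapping partitions to read from.
features: A parse_example-compatible dict mapping keys to VarLenFeature and FixedLenFeature objects. The keys are read as columns from the table.
columns: List of columns to read. Can be set only when features is None.
test_end_point: Used only for testing purposes (optional).
name: A name for the operation (optional).

Raises:

TypeError: If features is neither None nor a dict, if columns is neither None nor a list, or if features and columns are both None or both set.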
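The three TypeError conditions can be read as a small validation routine. The following is a sketch of that logic in plain Python, written from the description above and not copied from the TensorFlow source. The function name check_reader_args is hypothetical.

```python
def check_reader_args(features=None, columns=None):
    """Validate features/columns and return the list of columns to read.

    Hypothetical helper that mirrors the documented TypeError conditions.
    """
    if features is not None and not isinstance(features, dict):
        raise TypeError("features must be None or a dict")
    if columns is not None and not isinstance(columns, list):
        raise TypeError("columns must be None or a list")
    # Exactly one of the two arguments has to be provided.
    if (features is None) == (columns is None):
        raise TypeError("exactly one of features and columns must be set")
    if features is not None:
        # The feature keys name the columns to read.
        return list(features.keys())
    return columns


selected = check_reader_args(columns=["name", "age"])
```

Passing both arguments, or neither, raises TypeError in this sketch, which matches the last condition listed above.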

num_records_produced

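num_records_produced(name=None)

Returns the number of records this reader has produced.

This is the same as the number of read executions that have succeeded.

Args:

name: A name for the operation (optional).

Returns:

An int64 Tensor.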

num_work_units_completed

num_work_units_completed(name=None)

Returns the number of work units this reader has finished processing.

Args:

name: A name for the operation (optional).

Returns:

An int64 Tensor.

partitions

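partitions(name=None)

Returns serialized BigQueryTablePartition messages.

These messages represent non-overlapping partitions of the table for bulk reads.

Args:

name: A name for the operation (optional).

Returns:

A 1-D string Tensor of serialized BigQueryTablePartition messages.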
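To make "non-overlapping partitions" concrete, the sketch below splits a row range into contiguous half-open ranges that together cover every row exactly once. It only illustrates the idea: how the reader actually chooses partition boundaries is not described here, and the function name split_row_range is hypothetical.

```python
def split_row_range(num_rows, num_partitions):
    """Split [0, num_rows) into num_partitions contiguous half-open ranges."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    base, extra = divmod(num_rows, num_partitions)
    ranges = []
    start = 0
    for index in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = split_row_range(10, 3)
```

Each range starts where the previous one ends, so no row is read twice and none is skipped.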

read

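read(
    queue,
    name=None
)

Returns the next record (key, value pair) produced by the reader.

Dequeues a work unit from the queue if necessary (for example, when the reader needs to start reading from a new file because it has finished the previous one).

Args:

queue: A Queue, or a mutable string Tensor representing a handle to a Queue, with string work items.
name: A name for the operation (optional).

Returns:

A tuple of Tensors (key, value). key: a scalar string Tensor. value: a scalar string Tensor.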

read_up_to

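read_up_to(
    queue,
    num_records,
    name=None
)

Returns up to num_records (key, value) pairs produced by the reader.

Dequeues a work unit from the queue if necessary (for example, when the reader needs to start reading from a new file because it has finished the previous one). It may return fewer than num_records even before the last batch.

Args:

queue: A Queue, or a mutable string Tensor representing a handle to a Queue, with string work items.
num_records: Number of records to read.
name: A name for the operation (optional).

Returns:

A tuple of Tensors (key, value). key: a 1-D string Tensor. value: a 1-D string Tensor.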
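Because a call may return fewer records than requested, code that needs a fixed number of records has to accumulate across calls. The sketch below shows that accumulation logic in plain Python. make_fake_reader is a hypothetical stand-in for evaluating the read_up_to op in a session, and it signals the end of input with an empty batch, which is an assumption of this sketch and not the behavior of the real reader.

```python
def collect_records(read_up_to_fn, total):
    """Call read_up_to_fn until `total` records are gathered or input ends."""
    keys, values = [], []
    while len(keys) < total:
        batch_keys, batch_values = read_up_to_fn(total - len(keys))
        if not batch_keys:
            # Assumption of this sketch: an empty batch means no more input.
            break
        keys.extend(batch_keys)
        values.extend(batch_values)
    return keys, values


def make_fake_reader(rows, max_batch):
    """Build a stand-in that returns at most max_batch rows per call."""
    position = [0]

    def read_up_to_fn(num_records):
        start = position[0]
        end = min(start + min(num_records, max_batch), len(rows))
        position[0] = end
        batch = rows[start:end]
        return [k for k, _ in batch], [v for _, v in batch]

    return read_up_to_fn


rows = [("row-%d" % i, "value-%d" % i) for i in range(7)]
keys, values = collect_records(make_fake_reader(rows, max_batch=3), total=5)
```

Here the fake reader never returns more than three rows per call, so two calls are needed to gather the five requested records.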

reset

reset(name=None)

Restores the reader to its initial clean state.

Args:

name: A name for the operation (optional).

Returns:

The created Operation.

restore_state

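restore_state(
    state,
    name=None
)

Restores the reader to a previously saved state.

Not all readers support being restored, so this can produce an Unimplemented error.

Args:

state: A string Tensor. The result of a SerializeState call on a reader of matching type.
name: A name for the operation (optional).

Returns:

The created Operation.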

serialize_state

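serialize_state(name=None)

Produces a string Tensor that encodes the state of the reader.

Not all readers support being serialized, so this can produce an Unimplemented error.

Args:

name: A name for the operation (optional).

Returns:

A string Tensor.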
