Introducing TensorFlow Probability - TensorFlow Blog

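April 11, 2018

Posted by Josh Dillon (Software Engineer), Mike Shwe (Product Manager), and Dustin Tran (Research Scientist), on behalf of the TensorFlow Probability team

At the 2018 TensorFlow Developer Summit, we announced TensorFlow Probability: a probabilistic programming toolbox for machine learning researchers and practitioners to quickly and reliably build sophisticated models that leverage state-of-the-art hardware. You should use TensorFlow Probability if: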

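You want to build a generative model of data and reason about its hidden processes.
You need to quantify the uncertainty in your predictions, as opposed to predicting a single value.
Your training set has a large number of features relative to the number of data points.
Your data is structured (for example, with groups, space, graphs, or language semantics) and you would like to capture this structure with prior information.
You have an inverse problem. See this TFDS'18 talk on reconstructing fusion plasmas from measurements.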

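TensorFlow Probability gives you the tools to solve these problems. In addition, it inherits the strengths of TensorFlow, such as automatic differentiation and the ability to scale performance across a variety of platforms (CPUs, GPUs, and TPUs).

What's in TensorFlow Probability?

Our stack of probabilistic ML tools provides modular abstractions for probabilistic reasoning and statistical analysis in the TensorFlow ecosystem.

An overview of TensorFlow Probability. The probabilistic programming toolbox provides benefits for users ranging from data scientists and statisticians to all TensorFlow users.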

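Layer 0: TensorFlow

Numerical operations. In particular, the LinearOperator class enables matrix-free implementations that can exploit special structure (diagonal, low-rank, etc.) for efficient computation. It is built and maintained by the TensorFlow Probability team and is now part of tf.linalg in core TF.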
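
To make "matrix-free" concrete, here is a minimal plain-Python sketch of a diagonal operator. It is not the tf.linalg implementation: the class name `DiagOperator` and its internals are hypothetical and only illustrate why exploiting diagonal structure turns O(n^2) or O(n^3) dense operations into O(n) ones.

```python
import math


class DiagOperator:
    """Hypothetical matrix-free stand-in for an n x n diagonal matrix."""

    def __init__(self, diag):
        # Only the n diagonal entries are stored, never the n x n matrix.
        self.diag = list(diag)

    def matvec(self, x):
        # O(n) product instead of the O(n^2) dense matrix-vector product.
        return [d * xi for d, xi in zip(self.diag, x)]

    def solve(self, rhs):
        # Solving D x = rhs is elementwise division; no factorization needed.
        return [r / d for d, r in zip(self.diag, rhs)]

    def log_abs_determinant(self):
        # log|det D| is the sum of log|d_i| over the diagonal.
        return sum(math.log(abs(d)) for d in self.diag)


op = DiagOperator([1.0, 2.0, 4.0])
product = op.matvec([1.0, 1.0, 1.0])
solution = op.solve(product)
log_det = op.log_abs_determinant()
```

The same idea extends to other structures (low-rank updates, block-diagonal matrices), where each operation is written against the compact representation rather than a materialized matrix.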

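Layer 1: Statistical Building Blocks

Distributions (tf.contrib.distributions, tf.distributions): A large collection of probability distributions and related statistics with batch and broadcasting semantics.
Bijectors (tf.contrib.distributions.bijectors): Reversible and composable transformations of random variables. Bijectors provide a rich class of transformed distributions, from classical examples like the log-normal distribution to sophisticated deep learning models such as masked autoregressive flows.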

(See the TensorFlow Distributions whitepaper for more information.)
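
The way a bijector turns one distribution into another is the change-of-variables formula: log p_Y(y) = log p_X(g^{-1}(y)) + log |d g^{-1}(y) / dy|. The following plain-Python sketch applies it to the log-normal example mentioned above. The class `ExpBijector` and the helper names are hypothetical and chosen for illustration; they are not the TensorFlow classes.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of a Normal(loc, scale) distribution at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


class ExpBijector:
    """Hypothetical minimal bijector for the map y = exp(x)."""

    def forward(self, x):
        return math.exp(x)

    def inverse(self, y):
        return math.log(y)

    def inverse_log_det_jacobian(self, y):
        # The inverse is x = log(y), so dx/dy = 1/y and log|dx/dy| = -log(y).
        return -math.log(y)


def transformed_log_prob(y, base_log_prob, bijector):
    """Change of variables: base density at the pre-image plus a Jacobian term."""
    x = bijector.inverse(y)
    return base_log_prob(x) + bijector.inverse_log_det_jacobian(y)


def log_normal_log_prob(y):
    """Closed-form log density of a standard log-normal, used as a check."""
    return -math.log(y) - 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(y) ** 2


y = 2.5
via_bijector = transformed_log_prob(y, normal_log_prob, ExpBijector())
closed_form = log_normal_log_prob(y)
```

Both values agree up to floating-point rounding, which is the property that lets bijectors be composed: each one only has to supply its inverse and its log-Jacobian term.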

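Layer 2: Model Building

Edward2 (tfp.edward2): A probabilistic programming language for specifying flexible probabilistic models as programs.
Probabilistic Layers (tfp.layers): Neural network layers with uncertainty over the functions they represent, extending TensorFlow Layers.
Trainable Distributions (tfp.trainable_distributions): Probability distributions parameterized by a single Tensor, making it easy to build neural nets that output probability distributions.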

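Layer 3: Probabilistic Inference

Markov chain Monte Carlo (tfp.mcmc): Algorithms for approximating integrals via sampling. Includes Hamiltonian Monte Carlo, random-walk Metropolis-Hastings, and the ability to build custom transition kernels.
Variational Inference (tfp.vi): Algorithms for approximating integrals via optimization.
Optimizers (tfp.optimizer): Stochastic optimization methods, extending TensorFlow Optimizers. Includes Stochastic Gradient Langevin Dynamics.
Monte Carlo (tfp.monte_carlo): Tools for computing Monte Carlo expectations.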
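
As a rough illustration of what "approximating integrals via sampling" means, here is a plain-Python sketch of random-walk Metropolis-Hastings followed by a Monte Carlo expectation. It does not use tfp.mcmc; the function names, the target density, the step size, and the burn-in length are all assumptions made for the example.

```python
import math
import random


def random_walk_metropolis(target_log_prob, initial_state, num_steps,
                           step_size, rng):
    """Draws a chain whose stationary density is proportional to exp(target_log_prob)."""
    state = initial_state
    state_log_prob = target_log_prob(state)
    chain = []
    for _ in range(num_steps):
        # Symmetric Gaussian proposal centered at the current state.
        proposal = state + rng.gauss(0.0, step_size)
        proposal_log_prob = target_log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(state)).
        accept_prob = math.exp(min(0.0, proposal_log_prob - state_log_prob))
        if rng.random() < accept_prob:
            state, state_log_prob = proposal, proposal_log_prob
        chain.append(state)
    return chain


def unnormalized_standard_normal_log_prob(x):
    # The normalizing constant is not needed by Metropolis-Hastings.
    return -0.5 * x * x


rng = random.Random(42)
chain = random_walk_metropolis(unnormalized_standard_normal_log_prob,
                               initial_state=0.0, num_steps=20000,
                               step_size=2.0, rng=rng)
kept = chain[1000:]  # Discard an (arbitrary) burn-in period.

# Monte Carlo expectations: averages over the chain approximate integrals.
mean_estimate = sum(kept) / len(kept)
second_moment_estimate = sum(x * x for x in kept) / len(kept)
```

For a standard normal target, the two estimates should land near 0 and 1 respectively. Hamiltonian Monte Carlo follows the same accept/reject pattern but uses gradients of the target log-density to make better proposals.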

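Layer 4: Pre-made Models + Inference (analogous to TensorFlow's pre-made Estimators)

Bayesian structural time series (coming soon): A high-level interface for fitting time-series models (i.e., similar to R's BSTS package).
Generalized Linear Mixed Models (coming soon): A high-level interface for fitting mixed-effects regression models (i.e., similar to R's lme4 package).

The TensorFlow Probability team is committed to supporting users and contributors with cutting-edge features, continuous code updates, and bug fixes. We will continue to add end-to-end examples and tutorials.

Let's see some examples!

Linear Mixed Effects Models with Edward2

A linear mixed effects model is a simple approach for modeling structured relationships in data. Also known as a hierarchical linear model, it shares statistical strength across groups of data points in order to improve inferences about any individual data point.

As a demonstration, consider the InstEval data set from the popular lme4 package in R, which consists of university courses and their evaluation ratings. Using TensorFlow Probability, we specify the model as an Edward2 probabilistic program (tfp.edward2), which extends Edward. The program below expresses the model in terms of its generative process.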

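import tensorflow as tf
from tensorflow_probability import edward2 as ed

# `num_students` and `num_instructors` are assumed to be defined elsewhere
# (the number of distinct students and instructors in the data set).
def model(features):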
  # Set up fixed effects and other parameters.
  intercept = tf.get_variable("intercept", [])
  service_effects = tf.get_variable("service_effects", [])
  student_stddev_unconstrained = tf.get_variable(
      "student_stddev_pre", [])
  instructor_stddev_unconstrained = tf.get_variable(
      "instructor_stddev_pre", [])
  # Set up random effects.
  student_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_students),
      scale_identity_multiplier=tf.exp(
          student_stddev_unconstrained),
      name="student_effects")
  instructor_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_instructors),
      scale_identity_multiplier=tf.exp(
          instructor_stddev_unconstrained),
      name="instructor_effects")
  # Set up likelihood given fixed and random effects.
  ratings = ed.Normal(
      loc=(service_effects * features["service"] +
           tf.gather(student_effects, features["students"]) +
           tf.gather(instructor_effects, features["instructors"]) +
           intercept),
      scale=1.,
      name="ratings")
  return ratings

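The model takes as input a features dictionary of "service", "students", and "instructors"; each is a vector whose elements describe an individual course. The model regresses on these inputs, posits latent random variables, and returns a distribution over the courses' evaluation ratings. Running this output in a TensorFlow session returns a generated sample of the ratings.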
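
The generative process described here can be sketched without TensorFlow as follows. This is only an illustration: the function name `sample_ratings`, the default parameter values, and the toy features are all made up, whereas in the Edward2 program the fixed effects and standard deviations are trainable variables.

```python
import random


def sample_ratings(features, num_students, num_instructors, rng,
                   intercept=3.0, service_effect=-0.1,
                   student_stddev=0.5, instructor_stddev=0.5):
    """Samples one set of course ratings from a linear mixed effects model."""
    # Random effects: one latent offset per student and per instructor.
    student_effects = [rng.gauss(0.0, student_stddev)
                       for _ in range(num_students)]
    instructor_effects = [rng.gauss(0.0, instructor_stddev)
                          for _ in range(num_instructors)]
    ratings = []
    for service, student, instructor in zip(features["service"],
                                            features["students"],
                                            features["instructors"]):
        # Likelihood mean combines fixed effects with the gathered random effects.
        loc = (service_effect * service +
               student_effects[student] +
               instructor_effects[instructor] +
               intercept)
        ratings.append(rng.gauss(loc, 1.0))
    return ratings


# Hypothetical toy features: four courses, three students, two instructors.
features = {
    "service": [0, 1, 0, 1],
    "students": [0, 1, 2, 0],
    "instructors": [0, 0, 1, 1],
}
ratings = sample_ratings(features, num_students=3, num_instructors=2,
                         rng=random.Random(0))
```

Courses taught by the same instructor or rated by the same student share a latent offset, which is how the model pools information across groups.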

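Check out the "Linear Mixed Effects Models" tutorial for details on how to train the model using the tfp.mcmc.HamiltonianMonteCarlo algorithm, and how to explore and interpret the model using posterior predictions.

Gaussian Copulas with TFP Bijectors

A copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform. To build a copula using TFP intrinsics, one can use Bijectors and TransformedDistribution. These abstractions make it easy to create complex distributions, for example: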

import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.distributions.bijectors
# Example: Log-Normal Distribution
log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.Exp())
# Example: Kumaraswamy Distribution
Kumaraswamy = tfd.TransformedDistribution(
    distribution=tfd.Uniform(low=0., high=1.),
    bijector=tfb.Kumaraswamy(
        concentration1=2.,
        concentration0=2.))
# Example: Masked Autoregressive Flow
# https://arxiv.org/abs/1705.07057
shift_and_log_scale_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[512, 512],
    event_shape=[28*28])
maf = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=shift_and_log_scale_fn))

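The "Gaussian Copula" tutorial creates a few custom Bijectors and then shows how to easily build several different copulas. For more background on distributions, see "Understanding TensorFlow Distributions Shapes." It describes how to manage shapes for sampling, batch training, and modeling events.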
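
To illustrate the definition given at the start of this section, here is a plain-Python sketch of sampling from a two-dimensional Gaussian copula: draw correlated standard normals, then push each coordinate through the standard normal CDF so that each marginal becomes uniform on [0, 1]. It does not reproduce the tutorial's custom Bijectors; the function names and the correlation value 0.8 are assumptions made for the example.

```python
import math
import random


def standard_normal_cdf(x):
    """CDF of the standard normal, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_gaussian_copula(correlation, rng):
    """Draws one (u1, u2) pair from a 2-D Gaussian copula."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Cholesky factor of [[1, rho], [rho, 1]] applied to independent normals.
    x1 = z1
    x2 = correlation * z1 + math.sqrt(1.0 - correlation ** 2) * z2
    # The normal CDF maps each marginal to Uniform(0, 1) but keeps the dependence.
    return standard_normal_cdf(x1), standard_normal_cdf(x2)


rng = random.Random(7)
samples = [sample_gaussian_copula(0.8, rng) for _ in range(20000)]

mean_u1 = sum(u1 for u1, _ in samples) / len(samples)
mean_u2 = sum(u2 for _, u2 in samples) / len(samples)
covariance = sum((u1 - mean_u1) * (u2 - mean_u2)
                 for u1, u2 in samples) / len(samples)
```

Each marginal mean should be close to 0.5, as expected for a uniform variable, while the positive covariance shows that the dependence structure of the underlying Gaussian survives the transformation.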

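Variational Autoencoders with TFP Utilities

A variational autoencoder is a machine learning model that uses one learned system to represent data in some low-dimensional space and a second learned system to restore the low-dimensional representation to what would otherwise have been the input. Because TF supports automatic differentiation, black-box variational inference is a breeze! For example: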

import tensorflow as tf
import tensorflow_probability as tfp
# Assumes the user supplies the observed data `x` and the `likelihood`,
# `prior`, and `surrogate_posterior` functions, and that each function
# returns a tf.distributions.Distribution-like object.
elbo_loss = tfp.vi.monte_carlo_csiszar_f_divergence(
    f=tfp.vi.kl_reverse,  # Equivalent to "Evidence Lower BOund"
    p_log_prob=lambda z: likelihood(z).log_prob(x) + prior().log_prob(z),
    q=surrogate_posterior(x),
    num_draws=1)
train = tf.train.AdamOptimizer(
    learning_rate=0.01).minimize(elbo_loss)
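
To show what is being estimated, here is a plain-Python sketch of a Monte Carlo ELBO estimate for a toy conjugate model (prior z ~ Normal(0, 1), likelihood x ~ Normal(z, 1)), where the exact posterior and evidence are known in closed form. The model, the function names, and the observed value are assumptions made for the example; the loss in the snippet above is understood to correspond to the negative of this quantity.

```python
import math
import random


def normal_log_prob(x, loc, scale):
    """Log density of a Normal(loc, scale) distribution at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def elbo_estimate(x, q_loc, q_scale, num_draws, rng):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)]."""
    total = 0.0
    for _ in range(num_draws):
        z = rng.gauss(q_loc, q_scale)  # Draw from the surrogate posterior.
        log_joint = normal_log_prob(x, z, 1.0) + normal_log_prob(z, 0.0, 1.0)
        total += log_joint - normal_log_prob(z, q_loc, q_scale)
    return total / num_draws


x = 1.2  # Hypothetical observation.
# For this model the marginal is x ~ Normal(0, sqrt(2)).
log_evidence = normal_log_prob(x, 0.0, math.sqrt(2.0))

rng = random.Random(0)
# Surrogate equal to the exact posterior Normal(x / 2, sqrt(1 / 2)).
exact_elbo = elbo_estimate(x, q_loc=x / 2.0, q_scale=math.sqrt(0.5),
                           num_draws=1, rng=rng)
# A mismatched surrogate gives a strictly looser bound on average.
loose_elbo = elbo_estimate(x, q_loc=0.0, q_scale=1.0,
                           num_draws=5000, rng=rng)
```

When the surrogate matches the true posterior, every single draw recovers the log evidence exactly; any other surrogate falls short by the KL divergence to the posterior, which is why maximizing the ELBO pulls the surrogate toward the posterior.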

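Bayesian Neural Networks with TFP Probabilistic Layers

A Bayesian neural network is a neural network with a prior distribution over its weights and biases. Through these priors, it provides improved uncertainty estimates for its predictions. A Bayesian neural network can also be interpreted as an infinite ensemble of neural networks: the probability assigned to each neural network configuration is given by the prior.

As a demonstration, consider the CIFAR-10 data set, which has features (images of shape 32 x 32 x 3) and labels (values from 0 to 9). To fit the neural network, we use variational inference, a suite of methods for approximating the neural network's posterior distribution over weights and biases. Specifically, we use the recently published Flipout estimator in the TensorFlow Probability Layers module (tfp.layers).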

import tensorflow as tf
import tensorflow_probability as tfp
model = tf.keras.Sequential([
    tf.keras.layers.Reshape([32, 32, 3]),
    tfp.layers.Convolution2DFlipout(
        64, kernel_size=5, padding='SAME', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=[2, 2],
                                 strides=[2, 2],
                                 padding='SAME'),
    tf.keras.layers.Reshape([16 * 16 * 64]),
    tfp.layers.DenseFlipout(10)
])
logits = model(features)
neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
kl = sum(model.get_losses_for(inputs=None))
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)

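The model object composes neural network layers on an input tensor, and it performs stochastic forward passes through the probabilistic convolutional layer and the probabilistic densely connected layer. The function returns an output tensor whose shape is given by the batch size and 10 values. Each row of this tensor represents the logits (unconstrained probability values) that the corresponding data point belongs to each of the 10 classes.

For training, we build the loss function, which comprises two terms: the expected negative log-likelihood and the KL divergence. We approximate the expected negative log-likelihood via Monte Carlo. The KL divergence is added via regularizer terms, which are arguments to the layers.

tfp.layers can also be used with eager execution using the tf.keras.Model class.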
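
The two-term loss can be sketched in plain Python for a single data point. This is not what tfp.layers does internally: the logits stand in for one stochastic forward pass, and the Gaussian posterior and standard normal prior over two weights are hypothetical values chosen so the KL term has a closed form.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    max_logit = max(logits)  # Subtract the max for numerical stability.
    log_normalizer = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits))
    return log_normalizer - logits[label]


def normal_kl(q_loc, q_scale, p_loc, p_scale):
    """KL(Normal(q_loc, q_scale) || Normal(p_loc, p_scale)) in closed form."""
    return (math.log(p_scale / q_scale) +
            (q_scale ** 2 + (q_loc - p_loc) ** 2) / (2.0 * p_scale ** 2) -
            0.5)


# Hypothetical logits from one stochastic forward pass, and the true class.
logits = [2.0, 0.5, -1.0]
label = 0
# Hypothetical (loc, scale) of the approximate posterior for two weights.
posterior = [(0.3, 0.8), (-0.1, 1.0)]

neg_log_likelihood = softmax_cross_entropy(logits, label)
# KL from each weight's posterior to a standard normal prior, summed.
kl = sum(normal_kl(loc, scale, 0.0, 1.0) for loc, scale in posterior)
loss = neg_log_likelihood + kl
```

The first term rewards fitting the labels, while the second penalizes posteriors that drift far from the prior; a single forward pass gives a one-sample Monte Carlo estimate of the expected negative log-likelihood.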

class MNISTModel(tf.keras.Model):
  def __init__(self):
    super(MNISTModel, self).__init__()
    self.dense1 = tfp.layers.DenseFlipout(units=10)
    self.dense2 = tfp.layers.DenseFlipout(units=10)
  def call(self, input):
    """Run the model."""
    result = self.dense1(input)
    result = self.dense2(result)
    # reuse variables from dense2 layer
    result = self.dense2(result)  
    return result
model = MNISTModel()

Getting started

To get started with probabilistic machine learning in TensorFlow, just run:

pip install --user --upgrade tfp-nightly

For all the code and details, check out github.com/tensorflow/probability. Whether you are a user or a contributor, we look forward to collaborating with you via GitHub!
