Linear regression with TensorFlow

I'm trying to understand linear regression... Here is the script I'm trying to understand:

'''
A linear regression learning algorithm example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.0001
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])

n_samples = train_X.shape[0]


# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)


# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

My question is what this part represents:

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

Why are there random floating-point numbers?

Also, could you show me the mathematical formulas that formalize the cost, the prediction, and the optimizer variables?

Variables allow us to add trainable parameters to a graph. They are constructed with a type and initial value:

W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b

Variables of type tf.Variable are the parameters that we will learn with TensorFlow. Assume you use gradient descent to minimize the loss function; you then need to initialize these parameters somehow, and rng.randn() is used to generate random values for exactly that purpose.
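
For instance, in this convex setting you could just as well initialize them with constants. A small sketch (not part of the original script; nothing else in the code would need to change):

W = tf.Variable(0.0, name="weight")   # fixed starting value instead of rng.randn()
b = tf.Variable(0.0, name="bias")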

I think Getting Started With TensorFlow is a good starting point for you.

Let me first define the variables:

W is a weight vector in R^d (the same dimensionality as X)
b is a scalar value (the bias)
Y is also a scalar value, i.e. the value at X

pred = W (dot) X + b   # dot here refers to the dot product

# cost equals the mean squared error
cost = ((pred - Y)^2) / (2*num_samples)

# finally the optimizer
# the optimizer computes the gradient of the cost with respect to each variable and applies the update

W -= learning_rate * (pred - Y)/num_samples * X
b -= learning_rate * (pred - Y)/num_samples
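
To make those formulas concrete, here is a minimal NumPy-only sketch (my own, not part of the original script) that applies exactly these updates to the training data from the question; after enough epochs, W and b approach the same fit that the TensorFlow script finds:

import numpy as np

train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n = train_X.shape[0]

W, b = 0.0, 0.0            # any starting point works, the cost is convex
learning_rate = 0.01

for epoch in range(1000):
    pred = W * train_X + b                              # pred = W*X + b
    err = pred - train_Y
    cost = (err ** 2).sum() / (2 * n)                   # same cost as above
    W -= learning_rate * (err * train_X).sum() / n      # d(cost)/dW
    b -= learning_rate * err.sum() / n                  # d(cost)/db

print("W =", W, "b =", b, "final cost =", cost)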

As for why W and b are set to random values: they get updated with the gradient of the error computed from the cost, so W and b could be initialized to any values. This is not linear regression by least squares, although both converge to the same solution.
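
To check that claim, here is a tiny sketch (assuming the train_X/train_Y arrays from the question) that fits the same data by ordinary least squares, so you can compare both results:

import numpy as np
# degree-1 polynomial fit = ordinary least squares; returns [slope, intercept]
slope, intercept = np.polyfit(train_X, train_Y, 1)
print("least-squares W =", slope, "b =", intercept)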

Have a look here for more information: Getting Started

Let's try to put together some intuition and sources with the tf approach.

General intuition:

Regression as presented here is a supervised learning problem. In it, as defined in Russell & Norvig's Artificial Intelligence, the task is to:

given a training set (X, y) of m input-output pairs (x1, y1), (x2, y2), ... , (xm, ym), where each output was generated by an unknown function y = f(x), discover a function h that approximates the true function f

For that, the h hypothesis function somehow combines every x with the to-be-learned parameters, in order to produce outputs that are as close as possible to the corresponding y, and this for the whole dataset. The hope is that the resulting function will be close to f.

But how do we learn those parameters? In order to be able to learn, the model has to be able to evaluate itself. Here is where the cost (also called loss, energy, merit...) function comes into play: it is a metric function that compares the output of h with the corresponding y, and penalizes big differences.

Now it should be clear what the "learning" process here actually is: altering the parameters in order to achieve a lower value for the cost function.

Linear regression:

The example you posted performs parametric linear regression, optimized with gradient descent using the mean squared error as the cost function. Which means:

  • Parametric: the set of parameters is fixed. They are held in exactly the same memory placeholders throughout the learning process.

  • Linear: the output of h is simply a linear (actually, affine) combination of the input x and your parameters. So if x and w are real-valued vectors of the same dimensionality and b is a real number, it holds that h(x, w, b) = w.transposed()*x + b. Page 107 of the Deep Learning Book has more quality insights and intuition on this.

  • Cost function: now this is the interesting part. The mean squared error is a convex function. This means it has a single, global optimum, and furthermore it can be found directly with a set of normal equations (also explained in the DLB). In your example, the stochastic (and/or minibatch) gradient descent method is used instead: this is the preferred method when optimizing non-convex cost functions (which is the case in more advanced models like neural networks), or when your dataset has a huge dimensionality (also explained in the DLB). The corresponding formulas are sketched right after this list.

  • Gradient descent: tf deals with this for you, so it is enough to say that GD minimizes the cost function by following its derivative "downwards", in small steps, until it reaches a saddle point. If you really need to know, the exact technique applied by TF is called automatic differentiation, a kind of compromise between the numeric and the symbolic methods. For convex functions like yours this point will be the global optimum, and (if your learning rate is not too big) it will always converge to it, so it doesn't matter which values you initialize your Variables with. Random initialization is necessary in more complex architectures like neural networks. There is some extra code regarding the management of the minibatches, but I won't get into that because it is not the main focus of your question.
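
Since you asked for the formulas explicitly, here they are in compact form (a sketch; m is the number of training samples, α the learning rate, and X in the last line has a leading column of ones for the bias):

\hat{y} = h(x; w, b) = w^{\top} x + b

J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h(x_i; w, b) - y_i \bigr)^2

w \leftarrow w - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl( h(x_i; w, b) - y_i \bigr) x_i
\qquad
b \leftarrow b - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl( h(x_i; w, b) - y_i \bigr)

\theta^{*} = (X^{\top} X + \lambda I)^{-1} X^{\top} y \quad \text{(normal equations, with an optional L2 term } \lambda\text{)}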

The TensorFlow approach:

Deep learning frameworks nowadays work by nesting lots of functions into computational graphs (you may want to take a look at the presentation on DL frameworks that I did some weeks ago). For constructing and running the graph, TensorFlow follows a declarative style, which means that the graph has to be first completely defined and compiled before it is deployed and executed. It is highly recommended to read this short wiki article about it, if you haven't yet. In this setting, the work is split in two parts (a tiny sketch of this style follows the list):

  1. First, you define your computational Graph, where you put your dataset and parameters in memory placeholders, define the hypothesis and cost functions based on them, and tell tf which optimization technique to apply.

  2. Then you run the computation in a Session, and the library will be able to (re)load the data placeholders and perform the optimization.
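
Here is about the smallest possible sketch of that two-phase style (my own example, assuming TF 1.x), just to show the separation between defining and running:

import tensorflow as tf

# phase 1: define the graph -- nothing is computed yet
a = tf.placeholder(tf.float32)
doubled = a * 2.0

# phase 2: run it in a Session, feeding the placeholder
with tf.Session() as sess:
    print(sess.run(doubled, feed_dict={a: 3.0}))   # prints 6.0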

The code:

The example code follows this approach closely:

  1. Define the training data X and the labels Y, and prepare placeholders for them in the Graph (fed in through the feed_dict part).

  2. Define the 'W' and 'b' Variables for the parameters. They have to be Variables because they will be updated during the Session.

  3. Define pred (our hypothesis) and cost as explained before.


By this point the rest of the code should be fairly clear. Regarding the optimizer, as I said, tf already knows how to take care of it, but you may want to look into gradient descent for more details (again, the DLB is a good reference).

Cheers! Andres


Code example: gradient descent vs. normal equations

This small snippet generates a simple multi-dimensional dataset and tests both approaches. Notice that the normal equations approach doesn't require looping, and gives better results. For small dimensionality (DIMENSIONS < 30k) it is probably the preferred approach:

from __future__ import absolute_import, division, print_function
import numpy as np
import tensorflow as tf

####################################################################################################
### GLOBALS
####################################################################################################
DIMENSIONS = 5
f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0,10) # some noise

####################################################################################################
### GRADIENT DESCENT APPROACH
####################################################################################################
# dataset globals
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
ALPHA = 1e-8 # learning rate
LAMBDA = 0.5 # L2 regularization factor
TRAINING_STEPS = 1000

# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)] # synthesize data
# ds = normalize_data(ds)
ds = [(x, [f(x)+noise()]) for x in ds] # add labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])

# define the computational graph
graph = tf.Graph()
with graph.as_default():
  # declare graph inputs
  x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS))
  y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
  x_test = tf.placeholder(tf.float32, shape=(_test_size, DIMENSIONS))
  y_test = tf.placeholder(tf.float32, shape=(_test_size, 1))
  theta = tf.Variable([[0.0] for _ in range(DIMENSIONS)])
  theta_0 = tf.Variable([[0.0]]) # don't forget the bias term!
  # forward propagation
  train_prediction = tf.matmul(x_train, theta)+theta_0
  test_prediction  = tf.matmul(x_test, theta) +theta_0
  # cost function and optimizer
  train_cost = (tf.nn.l2_loss(train_prediction - y_train)+LAMBDA*tf.nn.l2_loss(theta))/float(_train_size)
  optimizer = tf.train.GradientDescentOptimizer(ALPHA).minimize(train_cost)
  # test results
  test_cost = (tf.nn.l2_loss(test_prediction - y_test)+LAMBDA*tf.nn.l2_loss(theta))/float(_test_size)

# run the computation
with tf.Session(graph=graph) as s:
  tf.global_variables_initializer().run()
  print("initialized"); print(theta.eval())
  for step in range(TRAINING_STEPS):
    _, train_c, test_c = s.run([optimizer, train_cost, test_cost],
                               feed_dict={x_train: train_data, y_train: train_labels,
                                          x_test: test_data, y_test: test_labels })
    if (step%100==0):
      # it should return bias close to zero and parameters all close to 1 (see definition of f)
      print("\nAfter", step, "iterations:")
      #print("   Bias =", theta_0.eval(), ", Weights = ", theta.eval())
      print("   train cost =", train_c); print("   test cost =", test_c)
  PARAMETERS_GRADDESC = tf.concat([theta_0, theta], 0).eval()
  print("Solution for parameters:\n", PARAMETERS_GRADDESC)

####################################################################################################
### NORMAL EQUATIONS APPROACH
####################################################################################################
# dataset globals
DIMENSIONS = 5
DS_SIZE = 5000
TRAIN_RATIO = 0.6 # 60% of the dataset is used for training
_train_size = int(DS_SIZE*TRAIN_RATIO)
_test_size = DS_SIZE - _train_size
f = lambda x: sum(x) # the "true" function: f = 0 + 1*x1 + 1*x2 + 1*x3 ...
noise = lambda: np.random.normal(0,10) # some noise
# training globals
LAMBDA = 1e6 # L2 regularization factor

# generate the dataset, the labels and split into train/test
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
np.random.shuffle(ds)
train_data, train_labels = zip(*ds[0:_train_size])
test_data, test_labels = zip(*ds[_train_size:])

# define the computational graph
graph = tf.Graph()
with graph.as_default():
  # declare graph inputs
  x_train = tf.placeholder(tf.float32, shape=(_train_size, DIMENSIONS+1))
  y_train = tf.placeholder(tf.float32, shape=(_train_size, 1))
  theta = tf.Variable([[0.0] for _ in range(DIMENSIONS+1)]) # implicit bias!
  # optimum
  optimum = tf.matrix_solve_ls(x_train, y_train, LAMBDA, fast=True)

# run the computation: no loop needed!
with tf.Session(graph=graph) as s:
  tf.global_variables_initializer().run()
  print("initialized")
  opt = s.run(optimum, feed_dict={x_train:train_data, y_train:train_labels})
  PARAMETERS_NORMEQ = opt
  print("Solution for parameters:\n",PARAMETERS_NORMEQ)

####################################################################################################
### PREDICTION AND ERROR RATE
####################################################################################################

# generate test dataset
ds = [[np.random.rand()*1000 for d in range(DIMENSIONS)] for _ in range(DS_SIZE)]
ds = [([1]+x, [f(x)+noise()]) for x in ds] # add x[0]=1 dimension and labels
test_data, test_labels = zip(*ds)
# define hypothesis
h_gd = lambda x: PARAMETERS_GRADDESC.T.dot(x)
h_ne = lambda x: PARAMETERS_NORMEQ.T.dot(x)
# define cost
mse = lambda pred, lab: ((pred-np.array(lab))**2).sum()/DS_SIZE
# make predictions!
predictions_gd = np.array([h_gd(x) for x in test_data])
predictions_ne = np.array([h_ne(x) for x in test_data])
# calculate and print total error
cost_gd = mse(predictions_gd, test_labels)
cost_ne = mse(predictions_ne, test_labels)
print("total cost with gradient descent:", cost_gd)
print("total cost with normal equations:", cost_ne)