Calculating losses and computing gradients for multiple layers at once in TensorFlow with tf.GradientTape()

If my understanding of layers is correct, a layer uses tf.Variable as its weight variable. So if a Dense() layer has 3 units, does that mean it uses something like w = tf.Variable([0.2, 5, 0.9]) for a single instance, and if input_shape is 2, the variable would look like w = tf.Variable([[0.2, 5, 0.9], [2, 3, 0.4]])? Please correct me if I am wrong.

I am going through the very basic fundamentals of TensorFlow and came across some code, which I modified into the following:
import tensorflow as tf

weight = tf.Variable([3.2])


def get_lost_loss(w):
    '''
    A very hypothetical function, hence the name
    '''
    return (w**1.3)/3.1 # just felt like doing it


def calculate_gradient(w):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w) # calculate loss WITHIN tf.GradientTape()
        
    grad = tape.gradient(loss,w) # gradient of loss wrt. w
    
    return grad


# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

losses = []

for i in range(50):
    grad = calculate_gradient(weight)
    opt.apply_gradients(zip([grad],[weight]))
    
    losses.append(get_lost_loss(weight))

Can someone give me an intuition of what is happening inside tf.GradientTape()? Also, the main thing I want to ask: if I had to do this for weight1 and weight2, whose shapes are [2, 3], instead of weight, how should the code be modified?

Please feel free to make any assumptions. You are far more skilled at this than I am.

Yes, you are right. Layers have two variables. The one you mentioned is called the kernel. The other is called the bias. The example below explains it in detail:

import tensorflow as tf
w=tf.Variable([[3.2,5,6,7,5]],dtype=tf.float32)

d=tf.keras.layers.Dense(3,input_shape=(5,)) # Layer d gets inputs with shape (*,5) and generates outputs with shape (*,3)
                                            # It has kernel variable with shape (5,3) and bias variable with shape (3)
print("Output of applying d on w:", d(w))
print("\nLayer d trainable variables:\n", d.trainable_weights)

The output will be something like:

Output of applying d on w: tf.Tensor([[ -0.9845681 -10.321521    7.506028 ]], shape=(1, 3), dtype=float32)



Layer d trainable variables:
 [<tf.Variable 'dense_18/kernel:0' shape=(5, 3) dtype=float32, numpy=
array([[-0.8144073 , -0.8408185 , -0.2504158 ],
       [ 0.6073988 ,  0.09965736, -0.32579994],
       [ 0.04219657, -0.33530533,  0.71029276],
       [ 0.33406   , -0.673926  ,  0.77048916],
       [-0.8014116 , -0.27997494,  0.05623555]], dtype=float32)>, <tf.Variable 'dense_18/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
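To make the shapes concrete: with no activation, the Dense layer above simply computes output = input @ kernel + bias, i.e. a (1, 5) input times a (5, 3) kernel plus a (3,) bias gives a (1, 3) output. Here is a small sketch (not part of the original answer) that reproduces the layer output by hand:

import tensorflow as tf

w = tf.Variable([[3.2, 5, 6, 7, 5]], dtype=tf.float32)   # shape (1, 5)
d = tf.keras.layers.Dense(3, input_shape=(5,))
layer_out = d(w)                                          # calling d builds it, creating kernel and bias

kernel, bias = d.trainable_weights                        # shapes (5, 3) and (3,)
manual_out = tf.matmul(w, kernel) + bias                  # (1, 5) @ (5, 3) + (3,) -> (1, 3)
print(tf.reduce_all(tf.abs(manual_out - layer_out) < 1e-6).numpy())  # True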

tf.GradientTape() records the operations performed on trainable weights (variables) inside its context, for automatic differentiation. That is why we can later ask the tape for the gradients of the loss with respect to those variables.
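For intuition, here is a tiny sketch (mine, not from the original post) that checks the tape against a derivative you can do by hand: for loss = w**2 the gradient is 2*w.

import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w ** 2                  # every op on the variable inside this block is recorded
grad = tape.gradient(loss, w)      # d(w**2)/dw = 2*w
print(grad.numpy())                # 6.0, i.e. 2 * 3.0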

Suppose you have two weight variables, weight1 and weight2. First you need to change your loss function so that it uses both variables (see the code below). Then, in each step, you get the derivatives of the loss function with respect to the variables and update them to minimize the loss, as in the code below.

import tensorflow as tf

weight1 = tf.Variable([[3.2,5,6],[2,5,4]],dtype=tf.float32) #modified
weight2 = tf.Variable([[1,2,3],[1,4,3]],dtype=tf.float32)   #modified

def get_lost_loss(w1, w2): #modified
    '''
    A very hypothetical function, hence the name
    '''
    return tf.reduce_sum(tf.math.add(w1**1.2/2,w2**2))  # just felt like doing it


def calculate_gradient(w1,w2):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w1,w2) # calculate loss WITHIN tf.GradientTape()
        
    dw1,dw2 = tape.gradient(loss,[w1,w2]) # gradient of loss wrt. w1,w2
    
    return dw1,dw2


# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

losses = []

for i in range(500):
    grad_weight1, grad_weight2 = calculate_gradient(weight1,weight2)
    opt.apply_gradients(zip([grad_weight1, grad_weight2],[weight1,weight2]))
    
    losses.append(get_lost_loss(weight1,weight2))
    print("loss: "+str(get_lost_loss(weight1,weight2).numpy()))