Calculating losses and computing gradients for multiple layers at once in TensorFlow with tf.GradientTape()
If my understanding of layers is correct, a Layer uses tf.Variable as its weight variable. So if a Dense() layer has 3 units, that means it uses something like w = tf.Variable([0.2,5,0.9]), and for a single instance with input_shape of 2, the variable would look like w = tf.Variable([[0.2,5,0.9],[2,3,0.4]])?

Please correct me if I'm wrong.

I'm learning the very basic fundamentals of tensorflow and came across some code, which I modified into the following:
import tensorflow as tf

weight = tf.Variable([3.2])

def get_lost_loss(w):
    '''
    A very hypothetical function since the name
    '''
    return (w**1.3)/3.1  # just felt like doing it

def calculate_gradient(w):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w)  # calculate loss WITHIN tf.GradientTape()
    grad = tape.gradient(loss, w)  # gradient of loss wrt. w
    return grad

# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)
losses = []
for i in range(50):
    grad = calculate_gradient(weight)
    opt.apply_gradients(zip([grad], [weight]))
    losses.append(get_lost_loss(weight))
Could someone give me an intuition of what happens inside tf.GradientTape()? Also, what I most want to ask: if I had to do this for weight1 and weight2, whose shapes are [2,3], instead of weight, how should the code be modified?

Please make any assumptions you need. You are all far more skilled at this than I am.
Yes, you are right. Layers have two variables. The one you mentioned is called the kernel. The other one is called the bias. The example below explains it in detail:
import tensorflow as tf

w = tf.Variable([[3.2, 5, 6, 7, 5]], dtype=tf.float32)

# Layer d gets inputs with shape (*, 5) and generates outputs with shape (*, 3).
# It has a kernel variable with shape (5, 3) and a bias variable with shape (3,).
d = tf.keras.layers.Dense(3, input_shape=(5,))

print("Output of applying d on w:", d(w))
print("\nLayer d trainable variables:\n", d.trainable_weights)
The output will look similar to this:
Output of applying d on w: tf.Tensor([[ -0.9845681 -10.321521 7.506028 ]], shape=(1, 3), dtype=float32)
Layer d trainable variables:
[<tf.Variable 'dense_18/kernel:0' shape=(5, 3) dtype=float32, numpy=
array([[-0.8144073 , -0.8408185 , -0.2504158 ],
[ 0.6073988 , 0.09965736, -0.32579994],
[ 0.04219657, -0.33530533, 0.71029276],
[ 0.33406 , -0.673926 , 0.77048916],
[-0.8014116 , -0.27997494, 0.05623555]], dtype=float32)>, <tf.Variable 'dense_18/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
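To connect this back to your mental model of the weights: since no activation was given, the Dense layer's output is simply the input matrix multiplied by the kernel, plus the bias. A quick check (a sketch; the exact numbers depend on the random kernel initialization on your run):

kernel, bias = d.trainable_weights                 # kernel: (5, 3), bias: (3,)
manual = tf.matmul(w, kernel) + bias               # (1, 5) @ (5, 3) + (3,) -> (1, 3)
print(tf.reduce_all(tf.abs(manual - d(w)) < 1e-6).numpy())  # True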
tf.GradientTape() is used to record the operations performed on trainable weights (variables) inside its context, for automatic differentiation. That way we can later obtain the derivatives of the loss with respect to those variables.
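As a minimal illustration of that recording idea (a toy example, not your loss function): every operation applied to the variable inside the with block is written onto the tape, and tape.gradient then walks those operations backwards to produce the derivative.

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2                     # recorded on the tape
dy_dx = tape.gradient(y, x)      # dy/dx = 2*x
print(dy_dx.numpy())             # 6.0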
Suppose you have two weight variables, weight1 and weight2. First you need to change your loss function so that it uses both variables (see the code below). Then, at each step, you need to get the derivatives of the loss with respect to the variables and update them to optimize the loss. Please see the code below:
import tensorflow as tf

weight1 = tf.Variable([[3.2, 5, 6], [2, 5, 4]], dtype=tf.float32)  # modified
weight2 = tf.Variable([[1, 2, 3], [1, 4, 3]], dtype=tf.float32)    # modified

def get_lost_loss(w1, w2):  # modified
    '''
    A very hypothetical function since the name
    '''
    return tf.reduce_sum(tf.math.add(w1**1.2/2, w2**2))  # just felt like doing it

def calculate_gradient(w1, w2):
    with tf.GradientTape() as tape:
        loss = get_lost_loss(w1, w2)  # calculate loss WITHIN tf.GradientTape()
    dw1, dw2 = tape.gradient(loss, [w1, w2])  # gradient of loss wrt. w1, w2
    return dw1, dw2

# train and apply the things here
opt = tf.keras.optimizers.Adam(learning_rate=0.01)
losses = []
for i in range(500):
    grad_weight1, grad_weight2 = calculate_gradient(weight1, weight2)
    opt.apply_gradients(zip([grad_weight1, grad_weight2], [weight1, weight2]))
    losses.append(get_lost_loss(weight1, weight2))
    print("loss: " + str(get_lost_loss(weight1, weight2).numpy()))