Accuracy of 1.0 while training loss and validation loss are still decreasing
I built an LSTM RNN to predict whether someone is driving based on GPS coordinates.
Here is a sample of the data (note: x, y, z are 3D coordinates converted from lat, lon):
                               x         y         z  trip_id  mode_cat  weekday  period_of_day
datetime            id
2011-08-27 06:13:01 20  0.650429  0.043524  0.758319        1         1        1              0
2011-08-27 06:13:02 20  0.650418  0.043487  0.758330        1         1        1              0
2011-08-27 06:13:03 20  0.650421  0.043490  0.758328        1         1        1              0
2011-08-27 06:13:04 20  0.650427  0.043506  0.758322        1         1        1              0
2011-08-27 06:13:05 20  0.650438  0.043516  0.758312        1         1        1              0
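For readers unfamiliar with the lat/lon to x, y, z step, here is a minimal sketch of one common conversion, which maps each point onto the unit sphere. The post does not show the formula that was actually used, so the function name `latlon_to_xyz` and the axis convention are assumptions. The sample rows above do have a norm very close to 1, which is consistent with a conversion of this kind.

```python
import math

def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere.

    Assumed convention (not confirmed by the post):
    x points to lat=0, lon=0 and z points to the north pole.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z

# Hypothetical coordinates, only to show the call
x, y, z = latlon_to_xyz(39.9, 116.4)
print(x, y, z)
print(x * x + y * y + z * z)  # squared norm, approximately 1
```

A representation like this avoids the discontinuity of longitude at +/-180 degrees, but consecutive points in one trip differ only in the last few decimals, as the sample rows show.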
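When I train the network, training_loss and validation_loss both decrease, but accuracy reaches 1.0 in the first epoch. I made sure my training and test data are different.
Here is how I split the training and test data: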
t_num_test = df["trip_id"].iloc[-1]*4//5
train_test_df = df.loc[df["trip_id"]<=t_num_test].copy(deep=True)
test_test_df = df.loc[df["trip_id"]>t_num_test].copy(deep=True)
features_train = train_test_df[["x","y","z","datetime","id","trip_id","mode_cat","weekday","period_of_day"]]
features_train.set_index(["datetime","id"],inplace=True)
dataset_train_x = features_train[["x","y","z","trip_id","weekday","period_of_day"]].values
dataset_train_y = features_train[["mode_cat"]].values
features_test = test_test_df[["x","y","z","datetime","id","trip_id","mode_cat","weekday","period_of_day"]]
features_test.set_index(["datetime","id"],inplace=True)
dataset_test_x = features_test[["x","y","z","trip_id","weekday","period_of_day"]].values
dataset_test_y = features_test[["mode_cat"]].values
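To make the effect of this split concrete, here is a small sketch in plain Python that applies the same 4/5 threshold on trip_id to toy data and counts the labels on each side. The toy rows and the helper names `split_by_trip` and `label_counts` are hypothetical. The sketch also takes the maximum trip_id, whereas the code above takes the trip_id of the last row; the two agree only if the frame is sorted by trip_id.

```python
from collections import Counter

def split_by_trip(rows):
    """Split (trip_id, mode_cat) rows at 4/5 of the largest trip_id."""
    threshold = max(trip for trip, _ in rows) * 4 // 5
    train = [r for r in rows if r[0] <= threshold]
    test = [r for r in rows if r[0] > threshold]
    return threshold, train, test

def label_counts(rows):
    """Count how many rows carry each mode_cat label."""
    return Counter(mode for _, mode in rows)

# Hypothetical toy data: 10 trips with 3 points each;
# trips 5 and 10 are labelled 0, all other trips are labelled 1.
rows = [(trip, 1 if trip % 5 else 0) for trip in range(1, 11) for _ in range(3)]

threshold, train, test = split_by_trip(rows)
print("threshold:", threshold)
print("train labels:", label_counts(train))
print("test labels:", label_counts(test))
```

Because whole trips go to one side or the other, no trip is shared between the two sets, but the label mix can differ between them, so it is worth printing both counts on the real data.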
Here is how I build the network:
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(1,
                                           input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dropout(0.2))
single_step_model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
single_step_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                          loss='binary_crossentropy',
                          metrics=['accuracy'])
.
.
.
single_step_history = single_step_model.fit(train_data_single, epochs=epochs,
                                            steps_per_epoch=evaluation_interval,
                                            validation_data=test_data_single,
                                            validation_steps=60)
And here is the graph displaying training_loss, validation_loss and accuracy
What could cause this result?
In case it matters, I am using about 500,000 data points and about 8,000 unique trip_id values.
Please advise.
Edit: # of Driving/Not Driving (Mode_cat: 1/0)
Hope this helps!
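There are a few cases I can think of:
Your dataset may be imbalanced. Is most of the data skewed toward one class? Check the percentage of each mode_cat value (are they all 1, or mostly 1?).
One of your X features/columns may determine y, i.e. y is a function of the X values (for example y_val = m * x_col2 + x_col3?).
Accuracy is a reasonable metric to start with, but also try something like F1 score or a confusion matrix.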
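The three checks above can be sketched in plain Python as follows. All data here is made up for illustration, and the helper names (`majority_baseline_accuracy`, `confusion_counts`, `f1_score`, `feature_determines_label`) are hypothetical, not taken from any library. The idea is to compare the model against an "always predict the majority class" baseline, to look at the confusion matrix and F1 for the minority class, and to test whether a single input column already fixes the label.

```python
from collections import Counter

def majority_baseline_accuracy(y_true):
    """Accuracy of a model that always predicts the most frequent label."""
    counts = Counter(y_true)
    return counts.most_common(1)[0][1] / len(y_true)

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """F1 = 2*tp / (2*tp + fp + fn); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def feature_determines_label(feature_values, labels):
    """True if every distinct feature value maps to exactly one label."""
    seen = {}
    for value, label in zip(feature_values, labels):
        if seen.setdefault(value, label) != label:
            return False
    return True

# Hypothetical labels: 95% driving (1), 5% not driving (0),
# and a model that predicts "driving" for every sample.
y_true = [1] * 95 + [0] * 5
y_pred = [1] * 100

print("majority baseline:", majority_baseline_accuracy(y_true))
tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=0)
print("confusion for class 0 (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("F1 for class 0:", f1_score(tp, fp, fn))

# Hypothetical check of an identifier-like column against the label
trip_ids = [1, 1, 2, 2, 3, 3]
trip_labels = [1, 1, 0, 0, 1, 1]
print("trip_id determines label:", feature_determines_label(trip_ids, trip_labels))
```

On a pandas DataFrame, `df["mode_cat"].value_counts(normalize=True)` gives the same class shares directly. If an input column such as trip_id turns out to determine mode_cat in the real data (an assumption to verify, not something the post confirms), the network can learn the label from that column alone.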