random_state and shuffle together
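I'm a bit confused about using random_state and shuffle together. I want to split my data without shuffling. It seems that when I set shuffle to False, the value I choose for random_state doesn't matter: I get the same output (the split is identical for random_state 42, 2, 7, 17, and so on). Why?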
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)
But when shuffle is True, different random_state values give me different outputs (splits), which makes sense.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
If you set shuffle to False, train_test_split simply reads your data in its original order, so the random_state parameter is ignored entirely.
Example:
from sklearn.model_selection import train_test_split

X = [k for k in range(0, 50)]  # list of the numbers 0 to 49
y = X  # just for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)
print(X_train)  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
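To double-check that random_state really has no effect here, the following sketch repeats the unshuffled split with the seeds mentioned in the question and compares the results. It assumes scikit-learn is installed; the variable names all_same and is_plain_slice are only illustrative.

```python
from sklearn.model_selection import train_test_split

X = list(range(50))
y = X  # just for testing

# Repeat the unshuffled split with several different seeds.
splits = [
    train_test_split(X, y, test_size=0.25, random_state=seed, shuffle=False)
    for seed in (2, 7, 17, 42)
]

# Compare every result against the first one.
all_same = all(split == splits[0] for split in splits)

# Compare the split against a plain cut: first 37 samples vs. last 13.
X_train, X_test, y_train, y_test = splits[0]
is_plain_slice = X_train == X[:37] and X_test == X[37:]

print(all_same, is_plain_slice)
```

Both checks should come out True: without shuffling, the split is just a cut at a fixed position, so there is nothing for the seed to influence.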
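Once you set shuffle to True, random_state is used as the seed for the random number generator. As a result, your dataset is split randomly into a training set and a test set, and the same seed reproduces the same split every time.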
Example with random_state=42:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)
print(X_train)  # prints [8, 3, 6, 41, 46, 47, 15, 9, 16, 24, 34, 31, 0, 44, 27, 33, 5, 29, 11, 36, 1, 21, 2, 43, 35, 23, 40, 10, 22, 18, 49, 20, 7, 42, 14, 28, 38]
Example with random_state=44:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=44, shuffle=True)
print(X_train)  # prints [13, 11, 2, 12, 34, 41, 30, 16, 39, 28, 24, 8, 18, 9, 4, 10, 0, 19, 21, 29, 14, 1, 48, 38, 7, 43, 25, 22, 23, 42, 46, 49, 32, 3, 45, 35, 20]
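If it helps to see why the seed only matters when shuffling, here is a rough pure-Python sketch of the idea. It is not scikit-learn's actual implementation: sketch_split is a hypothetical helper I made up for illustration, it uses Python's own random module, and so its shuffled order will not match the outputs above.

```python
import math
import random


def sketch_split(data, test_size=0.25, seed=None, shuffle=True):
    """Hypothetical helper that mimics the idea; not scikit-learn's code."""
    n_test = math.ceil(len(data) * test_size)
    n_train = len(data) - n_test
    indices = list(range(len(data)))
    if shuffle:
        # The seed is only ever consulted on this branch.
        random.Random(seed).shuffle(indices)
    train = [data[i] for i in indices[:n_train]]
    test = [data[i] for i in indices[n_train:]]
    return train, test


data = list(range(50))

# No shuffling: the seed is never used, so both calls cut at the same place.
ordered_a = sketch_split(data, seed=42, shuffle=False)
ordered_b = sketch_split(data, seed=2, shuffle=False)

# Shuffling: the same seed gives the same permutation on every call,
# while a different seed gives a different one.
shuffled_a = sketch_split(data, seed=42, shuffle=True)
shuffled_b = sketch_split(data, seed=42, shuffle=True)
shuffled_c = sketch_split(data, seed=44, shuffle=True)

print(ordered_a == ordered_b)
print(shuffled_a == shuffled_b)
print(shuffled_a == shuffled_c)
```

The first two comparisons hold by construction. The third is expected to be False, since two different seeds producing the same ordering of 50 items is vanishingly unlikely.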