How to get the Cartesian product of OrdinalEncoder categories in scikit-learn?
Code:
import itertools
import pandas as pd
from sklearn import preprocessing
data = pd.read_csv("data.csv")
feature_cols = ['country', 'city', 'pay_type']
X = data[feature_cols]
y = data.label
oe = preprocessing.OrdinalEncoder()
X = oe.fit_transform(X)
print(oe.categories_)
for element in itertools.product(oe.categories_):
    print(element)
And the output:
[array(['Saudi Arabia'], dtype=object), array(['Dammam', 'Jeddah', 'Madinah', 'Makkah', 'Riyadh', 'Taif'],
dtype=object), array(['COD', 'PREPAID'], dtype=object)]
(array(['Saudi Arabia'], dtype=object),)
(array(['Dammam', 'Jeddah', 'Madinah', 'Makkah', 'Riyadh', 'Taif'],
dtype=object),)
(array(['COD', 'PREPAID'], dtype=object),)
From the output, there are 3 features, with the following possible values:
Country: 'Saudi Arabia'
City: 'Dammam', 'Jeddah', 'Madinah', 'Makkah', 'Riyadh', 'Taif'
Pay type: 'COD', 'PREPAID'
I want to get the Cartesian product of all feature values, i.e.:
('Saudi Arabia', 'Dammam', 'COD')
('Saudi Arabia', 'Dammam', 'PREPAID')
('Saudi Arabia', 'Jeddah', 'COD')
...
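I tried itertools.product, but it outputs only 3 elements instead of the Cartesian product.
Can anyone give me a hint on how to get the Cartesian product of these feature values?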
You are close. You need to do the following:
...
for element in itertools.product(*oe.categories_):
    print(element)
Basically, you have to unpack oe.categories_ with the * operator so that each of its arrays is passed to itertools.product as a separate iterable.
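To make the difference concrete, here is a minimal sketch that compares the two calls. It assumes a hard-coded list named `categories` as a stand-in for `oe.categories_` (the real attribute holds NumPy arrays, and the values below are simply copied from the output in the question), so it runs without the CSV file or scikit-learn.

```python
import itertools

# Hypothetical stand-in for oe.categories_: plain lists holding the
# values printed in the question (the real attribute holds NumPy arrays).
categories = [
    ['Saudi Arabia'],
    ['Dammam', 'Jeddah', 'Madinah', 'Makkah', 'Riyadh', 'Taif'],
    ['COD', 'PREPAID'],
]

# Without unpacking, product() receives a single iterable (the outer list),
# so it yields one 1-tuple per feature.
not_unpacked = list(itertools.product(categories))

# With unpacking, product() receives one iterable per feature and yields
# every (country, city, pay_type) combination.
combos = list(itertools.product(*categories))

print(len(not_unpacked))  # 3
print(len(combos))        # 12, i.e. 1 * 6 * 2
for combo in combos[:3]:
    print(combo)
```

The number of combinations is the product of the per-feature category counts, so it can grow quickly when features have many distinct values; iterating over itertools.product lazily instead of wrapping it in list() avoids holding them all in memory.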