UCI dataset: How to extract features and convert the data into a usable format after reading it in Python
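I want to apply some ML algorithms to the dataset at https://archive.ics.uci.edu/ml/datasets/University.
I noticed that the data is unstructured. Ideally, I want the features as columns and the observations as rows, so I need help parsing this dataset.
Any help would be appreciated. Thanks.
Here's what I tried: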
import pandas as pd

column_names = ["University-name",
                "State",
                "location",
                "Control",
                "number-of-students",
                "male:female (ratio)",
                "student:faculty (ratio)",
                "sat-verbal",
                "sat-math",
                "expenses",
                "percent-financial-aid",
                "number-of-applicants",
                "percent-admittance",
                "percent-enrolled",
                "academics",
                "social",
                "quality-of-life",
                "academic-emphasis"]
data_list = []
data = ['https://archive.ics.uci.edu/ml/machine-learning-databases/university/university.data',
        'https://archive.ics.uci.edu/ml/machine-learning-databases/university/university.data',
        ...]
# Read each file as CSV, using the names above as column headers
for file in data:
    df = pd.read_csv(file, names=column_names)
    data_list.append(df)
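Part of why `read_csv` can't handle this file is that it treats each physical line as one row, while here a single observation spans many lines. A minimal sketch that makes this visible by splitting raw text into one chunk per record. It assumes, as the regex in the answer below also does, that every record starts with `(def-instance`; the helper `split_records` and the two-record `sample` string are made up for illustration and are not taken from university.data.

```python
import re


def split_records(text: str) -> list[str]:
    """Hypothetical helper: split raw text into one chunk per record.

    Assumes each record begins with "(def-instance ".
    """
    # Zero-width lookahead split (Python 3.7+) keeps "(def-instance" at the start of each chunk
    chunks = re.split(r"(?=\(def-instance )", text)
    # Drop anything before the first record (e.g. an empty leading chunk)
    return [c for c in chunks if c.startswith("(def-instance")]


# Made-up sample in the assumed layout, NOT real data from the UCI file
sample = """(def-instance Univ-A
   (state state-x)
   (control private))
(def-instance Univ-B
   (state state-y))
"""

records = split_records(sample)
for rec in records:
    lines = rec.splitlines()
    # One record covers several physical lines, so a line-per-row reader can't map it to one row
    print(f"{len(lines)} lines -> {lines[0]}")
# prints:
# 3 lines -> (def-instance Univ-A
# 2 lines -> (def-instance Univ-B
```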
The data isn't structured in a way that pandas can parse directly. Try something like this:
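import re

import pandas as pd
import requests

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/university/university.data"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
temp = response.text

# Keys are field names as they appear right after "(" in the file
fdic = {'def-instance': [], 'state': []}
for col in fdic:
    # Capture the text after "(<col> " up to the next newline or ")"
    fdic[col].extend(re.findall(rf'\({re.escape(col)} ([^\n)]*)', temp))

# pd.DataFrame needs every list to have the same length,
# i.e. each field must occur exactly once per record
df = pd.DataFrame(fdic)
print(df)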
Output:
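The snippet above extracts two fields into separate lists, which breaks as soon as a field is missing from, or repeated in, some record. A minimal sketch of an alternative that builds one row per record, walking the text line by line. It assumes (not verified against the real file) that each record opens with a `(def-instance NAME` line and that every other attribute sits on its own line as `(field value...)`; the function name `parse_records`, the `;` separator for repeated fields, and the `sample` string are all hypothetical.

```python
import re

import pandas as pd

# Assumed line shape: "(field value..." ; the value runs up to the next "(" or ")"
FIELD_RE = re.compile(r"^\(([^\s()]+)\s+([^()]*)")


def parse_records(text: str) -> pd.DataFrame:
    """Hypothetical parser: one row per "(def-instance ..." record, one column per field."""
    rows = []
    current = None
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything not shaped like "(field value"
        field, value = m.group(1), m.group(2).strip()
        if field == "def-instance":
            # A new record starts: open a fresh row keyed by the record name
            current = {"def-instance": value}
            rows.append(current)
        elif current is not None:
            # Repeated fields (e.g. several academic-emphasis lines) are joined with ";"
            current[field] = f"{current[field]};{value}" if field in current else value
    # Fields absent from a record become NaN in that row
    return pd.DataFrame(rows)


# Made-up sample in the assumed layout, NOT real data from the UCI file
sample = """(def-instance Univ-A
   (state state-x)
   (academic-emphasis math)
   (academic-emphasis biology))
(def-instance Univ-B
   (state state-y))
"""

df = parse_records(sample)
print(df)
```

With the real file you would pass in the downloaded text instead, e.g. `parse_records(requests.get(url).text)`. Because the field names in university.data are not checked here, inspect `df.columns` before relying on them. All values come back as strings; numeric-looking columns can be converted with `pd.to_numeric(df[col], errors="coerce")` before feeding them to a model.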