How can I add headers to the following data that I am pulling from the machine learning database?

Question

This is the data I pulled from the web:

import requests
r=requests.get('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data')
print(r.text[0:200])

This is what gets printed:

39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-man
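The response body is plain CSV text: one record per line, with fields separated by a comma and a space. A minimal sketch of splitting it into fields, assuming that layout; `sample` is a hard-coded stand-in for `r.text` holding only the first, complete record from the output above (the second record is cut off by the `[0:200]` slice):

```python
# `sample` is a hard-coded stand-in for r.text (first record only).
# Assumption: the real file has one record per line, fields joined by ", ".
sample = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
)

# Split into non-empty lines, then split each line into its fields
records = [line.split(", ") for line in sample.splitlines() if line.strip()]

first = records[0]
print(len(first))   # → 15
print(first[:4])    # → ['39', 'State-gov', '77516', 'Bachelors']
```

Each record carries 15 fields, so a header list needs 15 names, one per field. The UCI description of this dataset includes an `education-num` column between `education` and the marital status (the `13` in each record).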

I want to add the following headers to the data so that I can build a classifier.

col_names = ['age', 'work_class', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'class']

...but I can't get the names into the data.

I am running this on colab.research.google.com.

Answer 1

You can use built-in Python data structures, for example a list of dictionaries in the form [{header1: value1, header2: value2, ...}, ...], where each dictionary represents one row.
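A minimal sketch of that pattern in plain Python, pairing each field with its header via `zip`. The header list is the question's list with `education_num` added (an assumption, so there is one name per field), and `text` is a hard-coded stand-in for `r.text`:

```python
# Header names: the question's list plus 'education_num' (assumption:
# each record has 15 fields, so 15 names are needed)
col_names = ['age', 'work_class', 'fnlwgt', 'education', 'education_num',
             'marital_status', 'occupation', 'relationship', 'race', 'sex',
             'capital_gain', 'capital_loss', 'hours_per_week',
             'native_country', 'class']

# Hard-coded stand-in for r.text (first record from the printed output)
text = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
        "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n")

# One dict per row, in the form {header: value, ...}
rows = [dict(zip(col_names, line.split(", ")))
        for line in text.splitlines() if line.strip()]

print(rows[0]['age'], rows[0]['class'])  # → 39 <=50K
```

`zip` silently stops at the shorter sequence, so a header list that is one name short drops the last field without raising an error. Values stay strings here; numeric columns would need converting before building a classifier.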

The csv reader in the standard library can help, for example DictReader: https://docs.python.org/3.7/library/csv.html#csv.DictReader
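A minimal sketch with `csv.DictReader`: since the file has no header row, the names are passed explicitly through `fieldnames`, and `skipinitialspace=True` drops the space after each comma. As before, `text` is a hard-coded stand-in for `r.text` and `education_num` is an assumed extra name:

```python
import csv
import io

# Question's header list plus the assumed 'education_num' column
col_names = ['age', 'work_class', 'fnlwgt', 'education', 'education_num',
             'marital_status', 'occupation', 'relationship', 'race', 'sex',
             'capital_gain', 'capital_loss', 'hours_per_week',
             'native_country', 'class']

# Hard-coded stand-in for r.text (first record from the printed output)
text = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
        "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n")

# fieldnames= supplies the headers, so the first line is read as data;
# skipinitialspace=True strips the space that follows each delimiter
reader = csv.DictReader(io.StringIO(text), fieldnames=col_names,
                        skipinitialspace=True)
rows = list(reader)

print(rows[0]['marital_status'])  # → Never-married
```

With the real response you would pass `io.StringIO(r.text)` instead of the sample string.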

Pandas is probably a heavier approach, but it comes with a lot of tooling for users:

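import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
# The file has no header row, so header=None; names= supplies the column labels.
# skipinitialspace=True strips the space after each comma.
df = pd.read_csv(url, header=None, names=col_names, skipinitialspace=True)
# Colab pretty-prints a DataFrame automatically when it is the last expression in a cell
df.head()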

In general, this is the approach I would expect to see in research and data science, where numpy/pandas are very popular.
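The `read_csv` call can be tried offline by wrapping a sample string in `io.StringIO`. A minimal sketch, assuming pandas is installed; `text` is a hard-coded stand-in for the downloaded file, `education_num` is an assumed extra header name, and `skipinitialspace=True` is included to strip the space after each comma:

```python
import io

import pandas as pd

# Question's header list plus the assumed 'education_num' column
col_names = ['age', 'work_class', 'fnlwgt', 'education', 'education_num',
             'marital_status', 'occupation', 'relationship', 'race', 'sex',
             'capital_gain', 'capital_loss', 'hours_per_week',
             'native_country', 'class']

# Hard-coded stand-in for the file contents (first record only)
text = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
        "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n")

# header=None: no header row in the data; names=: column labels to use
df = pd.read_csv(io.StringIO(text), header=None, names=col_names,
                 skipinitialspace=True)

print(df.shape)                  # → (1, 15)
print(df.loc[0, 'work_class'])   # → State-gov
```

If `names` has fewer entries than there are fields, pandas treats the surplus leading column as the row index, so `age` would end up as the index instead of a column; checking `df.shape` after loading is a quick way to catch that.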

