Adding ndarray into dataframe and then back to ndarray

Question

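I have an ndarray that looks like this:

I want to add it to an existing dataframe so that I can export it as a csv, then use that csv in a separate Python script, pull the ndarray back out and run some analysis on it, mainly so that I don't end up with one very long Python script.

To add it to the dataframe, I did the following: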

data["StandardisedFeatures"] = x.tolist()

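I thought this worked fine. However, in my next script, when I try to pull the data out and turn it back into an array, it doesn't look the same: each row is wrapped in single quotes and treated as a string: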

data['StandardisedFeatures'].to_numpy()

I have tried astype(float) but it doesn't seem to work. Can anyone suggest a way to fix this?

Thanks.

Answer 1

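You can store objects of any type in a DataFrame.

They keep their type while the DataFrame is in memory, but pandas.DataFrame.info() will report the column's dtype as "object".

Example: storing lists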

import pandas as pd

df = pd.DataFrame(dict(my_list=[[1,2,3,4], [1,2,3,4]]))
print(type(df.loc[0, 'my_list']))
# Prints: <class 'list'>

This is useful if you work with the objects directly through pandas.DataFrame.apply().
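
As a minimal sketch of that point, the snippet below applies ordinary list functions to each cell of a list-valued column. The column names my_list, total and length are only illustrative, and this covers the in-memory case only, not a DataFrame that has been written to csv and read back.

```python
import pandas as pd

df = pd.DataFrame({"my_list": [[1, 2, 3, 4], [5, 6, 7, 8]]})

# Each cell is still a Python list, so built-in list functions work per row.
df["total"] = df["my_list"].apply(sum)
df["length"] = df["my_list"].apply(len)

print(df)
```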

Answer 2

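If the list objects in your DataFrame have turned into strings along the way (this happens sometimes), you can convert the strings back to lists with eval or ast.literal_eval, using map to apply the conversion to every element.

Here is an example to give you an idea of how to handle this: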

import pandas as pd
import numpy as np

dic = {"a": [1,2,3], "b":[4,5,6], "c": [[1,2,3], [4,5,6], [1,2,3]]}
df = pd.DataFrame(dic)

print("DataFrame:", df, sep="\n", end="\n\n")

print("Column of list to numpy:", df.c.to_numpy(), sep="\n", end="\n\n")
temp = df.c.astype(str).to_numpy()

print("Since your list objects have somehow become str objects while working with df:", temp, sep="\n", end="\n\n")

print("Magic for what you want:", np.array(list(map(eval, temp))), sep="\n", end="\n\n")

Output:

DataFrame:
   a  b          c
0  1  4  [1, 2, 3]
1  2  5  [4, 5, 6]
2  3  6  [1, 2, 3]

Column of list to numpy:
[list([1, 2, 3]) list([4, 5, 6]) list([1, 2, 3])]

Since your list objects have somehow become str objects while working with df:
['[1, 2, 3]' '[4, 5, 6]' '[1, 2, 3]']

Magic for what you want:
[[1 2 3]
 [4 5 6]
 [1 2 3]]

Note: I used eval in the example only because more people are familiar with it. Whenever you would reach for eval, you should prefer ast.literal_eval. This SO post explains well why.
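
For the csv round trip described in the question, a sketch using ast.literal_eval might look like the following. The array values and the id column are made up for illustration, and an in-memory io.StringIO buffer stands in for the csv file on disk; only the StandardisedFeatures column name comes from the question.

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the array in the question.
x = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

data = pd.DataFrame({"id": [1, 2]})
data["StandardisedFeatures"] = x.tolist()

# First script: write the csv (a StringIO buffer replaces a real file here).
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)

# Second script: each cell comes back as a string such as "[0.1, 0.2, 0.3]".
loaded = pd.read_csv(buffer)
print(type(loaded.loc[0, "StandardisedFeatures"]))

# Parse every string back into a list, then stack the lists into a 2-D array.
restored = np.array(loaded["StandardisedFeatures"].map(ast.literal_eval).tolist())
print(restored.shape, restored.dtype)
```

This also suggests why astype(float) fails in the question: every cell holds the text of a whole list, not a single number.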

Answer 3

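Perhaps another, simpler way to solve this is to use the numpy.save and numpy.load functions. You can then save the array as a NumPy array object and load it directly as a NumPy array again in the next script: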

import numpy as np
x = np.array([[1, 2], [3, 4]])
# Save the array in the working directory as "x.npy" (extension is automatically inserted)
np.save("x", x)
# Load "x.npy" as a numpy array
x_loaded = np.load("x.npy")
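
To check that nothing is lost this way, a small sketch (using a temporary directory and made-up values, both assumptions for the sake of a self-contained example) can compare the loaded array with the original:

```python
import os
import tempfile

import numpy as np

x = np.array([[1.5, 2.5], [3.5, 4.5]])

# Save and reload inside a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "x.npy")
    np.save(path, x)
    x_loaded = np.load(path)

# Shape, dtype and values should all match the original array.
print(x_loaded.shape, x_loaded.dtype)
print(np.array_equal(x, x_loaded))
```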

Tags: python, series, dataframe, numpy-ndarray