Pandas: compare column (A) with another column (B) and return the unique values present in column (A)

Question

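I am having trouble comparing two columns that each contain roughly 5 to 6 lakh (500,000 to 600,000) cells of data. I used a COUNTIF formula to check whether the values in column A exist in column B, but the calculation takes a very long time, so I stopped doing this task in Excel and looked for an alternative approach in pandas.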

Is it possible to get the list of unique values in column A by comparing it against column B? Any suggestions are welcome.

Column A: 585256

Column B: 556245

Answer 1

Hey, this is very simple with Python's built-in set data structure.

Below is a simple snippet that returns the set difference.

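def get_difference(file_1, file_2):
    """Return the set of lines present in file_1 but not in file_2."""
    # Read each file into a set (one value per line); duplicates collapse.
    with open(file_1, encoding='utf-8') as f1, open(file_2, encoding='utf-8') as f2:
        data_1 = set(f1.read().splitlines())
        data_2 = set(f2.read().splitlines())
    return data_1 - data_2  # set difference: values found only in file_1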
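A quick usage sketch of `get_difference`, assuming each column has been exported to a text file with one value per line. The file names and sample values below are hypothetical and are written to a temporary directory so the example runs on its own.

```python
import os
import tempfile


def get_difference(file_1, file_2):
    """Return the set of lines present in file_1 but not in file_2."""
    with open(file_1, encoding='utf-8') as f1, open(file_2, encoding='utf-8') as f2:
        data_1 = set(f1.read().splitlines())
        data_2 = set(f2.read().splitlines())
    return data_1 - data_2


# Hypothetical sample data: one value per line, as if exported from each column.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, 'column_a.txt')
    path_b = os.path.join(tmp, 'column_b.txt')
    with open(path_a, 'w', encoding='utf-8') as f:
        f.write('apple\nbanana\ncherry\napple\n')
    with open(path_b, 'w', encoding='utf-8') as f:
        f.write('banana\ndate\n')
    only_in_a = get_difference(path_a, path_b)

print(sorted(only_in_a))  # ['apple', 'cherry']
```

Because a set discards duplicates and ordering, `apple` appears only once in the result even though it occurs twice in column A, and the result has no guaranteed order (hence the `sorted` call).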

I tested the performance with about 500,000 lines of data; the script produced the result in under 2 seconds.
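Since the question asks about pandas, here is a comparable sketch using `Series.isin`, a vectorized membership test that plays the same role as the COUNTIF "does this value exist in column B" check. It assumes both columns sit in one DataFrame named `df` with columns `A` and `B` (hypothetical names; in practice the frame might come from `pd.read_excel`). Column B is shorter than column A, so its trailing empty cells show up as missing values and are dropped before the comparison.

```python
import pandas as pd

# Hypothetical data: column B is shorter, so its trailing cells are missing (None/NaN).
df = pd.DataFrame({
    'A': ['apple', 'banana', 'cherry', 'apple', 'elder'],
    'B': ['banana', 'date', 'elder', None, None],
})

# Values of B to test against, with the empty cells removed.
b_values = df['B'].dropna()

# Keep the values of A that do NOT appear anywhere in B,
# then drop repeats while preserving their first-seen order.
only_in_a = df.loc[~df['A'].isin(b_values), 'A'].drop_duplicates()

print(only_in_a.tolist())  # ['apple', 'cherry']
```

Unlike the set version, this keeps the original order of column A. If order does not matter, `set(df['A'].dropna()) - set(b_values)` yields the same values as a plain set.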

Tags: excel, performance, ipython, python-3.x, pandas