List slice for each row in a DataFrame column

Question

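I have a column containing strings. I want to transform this column so that I end up with only the first n words of each string.

I know I need to split the string and then slice the resulting list to keep the first n words. After that I can use join to put them back together. However, I'm having trouble doing this.

I expected the following to work: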

import pandas as pd

data = [[1, "A complete sentence must have, at minimum, three things: a subject, verb, and an object. The subject is typically a noun or a pronoun."], [2, "And, if there's a subject, there's bound to be a verb because all verbs need a "], [3, "subject. Finally, the object of a sentence is the thing that's being acted upon by the subject."], [4, "So, you might say, Claire walks her dog. In this complete "]]
df = pd.DataFrame(data, columns=['id', 'text'])

df['first_three'] = df['text'].str.split()[:3]

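But this applies the slice to the first 3 rows of the split result, instead of keeping the first three words of each row.

So it looks like this: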

first_three
['A', 'complete', 'sentence', 'must', 'have,', 'at', 'minimum,', 'three', 'things:', 'a', 'subject,', 'verb,', 'and', 'an', 'object.', 'The', 'subject', 'is', 'typically', 'a', 'noun', 'or', 'a', 'pronoun.']
['And,', 'if', "there's", 'a', 'subject,', "there's", 'bound', 'to', 'be', 'a', 'verb', 'because', 'all', 'verbs', 'need', 'a']
['subject.', 'Finally,', 'the', 'object', 'of', 'a', 'sentence', 'is', 'the', 'thing', "that's", 'being', 'acted', 'upon', 'by', 'the', 'subject.']
NaN

I want the first_three column to look like this:

first_three
[A, complete, sentence]
[And, if, there's]
[subject, Finally, the]
[So, you, might]

Then I can join them and move on. I know this must be an easy fix, but I can't seem to find the solution. Any input is much appreciated.

Answer 1

You can use apply to take the desired number of elements from each row's list.

df['first_three'] = df['text'].str.split().apply(lambda x : x[:3])
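For context, here is a minimal sketch of why the attempt in the question slices rows rather than words, and of an alternative that I believe is equivalent to the apply version: slicing through the `.str` accessor. The small sample frame and the variable names (`split`, `row_slice`, `per_row`, `via_apply`) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data, shortened for readability.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": ["a b c d e", "f g h i", "j k l m n", "o p q r"],
})

split = df["text"].str.split()  # each row becomes a list of words

# Slicing the Series itself selects the first 3 ROWS (full word lists).
# Assigning this back to a column aligns on the index, so row 3 gets NaN.
row_slice = split[:3]

# Slicing through the .str accessor is applied to each row's list.
per_row = split.str[:3]

# The apply-based version from this answer gives the same per-row lists.
via_apply = split.apply(lambda words: words[:3])
```

Whether to use `.str[:3]` or `apply` is mostly a matter of taste here; both work element by element on the lists.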

If you also want to do some text cleaning, such as replacing commas with spaces before splitting, you can do this:

df['first_three'] = df['text'].str.replace(",", " ")
df['first_three'] = df['first_three'].apply(lambda x : x.split()[:3])

Output

first_three
[A, complete, sentence]
[And, if, there's]
[subject., Finally, the]
[So, you, might]
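Since the question mentions joining the words back together afterwards, the whole thing can probably be chained in one expression. This is a sketch under assumptions: the sample sentences are shortened versions of the question's data, and the column name `first_n` and the variable `n` are hypothetical.

```python
import pandas as pd

# Shortened, illustrative versions of the question's rows.
df = pd.DataFrame({
    "text": [
        "A complete sentence must have three things",
        "And, if there's a subject",
        "subject. Finally, the object of a sentence",
        "So, you might say, Claire walks her dog.",
    ]
})

n = 3  # number of leading words to keep

# split -> keep the first n words per row -> join back into one string
df["first_n"] = df["text"].str.split().str[:n].str.join(" ")
```

Note that punctuation attached to a word (for example `And,`) is kept, because the split is on whitespace only.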

Tags: python, series, dataframe, pandas