Extract a part of a string using Regex in Python Pandas

I am a student working on a data science project, and I need to extract part of each string in one column of a DataFrame.

I want to extract the HOTHOTVIDEO part from strings such as "HOTHOTVIDEOHOT0501005107FilmVidéoClub".

So I wrote this statement using the following regular expression: facturation['annotation'] = facturation['annotation'].str.findall(r'([A-Z0-9]{3}\d+)').apply(''.join)

It extracts everything correctly, except that I sometimes have strings like "CTVCANALVODCTV0200052670CTV0200052670". For those it returns CTV0200052670CTV0200052670, but I only want the first occurrence: CTV0200052670.

Can someone help me solve this? :)

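I think the problem lies in combining findall with apply + join: the pattern matches twice in your data, and then you join both matches together. findall returns a list of every match. From that list you only need the first item, not all of them.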
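To illustrate the point about taking only the first item, here is a minimal sketch. The sample DataFrame is hypothetical, built from the two strings quoted in the question, and the names pattern, matches and first_match are introduced only for this example. It keeps findall but indexes into each list with .str[0] instead of joining it.

```python
import pandas as pd

# Hypothetical sample data: the two strings quoted in the question.
facturation = pd.DataFrame({
    "annotation": [
        "HOTHOTVIDEOHOT0501005107FilmVidéoClub",
        "CTVCANALVODCTV0200052670CTV0200052670",
    ]
})

# Raw string so that \d is passed to the regex engine unchanged.
pattern = r"([A-Z0-9]{3}\d+)"

# findall returns one list of matches per row; the second row has two matches.
matches = facturation["annotation"].str.findall(pattern)

# .str[0] picks the first element of each list (NaN if the list is empty).
first_match = matches.str[0]

print(first_match.tolist())
```

Rows where the pattern does not match at all end up as NaN rather than an empty string, which may or may not be what you want.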

Thanks to everyone who helped me :) I found the answer:

facturation['annotation'] = facturation['annotation'].str.findall(r'([A-Z0-9]{3}\d+)').apply(''.join)

facturation['annotation'] = facturation['annotation'].str.extract('(.{0,13})')
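Note that the second statement keeps the first 13 characters of the joined result, so it relies on every code being exactly 13 characters long, as both samples in the question happen to be. A possible alternative, sketched below under the assumption that only the first match is ever wanted, is to pass the original pattern straight to str.extract, which returns the first match per row. The sample data is again hypothetical and taken from the question.

```python
import pandas as pd

# Hypothetical sample data: the two strings quoted in the question.
facturation = pd.DataFrame({
    "annotation": [
        "HOTHOTVIDEOHOT0501005107FilmVidéoClub",
        "CTVCANALVODCTV0200052670CTV0200052670",
    ]
})

pattern = r"([A-Z0-9]{3}\d+)"

# str.extract keeps only the first match of the capture group in each row.
# expand=False returns a Series instead of a one-column DataFrame.
facturation["annotation"] = facturation["annotation"].str.extract(
    pattern, expand=False
)

print(facturation["annotation"].tolist())
```

This does the extraction in one step and does not depend on the length of the code. Rows without a match become NaN.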