Removing portions of string in Pandas: not working + errors

Question

I have a pandas DataFrame named full_list with a string column named domains. A snippet is shown here:

  domains
0 naturalhealth365.com
1 truththeory.com
2 themillenniumreport.com
3 https://www.cernovich.com
4 https://www.christianpost.com
5 http://evolutionnews.org
6 http://www.greenmedinfo.com
7 http://www.magapill.com8
8 https://needtoknow.news

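I need to remove https:// or http:// from the website names.

I checked several pandas posts dealing with vaguely similar problems, and I have tried all of these methods:

full_list['domains'] = full_list['domains'].apply(lambda x: x.lstrip('http://')) but this also wrongly removes the letters t, h and p, i.e. "truththeory.com" (index 1) becomes "uththeory.com"
full_list['domains'] = full_list['domains'].replace(('http://', '')) and this does not change the strings at all. The values in domains are the same before and after the line runs
full_list['domains'] = full_list['domains'].str.replace(('http://', '')) gives the error replace() missing 1 required positional argument: 'repl'
full_list['domains'] = full_list['domains'].str.lsplit('//', n=1).str.get(1) makes the first 3 rows (index 0, 1, 2) nan

For the life of me, I can't see what I am doing wrong. Any help is appreciated.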

Answer 1

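Try str.replace with a regular expression, as shown below. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default:

>>> df['domains'].str.replace('http(s|)://', '', regex=True)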
0       naturalhealth365.com
1            truththeory.com
2    themillenniumreport.com
3          www.cernovich.com
4      www.christianpost.com
5          evolutionnews.org
6       www.greenmedinfo.com
7          www.magapill.com8
8            needtoknow.news
Name: domains, dtype: object
>>>
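
For context on why a regex is the better tool here: str.lstrip treats its argument as a set of characters to strip, not as a literal prefix, which is why the lstrip attempt in the question eats leading letters of some domains. The following is a minimal sketch in plain Python, without pandas; the helper name strip_scheme is hypothetical and is used only for illustration.

```python
import re

# str.lstrip removes any leading characters that belong to the set
# {'h', 't', 'p', ':', '/'}; it does not look for the prefix 'http://'.
stripped = "themillenniumreport.com".lstrip("http://")
print(stripped)  # emillenniumreport.com

def strip_scheme(url: str) -> str:
    """Remove a leading http:// or https:// (hypothetical helper)."""
    # ^ anchors the match at the start; s? makes the 's' optional.
    return re.sub(r"^https?://", "", url)

print(strip_scheme("themillenniumreport.com"))    # themillenniumreport.com
print(strip_scheme("https://www.cernovich.com"))  # www.cernovich.com
print(strip_scheme("http://evolutionnews.org"))   # evolutionnews.org
```

The same anchored pattern can be passed to Series.str.replace, which applies it to every row.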

Answer 2

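Use Series.str.replace with the regex ^ to anchor at the start of the string and [s]* for an optional s (s? is the tighter equivalent, since [s]* also matches repeated s characters):

df['domains'] = df['domains'].str.replace(r'^http[s]*://', '', regex=True)
print(df)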
                   domains
0     naturalhealth365.com
1          truththeory.com
2  themillenniumreport.com
3        www.cernovich.com
4    www.christianpost.com
5        evolutionnews.org
6     www.greenmedinfo.com
7        www.magapill.com8
8          needtoknow.news
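
As a self-contained sketch of how the other attempts in the question relate to this fix, the example below rebuilds a few rows of the sample data and compares three calls. It assumes the second attempt was meant as Series.replace('http://', '') and that the fourth was meant to use Series.str.split; the variable names unchanged, cleaned and via_split are illustrative only.

```python
import pandas as pd

# A subset of the rows shown in the question.
full_list = pd.DataFrame({
    "domains": [
        "naturalhealth365.com",
        "truththeory.com",
        "https://www.cernovich.com",
        "http://evolutionnews.org",
    ]
})

# Series.replace without regex=True matches whole cell values only.
# No cell is exactly 'http://', so nothing changes.
unchanged = full_list["domains"].replace("http://", "")

# Series.str.replace takes the pattern and the replacement as two
# separate arguments. regex=True is passed explicitly because the
# default has differed between pandas versions.
cleaned = full_list["domains"].str.replace(r"^https?://", "", regex=True)

# Splitting on '//' yields a one-element list for rows without a scheme,
# so .str.get(1) returns NaN there; taking the last element avoids that.
via_split = full_list["domains"].str.split("//", n=1).str[-1]

print(cleaned.tolist())
```

Either cleaned or via_split can be assigned back to full_list['domains']; the regex version states the intent more directly because it only touches a scheme at the start of the string.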
