Clean URLs with a regular expression

Question

I have thousands of lines of data like this:

http://xxxx.com/xxx-xxx-xxx-xxxx/ 60% 2 Weekly 2014-01-01 00:00

I want to delete everything after the final / in each URL

(the output should be a clean URL, like the one below):

http://xxxx.com/xxx-xxx-xxx-xxxx/

Thanks

Answer 1

One way is to use the Linux command line:

cat file.txt |cut -f1 -d" "

If you are interested in a regex, this will match the URL in a line (the first run of non-space characters):

[^\ ]+
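As a rough illustration of both ideas in this answer, here is a minimal Python sketch. It assumes the fields are separated by a single space, as in the sample line from the question; the function names `first_field` and `first_non_space_run` are made up for this example.

```python
import re

# Sample line taken from the question.
line = "http://xxxx.com/xxx-xxx-xxx-xxxx/ 60% 2 Weekly 2014-01-01 00:00"


def first_field(text: str) -> str:
    """Mimic cut -f1 -d" ": keep everything before the first space."""
    return text.split(" ", 1)[0]


def first_non_space_run(text: str) -> str:
    """Apply [^ ]+ and return the first match, or "" if there is none."""
    match = re.search(r"[^ ]+", text)
    return match.group(0) if match else ""


print(first_field(line))
print(first_non_space_run(line))
```

Both helpers rely on the URL being the first field and containing no spaces. If a line could start with a space, the two would behave differently, so that case may need separate handling.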

Answer 2

Open the Replace dialog by pressing Ctrl+H, and make sure regular expression mode is enabled. Then,

Find (^.*\/).* and replace with $1: https://regex101.com/r/lJ4lF9/12

Or, find (?m)(^.*\/).* and replace with $1: https://regex101.com/r/lJ4lF9/13

Explanation: inside a capture group (http://www.rexegg.com/regex-capture.html), match the start of the string (^) followed by any character any number of times (.*) up to the last "/". Then, outside the group, match any character any number of times. Replace the whole match with the captured group by referencing it as $1. (?m) turns on multiline mode, so that ^ matches at the start of every line.
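For anyone who wants to try the same substitution outside the editor, here is a small Python sketch of it. It is only an approximation of the Notepad++ behaviour: Python's `re` module writes the group reference as `\1` rather than `$1`, and the second data line is a made-up example, not from the question.

```python
import re

# First line is from the question; the second is a hypothetical extra row.
data = (
    "http://xxxx.com/xxx-xxx-xxx-xxxx/ 60% 2 Weekly 2014-01-01 00:00\n"
    "http://xxxx.com/yyy-yyy/ 40% 1 Weekly 2014-01-02 00:00"
)

# (?m) makes ^ match at the start of every line.
# The group greedily captures everything up to the last "/" on the line;
# the trailing .* consumes the rest of that line so it gets dropped.
pattern = re.compile(r"(?m)(^.*/).*")

# Keep only the captured group for each matching line.
cleaned = pattern.sub(r"\1", data)
print(cleaned)
```

Lines that contain no "/" at all do not match and are left unchanged. Because the group runs up to the last "/" on the line, this approach assumes that no "/" appears in the trailing fields.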

regex

url

notepad++