Eliminate all URLs by replacing them with a generic word via regular expression matching
I'm new to regular expression matching. Suppose I want to find all URLs in a comma-separated text file and replace them with the word "url".
user,user,' http://twitpic.com/2y1zl - awww, that\'s a bummer. you shoulda got david carr of third day to do it. ;d',0
user,user,'is upset that he can\'t update his facebook by texting it... and might cry as a result school today also. blah!',0
user,user,' i dived many times for the ball. http://twitpic.com/2y1zl managed to save 50\% the rest go out of bounds',0
user,user,'my whole body feels itchy and like its on fire ',0
user,user,' no, it\'s not behaving at all. i\'m mad. why am i here? because i can\'t see you all over there. ',0
user,user,' not the whole crew ',0
user,user,'need a hug ',0
user,user,' hey long time no see! yes.. rains a bit ,only a bit lol , i\'m fine thanks , how\'s you ?',0
user,user,'_k nope they didn\'t have it ',0
user,user,'que me muera ? ',0
user,user,'spring break in plain city... it\'s snowing ',0
user,user,'i just re-pierced my ears ',0
I would like to get output like this:
user,user,' *url*- awww, that\'s a bummer. you shoulda got david carr of third day to do it. ;d',0
user,user,'is upset that he can\'t update his facebook by texting it... and might cry as a result school today also. blah!',0
user,user,' i dived many times for the ball. *url* managed to save 50\% the rest go out of bounds',0
user,user,'my whole body feels itchy and like its on fire ',0
user,user,' no, it\'s not behaving at all. i\'m mad. why am i here? because i can\'t see you all over there. ',0
user,user,' not the whole crew ',0
user,user,'need a hug ',0
user,user,' hey long time no see! yes.. rains a bit ,only a bit lol , i\'m fine thanks , how\'s you ?',0
user,user,'nope they didn\'t have it ',0
user,user,'que me muera ? ',0
user,user,'spring break in plain city... it\'s snowing ',0
user,user,'i just re-pierced my ears ',0
I tried sed:
sed -e 's/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$//URL/' filename.txt |less
The find-and-replace regular expression does not work.
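By default, GNU sed uses basic regular expressions, which need a lot of backslashes (reference: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html#Regular-expression-syntaxes). In addition, sed does not understand Perl's \d, and inside a bracket expression neither \d nor \w works as a character-class shorthand.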
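To make the backslash point concrete, here is a small sketch of one simplified pattern written in both syntaxes. It assumes GNU sed (for \? and \+ in basic syntax, and for the -E option); url_bre and url_ere are names made up for this illustration, and the pattern is deliberately rough, not a complete URL matcher.

```shell
# Basic syntax (sed's default): grouping, ?, + and {m,n} all need backslashes.
# [0-9] is used where a Perl regex would use \d.
url_bre() {
    sed 's#https\?://\([0-9a-z-]\+\.\)\+[a-z]\{2,6\}\(/[^[:blank:]]*\)\?#URL#g'
}

# Extended syntax (-E): the same operators are written bare.
url_ere() {
    sed -E 's#https?://([0-9a-z-]+\.)+[a-z]{2,6}(/[^[:blank:]]*)?#URL#g'
}

printf '%s\n' 'go to http://twitpic.com/2y1zl now' | url_bre
printf '%s\n' 'go to http://twitpic.com/2y1zl now' | url_ere
# both print: go to URL now
```

Note that neither version anchors the pattern with ^ and $, so a URL can be found anywhere in the line.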
Matching URLs is a fairly hard problem. Start with:
sed 's@https\?://[^[:blank:]]\+@*url*@g' file
This uses an alternate delimiter for the s/// command to avoid having to escape the slashes.
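As a quick check of the delimiter point, the sketch below runs the same substitution twice on a shortened line in the style of the question's data, once with the default / delimiter and once with @. It assumes GNU sed (for \? and \+); the variable names are only for illustration.

```shell
# Sample line in the style of the question's data (shortened).
line="user,user,' i dived for the ball. http://twitpic.com/2y1zl managed to save it',0"

# With the default / delimiter, every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/https\?:\/\/[^[:blank:]]\+/*url*/g')

# With @ as the delimiter, the slashes can stay as they are.
with_at=$(printf '%s\n' "$line" | sed 's@https\?://[^[:blank:]]\+@*url*@g')

printf '%s\n' "$with_at"
# prints: user,user,' i dived for the ball. *url* managed to save it',0

# Both spellings describe the same substitution.
[ "$with_slash" = "$with_at" ] && echo "same result"
```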
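If your URLs are separated from whatever follows by whitespace or by anything that cannot appear inside a URL, the following should work.
I'm not handling non-http URLs or user/password combinations here; just http/https followed by a sequence of characters that are allowed in a URL.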
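sed -e 's|https\?://[][0-9a-zA-Z._~:/?#@!$&()*+,;=%'\''-]\+|URL|g'
- I use | as the delimiter to simplify the handling of slashes. (@ is avoided as a delimiter here because it also appears inside the character class.)
- Since square brackets and the dash are allowed, I put them directly at the beginning and at the end of the character class, respectively.
- To include the single quote, the quoted string has to be closed, an escaped quote inserted, and the string reopened, so it ends up as: '\''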
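To see where the two approaches differ, here is a hedged sketch that runs both on a made-up line in which the URL is followed directly by a character that cannot be part of it. It assumes GNU sed; blank_delimited and allowed_chars are hypothetical helper names, and | is used as the delimiter in the second one so that it cannot clash with the @ inside the character class.

```shell
# Pattern that runs up to the next blank.
blank_delimited() {
    sed 's@https\?://[^[:blank:]]\+@URL@g'
}

# Pattern that only accepts characters allowed in a URL.
allowed_chars() {
    sed 's|https\?://[][0-9a-zA-Z._~:/?#@!$&()*+,;=%'\''-]\+|URL|g'
}

# Made-up sample: a URL wrapped in angle brackets, with no blank after it.
text='see <http://example.com/a?b=1> for details'

printf '%s\n' "$text" | blank_delimited
# prints: see <URL for details   (the > was swallowed)

printf '%s\n' "$text" | allowed_chars
# prints: see <URL> for details  (the match stops at >)
```

On the question's data the character class also contains the comma and the single quote, so a URL sitting at the very end of a quoted field would still take the closing ',0 with it; trimming the class is one way to handle that if it matters.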