How to remove characters in a string after the email address

Question

I'm using this code to list the email addresses in an HTML page.

require 'nokogiri'

selector = "//a[starts-with(@href, \"mailto:\")]/@href"

doc = Nokogiri::HTML.parse File.read 'in.rb'

nodes = doc.xpath selector

addresses = nodes.collect {|n| n.value[7..-1]}

puts addresses

Here's a sample of the markup I'm parsing:

<a href="mailto:joe@example.com?subject=My Business Is Dying">

But I'm getting more than just the email address. This is what shows up in the result:

joe@example.com?subject=My Business Is Dying

How can I strip everything from the question mark onward so that only the email address is left?

Answer 1

You can always strip the ? character and everything after it:

addresses.map! do |address|
  address.sub(/\?.*/, '')
end
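
As a rough sketch, the same substitution could also be folded into the collect step from the question. The hrefs array below is a made-up stand-in for the values Nokogiri returns from n.value, so the example runs without an HTML file:

```ruby
# Hypothetical sample data standing in for nodes.collect { |n| n.value }
hrefs = [
  'mailto:joe@example.com?subject=My Business Is Dying',
  'mailto:jane@example.com'
]

# Drop the 7-character "mailto:" prefix, then remove the "?" and
# everything that follows it. sub returns the string unchanged when
# there is no "?" to match.
addresses = hrefs.collect { |href| href[7..-1].sub(/\?.*/, '') }

puts addresses
```

An address without a query string passes through untouched, so the substitution is safe to apply to every link.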

Answer 2

I'd probably use one of these two:

str = 'joe@example.com?subject=My Business Is Dying'

str.split('?').first # => "joe@example.com"
str[/^[^?]+/] # => "joe@example.com"

The second is a simple regular expression embedded in String's [] (slice) method. The pattern basically says "start at the beginning and grab everything up until a question mark."
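
To make the pattern a bit more concrete, here is a small sketch that breaks it into its two parts and tries it on an address with and without a query string. The variable names are only illustrative. One thing worth knowing: in Ruby, ^ anchors at the start of any line, while \A anchors only at the start of the string. For a single-line href value the two behave the same.

```ruby
with_query    = 'joe@example.com?subject=My Business Is Dying'
without_query = 'joe@example.com'

# ^      anchor at the start of a line
# [^?]+  one or more characters that are not a question mark
pattern = /^[^?]+/

first  = with_query[pattern]     # String#[] returns the matched text
second = without_query[pattern]  # no "?" present, so the whole string matches

p first   # => "joe@example.com"
p second  # => "joe@example.com"
```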

Speed-wise, they're equivalent. I'd probably go with the first because it's easier to read.
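
If you want to check the speed claim on your own machine, something like the following rough sketch would do. The time_it helper and the iteration count are my own inventions for illustration, not part of Ruby, and the timings will vary with Ruby version and hardware:

```ruby
str = 'joe@example.com?subject=My Business Is Dying'
iterations = 100_000

# Hypothetical helper: runs the block `iterations` times and returns
# the elapsed wall-clock time in seconds, using a monotonic clock.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

split_time = time_it(iterations) { str.split('?').first }
regex_time = time_it(iterations) { str[/^[^?]+/] }

puts format('split: %.4fs', split_time)
puts format('regex: %.4fs', regex_time)
```

For a handful of links on a page, either choice is unlikely to matter, so readability is a reasonable tiebreaker.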

ruby

nokogiri