如何找到包含所有元音的同一数组的两个元素

How to find two elements of the same array that contain all vowels

我想迭代给定的数组,例如:

["goat", "action", "tear", "impromptu", "tired", "europe"]

我想查看所有可能的配对。

所需的输出是一个新数组,其中包含所有对,组合起来包含所有元音。此外,这些对应连接为输出数组的一个元素:

["action europe", "tear impromptu"]

我尝试了以下代码,但收到一条错误消息:

No implicit conversion of nil into string.
def all_vowel_pairs(words)
  pairs = []

  (0..words.length).each do |i|                       # iterate through words
    (0..words.length).each do |j|                   # for every word, iterate through words again
      pot_pair = words[i].to_s + words[j]         # build string from pair
      if check_for_vowels(pot_pair)               # throw string to helper-method.
        pairs << words[i] + " " + words[j]      # if gets back true, concatenade and push to output array "pairs"
      end
    end
  end
  pairs
end

# helper-method to check for if a string has all vowels in it
def check_for_vowels(string)
  vowels = "aeiou"
  founds = []
  string.each_char do |char|
    if vowels.include?(char) && !founds.include?(char)
      founds << char
    end
  end
  if founds.length == 5
    return true
  end
  false
end

你说的是对,所以我认为它是两个元素的组合。我已经使用 #combination method. Then I #select-ed 将数组中的每两个元素组合在一起,只有那些在它们连接后包含所有元音的对。最后,我确保加入这些对:

["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
.map{ |w| w.join(' ') }

#=> ["action europe", "tear impromptu"]

正则表达式来自“”。

与 Viktor 的开始类似,我会使用一个简单的测试来查看单词中存在哪些元音,并比较它们是否匹配 "aeiou" 在去除重复项并对其进行排序后:

def ttm1(ary)
  ary.combination(2).select { |a| 
    a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou' 
  }.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]

分解它以便您了解发生了什么。

["goat", "action", "tear", "impromptu", "tired", "europe"]  # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
            .combination(2)                                 
            .select { |a| a                                 # => ["goat", "action"],        ["goat", "tear"],     ["goat", "impromptu"],     ["goat", "tired"],    ["goat", "europe"],             ["action", "tear"],        ["action", "impromptu"],        ["action", "tired"],       ["action", "europe"],                ["tear", "impromptu"],     ["tear", "tired"],    ["tear", "europe"],             ["impromptu", "tired"],    ["impromptu", "europe"],             ["tired", "europe"]
                .join                                       # => "goataction",              "goattear",           "goatimpromptu",           "goattired",          "goateurope",                   "actiontear",              "actionimpromptu",              "actiontired",             "actioneurope",                      "tearimpromptu",           "teartired",          "teareurope",                   "impromptutired",          "impromptueurope",                   "tiredeurope"
                .scan(/[aeiou]/)                            # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
                .uniq                                       # => ["o", "a", "i"],           ["o", "a", "e"],      ["o", "a", "i", "u"],      ["o", "a", "i", "e"], ["o", "a", "e", "u"],           ["a", "i", "o", "e"],      ["a", "i", "o", "u"],           ["a", "i", "o", "e"],      ["a", "i", "o", "e", "u"],           ["e", "a", "i", "o", "u"], ["e", "a", "i"],      ["e", "a", "u", "o"],           ["i", "o", "u", "e"],      ["i", "o", "u", "e"],                ["i", "e", "u", "o"]
                .sort                                       # => ["a", "i", "o"],           ["a", "e", "o"],      ["a", "i", "o", "u"],      ["a", "e", "i", "o"], ["a", "e", "o", "u"],           ["a", "e", "i", "o"],      ["a", "i", "o", "u"],           ["a", "e", "i", "o"],      ["a", "e", "i", "o", "u"],           ["a", "e", "i", "o", "u"], ["a", "e", "i"],      ["a", "e", "o", "u"],           ["e", "i", "o", "u"],      ["e", "i", "o", "u"],                ["e", "i", "o", "u"]
                .join == 'aeiou'                            # => false,                     false,                false,                     false,                false,                          false,                     false,                          false,                     true,                                true,                      false,                false,                          false,                     false,                               false
            }                                               # => [["action", "europe"], ["tear", "impromptu"]]

看代码是不是所有的元音字母都存在,绕了一大圈。每次检查时,它都必须通过许多方法才能确定是否找到了所有元音;换句话说,它不能短路和失败,直到最后都不好。

此代码将:

def ttm2(ary)
  ary.combination(2).select { |a| 
    str = a.join 
    str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
  }.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]

但我不喜欢以这种方式使用正则表达式引擎,因为它比直接查找慢,这会导致:

def ttm3(ary)
  ary.combination(2).select { |a| 
    str = a.join 
    str['a'] && str['e'] && str['i'] && str['o'] && str['u']
  }.map { |a| a.join(' ') }
end

这是基准:

require 'fruity'

words = ["goat", "action", "tear", "impromptu", "tired", "europe"]                                  

def viktor(ary)
  ary.combination(2)                                                                             
  .select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }  
  .map{ |w| w.join(' ') }                                                                     
end
viktor(words) # => ["action europe", "tear impromptu"]

def ttm1(ary)
  ary.combination(2).select { |a| 
    a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou' 
  }.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]

def ttm2(ary)
  ary.combination(2).select { |a| 
    str = a.join 
    str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
  }.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]

def ttm3(ary)
  ary.combination(2).select { |a| 
    str = a.join 
    str['a'] && str['e'] && str['i'] && str['o'] && str['u']
  }.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]

compare {
  _viktor { viktor(words) }
  _ttm1 { ttm1(words) }
  _ttm2 { ttm2(words) }
  _ttm3 { ttm3(words) }
}

结果:

# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1

现在,因为这看起来很像家庭作业,所以了解学校了解 Stack Overflow 并寻找寻求帮助的学生很重要,因此您可能不想重复使用此代码,尤其不是逐字的。

您的代码包含两个错误,其中之一导致了错误消息。

(0..words.length) 从 0 到 6 循环。 words[6] 但是不存在(数组是从零开始的),所以你得到 nil。用 (0..words.length-1) 替换(两次)应该可以解决这个问题。

您将两次获得每个正确的结果,一次是 "action europe",一次是 "europe action"。这是由于循环太多,每个组合都要循环两次造成的。将第二个循环从 (0..words.length-1) 替换为 (i..words.length-1).

这种繁琐的索引簿记很无聊,而且经常导致错误。这就是为什么 Ruby 程序员通常更喜欢更简单的方法(如 combination 和其他答案一样),完全避免使用索引。

以下代码旨在提供一种在单词数量较大时构造所需数组的有效方法。请注意,与其他答案不同,它没有使用 Array#combination.

方法

说明部分的第一部分(下文)概述了该算法所采用的方法。然后填写详细信息。

代码

require 'set'

VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set

def all_vowel_pairs(words)
  h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
  h.each_with_object([]) do |(k,v),a|
    vowels_needed = VOWELS_SET-k
    h.each do |kk,vv|
      next unless kk.superset?(vowels_needed)
      v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
    end
  end
end

例子

words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]

all_vowel_pairs(words)
  #=> ["action europe", "hear impromptu", "impromptu tear"]

说明

对于给定的示例,步骤如下。

VOWELS_SET = VOWELS.to_set
  #=> #<Set: {"a", "e", "i", "o", "u"}> 

h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
  #=> {#<Set: {"o", "a"}>=>["goat"],
  #    #<Set: {"a", "i", "o"}>=>["action"],
  #    #<Set: {"e", "a"}>=>["tear", "hear"],
  #    #<Set: {"i", "o", "u"}>=>["impromptu"],
  #    #<Set: {"i", "e"}>=>["tired"],
  #    #<Set: {"e", "u", "o"}>=>["europe"]}

可见h的键是五个元音的子集。这些值是 words(单词)的元素数组,其中包含键给出的元音,没有其他元音。因此,这些值共同构成 words 的分区。当字数很大时,人们会期望 h 有 31 个键 (2**5 - 1)。

我们现在遍历 h 的键值对。对于每个,使用键 k 和值 v,确定缺失的元音集 (vowels_needed),然后我们循环遍历 [=] 的那些键值对 [kk, vv] 30=] 其中 kkvowels_needed 的超集。然后将 vvv 元素的所有组合添加到返回的数组中(经过调整以避免重复计算每对单词)。

继续,

enum = h.each_with_object([])
  #=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
  #                  #<Set: {"a", "i", "o"}>=>["action"],
  #                  ...
  #                  #<Set: {"e", "u", "o"}>=>["europe"]}: 
  #     each_with_object([])> 

第一个值由enum生成并传递给块,块变量赋值:

(k,v), a = enum.next
  #=> [[#<Set: {"o", "a"}>, ["goat"]], []]

参见 Enumerator#next

各个变量的赋值方式为array decomposition:

k #=> #<Set: {"o", "a"}> 
v #=> ["goat"] 
a #=> [] 

现在执行块计算。

vowels_needed = VOWELS_SET-k
  #=> #<Set: {"e", "i", "u"}> 
h.each do |kk,vv|
  next unless kk.superset?(vowels_needed)
  v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end

单词“goat”(v)有元音“o”和“a”,因此只能匹配包含元音“e”、“i”和“u”的单词(也可能是“o”and/or“a”)。表达式

next unless kk.superset?(vowels_needed)

跳过 h (kk) 中那些不是 vowels_needed 超集的键。参见 Set#superset?

None words 中的单词包含“e”、“i”和“u”,因此数组 a 不变。

下一个元素现在由 enum 生成,传递给块并为块变量赋值:

(k,v), a = enum.next
  #=> [[#<Set: {"a", "i", "o"}>, ["action"]], []] 
k #=> #<Set: {"a", "i", "o"}> 
v #=> ["action"] 
a #=> [] 

区块计算开始:

vowels_needed = VOWELS_SET-k
  #=> #<Set: {"e", "u"}> 

我们看到 h 只有一个键值对,其键是 vowels_needed 的超集:

kk = %w|e u o|.to_set
  #=> #<Set: {"e", "u", "o"}> 
vv = ["europe"]

因此我们执行:

v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}

将一个元素添加到 a:

a #=> ["action europe"]

子句if w1 < w2是为了确保在后面的计算中"europe action"不会添加到a

如果v(包含'a'、'i'和'u'的单词)和vv(包含'e'、[=150的单词=] 和 'o') 改为:

v  #=> ["action", "notification"]
vv #=> ["europe", "route"]

我们会在 a 中添加 "action europe""action route""notification route"。 (”europe notification” 稍后会添加,当 k #=> #<Set: {"e", "u", "o"} 时。)

基准

我将我的方法与其他建议使用@theTinMan 的 Fruity 基准代码进行了基准测试。唯一的区别在于要测试的单词数组和将我的方法添加到基准测试中,我将其命名为 cary。对于要考虑的单词数组,我从计算机上的英语单词文件中选择了位于 运行dom 的 600 个单词:

words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
  #=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
  #    "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]

发现此数组包含 46,436 对包含所有五个元音的单词。

结果如下图

compare {
  _viktor { viktor(words) }
  _ttm1 { ttm1(words) }
  _ttm2 { ttm2(words) }
  _ttm3 { ttm3(words) }
  _cary { cary(words) }
}
Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1

然后我将 caryttm3 进行了 1,000 个 运行domly 选择的单词的比较。这个数组被发现包含 125,068 对包含所有五个元音的单词。结果如下:

Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0

为了感受基准的可变性,我 运行 最后比较了两次,每次都有一个新的 运行dom 选择 1,000 个单词。这给了我以下结果:

Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0
Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0

可见样本之间存在相当大的差异。