How come this List Comprehension does not produce the same results as this for/in loop?
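I generate a list of random strings, then use a for/in loop and a list comprehension expression to find the longest string and its length.
Both techniques correctly compute the maximum length, but sometimes the for/in loop finds the same longest word as the list comprehension, and sometimes it doesn't. Why? What's the logic error?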
import random
import string
import time

def cobble_large_dataset(dataset_number_of_elements):
    '''
    Build a list of lists; each inner list holds one string of 1-10 random characters.
    '''
    myList = []  # Empty list
    for i in range(0, dataset_number_of_elements):
        string_length = random.randint(1, 10)
        tmp = ''.join(random.choices(string.ascii_uppercase + string.digits, k=string_length))
        tmp = [tmp]
        #print(tmp)
        myList.extend([tmp])
    return myList

def list_comprehension_test(wordsList):
    '''
    Process a list of lists using a list comprehension.
    Each list in the list of lists holds a single string.
    '''
    start_time = time.time()
    maximumWordLength, longest_word = max([(len(x[0]), x[0]) for x in wordsList])  # x is a 1-element list holding a string
    return ((time.time() - start_time), longest_word, maximumWordLength)

def brute_force_test(wordsList):
    '''
    Process a list of lists using a brute-force for/in loop.
    Each list in the list of lists holds a single string.
    '''
    start_time = time.time()
    maximumWordLength = 0
    for word in wordsList:
        tmp = word[0]
        #print(tmp)
        if (len(tmp) >= maximumWordLength):
            maximumWordLength = len(tmp)
            longest_word = tmp
            #print(tmp)
            #print(longest_word + " : " + str(maximumWordLength))
    return ((time.time() - start_time), longest_word, maximumWordLength)

start_time = time.time()
dataset = cobble_large_dataset(100)
print(str(len(dataset)) + ' Strings generated in ' + str((time.time() - start_time)) + ' seconds.')

# Let's see if both techniques produce the same results:
result_brute_force = brute_force_test(dataset)
print('Results from Brute Force = ' + result_brute_force[1] + ', ' + str(result_brute_force[2]) + ' characters')
result_list_comprehension = list_comprehension_test(dataset)
print('Results from List Comprehension = ' + result_list_comprehension[1] + ', ' + str(result_list_comprehension[2]) + ' characters')
if (result_list_comprehension[1] == result_brute_force[1]):
    print("Techniques produced the same results.")
else:
    print("Techniques DID NOT PRODUCE the same results.")
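To make the mismatch repeatable instead of depending on random data, here is a minimal sketch that runs the same two techniques on a small hand-picked dataset (the four strings are made up for illustration) in the same list-of-lists shape. Two words tie at length 3, which is exactly the case where the results diverge.

```python
# Hand-picked sample data in the list-of-lists shape that cobble_large_dataset
# produces; fixed values (instead of random ones) make the tie repeatable.
dataset = [["AB"], ["ZZZ"], ["Q1"], ["AAA"]]

# Loop from the question: ">=" lets a later tie overwrite an earlier one.
maximumWordLength = 0
for word in dataset:
    tmp = word[0]
    if len(tmp) >= maximumWordLength:
        maximumWordLength = len(tmp)
        longest_word = tmp
loop_result = (longest_word, maximumWordLength)

# Comprehension from the question: max() compares whole (length, string) tuples.
length, word = max([(len(x[0]), x[0]) for x in dataset])
comprehension_result = (word, length)

print(loop_result)           # ('AAA', 3)
print(comprehension_result)  # ('ZZZ', 3)
```

Both report a length of 3; only the chosen word differs.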
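You're passing a list of tuples to max without a key function, so max is comparing the tuples, not just the lengths. When two lengths are equal, tuple comparison moves on to the second element, the string itself, so among strings of equal length the maximum is the string that compares greatest (by lexicographic comparison of code points).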
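A short sketch of how the tuple comparison behaves; the sample tuples are made up for illustration. The second elements are only consulted when the first elements tie, and then the string with the higher code point at the first differing character wins (in ASCII, digits 48-57 sort before uppercase letters 65-90).

```python
# Tuples compare element by element: lengths first, strings only on a tie.
print((3, "AAA") < (3, "ZZZ"))  # True: lengths tie, so "AAA" < "ZZZ" decides
print((2, "ZZ") < (3, "AA"))    # True: 2 < 3 decides; the strings are never compared

# Among equal lengths, max() returns the tuple whose string compares greatest.
pairs = [(2, "A1"), (2, "99"), (2, "B2")]
print(max(pairs))                    # (2, 'B2')
print(ord("9"), ord("A"), ord("B"))  # 57 65 66
```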
By contrast, when lengths tie, your loop picks the last candidate it encounters. (If you used > instead of >= in if (len(tmp) >= maximumWordLength):, it would pick the first candidate.)
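The tie-breaking difference between >= and > can be seen in isolation with a sketch like the following. It assumes a flat list of strings, and the helper longest_with and its keep_later_ties flag are hypothetical names introduced only for this illustration.

```python
def longest_with(words, keep_later_ties):
    """Return (word, length) of the longest string in a flat list.

    keep_later_ties=True uses ">=" like the question's loop (the last tie wins);
    keep_later_ties=False uses ">" (the first tie wins).
    The function and parameter names are hypothetical, for illustration only.
    """
    max_len = 0
    longest = None
    for w in words:
        if keep_later_ties:
            is_new_best = len(w) >= max_len
        else:
            is_new_best = len(w) > max_len
        if is_new_best:
            max_len = len(w)
            longest = w
    return longest, max_len

words = ["AB", "ZZZ", "Q1", "AAA"]
print(longest_with(words, keep_later_ties=True))   # ('AAA', 3)
print(longest_with(words, keep_later_ties=False))  # ('ZZZ', 3)
```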
(Also, you're doing something odd with tmp. The 1-element lists you're building serve no purpose; cobble_large_dataset should just return a flat list of strings.)
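A minimal sketch of a flat-list variant, assuming the same alphabet and 1-10 length range as the question. It keeps the original function name, but the body is a rewrite, not the asker's code. With a flat list, the rest of the code would use x instead of x[0].

```python
import random
import string

def cobble_large_dataset(dataset_number_of_elements):
    """Return a flat list of random strings, each 1-10 characters long."""
    alphabet = string.ascii_uppercase + string.digits
    return [
        "".join(random.choices(alphabet, k=random.randint(1, 10)))
        for _ in range(dataset_number_of_elements)
    ]

words = cobble_large_dataset(5)
print(words)  # five plain strings; the contents vary from run to run
```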
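In your list comprehension case, you want to tell max to look only at the first item of each pair in the list. That makes it equivalent to the for loop case, which only considers the length of each string. So you want: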
maximumWordLength, longest_word = max(
    [(len(x[0]), x[0]) for x in wordsList],
    key=lambda x: x[0])  # compare lengths only; on a tie, the first longest word wins
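A small sketch of the key-based version on made-up data. Python's documentation states that when several items are maximal, max() returns the first one encountered, which is why the result matches a loop that uses >. On a flat list of strings, key=len gives the same answer without building tuples at all.

```python
words = ["AB", "ZZZ", "Q1", "AAA"]
pairs = [(len(w), w) for w in words]

# With key, max() compares only the lengths; among equal keys it returns
# the first maximal item, so "ZZZ" (seen before "AAA") wins.
maximumWordLength, longest_word = max(pairs, key=lambda pair: pair[0])
print(longest_word, maximumWordLength)  # ZZZ 3

# On a flat list of strings, the tuples are unnecessary: key=len does the same job.
longest_word = max(words, key=len)
print(longest_word, len(longest_word))  # ZZZ 3
```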
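As others have already pointed out, you also want to change the >= comparison in the brute-force case to >. If you make both changes, you'll get the same results from both methods.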
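One way to check the claim is to apply both fixes and compare the two techniques on many random datasets. This is a sketch under the assumption of a flat list of strings; make_words, loop_longest and comprehension_longest are hypothetical names for this illustration, not the asker's functions.

```python
import random
import string

def make_words(n):
    # Hypothetical helper: flat list of n random 1-10 character strings.
    alphabet = string.ascii_uppercase + string.digits
    return ["".join(random.choices(alphabet, k=random.randint(1, 10))) for _ in range(n)]

def loop_longest(words):
    # Brute-force version with ">" so the first longest word is kept.
    max_len = 0
    longest = None
    for w in words:
        if len(w) > max_len:
            max_len = len(w)
            longest = w
    return longest, max_len

def comprehension_longest(words):
    # Comprehension version with key=..., so only the lengths are compared.
    max_len, longest = max([(len(w), w) for w in words], key=lambda pair: pair[0])
    return longest, max_len

# Repeat on many random datasets; with both fixes the results should always match.
for _ in range(1000):
    words = make_words(100)
    assert loop_longest(words) == comprehension_longest(words)
print("Techniques produced the same results.")
```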