让二进制搜索在 Python 中工作
Getting a binary search to work in Python
我正在尝试让二进制搜索在 Python 中工作。我有一个庞大的、经过排序的密码列表。计划是从用户那里获取密码输入并查看它是否在列表中。由于列表的大小,我决定实施二进制搜索。
这是我的代码:
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
low = 0
high = (int(len(data))+1)
while (low < high) and not Found:
mid = int((low+high)/2)
if data[mid] == Password:
Found = True
break
elif Password < str(data[mid]):
high = mid - 1
elif Password > str(data[mid]):
low = mid + 1
我猜是因为字符串比较?有任何想法吗?二进制搜索永远不会 returns 正确,即使我明确搜索我知道列表中的内容也是如此。
我用这段代码对密码列表进行了排序。
import io
with io.open('result.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
def partition(data, start, end):
pivot = data[end] # Partition around the last value
bottom = start-1 # Start outside the area to be partitioned
top = end # Ditto
done = 0
while not done: # Until all elements are partitioned...
while not done: # Until we find an out of place element...
bottom = bottom+1 # ... move the bottom up.
if bottom == top: # If we hit the top...
done = 1 # ... we are done.
break
if data[bottom] > pivot: # Is the bottom out of place?
data[top] = data[bottom] # Then put it at the top...
break # ... and start searching from the top.
while not done: # Until we find an out of place element...
top = top-1 # ... move the top down.
if top == bottom: # If we hit the bottom...
done = 1 # ... we are done.
break
if data[top] < pivot: # Is the top out of place?
data[bottom] = data[top] # Then put it at the bottom...
break # ...and start searching from the bottom.
data[top] = pivot # Put the pivot in its place.
return top # Return the split point
def quicksort(data, start, end):
if start < end: # If there are two or more elements...
split = partition(data, start, end) # ... partition the sublist...
quicksort(data, start, split-1)
quicksort(data, split+1, end)
quicksort(data, 0, (int(len(data))-1))
with io.open('final.txt', 'w', encoding='latin-1') as f:
for s in data:
f.write(s)
排序后的列表看起来像这样:空格,然后是符号,然后是数字,然后是大写字母(按字母顺序排序),然后是普通字母(按字母顺序排序)。
不要自己写二分查找,要正确使用它们有点棘手。请改用 bisect
模块。
from bisect import bisect_left
def binary_search(lst, el):
# returns lower bound of key `el` in list `lst`
index = bisect_left(lst, el)
# check that: (1) the lower bound is not at the end of the list and
# (2) the element at the index matches `el`
return index < len(lst) and lst[index] == el
用法:
test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False
如果您只想搜索列表中的密码,那么在您的代码中
data = myfile.readlines()
您已经将所有密码记入内存。
所以如果你只想检查给定的密码是否存在于你的列表中,你可以直接使用
进行检查
if Password in data:
print "yes it is present in the list"
else:
print "Not present in the list"
希望对您有所帮助。
您可能在调用readlines
后每个密码的末尾都有一个换行符,请使用rstrip()
将其删除
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
low = 0
high = len(data)-1 #no need to cast to int, should be len()-1
while (low <= high) and not Found: #less than or equal to
mid = int((low+high)/2)
if data[mid].rstrip() == Password: #Remove newline character before compare
Found = True
break
elif Password < str(data[mid]):
high = mid - 1
elif Password > str(data[mid]):
low = mid + 1
这是二分查找的例子
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))
mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))
我当时就知道了
False
False
True
True
是的,正如 Eamon 指出的那样,我确信在调用 readlines 后您需要在每个密码的末尾换行。
有两个问题。
- 你的二进制搜索算法是错误的。
重复条件应该是
while (low <= high)
或者您找不到第一个和最后一个元素。
- readlines() 将读取
\n
但 user_input() 不会读取 .
这导致 `Password` == `Password\n'
永远为假。
由于您设置 low
和 high
的方式,您正在跳过部分列表。正因为如此,low == high
发生在更新之后和检查之前,导致你过早地跳出循环。
有两个简单的解决方案:
要么..
- 设置
high = mid
或low = mid
而不是mid -/+ 1
,触发额外的迭代,
或..
- 循环后检查是否
high == low and data[low] == Password
终止,因为您可能仍会在那里找到 Password
。
我正在尝试让二进制搜索在 Python 中工作。我有一个庞大的、经过排序的密码列表。计划是从用户那里获取密码输入并查看它是否在列表中。由于列表的大小,我决定实施二进制搜索。
这是我的代码:
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
low = 0
high = (int(len(data))+1)
while (low < high) and not Found:
mid = int((low+high)/2)
if data[mid] == Password:
Found = True
break
elif Password < str(data[mid]):
high = mid - 1
elif Password > str(data[mid]):
low = mid + 1
我猜是因为字符串比较?有任何想法吗?二进制搜索永远不会 returns 正确,即使我明确搜索我知道列表中的内容也是如此。
我用这段代码对密码列表进行了排序。
import io
with io.open('result.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
def partition(data, start, end):
pivot = data[end] # Partition around the last value
bottom = start-1 # Start outside the area to be partitioned
top = end # Ditto
done = 0
while not done: # Until all elements are partitioned...
while not done: # Until we find an out of place element...
bottom = bottom+1 # ... move the bottom up.
if bottom == top: # If we hit the top...
done = 1 # ... we are done.
break
if data[bottom] > pivot: # Is the bottom out of place?
data[top] = data[bottom] # Then put it at the top...
break # ... and start searching from the top.
while not done: # Until we find an out of place element...
top = top-1 # ... move the top down.
if top == bottom: # If we hit the bottom...
done = 1 # ... we are done.
break
if data[top] < pivot: # Is the top out of place?
data[bottom] = data[top] # Then put it at the bottom...
break # ...and start searching from the bottom.
data[top] = pivot # Put the pivot in its place.
return top # Return the split point
def quicksort(data, start, end):
if start < end: # If there are two or more elements...
split = partition(data, start, end) # ... partition the sublist...
quicksort(data, start, split-1)
quicksort(data, split+1, end)
quicksort(data, 0, (int(len(data))-1))
with io.open('final.txt', 'w', encoding='latin-1') as f:
for s in data:
f.write(s)
排序后的列表看起来像这样:空格,然后是符号,然后是数字,然后是大写字母(按字母顺序排序),然后是普通字母(按字母顺序排序)。
不要自己写二分查找,要正确使用它们有点棘手。请改用 bisect
模块。
from bisect import bisect_left
def binary_search(lst, el):
# returns lower bound of key `el` in list `lst`
index = bisect_left(lst, el)
# check that: (1) the lower bound is not at the end of the list and
# (2) the element at the index matches `el`
return index < len(lst) and lst[index] == el
用法:
test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False
如果您只想搜索列表中的密码,那么在您的代码中
data = myfile.readlines()
您已经将所有密码记入内存。 所以如果你只想检查给定的密码是否存在于你的列表中,你可以直接使用
进行检查if Password in data:
print "yes it is present in the list"
else:
print "Not present in the list"
希望对您有所帮助。
您可能在调用readlines
后每个密码的末尾都有一个换行符,请使用rstrip()
将其删除
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
data = myfile.readlines()
low = 0
high = len(data)-1 #no need to cast to int, should be len()-1
while (low <= high) and not Found: #less than or equal to
mid = int((low+high)/2)
if data[mid].rstrip() == Password: #Remove newline character before compare
Found = True
break
elif Password < str(data[mid]):
high = mid - 1
elif Password > str(data[mid]):
low = mid + 1
这是二分查找的例子
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))
mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))
我当时就知道了
False
False
True
True
是的,正如 Eamon 指出的那样,我确信在调用 readlines 后您需要在每个密码的末尾换行。
有两个问题。
- 你的二进制搜索算法是错误的。
重复条件应该是
while (low <= high)
或者您找不到第一个和最后一个元素。
- readlines() 将读取
\n
但 user_input() 不会读取 .
这导致 `Password` == `Password\n'
永远为假。
由于您设置 low
和 high
的方式,您正在跳过部分列表。正因为如此,low == high
发生在更新之后和检查之前,导致你过早地跳出循环。
有两个简单的解决方案:
要么..
- 设置
high = mid
或low = mid
而不是mid -/+ 1
,触发额外的迭代,
或..
- 循环后检查是否
high == low and data[low] == Password
终止,因为您可能仍会在那里找到Password
。