让二进制搜索在 Python 中工作

Getting a binary search to work in Python

我正在尝试让二进制搜索在 Python 中工作。我有一个庞大的、经过排序的密码列表。计划是从用户那里获取密码输入并查看它是否在列表中。由于列表的大小,我决定实施二进制搜索。

这是我的代码:

Found = False
Password = user_input("Enter a password: ")


with io.open('final.txt', encoding='latin-1') as myfile:

    data = myfile.readlines()
    low = 0
    high = (int(len(data))+1)
    while (low < high) and not Found:

        mid = int((low+high)/2)

        if data[mid] == Password:
            Found = True
            break
        elif Password < str(data[mid]):
            high = mid - 1
        elif Password > str(data[mid]):
            low = mid + 1

我猜是因为字符串比较?有任何想法吗?二进制搜索永远不会 returns 正确,即使我明确搜索我知道列表中的内容也是如此。

我用这段代码对密码列表进行了排序。

import io

with io.open('result.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()

def partition(data, start, end):
    pivot = data[end]                          # Partition around the last value
    bottom = start-1                           # Start outside the area to be partitioned
    top = end                                  # Ditto

    done = 0
    while not done:                            # Until all elements are partitioned...

        while not done:                        # Until we find an out of place element...
            bottom = bottom+1                  # ... move the bottom up.

            if bottom == top:                  # If we hit the top...
                done = 1                       # ... we are done.
                break

            if data[bottom] > pivot:           # Is the bottom out of place?
                data[top] = data[bottom]       # Then put it at the top...
                break                          # ... and start searching from the top.

        while not done:                        # Until we find an out of place element...
            top = top-1                        # ... move the top down.

            if top == bottom:                  # If we hit the bottom...
                done = 1                       # ... we are done.
                break

            if data[top] < pivot:              # Is the top out of place?
                data[bottom] = data[top]       # Then put it at the bottom...
                break                          # ...and start searching from the bottom.

    data[top] = pivot                          # Put the pivot in its place.
    return top                                 # Return the split point


def quicksort(data, start, end):
    if start < end:                            # If there are two or more elements...
        split = partition(data, start, end)    # ... partition the sublist...
        quicksort(data, start, split-1)
        quicksort(data, split+1, end)


quicksort(data, 0, (int(len(data))-1))

with io.open('final.txt', 'w', encoding='latin-1') as f:
    for s in data:
        f.write(s)

排序后的列表看起来像这样:空格,然后是符号,然后是数字,然后是大写字母(按字母顺序排序),然后是普通字母(按字母顺序排序)。

不要自己写二分查找,要正确使用它们有点棘手。请改用 bisect 模块。

from bisect import bisect_left
def binary_search(lst, el):
   # returns lower bound of key `el` in list `lst`
   index = bisect_left(lst, el)
   # check that: (1) the lower bound is not at the end of the list and
   # (2) the element at the index matches `el`
   return index < len(lst) and lst[index] == el

用法:

test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False

如果您只想搜索列表中的密码,那么在您的代码中

data = myfile.readlines()

您已经将所有密码记入内存。 所以如果你只想检查给定的密码是否存在于你的列表中,你可以直接使用

进行检查
if Password in data:
     print "yes it is present in the list"
else:
    print "Not present in the list"

希望对您有所帮助。

您可能在调用readlines后每个密码的末尾都有一个换行符,请使用rstrip()将其删除

    Found = False
    Password = user_input("Enter a password: ")


    with io.open('final.txt', encoding='latin-1') as myfile:

        data = myfile.readlines()
        low = 0
        high = len(data)-1   #no need to cast to int, should be len()-1
        while (low <= high) and not Found:  #less than or equal to

            mid = int((low+high)/2)

            if data[mid].rstrip() == Password:   #Remove newline character before compare
                Found = True
                break
            elif Password < str(data[mid]):
                high = mid - 1
            elif Password > str(data[mid]):
                low = mid + 1

这是二分查找的例子

def binarySearch(alist, item):
        first = 0
        last = len(alist)-1
        found = False

        while first<=last and not found:
            midpoint = (first + last)//2
            if alist[midpoint] == item:
                found = True
            else:
                if item < alist[midpoint]:
                    last = midpoint-1
                else:
                    first = midpoint+1

        return found

mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))

mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))

我当时就知道了

False
False
True
True

是的,正如 Eamon 指出的那样,我确信在调用 readlines 后您需要在每个密码的末尾换行。

有两个问题。

  1. 你的二进制搜索算法是错误的。

重复条件应该是

while (low <= high)

或者您找不到第一个和最后一个元素。

  1. readlines() 将读取 \n 但 user_input() 不会读取 .

这导致 `Password` == `Password\n' 永远为假。

由于您设置 lowhigh 的方式,您正在跳过部分列表。正因为如此,low == high发生在更新之后和检查之前,导致你过早地跳出循环。

有两个简单的解决方案:

要么..

  • 设置high = midlow = mid而不是mid -/+ 1,触发额外的迭代,

或..

  • 循环后检查是否high == low and data[low] == Password 终止,因为您可能仍会在那里找到 Password