对整数数组进行排序的非常快速的方法？

Question

我带着我刚写的解决方案来到这里，但我绝对找不到更好的方法来尽可能快地对我的数组进行排序。

我实际上需要算法在不到 1 秒的时间内在包含 100'000 个整数的数组上给出答案。

代码如下：

read N
for (( i=0; i<N; i++ )); do
    read arr[i]
done

sorted=($(printf '%s\n' "${arr[@]}"|sort -n))

min=$(( sorted[1]-sorted[0] ))

for (( i=1; i<N-1; i++ )); do
    diff=$(( sorted[i+1] - sorted[i] ))
    if [ "$diff" -lt "$min" ]; then
        min=$diff
    fi
done

echo $min

N 是元素的数量，如我所说，在我们的例子中是 100'000。

问题是，在我的数组排序后，我需要遍历它并计算两个整数之间的最小距离：

例如，对于(3, 5, 8, 9)，最小距离为1（介于8和9之间）。

我是Bash的新手，所以我什至不知道这是否是一种有效的方法；

实际上，速度的提升可能来自代码的第二部分，而不是排序部分...

有什么想法吗？

提前致谢，

Answer 1

数字范围<0,1000000)足够小

因为你的数字不重复（最小距离>=40）
然后你只需要每个值一点（如果它存在或不存在）
因此 Byte/value 最多需要 1MB，bit/value
(因为K,M都是以1024为基础所以你的储备够大)

所以桶排序是可能的：

为每个可能的值创建标志数组
- bool flag[1000000];
清除标志O(M)
- for (int i=0;i<1000000;i++) flag[i]=0;
通过处理数组计算标志 arr[N] ... O(N)
- for (int i=0;i<N;i++) flag[arr[i]]=1;
- 对于重复，只需增加标志[]，除非它太大而无法避免溢出
- 如果 flag[arr[i]] 已经是 1 那么 return distance = 0
重构数组O(M)
- for (int i=0,j=0;i<1000000;i++) if (flag[i]) { arr[j]=i; j++; } N=j;
现在计算距离O(N)
- 您可以将这些步骤混合在一起，这样就不需要内存中的 arr[N]
- 而是只计算 flag[] 中的后续零 ...

[备注]

M=1000000
N<=M
如果 N 比 M 小得多，那么这可能不是最快的方法...

Answer 2

鉴于 sorting algorithm complexity I would use the Radix Sort (here in Python 上的这个很棒的页面，我没有找到 Bash 实现，但我仍在寻找）：

#python2.6 <
from math import log

def getDigit(num, base, digit_num):
    # pulls the selected digit
    return (num // base ** digit_num) % base  

def makeBlanks(size):
    # create a list of empty lists to hold the split by digit
    return [ [] for i in range(size) ]  

def split(a_list, base, digit_num):
    buckets = makeBlanks(base)
    for num in a_list:
        # append the number to the list selected by the digit
        buckets[getDigit(num, base, digit_num)].append(num)  
    return buckets

# concatenate the lists back in order for the next step
def merge(a_list):
    new_list = []
    for sublist in a_list:
       new_list.extend(sublist)
    return new_list

def maxAbs(a_list):
    # largest abs value element of a list
    return max(abs(num) for num in a_list)

def split_by_sign(a_list):
    # splits values by sign - negative values go to the first bucket,
    # non-negative ones into the second
    buckets = [[], []]
    for num in a_list:
        if num < 0:
            buckets[0].append(num)
        else:
            buckets[1].append(num)
    return buckets

def radixSort(a_list, base):
    # there are as many passes as there are digits in the longest number
    passes = int(round(log(maxAbs(a_list), base)) + 1) 
    new_list = list(a_list)
    for digit_num in range(passes):
        new_list = merge(split(new_list, base, digit_num))
    return merge(split_by_sign(new_list))

Answer 3

我终于想出了一个简单而优雅的解决方案：

read N

for (( i=0; i<N; i++ )); do
    read tab[i]
done

printf "%i\n" ${tab[@]} | sort -n | awk 'BEGIN{dmin=1000000;} (NR>1){d=-prec; if (d<dmin) dmin=d;} {prec=;} END{print dmin;}'

就是这样。 :) 感谢大家花时间帮助我！ ;)

对整数数组进行排序的非常快速的方法？

Very fast way to sort array of integers?

arrays

sorting

algorithm

bash

integer