Array-based binary search out of bounds

Question

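I have an array-based binary search tree that needs to hold almost 2000 lines of information, read from a text file through file IO.

However, I keep getting java.lang.ArrayIndexOutOfBoundsException: 3012.

I tried making the array as large as possible without exceeding the limits of the Java VM, but even that was not enough to store the file.

I tested with a smaller file and it works fine.

A sample of the text file is available at: https://www.asxhistoricaldata.com/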

import java.io.Serializable;
public class ArrayBinary implements Serializable
{
    private class Entry implements Serializable
    {
        private int key;
        private Object element;
        public Entry (int k, Object e)
        {
            this.key = k;
            this.element = e;
        }
    }
    private Entry [] tree;
    private int size;
    private int height;
    private int left;
    private int right;
    private static final int MAXCAPACITY =  2000;
    public ArrayBinary()
    {
        size = 0;
        height = 1;
        left = 0;
        right = 0;
        tree = new Entry[MAXCAPACITY];
        for (int i = 0; i < MAXCAPACITY; i++)
        {
            tree[i] = null;
        }
    }
    public void insert(int key, Object value)
    {
        size++;
        insert(key, value, 0);
    }
    public void insert (int key, Object value, int index)
    {
        boolean added = false;
        //System.out.println(key);
        if (tree[index] == null)
        {
            Entry node = new Entry(key, value);
            tree[index] = node;
            added = true;
        }
        else if (key < tree[index].key)
        {
            insert(key, value, index * 2 + 1);
        }
        else if (key == tree[index].key)
        {
            insert(key, value, index * 2 + 2);
        }
        else
        {
            insert(key, value, index * 2 + 2);
        }        
    }
}

This is how the file is read into the tree (ignore the other two trees).

import java.io.*;
import java.util.*;
public class TreeFileIO
{
    private BTree4 tempBt;
    private BinarySearchTree tempBst;
    private ArrayBinary tempArraybst;
    public Object read(String fileName, int type, int degree)
    {
        switch(type)
        {
            case 1:
                //degree is only needed for b-tree
                tempBt = new BTree4(degree);
                break;
            case 2:
                tempBst = new BinarySearchTree(); 
                break;
            case 3:
                tempArraybst = new ArrayBinary();
                break;
        }
        Scanner sc = new Scanner(System.in);
        FileInputStream fileStrm = null;
        String line;
        int key;
        try
        {
            //open the file
            fileStrm = new FileInputStream (fileName + ".txt");
            InputStreamReader rdr = new InputStreamReader(fileStrm);
            BufferedReader bufRdr = new BufferedReader (rdr);
            line = bufRdr.readLine();
            while (line != null)
            {
                switch(type)
                {
                    case 1:
                        tempBt.insert(getKey(line), line);
                        break;
                    case 2:
                        tempBst.insert(getKey(line), line);
                        break;
                    case 3:
                        tempArraybst.insert(getKey(line), line);
                        break;
                }
                line = bufRdr.readLine();
            }
            //Closes the file once we're done
            fileStrm.close();
        }
        catch (IOException e)
        {
            if (fileStrm != null)
            {
                try 
                {
                    fileStrm.close();
                }
                catch (IOException ex2)
                {
                }
            }
            System.out.println("Error");
        }
        //Now send this tree to TreeProfiler for use
        switch(type)
        {
            case 1:
                return tempBt;                 
            case 2:
                return tempBst;    
            case 3:
                return tempArraybst;
        }
        return null;
    }
    //create a key using value from each line to avoid degenerate
    public int getKey(String csvRow)
    {
        StringTokenizer strTok = new StringTokenizer(csvRow, ",");
        int key = 0;
            try 
            {
                strTok.nextToken();
                strTok.nextToken();
                strTok.nextToken();
                strTok.nextToken();
                strTok.nextToken();
                strTok.nextToken();
                //Skip to last value to use as a key
               return key = Integer.parseInt(strTok.nextToken());
            }    
            catch (Exception e) 
            {
                System.out.println(e);
                throw new IllegalStateException("CSV row had invalid format");
            }
    }
}

I expect the file to be read without any array-out-of-bounds error, and the tree to hold the entire file of 2000 ints.

Answer 1

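The main problem is that the data you are using appears to be ordered.

Filling a tree by iterating over an ordered sequence of values makes the tree degenerate into a list, which is why you need such huge indices: every new item is added on the right side of the tree, so the index roughly doubles with each insertion.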
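
To make the doubling concrete, here is a minimal sketch that follows the right-child formula from the question's insert method (index * 2 + 2) down a purely one-sided path. The class and method names are hypothetical; only the formula and the capacity of 2000 come from the question.

```java
// Hypothetical demo class; only the formula index * 2 + 2 and the
// capacity of 2000 are taken from the question.
class RightSpineDemo {
    // Array index reached after `depth` consecutive right-child steps from the root.
    static long rightSpineIndex(int depth) {
        long index = 0;
        for (int i = 0; i < depth; i++) {
            index = index * 2 + 2; // same formula insert() uses for the right child
        }
        return index;
    }

    // Number of strictly ascending keys that fit before the index leaves the array.
    static int sortedKeysThatFit(int capacity) {
        int count = 0;
        while (rightSpineIndex(count) < capacity) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 11; depth++) {
            System.out.println("depth " + depth + " -> index " + rightSpineIndex(depth));
        }
        System.out.println("ascending keys that fit in 2000 slots: "
                + sortedKeysThatFit(2000));
    }
}
```

With strictly ascending keys, the node at depth 10 would need index 2046, so only the first 10 keys fit into 2000 slots. The reported index 3012 lies at depth 11 of the implicit tree, which is consistent with data that is mostly, though not perfectly, ordered.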

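The most efficient fix is to fill the tree by taking the middle element of the data set first and then recursively repeating the process for the two remaining halves, the elements below it and the elements above it. That way the tree stays balanced and the array is filled densely.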
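
The middle-element approach can be sketched as follows. It assumes the keys have already been read into memory, sorted, and are distinct, none of which the question's streaming reader does. MedianFirstOrder, order and maxIndex are hypothetical names, and maxIndex only simulates the question's index rules with a map instead of using the real ArrayBinary class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the question's code.
class MedianFirstOrder {
    // Returns the keys of a sorted array in "middle element first" order.
    static List<Integer> order(int[] sorted) {
        List<Integer> out = new ArrayList<>(sorted.length);
        fill(sorted, 0, sorted.length - 1, out);
        return out;
    }

    private static void fill(int[] sorted, int lo, int hi, List<Integer> out) {
        if (lo > hi) {
            return;
        }
        int mid = lo + (hi - lo) / 2;
        out.add(sorted[mid]);            // middle element goes in first
        fill(sorted, lo, mid - 1, out);  // then the elements below it
        fill(sorted, mid + 1, hi, out);  // then the elements above it
    }

    // Simulates the question's index rules (left: 2i + 1, right: 2i + 2)
    // and reports the largest array index the given insertion order needs.
    static long maxIndex(List<Integer> keys) {
        Map<Long, Integer> tree = new HashMap<>();
        long max = 0;
        for (int key : keys) {
            long index = 0;
            while (tree.containsKey(index)) {
                index = key < tree.get(index) ? index * 2 + 1 : index * 2 + 2;
            }
            tree.put(index, key);
            max = Math.max(max, index);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] sorted = new int[2000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i; // stand-in for 2000 distinct, sorted keys
        }
        System.out.println("largest index needed: " + maxIndex(order(sorted)));
    }
}
```

For 2000 distinct keys the resulting tree has 11 levels, so the largest index can reach 2046. A capacity of 2047 is therefore a safer choice than 2000 even with this insertion order.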

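Another option is to take elements from the data set in random order. In general you will probably need more capacity than the 2000 you provided, but it may well be feasible.

A final alternative is to keep the code as it is and shuffle the data.

Since you read the CSV as a stream, the first two solutions may be too complicated, so the best solution is to shuffle the lines of the text file and increase the capacity of the array. You can find various text-file line shufflers online.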
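
If you would rather shuffle inside the program than with an external tool, a minimal sketch could look like this. It assumes the whole file fits in memory, which is reasonable for about 2000 lines. ShuffledRows, shuffled and readShuffled are hypothetical names; Files.readAllLines and Collections.shuffle are standard library calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the question's code.
class ShuffledRows {
    // Returns a shuffled copy of the rows; the input list is left untouched.
    // A fixed seed makes the order reproducible between runs.
    static List<String> shuffled(List<String> rows, long seed) {
        List<String> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Reads every line of the file into memory, then shuffles the lines.
    static List<String> readShuffled(Path file, long seed) throws IOException {
        return shuffled(Files.readAllLines(file), seed);
    }

    public static void main(String[] args) {
        // Placeholder rows standing in for the CSV lines of the real file.
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        for (String row : shuffled(rows, 42L)) {
            System.out.println(row); // in the real reader: tree.insert(getKey(row), row)
        }
    }
}
```

A random order makes a long one-sided path unlikely, not impossible, and the required index still doubles with every level. Checking the index against the array length before writing to it remains prudent.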


java

arrays

file-io

binary-search-tree