Raw Comparator vs WritableComparable

Question

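compare() and compareTo() are synonymous when it comes to sorting keys, but I would like to know: in the era of high-spec machines, is there still any need to think about when to use compare() and when to use compareTo()?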

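If there is any scenario in which compare(byte b1[], int s1, int l1, byte b2[], int s2, int l2) should be preferred over compareTo(Object key), please point out the areas, use cases, or types of problems where we really need to decide which one to use.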

Thanks!!

Answer 1

Using RawComparator:

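If you want to further reduce the time a MapReduce job takes, you have to use a RawComparator.

Intermediate key-value pairs are passed from the Mapper to the Reducer. Before they reach the Reducer, the shuffle and sort steps are performed.

Sorting becomes faster because a RawComparator compares the keys byte by byte, directly on their serialized form. Without a RawComparator, the intermediate keys must be fully deserialized before each comparison can be performed.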

Example:

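import org.apache.hadoop.io.WritableComparator;

// Assumes IndexPair.write() serializes exactly two ints (i, then j),
// each written with DataOutput.writeInt() as 4 big-endian bytes.
public class IndexPairComparator extends WritableComparator {

    protected IndexPairComparator() {
        super(IndexPair.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Compare the first int field directly from the serialized bytes.
        int i1 = readInt(b1, s1);
        int i2 = readInt(b2, s2);
        int comp = Integer.compare(i1, i2);
        if (comp != 0) {
            return comp;
        }

        // First fields are equal: compare the second int, stored 4 bytes later.
        int j1 = readInt(b1, s1 + 4);
        int j2 = readInt(b2, s2 + 4);
        return Integer.compare(j1, j2);
    }
}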

In the example above, we do not implement RawComparator directly. Instead, we extend WritableComparator, which implements RawComparator internally.
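The example decodes each 4-byte field with readInt() before comparing, instead of comparing the serialized bytes one by one. The minimal Hadoop-free sketch below, assuming int fields stored in the big-endian layout that DataOutput.writeInt() produces (reproduced here with java.nio.ByteBuffer), illustrates why: a plain unsigned byte-by-byte comparison, the kind of ordering WritableComparator.compareBytes() provides, puts -1 (ff ff ff ff) after 1 (00 00 00 01). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hadoop-free sketch (hypothetical class): why a raw comparator for int
// fields decodes the ints instead of comparing serialized bytes directly.
public class SignedIntRawCompare {

    // Big-endian, 4 bytes: the same layout DataOutput.writeInt() produces.
    static byte[] serializeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Decode a big-endian int starting at 'start'
    // (the same idea as WritableComparator.readInt()).
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
                | ((b[start + 1] & 0xff) << 16)
                | ((b[start + 2] & 0xff) << 8)
                | (b[start + 3] & 0xff);
    }

    // Field-aware raw comparison: decode, then compare as signed ints.
    static int decodedCompare(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Naive raw comparison: unsigned byte-by-byte (lexicographic) order.
    static int lexicographicCompare(byte[] b1, int s1, int l1,
                                    byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int k = 0; k < n; k++) {
            int x = b1[s1 + k] & 0xff;
            int y = b2[s2 + k] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] minusOne = serializeInt(-1); // ff ff ff ff
        byte[] one = serializeInt(1);       // 00 00 00 01
        System.out.println(Integer.signum(decodedCompare(minusOne, 0, one, 0)));
        System.out.println(Integer.signum(lexicographicCompare(minusOne, 0, 4, one, 0, 4)));
    }
}
```

Running main prints -1 for the decoded comparison and 1 for the lexicographic one, so a raw comparator has to know the field layout (and signedness) of the key to reproduce the same order as compareTo().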

Have a look at the article by Jee Vang.

Default implementation of the raw compare() in WritableComparator: it simply deserializes both keys and then compares them:

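// Excerpt from org.apache.hadoop.io.WritableComparator.
// buffer (a DataInputBuffer), key1 and key2 are fields of the comparator.
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
        buffer.reset(b1, s1, l1);                   // parse key1
        key1.readFields(buffer);

        buffer.reset(b2, s2, l2);                   // parse key2
        key2.readFields(buffer);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }

    // compare(WritableComparable, WritableComparable) returns key1.compareTo(key2) by default
    return compare(key1, key2);
}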
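The default path above pays for a full deserialization of both keys on every comparison and then falls back to the object-level compare, which by default calls compareTo(). To make the trade-off concrete without a Hadoop dependency, here is a minimal sketch assuming a hypothetical two-int IndexPair key: defaultCompare mirrors the deserialize-then-compareTo() path (java.io streams stand in for Hadoop's DataInputBuffer, and two reused instances stand in for key1/key2), while rawCompare mirrors the IndexPairComparator from the example. Class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class DefaultVsRawCompare {

    // Hypothetical stand-in for a Writable key: write()/readFields()
    // for (de)serialization plus compareTo() for the ordering.
    static final class IndexPair implements Comparable<IndexPair> {
        int i, j;

        void write(DataOutput out) throws IOException {
            out.writeInt(i);
            out.writeInt(j);
        }

        void readFields(DataInput in) throws IOException {
            i = in.readInt();
            j = in.readInt();
        }

        @Override
        public int compareTo(IndexPair o) {
            int c = Integer.compare(i, o.i);
            return c != 0 ? c : Integer.compare(j, o.j);
        }
    }

    // Two reused instances, like key1/key2 in WritableComparator (not thread-safe).
    private final IndexPair key1 = new IndexPair();
    private final IndexPair key2 = new IndexPair();

    // Default path: deserialize both keys, then delegate to compareTo().
    int defaultCompare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            key1.readFields(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
            key2.readFields(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key1.compareTo(key2);
    }

    // Raw path: read the two big-endian ints straight from the bytes.
    static int rawCompare(byte[] b1, int s1, byte[] b2, int s2) {
        ByteBuffer x = ByteBuffer.wrap(b1), y = ByteBuffer.wrap(b2);
        int c = Integer.compare(x.getInt(s1), y.getInt(s2));
        return c != 0 ? c : Integer.compare(x.getInt(s1 + 4), y.getInt(s2 + 4));
    }

    // Serialize a pair exactly as IndexPair.write() would.
    static byte[] serialize(int i, int j) {
        IndexPair p = new IndexPair();
        p.i = i;
        p.j = j;
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            p.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = serialize(3, 7);
        byte[] b = serialize(3, -2);
        DefaultVsRawCompare cmp = new DefaultVsRawCompare();
        System.out.println(Integer.signum(cmp.defaultCompare(a, 0, a.length, b, 0, b.length)));
        System.out.println(Integer.signum(rawCompare(a, 0, b, 0)));
    }
}
```

Both calls print 1: the two paths agree on the order, and the raw one simply skips building objects. In this sense compareTo() defines the ordering, and a raw compare() is an optimization that has to reproduce that same ordering on the serialized bytes.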

Have a look at the source.

