Generate best UUIDs for big data

I want to generate unique identifiers for big data and ended up with UUIDs. I referred to the Wikipedia article on UUIDs (https://en.wikipedia.org/wiki/Universally_unique_identifier), which mentions that collisions are possible and that "The identifier size and generation process need to be selected so as to make this sufficiently improbable in practice".

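"...these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, ..."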

"...Where this is not feasible, RFC4122 recommends using a namespace variant instead, such as a Type 5 UUID."

I plan to generate the UUIDs in Java and am referring to the API at https://docs.oracle.com/javase/8/docs/api/java/util/UUID.html

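Based on the Wikipedia article:

How can I set the identifier size and select the generation process as pointed out in Wikipedia?

What should I do to meet the "sufficient entropy" mentioned over there?

Can anyone simplify this and tell me what exactly I should do to generate the best UUIDs?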

How can I set the identifier size and select the generation process as pointed out in Wikipedia?

What identifier size? The size of a UUID is fixed by the standard at 128 bits, so there is nothing to set.
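To make the fixed size concrete, here is a minimal sketch that packs a `java.util.UUID` into its raw binary form. The class name `UuidSize` and the helper `toBytes` are just illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidSize {
    // Packs a UUID into its raw binary form: two 64-bit halves, 16 bytes in total.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println(uuid + " -> " + toBytes(uuid).length + " bytes");
    }
}
```

Storing the 16 raw bytes instead of the 36-character string form is a common way to save space when there are very many identifiers, but whether it is worth it depends on your storage layer.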

What should I do to meet the "sufficient entropy" mentioned over there?

Nothing special. Just use java.util.UUID. From the documentation of randomUUID:

The UUID is generated using a cryptographically strong pseudo random number generator.

If it is cryptographically strong, then it is good enough for you :)
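As a rough illustration of why that is enough, the sketch below generates a random UUID and estimates the collision probability with the usual birthday-bound approximation p ≈ n² / 2^(bits+1). It assumes all 122 non-fixed bits of a version 4 UUID are uniformly random; the class and method names are made up for this example.

```java
import java.util.UUID;

final class RandomUuidEntropy {
    // A version 4 UUID has 122 random bits; 6 of the 128 bits are fixed version/variant bits.
    static final int RANDOM_BITS = 122;

    // Birthday-bound approximation p ~ n^2 / 2^(bits + 1); only meaningful while p is small.
    static double approxCollisionProbability(double n) {
        return n * n / Math.pow(2, RANDOM_BITS + 1);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println(uuid + " version=" + uuid.version() + " variant=" + uuid.variant());
        System.out.println("p(collision) for 1e12 UUIDs ~ " + approxCollisionProbability(1e12));
    }
}
```

Under that assumption, even a trillion identifiers give a collision probability on the order of 1e-13, which is far below most other failure sources in a big-data pipeline.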

Hey, if you are in doubt, just generate a large number of UUIDs and check whether any two of them are the same :)
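Such a check could look roughly like this. It is only a smoke test (it cannot prove uniqueness, it just shows that duplicates do not turn up at this scale), and the sample size of one million is an arbitrary choice.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class UuidDuplicateCheck {
    // Generates `count` random UUIDs and returns how many of them had been seen before.
    static int countDuplicates(int count) {
        Set<UUID> seen = new HashSet<>(count * 2);
        int duplicates = 0;
        for (int i = 0; i < count; i++) {
            // Set.add returns false when the element is already present.
            if (!seen.add(UUID.randomUUID())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("duplicates: " + countDuplicates(1_000_000));
    }
}
```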

What exactly should I do to generate the best UUIDs?

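Well, if you don't know, use UUID version 1. But if you need unpredictable or random values, use UUID version 4.

Also keep in mind that if you need to build a database index over a lot of UUID values, it is better for those values to be roughly increasing, which gives better insert performance. UUID version 1 is better than version 4 in that case.

Edit: the java.util.UUID API does not seem to provide an easy way to generate version 1 UUIDs. Hopefully this helps: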

How to generate time based UUIDs?
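For orientation only, here is a hand-rolled sketch of how a version 1 UUID can be assembled from the field layout in RFC 4122. Everything named here (`TimeBasedUuidSketch`, `uuidTimestamp`, `fromParts`, `generate`) is hypothetical, the node is a random value with the multicast bit set rather than a real MAC address, and no state is kept between calls, so two UUIDs created in the same 100 ns tick are distinguished only by the random clock sequence and node. A maintained library is the safer choice for production.

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.UUID;

final class TimeBasedUuidSketch {
    // Number of 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Converts an Instant to the 60-bit UUID timestamp (100 ns units since the UUID epoch).
    static long uuidTimestamp(Instant instant) {
        long hundredNanos = instant.getEpochSecond() * 10_000_000L + instant.getNano() / 100;
        return (hundredNanos + UUID_EPOCH_OFFSET) & 0x0FFFFFFFFFFFFFFFL;
    }

    // Assembles a version 1 UUID from a 60-bit timestamp, 14-bit clock sequence and 48-bit node.
    static UUID fromParts(long timestamp, int clockSeq, long node) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHi = (timestamp >>> 48) & 0x0FFFL;
        // Layout of the high half: time_low | time_mid | version (1) | time_hi.
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        // Layout of the low half: variant bits (10) | clock sequence | node.
        long lsb = 0x8000000000000000L | ((clockSeq & 0x3FFFL) << 48) | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    // Assumption: random node and clock sequence per call, no monotonic counter.
    static UUID generate() {
        long node = (RANDOM.nextLong() & 0xFFFFFFFFFFFFL) | 0x010000000000L; // multicast bit set
        int clockSeq = RANDOM.nextInt(1 << 14);
        return fromParts(uuidTimestamp(Instant.now()), clockSeq, node);
    }

    public static void main(String[] args) {
        UUID uuid = generate();
        System.out.println(uuid + " version=" + uuid.version() + " timestamp=" + uuid.timestamp());
    }
}
```

The built-in accessors `timestamp()`, `clockSequence()` and `node()` on `java.util.UUID` only work for version 1 values, which makes them a convenient way to check that the fields were packed correctly.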

java

uuid

hadoop

bigdata