Using MongoDB to store immutable data?
We have been looking into options for storing and reading back large amounts of immutable data (events), and I would like some feedback on whether MongoDB is a good fit.
Requirements:
- We need to store roughly 10 events per second (though the rate will grow). Each event is small, about 1 KB. Is it OK to store all of these events in a single collection?
- A very important requirement is that we must be able to replay all events in order. I have read here that MongoDB has a 32 MB limit when sorting documents with a cursor. For us, reading all the data back in insertion order would be fine (like a table scan), so an explicit sort may not even be needed. Are cursors the way to go? Can they satisfy this requirement? (See the sketch after this list.)
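A minimal sketch of the replay side, assuming a pymongo client and database/collection names of my own choosing ("eventstore"/"events", not from the question). Sorting on the indexed `_id` field lets MongoDB use the index instead of an in-memory sort, so the in-memory sort limit mentioned above should not apply; with default ObjectId `_id` values this roughly follows insertion order (ObjectIds embed a timestamp, so ordering is only approximate within the same second).

```python
from pymongo import ASCENDING, MongoClient

# Names below ("eventstore", "events") are illustrative, not from the question.
client = MongoClient("mongodb://localhost:27017")
events = client["eventstore"]["events"]

def replay_all(handle):
    """Stream every event back in (approximate) insertion order."""
    # Sorting on _id uses the built-in _id index, so no in-memory sort
    # is needed; the cursor fetches documents in batches as it iterates.
    cursor = events.find(sort=[("_id", ASCENDING)], batch_size=1000)
    for event in cursor:
        handle(event)

if __name__ == "__main__":
    replay_all(print)
```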
If MongoDB is a good fit for this, are there any configuration options or settings that can be tuned to improve performance or reliability for immutable data?
This is very similar to storing logs: lots of writes, and then reading the data back in order. Luckily, the Mongo site has a recipe for exactly that:
https://docs.mongodb.org/ecosystem/use-cases/storing-log-data/
As for the immutability of the data, that is not a problem for MongoDB.
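To illustrate the write side of that pattern, here is a minimal sketch, assuming the same hypothetical "events" collection as above. The write concern shown (majority acknowledgement with journaling) is one of the knobs the question asks about: it trades some write latency for durability, and the defaults can be kept if raw insert throughput matters more.

```python
from datetime import datetime, timezone

from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client["eventstore"]

# Acknowledge writes only after a majority of replica-set members have
# journaled them; relax this if insert throughput is the priority.
events = db.get_collection(
    "events",
    write_concern=WriteConcern(w="majority", j=True),
)

def append_event(event_type, payload):
    """Append one immutable event; documents are never updated afterwards."""
    doc = {
        "type": event_type,
        "payload": payload,
        "created_at": datetime.now(timezone.utc),
    }
    events.insert_one(doc)

append_event("user_registered", {"user_id": 42})
```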
Edit 2022-02-19:
Replacement link:
https://web.archive.org/web/20150917095005/docs.mongodb.org/ecosystem/use-cases/storing-log-data/
Excerpt from the page:
This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data.
Problem: Servers generate a large number of events (i.e., logging) that contain useful information about their operation, including errors, warnings, and user behavior. By default, most servers store these data in plain-text log files on their local file systems.
While plain-text logs are accessible and human-readable, they are difficult to use, reference, and analyze without holistic systems for aggregating and storing these data.
Solution: The solution described below assumes that each server generates events and also consumes event data, and that each server can access the MongoDB instance. Furthermore, this design assumes that the query rate for this logging data is substantially lower than common for logging applications with a high-bandwidth event stream.
NOTE: This case assumes that you're using a standard uncapped collection for this event data, unless otherwise noted. See the section on capped collections.
Schema Design: The schema for storing log data in MongoDB depends on the format of the event data that you're storing. For a simple example, consider standard request logs in the combined format from the Apache HTTP Server. A line from these logs may resemble the following:
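The excerpt cuts off before the page's actual example, but for immutable, insert-in-order event data the capped collection it mentions is worth a look: a capped collection has a fixed size, preserves insertion order, and supports tailable cursors. A minimal sketch, assuming pymongo and field names of my own choosing (they are not the cookbook's exact schema):

```python
from pymongo import CursorType, MongoClient
from pymongo.errors import CollectionInvalid

client = MongoClient("mongodb://localhost:27017")
db = client["eventstore"]

# Capped collections keep documents in insertion order and never grow past
# the configured size (here ~1 GB); the oldest documents are overwritten,
# so only use one if losing the oldest events is acceptable.
try:
    db.create_collection("events_capped", capped=True, size=1024 * 1024 * 1024)
except CollectionInvalid:
    pass  # collection already exists

capped = db["events_capped"]

# One Apache combined-format log line stored as a document might look like
# this (field names are illustrative, not the cookbook's exact schema):
capped.insert_one({
    "host": "127.0.0.1",
    "time": "2015-09-17T09:50:05Z",
    "request": "GET /index.html HTTP/1.1",
    "status": 200,
    "response_size": 2326,
    "referer": "-",
    "user_agent": "curl/7.43.0",
})

# A tailable cursor keeps waiting for new documents, which makes capped
# collections convenient for "replay, then follow" consumers.
cursor = capped.find(cursor_type=CursorType.TAILABLE_AWAIT)
```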