Knex with PostgreSQL select query: extreme performance degradation under multiple parallel requests
Summary
I'm developing a game (of my dreams) and my backend stack is Node.js with PostgreSQL (9.6) via Knex. All player data is stored there, and I need to request it frequently.
One of the requests needs to issue 10 simple selects to pull the data, and this is where the problem starts: the queries are very fast (~1 ms) if the server handles only 1 request at a time. But if the server handles many requests in parallel (100-400), query execution time degrades dramatically (up to several seconds per query).
Details
To be more objective, I'll describe the goal of the request, the select queries, and the results I got.
About the system
The Node code and Postgres each run on a Digital Ocean 4 CPU / 8 GB droplet (2 separate droplets, identical configuration).
About the request
It performs some game actions, for which it selects data for 2 players from the DB.
DDL
Player data is represented by 5 tables:
CREATE TABLE public.player_profile(
id integer NOT NULL DEFAULT nextval('player_profile_id_seq'::regclass),
public_data integer NOT NULL,
private_data integer NOT NULL,
current_active_deck_num smallint NOT NULL DEFAULT '0'::smallint,
created_at bigint NOT NULL DEFAULT '0'::bigint,
CONSTRAINT player_profile_pkey PRIMARY KEY (id),
CONSTRAINT player_profile_private_data_foreign FOREIGN KEY (private_data)
REFERENCES public.profile_private_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT player_profile_public_data_foreign FOREIGN KEY (public_data)
REFERENCES public.profile_public_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_character_data(
id integer NOT NULL DEFAULT nextval('player_character_data_id_seq'::regclass),
owner_player integer NOT NULL,
character_id integer NOT NULL,
experience_counter integer NOT NULL,
level_counter integer NOT NULL,
character_name character varying(255) COLLATE pg_catalog."default" NOT NULL,
created_at bigint NOT NULL DEFAULT '0'::bigint,
CONSTRAINT player_character_data_pkey PRIMARY KEY (id),
CONSTRAINT player_character_data_owner_player_foreign FOREIGN KEY (owner_player)
REFERENCES public.player_profile (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_cards(
id integer NOT NULL DEFAULT nextval('player_cards_id_seq'::regclass),
card_id integer NOT NULL,
owner_player integer NOT NULL,
card_level integer NOT NULL,
first_deck boolean NOT NULL,
consumables integer NOT NULL,
second_deck boolean NOT NULL DEFAULT false,
third_deck boolean NOT NULL DEFAULT false,
quality character varying(10) COLLATE pg_catalog."default" NOT NULL DEFAULT 'none'::character varying,
CONSTRAINT player_cards_pkey PRIMARY KEY (id),
CONSTRAINT player_cards_owner_player_foreign FOREIGN KEY (owner_player)
REFERENCES public.player_profile (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_character_equipment(
id integer NOT NULL DEFAULT nextval('player_character_equipment_id_seq'::regclass),
owner_character integer NOT NULL,
item_id integer NOT NULL,
item_level integer NOT NULL,
item_type character varying(20) COLLATE pg_catalog."default" NOT NULL,
is_equipped boolean NOT NULL,
slot_num integer,
CONSTRAINT player_character_equipment_pkey PRIMARY KEY (id),
CONSTRAINT player_character_equipment_owner_character_foreign FOREIGN KEY (owner_character)
REFERENCES public.player_character_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_character_runes(
id integer NOT NULL DEFAULT nextval('player_character_runes_id_seq'::regclass),
owner_character integer NOT NULL,
item_id integer NOT NULL,
slot_num integer,
decay_start_timestamp bigint,
CONSTRAINT player_character_runes_pkey PRIMARY KEY (id),
CONSTRAINT player_character_runes_owner_character_foreign FOREIGN KEY (owner_character)
REFERENCES public.player_character_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
And the indexes:
knex.raw('create index "player_cards_owner_player_first_deck_index" on "player_cards"("owner_player") WHERE first_deck = TRUE');
knex.raw('create index "player_cards_owner_player_second_deck_index" on "player_cards"("owner_player") WHERE second_deck = TRUE');
knex.raw('create index "player_cards_owner_player_third_deck_index" on "player_cards"("owner_player") WHERE third_deck = TRUE');
knex.raw('create index "player_character_equipment_owner_character_is_equipped_index" on "player_character_equipment" ("owner_character") WHERE is_equipped = TRUE');
knex.raw('create index "player_character_runes_owner_character_slot_num_not_null_index" on "player_character_runes" ("owner_character") WHERE slot_num IS NOT NULL');
Code
First query
async.parallel([
cb => tx('player_character_data')
.select('character_id', 'id')
.where('owner_player', playerId)
.limit(1)
.asCallback(cb),
cb => tx('player_character_data')
.select('character_id', 'id')
.where('owner_player', enemyId)
.limit(1)
.asCallback(cb)
], callbackFn);
Second query
async.parallel([
cb => tx('player_profile')
.select('current_active_deck_num')
.where('id', playerId)
.asCallback(cb),
cb => tx('player_profile')
.select('current_active_deck_num')
.where('id', enemyId)
.asCallback(cb)
], callbackFn);
Third query
playerQ = { first_deck: true }
enemyQ = { first_deck: true }
MAX_CARDS_IN_DECK = 5
async.parallel([
cb => tx('player_cards')
.select('card_id', 'card_level')
.where('owner_player', playerId)
.andWhere(playerQ)
.limit(MAX_CARDS_IN_DECK)
.asCallback(cb),
cb => tx('player_cards')
.select('card_id', 'card_level')
.where('owner_player', enemyId)
.andWhere(enemyQ)
.limit(MAX_CARDS_IN_DECK)
.asCallback(cb)
], callbackFn);
Fourth query
MAX_EQUIPPED_ITEMS = 3
async.parallel([
cb => tx('player_character_equipment')
.select('item_id', 'item_level')
.where('owner_character', playerCharacterUniqueId)
.andWhere('is_equipped', true)
.limit(MAX_EQUIPPED_ITEMS)
.asCallback(cb),
cb => tx('player_character_equipment')
.select('item_id', 'item_level')
.where('owner_character', enemyCharacterUniqueId)
.andWhere('is_equipped', true)
.limit(MAX_EQUIPPED_ITEMS)
.asCallback(cb)
], callbackFn);
Fifth query
runeSlotsMax = 3
async.parallel([
cb => tx('player_character_runes')
.select('item_id', 'decay_start_timestamp')
.where('owner_character', playerCharacterUniqueId)
.whereNotNull('slot_num')
.limit(runeSlotsMax)
.asCallback(cb),
cb => tx('player_character_runes')
.select('item_id', 'decay_start_timestamp')
.where('owner_character', enemyCharacterUniqueId)
.whereNotNull('slot_num')
.limit(runeSlotsMax)
.asCallback(cb)
], callbackFn);
EXPLAIN (ANALYZE)
Index scans only, with <1 ms planning and execution times. I can post them if needed (omitted to save space).
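For reproducing those plans, a query shaped like the third select can be explained directly in psql (the literal `1` is an illustrative player id):

```sql
EXPLAIN (ANALYZE)
SELECT card_id, card_level
FROM player_cards
WHERE owner_player = 1 AND first_deck = TRUE
LIMIT 5;
```

With the partial index above, this should report an index scan on `player_cards_owner_player_first_deck_index`.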
The timings themselves
(total is the number of requests; min/max/avg/median are response times in ms)
- 4 concurrent requests:
{ "total": 300, "avg": 1.81, "median": 2, "min": 1, "max": 6 }
- 400 concurrent requests:
first select: { "total": 300, "avg": 209.57666666666665, "median": 176, "min": 9, "max": 1683 }
last select: { "total": 300, "avg": 2105.9, "median": 2005, "min": 1563, "max": 4074 }
I tried logging slow queries with execution time over 100 ms — there were none. I also tried increasing the connection pool size to match the number of parallel requests — nothing changed.
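The slow-query logging mentioned above maps to Postgres's `log_min_duration_statement` setting; a minimal postgresql.conf sketch with a 100 ms threshold:

```
log_min_duration_statement = 100    # log every statement that runs longer than 100 ms
```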
I can see three potential problems here:
- 400 concurrent requests is actually quite a lot, and your machine specs are nothing to get excited about. Maybe that's my MSSQL background talking, but I'd guess this is a case where you may need to beef up the hardware.
- Communication between the two servers should be very fast, but it could account for some of the latency you're seeing. A single powerful server might be a better solution.
- I assume you have a reasonable amount of data (400 concurrent connections implies a lot to store). It might be useful to post some of the actual generated SQL. A lot depends on the SQL Knex produces, and there may be optimizations available. Indexes come to mind, but one would need to see the SQL to be sure.
Your tests don't seem to include network latency from the clients, so that may be another issue you haven't accounted for yet.
I found the solution fairly quickly but forgot to reply here (was busy, sorry).
There was no magic behind the slow queries, just the nature of Node's event loop:
- all similar requests were issued in parallel;
- I had a block of code with a slow execution time (~150-200 ms);
- with ~800 parallel requests, a 150 ms code block turns into ~10000 ms of event-loop lag;
- all you see is requests that appear slow, but it's only the callbacks being delayed, not the database;
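The effect in the bullets above can be sketched in a few lines of plain Node (the 50 ms figure is illustrative, not a measurement from this server): synchronous blocks run back to back on the event loop, so a result that is already available still waits behind all of them.

```javascript
// Busy-wait that holds the event loop, standing in for a slow synchronous block.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // nothing else can run during this loop
}

// Four handlers run back to back, each blocking ~50 ms — exactly how the
// event loop drains a queue. A fifth result that is ready "now" still waits.
const start = Date.now();
for (let i = 0; i < 4; i++) blockFor(50);

const lag = Date.now() - start; // ≈ 200 ms of pure event-loop lag
console.log(`observed lag: ${lag} ms`);
```

Scaled up to hundreds of parallel requests queued behind a 150-200 ms block, this is how ~1 ms queries end up looking like multi-second ones.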
Conclusion: use pgBadger to detect slow queries, and the isBusy module to detect event-loop lag.