Shell script to find empty Hive databases

Question

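I am working on an audit process for removing empty Hive databases. I have a large number of databases to go through, and I would like a shell script (.sh) on Linux that runs hive -e queries to identify the empty databases and lists them in an output file or log (could that be a .txt file?). I will then send this list to our admins so they can "drop" those empty databases. All of our databases follow exactly the same naming convention: the "environment" and "area" parts are always the same, and only the "state" varies: environment_area_<state>

Right now I use the queries below to get the job done, but the process is very manual and very slow, and I end up spending a lot of time at the Linux command line.

I first connect to Hive in PuTTY, and once connected I run: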

show databases;
use environment_area_<state>;
show tables;

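If no tables are listed for a database, I add it to my list of databases to be dropped. I repeat the "use" and "show tables" queries for every database.

As you can tell, this is a very time-consuming approach, and a shell script would be a big help.

I have searched online and watched plenty of YouTube tutorials, but I have not come across anything that fits my use case. I am hoping someone with more shell scripting experience can help me get beyond #!/bin/bash followed by the queries I listed above.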

Answer 1

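To get you started, you can adapt the script below. I have not tested it. show tables may return a header or extra newlines, in which case adjust the script accordingly (wc -l counts the newlines in the output).

Script: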

#!/bin/bash

for db in $(hive -S -e "show databases;") 
do
   tbl_count=$(hive -S -e "use $db; show tables;" | wc -l)
   echo "Database $db contains $tbl_count tables."
   
   if [ "${tbl_count}" -eq 0 ]; then
     # Add db name to the file
     echo "$db" >> empty_databases_list.txt 
     # Do something else, for example drop db, etc
   fi
done
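
If the output does turn out to contain blank lines, one possible adjustment is to count only non-blank lines instead of using wc -l. The sketch below is untested against a real cluster. The helper name count_nonblank_lines is made up for illustration, and I am assuming that a failed hive call returns a non-zero exit status. It also skips any database whose query fails, so that an error is not mistaken for "zero tables" and sent to the admins as a drop candidate.

```shell
#!/usr/bin/env bash

# Count the lines on stdin that contain at least one non-whitespace character.
# (count_nonblank_lines is a hypothetical helper name, not part of Hive.)
count_nonblank_lines() {
  # grep -c still prints 0 when nothing matches, but exits with status 1,
  # hence the "|| true"
  grep -c '[^[:space:]]' || true
}

# Only talk to Hive when the CLI is actually installed
if command -v hive >/dev/null 2>&1; then
  : > empty_databases_list.txt   # start with an empty result file

  hive -S -e "show databases;" | while IFS= read -r db; do
    [ -n "$db" ] || continue     # ignore blank lines in the database list

    # </dev/null keeps hive from consuming the database list on stdin
    if ! tables=$(hive -S -e "use $db; show tables;" </dev/null); then
      echo "Skipping $db: hive query failed" >&2
      continue
    fi

    tbl_count=$(printf '%s\n' "$tables" | count_nonblank_lines)
    echo "Database $db contains $tbl_count tables."

    if [ "$tbl_count" -eq 0 ]; then
      echo "$db" >> empty_databases_list.txt
    fi
  done
fi
```

Each hive call still starts a new session, so this is no faster than the script above. It only makes the count less sensitive to stray blank lines.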

Answer 2

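Keeping a single Hive session running in the background and sending commands to it may improve performance significantly, because it avoids starting a new hive process for every database: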

#!/usr/bin/env bash

tempdir=$(mktemp -d)
# Cleanup at end of execution
trap 'rm -fr -- "$tempdir";exit' EXIT INT

hivein="$tempdir/hivein"
hiveout="$tempdir/hiveout"
mkfifo "$hivein" "$hiveout"

# Prepare file descriptors IO to talk to hive
exec 3<>"$hivein"
exec 4<>"$hiveout"

# Launch hive in the background
hive -S <&3 >&4 &

# Initialise hive
printf '%s\n' 'set hive.cli.print.header=false;' >&3

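# Wait for hive response and get databases list
# NOTE: mapfile reads until end-of-file. Because fd 4 is opened read-write
# on the FIFO, EOF is never delivered, so this call (and the mapfile in the
# loop below) can block indefinitely unless the output is delimited some other way.
printf '%s\n' "SHOW DATABASES LIKE 'environment_area_*';" >&3
mapfile -u 4 -t databases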

empty_databases=()

for db in "${databases[@]}"; do
  printf 'USE %s; SHOW TABLES;\n' "$db" >&3
  mapfile -u 4 -t tables
  tbl_count="${#tables[@]}"
  printf 'Database %s contains %d tables.\n' "$db" "$tbl_count"

  if [ "$tbl_count" -eq 0 ]; then
    # record empty db
    empty_databases+=("$db")
  fi
done

# Close the hive-cli in case closing the file descriptors is not enough
printf '%s\n' '!exit' >&3

printf '%s\n' "${empty_databases[@]}" >empty_databases_list.txt
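
When reading replies from a long-lived process like this, each reply needs a visible end. One possible way to get that is to follow every statement with a query that prints a known marker line, and to read until that marker appears. The sketch below illustrates the idea under several assumptions: that Hive prints the result of a constant SELECT as a plain line on stdout, that it flushes its output after each statement, and that exit; ends the session. The names read_reply, ask, fake_hive, the marker text, and the sample states ca and ny are all made up for illustration. fake_hive is only a stand-in so the sketch runs without a Hive installation, and I have not tested this against a real Hive session. It uses a bash coprocess (bash 4 or later) instead of the two FIFOs.

```shell
#!/usr/bin/env bash

marker='__END_OF_REPLY__'   # hypothetical marker; must never be a real db/table name

# Read lines from file descriptor $1 until a line equal to marker $2 appears.
# The collected non-empty lines are stored in the global array "reply".
read_reply() {
  local fd=$1 end=$2 line
  reply=()
  while IFS= read -r -u "$fd" line; do
    if [ "$line" = "$end" ]; then
      break
    fi
    if [ -n "$line" ]; then
      reply+=("$line")
    fi
  done
  return 0
}

# Stand-in for "hive -S" so this sketch runs anywhere. It answers a few
# hard-coded statements with made-up database and table names.
fake_hive() {
  local cmd
  while IFS= read -r cmd; do
    case $cmd in
      "SHOW DATABASES"*) printf '%s\n' environment_area_ca environment_area_ny ;;
      "USE environment_area_ca; SHOW TABLES;") printf '%s\n' orders customers ;;
      "USE environment_area_ny; SHOW TABLES;") ;;   # empty database: no output
      "SELECT '$marker';") printf '%s\n' "$marker" ;;
      "exit;") break ;;
    esac
  done
}

# With a real installation this would be: coproc HIVE { hive -S; }
coproc HIVE { fake_hive; }
hive_out=${HIVE[0]} hive_in=${HIVE[1]}

# Send one statement followed by the marker query, then collect the reply
ask() {
  printf '%s\n' "$1" "SELECT '$marker';" >&"$hive_in"
  read_reply "$hive_out" "$marker"
}

ask "SHOW DATABASES LIKE 'environment_area_*';"
databases=("${reply[@]}")

empty_databases=()
for db in "${databases[@]}"; do
  ask "USE $db; SHOW TABLES;"
  printf 'Database %s contains %d tables.\n' "$db" "${#reply[@]}"
  if [ "${#reply[@]}" -eq 0 ]; then
    empty_databases+=("$db")
  fi
done

# End the session and wait for the background process to finish
printf '%s\n' 'exit;' >&"$hive_in"
wait

printf '%s\n' "${empty_databases[@]}"
```

To keep the list in a file, redirect the last printf to empty_databases_list.txt as in the script above.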
