Splitting the Rows of a MySQL Query Using Ruby and Writing to a CSV File

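I am using Ruby to automate MySQL queries against a remote database, and I want to split rows based on the value returned by the month query shown below.

The goal is a week-by-week report (Wednesday through the following Tuesday) for June 2014, covering all clients and driven by each client's start date. Nothing else in the report changes; only the number of times a row is repeated depends on that start date (explained in the case statement below).

Note the use of the mysql2, watir, and csv gems here.

Simplified code: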

#!/usr/local/bin/ruby
require "mysql2"
require "watir"
require "csv"

puts "Initializing Report"

Mysql2::Client.default_query_options.merge!(:as => :array)

mysql = Mysql2::Client.new(:host => "1.2.3.4", :username => "user", :pass => "password", :database => "db")

puts "Successfully accessed db"

month = mysql.query("SELECT DATE_FORMAT(db.table.start, '%m') FROM db.table WHERE db.start.group = 1;")

day = mysql.query("SELECT DATE_FORMAT(db.table.start, '%d') FROM db.table WHERE db.start.group = 1;")

report = mysql.query("SELECT db.table.client, SELECT DATE_FORMAT(db.table.start, '%m/%d/%Y'), SELECT DATE_FORMAT(db.table.end, '%m/%d/%Y') FROM db.table WHERE db.start.group = 1;")

case month
when 5
  # code splitting one row into four
when 6
  if day <= 4
    # code splitting one row into four using weekOf
  elsif day >= 11 and day <= 17
    # code splitting one row into three using weekOf
  elsif day >= 18 and day <= 24
    # code splitting one row into two using weekOf
  else
    # no splitting; only one row using weekOf
  end
end

CSV.open("Report.csv", "wb") do |csv|
  csv << ["Week of", "Client", "Start Date", "End Date"]
  weekOf.zip(report).each {|row| csv << row.flatten}
end

puts "Results can be found in Report.csv"

Current output (if I comment out the case statement, remove "Week of" from the CSV header, and write only the report query to the CSV):

Client, Start Date, End Date
companyrecordlabel, 05/20/2014, 07/09/2015
beeUrself, 05/27/2014, 02/01/2016
overflowStack, 06/04/2014, 12/11/2015
chapoChaps, 06/11/2014, 01/16/2016
Meds4U, 06/18/2014, NULL
  .
  .
  .

I would like the following output:

Week of, Client, Start Date, End Date
06/04/2014, companyrecordlabel, 05/20/2014, 07/09/2015
06/11/2014, companyrecordlabel, 05/20/2014, 07/09/2015
06/18/2014, companyrecordlabel, 05/20/2014, 07/09/2015
06/25/2014, companyrecordlabel, 05/20/2014, 07/09/2015
06/04/2014, beeUrself, 05/27/2014, 02/01/2016
06/11/2014, beeUrself, 05/27/2014, 02/01/2016
06/18/2014, beeUrself, 05/27/2014, 02/01/2016
06/25/2014, beeUrself, 05/27/2014, 02/01/2016
06/04/2014, overflowStack, 06/04/2014, 12/11/2015
06/11/2014, overflowStack, 06/04/2014, 12/11/2015
06/18/2014, overflowStack, 06/04/2014, 12/11/2015
06/25/2014, overflowStack, 06/04/2014, 12/11/2015
06/11/2014, chapoChaps, 06/11/2014, 01/16/2016
06/18/2014, chapoChaps, 06/11/2014, 01/16/2016
06/25/2014, chapoChaps, 06/11/2014, 01/16/2016
06/18/2014, Meds4U, 06/18/2014, NULL
06/25/2014, Meds4U, 06/18/2014, NULL
  .
  .
  .

For clarity: the "Client" companyrecordlabel has four rows because its "Start Date" is in May, while the "Client" Meds4U is split into only two rows because its "Start Date" is June 18.
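
The splitting rule implied by the desired output can be sketched in plain Ruby. This is only a sketch under an assumption the question does not state outright: a week is listed for a client when that week's Tuesday falls on or after the client's start date. The names JUNE_WEDNESDAYS and weeks_for are made up for illustration.

```ruby
require "date"

# The four Wednesdays that open the report weeks of June 2014.
JUNE_WEDNESDAYS = [4, 11, 18, 25].map { |day| Date.new(2014, 6, day) }.freeze

# Returns the "Week of" labels for one client.
# Assumption: a week counts when it ends (Tuesday, wed + 6) on or after start_date.
def weeks_for(start_date)
  JUNE_WEDNESDAYS
    .select { |wed| wed + 6 >= start_date }
    .map { |wed| wed.strftime("%m/%d/%Y") }
end

p weeks_for(Date.new(2014, 5, 20)) # all four weeks
p weeks_for(Date.new(2014, 6, 18)) # ["06/18/2014", "06/25/2014"]
```

Under this rule a start date of 06/11/2014 yields three weeks, which matches the chapoChaps rows above.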

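I built the FULL code for the answer below on a few assumptions:

  • There are no rows where DATE_FORMAT(db.table.end, '%m') = 6
  • You want all of the listed companies to stay in the order they are already in (i.e. by db.table.id)
  • Query time is not a big concern for you
  • You wanted an array named weekOf but could not, or forgot to, include it

You also seem to use the word SELECT more than once within a single query; one SELECT followed by a comma-separated column list is enough. In addition, even for queries as small as the ones in your example, you may want to spread them out rather than putting everything on one line: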

month = mysql.query("SELECT DATE_FORMAT(db.table.start, '%m')
    FROM db.table
    WHERE db.start.group = 1;")

rather than:

month = mysql.query("SELECT DATE_FORMAT(db.table.start, '%m') FROM db.table WHERE db.start.group = 1;")
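
As a rough illustration of both points, the report query from the question could be written with a single SELECT inside a Ruby heredoc. This is a sketch only: report_sql is a name chosen here, and db.table and db.start.group are the placeholders from the question, kept as they appear there.

```ruby
# Squiggly heredoc (Ruby 2.3+): strips the common leading indentation,
# so the SQL can be laid out over several lines inside the script.
report_sql = <<~SQL
  SELECT db.table.client,
         DATE_FORMAT(db.table.start, '%m/%d/%Y'),
         DATE_FORMAT(db.table.end, '%m/%d/%Y')
  FROM db.table
  WHERE db.start.group = 1;
SQL

puts report_sql
# report = mysql.query(report_sql)  # using the Mysql2::Client from the script
```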

Now for the code itself:

#!/usr/local/bin/ruby
require "mysql2"
require "watir"
require "csv"

puts "Initializing Report"

Mysql2::Client.default_query_options.merge!(:as => :array)

mysql = Mysql2::Client.new(:host => "1.2.3.4", :username => "user", :pass => "password", :database => "db")

puts "Successfully accessed db"

date = mysql.query("SELECT DATE_FORMAT(db.table.start, '%m'),
  DATE_FORMAT(db.table.start, '%d')
  FROM db.table
  WHERE db.start.group = 1;")

report = mysql.query("SELECT c, s, e FROM (SELECT * FROM (SELECT db.table.id,
  db.table.client AS c,
  DATE_FORMAT(db.table.start, '%m/%d/%Y') AS s,
  DATE_FORMAT(db.table.end, '%m/%d/%Y') AS e
  FROM db.table
  WHERE db.start.group = 1
  UNION ALL
  SELECT db.table.id,
  db.table.client AS c,
  DATE_FORMAT(db.table.start, '%m/%d/%Y') AS s,
  DATE_FORMAT(db.table.end, '%m/%d/%Y') AS e
  FROM db.table
  WHERE db.start.group = 1
  HAVING ((DATE_FORMAT(db.table.start, '%m') = 5) OR (DATE_FORMAT(db.table.start, '%d') <= 4))
  UNION ALL
  SELECT db.table.id,
  db.table.client AS c,
  DATE_FORMAT(db.table.start, '%m/%d/%Y') AS s,
  DATE_FORMAT(db.table.end, '%m/%d/%Y') AS e
  FROM db.table
  WHERE db.start.group = 1
  HAVING ((DATE_FORMAT(db.table.start, '%m') = 5) OR (DATE_FORMAT(db.table.start, '%d') <= 11))
  UNION ALL
  SELECT db.table.id,
  db.table.client AS c,
  DATE_FORMAT(db.table.start, '%m/%d/%Y') AS s,
  DATE_FORMAT(db.table.end, '%m/%d/%Y') AS e
  FROM db.table
  WHERE db.start.group = 1
  HAVING ((DATE_FORMAT(db.table.start, '%m') = 5) OR (DATE_FORMAT(db.table.start, '%d') <= 18))) AS alias
  ORDER BY alias.id) AS alias2;")

weekOf = []

date.each do |mon, day|
  # DATE_FORMAT returns strings such as "05", so convert before comparing with an Integer
  if mon.to_i == 5
    weekOf << "06/04/2014"
    weekOf << "06/11/2014"
    weekOf << "06/18/2014"
    weekOf << "06/25/2014"
  elsif mon.to_i == 6
    if (day.to_i <= 4)
      weekOf << "06/04/2014"
      weekOf << "06/11/2014"
      weekOf << "06/18/2014"
      weekOf << "06/25/2014"
    elsif ((day.to_i >= 11) && (day.to_i <= 17))
      weekOf << "06/11/2014"
      weekOf << "06/18/2014"
      weekOf << "06/25/2014"
    elsif ((day.to_i >= 18) && (day.to_i <= 24))
      weekOf << "06/18/2014"
      weekOf << "06/25/2014"
    else
      weekOf << "06/25/2014"
    end
  else
    puts "Error: #{mon} is before May"
  end
end

CSV.open("Report.csv", "wb") do |csv|
  csv << ["Week of", "Client", "Start Date", "End Date"]
  weekOf.zip(report).each {|row| csv << row.flatten}
end

puts "Results can be found in Report.csv"

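Explanation:

I assumed query time is not a big concern for you because your example queries are fairly small and contain no JOINs. If you find your query growing beyond 10 or so INNER JOINs (for example, across tables that each hold hundreds of thousands of entries), this may no longer be the best solution for you.

This solution has two parts.

The first is to duplicate rows from within the database itself using UNION ALL. This means repeating the entire query and adding a condition underneath each copy to specify when that duplication should happen.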

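This is where the HAVING clause comes in. Each repeated SELECT already has a WHERE clause, so I attached the extra condition with HAVING; adding a second WHERE would cause a MySQL error. Note that MySQL generally only lets HAVING refer to columns in the SELECT list or to aggregates, so if it rejects the condition, appending it to the existing WHERE with AND is the conventional alternative.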
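
One way to avoid copy-pasting the repeated SELECT is to assemble the UNION ALL branches in Ruby. The sketch below is not the query used in this answer: it folds the extra condition into WHERE with AND instead of HAVING, and BASE_SELECT and union_sql are illustrative names. The cutoffs 4, 11, and 18 are the day-of-month values from the query above, and db.table and db.start.group are the question's placeholders.

```ruby
# Shared SELECT body, reused by every branch of the UNION ALL.
BASE_SELECT = <<~SQL.freeze
  SELECT db.table.id,
         db.table.client AS c,
         DATE_FORMAT(db.table.start, '%m/%d/%Y') AS s,
         DATE_FORMAT(db.table.end, '%m/%d/%Y') AS e
  FROM db.table
  WHERE db.start.group = 1
SQL

# Builds one unconditional copy plus one conditional copy per cutoff.
def union_sql(cutoffs)
  extra = cutoffs.map do |day|
    # Integer() rejects anything that is not a whole number before interpolation.
    BASE_SELECT +
      "  AND (DATE_FORMAT(db.table.start, '%m') = 5" \
      " OR DATE_FORMAT(db.table.start, '%d') <= #{Integer(day)})\n"
  end
  ([BASE_SELECT] + extra).join("UNION ALL\n")
end

puts union_sql([4, 11, 18])
```

The resulting string could then be wrapped in the two aliased subqueries for ordering and column selection.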

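Also remember that every table MySQL creates as the result of a subquery must have an alias: here, alias and alias2. I used not one but two nested queries so that I could first ORDER BY the id column (following from one of my assumptions) and then select only the columns we need for the next part.

Finally, I combined the two separate queries, month and day, into a single query named date. Each row it yields while iterating is a two-element array, so the result behaves like a two-dimensional array.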
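
To make the shape of those rows concrete, here is a small sketch that uses a hard-coded stand-in for the result instead of a live query. date_rows and parsed are illustrative names; the point is that DATE_FORMAT values arrive as strings, so they need to_i before being compared with numbers.

```ruby
# Stand-in for the `date` result: with :as => :array each row is an array,
# and DATE_FORMAT values arrive as strings such as "05" and "20".
date_rows = [["05", "20"], ["06", "18"]]

parsed = date_rows.map do |mon, day| # block parameters destructure each row
  [mon.to_i, day.to_i]               # "05".to_i is 5, whereas "05" == 5 is false
end

p parsed # [[5, 20], [6, 18]]
```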

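The second: I created the weekOf array that you probably wanted but forgot to include.

Then I iterate over date in order to push the right "06/#{day}/2014" strings into the weekOf array.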
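
For comparison, the same expansion can also be done entirely in Ruby on the rows of the plain report query, without UNION ALL. This is an alternative sketch, not the approach used above. It assumes a week is listed when its Tuesday falls on or after the start date (inferred from the desired output), and WEEK_STARTS, rows, and expand are illustrative names with hard-coded sample data.

```ruby
require "csv"
require "date"

# The four Wednesdays that open the report weeks of June 2014.
WEEK_STARTS = [4, 11, 18, 25].map { |day| Date.new(2014, 6, day) }.freeze

# Stand-in for the rows of the plain report query: [client, start, end].
rows = [
  ["chapoChaps", "06/11/2014", "01/16/2016"],
  ["Meds4U",     "06/18/2014", nil]
]

# Repeats each row once per report week that ends on or after its start date.
def expand(rows)
  rows.flat_map do |client, start_s, end_s|
    start = Date.strptime(start_s, "%m/%d/%Y")
    WEEK_STARTS
      .select { |wed| wed + 6 >= start }
      .map { |wed| [wed.strftime("%m/%d/%Y"), client, start_s, end_s] }
  end
end

csv_text = CSV.generate do |csv|
  csv << ["Week of", "Client", "Start Date", "End Date"]
  expand(rows).each { |row| csv << row }
end

puts csv_text
```

Because each output row is built together with its week label, there is no separate weekOf array that has to stay aligned with the query result.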

That's it! Hope this helps.