How to "observe" a stream in Ruby's CSV module?

Question

I'm writing a class that takes a CSV file, transforms it, and then writes out the new data.

require 'csv'

module Transformer
  class Base
    CSV_OPTIONS = { headers: true }.freeze

    def initialize(file)
      @file = file
    end

    def original_data(&block)
      CSV.open(@file, 'rb', **CSV_OPTIONS, &block)
    end

    def transformer
      # complex manipulations here like modifying columns, picking only certain
      # columns to put into new_data, etc but simplified to `+10` to keep
      # example concise
      # NOTE: new_data is not in scope here (this is the problem asked about below)
      ->(row) { new_data << row['some_header'] + 10 }
    end

    def transformed_data
      original_data(&transformer)
    end

    def write_new_data
      CSV.open('new_file.csv', 'wb', **CSV_OPTIONS) do |new_data|
        transformed_data
      end
    end
  end
end

What I'd like to be able to do is:

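See the transformed data without writing it out (so I can test that the data was transformed correctly, and so I don't have to write it to a file right away: maybe I want to do more manipulation before writing it out)
Not slurp the whole file at once, so it works regardless of the size of the original data
Have this as a base class with an empty transformer, so that instances only need to implement their own transformer while the read/write behavior is provided by the base class.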

But obviously the code above doesn't work, because I never actually reference new_data in transformer.

How can I achieve this elegantly?

Answer 1

Depending on your needs and personal taste, I'd recommend one of the following approaches.

For clarity, I've deliberately distilled the code to the bare minimum (no wrapper class).

1. Simple read-modify-write loop

Since you don't want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:

require 'csv'

CSV.foreach "source.csv", headers: true do |row|
  row["name"] = row["name"].upcase
  row["new column"] = "new value"
  p row
end
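Called without a block, CSV.foreach returns an Enumerator, so the same loop can be made lazy. That covers the "look at the transformed data without writing it" requirement while still reading one row at a time. A minimal sketch, assuming a file with hypothetical name and email columns (written to a Tempfile here only so the example runs on its own):

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data, written to a temp file only so the sketch runs on its own.
source = Tempfile.new(["source", ".csv"])
source.write("name,email\nalice,a@example.com\nbob,b@example.com\n")
source.flush

# Without a block, CSV.foreach returns an Enumerator; .lazy keeps the
# transformation streaming, so rows are only read when they are requested.
transformed = CSV.foreach(source.path, headers: true).lazy.map do |row|
  row["name"] = row["name"].upcase
  row
end

# Only the first row is read and transformed; nothing is written anywhere.
first = transformed.first
p first.to_h
```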

If you want to write to the file during the same iteration:

require 'csv'

csv_options = { headers: true }

# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
  # Add a header
  target << %w[new header column names]

  # Iterate over the source CSV rows
  CSV.foreach "source.csv", **csv_options do |row|
    # Mutate and add columns
    row["name"] = row["name"].upcase
    row["new column"] = "new value"

    # Push the new row to the target file
    target << row
  end
end
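To test this write path without touching the disk, CSV.generate can stand in for CSV.open: it yields a CSV object that writes into a String. A sketch with the same hypothetical name/email data; CSV.parse on an in-memory string replaces CSV.foreach only to keep the example self-contained, and with a real file you would keep CSV.foreach so rows are still streamed:

```ruby
require 'csv'

# Hypothetical in-memory source; with a real file, keep CSV.foreach so rows stream.
source = "name,email\nalice,a@example.com\nbob,b@example.com\n"

# CSV.generate yields a CSV object that writes into a String instead of a file.
output = CSV.generate do |target|
  target << ["name", "email", "new column"] # output header
  CSV.parse(source, headers: true) do |row|
    row["name"] = row["name"].upcase
    row["new column"] = "new value"
    target << row # a CSV::Row is written as its fields
  end
end

puts output
```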

2. Use `CSV::Converters`

There's a built-in feature that may help here, CSV::Converters (see the definition of :converters in the CSV::new documentation).

require 'csv'

# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }

# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
  value ? value.to_s.strip : value
end

CSV.open("target.csv", "wb") do |target|
  # same as above

  CSV.foreach "source.csv", **csv_options do |row|
    # same as above - input data will already be converted
    # you can do additional things here if needed
  end
end
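The :converters option also accepts lambdas directly, and there is a separate :header_converters option for the header row, so a one-off converter does not have to be registered in the global CSV::Converters hash. A small sketch with hypothetical, whitespace-padded input:

```ruby
require 'csv'

# Hypothetical input with stray whitespace in both the header and the fields.
source = " name , email \n alice ,a@example.com \n"

# A one-off converter passed directly as a lambda, no global registration needed.
strip = ->(value) { value.strip }

rows = CSV.parse(
  source,
  headers: true,
  header_converters: [strip], # applied to the header row
  converters: [strip]         # applied to the data fields
)

p rows.first.to_h
```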

3. Move the converters into classes, separate from input and output

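Based on your comments, and since you want to minimize I/O and the number of iterations, it may be worth extracting the read/write operations out of the converters' responsibilities, something like this: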

require 'csv'

class NameCapitalizer
  def self.call(row)
    row["name"] = row["name"].upcase
  end
end

class EmailRemover
  def self.call(row)
    row.delete 'email'
  end
end

csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]

CSV.open("target.csv", "wb") do |target|
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    target << row
  end
end

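Note that the code above still doesn't handle the header in case a converter changes it. You may have to keep the last row (after all transformations) and add its #headers to the output CSV.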
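Putting the pieces together in the base-class shape the question asks for might look like the following. This is a sketch under stated assumptions, not a definitive design: Transformer::Base, #transform, #rows, #write_to and the UpcaseNames subclass are hypothetical names. Because the header row has to be written before any data rows, it takes the headers from the first transformed row rather than the last. Rows are read lazily with CSV.foreach, so #rows can be inspected in a test without writing anything, and #write_to accepts any IO (a File or a StringIO).

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

module Transformer
  # Hypothetical base class: subclasses override #transform, while reading
  # and writing stay here. Rows are streamed, never loaded all at once.
  class Base
    def initialize(file, **csv_options)
      @file = file
      @csv_options = { headers: true }.merge(csv_options)
    end

    # Default "empty" transformer: rows pass through unchanged.
    def transform(row)
      row
    end

    # Lazily transformed rows: the file is read only as rows are requested,
    # so tests can inspect the result without writing anything.
    def rows
      CSV.foreach(@file, **@csv_options).lazy.map { |row| transform(row) }
    end

    # Streams the transformed rows to any IO (File, StringIO, ...), writing
    # the headers of the first transformed row before the data rows.
    def write_to(io)
      csv = CSV.new(io)
      headers_written = false
      rows.each do |row|
        unless headers_written
          csv << row.headers
          headers_written = true
        end
        csv << row
      end
      io
    end
  end
end

# Hypothetical subclass that only supplies its own transformation.
class UpcaseNames < Transformer::Base
  def transform(row)
    row["name"] = row["name"].upcase
    row
  end
end

# Sample input in a temp file so the sketch runs on its own.
source = Tempfile.new(["source", ".csv"])
source.write("name,email\nalice,a@example.com\nbob,b@example.com\n")
source.flush

transformer = UpcaseNames.new(source.path)
p transformer.rows.first.to_h                      # inspect without writing
output = transformer.write_to(StringIO.new).string # or a File opened for writing
puts output
```

Since #rows calls CSV.foreach on every call, each pass re-reads the source file; that keeps memory flat at the cost of extra reads. With an empty source, nothing is written, not even a header row.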

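There are probably many other ways to do this, but Ruby's CSV class doesn't have the cleanest interface, so I try to keep the code that deals with it as simple as possible.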
