How to write a Rake task that imports data and handles deletions?

Question

I want to do the same thing that is explained in the question "How to write Rake task to import data to Rails app?".

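However, I am not satisfied with the accepted answer, because it does not account for items that have been deleted from the source.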

What is the simplest, most Rails-like way to account for entries that have been deleted from the source?

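Remarks:

When using .find_or_initialize_by_identifier and never deleting, surplus entries stay in the table.
As far as I know, when using .delete_all before every import, the primary key is not reset and quickly approaches its limit.
I could drop the table and use ::Migrations.create_table in the rake task, but then the definitions in the schema and the migrations have to be kept in sync with the code in the rake task, which seems undesirable.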

Answer 1

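You definitely shouldn't delete all the records and then recreate them from the data. That creates all sorts of problems, such as breaking any foreign key fields in other tables that pointed to an object before it was deleted. It's like knocking down a house and rebuilding it just to have a different-coloured door. So "see if it's there, if it is then update it (if it's different), if it's not then create it" is the right strategy.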

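You don't say what your criteria for deletion are, but if it is "any record which isn't mentioned in the import data should be deleted", then you just need to keep track of some unique field from the input data and then delete every record whose own unique field is not in that list.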

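So, your code to do the import could look something like this (code copied from the other question: it sets the data in a very clunky way, but I'm not going to address that here):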

namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    file = File.open(<file to import>)
    identifiers = []
    file.each do |line|
      #disclaimer: this way of setting the data from attrs[0], attrs[1] etc is crappy and fragile and is not how i would do it
      # chomp first so the last field does not carry the trailing newline
      attrs = line.chomp.split(":")
      identifier = attrs[0]
      identifiers << identifier
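      # Rails 4+ syntax; the old find_or_initialize_by_identifier dynamic finder is deprecated.
      # It always returns a record (found or new), so no `if` guard is needed.
      product = Product.find_or_initialize_by(identifier: identifier)
      product.name = attrs[1]
      # etc... (set the remaining attributes the same way)
      product.save!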
    end
    #destroy any which didn't appear in the import data
    #(find_each loads the records in batches instead of all at once)
    Product.where("identifier not in (?)", identifiers).find_each(&:destroy)
  end
end
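
The disclaimer in the task above calls the positional attrs[0], attrs[1] access fragile. One way to make it less so is to name the fields once and map every line onto them. This is only a sketch in plain Ruby: FIELDS, parse_line and the price field are hypothetical, and the real field order depends on the file being imported.

```ruby
# Hypothetical field order of one colon-separated line; adjust to the real file.
FIELDS = %i[identifier name price].freeze

# Turns one raw line into a hash keyed by field name.
# Raises if the line does not contain exactly one value per field,
# so malformed input fails loudly instead of shifting columns silently.
def parse_line(line)
  # limit -1 keeps trailing empty fields, e.g. "abc::" yields three values
  values = line.chomp.split(":", -1)
  unless values.size == FIELDS.size
    raise ArgumentError, "expected #{FIELDS.size} fields, got #{values.size}: #{line.inspect}"
  end
  FIELDS.zip(values).to_h
end
```

Inside the rake task the resulting hash could then be handed to the record in one go (for example via assign_attributes) rather than setting each attribute by index.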

Answer 2

I'm using .delete_all together with a table schema that does not have Rails' default auto-increment id column, to avoid the ever-growing primary key when calling .delete_all.

create_table :airport_locations, id: false do |t|
  # use the natural key as the primary key instead of an auto-increment id
  t.string :iata_faa_code, primary_key: true
  t.float :latitude
  t.float :longitude
end
add_index :airport_locations, :iata_faa_code

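Notes

The data set is rather small (about 5000 entries) and is updated infrequently.
If the table is small, deleted items can be tracked as explained in Max Williams' answer. Tables containing thousands of entries, though, might need a lot of memory, and a more sophisticated strategy (e.g. using temporary tables) may be required to find the deleted entries.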
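
One strategy that avoids holding the identifier list in memory is a mark-and-sweep pass: stamp every row seen during an import with an id for that run, then remove whatever does not carry the current stamp. This is a different technique from the temporary tables mentioned above. The sketch below uses a plain Hash as a stand-in for the table, and the names import_with_sweep and import_run are hypothetical.

```ruby
require "securerandom"

# table: in-memory stand-in for the database table, identifier => row hash.
# lines: any enumerable of "identifier:name" lines (e.g. File.foreach(path)),
#        so the source can be streamed without collecting identifiers.
# run_id: value that marks the rows touched by this import run.
def import_with_sweep(table, lines, run_id: SecureRandom.uuid)
  # Mark: create or update each source row and stamp it with the run id.
  lines.each do |line|
    identifier, name = line.chomp.split(":", 2)
    row = (table[identifier] ||= {})
    row[:name] = name
    row[:import_run] = run_id
  end
  # Sweep: rows not stamped by this run were absent from the source.
  table.delete_if { |_identifier, row| row[:import_run] != run_id }
  table
end
```

With a real table the mark step would write to an extra column and the sweep would be a single delete of all rows whose stamp differs from the current run. Whether that delete runs per record (with callbacks, as with destroy) or in bulk is a separate choice.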

import

rake

ruby-on-rails

recreate