Importing large datasets into Core Data, making the relationships in Swift

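I have a Core Data database with about 500,000 stamps and 86,000 series, which I have to download from a web API that returns JSON. Adding the stamps and series to Core Data is no problem, but I am having trouble setting up the relationships between the two.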

Each stamp has one series, and each series can have many stamps, as shown in the picture of my data model above.

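I need to establish the relationships between them quickly and efficiently. While doing some research, I came across this site: https://www.objc.io/issues/4-core-data/importing-large-data-sets-into-core-data/ The part of the article that interests me most: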

A similar problem often arises when establishing relationships between the newly imported objects. Using a fetch request to get each related object independently is vastly inefficient. There are two possible ways out of this: either we resolve relationships in batches similar to how we imported the objects in the first place, or we cache the objectIDs of the already-imported objects. Resolving relationships in batches allows us to greatly reduce the number of fetch requests required by fetching many related objects at once. Don’t worry about potentially long predicates like:

[NSPredicate predicateWithFormat:@"identifier IN %@", identifiersOfRelatedObjects];

Resolving a predicate with many identifiers in the IN (...) clause is always way more efficient than going to disk for each object independently. However, there is also a way to avoid fetch requests altogether (at least if you only need to establish relationships between newly imported objects). If you cache the objectIDs of all imported objects (which is not a lot of data in most cases really), you can use them later to retrieve faults for related objects using objectWithID:.

// after a batch of objects has been imported and saved
for (MyManagedObject *object in importedObjects) {
    objectIDCache[object.identifier] = object.objectID;
}

// ... later during resolving relationships
NSManagedObjectID *objectID = objectIDCache[object.foreignKey];
MyManagedObject *relatedObject = [context objectWithID:objectID];
object.toOneRelation = relatedObject;

Note that this example assumes that the identifier property is unique across all entity types, otherwise we would have to account for duplicate identifiers for different types in the way we cache the object IDs.

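But I don't understand what they mean. Could someone explain this in more detail? Preferably in Swift, since that is the language I know best and the one I am building the app in, but other suggestions are welcome too. Note that moving away from Core Data is no longer an option.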

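Establishing a relationship between two objects requires having both objects at hand. Given that they have already been created in Core Data, you can get them by executing a fetch request with a predicate like

[NSPredicate predicateWithFormat:@"countryID == %@", countryObjectData[@"id"]]

But if you need to establish half a million relationships, you will have to execute a million fetch requests (two per relationship). That is slow.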
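The first remedy named in the quoted objc.io article is to resolve relationships in batches: collect the series IDs referenced by a batch of stamps and fetch all of those series with a single `serverID IN %@` predicate. Below is a minimal Swift sketch of that batched approach (it is not the cache technique this answer goes on to describe). Because the data model picture is not available, the entity names `Stamp` and `Series`, the Int64 attributes `serverID` and `seriesServerID` (the foreign key copied from the JSON), the to-one relationship `series`, and the function `resolveSeriesInBatches` are all hypothetical.

```swift
import CoreData

// Hypothetical model: "Stamp" (Int64 "seriesServerID", to-one "series"),
// "Series" (Int64 "serverID"). Adjust the names to your own data model.
func resolveSeriesInBatches(stamps: [NSManagedObject],
                            context: NSManagedObjectContext,
                            batchSize: Int = 1_000) throws {
    for start in stride(from: 0, to: stamps.count, by: batchSize) {
        let batch = stamps[start..<min(start + batchSize, stamps.count)]

        // Distinct foreign keys referenced by this batch of stamps.
        let seriesIDs = Set(batch.compactMap { $0.value(forKey: "seriesServerID") as? Int64 })

        // One fetch per batch instead of one fetch per stamp.
        let request = NSFetchRequest<NSManagedObject>(entityName: "Series")
        request.predicate = NSPredicate(format: "serverID IN %@", Array(seriesIDs) as NSArray)

        // Index the fetched series by server ID for constant-time lookup below.
        var seriesByID: [Int64: NSManagedObject] = [:]
        for series in try context.fetch(request) {
            if let id = series.value(forKey: "serverID") as? Int64 {
                seriesByID[id] = series
            }
        }

        // Connect each stamp to its series; stamps with unknown series are left untouched.
        for stamp in batch {
            if let id = stamp.value(forKey: "seriesServerID") as? Int64,
               let series = seriesByID[id] {
                stamp.setValue(series, forKey: "series")
            }
        }
    }
}
```

With 500,000 stamps and a batch size of 1,000 this issues about 500 fetch requests instead of hundreds of thousands; the batch size is a tuning knob, not a recommended value.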

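Retrieving an NSManagedObject by its NSManagedObjectID is much faster than retrieving it by an attribute value, because objectWithID: can return the already registered object or a fault instead of executing a fetch request. Before you start parsing, you can build a cache of all Core Data objects, per entity, in the form of server key -> objectID pairs.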

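self.cache = [NSMutableDictionary dictionaryWithCapacity:self.managedObjectModel.entities.count];

// Describes the object's own NSManagedObjectID as a fetchable "property"
NSExpressionDescription *objectIdDescription = [[NSExpressionDescription alloc] init];
objectIdDescription.name = @"objectID";
objectIdDescription.expression = [NSExpression expressionForEvaluatedObject];
objectIdDescription.expressionResultType = NSObjectIDAttributeType;

NSString *key = @"serverID";

for (NSEntityDescription *entity in self.managedObjectModel.entities) {
    // Skip entities without a server key attribute; their fetch request would be invalid
    if (entity.attributesByName[key] == nil) {
        continue;
    }

    NSMutableDictionary *entityCache = [NSMutableDictionary dictionary];
    self.cache[entity.name] = entityCache;

    // Fetch only the server key and the object ID, as dictionaries, not full objects
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:entity.name];
    request.resultType = NSDictionaryResultType;
    request.propertiesToFetch = @[key, objectIdDescription];

    NSError *error = nil;
    NSArray *result = [self.context executeFetchRequest:request error:&error];
    if (result == nil) {
        NSLog(@"Could not fetch %@: %@", entity.name, error);
        continue;
    }

    for (NSDictionary *item in result) {
        id value = item[key];
        NSManagedObjectID *objectID = item[@"objectID"];
        // Dictionary results omit nil attributes; a nil dictionary key would crash
        if (value != nil && objectID != nil) {
            entityCache[value] = objectID;
        }
    }
}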
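For the Swift app in the question, the same cache could be built roughly as follows. This is a minimal translation sketch, assuming (hypothetically) that every cached entity has an Int64 attribute named `serverID`; the function name `buildObjectIDCache` is made up as well. Note that a dictionary result reflects what is in the persistent store, so objects inserted into the context but not yet saved are not included.

```swift
import CoreData

// Builds [entityName: [serverID: NSManagedObjectID]] for every entity that has the
// (hypothetical) Int64 attribute `key`, fetching only that value and the object ID.
func buildObjectIDCache(model: NSManagedObjectModel,
                        context: NSManagedObjectContext,
                        key: String = "serverID") throws -> [String: [Int64: NSManagedObjectID]] {
    // An expression that evaluates to the object itself, returned as its NSManagedObjectID.
    let objectIDDescription = NSExpressionDescription()
    objectIDDescription.name = "objectID"
    objectIDDescription.expression = NSExpression.expressionForEvaluatedObject()
    objectIDDescription.expressionResultType = .objectIDAttributeType

    var cache: [String: [Int64: NSManagedObjectID]] = [:]
    for entity in model.entities {
        // Skip entities that have no server key attribute.
        guard let name = entity.name, entity.attributesByName[key] != nil else { continue }

        let request = NSFetchRequest<NSDictionary>(entityName: name)
        request.resultType = .dictionaryResultType
        request.propertiesToFetch = [key, objectIDDescription]

        var entityCache: [Int64: NSManagedObjectID] = [:]
        for item in try context.fetch(request) {
            // Rows whose server key is nil come back without that key and are skipped.
            if let serverID = item[key] as? Int64,
               let objectID = item["objectID"] as? NSManagedObjectID {
                entityCache[serverID] = objectID
            }
        }
        cache[name] = entityCache
    }
    return cache
}
```

Only two small values per row are fetched, so this is far lighter than fetching the full objects just to read their IDs.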

Once you have that cache, you can get an object like this:

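id serverKey = countryObjectData[@"id"];
NSManagedObjectID *objectID = self.cache[@"Country"][serverKey];
// Guard against unknown keys: objectWithID: must not be called with nil
Country *country = objectID ? (Country *)[self.context objectWithID:objectID] : nil;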
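In Swift, with a cache of type `[String: [Int64: NSManagedObjectID]]` (entity name -> server ID -> object ID) filled before parsing, linking a freshly parsed stamp to its series could look like the following sketch. The `Series` entity name, the `series` relationship and the `linkStamp` helper are hypothetical; `object(with:)` is the Swift name of `objectWithID:`.

```swift
import CoreData

// Hypothetical names: entity "Series", to-one relationship "series" on the stamp.
// `cache` maps entity name -> server ID -> NSManagedObjectID.
func linkStamp(_ stamp: NSManagedObject,
               toSeriesWithServerID seriesID: Int64,
               cache: [String: [Int64: NSManagedObjectID]],
               context: NSManagedObjectContext) -> Bool {
    guard let objectID = cache["Series"]?[seriesID] else {
        // Unknown series: not imported (or not yet cached), or deleted.
        return false
    }
    // object(with:) can return the registered object or a fault instead of running a fetch request.
    let series = context.object(with: objectID)
    stamp.setValue(series, forKey: "series")
    return true
}
```

If the series comes back as a fault, its attributes are only loaded from the store when you actually access them, so setting the relationship stays cheap.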

That is much faster.

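When you create new objects while parsing the JSON, you need to add their server key -> objectID pairs to the cache, but only after they have received a permanent ID. When you delete an object, remove its pair from the cache.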
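A Swift sketch of that bookkeeping, assuming (hypothetically) Int64 `serverID` attributes and a cache of type `[String: [Int64: NSManagedObjectID]]`; `registerNewObjects` and `removeAndDelete` are illustrative names, not Core Data API. `obtainPermanentIDs(for:)` replaces the temporary IDs that newly inserted objects carry, so the cached IDs stay valid after the next save.

```swift
import CoreData

// After inserting newly parsed objects, obtain permanent IDs and record them in the cache.
func registerNewObjects(_ objects: [NSManagedObject],
                        in cache: inout [String: [Int64: NSManagedObjectID]],
                        context: NSManagedObjectContext,
                        key: String = "serverID") throws {
    // Temporary IDs do not survive a save, so swap them for permanent ones before caching.
    try context.obtainPermanentIDs(for: objects)
    for object in objects {
        guard let entityName = object.entity.name,
              let serverID = object.value(forKey: key) as? Int64 else { continue }
        cache[entityName, default: [:]][serverID] = object.objectID
    }
}

// Drop the object's pair from the cache so no stale objectID is handed out, then delete it.
func removeAndDelete(_ object: NSManagedObject,
                     from cache: inout [String: [Int64: NSManagedObjectID]],
                     context: NSManagedObjectContext,
                     key: String = "serverID") {
    if let entityName = object.entity.name,
       let serverID = object.value(forKey: key) as? Int64 {
        cache[entityName]?[serverID] = nil
    }
    context.delete(object)
}
```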