Filter NSArray based on attribute prefix using indexOfObject:inSortedRange:options:usingComparator: binary search

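I am trying to implement a simple binary search on an array using indexOfObject:inSortedRange:options:usingComparator:, but the method does not behave quite as I expect and I can't figure out what I am missing.

Let's dive into the details: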

 + (FilterRange *)findRangeHeadAndTailForPrefix:(NSString *)prefix inCityArray:(NSArray *)array {
    FilterRange *result = [[FilterRange alloc] init];
    result.startIndex = [self findRangeBordersForPrefix:prefix inArray:array lookingForHead:YES];
    result.endIndex = [self findRangeBordersForPrefix:prefix inArray:array lookingForHead:NO];
    return result;
}


 + (long)findRangeBordersForPrefix:(NSString *)prefix inArray:(NSArray *)array lookingForHead:(BOOL)shouldLookForHead {
    NSRange searchRange = NSMakeRange(0, [array count]);
    long foundIndex = [array indexOfObject:prefix
                             inSortedRange:searchRange
                                   options:(shouldLookForHead ? NSBinarySearchingFirstEqual : NSBinarySearchingLastEqual)
                           usingComparator:^(id obj1, id obj2)
                       {
        City *castedCity = (City *)([obj1 isKindOfClass:[City class]] ? obj1 : obj2);
        NSString *castedPrefix = (NSString *)([obj1 isKindOfClass:[City class]] ? obj2 : obj1);
        NSComparisonResult comparisonResult = ([[[castedCity readableName] lowercaseString] hasPrefix:[castedPrefix lowercaseString]] ? NSOrderedSame :
                                               [[[castedCity readableName] lowercaseString] compare:[castedPrefix lowercaseString]]);
        return comparisonResult;
    }];
    
    return foundIndex;
}

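The problem is the behavior of the indexOfObject:inSortedRange:options:usingComparator: method. Here is how it behaves (observed by setting breakpoints and stepping through the comparator):

As a result, the search is never performed correctly. Note that I don't want to use filterUsingPredicate because of its time complexity. The array is already sorted, so a binary search should be considerably more efficient.

Does anyone know what I might be missing? I suspect it is something really obvious that I'm just not seeing. Any help or ideas would be much appreciated :)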

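Normalization

The first problem I see is that the lowercase string you are using does not handle accented characters well, ... So first, let's write a helper that normalizes strings.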

@interface NSString(Normalize)

- (NSString *)normalized;

@end

@implementation NSString(Normalize)

- (NSString *)normalized {
    NSMutableString *result = [NSMutableString stringWithString:self];
    CFStringTransform((__bridge CFMutableStringRef)result, NULL, kCFStringTransformStripCombiningMarks, NO);
    return [result lowercaseString];
}

@end

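This method returns a lowercase string with the combining marks stripped. It is not a very efficient version, but it shows what needs to be done here.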
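As one possible alternative to the category above, Foundation's stringByFoldingWithOptions:locale: can fold case and diacritics in a single call. The sketch below is not from the original answer: the function name FoldedString is made up for illustration, and whether it is actually faster than the CFStringTransform version is an assumption that would have to be measured.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): folds case and
// diacritics so that "Český Dub" and "cesky dub" produce the same key.
static NSString *FoldedString(NSString *input) {
    NSStringCompareOptions options = NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch;
    // The locale argument is nullable; nil is passed in this sketch.
    return [input stringByFoldingWithOptions:options locale:nil];
}
```

Whichever variant is used, the same function has to produce the sort key, the search key, and the prefix, otherwise the binary search compares strings that were never sorted against each other.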

Caching

Normalization can be expensive, so let's cache the result.

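@interface City: NSObject

@property(nonatomic, strong) NSString *readableName;
@property(nonatomic, strong, readonly) NSString *normalizedReadableName;

// Declared here so that code outside the implementation can create cities.
- (instancetype)initWithName:(NSString *)name;
+ (instancetype)cityWithName:(NSString *)name;

@end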

@implementation City {
    NSString *_normalizedReadableName;
}

- (instancetype)initWithName:(NSString *)name {
    if ((self = [super init]) == nil) { return nil; }
    _readableName = name;
    _normalizedReadableName = nil;    
    return self;
}

- (NSString *)normalizedReadableName {
    if (_normalizedReadableName == nil) {
        _normalizedReadableName = [_readableName normalized];
    }
    return _normalizedReadableName;
}

- (void)setReadableName:(NSString *)readableName {
    _readableName = readableName;
    _normalizedReadableName = nil;
}

+(instancetype)cityWithName:(NSString *)name {
    return [[self alloc] initWithName:name];
}

@end

Again, how you approach this is up to you. This is just an example.

Search

The documentation for indexOfObject:inSortedRange:options:usingComparator: says:

The elements in the array must have already been sorted using the comparator cmp (it's the usingComparator argument). If the array is not sorted, the result is undefined.

You wrote:

The array is already sorted alphabetically based on the readableName property.

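But your comparator uses lowercaseString. It is not clear whether the array was also sorted by the lowercase strings, which may be another issue: if it was not, the result is undefined. We have to operate on the same string everywhere (sorting, comparing, hasPrefix, ...), which is the reason for the normalization dance.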
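To make the "already sorted by the same key" requirement concrete, here is a small debugging sketch. KeysAreSorted is a hypothetical helper, not something the original answer defines; it only checks that adjacent keys are in non-descending order under compare:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical debug helper: returns YES when every adjacent pair in `keys`
// is in non-descending order according to -[NSString compare:].
static BOOL KeysAreSorted(NSArray<NSString *> *keys) {
    for (NSUInteger i = 1; i < keys.count; i++) {
        if ([keys[i - 1] compare:keys[i]] == NSOrderedDescending) {
            return NO;
        }
    }
    return YES;
}
```

For example, the pair "Zlín", "brno" is sorted under a plain compare: (uppercase Z orders before lowercase b), but the lowercased pair "zlín", "brno" is not. An array sorted by readableName can therefore be unsorted from the point of view of a comparator that lowercases, which is exactly the undefined-result case the documentation warns about.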

Let's create a sample array, shuffle it, and sort it.

NSArray *shuffledCities = [@[
    [City cityWithName:@"Čáslav"],
    [City cityWithName:@"Čelákovice"],
    [City cityWithName:@"Černošice"],
    [City cityWithName:@"Černošín"],
    [City cityWithName:@"Černovice"],
    [City cityWithName:@"Červená Řečice"],
    [City cityWithName:@"Červený Kostelec"],
    [City cityWithName:@"Česká Kamenice"],
    [City cityWithName:@"Česká Lípa"],
    [City cityWithName:@"Česká Skalice"],
    [City cityWithName:@"Česká Třebová"],
    [City cityWithName:@"České Budějovice"],
    [City cityWithName:@"České Velenice"],
    [City cityWithName:@"Český Brod"],
    [City cityWithName:@"Český Dub"],
    [City cityWithName:@"Český Krumlov"],
    [City cityWithName:@"Český Těšín"],
    [City cityWithName:@"Chodová Planá"]
] shuffledArray]; // It's from the GameplayKit.framework

NSArray *sortedCities = [shuffledCities sortedArrayUsingComparator:^NSComparisonResult(City *_Nonnull city1, City *_Nonnull city2) {
    return [city1.normalizedReadableName compare:city2.normalizedReadableName];
}];

The important point here is that we sort by the normalizedReadableName property.

Let's assume prefix is a parameter of your function. It has to be normalized as well ...

NSString *prefix = @"čEsKÝ dub";
NSString *normalizedPrefix = [prefix normalized];

... otherwise our comparator won't work:

NSComparisonResult (^comparator)(id  _Nonnull, id  _Nonnull) = ^(id _Nonnull obj1, id  _Nonnull obj2) {
    // One has to be City and another one NSString
    assert([obj1 isKindOfClass:[NSString class]] || [obj2 isKindOfClass:[NSString class]]);
    assert([obj1 isKindOfClass:[City class]] || [obj2 isKindOfClass:[City class]]);

    
    if ([obj1 isKindOfClass:[City class]]) {
        return [[obj1 normalizedReadableName] hasPrefix:obj2] ? NSOrderedSame : [[obj1 normalizedReadableName] compare:obj2];
    } else {
        return [[obj2 normalizedReadableName] hasPrefix:obj1] ? NSOrderedSame : [obj1 compare:[obj2 normalizedReadableName]];
    }
};

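Another problem I see is that your comparator is wrong when obj2 is the City. The comparator is expected to return the result of [obj1 compare:obj2], but in that case yours returns [obj2 compare:obj1] (obj2 is the City and obj1 is the NSString).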
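The orientation rule can also be expressed by always computing "element relative to prefix" once and inverting the result when the element arrives as the second argument. The following is only a sketch of that idea: Inverted and MakePrefixComparator are hypothetical names, and the element class and key block are parameters so that the example stays self-contained.

```objc
#import <Foundation/Foundation.h>

// Flips NSOrderedAscending <-> NSOrderedDescending; NSOrderedSame is unchanged.
static NSComparisonResult Inverted(NSComparisonResult result) {
    return (NSComparisonResult)(-(NSInteger)result);
}

// Hypothetical factory: builds a comparator that accepts (element, prefix) or
// (prefix, element). `keyOf` extracts the already-normalized key of an element.
static NSComparator MakePrefixComparator(Class elementClass, NSString *(^keyOf)(id element)) {
    return ^NSComparisonResult(id obj1, id obj2) {
        BOOL elementIsFirst = [obj1 isKindOfClass:elementClass];
        NSString *key = keyOf(elementIsFirst ? obj1 : obj2);
        NSString *prefix = (NSString *)(elementIsFirst ? obj2 : obj1);
        // Always computed as "element relative to prefix".
        NSComparisonResult elementVsPrefix =
            [key hasPrefix:prefix] ? NSOrderedSame : [key compare:prefix];
        // If the prefix came first, the caller asked for "prefix relative to element".
        return elementIsFirst ? elementVsPrefix : Inverted(elementVsPrefix);
    };
}
```

With the City class from this answer, the call would presumably look like MakePrefixComparator([City class], ^NSString *(id city) { return [(City *)city normalizedReadableName]; }). The approach relies on the elements and the prefix being of different classes, as they are here.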

Now that the comparator is fixed, let's search for the first matching city:

NSUInteger first = [sortedCities indexOfObject:normalizedPrefix
                                 inSortedRange:NSMakeRange(0, sortedCities.count)
                                       options:NSBinarySearchingFirstEqual
                               usingComparator:comparator];

if (first == NSNotFound) {
    NSLog(@"Prefix \"%@\" not found", prefix);
    return;
}

If it is found, search for the last one:

NSUInteger last = [sortedCities indexOfObject:normalizedPrefix
                                inSortedRange:NSMakeRange(first, sortedCities.count - first)
                                      options:NSBinarySearchingLastEqual
                              usingComparator:comparator];

// Shouldn't happen as our search range includes the first one
assert(last != NSNotFound);

NSLog(@"Prefix \"%@\" found", prefix);
NSLog(@" - First %lu: \"%@\"", (unsigned long)first, [sortedCities[first] readableName]);
NSLog(@" - Last %lu: \"%@\"", (unsigned long)last, [sortedCities[last] readableName]);

Sample output

All results are correct.

Prefix "čEsKÝ dub" found
 - First 14: "Český Dub"
 - Last 14: "Český Dub"
Prefix "Praha" not found
Prefix "ceskÝ" found
 - First 13: "Český Brod"
 - Last 16: "Český Těšín"
Prefix "cernos" found
 - First 2: "Černošice"
 - Last 3: "Černošín"
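The two searches above can be folded into one helper that returns an NSRange, which is roughly what the question's findRangeHeadAndTailForPrefix:inCityArray: was after. This is a sketch under assumptions: RangeOfEqualObjects is a hypothetical name, and the comparator is passed in so that the block works with any sorted array.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the range of elements for which `comparator`
// reports NSOrderedSame against `target`, or {NSNotFound, 0} if there is none.
// `sorted` must already be ordered consistently with `comparator`.
static NSRange RangeOfEqualObjects(NSArray *sorted, id target, NSComparator comparator) {
    NSUInteger first = [sorted indexOfObject:target
                               inSortedRange:NSMakeRange(0, sorted.count)
                                     options:NSBinarySearchingFirstEqual
                             usingComparator:comparator];
    if (first == NSNotFound) {
        return NSMakeRange(NSNotFound, 0);
    }
    // The last match cannot be before the first one, so narrow the range.
    NSUInteger last = [sorted indexOfObject:target
                              inSortedRange:NSMakeRange(first, sorted.count - first)
                                    options:NSBinarySearchingLastEqual
                            usingComparator:comparator];
    return NSMakeRange(first, last - first + 1);
}
```

With the names used in this answer, RangeOfEqualObjects(sortedCities, normalizedPrefix, comparator) would give a range that can be passed to subarrayWithRange:, after checking that range.location is not NSNotFound.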