Find a Nearly Matching Phrase in a List of Strings

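I want to search a list of strings where my lookup phrase is sometimes only about a 70% match for an entry in the list, but I still want to treat it as found. In the code below, my lookup phrase is "xxx in the middle sample xxx". If I use Contains or Any, I get no results. From my lookup phrase, I want to find the entries that contain the words "in the middle" (case-insensitive). I would prefer a match on 3 or more consecutive words, such as "in the middle". Please help.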

C#

static void Main(string[] args)
{
    var wordListToLookUp = new List<string> { "In the middle", "This is a sample text in the middle", "There is secret in the middle of the forest", "None of your business" };
    var lookupWord = "xxx in the middle sample xxx";
    foreach (var word in wordListToLookUp)
    {
        var exist = word.Contains(lookupWord);
        // Even if my lookup phrase is only a 70% or near match, I would like to count it as found
        Console.WriteLine("Found match: {0}", exist);
    }
    Console.ReadLine();
}

Output

Found match: False
Found match: False
Found match: False
Found match: False

Expected output

Found match: True
Found match: True
Found match: True
Found match: False

You can achieve this by using the substring "in the middle" as the lookup value and then checking whether each string in the list contains that substring.

Try this:

var lookupWord = "in the middle";
var exist = word.ToLower().Contains(lookupWord.ToLower());

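Whenever you compare strings, it helps to convert both sides to lowercase (or uppercase) first, so that you don't run into case-sensitivity problems.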
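As a small sketch of the same idea without creating lower-cased copies, String.IndexOf accepts a StringComparison argument. The class and method names here (CaseInsensitiveDemo, ContainsIgnoreCase) are made up for illustration, and this still assumes the shorter phrase "in the middle" is used as the lookup value, as in the snippet above.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveDemo
{
    // Hypothetical helper: true when 'text' contains 'phrase', ignoring case.
    public static bool ContainsIgnoreCase(string text, string phrase)
    {
        return text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        var wordListToLookUp = new List<string>
        {
            "In the middle",
            "This is a sample text in the middle",
            "There is secret in the middle of the forest",
            "None of your business"
        };
        var lookupWord = "in the middle";

        foreach (var word in wordListToLookUp)
        {
            Console.WriteLine("Found match: {0}", ContainsIgnoreCase(word, lookupWord));
        }
    }
}
```

With the four sample strings from the question, this should report a match for the first three entries and no match for the last one.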

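I think I have found a solution for you. The code is not optimized, but you can optimize it yourself. It produces exactly the result you asked for. Here is my code: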

static void Main(string[] args)
{
    bool exist = false;
    var wordListToLookUp = new List<string> { "In the middle", "This is a sample text in the middle", "There is secret in the middle of the forest", "None of your business" };
    var lookupWord = "xxx in the middle sample xxx";
    // Split the lower-cased lookup phrase into words (requires System.Linq for ToList)
    List<string> checkerarrary = lookupWord.ToLower().Split(' ').ToList();
    foreach (var word in wordListToLookUp)
    {
        exist = false;
        List<string> currentStringarrary = word.ToLower().Split(' ').ToList();

        // Both phrases need at least 3 words to share a run of 3 words
        if (checkerarrary.Count >= 3 && currentStringarrary.Count >= 3)
        {
            // Compare every 3-word window of the lookup phrase with every 3-word window of the entry
            for (int i = 0; i <= checkerarrary.Count - 3; i++)
            {
                for (int c = 0; c <= currentStringarrary.Count - 3; c++)
                {
                    if (checkerarrary[i] == currentStringarrary[c]
                        && checkerarrary[i + 1] == currentStringarrary[c + 1]
                        && checkerarrary[i + 2] == currentStringarrary[c + 2])
                    {
                        exist = true;
                    }
                }
            }
        }
        Console.WriteLine("Found match: {0}", exist);
    }
}

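Note: I have required at least 3 matching consecutive words from the search text. You can adjust this as needed. Please check the code and let me know.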
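One way to make that minimum adjustable, sketched here under my own assumptions rather than as part of the answer's code, is to pull the window comparison into a helper that takes the required run length as a parameter. The names PhraseMatcher, SharesWordRun and minWords are hypothetical, and the sketch splits on single spaces only, like the code above.

```csharp
using System;
using System.Collections.Generic;

public static class PhraseMatcher
{
    // Hypothetical helper: true when 'lookup' and 'candidate' share at least
    // 'minWords' consecutive words, compared case-insensitively.
    public static bool SharesWordRun(string lookup, string candidate, int minWords = 3)
    {
        if (minWords < 1) throw new ArgumentOutOfRangeException(nameof(minWords));

        var separators = new[] { ' ' };
        string[] a = lookup.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] b = candidate.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // Try every start position in both phrases that leaves room for minWords words.
        for (int i = 0; i + minWords <= a.Length; i++)
        {
            for (int c = 0; c + minWords <= b.Length; c++)
            {
                int run = 0;
                while (run < minWords &&
                       string.Equals(a[i + run], b[c + run], StringComparison.OrdinalIgnoreCase))
                {
                    run++;
                }
                if (run == minWords) return true; // stop at the first shared run
            }
        }
        return false;
    }

    public static void Main()
    {
        var wordListToLookUp = new List<string>
        {
            "In the middle",
            "This is a sample text in the middle",
            "There is secret in the middle of the forest",
            "None of your business"
        };
        var lookupWord = "xxx in the middle sample xxx";

        foreach (var word in wordListToLookUp)
        {
            Console.WriteLine("Found match: {0}", SharesWordRun(lookupWord, word));
        }
    }
}
```

Returning as soon as a shared run is found avoids scanning the remaining windows, and passing a different minWords (for example 2 or 4) changes how strict the match is.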