Django 3: Filter queryset for letter or non-letter

In my database I have a table whose titles can start with either a letter or a non-letter character, e.g. a digit, "@" or "#". The model looks like this:

from django.db import models

class Items(models.Model):
    title = models.CharField(max_length=255)
    body = models.TextField()

In my view, I'd like to split the model into two objects: one containing all items whose title starts with a letter, and one containing all the other items:

class ItemsView(TemplateView):
    template_name = "index.html"

    def get_context_data(self, **kwargs):
        alpha_list = Items.objects.filter(title__startswith=<a letter>)
        other_list = Items.objects.filter(title__startswith=<not a letter>)

        context = {
            "list_a": alpha_list,
            "list_b": other_list
        }

        return context

I've been searching the docs, Stack Overflow and the almighty Google, but haven't found a solution so far.

Any help is much appreciated.

You can use the __regex lookup with filter (use regex101.com to test your regex), and exclude with the same regex to find the other items that do not start with a letter.

Starting with a letter:

alpha_list = Items.objects.filter(title__regex=r'^[a-zA-Z].*$')

Everything else:

other_list = Items.objects.exclude(title__regex=r'^[a-zA-Z].*$')
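To see which titles end up in which list, here is a small pure-Python sketch that mimics the filter()/exclude() pair with Python's re module. It is only an approximation: the database evaluates __regex with its own regex engine, so behaviour on edge cases may differ. split_titles is a hypothetical helper for illustration, not part of Django.

```python
import re

# The same pattern the answer passes to title__regex.
LETTER_START = re.compile(r'^[a-zA-Z].*$')


def split_titles(titles):
    """Hypothetical helper mimicking filter()/exclude() on a list of strings.

    Every title lands in exactly one of the two lists.
    """
    alpha = [t for t in titles if LETTER_START.search(t)]
    other = [t for t in titles if not LETTER_START.search(t)]
    return alpha, other


titles = ["apple", "Zebra", "42 things", "@home", "#tag", ""]
alpha, other = split_titles(titles)
print(alpha)  # ['apple', 'Zebra']
print(other)  # ['42 things', '@home', '#tag', '']
```

Because exclude uses exactly the same pattern as filter, the two lists are complementary; note that an empty title lands in the "other" list.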

Explanation:

/^[a-zA-Z].*$/

^ asserts position at the start of the string

[a-zA-Z] matches a single character from the following ranges:

a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)

A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)

. matches any character (except for line terminators)

* quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed

$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)

Since only the first character matters, you can simplify the regex slightly to:

# starts with A-Z or a-z
Items.objects.filter(title__regex=r'^[A-Za-z]')

# does not start with A-Z or a-z
Items.objects.exclude(title__regex=r'^[A-Za-z]')
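A quick check with Python's re module that the shorter pattern makes the same decision as the longer one for ordinary titles. The sample titles are made up, and Python's re is used here only as a stand-in: database regex engines may treat "." and newlines differently.

```python
import re

LONG = re.compile(r'^[a-zA-Z].*$')   # original pattern
SHORT = re.compile(r'^[A-Za-z]')     # simplified pattern

samples = ["apple", "Zebra", "42 things", "@home", "#tag", ""]

# Only the first character is really tested, so both agree on these titles.
same = all(bool(LONG.search(s)) == bool(SHORT.search(s)) for s in samples)
print(same)  # True

# They can disagree on a title with an embedded newline: in Python's re,
# '.' does not match '\n' and '$' only matches at the very end (or before a
# final newline), so the long pattern fails here while the short one matches.
print(bool(LONG.search("A\nB")), bool(SHORT.search("A\nB")))  # False True
```

Titles stored in a CharField rarely contain newlines, but the shorter pattern also avoids this corner case entirely.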

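But the more important question is, of course, what counts as a letter here. This pattern will not match non-Latin characters such as Cyrillic, Arabic, etc. It also won't match characters with diacritics such as äöüßÄÖÜ. You can add extra characters or character ranges to the character class ([A-Za-z…]) to handle additional characters.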
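To make the gap concrete, this sketch compares the ASCII-only regex with Python's str.isalpha(), which uses Unicode's notion of a letter. The titles are invented examples, and isalpha() is only a Python-side yardstick here, not something the database uses.

```python
import re

ASCII_LETTER_START = re.compile(r'^[A-Za-z]')

titles = ["apple", "Äpfel", "Straße", "Ёлка", "عربي", "42", "@home"]

# Titles whose first character is a Unicode letter but that the
# ASCII-only regex would still put into the "other" list.
missed = [t for t in titles
          if t[:1].isalpha() and not ASCII_LETTER_START.search(t)]
print(missed)  # ['Äpfel', 'Ёлка', 'عربي']
```

If the table is small, one could instead fetch the rows and partition them in Python with title[:1].isalpha(); that swaps the database filter for a Unicode-aware check in Python, at the cost of loading every row.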

For example, you can add the Latin-1 Supplement [wiki], Latin Extended-A [wiki], Latin Extended-B [wiki] and Latin Extended Additional [wiki] blocks as well:

# starts with a Latin character (no raw string: Python turns the \u escapes into the actual characters)
Items.objects.filter(title__regex='^[A-Za-z\u00C0-\u024F\u1E00-\u1EFF]')
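Here is a sketch that checks the extended character class against a few sample titles using Python's re. The pattern is written as a normal (non-raw) string, so Python resolves each \uXXXX escape before the pattern reaches the regex engine; that is an assumption made to avoid depending on whether a given database's regex dialect understands \u escapes.

```python
import re

# Non-raw string: '\u00C0' etc. become the real characters at parse time.
LATIN_START = re.compile('^[A-Za-z\u00C0-\u024F\u1E00-\u1EFF]')

titles = [
    "Äpfel",      # U+00C4, Latin-1 Supplement
    "Łódź",       # U+0141, Latin Extended-A
    "\u1EA0nh",   # U+1EA0, Latin Extended Additional
    "Ёлка",       # Cyrillic, outside all listed ranges
    "×3",         # U+00D7 MULTIPLICATION SIGN, inside the Latin-1 range
    "42",
]
results = {t: bool(LATIN_START.search(t)) for t in titles}
for title, matched in results.items():
    print(repr(title), matched)
```

Note that the Latin-1 Supplement range also contains two non-letters, × (U+00D7) and ÷ (U+00F7). If that matters, the first range could be split into \u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F.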

But we may also want to add Arabic, Cyrillic and other characters.
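One way to keep a growing list of scripts manageable is to build the character class from code point ranges. starts_with_class is a hypothetical helper, and the Cyrillic (U+0400 to U+04FF) and Arabic (U+0600 to U+06FF) block boundaries are assumptions to adjust as needed; the Arabic block, for instance, also contains Arabic-Indic digits and punctuation, so it is broader than "letters".

```python
import re


def starts_with_class(ranges):
    """Hypothetical helper: build '^[...]' from (start, end) code point pairs."""
    parts = [f"{chr(start)}-{chr(end)}" for start, end in ranges]
    return "^[" + "".join(parts) + "]"


RANGES = [
    (0x41, 0x5A),      # A-Z
    (0x61, 0x7A),      # a-z
    (0xC0, 0x24F),     # Latin-1 Supplement through Latin Extended-B
    (0x1E00, 0x1EFF),  # Latin Extended Additional
    (0x0400, 0x04FF),  # Cyrillic block (assumed addition)
    (0x0600, 0x06FF),  # Arabic block (assumed addition; includes digits)
]

pattern = starts_with_class(RANGES)
# In Django this string could then be used as:
#   Items.objects.filter(title__regex=pattern)

check = re.compile(pattern)
results = {t: bool(check.search(t))
           for t in ["Ёлка", "عربي", "Äpfel", "42", "@home"]}
print(results)
```

The generated pattern contains the literal characters rather than escapes, which sidesteps differences in escape syntax between regex dialects; whether a particular database handles multi-byte characters in character classes correctly is worth testing on the actual backend.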