Why does `'{x[1:3]}'.format(x="asd")` cause a TypeError?

Question

Consider this:

>>> '{x[1]}'.format(x="asd")
's'
>>> '{x[1:3]}'.format(x="asd")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

What could be the reason for this behavior?

Answer 1

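The [1] in '{x[1]}'.format(x="asd") is not the "normal" string indexing syntax, even though it happens to work the same way in this case.

It uses the Format String Syntax, the same mechanism that lets you pass an object and access an arbitrary attribute inside the format string (e.g. '{x.name}'.format(x=some_object)).

This "fake" indexing syntax also lets you pass an indexable object to format and pull the desired element directly from within the format string: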

'{x[0]}'.format(x=('a', 'tuple'))
# 'a'
'{x[1]}'.format(x=('a', 'tuple'))
# 'tuple'

The only reference to this in the docs (at least that I could find) is this paragraph:

The field_name itself begins with an arg_name that is either a number or a keyword. If it’s a number, it refers to a positional argument, and if it’s a keyword, it refers to a named keyword argument. If the numerical arg_names in a format string are 0, 1, 2, … in sequence, they can all be omitted (not just some) and the numbers 0, 1, 2, … will be automatically inserted in that order. Because arg_name is not quote-delimited, it is not possible to specify arbitrary dictionary keys (e.g., the strings '10' or ':-]') within a format string. The arg_name can be followed by any number of index or attribute expressions. An expression of the form '.name' selects the named attribute using getattr(), while an expression of the form '[index]' does an index lookup using __getitem__().

While it mentions

while an expression of the form '[index]' does an index lookup using __getitem__().

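it doesn't say anything about slice syntax not being supported.

To me this feels like an oversight in the docs, especially because '{x[1:3]}'.format(x="asd") produces such a cryptic error message, and even more so because __getitem__ already supports slicing.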
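If you just need the slice in the output, a minimal sketch of two workarounds follows. Both move the slicing out of the format-string mini-syntax and into ordinary Python code; the variable names are only for illustration.

```python
x = "asd"

# Slice first, then format: the slice is evaluated by normal Python code.
sliced_first = '{x}'.format(x=x[1:3])

# f-strings evaluate a real Python expression inside the braces,
# so slice syntax works there.
via_fstring = f'{x[1:3]}'

print(sliced_first)  # sd
print(via_fstring)   # sd
```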

Answer 2

An experiment checking what value the object's __getitem__ method actually receives:

class C:
    def __getitem__(self, index):
        print(repr(index))

'{c[4]}'.format(c=C())
'{c[4:6]}'.format(c=C())
'{c[anything goes!@#$%^&]}'.format(c=C())
C()[4:6]

Output:

4
'4:6'
'anything goes!@#$%^&'
slice(4, 6, None)

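So while 4 gets converted to an int, 4:6 isn't converted to slice(4, 6, None) as it would be in ordinary slicing. Instead, it simply remains the string '4:6'. That isn't a valid type for indexing or slicing a string, hence the TypeError: string indices must be integers you got.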
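The same thing can be seen without a custom class. Here is a small sketch using a dict (the dict and its keys are made up for illustration) that holds an int key and string keys at the same time, so the result shows which kind of key format() actually looks up:

```python
# A dict with an int key and string keys, to see which one gets looked up.
d = {1: 'int key', '1': 'str key', '1:3': 'slice-like str key'}

digits_result = '{d[1]}'.format(d=d)   # all digits: looked up as the int 1
other_result = '{d[1:3]}'.format(d=d)  # anything else: looked up as the str '1:3'

print(digits_result)  # int key
print(other_result)   # slice-like str key
```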

Update:

Is it documented? Well... I don't see anything really clear, but @GACy20 pointed out something subtle. The grammar has these rules:

field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
element_index     ::=  digit+ | index_string
index_string      ::=  <any source character except "]"> +

Our c[4:6] is the field_name, and we're interested in the element_index part, 4:6. I think it would be clearer if digit+ had its own rule with a meaningful name:

field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
element_index     ::=  index_integer | index_string
index_integer     ::=  digit+
index_string      ::=  <any source character except "]"> +

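I'd say having index_integer and index_string would more clearly indicate that digit+ is converted to an integer (instead of staying a digit string), while <any source character except "]"> + would stay a string.

That said, looking at the rules as they are, perhaps we should think "what would be the point of separating the digits case out of the any-characters case?" and conclude that the point is to treat pure digits differently, presumably by converting them to an integer. Or maybe some other part of the documentation even states that digit or digit+ in general gets converted to an integer.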
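If you really want slices inside the format string, one option is to subclass string.Formatter and do the index conversion yourself. The following is only a rough sketch, not how str.format works internally: SliceFormatter and its regex are hypothetical, it handles named or numbered arguments only (no auto-numbering like '{[1:3]}'), and only string.Formatter, get_field and get_value are real standard-library names.

```python
import re
import string


class SliceFormatter(string.Formatter):
    """Hypothetical formatter that turns '[a:b]' / '[a:b:c]' into real slices."""

    # Matches either '.attribute' or '[index]' after the argument name.
    _part = re.compile(r'\.([^.\[]+)|\[([^\]]+)\]')

    def get_field(self, field_name, args, kwargs):
        # Split the leading arg_name from the trailing .attr / [index] parts.
        match = re.match(r'[^.\[]*', field_name)
        first, rest = match.group(), field_name[match.end():]
        key = int(first) if first.isdigit() else first
        obj = self.get_value(key, args, kwargs)

        for attr, index in self._part.findall(rest):
            if attr:
                obj = getattr(obj, attr)
            elif ':' in index:
                # Assumption: every non-empty slice part is an integer literal.
                parts = [int(p) if p else None for p in index.split(':')]
                obj = obj[slice(*parts)]
            elif index.isdigit():
                obj = obj[int(index)]   # digits become an int, as in str.format
            else:
                obj = obj[index]        # everything else stays a string key
        return obj, key


fmt = SliceFormatter()
print(fmt.format('{x[1:3]}', x="asd"))   # sd
print(fmt.format('{x[1]}', x="asd"))     # s
print(fmt.format('{x[::-1]}', x="asd"))  # dsa
```

Whether this is worth it over simply slicing before formatting is debatable; it mostly serves to show that the digits-versus-string distinction lives in the field lookup step, not in __getitem__.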

python

typeerror

f-string