python : Using regex to get episode

The code below works, except for the following filename:

Terkel in Trouble 2004

It should return 'null', but instead the match returns 'e 200', because of these two parts of the pattern:

      e|x|episode|Ep|^        

    (\d{2,3})                 

How can I prevent this?

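import re

def getEpisode(filename):
    match = re.search(
        r'''(?ix)                 # ignore case, verbose mode
        (?:
          e|x|episode|Ep|^        # prefix: e, x, episode, Ep, or start of string
          )
        \s*                       # optional whitespace
        (\d{2,3})                 # group 1: 2-3 digits
        ''', filename)
    if match:
        print(match)
        return match.group(1)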
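For reference, here is a minimal reproduction of the unwanted match, using the pattern exactly as posted. The name `PATTERN` is only for illustration. The `e` at the end of "Trouble" satisfies the prefix alternative, `\s*` consumes the space, and `\d{2,3}` then takes the first three digits of `2004`.

```python
import re

# The question's pattern, unchanged: a prefix (or start of string),
# optional whitespace, then 2-3 digits.
PATTERN = re.compile(r'''(?ix)
    (?: e|x|episode|Ep|^ )
    \s*
    (\d{2,3})
    ''')

match = PATTERN.search("Terkel in Trouble 2004")
print(repr(match.group(0)))  # prints 'e 200' (the full match)
print(match.group(1))        # prints 200 (the captured digits)
```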


**EDIT:**
    test = (
    "0x01 GdG LO Star Lord  Part 1",             #1 
    "S01E01 GdG  Verso Nowhere",                 #2 
    "Wacky Races Episode 20 X264 Ac3",           #3
    "Terkel in Trouble 2004",                    #4 return None, it's ok
    "Yu Yu Hakusho  Ep 100  secret",             #5
    "Kakegurui S1 Ep11 La donna che scommette",  #6
    "Kakegurui S1 Ep12 La donna che gioca",      #7
    "ep 01 wolf's rain",                         #8
    "Toradora! 08"                               #9
)

Try using a word boundary, \b.

Updated regex:

\b(?:e(?:p(?:isode)?)?|0x|S\d\dE)?\s*?(\d{2,3})\b

Results

1 ->  0x01
2 ->  S01E01
3 ->  Episode 20
4 ->  
5 ->  Ep 100
6 ->  Ep11
7 ->  Ep12
8 ->  ep 01
9 ->  08
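The results above list the full match for each filename. A small sketch of how this pattern could be wrapped so that it returns only the episode digits, or None when nothing matches, might look like the following. It assumes the pattern is applied with `re.IGNORECASE` (the answer does not state the flag, but "Episode 20" and "Ep 100" only match in full that way). The names `EPISODE_RE` and `get_episode` are illustrative.

```python
import re

# Assumption: case-insensitive matching, so "Ep", "EP" and "Episode" all count.
EPISODE_RE = re.compile(
    r"\b(?:e(?:p(?:isode)?)?|0x|S\d\dE)?\s*?(\d{2,3})\b",
    re.IGNORECASE,
)

def get_episode(filename):
    """Return the episode digits as a string, or None if nothing matches."""
    match = EPISODE_RE.search(filename)
    return match.group(1) if match else None

tests = (
    "0x01 GdG LO Star Lord  Part 1",
    "S01E01 GdG  Verso Nowhere",
    "Wacky Races Episode 20 X264 Ac3",
    "Terkel in Trouble 2004",
    "Yu Yu Hakusho  Ep 100  secret",
    "Kakegurui S1 Ep11 La donna che scommette",
    "Kakegurui S1 Ep12 La donna che gioca",
    "ep 01 wolf's rain",
    "Toradora! 08",
)

for number, name in enumerate(tests, start=1):
    print(number, "->", get_episode(name))
```

The trailing `\b` is what rejects `2004`: after two or three digits the next character is still a digit, so there is no word boundary at that point.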

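(Note that the code in the question, as posted, is not indented correctly.)

You can restructure the alternation (|) a little and then use word boundaries around the whole pattern. The example using re.search returns the first location where the pattern matches.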

\b(?:e(?:p(?:isode)?)?|x)?\s*(\d{2,3})\b

The pattern matches:

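  • \b A word boundary
  • (?: Non-capturing group
    • e(?:p(?:isode)?)? Match e, optionally followed by p, optionally followed by isode
    • | Or
    • x Match x
  • )? Close the non-capturing group and make it optional
  • \s* Match optional whitespace characters
  • (\d{2,3}) Capture 2-3 digits in group 1
  • \b A word boundary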
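The breakdown above can also be written into the pattern itself with `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the regex. This is only a sketch of the same pattern in a different layout; the name `EPISODE_VERBOSE` is illustrative.

```python
import re

EPISODE_VERBOSE = re.compile(r"""
    \b                      # word boundary
    (?:                     # non-capturing group
        e(?:p(?:isode)?)?   #   e, ep, or episode
        |                   #   or
        x                   #   x
    )?                      # close the group and make it optional
    \s*                     # optional whitespace
    (\d{2,3})               # group 1: 2-3 digits
    \b                      # word boundary
    """, re.VERBOSE | re.IGNORECASE)

print(EPISODE_VERBOSE.search("EP999 this is x888").group(1))  # prints 999
print(EPISODE_VERBOSE.search("Terkel in Trouble 2004"))       # prints None
```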


import re

def getEpisode(filename):
    return re.search(r"\b(?:e(?:p(?:isode)?)?|x)?\s*(\d{2,3})\b", filename, re.IGNORECASE)

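episodes = [
    "Terkel in Trouble 2004",  # no match: 2004 has four digits
    "eisode11",                # no match: not a valid prefix, no boundary before 11
    "episode12",
    "e13",
    "ep14",
    "EP999 this is x888",      # re.search returns the first match, 999
    "  234",
    "235"
]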
for episode in episodes:
    match = getEpisode(episode)
    if match:
        print(match.group(1))

Output

12
13
14
999
234
235
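As a cross-check, the sketch below applies this pattern to the filenames from the question's EDIT. By my reading it returns None for entries 1 and 2 as well as entry 4: in `0x01` and `S01E01` there is no word boundary directly before the `x`/`E` prefix or before the digits, so the leading `\b` never lines up. The helper name `episode_or_none` is illustrative.

```python
import re

PATTERN = re.compile(r"\b(?:e(?:p(?:isode)?)?|x)?\s*(\d{2,3})\b", re.IGNORECASE)

def episode_or_none(filename):
    """Return the captured digits, or None when the pattern does not match."""
    match = PATTERN.search(filename)
    return match.group(1) if match else None

question_tests = (
    "0x01 GdG LO Star Lord  Part 1",
    "S01E01 GdG  Verso Nowhere",
    "Wacky Races Episode 20 X264 Ac3",
    "Terkel in Trouble 2004",
    "Yu Yu Hakusho  Ep 100  secret",
    "Kakegurui S1 Ep11 La donna che scommette",
    "Kakegurui S1 Ep12 La donna che gioca",
    "ep 01 wolf's rain",
    "Toradora! 08",
)

for number, name in enumerate(question_tests, start=1):
    print(number, "->", episode_or_none(name))
```

If the `0x01` and `S01E01` forms matter, the alternation would need explicit branches for them, as in the `0x|S\d\dE` variant.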