Python: using a regex to get the episode number
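This code works, except for a filename like this one:
Terkel in Trouble 2004
It should return 'null' (None), but instead the match is 'e 200': the final "e" of "Trouble", the space, and the first three digits of "2004". This happens because of:
e|x|episode|Ep|^
and
(\d{2,3})
How can I prevent this?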
import re

def getEpisode(filename):
    match = re.search(
        r'''(?ix)
        (?:
            e|x|episode|Ep|^
        )
        \s*
        (\d{2,3})
        ''', filename)
    if match:
        print(match)
        return match.group(1)
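To make the problem concrete, here is a minimal sketch that runs the pattern from the question against the problematic filename. The names `ORIGINAL` and `m` are only illustrative and do not appear in the original post.

```python
import re

# The pattern from the question, unchanged (verbose + case-insensitive).
ORIGINAL = re.compile(r"""(?ix)
    (?:
        e|x|episode|Ep|^
    )
    \s*
    (\d{2,3})
""")

m = ORIGINAL.search("Terkel in Trouble 2004")
print(repr(m.group(0)))  # 'e 200'  (the "e" of "Trouble", a space, then "200")
print(m.group(1))        # 200
```

Nothing in the pattern requires the prefix to start a word or the digits to end one, so any "e" followed by digits qualifies.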
**EDIT:**
test = (
    "0x01 GdG LO Star Lord Part 1",              # 1
    "S01E01 GdG Verso Nowhere",                  # 2
    "Wacky Races Episode 20 X264 Ac3",           # 3
    "Terkel in Trouble 2004",                    # 4: returns None, which is correct
    "Yu Yu Hakusho Ep 100 secret",               # 5
    "Kakegurui S1 Ep11 La donna che scommette",  # 6
    "Kakegurui S1 Ep12 La donna che gioca",      # 7
    "ep 01 wolf's rain",                         # 8
    "Toradora! 08",                              # 9
)
I tried using the word boundary \b.
Updated regex:
\b(?:e(?:p(?:isode)?)?|0x|S\d\dE)?\s*?(\d{2,3})\b
Result:
1 -> 0x01
2 -> S01E01
3 -> Episode 20
4 ->
5 -> Ep 100
6 -> Ep11
7 -> Ep12
8 -> ep 01
9 -> 08
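The results above can be reproduced with a short script. This is a sketch that assumes the updated regex is applied with `re.search` and the `re.IGNORECASE` flag (the post does not say which flags were used); `UPDATED` and `full_matches` are illustrative names.

```python
import re

# Updated regex from the question; IGNORECASE is an assumption.
UPDATED = re.compile(
    r"\b(?:e(?:p(?:isode)?)?|0x|S\d\dE)?\s*?(\d{2,3})\b",
    re.IGNORECASE,
)

test = (
    "0x01 GdG LO Star Lord Part 1",
    "S01E01 GdG Verso Nowhere",
    "Wacky Races Episode 20 X264 Ac3",
    "Terkel in Trouble 2004",
    "Yu Yu Hakusho Ep 100 secret",
    "Kakegurui S1 Ep11 La donna che scommette",
    "Kakegurui S1 Ep12 La donna che gioca",
    "ep 01 wolf's rain",
    "Toradora! 08",
)

# Collect the full matched text (group 0) for each filename, or None.
full_matches = []
for name in test:
    m = UPDATED.search(name)
    full_matches.append(m.group(0) if m else None)

for i, text in enumerate(full_matches, 1):
    print(f"{i} -> {text or ''}")
```

The listing shows the whole match (`group(0)`); the episode number itself is in `group(1)`.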
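(Note that the code in the question, as originally posted, was not indented correctly.)
You can refactor the alternation | slightly and then put word boundaries around the whole pattern. The example uses re.search, which returns only the first location where the pattern matches.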
\b(?:e(?:p(?:isode)?)?|x)?\s*(\d{2,3})\b
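The pattern matches:
- `\b` A word boundary
- `(?:` Non-capturing group
  - `e(?:p(?:isode)?)?` Match e, optionally followed by p, optionally followed by isode
  - `|` Or
  - `x` Match x
- `)?` Close the non-capturing group and make it optional
- `\s*` Match optional whitespace characters
- `(\d{2,3})` Capture 2-3 digits in group 1
- `\b` A word boundary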
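The same breakdown can be kept next to the pattern by writing it in verbose mode. This is only a sketch of an equivalent spelling: `EPISODE_RE` is an illustrative name, and `re.VERBOSE` ignores unescaped whitespace and treats `#` as the start of a comment.

```python
import re

EPISODE_RE = re.compile(r"""
    \b                      # word boundary
    (?:                     # non-capturing group
        e(?:p(?:isode)?)?   #   e, ep, or episode
        |                   #   or
        x                   #   x
    )?                      # the whole prefix is optional
    \s*                     # optional whitespace
    (\d{2,3})               # group 1: two or three digits
    \b                      # word boundary
""", re.IGNORECASE | re.VERBOSE)

print(EPISODE_RE.search("Wacky Races Episode 20 X264 Ac3").group(1))  # 20
print(EPISODE_RE.search("Terkel in Trouble 2004"))                    # None
```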
import re

def getEpisode(filename):
    # Returns a match object, or None when no episode number is found.
    return re.search(r"\b(?:e(?:p(?:isode)?)?|x)?\s*(\d{2,3})\b", filename, re.IGNORECASE)

episodes = [
    "Terkel in Trouble 2004",
    "eisode11",
    "episode12",
    "e13",
    "ep14",
    "EP999 this is x888",
    " 234",
    "235"
]

for episode in episodes:
    match = getEpisode(episode)
    if match:
        print(match.group(1))
Output:
12
13
14
999
234
235
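Because `re.search` stops at the first match, "EP999 this is x888" yields only 999. If every candidate in a filename is wanted, `re.findall` with the same pattern is one option. This is a sketch and not part of the original answer; `EPISODE_RE`, `first` and `all_numbers` are illustrative names.

```python
import re

EPISODE_RE = re.compile(r"\b(?:e(?:p(?:isode)?)?|x)?\s*(\d{2,3})\b", re.IGNORECASE)

name = "EP999 this is x888"

# search() returns only the leftmost match.
first = EPISODE_RE.search(name).group(1)

# With a single capture group, findall() returns the captured text of every match.
all_numbers = EPISODE_RE.findall(name)

print(first)        # 999
print(all_numbers)  # ['999', '888']
```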