Python: rearrange and remove characters from an HTML page title
I am running Python 2.7.11 on Windows 10 with beautifulsoup4 and lxml.
import urllib2
import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("http://www.daisuki.net/us/en/anime/watch.GUNDAMUNICORNRE0096.13142.html"), "lxml")
Name = soup.title.string
print(Name.replace('#', ""))
Output:
01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI
Desired output:
MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096
How would I remove the trailing "- DAISUKI" and reorder the string?
Split on " - " and rearrange the parts of the title:
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup(urllib2.urlopen("http://www.daisuki.net/us/en/anime/watch.GUNDAMUNICORNRE0096.13142.html"), "lxml")
>>> Name = soup.title.string
>>>
>>> " - ".join(Name.replace('#', "").split(" - ")[1::-1])
u'MOBILE SUIT GUNDAM UNICORN RE:0096 - 01 DEPARTURE 0096'
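The one-liner above can be unpacked into a small function, which makes the slice easier to follow. This is only a sketch: the name reorder_title and the sep parameter are made up for illustration, and the title is hard-coded so the example runs without fetching the page.

```python
def reorder_title(title, sep=" - "):
    """Turn 'EPISODE - SERIES - SITE' into 'SERIES - EPISODE'.

    reorder_title and sep are hypothetical names, not part of any library.
    """
    parts = title.replace("#", "").split(sep)
    # [1::-1] starts at index 1 and steps backwards, so it yields
    # parts[1] followed by parts[0]; index 2 onward (the site name) is dropped.
    return sep.join(parts[1::-1])


title = "01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"
print(reorder_title(title))
```

If the title has no separator at all, the slice simply returns the single part, so the function hands the title back unchanged instead of raising an error.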
Hacky solution incoming:
Name = "01 DEPARTURE 0096 - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"
print("- ".join(reversed(Name.split('-')[:2])).strip())
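One reason this counts as hacky: splitting on a bare '-' also cuts through any hyphen inside a title part. The comparison below is a rough illustration using an invented title (not one taken from the site) to show where the two approaches diverge.

```python
# Hypothetical title with a hyphen inside the episode name.
title = "01 RE-ENTRY - MOBILE SUIT GUNDAM UNICORN RE:0096 - DAISUKI"

# Splitting on '-' alone also breaks "RE-ENTRY" apart.
hacky = "- ".join(reversed(title.split('-')[:2])).strip()

# Splitting on ' - ' (with the surrounding spaces) keeps hyphenated words intact.
robust = " - ".join(title.split(" - ")[1::-1])

print(hacky)
print(robust)
```

For titles like the one in the question, which contain no hyphens besides the separators, both versions give the same result.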