Keep only certain lines in a text file using Python

Question

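I'm working on a script that lets me fetch the "solidfiles.com" links from a certain website. I already have all the href links. However, I'm unable to keep only the solidfiles.com links using Python.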

This is the website I'm trying to fetch links from

This is my current script:

import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput

Link = 'https://animetosho.org/view/jacobswaggedup-kill-la-kill-bd-1280x720-mp4-batch.n677876'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('div',{'class':'links'})
#print subtitles


with  open("Anilinks.txt", "w") as f:
    for link in subtitles:
        x = link.find_all('a', limit=26)
        for a in x:
            url = a['href']
            f.write(url+'\n')

At this point, all the links are written to a text file named "Anilinks.txt". I can't seem to keep only the solidfiles links. Any hints would be great.

Answer 1

This might work (if you already have the .txt file):

# Store the links we need in a list
links_to_keep = []
with open("Anilinks.txt", "r") as f:
    for line in f:
        if 'solidfiles.com' in line:
            links_to_keep.append(line)

# Write all the links in our list back to the file
with open("Anilinks.txt", "w") as f:
    for link in links_to_keep:
        f.write(link)
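
The same read, filter, rewrite steps could also be wrapped in a small reusable function. This is only a sketch: the name keep_matching_lines and its parameters are hypothetical and not part of the original script.

```python
def keep_matching_lines(path, needle):
    """Rewrite the file at `path`, keeping only lines that contain `needle`.

    Hypothetical helper for illustration; returns the number of lines kept.
    """
    # Read everything first, because opening the file with "w" truncates it.
    with open(path, "r") as f:
        kept = [line for line in f if needle in line]

    # Overwrite the file with only the matching lines.
    with open(path, "w") as f:
        f.writelines(kept)

    return len(kept)
```

It would be called as keep_matching_lines("Anilinks.txt", "solidfiles.com") once the file has been written.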

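Alternatively, you can filter the links before writing them to the file, in which case the last part of your code would look like this: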

with open("Anilinks.txt", "w") as f:
    for link in subtitles:
        x = link.find_all('a', limit=26)
        for a in x:
            url = a['href']
            # Only write the links that point to solidfiles.com
            if 'solidfiles.com' in url:
                f.write(url + '\n')
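
Note that the substring test above also matches a URL that merely mentions solidfiles.com somewhere, for example in a query string. If that matters, a stricter check could compare the host name instead. The following is a minimal sketch, assuming the links are ordinary absolute URLs; the helper name is_host_link and the sample URLs are made up for illustration.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def is_host_link(url, domain="solidfiles.com"):
    """Return True if the URL's host is `domain` or one of its subdomains."""
    # hostname is None for relative links, so fall back to an empty string.
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


# Made-up sample links, only to show the difference from a substring test.
urls = [
    "https://www.solidfiles.com/v/abc123",
    "https://example.com/?ref=solidfiles.com",
    "/view/some-relative-link",
]
solidfiles_links = [u for u in urls if is_host_link(u)]
```

In the loop above, the condition would then become if is_host_link(a['href']): instead of the substring test.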

python

beautifulsoup

request

python-2.7