Print just the first output line

Question

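I wrote some code that extracts certain text from a given URL, but it prints the same output 2 or 3 times (depending on the web page), each on its own line. I only need the first output. How can I do that? Here is my code: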

import requests, re
from bs4 import BeautifulSoup
url="http://www.barneys.com/raf-simons-%22boys%22-poplin-shirt-504182589.html#start=2"
r=requests.get(url)
soup=BeautifulSoup(r.content)
links=soup.find_all("a")
g_d4=soup.find_all("ol", {"class":"breadcrumb"})
for item in g_d4:
    links_2=soup.find_all('a', href=re.compile('^http://www.barneys.com/barneys-new-york/men/'))
    pattern_2=re.compile(r"clothing/(\w+)")
    for link in links_2:
        match_1=pattern_2.search(link["href"])
        if match_1:
            print(match_1.group(1))

My output is:

         shirts
         shirts
         shirts

I want my output to look like this:

         shirts

What should I do?

Answer 1

I'm not sure which of these you need, so I'll answer both.

Unique results

If you want unique results across the whole page, you can collect them in a set:

for item in g_d4:
    links_2=soup.find_all('a', href=re.compile('^http://www.barneys.com/barneys-new-york/men/'))
    pattern_2=re.compile(r"clothing/(\w+)")
    matches = set()
    for link in links_2:
        match_1=pattern_2.search(link["href"])
        if match_1:
            matches.add(match_1.group(1))
    print(matches)
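A set does not keep the order in which matches appear on the page. The stdlib-only sketch below swaps in dict.fromkeys as an order-preserving alternative; it assumes Python 3.7+, where dicts keep insertion order (on Python 2.7, collections.OrderedDict.fromkeys plays the same role). The function name unique_categories and the sample href values are hypothetical stand-ins for what link["href"] would return on the real page.

```python
import re

# Same pattern as the answer, written as a raw string so "\w" is not
# treated as a string escape.
PATTERN = re.compile(r"clothing/(\w+)")


def unique_categories(hrefs):
    """Return each captured category once, in first-seen order."""
    # Captured groups from every href that matches the pattern.
    found = (m.group(1) for m in map(PATTERN.search, hrefs) if m)
    # dict keys keep insertion order (Python 3.7+), so this removes
    # duplicates without scrambling the order the way set() can.
    return list(dict.fromkeys(found))


# Hypothetical href values standing in for link["href"] on the real page.
sample = [
    "http://www.barneys.com/barneys-new-york/men/clothing/shirts",
    "http://www.barneys.com/barneys-new-york/men/clothing/shirts#top",
    "http://www.barneys.com/barneys-new-york/men/shoes",
    "http://www.barneys.com/barneys-new-york/men/clothing/jeans",
]

for category in unique_categories(sample):
    print(category)
```

With the sample above, each category is printed once, in the order it first appears.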

Single result

If you only want the first result in each iteration, you can break out of the inner loop:

for item in g_d4:
    links_2=soup.find_all('a', href=re.compile('^http://www.barneys.com/barneys-new-york/men/'))
    pattern_2=re.compile(r"clothing/(\w+)")
    for link in links_2:
        match_1=pattern_2.search(link["href"])
        if match_1:
            print(match_1.group(1))
            break
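The same "stop at the first match" idea can be sketched with the built-in next() and a default value, which also makes the no-match case explicit (the loop version simply prints nothing). This is a stdlib-only sketch; the function name first_category and the sample href values are hypothetical stand-ins for what link["href"] would return on the real page.

```python
import re

PATTERN = re.compile(r"clothing/(\w+)")


def first_category(hrefs, default=None):
    """Return the first captured category, or `default` if none match."""
    matches = (PATTERN.search(href) for href in hrefs)
    # next() consumes the generator only up to the first successful match,
    # which has the same effect as the `break` in the loop above.
    first = next((m for m in matches if m), None)
    return first.group(1) if first else default


# Hypothetical href values standing in for link["href"] on the real page.
sample = [
    "http://www.barneys.com/barneys-new-york/men/shoes",
    "http://www.barneys.com/barneys-new-york/men/clothing/shirts",
    "http://www.barneys.com/barneys-new-york/men/clothing/shirts#top",
]

print(first_category(sample))
print(first_category([], default="no match"))
```

Because the generator is lazy, hrefs after the first match are never searched.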


Tags: python, regex, beautifulsoup, web-scraping, python-2.7
