Web scraping: why does it return a null value? Maybe a JavaScript issue?

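Okay, so I'm new to web scraping. I followed a tutorial I found online and it worked well for one particular website, so I tried to adapt it to another site. Since I'm getting a 200 response, I think I've got the headers figured out, but when I target the div to extract its value, I get null. So my question is: am I doing something wrong here? I tried following other tutorials to see whether they would answer my question, but since I'm new, I'm not sure what to look for!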

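Edit: I should have been more specific. As you can see in my code, I'm trying to scrape data from the Chaos Cards website. I think I've got the search function sorted (possibly wrongly?), but what I'm trying to achieve is this: when I inspect the page, I'd like to get the data from

`<div class="product-detail__content">Out of stock </div>`

specifically the "Out of stock" part. As far as I can tell, this div will contain "In stock" if the product is in stock. But when I target this div, I get null.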

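All I want to do is set up a scraper so that, when a Discord user enters a specific product, it searches the site and replies with whether it's in stock or out of stock. For now, though, I'm taking small steps and trying to get it to print the data I want first.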
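A minimal sketch of the command-handling side of that goal, kept free of any network or Discord calls. It assumes the text after `.sm` is the product's URL slug (like the `/prod/Pokemon-Leafeon-V-Star-Special-Collection-Box` path that appears in the response further down); `parse_keyword`, `product_url` and `stock_reply` are hypothetical helper names. Stripping the prefix and whitespace matters because `'.sm Foo'.split('.sm')[1]` is `' Foo'`, which would put a space into the request URL.

```python
from typing import Optional
from urllib.parse import quote

PREFIX = ".sm"  # command prefix used by the bot
BASE_URL = "https://www.chaoscards.co.uk/prod/"


def parse_keyword(content: str, prefix: str = PREFIX) -> Optional[str]:
    """Return the product slug after the prefix, or None if this isn't a command."""
    if not content.startswith(prefix):
        return None
    keyword = content[len(prefix):].strip()
    return keyword or None


def product_url(keyword: str) -> str:
    # Percent-encode unsafe characters (spaces etc.) before putting them in the path
    return BASE_URL + quote(keyword)


def stock_reply(status_text: Optional[str]) -> str:
    """Turn the scraped div text into the message the bot would send back."""
    if status_text is None:
        return "Couldn't find the stock status (page blocked or product not found?)"
    normalized = status_text.strip().lower()
    if normalized == "in stock":
        return "It's in stock"
    if normalized == "out of stock":
        return "Out of stock"
    return f"Unrecognized status: {status_text.strip()!r}"


print(parse_keyword(".sm Pokemon-Leafeon-V-Star-Special-Collection-Box"))
print(product_url("Pokemon-Leafeon-V-Star-Special-Collection-Box"))
print(stock_reply("Out of stock "))
```

Keeping these helpers pure means they can be checked without a bot token or a live site; the Discord handler then only has to call them.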

Code:

```python
import os
import asyncio
import discord
import bs4 as bs
import requests


r = requests.Session()
# Note: discord.py 2.0+ requires an `intents` argument here
# (including the message_content intent to read message text)
client = discord.Client()


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.76'}


@client.event
async def on_ready():
    print(f'{client.user.name} - Have a good day <3')


# Connectivity check; this only requests the home page, not a /prod/ page
result = requests.get("https://www.chaoscards.co.uk/", headers=headers)
print(result.status_code)


def site_search(keyword):
    resp = r.get(f'https://www.chaoscards.co.uk/prod/{keyword}', headers=headers)

    # print(resp.text)
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    in_stock = ''
    out_of_stock = ''
    for x in soup.find_all('div', {'class': 'product-detail__content'}):
        # Test the div's text with whitespace stripped, not the Tag object itself
        text = x.get_text(strip=True)
        if 'Out of stock' in text:
            out_of_stock = 'Out of stock bro'
        if 'In stock' in text:
            in_stock = "It's in stock"
    #current_image_url = soup.find('img', {'itemprop': 'image'}).get('src') #
    #current_name = soup.find('p', {'class': 'listing-title'}).get_text()
    return in_stock, out_of_stock


@client.event
async def on_message(message):
    if message.content.startswith('.sm'):
        # ".sm Some-Product" -> "Some-Product" (drop the prefix and the space)
        keyword = message.content[len('.sm'):].strip()

        # site_search uses blocking requests, so run it in a worker thread,
        # and call it only once
        loop = asyncio.get_running_loop()
        in_stock, out_of_stock = await loop.run_in_executor(None, site_search, keyword)
        print(in_stock, out_of_stock)
```
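Independently of the blocking problem, a check like `' Out of stock ' in x`, where `x` is a Beautiful Soup `Tag`, cannot succeed on this markup. A minimal sketch using Beautiful Soup's built-in `html.parser` on the div from the edit above: `in` on a `Tag` tests whether the string is one of the tag's *children*, so the extra leading space means no match, while `get_text(strip=True)` returns the visible text with surrounding whitespace removed.

```python
from bs4 import BeautifulSoup

html = '<div class="product-detail__content">Out of stock </div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="product-detail__content")

# `in` on a Tag compares against its children (here the single string
# "Out of stock "), so the leading space means no match
print(" Out of stock " in div)

# get_text(strip=True) gives the visible text without surrounding whitespace
status = div.get_text(strip=True)
print(status)
print(status == "Out of stock")
```

On a page that doesn't contain the div at all (such as the challenge page in Edit 2), `soup.find(...)` returns `None` and `soup.find_all(...)` returns an empty list, which is where an empty or null result comes from.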
         

Edit 2: So I printed the text of the response from `resp = r.get(f'https://www.chaoscards.co.uk/prod/{keyword}', headers = headers)` and got this back:

```html
<html lang="en-US">
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
  <meta name="robots" content="noindex, nofollow" />
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <title>Just a moment...</title>
  <style type="text/css">
    html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
    body {background-color: #ffffff; color: #000000; font-family:-apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Helvetica Neue",Arial, sans-serif; font-size: 16px; line-height: 1.7em;-webkit-font-smoothing: antialiased;}
    h1 { text-align: center; font-weight:700; margin: 16px 0; font-size: 32px; color:#000000; line-height: 1.25;}
    p {font-size: 20px; font-weight: 400; margin: 8px 0;}
    p, .attribution, {text-align: center;}
    #spinner {margin: 0 auto 30px auto; display: block;}
    .attribution {margin-top: 32px;}
    @keyframes fader     { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
    @-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
    #cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
    #cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
    #cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
    .bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
    a { color: #2c7cb0; text-decoration: none; -moz-transition: color 0.15s ease; -o-transition: color 0.15s ease; -webkit-transition: color 0.15s ease; transition: color 0.15s ease; }
    a:hover{color: #f4a15d}
    .attribution{font-size: 16px; line-height: 1.5;}
    .ray_id{display: block; margin-top: 8px;}
    #cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
    #cf-hcaptcha-container { text-align:center;}
    #cf-hcaptcha-container iframe { display: inline-block;}
  </style>

      <meta http-equiv="refresh" content="35">
  <script type="text/javascript">
    //<![CDATA[
    (function(){

      window._cf_chl_opt={
        cvId: "2",
        cType: "non-interactive",
        cNounce: "66939",
        cRay: "6d5bfeb08acc8771",
        cHash: "18474546270a019",
        cPMDTk: "wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70",
        cUPMDTk: "\/prod\/Pokemon-Leafeon-V-Star-Special-Collection-Box?__cf_chl_tk=wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70",
        cFPWv: "b",
        cTTimeMs: "1000",
        cRq: {
          ru: "aHR0cHM6Ly93d3cuY2hhb3NjYXJkcy5jby51ay9wcm9kL1Bva2Vtb24tTGVhZmVvbi1WLVN0YXItU3BlY2lhbC1Db2xsZWN0aW9uLUJveA==",
          ra: "TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzk3LjAuNDY5Mi45OSBTYWZhcmkvNTM3LjM2IEVkZy85Ny4wLjEwNzIuNzY=",
          rm: "R0VU",
          d: "iWUrdApuyTqwp7Sa1s7+bi5hqVur/PkVsEkqFAgmNisGGdY/Hz93xG5mIaMzA9XizszFqLjvwVKypShAl3Lm45xvxp8eYawYXrvO505H8+ouA9KL2g+cmlQJrfXxkdmFI5QseUz1MIX/PGL/2S4A1HCLT7gmpXqr+muDiazQCUs7XUTOla+n/YWWyPERFG/uhI8+uOckDxuY+F8HdGDGE8xus50JmOBLgGMC4gELQfxSTyg7Ed7Lw1YUquPfkjSt9Q4aQ2nOWtuzYmO3zV/UTeu0qSsrMI/p7pPYi9ZDANElXlNnuUhFcMd2aDSnUF/aYdNG09p2RTiG3/Jkj5fPpGt4gm9X98Dd6X+OndUT/x01iSCq4NTgwgxjmubgZMbmuryIaU2eFKIV7o7TuJkIz1x6p4mdhapTdMMhsfVTS1iNWy0L0TwedlFeUaCNPv+lH76ely2NypA/hUtDUVYz1Eey/bwaxGZBp9McRcVwpsPbTCwddxr9Oc29obSDNCid5gpRPhu1Efs0a9zixzPEjQEjZD5tJ7SaFnmI6n7A6Hjc9YzHmvjPrNAUv++ZuWAD",
          t: "MTY0MzU1OTI4My4yOTAwMDA=",
          m: "HvTOqkkdUexOvObprQaK20tiA50EsMdMAUNxBs9a76U=",
          i1: "KnbCImKzNxo3XehPmg6jWg==",
          i2: "oGYSEcaLbEuXjAZsN7GZBg==",
          zh: "JJbyu7T+3hg5jWQCnkKHsP/7REhUTr23SkrwnAaFfjA=",
          uh: "l4HLyhywYXQDOYBGJBbVDnfNOSLbBOqVMJwcpsr3qjc=",
          hh: "8JWW5AsAg62xfggeMY1P1hRpDlpOqO6xoRTKU6X/36Q=",
        }
      }
      window._cf_chl_enter = function(){window._cf_chl_opt.p=1};

    })();
    //]]>
  </script>


</head>
<body>
  <table width="100%" height="100%" cellpadding="20">
    <tr>
      <td align="center" valign="middle">
          <div class="cf-browser-verification cf-im-under-attack">
  <noscript>
    <h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
  </noscript>
  <div id="cf-content" style="display:none">

    <div id="cf-bubbles">
      <div class="bubbles"></div>
      <div class="bubbles"></div>
      <div class="bubbles"></div>
    </div>
    <h1><span data-translate="checking_browser">Checking your browser before accessing</span> www.chaoscards.co.uk.</h1>

    <div id="no-cookie-warning" class="cookie-warning" data-translate="turn_on_cookies" style="display:none">
      <p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
    </div>
    <p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
    <p data-translate="allow_5_secs" id="cf-spinner-allow-5-secs" >Please allow up to 5 seconds&hellip;</p>
    <p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting&hellip;</p>
  </div>

  <form class="challenge-form" id="challenge-form" action="/prod/Pokemon-Leafeon-V-Star-Special-Collection-Box?__cf_chl_f_tk=wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70" method="POST" enctype="application/x-www-form-urlencoded">
    <input type="hidden" name="md" value="lBy7XQRIP3rCTVaX6BoLog981WTI9wl7VPUnFUhdr80-1643559283-0-AfIJze-AsdFTbXwD6zN0kNrMUN92opj5F0JV4HP_IIHIJajx_7BeYxgFsAgzPKKs7B76uy2sTy0NMNe5Lonr5nsHsVd0d8oakLrUtEc43FE_-loi5O9yohJL7zVGcrm5BD3ZjEJMgxY3VwIM0TIl4QifHX3Xiacvm9Us_1J5_OALeEt8dyCDKBUbdhJbfkAV36zEt1-iFbst-6wTI-t_LM6YSJOD9j1K_sxVqdUzAawDadHBGslCDmRO4mA2LTGMhZdNdVN_RUZkUpqWKatfeHID4Hp-w3fx3tW4lxHE6gC86Ud8f-YgeYHKUDkfA_YomWCUxk9WFwoEYlr7MqQhQgWfBgxhAJNpXEbcaIb9e71bSZvbGw8BCLipFXuSk2ZvFofI-CdPIymN17v4S2xNgL92cGpXRhcr1OwJT6iFPJ8zuxPXPGud3C9ZeHnXbntYoYRQFXRcpcYcKIbBJEG8lIhJ4aWqmVkpkmai5oGlf0tnolsiO_5-i8cCEazYlbcUCqKnVDt6UGfuQNJdQXTNmmwNusmt4kPFLztjhNjKydzWHO6AWswLkMzj7rC1759cGdsyBiQkzb632-4Yqvi4f6ZOwBOEWfE0t8ZwdQtkEWy4U84c9j6hM8MG_xgl3t_0yKWRIFANVD9vkN1pqTfJRo8bQPm9oD3KmvRrVl5y_5InKhUotZYMJVV6DhV98WvHVOvjOGqJMPs75vQ0VaqQUiPzlyJ1MQ0G4Qe-sZzoIP0cxuvkCbQE2kxhRrzN887jWQ" />
    <input type="hidden" name="r" value="IxoGI_uynuxxTjqGKlMnSQ0FLUh3S6TIZtjcFTDgzzE-1643559283-0-ASk8gczAHx3QxOhXW8WEDt3t1OSXiJ7qJHx+ppz1M0nJipXy14O9Y2KKa1Q/qTKOeLAkBCnJuHaVX7YBvcXDde6M8x8kRdlX/AS1CNXDoqegpDIwjQKyyw0/e19MMsryFGK5ltynnTh9NKTFHJFOEcTF8OKBZqgcGH0dEGH3I1e/lPhMAAsMmWkE0i7aPiwTtEPYRkL/z8gpJbyDyqF/pL+ykLEqtpq3EDfFYbdMn4Fv27XNs4YKU9z4Z1DrjECS/Nwo4hCq0ZLYafLBnFHp9ZzIVEpGrM07Teci91bqTz3COri7Y3YZ0Vyj7NsZ/DPA+ykGWKU794u7OeSpIR4iifH6AEJA5ZVjhPMr46W7cvbgEAReq8TA+QdkIo7IA4Yn/Zcu77hx2ESjMpGMbXbJE4OrjZ8Xng/GoG18lBpF0nJUA9QAeUQ4cDOcHK8OkfHObBdTN4qGtQywBGdR7Wm8ZsxDjxry1kOKx1r4wXH1/PdOB0C5wWPVz5k6UPtIJOeqDfc8q7GFQ4f1UmHIeHE3Xg5FfntTitBbAQwNEZ/ymhpO2iGeLjog/wgAtiNY/qgnpTkpJXTjYgZoENwu9VgPIAaJt9wOUPLGnSkQu9nTDDnlbo2DwLmQKdfIYtCUfSF2DNNcyrk7LzWDHc5mWsfXhG/d9J2Ns9nJ4hWHcovnqOHHGLI7QLjBNKBW8+OrFn52OkYdCfXKrcC1PiV3mybK1gYT2uPWGjzOEodQ3x4GzII4qhvonkEPlaTKFnTA3sygjmsoQmbc6GnFQxP0kBIyI5B7qtF29/g2jTSB6ymvHQR+oNtrkvfaxM0tSt0tiiUV6HiI/83jCBmWkt6552D2PskpfPLgZqf968KCL5M9YfDBEBHlBswKZMBK6TvPGtS04P4S5gmi+M1rBuaubxKLZhUIs0V2OOy+HAZsJfluf6SNJe3W9x8EPqnXWT0b3tM3ybuYy4yj31JdChBk+On5zVqAoPpaWLQTeRLinVW2ludZ7KMFJltS9LqAJ0evwNcEJAnklwuE9/4uagEJjEsuWkf3C6UIyCFB5lfKlofe4hhwxkanVjds+Eg1bIJld0xqUNjPmZdA3LIWnzAq3iL5OoWN2WOAz87k7XI4A9H/ruSiPvtHf5KIOtX3fxDVP3TziOAtvb81p+pgK+WiL3LAEbEasDMw9O3HBSaXw54Gmq+gfNkoPDGCgyP7C25WH67yeqkoVtq64Q3EOpSglfjyyEmQyXT24Gs14zta3Ul6N1jSM38CDd3tIV/XCZZg3xa5TggKjI43lKe2dflR3pllF2Bpg8LH1JVMH6NKsts3TkBAy+KWrExBPOeoHgu0BZCIxs9nh1kk0k/LFQhjC6ENDW6swlJ+4hlv9865jTuu5DA4emNvpmHXKmjQ0OlQXpJJYhMcqRAoHpsT9TSaO2MYYZpbHx4kmYJIy04N5jY9TB8vzfnimnxrTYKrrM+zSxNPVCXZDVh8LUaxQKqYbgl5LsecA2QFzIc66SQc/8waruFwstNO/f8x/6ijA9s3EWrueKmYK6yeQqWrw4iVO30xppcSLK3lvk0aUYyu1TiQOXCokDCFUDIrxG/S3PEq4UgNIpTF3aRhBtkq49XCYd7MfCteVBzkDQu28IaN+JdojGY8LrVdR4VInr6p8+fmpirQZ7WgfWWLHJhqr8pF8eHG60yt372F+c5QecYvwGtitOitHbjOXKeLDKoXmtnnguTMRw4Xwz+ICfhz/wZ96PzlgKuPydwREQ4DbrhMf+mmRCc0EWi2QTGGdt56EiR/lJmXq8FpiRTgYuTRxSTtbtwFS1BHKrgdrc+Zuqm3h7t9WRvlRj8KhZEXsDJVWJgKDVT0sjox3phvRlo68Gr016valv5Lr+JAujzr1azDMgSaQhNL4cCuxW5jzL5Q3V/k9JgjEg=="/>
    <input type="hidden" value="b8506ea0b61c6bf512de56146f25f432" id="jschl-vc" name="jschl_vc"/>
    <!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
    <input type="hidden" name="pass" value="1643559284.29-RM/SqTEMYf"/>
    <input type="hidden" id="jschl-answer" name="jschl_answer"/>
  </form>

    <script type="text/javascript">
      //<![CDATA[
      (function(){
          var a = document.getElementById('cf-content');
          a.style.display = 'block';
          var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
          var trkjs = isIE ? new Image() : document.createElement('img');
          trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js/transparent.gif?ray=6d5bfeb08acc8771");
          trkjs.id = "trk_jschal_js";
          trkjs.setAttribute("alt", "");
          document.body.appendChild(trkjs);
          var cpo=document.createElement('script');
          cpo.type='text/javascript';
          cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=6d5bfeb08acc8771";

          window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.indexOf('?') !== -1 ? '?' : location.search;
          window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
          if (window._cf_chl_opt.cUPMDTk && window.history && window.history.replaceState) {
            var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
            history.replaceState(null, null, "\/prod\/Pokemon-Leafeon-V-Star-Special-Collection-Box?__cf_chl_rt_tk=wjoavPcyn4sd4H8OTvY2JlyVlLXStFtB1PtHY4IbL58-1643559283-0-gaNycGzNB70" + window._cf_chl_opt.cOgUHash);
            cpo.onload = function() {
              history.replaceState(null, null, ogU);
            };
          }

          document.getElementsByTagName('head')[0].appendChild(cpo);
        }());
      //]]>
    </script>



  <div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=6d5bfeb08acc8771')"> </div>
</div>


          <div class="attribution">
            DDoS protection by <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>
            <br />
            <span class="ray_id">Ray ID: <code>6d5bfeb08acc8771</code></span>
          </div>
      </td>

    </tr>
  </table>
</body>
</html>
```

One thing that stood out to me is this line:

`<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>`

I'm using Beautiful Soup, and I've heard it can't handle JavaScript. Is this what's affecting my search?
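One way to confirm whether that is what's happening, before parsing anything, is to look for markers of the challenge page in `resp.text`. A minimal heuristic sketch: the marker strings are copied from the page dump in Edit 2, `is_cloudflare_challenge` is a hypothetical helper, and Cloudflare can change its challenge markup at any time. Note that the status code printed in the code comes from the home-page request, so it says nothing about what the `/prod/` request returned.

```python
# Strings that appear in the Cloudflare "Checking your browser" page from Edit 2.
# This is a heuristic, not an official detection method.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    "cf-browser-verification",
    "window._cf_chl_opt",
)


def is_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML looks like a Cloudflare JS challenge page."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


# Usage with the session from the question (not run here):
#   resp = r.get(f'https://www.chaoscards.co.uk/prod/{keyword}', headers=headers)
#   if is_cloudflare_challenge(resp.text):
#       print("Got the Cloudflare challenge page, not the product page")

challenge = '<html><head><title>Just a moment...</title></head></html>'
product = '<div class="product-detail__content">Out of stock </div>'
print(is_cloudflare_challenge(challenge))
print(is_cloudflare_challenge(product))
```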

Does anyone have tips? If you know the answer but would prefer to point me in the right direction instead, I would really appreciate that too!

Thank you!

You could try converting the site's source code to a string and doing one of the following:

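```python
# website_contents: the page's HTML as a string (e.g. resp.text)

# Option 1: cut out the text inside the div, then check it
# (raises IndexError if the div is not in the page)
div_text = website_contents.split('<div class="product-detail__content">')[1].split('</div>')[0]
if 'out of stock' in div_text.lower():
    print('Out of stock!')
else:
    print('In stock!')

# Option 2: search the raw HTML for the exact markup
if '>Out of stock </' in website_contents:
    print('Out of stock!')
else:
    print('In stock!')
```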
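The same string-splitting idea can be wrapped in a function that returns `None` instead of raising `IndexError` when the div is missing, which is exactly what happens when the challenge page comes back. A sketch only: `product_detail_text` is a hypothetical name, and it assumes the opening tag appears exactly as `<div class="product-detail__content">`. Any extra class or attribute would break the match, which is one reason a parser such as Beautiful Soup is usually more robust than plain string splitting.

```python
from typing import Optional

DIV_OPEN = '<div class="product-detail__content">'


def product_detail_text(website_contents: str) -> Optional[str]:
    """Return the stripped text of the first product-detail div, or None."""
    parts = website_contents.split(DIV_OPEN, 1)
    if len(parts) < 2:
        return None  # div not present, e.g. a Cloudflare challenge page
    return parts[1].split('</div>', 1)[0].strip()


page = '<p>Price</p><div class="product-detail__content">Out of stock </div>'
text = product_detail_text(page)
if text is None:
    print('Stock status not found')
elif text.lower() == 'out of stock':
    print('Out of stock!')
else:
    print('In stock!')
```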
    

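So I found my problem. As you can see from the update to my original post, I was being blocked from accessing the site. The page I got back was Cloudflare's "Checking your browser" JavaScript challenge, not the product page: `requests` only downloads HTML and never runs JavaScript (and Beautiful Soup only parses whatever HTML it is given), so the challenge never completed and the `product-detail__content` div was never in the response. So I scrapped that code and followed a new tutorial that uses Selenium, which drives a real browser, and now it works perfectly.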
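For reference, a minimal Selenium sketch of the same stock check. It is not the code from the linked tutorial; it assumes Selenium 4 with Chrome installed, and `fetch_stock_text` is a hypothetical name. Whether a real browser gets past the Cloudflare check can vary, so the explicit wait simply gives the page time to finish loading and returns `None` if the product div never appears. Selenium is imported inside the function so the rest of a script can load without it.

```python
from typing import Optional


def fetch_stock_text(url: str, timeout: float = 15.0) -> Optional[str]:
    """Load `url` in Chrome and return the product-detail div's text, or None."""
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        # The browser runs the page's JavaScript; wait until the real product
        # page has rendered the div, or give up after `timeout` seconds
        div = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div.product-detail__content")
            )
        )
        return div.text.strip()
    except TimeoutException:
        return None
    finally:
        driver.quit()


# Example call (not run here):
#   fetch_stock_text("https://www.chaoscards.co.uk/prod/Pokemon-Leafeon-V-Star-Special-Collection-Box")
```

Starting a browser is slow, so in a Discord bot this call would normally also be run off the event loop (for example via `run_in_executor`), as with the `requests` version.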

For anyone who comes across this post with the same problem, here is the link to the tutorial I followed. I hope it helps!

Link: https://replit.com/talk/learn/Python-Selenium-Tutorial-The-Basics/148030