Site appearing on Google SERP in spite of proper robots.txt configuration

Question

I have an ExpressJS web application for internal purposes that I do not want Google to index. So I implemented the following route:

// Serve a robots.txt that tells every crawler not to crawl any path
app.get('/robots.txt', function (req, res) {
    res.set('Content-Type', 'text/plain');
    res.send('User-agent: *\nDisallow: /');
});

I verified that it works by hitting the URL and checking the response:

User-agent: *
Disallow: /

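In spite of this, when I search for the site's title, I can see my page in the Google results. The application has been live for a year or so, so it cannot be a cached result. Are there any other possible reasons why this is happening? Is there any way to fix it?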

Answer 1

https://webmasters.stackexchange.com/questions/54879/does-google-ignore-robots-txt

Google will still see sites blocked by robots.txt, and may even list them in search results.

This is especially the case when entire domains/subdomains are blocked. Google will list links to these along with the text "A description for this result is not available because of this site's robots.txt – learn more", with a link to https://support.google.com/webmasters/answer/156449.

Add <meta name="robots" content="noindex, nofollow"> to your page output.
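
As a rough sketch of what adding the tag to the page output could look like in an Express app, the helper below inserts the meta tag right after the opening <head> tag of an HTML string. The name addNoindexMeta and the sample markup are made up for illustration, and the regex assumes the page has a plain <head> element. With a template engine you would more likely put the tag straight into the layout.

```javascript
// Hypothetical helper (not part of Express): injects the robots meta tag
// into an HTML string, right after the opening <head> tag.
const ROBOTS_META = '<meta name="robots" content="noindex, nofollow">';

function addNoindexMeta(html) {
    // Leave the page alone if it already carries a robots meta tag.
    if (/<meta\s+name=["']robots["']/i.test(html)) {
        return html;
    }
    // Assumption: the document has a <head> element (attributes allowed).
    // If no <head> tag is found, the HTML is returned unchanged.
    return html.replace(/<head(\s[^>]*)?>/i, function (openTag) {
        return openTag + ROBOTS_META;
    });
}

// Possible use inside a route handler (app and the markup are assumed):
// app.get('/', function (req, res) {
//     res.send(addNoindexMeta('<html><head><title>Internal</title></head><body></body></html>'));
// });
```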

Edit, from the discussion in the comments:

If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.

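So, to prevent Google from crawling your site: disallow it in robots.txt; no meta tag is needed.
If there are external links pointing to your site: allow crawling in robots.txt, and use noindex, nofollow on the pages that show up in Google.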
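
The two setups can be sketched as Express-style handlers. This is only an illustration: the names robotsTxtBody and robotsTxtHandler and the policy strings are invented here, while "Disallow: /" and an empty "Disallow:" are the standard robots.txt ways to block and to allow everything.

```javascript
// Hypothetical helper: returns the robots.txt body for one of two policies.
//   'block-crawling' : crawlers may not fetch any page (no meta tag needed,
//                      but a crawler can never read one either)
//   'allow-crawling' : crawlers may fetch pages, so they can read a
//                      noindex, nofollow meta tag on each page
function robotsTxtBody(policy) {
    if (policy === 'block-crawling') {
        return 'User-agent: *\nDisallow: /';
    }
    if (policy === 'allow-crawling') {
        // An empty Disallow value means nothing is disallowed.
        return 'User-agent: *\nDisallow:';
    }
    throw new Error('Unknown policy: ' + policy);
}

// Builds a handler with the (req, res) shape that Express routes expect.
function robotsTxtHandler(policy) {
    const body = robotsTxtBody(policy);
    return function (req, res) {
        res.set('Content-Type', 'text/plain');
        res.send(body);
    };
}

// Possible wiring (app is assumed to be an existing Express application):
// app.get('/robots.txt', robotsTxtHandler('allow-crawling'));
```

With the 'allow-crawling' policy the pages themselves still have to carry the noindex, nofollow meta tag; the robots.txt change only lets Googlebot fetch them and see it.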

How to easily see which pages Google has indexed:

Use site:stackoverflow.com as the search query, and Google will list basically all pages of that site that it has indexed.
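
If you want to open that check from a script, a small sketch like the following builds the search URL. The function name siteSearchUrl and the host example.com are placeholders, not anything from the question.

```javascript
// Builds a Google search URL that uses the site: operator.
// 'example.com' below is a placeholder; pass your own host name.
function siteSearchUrl(host) {
    const url = new URL('https://www.google.com/search');
    // searchParams.set takes care of percent-encoding the colon.
    url.searchParams.set('q', 'site:' + host);
    return url.toString();
}

console.log(siteSearchUrl('example.com'));
// prints https://www.google.com/search?q=site%3Aexample.com
```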

To learn more about how Google crawls your pages: https://support.google.com/webmasters/topic/4617736?hl=en&ref_topic=4589290

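Also, keep in mind that Google is not the only search engine. There are Bing, Yahoo, Baidu, and plenty of others, and not all of them honor meta tags or robots.txt. Some even pretend to be another search engine so that their crawling is not blocked.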
