JavaScript screen scraper
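I'm trying to make a simple screen scraper for this site: List of JavaScript libraries. It should run from the browser console and return all the libraries as text, without the categories. I managed to get all of them with the code below. They also mentioned that we can use the map() function to map the content, but I couldn't get that to work. My question is: how do I iterate over all the categories and concatenate the various arrays into a single array of library names? Any help is appreciated!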
root = document.documentElement
const firstTitle = root.getElementsByClassName("mw-headline")[0]
const firstGroup = firstTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const secondTitle = root.getElementsByClassName("mw-headline")[1]
const secondGroup = secondTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const thirdTitle = root.getElementsByClassName("mw-headline")[2]
const thirdGroup = thirdTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const fourthTitle = root.getElementsByClassName("mw-headline")[3]
const fourthGroup = fourthTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const fifthTitle = root.getElementsByClassName("mw-headline")[4]
const fifthGroup = fifthTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const sixthTitle = root.getElementsByClassName("mw-headline")[5]
const sixthGroup = sixthTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const seventhTitle = root.getElementsByClassName("mw-headline")[6]
const seventhGroup = seventhTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const eightTitle = root.getElementsByClassName("mw-headline")[7]
const eightGroup = eightTitle.nextElementSibling.parentElement.nextElementSibling.textContent
const ninthTitle = root.getElementsByClassName("mw-headline")[8]
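// Note: this reads secondTitle rather than ninthTitle, so the ninth entry in the result repeats the second.
const ninthGroup = secondTitle.nextElementSibling.parentElement.nextElementSibling.textContent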
Array(firstGroup, secondGroup, thirdGroup, fourthGroup, fifthGroup, sixthGroup, seventhGroup, eightGroup, ninthGroup)
Result
(9) ["↵Cassowary (software)↵CHR.js↵", "↵Google Polymer↵Dojo Toolkit↵jQuery↵midori↵MooTools↵Prototype JavaScript Framework↵", "↵See also: List of JavaScript graphics libraries↵A…echart↵Three.js↵Velocity.js↵Verge3D↵WhitestormJS↵", "↵AngularJS (framework)↵Angular (application platfo…K↵Glow↵Lively Kernel↵Script.aculo.us↵YUI Library↵", "Ample SDK↵Glow↵Lively Kernel↵Script.aculo.us↵YUI Library↵", "↵Google Closure Library↵Joose↵JsPHP↵Microsoft's Aj…F.js↵Rico↵Socket.IO↵Spry framework↵Underscore.js↵", "↵Cascade Framework↵jQuery Mobile↵Mustache↵Jinja-JS↵Twig.js↵", "↵Jasmine↵Mocha↵QUnit↵Tape↵Unit.js↵", "↵Google Polymer↵Dojo Toolkit↵jQuery↵midori↵MooTools↵Prototype JavaScript Framework↵"]
0: "↵Cassowary (software)↵CHR.js↵"
1: "↵Google Polymer↵Dojo Toolkit↵jQuery↵midori↵MooTools↵Prototype JavaScript Framework↵"
2: "↵See also: List of JavaScript graphics libraries↵AnyChart↵D3.js↵FusionCharts↵Highcharts↵EaselJS, part of CreateJS↵JavaScript InfoVis Toolkit↵p5.js↵Pixi.js↵Plotly↵Processing.js↵Raphaël↵SWFObject↵Teechart↵Three.js↵Velocity.js↵Verge3D↵WhitestormJS↵"
3: "↵AngularJS (framework)↵Angular (application platform)↵Bootstrap↵DevExtreme of DevExpress↵DHTMLX↵Dojo Widgets↵Ext JS of Sencha↵ZURB Foundation↵Google's Polymer paper elements↵jQuery UI↵jQWidgets↵Ignite UI of Infragistics↵Kendo UI of Telerik↵Wijmo 5 of GrapeCity↵OpenUI5 of SAP↵qooxdoo↵SmartClient↵React.js↵Webix↵WinJS↵No longer actively developedEdit↵Ample SDK↵Glow↵Lively Kernel↵Script.aculo.us↵YUI Library↵"
4: "Ample SDK↵Glow↵Lively Kernel↵Script.aculo.us↵YUI Library↵"
5: "↵Google Closure Library↵Joose↵JsPHP↵Microsoft's Ajax library↵MochiKit↵PDF.js↵Rico↵Socket.IO↵Spry framework↵Underscore.js↵"
6: "↵Cascade Framework↵jQuery Mobile↵Mustache↵Jinja-JS↵Twig.js↵"
7: "↵Jasmine↵Mocha↵QUnit↵Tape↵Unit.js↵"
8: "↵Google Polymer↵Dojo Toolkit↵jQuery↵midori↵MooTools↵Prototype JavaScript Framework↵"
length: 9
You should use querySelectorAll to find all the category heading nodes, then map over them. Here is a tested, working example:
const libNames = [... document.documentElement.querySelectorAll('.mw-headline')].map((lib) => lib.nextElementSibling.parentElement.nextElementSibling.textContent)
If you want a single string instead, you can join libNames:
const libNames = [... document.documentElement.querySelectorAll('.mw-headline')].map((lib) => lib.nextElementSibling.parentElement.nextElementSibling.textContent).join(' ')
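The snippets above yield one text blob per category. To get the single flat array of library names that the question asks for, each blob can be split on newlines (shown as ↵ in the console output) and the pieces merged. The following is only a minimal sketch under that assumption: `toLibraryNames` is a hypothetical helper name, not part of the original answer, and the de-duplication step is an added choice to handle nested subsections whose entries appear twice.

```javascript
// Hypothetical helper (not from the original answer): turns the per-category
// textContent strings into one flat, de-duplicated array of names.
function toLibraryNames(groups) {
  const names = groups
    .flatMap((text) => text.split('\n'))  // one entry per line of text
    .map((line) => line.trim())           // strip surrounding whitespace
    .filter((line) => line.length > 0);   // drop the blank lines
  return [...new Set(names)];             // keep the first occurrence of each name
}

// Browser-console usage, reusing the traversal from the answer above.
// The guard lets the helper also be loaded where no DOM is available.
if (typeof document !== 'undefined') {
  const groups = [...document.querySelectorAll('.mw-headline')].map(
    (lib) => lib.nextElementSibling.parentElement.nextElementSibling.textContent
  );
  console.log(toLibraryNames(groups));
}
```

Non-library lines that happen to sit inside a list, such as "See also: List of JavaScript graphics libraries", are not filtered out by this sketch and would need an extra filter step.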