html 属性的正则表达式,需要修复
regex for html attributes, need fix
需要修复这个正则表达式,它通过 php 中的 preg_mach_all 函数为我提取数组中的 html 属性:
(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
属性示例是:
style="width: 462px;" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=" data-filename="Screenshot from 2016-02-09 21:54:47.png"
finddle 中的工作示例:https://regex101.com/r/QE9XGD/1
因为 src
属性末尾的等号,我得到了错误的数组:
Array
(
[0] => Array
(
[0] => style="width: 462px;"
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=" data-filename="
)
[1] => Array
(
[0] => style
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......
)
[2] => Array
(
[0] => width: 462px;
[1] => data-filename=
)
)
正确的数组应该是这样的:
Array
(
[0] => Array
(
[0] => style="width: 462px;"
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......="
[2] => data-filename="Screenshot from 2016-02-09 1:54:47.png"
)
[1] => Array
(
[0] => style
[1] => src
[2] => data-filename
)
[2] => Array
(
[0] => width: 462px;
[1] => data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=
[2] => Screenshot from 2016-02-09 1:54:47.png
)
)
如何修正这个正则表达式以获得正确答案?
记住我不仅在图像属性提取中使用这个正则表达式,它是所有类型 html 标签的通用正则表达式
需要修复这个正则表达式,它通过 php 中的 preg_mach_all 函数为我提取数组中的 html 属性:
(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
属性示例是:
style="width: 462px;" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=" data-filename="Screenshot from 2016-02-09 21:54:47.png"
finddle 中的工作示例:https://regex101.com/r/QE9XGD/1
因为 src
属性末尾的等号,我得到了错误的数组:
Array
(
[0] => Array
(
[0] => style="width: 462px;"
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=" data-filename="
)
[1] => Array
(
[0] => style
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......
)
[2] => Array
(
[0] => width: 462px;
[1] => data-filename=
)
)
正确的数组应该是这样的:
Array
(
[0] => Array
(
[0] => style="width: 462px;"
[1] => src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......="
[2] => data-filename="Screenshot from 2016-02-09 1:54:47.png"
)
[1] => Array
(
[0] => style
[1] => src
[2] => data-filename
)
[2] => Array
(
[0] => width: 462px;
[1] => data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAg4AAALoCAYAAAAQpn2mAAAABHNCSVQICAgIfAhkiAAAABl0RVh0U29mdHdhcmUAZ25vbWUtc2NyZWVuc2hvdO8Dv4AACAASURBVHic7L15fNTVufj/PjOTyWSyTfaEJBD2EJBNQFQEtFVRXMD7VQG1dfu2tLW92t77unaxam+t9nbTXze9tW61Vdqvgre9FXcqUHFBFiUEkX0PgSQkmf1zzu+Pzz6ZhBBwg3l4kZn5fM7yPM8553me85znnAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAYykIEMZCADGchABjKQgQxkIAMZyEAGMpCBDGQgAxnIQAqIiy66SDXM/SW7DyUQgEIBAiFAKTOZQn8p7N/OQhB6PgFCgUI43ull6mmwyhUolFWJMB.......=
[2] => Screenshot from 2016-02-09 1:54:47.png
)
)
如何修正这个正则表达式以获得正确答案?
记住我不仅在图像属性提取中使用这个正则表达式,它是所有类型 html 标签的通用正则表达式