Splitting a string with a regular expression produces 'undefined' entries
I want to extract the following fields from a URL: protocol, domain, port, and path.
I know the split function can help with this. Here is my code:
"https://www.test.com:8081/a/b/c".split(/(:\/\/)|(:)|(\/)/)
The result is:
["https", "://", undefined, undefined, "www.test.com", undefined, ":", undefined, "8081", undefined, undefined, "/", "a", undefined, undefined, "/", "b", undefined, undefined, "/", "c"]
I expected the result to be:
['https', '://', 'www.test.com', ':', '8081', '/', 'a/b/c']
Why do the undefined entries appear, and how can I correct my regular expression?
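When the regular expression contains capturing groups, the result of split includes one entry per group for every match. Because your groups sit in different alternatives, only the group in the alternative that matched captures anything; the other groups go unused, so their entries in the result are undefined.
Instead of putting a group in each alternative, put a single group around all of the alternatives.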
console.log("https://www.test.com:8081/a/b/c".split(/(:\/\/|:|\/)/));
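Note that with a single group the path is still split at every slash, so the array has eleven entries rather than the seven asked for. A minimal sketch of one way to get the expected shape, assuming the URL always has a protocol, a port, and a path (the names parts and expected are illustrative only):

```javascript
const parts = "https://www.test.com:8081/a/b/c".split(/(:\/\/|:|\/)/);
// parts: ['https', '://', 'www.test.com', ':', '8081', '/', 'a', '/', 'b', '/', 'c']

// Keep the first six entries and glue everything after them back into one path.
const expected = [...parts.slice(0, 6), parts.slice(6).join("")];
console.log(expected);
// ['https', '://', 'www.test.com', ':', '8081', '/', 'a/b/c']
```

This relies on fixed positions, so it breaks if the port is missing; the approaches further down handle that case more gracefully.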
Another way to extract these parts is to use the URL object:
var url = new URL('https://www.test.com:8081/a/b/c');
console.log(url.protocol);
console.log(url.hostname);
console.log(url.port);
console.log(url.pathname);
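For reference, a small sketch of what those properties hold for this URL. Note that protocol keeps its trailing colon and pathname keeps its leading slash. The describeUrl helper below is a hypothetical name, not part of the URL API; it strips both to match the fields asked for in the question.

```javascript
// Hypothetical helper: returns the four fields without the separator characters.
function describeUrl(input) {
  const url = new URL(input);
  return {
    protocol: url.protocol.replace(/:$/, ""), // 'https:' -> 'https'
    domain: url.hostname,
    port: url.port, // empty string when no port is given (or it is the scheme's default)
    path: url.pathname.replace(/^\//, ""), // '/a/b/c' -> 'a/b/c'
  };
}

console.log(describeUrl("https://www.test.com:8081/a/b/c"));
// { protocol: 'https', domain: 'www.test.com', port: '8081', path: 'a/b/c' }
```

Unlike the regular-expression approaches, new URL throws a TypeError on input it cannot parse, so untrusted strings may need a try/catch around the call.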
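Of course, capturing groups are included in the result of split. When the pattern alternates with a capturing group that does not match in a particular iteration, that group captures nothing, but it is still a capturing group in the split pattern, so undefined is added to the array at that position. For example: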
console.log('abc'.split(/b|(wontmatch)/));
// a more complicated example:
console.log('abcde'.split(/(b)|(d)/));
/*
[
"a", split substring
"b", b was captured, so it's included in the match
undefined, the (d) part did not match, but it's another capturing group, so "undefined"
"c", split substring
undefined, the (b) part did not match, but it's another capturing group, so "undefined"
"d", d was captured, so it's included in the match
"e" split substring
]
*/
The behavior you are seeing is just a more complicated version of the above.
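To make the rule concrete: split adds one entry per capturing group for each match, while non-capturing groups (?:...) add nothing. A minimal sketch on the URL from the question; filtering out undefined is shown only as a workaround, and the variable names are illustrative.

```javascript
const input = "https://www.test.com:8081/a/b/c";

// Three capturing groups: every separator contributes three entries,
// two of which are undefined.
const withGroups = input.split(/(:\/\/)|(:)|(\/)/);
console.log(withGroups.length); // 21

// Non-capturing groups: separators are dropped entirely.
const withoutGroups = input.split(/(?::\/\/)|(?::)|(?:\/)/);
console.log(withoutGroups);
// ['https', 'www.test.com', '8081', 'a', 'b', 'c']

// Workaround: keep the original pattern and discard the undefined entries.
const filtered = withGroups.filter((part) => part !== undefined);
console.log(filtered.length); // 11
```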
You might consider using match instead of split; it may be easier to follow:
const str = "https://www.test.com:8081/a/b/c";
const matches = str.match(/([^:]+)(:\/\/)([^:]+)(:)(\d+)(\/)(.*$)/);
// matches[0] is the whole match; the capture groups start at index 1
console.log(matches.slice(1));
// ['https', '://', 'www.test.com', ':', '8081', '/', 'a/b/c']
Or, if you only want the protocol, domain, port, and path, remove the unneeded capturing groups:
const str = "https://www.test.com:8081/a/b/c";
const [, protocol, domain, port, path] = str.match(
/([^:]+):\/\/([^:]+):(\d+)\/(.*$)/
);
console.log(protocol, domain, port, path);
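If the port is optional, put it and the preceding : into an optional non-capturing group, and change the second character class to [^:/] so that it does not match slashes: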
const str = "https://www.test.com/a/b/c";
const [, protocol, domain, port, path] = str.match(
/([^:]+):\/\/([^:/]+)(?::(\d+))?\/(.*$)/
);
console.log(protocol, domain, port, path);
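One caveat with the destructuring above: match returns null when the string does not fit the pattern, and destructuring null throws a TypeError. A hedged sketch that wraps the same regular expression in a function (parseUrl is a hypothetical name) and returns null instead:

```javascript
// Hypothetical wrapper around the pattern above; returns null on no match.
function parseUrl(str) {
  const match = str.match(/([^:]+):\/\/([^:/]+)(?::(\d+))?\/(.*$)/);
  if (match === null) return null;
  const [, protocol, domain, port, path] = match;
  return { protocol, domain, port, path }; // port is undefined when absent
}

console.log(parseUrl("https://www.test.com:8081/a/b/c"));
console.log(parseUrl("https://www.test.com/a/b/c"));
console.log(parseUrl("not a url")); // null
```

The pattern still requires a slash after the host, so a bare "https://www.test.com" would also come back as null under this sketch.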