Writing and Updating ~8K-10K Iterations of URL Strings on a Text File (PHP, Performance, CRON)
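Problem
I'm trying to write a sitemap, a list of unique URLs, to a txt file. One file is generated per day, and it can be updated during the day.
Attempt
generateSitemap is part of a large class, UpdateStocks. It takes an input string and writes the URL for that input, and it is called for roughly 8-10K inputs. Each input is generated from API data before it is passed to generateSitemap.
Performance
Could you help me make it faster, simpler, or more efficient? There is also a small bug in generateSitemap that I haven't been able to find: when it updates the file, the txt file sometimes ends up with an extra newline (\n).
Pseudocode for calling generateSitemap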
for i = 1 to 8000
    generate input[i]   // for example: 'aapl-apple-technology-nasdaq-us-8f4c'
    UpdateStocks::generateSitemap(input[i])
endfor
Class constants
const DIR_URL_KEYWORD_1 = "equity";
const DIR_URL_KEYWORD_2 = "equilibrium-estimation";
const DOMAIN = "domain.org";
const EXTENSION_MD = ".md";
const EXTENSION_TXT = ".txt";
const NEW_LINE = "\n";
const PROTOCOL = "https://";
const SITEMAP_PREFIX = "/sitemap-";
const SLASH = "/";
generateSitemap
/**
 *
 * @return a large string in a txt file including all urls for a daily sitemap
 */
public static function generateSitemap($lurl){
    $dir=__DIR__ . self::DIR_FRONT_PUBLIC_HTML;
    // url
    $sm=sprintf('%s%s%s',
        self::PROTOCOL.self::DOMAIN.self::SLASH.self::DIR_URL_KEYWORD_1.self::SLASH.self::DIR_URL_KEYWORD_2.self::SLASH,
        $lurl,
        self::NEW_LINE
    );
    $dt=new \DateTime('now');
    $dt=$dt->format('Y-m-d'); // today
    $fn=$dir . self::SITEMAP_PREFIX . $dt . self::EXTENSION_TXT; // sitemap filename in public_html
    // if daily sitemap already exists
    if(file_exists($fn)){
        $arr = preg_split('/\n/', trim(file_get_contents($fn))); // array of links
        $i=0; // counter
        foreach ($arr as $k=>$lk){
            if($arr[$k]==null){unset($arr[$k]);}
            if(trim($lk)===trim($sm)){ // link already exists
                $i++;
                if($i>0){$arr[$k]=null;} // link already exists more than once
            }else{
                if($k==sizeof($arr)-1){
                    $k++;
                    $arr[$k]=$sm;
                    $arr=implode(self::NEW_LINE, $arr);
                    $fh=fopen($fn, 'wb');
                    fwrite($fh, $arr);
                    fclose($fh);
                }
                continue;
            }
        }
    }else{
        $fh=fopen($fn, 'wb');
        fwrite($fh, $sm);
        fclose($fh);
    }
}
Sample inputs
a-agilent-technologies-healthcare-nyse-us-39d4
aa-alcoa-basic-materials-nyse-us-159a
aaau-perth-mint-physical-gold-nyse-us-8ed9
aaba-altaba-financial-services-nasdaq-us-26f5
aac-healthcare-nyse-us-e92a
aadr-advisorshares-dorsey-wright-adr-nyse-us-d842
aal-airlines-industrials-nasdaq-us-29eb
aamc-altisource-asset-management-com-financial-services-nyse-us-b46a
aan-aarons-industrials-nyse-us-d00e
aaoi-applied-optoelectronics-technology-nasdaq-us-1dee
aaon-basic-materials-nasdaq-us-238e
aap-advance-auto-parts-wi-consumer-cyclical-nyse-us-1f60
aapl-apple-technology-nasdaq-us-8f4c
aat-assets-real-estate-nyse-us-3598
aau-almaden-minerals-basic-materials-nyse-us-1c57
aaww-atlas-air-worldwide-industrials-nasdaq-us-69f3
aaxj-ishares-msci-all-country-asia-ex-japan-nasdaq-us-c6c4
aaxn-axon-enterprise-industrials-nasdaq-us-0eef
ab-alliancebernstein-units-financial-services-nyse-us-deb1
abac-renmin-tianli-consumer-defensive-nasdaq-us-8701
abb-industrials-nyse-us-a407
abbv-abbvie-healthcare-nyse-us-9aea
abc-amerisourcebergen-healthcare-nyse-us-bd9d
abcb-ameris-bancorp-financial-services-nasdaq-us-df98
abdc-alcentra-capital-financial-services-nasdaq-us-96dd
abeo-abeona-therapeutics-healthcare-nasdaq-us-aa0f
abeow-market-us-d84d
abev-ambev-1-consumer-defensive-nyse-us-a9b4
abg-asbury-automotive-consumer-cyclical-nyse-us-db5f
abil-ability-technology-nasdaq-us-91a6
abio-arca-biopharma-healthcare-nasdaq-us-098e
abm-abm-industries-industrials-nyse-us-bcbc
abmd-abiomed-healthcare-nasdaq-us-2818
abr-arbor-realty-real-estate-nyse-us-68b1
abr-a-arbor-realty-real-estate-nyse-us-8c1d
abr-b-arbor-realty-real-estate-nyse-us-97f2
abr-c-arbor-realty-real-estate-nyse-us-ee81
abt-abbott-laboratories-healthcare-nyse-us-c7fd
abtx-allegiance-bancshares-financial-services-nasdaq-us-6913
abus-arbutus-biopharma-healthcare-nasdaq-us-c23f
ac-associated-capital-financial-services-nyse-us-fca3
aca-arcosa-industrials-nyse-us-b429
Partial sitemap-2019-03-15.txt:
domain.org/equity/equilibrium-estimation/a-agilent-technologies-healthcare-nyse-us-39d4
domain.org/equity/equilibrium-estimation/aa-alcoa-basic-materials-nyse-us-159a
domain.org/equity/equilibrium-estimation/aaau-perth-mint-physical-gold-nyse-us-8ed9
domain.org/equity/equilibrium-estimation/aaba-altaba-financial-services-nasdaq-us-26f5
domain.org/equity/equilibrium-estimation/aac-healthcare-nyse-us-e92a
domain.org/equity/equilibrium-estimation/aadr-advisorshares-dorsey-wright-adr-nyse-us-d842
domain.org/equity/equilibrium-estimation/aal-airlines-industrials-nasdaq-us-29eb
domain.org/equity/equilibrium-estimation/aamc-altisource-asset-management-com-financial-services-nyse-us-b46a
domain.org/equity/equilibrium-estimation/aan-aarons-industrials-nyse-us-d00e
domain.org/equity/equilibrium-estimation/aaoi-applied-optoelectronics-technology-nasdaq-us-1dee
domain.org/equity/equilibrium-estimation/aaon-basic-materials-nasdaq-us-238e
domain.org/equity/equilibrium-estimation/aap-advance-auto-parts-wi-consumer-cyclical-nyse-us-1f60
domain.org/equity/equilibrium-estimation/aapl-apple-technology-nasdaq-us-8f4c
domain.org/equity/equilibrium-estimation/aat-assets-real-estate-nyse-us-3598
domain.org/equity/equilibrium-estimation/aau-almaden-minerals-basic-materials-nyse-us-1c57
domain.org/equity/equilibrium-estimation/aaww-atlas-air-worldwide-industrials-nasdaq-us-69f3
domain.org/equity/equilibrium-estimation/aaxj-ishares-msci-all-country-asia-ex-japan-nasdaq-us-c6c4
domain.org/equity/equilibrium-estimation/aaxn-axon-enterprise-industrials-nasdaq-us-0eef
domain.org/equity/equilibrium-estimation/ab-alliancebernstein-units-financial-services-nyse-us-deb1
domain.org/equity/equilibrium-estimation/abac-renmin-tianli-consumer-defensive-nasdaq-us-8701
domain.org/equity/equilibrium-estimation/abb-industrials-nyse-us-a407
domain.org/equity/equilibrium-estimation/abbv-abbvie-healthcare-nyse-us-9aea
domain.org/equity/equilibrium-estimation/abc-amerisourcebergen-healthcare-nyse-us-bd9d
domain.org/equity/equilibrium-estimation/abcb-ameris-bancorp-financial-services-nasdaq-us-df98
domain.org/equity/equilibrium-estimation/abdc-alcentra-capital-financial-services-nasdaq-us-96dd
domain.org/equity/equilibrium-estimation/abeo-abeona-therapeutics-healthcare-nasdaq-us-aa0f
domain.org/equity/equilibrium-estimation/abeow-market-us-d84d
domain.org/equity/equilibrium-estimation/abev-ambev-1-consumer-defensive-nyse-us-a9b4
domain.org/equity/equilibrium-estimation/abg-asbury-automotive-consumer-cyclical-nyse-us-db5f
domain.org/equity/equilibrium-estimation/abil-ability-technology-nasdaq-us-91a6
domain.org/equity/equilibrium-estimation/abio-arca-biopharma-healthcare-nasdaq-us-098e
domain.org/equity/equilibrium-estimation/abm-abm-industries-industrials-nyse-us-bcbc
domain.org/equity/equilibrium-estimation/abmd-abiomed-healthcare-nasdaq-us-2818
domain.org/equity/equilibrium-estimation/abr-arbor-realty-real-estate-nyse-us-68b1
domain.org/equity/equilibrium-estimation/abr-a-arbor-realty-real-estate-nyse-us-8c1d
domain.org/equity/equilibrium-estimation/abr-b-arbor-realty-real-estate-nyse-us-97f2
domain.org/equity/equilibrium-estimation/abr-c-arbor-realty-real-estate-nyse-us-ee81
domain.org/equity/equilibrium-estimation/abt-abbott-laboratories-healthcare-nyse-us-c7fd
domain.org/equity/equilibrium-estimation/abtx-allegiance-bancshares-financial-services-nasdaq-us-6913
domain.org/equity/equilibrium-estimation/abus-arbutus-biopharma-healthcare-nasdaq-us-c23f
domain.org/equity/equilibrium-estimation/ac-associated-capital-financial-services-nyse-us-fca3
domain.org/equity/equilibrium-estimation/aca-arcosa-industrials-nyse-us-b429
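$i=0; // counter
foreach ($arr as $k=>$lk){
    if($arr[$k]==null)
        {unset($arr[$k]);}
    if(trim($lk)===trim($sm))
    {
        if($i>0){$arr[$k]=null;}
        $i++;
    }
}
$i++ should come after the if statement. In the original, $i is incremented before the $i>0 check, so the check is always true and even the first matching link is set to null.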
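To see how a nulled entry can turn into an extra line break, here is a minimal sketch. The sample values and variable names are made up for illustration and are not part of UpdateStocks. implode() treats a null element as an empty string, so the separator is still written on both sides of it; filtering out empty entries before joining avoids the blank line.

```php
<?php
// Hypothetical sample data: the middle link was "removed" by setting it to null.
$links = ['domain.org/a', null, 'domain.org/c'];

// implode() casts null to '', so two separators end up next to each other.
$withGap = implode("\n", $links);

// Dropping null and whitespace-only entries first keeps blank lines out of the output.
$clean = implode("\n", array_filter($links, function ($link) {
    return $link !== null && trim($link) !== '';
}));
```

Here $withGap contains a blank line between the two URLs, while $clean does not.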
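Here is an untested script that shows how I would run it (unless we are dealing with excessively large file sizes).
Collect and prepare all of the API strings in an array.
If this is the first data of the day, just push it into a new file.
If the file already exists, extract the old data, merge it with the new data, remove duplicates, sort alphabetically, then replace the file contents.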
public static function collectAPIData() {
    // Build the constant URL prefix once instead of on every iteration
    $leading_url = self::PROTOCOL .
                   self::DOMAIN .
                   self::SLASH .
                   self::DIR_URL_KEYWORD_1 .
                   self::SLASH .
                   self::DIR_URL_KEYWORD_2 .
                   self::SLASH;
    $fresh_data = [];
    // start loop (placeholder: $your_string_from_the_api is each string built from the API data)
        $fresh_data[] = $leading_url . $your_string_from_the_api;
    // end loop
    return $fresh_data;
}

public static function storeSitemapData($new_urls) {
    if (!$new_urls) {
        return; // nothing to write
    }
    $fn = __DIR__ .
          self::DIR_FRONT_PUBLIC_HTML .
          self::SITEMAP_PREFIX .
          (new \DateTime('now'))->format('Y-m-d') .
          self::EXTENSION_TXT;
    if (file_exists($fn)) {
        // Read the existing URLs without trailing newlines or empty lines
        $old_urls = file($fn, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $merged = array_merge($old_urls, $new_urls);
        $unique = array_keys(array_flip($merged)); // remove duplicates
        sort($unique);
        $new_urls = $unique;
    }
    // Single write per run: one URL per line, no trailing newline
    file_put_contents($fn, implode(self::NEW_LINE, $new_urls));
}
These static functions can be called like this:
UpdateStocks::storeSitemapData(UpdateStocks::collectAPIData());
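In fact, for efficiency, I could pick out only the new unique URLs and append them to the existing file, but I like the idea of keeping the data in alphabetical order.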
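The append-only alternative could look roughly like the sketch below. It is an illustration, not the answer's code: the standalone function appendNewUrls and its parameters $file and $newUrls are hypothetical, and it gives up alphabetical order in exchange for writing only the new lines. Because the existing file may not end with a newline, the sketch checks the last byte before appending.

```php
<?php
/**
 * Appends only the URLs that are not already in the file.
 * Hypothetical helper; returns the number of URLs appended.
 */
function appendNewUrls(string $file, array $newUrls): int
{
    $raw = file_exists($file) ? (string) file_get_contents($file) : '';
    $existing = preg_split('/\R/', $raw, -1, PREG_SPLIT_NO_EMPTY);

    // Flip to a URL => index map so each membership check is a key lookup.
    $seen = array_flip($existing);
    $toAppend = [];
    foreach ($newUrls as $url) {
        $url = trim($url);
        if ($url !== '' && !isset($seen[$url])) {
            $seen[$url] = true; // also guards against duplicates within $newUrls
            $toAppend[] = $url;
        }
    }

    if ($toAppend) {
        // Start on a fresh line if the file does not already end with "\n".
        $prefix = ($raw !== '' && substr($raw, -1) !== "\n") ? "\n" : '';
        file_put_contents(
            $file,
            $prefix . implode("\n", $toAppend) . "\n",
            FILE_APPEND | LOCK_EX
        );
    }
    return count($toAppend);
}
```

A call such as appendNewUrls($fn, $new_urls) would then replace the merge, sort, and rewrite steps; whether the saved work matters for 8-10K short lines would need to be measured.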