Writing to existing files in a filesystem database
I have a function that, every few minutes, writes ~120Kb-150Kb of HTML and metadata to ~8000 .md files with fixed names:
a-agilent-technologies-healthcare-nyse-us-39d4
aa-alcoa-basic-materials-nyse-us-159a
aaau-perth-mint-physical-gold--nyse-us-8ed9
aaba-altaba-financial-services-nasdaq-us-26f5
aac-healthcare-nyse-us-e92a
aadr-advisorshares-dorsey-wright-adr--nyse-us-d842
aal-airlines-industrials-nasdaq-us-29eb
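- If the file does not exist, it generates/writes fairly fast.
- However, if the file exists, the same operation is much slower, since the existing file already holds ~150KB of data.
How do I solve this problem?
Do I generate a new file with a new name in the same directory and unlink the older file in the for loop?
Or do I generate a new folder, write all the files, and then unlink the previous directory? The problem with this method is that sometimes 90% of the files are rewritten, while some stay as they are.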
Code
This function is called in a for loop, which you can see in the link.
public static function writeFinalStringOnDatabase($equity_symbol, $md_file_content, $no_extension_filename)
{
    /**
     * @var string $md_file_content the MD file content with meta and entire HTML
     */
    $md_file_content = $md_file_content . ConfigConstants::NEW_LINE . ConfigConstants::NEW_LINE;
    $dir = __DIR__ . ConfigConstants::DIR_FRONT_SYMBOLS_MD_FILES; // symbols front directory
    $new_filename = EQ::generateFileNameFromLeadingURL($no_extension_filename, $dir);

    if (file_exists($new_filename)) {
        if (is_writable($new_filename)) {
            file_put_contents($new_filename, $md_file_content);
            if (EQ::isLocalServer()) {
                echo $equity_symbol . " " . ConfigConstants::NEW_LINE;
            }
        } else {
            if (EQ::isLocalServer()) {
                echo $equity_symbol . " symbol MD file is not writable in " . __METHOD__ . " Maybe, check permissions!" . ConfigConstants::NEW_LINE;
            }
        }
    } else {
        $fh = fopen($new_filename, 'wb');
        fwrite($fh, $md_file_content);
        fclose($fh);
        if (EQ::isLocalServer()) {
            echo $equity_symbol . " front md file does not exist in " . __METHOD__ . " It's writing on the database now " . ConfigConstants::NEW_LINE;
        }
    }
}
I have not programmed in PHP for years, but this question caught my interest today. :D
Suggestion
How do I solve this problem?
Do I generate a new file with a new name in the same directory, and unlink the older file in the for loop?
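Simply use the three friends fopen(), fwrite() and fclose() again. Opening the file with mode 'wb' truncates it, so fwrite() also replaces the entire contents of an existing file.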
if (file_exists($new_filename)) {
    if (is_writable($new_filename)) {
        $fh = fopen($new_filename, 'wb');
        fwrite($fh, $md_file_content);
        fclose($fh);
        if (EQ::isLocalServer()) {
            echo $equity_symbol . " " . ConfigConstants::NEW_LINE;
        }
    } else {
        if (EQ::isLocalServer()) {
            echo $equity_symbol . " symbol MD file is not writable in " . __METHOD__ . " Maybe, check permissions!" . ConfigConstants::NEW_LINE;
        }
    }
} else {
    $fh = fopen($new_filename, 'wb');
    fwrite($fh, $md_file_content);
    fclose($fh);
    if (EQ::isLocalServer()) {
        echo $equity_symbol . " front md file does not exist in " . __METHOD__ . " It's writing on the database now " . ConfigConstants::NEW_LINE;
    }
}
For the sake of the DRY principle:
// It's smart to put the logging and similar tasks in a separate function,
// after you end up writing the same thing over and over again.
public static function log($content)
{
    if (EQ::isLocalServer()) {
        echo $content;
    }
}

public static function writeFinalStringOnDatabase($equity_symbol, $md_file_content, $no_extension_filename)
{
    $md_file_content = $md_file_content . ConfigConstants::NEW_LINE . ConfigConstants::NEW_LINE;
    $dir = __DIR__ . ConfigConstants::DIR_FRONT_SYMBOLS_MD_FILES; // symbols front directory
    $new_filename = EQ::generateFileNameFromLeadingURL($no_extension_filename, $dir);
    $file_already_exists = file_exists($new_filename);

    if ($file_already_exists && !is_writable($new_filename)) {
        EQ::log($equity_symbol . " symbol MD file is not writable in " . __METHOD__ . " Maybe, check permissions!" . ConfigConstants::NEW_LINE);
    } else {
        $fh = fopen($new_filename, 'wb'); // you should also check whether fopen succeeded
        fwrite($fh, $md_file_content); // you should also check whether fwrite succeeded
        if ($file_already_exists) {
            EQ::log($equity_symbol . " " . ConfigConstants::NEW_LINE);
        } else {
            EQ::log($equity_symbol . " front md file does not exist in " . __METHOD__ . " It's writing on the database now " . ConfigConstants::NEW_LINE);
        }
        fclose($fh);
    }
}
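The comments in the refactored function say that the results of fopen() and fwrite() should be checked. A minimal sketch of what that could look like, assuming a hypothetical standalone helper named writeFileChecked() (it is not part of the original code) that takes a path and the content to write:

```php
<?php
/**
 * Hypothetical helper (not from the original post): overwrite $path with
 * $content and report failure instead of silently ignoring it.
 */
function writeFileChecked(string $path, string $content): bool
{
    // Mode 'wb' creates the file if needed and truncates an existing one.
    $fh = @fopen($path, 'wb');
    if ($fh === false) {
        return false; // e.g. missing directory or insufficient permissions
    }

    // fwrite() returns the number of bytes written, or false on error.
    $written = fwrite($fh, $content);
    $closed = fclose($fh);

    // Success only if every byte was written and the handle closed cleanly.
    return $written === strlen($content) && $closed;
}
```

Inside writeFinalStringOnDatabase() the return value could then decide which message is passed to EQ::log().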
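Possible cause
tl;dr: Possibly overhead from the Zend string API being used.
The official PHP manual says:
file_put_contents() is identical to calling fopen(), fwrite() and fclose() successively to write data to a file.
However, if you look at the source code of PHP on GitHub, you will find that the "writing data" part is handled slightly differently in file_put_contents() and fwrite().
In the fwrite function, the raw input data (= $md_file_content) is accessed directly in order to write the buffer data to the stream: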
ret = php_stream_write(stream, input, num_bytes);
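In the file_put_contents function, on the other hand, the Zend string API is used (which I had never heard of before).
For some reason, the input data and its length are encapsulated here.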
numbytes = php_stream_write(stream, Z_STRVAL_P(data), Z_STRLEN_P(data));
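(The Z_STR... macros are defined in the PHP source, if you are interested.)
So my suspicion is that the Zend string API may cause the overhead when using file_put_contents. This is a guess from reading the source, not a measurement.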
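Since this is only a suspicion, it may be worth timing both approaches on files that already exist before changing anything. A rough sketch, in which the helper names, the file count (100) and the payload size (150 KB) are assumptions chosen for illustration, and hrtime() requires PHP 7.3 or later:

```php
<?php
// Hypothetical benchmark helpers; names, file count and payload size are
// assumptions for illustration, not taken from the original code.

function timeFilePutContents(array $paths, string $content): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    foreach ($paths as $path) {
        file_put_contents($path, $content);
    }
    return (hrtime(true) - $start) / 1e9; // seconds
}

function timeFopenFwrite(array $paths, string $content): float
{
    $start = hrtime(true);
    foreach ($paths as $path) {
        $fh = fopen($path, 'wb');
        fwrite($fh, $content);
        fclose($fh);
    }
    return (hrtime(true) - $start) / 1e9;
}

// Create files that already exist and hold ~150 KB each.
$dir = sys_get_temp_dir() . '/md-bench-' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
$content = str_repeat('x', 150 * 1024);
$paths = [];
for ($i = 0; $i < 100; $i++) {
    $paths[$i] = "$dir/file-$i.md";
    file_put_contents($paths[$i], $content);
}

printf("file_put_contents:   %.4f s\n", timeFilePutContents($paths, $content));
printf("fopen/fwrite/fclose: %.4f s\n", timeFopenFwrite($paths, $content));

// Clean up the temporary files.
array_map('unlink', $paths);
rmdir($dir);
```

Results will vary with the filesystem and the OS cache, so several runs in both orders give a fairer picture than a single one.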
Side note
At first I thought that every file_put_contents() call creates a new stream context, because the lines related to creating the context are also slightly different:
PHP_NAMED_FUNCTION(php_if_fopen) (Reference):
context = php_stream_context_from_zval(zcontext, 0);
PHP_FUNCTION(file_put_contents) (Reference):
context = php_stream_context_from_zval(zcontext, flags & PHP_FILE_NO_DEFAULT_CONTEXT);
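However, on closer inspection, the php_stream_context_from_zval call is effectively made with the same arguments in both cases: the first argument, zcontext, is null, and since you do not pass any flags to file_put_contents, flags & PHP_FILE_NO_DEFAULT_CONTEXT also evaluates to 0 and is passed as the second argument.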
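The bitmask step can be illustrated from userland. This sketch assumes that the userland constant FILE_NO_DEFAULT_CONTEXT corresponds to the C-level PHP_FILE_NO_DEFAULT_CONTEXT, and it relies on the default value 0 of the flags parameter of file_put_contents():

```php
<?php
// With no flags argument, file_put_contents() receives its default of 0,
// so masking with FILE_NO_DEFAULT_CONTEXT yields 0.
$flags = 0;
var_dump(($flags & FILE_NO_DEFAULT_CONTEXT) === 0);

// An unrelated flag such as FILE_APPEND does not set that bit either.
var_dump((FILE_APPEND & FILE_NO_DEFAULT_CONTEXT) === 0);
```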
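So, I guess the default stream context is re-used here on every call. Since it is apparently a persistent stream, it is not disposed of after the php_stream_close() call.
So the Fazit (conclusion), as the Germans say, is that there is apparently either no additional overhead, or the same overhead, for creating or re-using a stream context in both cases.
Thank you for reading.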