C function to load file contents into string array
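What I am trying to do is:
- Properly initialize a two-dimensional char array (an array of strings), then,
- In another function, load the contents of a file into that 2D array, breaking to the next row of the 2D array at each '\n' newline character.
Since we cannot return two-dimensional arrays from functions in C, I tried to use strncpy(), reading the file line by line. I tried lots of variants, but I kept messing up the memory, so I ran into a lot of segfaults and bus errors.
Here is a model demonstrating how simple what I'm trying to do is: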
int main()
{
char *file_content[];
load_file_to_array(file_content);
}
void load_file_to_array(char *to_load[]) {
// something to update file_content
}
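Edit 1:
I am not asking anyone to write the code for me. I got frustrated researching this specific topic, so I decided to ask what you all think. Any different approach, or comments on the method itself, would be appreciated.
I have a strong hunch that my approach may be completely off track.
I have looked at a lot of posts and articles about dynamic memory. The closest thing I found to what I am trying to accomplish is: Changing an Array of Strings inside a function in C
Edit 2:
As @pm100 explained in the comments, I tried: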
#include <stdio.h>
#include <stdlib.h>
#define MAX_SIZE 255
void load_array(char *, char *[]);
int main() {
// NULL terminate array
char *fn = "text.txt";
char *arr[MAX_SIZE] = { NULL };
// Also tried:
// char **arr = malloc(sizeof * char*MAX_SIZE)
// arr[0] = malloc(sizeof(char)*MAX_SIZE);
load_array(fn, arr);
}
void load_array(char *fn, char *fcontent[]) {
FILE * file = fopen(fn, "r");
char line[MAX_SIZE];
int i = 0;
// read file line by line with fgets()
while (fgets(line, sizeof(line), file))
{
// this part I'm not sure of. maybe setting with fgets makes more
// sense.
strcpy(fcontent[i], line);
i++;
// realloc()
fcontent = realloc(fcontent, sizeof (char*) * (i + 1));
fcontent[i] = malloc(sizeof(char) * MAX_SIZE);
// null terminate the last line
*fcontent[i] = 0;
}
fclose(file);
}
I get a segfault after running the program.
Apart from saying that this is not an efficient solution, here is yours slightly modified:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SIZE 1024
char** load_array(const char *fn) {
char** fcontent = (char**)malloc(0);
FILE * file = fopen(fn, "r");
char line[MAX_SIZE];
int i = 0;
// read file line by line with fgets()
while (fgets(line, sizeof(line), file))
{
fcontent = (char**)realloc(fcontent, (i + 1) * sizeof(char*));
fcontent[i] = (char*)malloc(strlen(line) + 1);
// this part I'm not sure of. maybe setting with fgets makes more sense.
strcpy(fcontent[i], line);
i++;
}
fcontent = (char**)realloc(fcontent, (i + 1) * sizeof(char*));
fcontent[i] = NULL;
fclose(file);
return fcontent;
}
int main()
{
char **lines = load_array("test.txt");
int i = 0;
while (lines[i])
printf("%s", lines[i++]);
i = 0;
while (lines[i])
free(lines[i++]);
free(lines);
return 0;
}
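The modified version above does not check whether fopen(), malloc(), or realloc() succeeded, and it calls realloc() once per line. A minimal sketch of the same idea with those checks added and the pointer array grown by doubling might look like the following. The names load_lines, free_lines, and MAX_LINE are made up for this illustration and are not from the code above; as in the original, a line longer than the fgets() buffer ends up split across several array entries.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024  /* longest piece fgets() returns in one call (assumed) */

/* Free a NULL-terminated array of strings produced by load_lines(). */
void free_lines(char **lines)
{
    if (lines == NULL)
        return;
    for (size_t i = 0; lines[i] != NULL; i++)
        free(lines[i]);
    free(lines);
}

/* Read `fn` line by line into a NULL-terminated array of strings.
 * Returns NULL if the file cannot be opened or memory runs out. */
char **load_lines(const char *fn)
{
    assert(fn != NULL);
    FILE *file = fopen(fn, "r");
    if (file == NULL)
        return NULL;

    size_t count = 0;
    size_t capacity = 16;
    char **lines = malloc(capacity * sizeof *lines);
    if (lines == NULL) {
        fclose(file);
        return NULL;
    }
    lines[0] = NULL;  /* the array is NULL-terminated at all times */

    char line[MAX_LINE];
    while (fgets(line, sizeof line, file) != NULL) {
        /* Need room for this line plus the NULL terminator after it. */
        if (count + 2 > capacity) {
            /* Keep the old pointer until realloc() is known to have worked. */
            char **grown = realloc(lines, 2 * capacity * sizeof *lines);
            if (grown == NULL) {
                free_lines(lines);
                fclose(file);
                return NULL;
            }
            lines = grown;
            capacity *= 2;
        }
        size_t len = strlen(line) + 1;  /* include the '\0' */
        lines[count] = malloc(len);
        if (lines[count] == NULL) {
            free_lines(lines);  /* lines[count] is NULL, so this stops there */
            fclose(file);
            return NULL;
        }
        memcpy(lines[count], line, len);
        count++;
        lines[count] = NULL;
    }
    fclose(file);
    return lines;
}
```

A caller would check the result for NULL, loop over the entries until it reaches the NULL terminator, and finish with free_lines().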
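Basic usage
Well, here is one way to do what you want with 100% static memory allocation, which I prefer for a number of reasons (some of which are listed below).
Example usage, to show how simple it is to use: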
int main()
{
// Make this huge struct `static` so that the buffers it contains will be
// `static` so that they are neither on the stack **nor** the heap, thereby
// preventing stack overflow in the event you make them larger than the stack
// size, which is ~7.4 MB for Linux. See my answer here:
//
static file_t file;
const char FILENAME[] = "path/to/some/file.txt";
// OR:
// const char FILENAME[] = __FILE__; // read this source code file itself
file_store_path(&file, FILENAME);
printf("Loading file at path \"%s\".\n", file.path);
// open the file and copy its entire contents into the `file` object
file_load(&file);
printf("Printing the entire file:\n");
file_print_all(&file);
printf("\n");
printf("Printing just this 1 line number:\n");
file_print_line(&file, 256);
printf("\n");
printf("Printing 4 lines starting at this line number:\n");
file_print_lines(&file, 256, 4);
printf("\n");
return 0;
}
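How does it work?
I don't use any 2D arrays or anything. Instead, I create a static 1D array of chars, char file_str[MAX_NUM_CHARS];, which contains all of the characters in the entire file, and I create a static 1D array of char * (pointer to char), char* line_array[MAX_NUM_LINES];, which contains pointers to the first character of each line in the file, where the characters these pointers point to live in the file_str array. Then, I read the file one character at a time into the file_str array. Each time I see a \n newline character, I know the next character is the start of a new line, so I add a pointer to that character to the line_array array.
To print the file back out, I iterate through the line_array array, printing all characters of each line, one line at a time.
One could opt to get rid of the line_array array entirely and just use the file_str char array. You could still choose a line to print and print it. The downside of that approach, however, is that finding the start of the line to print takes O(n) time, since you have to read through the file from the first char up to the line of interest, counting lines by counting the \n newline characters you see. My approach, on the other hand, takes O(1) time and jumps straight to the start of the line of interest via a simple index into the line_array array.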
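To make the O(1) versus O(n) trade-off concrete, here is a small sketch that works on an in-memory string rather than a file. The function names index_lines and find_line_by_scanning are invented for this illustration; they are not part of the program in this answer, which builds its index while reading the file.

```c
#include <assert.h>
#include <stddef.h>

/* Record a pointer to the first character of every line in `text`
 * (a NUL-terminated string). Stores at most `max_lines` pointers in
 * `line_starts` and returns how many were stored. */
size_t index_lines(const char *text, const char **line_starts, size_t max_lines)
{
    assert(text != NULL && line_starts != NULL);
    size_t num_lines = 0;
    int start_of_line = 1;
    for (const char *p = text; *p != '\0'; p++) {
        if (start_of_line) {
            if (num_lines == max_lines)
                break;              /* the index is full */
            line_starts[num_lines++] = p;
            start_of_line = 0;
        }
        if (*p == '\n')
            start_of_line = 1;      /* the next character begins a new line */
    }
    return num_lines;
}

/* O(n) alternative without an index: scan from the start of `text`,
 * counting '\n' characters until 0-based line `i_line` is reached.
 * Returns NULL if the text has fewer lines than that. */
const char *find_line_by_scanning(const char *text, size_t i_line)
{
    assert(text != NULL);
    const char *p = text;
    while (i_line > 0) {
        while (*p != '\0' && *p != '\n')
            p++;
        if (*p == '\0')
            return NULL;
        p++;                        /* step past the '\n' */
        i_line--;
    }
    return (*p != '\0') ? p : NULL;
}
```

Once index_lines() has filled the array, line_starts[i] gives the start of any line in constant time, while find_line_by_scanning() has to walk past every earlier line on each call.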
The arrays above are stored in a file_t struct, defined as follows:
#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL) // 2 MB
#define MAX_PATH_LEN (1000)
typedef struct file_s
{
/// The path to the file to open.
char path[MAX_PATH_LEN];
/// All characters read from the file.
char file_str[MAX_NUM_CHARS]; // array of `char`
/// The total number of chars read into the `file_str` string, including
// null terminator.
size_t num_chars;
/// A ptr to each line in the file.
char* line_array[MAX_NUM_LINES]; // array of `char*` (ptr to char)
/// The total number of lines in the file, and hence in the `line_array`
// above.
size_t num_lines;
} file_t;
Here are some stats about the file_t struct, which I print at the start of my program:
The size of the `file_t` struct is 2081016 bytes (2.081016 MB; 1.984612 MiB).
Max file size that can be read into this struct is 2000000 bytes or 10000 lines, whichever limit is hit first.
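For anyone wondering where 2081016 comes from, it is just the sum of the member sizes on a typical 64-bit Linux build, where size_t and pointers are 8 bytes each: 1000 + 2000000 + 8 + 80000 + 8 = 2081016. The sketch below repeats the struct (without its comments) so it compiles on its own; file_t_member_sum is a helper made up for this illustration, and the static_assert reflects my assumption that no padding is needed because the two char arrays together end on an 8-byte boundary.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, offsetof */

#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL)
#define MAX_PATH_LEN (1000)

typedef struct file_s
{
    char path[MAX_PATH_LEN];
    char file_str[MAX_NUM_CHARS];
    size_t num_chars;
    char* line_array[MAX_NUM_LINES];
    size_t num_lines;
} file_t;

/* path + file_str occupy 2001000 bytes, a multiple of 8, so the compiler
 * is not expected to insert padding before num_chars. */
static_assert(offsetof(file_t, num_chars) == MAX_PATH_LEN + MAX_NUM_CHARS,
              "unexpected padding before num_chars");

/* Hypothetical helper: the sum of the sizes of all members. */
size_t file_t_member_sum(void)
{
    return MAX_PATH_LEN                     /* path:       1000 bytes         */
         + MAX_NUM_CHARS                    /* file_str:   2000000 bytes      */
         + sizeof(size_t)                   /* num_chars:  8 bytes if 64-bit  */
         + MAX_NUM_LINES * sizeof(char *)   /* line_array: 80000 if 64-bit    */
         + sizeof(size_t);                  /* num_lines:  8 bytes if 64-bit  */
}
```

On a platform with 4-byte pointers and size_t the total would be smaller, so treat the 2081016 figure as specific to 64-bit builds.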
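The file object is static so that if you make your file_t struct really large to handle massive files (even gigabytes, if you wish), you don't get a stack overflow, since the thread stack size on Linux is limited to ~7.4 MB. See my answer here: C/C++ maximum stack size of program.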
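I use static memory allocation, not dynamic, for these reasons:
- It's faster at run-time since lots of dynamic memory allocation can add substantial overhead.
- It's deterministic to use static memory allocation, making this implementation style good for safety-critical, memory-constrained, real-time, deterministic, embedded devices and programs.
- It can access the first character of any line in the file in O(1) time via a simple index into an array.
- It's extensible via dynamic memory allocation if needed (more on this below).
The file is opened and loaded into the file object by the file_load() function. That function looks like this, including robust error handling: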
/// Read all characters from a file on your system at the path specified in the
/// file object and copy this file data **into** the passed-in `file` object.
void file_load(file_t* file)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
FILE* fp = fopen(file->path, "r");
if (fp == NULL)
{
printf("ERROR in function %s(): Failed to open file (%s).\n",
__func__, strerror(errno));
return;
}
// See: https://en.cppreference.com/w/c/io/fgetc
int c; // note: int, not char, required to handle EOF
size_t i_write_char = 0;
size_t i_write_line = 0;
bool start_of_line = true;
const size_t I_WRITE_CHAR_MAX = ARRAY_LEN(file->file_str) - 1;
const size_t I_WRITE_LINE_MAX = ARRAY_LEN(file->line_array) - 1;
while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
{
// 1. Write the char
if (i_write_char > I_WRITE_CHAR_MAX)
{
printf("ERROR in function %s(): file is full (i_write_char = "
"%zu, but I_WRITE_CHAR_MAX is only %zu).\n",
__func__, i_write_char, I_WRITE_CHAR_MAX);
break;
}
file->file_str[i_write_char] = c;
// 2. Write the ptr to the line
if (start_of_line)
{
start_of_line = false;
if (i_write_line > I_WRITE_LINE_MAX)
{
printf("ERROR in function %s(): file is full (i_write_line = "
"%zu, but I_WRITE_LINE_MAX is only %zu).\n",
__func__, i_write_line, I_WRITE_LINE_MAX);
break;
}
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
i_write_line++;
}
// end of line
if (c == '\n')
{
// '\n' indicates the end of a line, so prepare to start a new line
// on the next iteration
start_of_line = true;
}
i_write_char++;
}
file->num_chars = i_write_char;
file->num_lines = i_write_line;
fclose(fp);
}
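Extending it via dynamic memory allocation
In many cases the static memory allocation above is sufficient. It is best to avoid dynamic memory allocation whenever possible. If you are going to use dynamic memory allocation, however, do not allocate lots of small chunks! It is better to allocate one or two large chunks. Speed-test it to prove this. Always verify what people say, myself included! As evidence of this, though, also read this article and look at its plots: https://github.com/facontidavide/CPP_Optimizations_Diary/blob/master/docs/reserve.md. "NoReserve" performs many small dynamic memory allocations, each followed by a copy of the existing data into the new memory location, whereas "WithReserve" performs one large dynamic memory allocation up-front instead.
Sometimes dynamic memory allocation is prudent, however. How can we best do it? Suppose you need to open 1000 files and have them all open at the same time, with file sizes ranging from a few bytes to a few GB. In that case, using static memory allocation for all 1000 files is not only bad but nearly impossible (at the very least, awkward and space-inefficient). What you should do instead is make the file_t struct large enough to hold the largest file, a few GB in size, then statically allocate one instance of it to use as a buffer, and then do this: open each file (one at a time) and load it into your single file object; then dynamically malloc() the exact amount of memory that file needs, and strncpy() or memcpy() all of the data from the initial static file object into the dynamic one, which has exactly the amount of memory the file needs, with no waste. This way, the static file object serves only as a placeholder or buffer for reading the file while counting bytes and lines, so that you can dynamically allocate exactly enough memory for those bytes and lines.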
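As a rough sketch of that last idea, reduced to a plain char buffer instead of the full file_t struct: read the file into one static scratch buffer, see how many bytes it really has, then malloc() exactly that much and memcpy() the data across. The names load_exact, scratch, and BUF_CAPACITY are assumptions made for this example, the 1 MiB capacity is arbitrary, and the shared static buffer means this is not thread-safe as written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_CAPACITY (1024UL * 1024UL)  /* assumed upper bound on file size */

/* One static scratch buffer reused for every file that gets loaded. */
static char scratch[BUF_CAPACITY];

/* Read the file at `path` into the static scratch buffer, then return a
 * heap copy of exactly `*size_out + 1` bytes (the contents plus a '\0').
 * Returns NULL if the file cannot be opened or read, does not fit in the
 * scratch buffer, or malloc() fails. The caller must free() the result. */
char *load_exact(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    size_t n = fread(scratch, 1, sizeof scratch, fp);
    /* If the buffer filled up completely, check whether more data remains. */
    int failed = ferror(fp) || (n == sizeof scratch && fgetc(fp) != EOF);
    fclose(fp);
    if (failed)
        return NULL;

    char *copy = malloc(n + 1);   /* exact size: no over-allocation */
    if (copy == NULL)
        return NULL;
    memcpy(copy, scratch, n);
    copy[n] = '\0';
    *size_out = n;
    return copy;
}
```

Each file costs one allocation of exactly the right size, and the scratch buffer is free to be reused for the next file as soon as load_exact() returns.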
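Full code
Here is the full code. This program opens its own source code file and prints it all out, with line numbers at the start of each line, just for fun.
read_file_into_c_string_and_array_of_lines.c, from my eRCaGuy_hello_world repo: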
#include <errno.h>
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
#include <string.h> // for `strerror()`
// Get the number of elements in any C array
// - Usage example: [my own answer]:
// https://arduino.stackexchange.com/questions/80236/initializing-array-of-structs/80289#80289
#define ARRAY_LEN(array) (sizeof(array) / sizeof(array[0]))
/// Max and min gcc/clang **statement expressions** (safer than macros) for C. By Gabriel Staples.
/// See:
#define MAX(a, b) \
({ \
__typeof__(a) _a = (a); \
__typeof__(b) _b = (b); \
_a > _b ? _a : _b; \
})
#define MIN(a, b) \
({ \
__typeof__(a) _a = (a); \
__typeof__(b) _b = (b); \
_a < _b ? _a : _b; \
})
/// Bytes per megabyte
#define BYTES_PER_MB (1000*1000)
/// Bytes per mebibyte
#define BYTES_PER_MIB (1024*1024)
/// Convert bytes to megabytes
#define BYTES_TO_MB(bytes) (((double)(bytes))/BYTES_PER_MB)
/// Convert bytes to mebibytes
#define BYTES_TO_MIB(bytes) (((double)(bytes))/BYTES_PER_MIB)
#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL) // 2 MB
#define MAX_PATH_LEN (1000)
typedef struct file_s
{
/// The path to the file to open.
char path[MAX_PATH_LEN];
/// All characters read from the file.
char file_str[MAX_NUM_CHARS]; // array of `char`
/// The total number of chars read into the `file_str` string, including
// null terminator.
size_t num_chars;
/// A ptr to each line in the file.
char* line_array[MAX_NUM_LINES]; // array of `char*` (ptr to char)
/// The total number of lines in the file, and hence in the `line_array`
// above.
size_t num_lines;
} file_t;
/// Copy the file path pointed to by `path` into the `file_t` object.
void file_store_path(file_t* file, const char *path)
{
if (file == NULL || path == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
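// strncpy() does not null-terminate when `path` fills the whole buffer, so
// copy at most size - 1 chars and terminate explicitly.
strncpy(file->path, path, sizeof(file->path) - 1);
file->path[sizeof(file->path) - 1] = '\0';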
}
/// Print the entire line at 1-based line number `line_number` in file `file`, including the
/// '\n' at the end of the line.
void file_print_line(const file_t* file, size_t line_number)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
// Ensure we don't read outside the `file->line_array`
if (line_number > file->num_lines)
{
printf("ERROR in function %s(): line_number (%zu) is too large (file->num_lines = %zu).\n",
__func__, line_number, file->num_lines);
return;
}
size_t i_line = line_number - 1;
char* line = file->line_array[i_line];
if (line == NULL)
{
printf("ERROR in function %s(): line_array contains NULL ptr for line_number = %zu at "
"index = %zu.\n", __func__, line_number, i_line);
return;
}
// print all chars in the line
size_t i_char = 0;
while (true)
{
if (i_char > file->num_chars - 1)
{
// outside valid data
break;
}
char c = line[i_char];
if (c == '\n')
{
printf("%c", c);
break;
}
else if (c == '\0')
{
// null terminator
break;
}
printf("%c", c);
i_char++;
}
}
/// Print `num_lines` number of lines in a file, starting at 1-based line number `first_line`,
/// and including the '\n' at the end of each line.
/// At the start of each line, the line number is also printed, followed by a colon (:).
void file_print_lines(const file_t* file, size_t first_line, size_t num_lines)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
if (num_lines == 0 || file->num_lines == 0)
{
printf("ERROR in function %s(): num_lines passed in == %zu; file->num_lines = %zu.\n",
__func__, num_lines, file->num_lines);
return;
}
// Ensure we don't read outside the valid data
size_t last_line = MIN(first_line + num_lines - 1, file->num_lines);
// printf("last_line = %zu\n", last_line); // DEBUGGING
for (size_t line_number = first_line; line_number <= last_line; line_number++)
{
printf("%4zu: ", line_number); // %zu is the portable format for size_t
file_print_line(file, line_number);
}
}
/// Print an entire file.
void file_print_all(const file_t* file)
{
printf("num_chars to print = %zu\n", file->num_chars);
printf("num_lines to print = %zu\n", file->num_lines);
printf("========== FILE START ==========\n");
file_print_lines(file, 1, file->num_lines);
printf("=========== FILE END ===========\n");
}
/// Read all characters from a file on your system at the path specified in the
// file object.
void file_load(file_t* file)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
FILE* fp = fopen(file->path, "r");
if (fp == NULL)
{
printf("ERROR in function %s(): Failed to open file (%s).\n",
__func__, strerror(errno));
return;
}
// See: https://en.cppreference.com/w/c/io/fgetc
int c; // note: int, not char, required to handle EOF
size_t i_write_char = 0;
size_t i_write_line = 0;
bool start_of_line = true;
const size_t I_WRITE_CHAR_MAX = ARRAY_LEN(file->file_str) - 1;
const size_t I_WRITE_LINE_MAX = ARRAY_LEN(file->line_array) - 1;
while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
{
// 1. Write the char
if (i_write_char > I_WRITE_CHAR_MAX)
{
printf("ERROR in function %s(): file is full (i_write_char = "
"%zu, but I_WRITE_CHAR_MAX is only %zu).\n",
__func__, i_write_char, I_WRITE_CHAR_MAX);
break;
}
file->file_str[i_write_char] = c;
// 2. Write the ptr to the line
if (start_of_line)
{
start_of_line = false;
if (i_write_line > I_WRITE_LINE_MAX)
{
printf("ERROR in function %s(): file is full (i_write_line = "
"%zu, but I_WRITE_LINE_MAX is only %zu).\n",
__func__, i_write_line, I_WRITE_LINE_MAX);
break;
}
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
i_write_line++;
}
// end of line
if (c == '\n')
{
// '\n' indicates the end of a line, so prepare to start a new line
// on the next iteration
start_of_line = true;
}
i_write_char++;
}
file->num_chars = i_write_char;
file->num_lines = i_write_line;
fclose(fp);
}
// Make this huge struct `static` so that the buffers it contains will be `static` so that they are
// neither on the stack **nor** the heap, thereby preventing stack overflow in the event you make
// them larger than the stack size, which is ~7.4 MB for Linux, and are generally even smaller for other systems. See my answer here:
//
static file_t file;
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
printf("The size of the `file_t` struct is %zu bytes (%.6f MB; %.6f MiB).\n"
"Max file size that can be read into this struct is %zu bytes or %zu lines, whichever "
"limit is hit first.\n\n",
sizeof(file_t), BYTES_TO_MB(sizeof(file_t)), BYTES_TO_MIB(sizeof(file_t)),
sizeof(file.file_str), ARRAY_LEN(file.line_array));
const char FILENAME[] = __FILE__;
file_store_path(&file, FILENAME);
printf("Loading file at path \"%s\".\n", file.path);
// open the file and copy its entire contents into the `file` object
file_load(&file);
printf("Printing the entire file:\n");
file_print_all(&file);
printf("\n");
printf("Printing just this 1 line number:\n");
file_print_line(&file, 256);
printf("\n");
printf("Printing 4 lines starting at this line number:\n");
file_print_lines(&file, 256, 4);
printf("\n");
// FOR TESTING: intentionally cause some errors by trying to print some lines for an unpopulated
// file object. Example errors:
// 243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
// 244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
// 245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
// 246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
// Note: for kicks (since I didn't realize this was possible), I'm also using the variable name
// `$` for this `file_t` object.
printf("Causing some intentional errors here:\n");
file_t $;
file_print_lines(&$, 243, 4);
return 0;
}
Build and run commands:
- In C:
mkdir -p bin && gcc -Wall -Wextra -Werror -O3 -std=c17 \
read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
- In C++:
mkdir -p bin && g++ -Wall -Wextra -Werror -O3 -std=c++17 \
read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
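Example run command and output (most of the lines in the middle are deleted, since most of the output is just a printout of the source code itself):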
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
The size of the `file_t` struct is 2081016 bytes (2.081016 MB; 1.984612 MiB).
Max file size that can be read into this struct is 2000000 bytes or 10000 lines, whichever limit is hit first.
Loading file at path "read_file_into_c_string_and_array_of_lines.c".
Printing the entire file:
num_chars to print = 15603
num_lines to print = 425
========== FILE START ==========
1: /*
2: This file is part of eRCaGuy_hello_world: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world
3:
4: GS
5: 2 Mar. 2022
6:
7: Read a file in C into a C-string (array of chars), while also placing pointers to the start of each
8: line into another array of `char *`. This way you have all the data plus the
9: individually-addressable lines. Use static memory allocation, not dynamic, for these reasons:
10: 1. It's a good demo.
11: 1. It's faster at run-time since lots of dynamic memory allocation can add substantial overhead.
12: 1. It's deterministic to use static memory allocation, making this implementation style good for
13: safety-critical, memory-constrained, real-time, deterministic, embedded devices and programs.
14: 1. It can access the first character of any line in the file in O(1) time via a simple index into
15: an array.
16: 1. It's extensible via dynamic memory allocation if needed.
17:
18: STATUS: works!
19:
20: To compile and run (assuming you've already `cd`ed into this dir):
21: 1. In C:
22: ```bash
23: gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
24: ```
25: 2. In C++
26: ```bash
27: g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
28: ```
29:
.
.
.
406: // 300:
407: // 301: eRCaGuy_hello_world/c$ g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
408: // 302:
409: // 303:
410: // 304: */
411: // =========== FILE END ===========
412: //
413: // Printing just one line now:
414: // 255:
415: //
416: // Causing some intentional errors here:
417: // 243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
418: // 244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
419: // 245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
420: // 246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
421: //
422: //
423: // OR, in C++:
424: //
425: // [SAME AS THE C OUTPUT]
=========== FILE END ===========
Printing just this 1 line number:
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
Printing 4 lines starting at this line number:
256: file->line_array[i_write_line] = &(file->file_str[i_write_char]);
257: i_write_line++;
258: }
259:
Causing some intentional errors here:
243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
References
(not a complete list by any means)
- read_file_into_c_string_and_array_of_lines.c, from my eRCaGuy_hello_world repo
- https://en.cppreference.com/w/c/io/fopen
- https://en.cppreference.com/w/c/string/byte/strerror - shows a good usage example of printf("File opening error: %s\n", strerror(errno)); if fopen() fails to open the file.
- https://en.cppreference.com/w/c/io/fgetc
我想做的是:
- 正确初始化一个二维字符数组(字符串数组),然后,
- 在另一个函数中,将文件的内容加载到该二维数组,在每个
'\n'
换行符处中断到二维数组中的下一行。
因为我们不能 return 从 C 函数中对数组进行双精度运算,我尝试使用 strncpy()
,逐行读取文件。我尝试了很多变体,但我一直在弄乱内存,所以我遇到了很多段错误和总线错误。
这是一个模型,展示了我想要做的事情是多么简单:
int main()
{
char *file_content[];
load_file_to_array(file_content);
}
void load_file_to_array(char *to_load[]) {
// something to update file_content
}
编辑 1:
我不会要求任何人为我编写代码。我对研究这个特定主题感到沮丧,所以我决定问问你们的想法。任何不同的方法或方法本身都会受到赞赏。
我有一个很好的预感,我的方法可能完全偏离了轨道。
我看了很多关于动态内存的帖子和文章。我发现最接近我想要完成的事情是:Changing an Array of Strings inside a function in C
编辑 2:
正如@pm100 在评论中解释的那样,我试过:
#include <stdio.h>
#include <stdlib.h>
#define MAX_SIZE 255
void load_array(char *, char *[]);
int main() {
// NULL terminate array
char *fn = "text.txt";
char *arr[MAX_SIZE] = { NULL };
// Also tried:
// char **arr = malloc(sizeof * char*MAX_SIZE)
// arr[0] = malloc(sizeof(char)*MAX_SIZE);
load_array(fn, arr);
}
void load_array(char *fn, char *fcontent[]) {
FILE * file = fopen(fn, "r");
char line[MAX_SIZE];
int i = 0;
// read file line by line with fgets()
while (fgets(line, sizeof(line), file))
{
// this part I'm not sure of. maybe setting with fgets makes more
// sense.
strcpy(fcontent[i], line);
i++;
// realloc()
fcontent = realloc(fcontent, sizeof (char*) * (i + 1));
fcontent[i] = malloc(sizeof(char) * MAX_SIZE);
// null terminate the last line
*fcontent[i] = 0;
}
fclose(file);
}
我在 运行 程序后遇到段错误。
除了说这不是一个有效的解决方案之外,这里是你的稍微修改的:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SIZE 1024
char** load_array(const char *fn) {
char** fcontent = (char**)malloc(0);
FILE * file = fopen(fn, "r");
char line[MAX_SIZE];
int i = 0;
// read file line by line with fgets()
while (fgets(line, sizeof(line), file))
{
fcontent = (char**)realloc(fcontent, (i + 1) * sizeof(char*));
fcontent[i] = (char*)malloc(strlen(line) + 1);
// this part I'm not sure of. maybe setting with fgets makes more sense.
strcpy(fcontent[i], line);
i++;
}
fcontent = (char**)realloc(fcontent, (i + 1) * sizeof(char*));
fcontent[i] = NULL;
fclose(file);
return fcontent;
}
int main()
{
char **lines = load_array("test.txt");
int i = 0;
while (lines[i])
printf("%s", lines[i++]);
i = 0;
while (lines[i])
free(lines[i++]);
free(lines);
return 0;
}
Here is a model demonstrating how simple what I'm trying to do is:
int main() { char *file_content[]; load_file_to_array(file_content); } void load_file_to_array(char *to_load[]){ // something to update file_content }
基本用法
好吧,这里有一种方法可以让你用 100% 静态内存分配 做你想做的事,出于很多原因(下面列出了一些原因)我更喜欢这种方法。
示例用法,展示使用起来有多么简单:
int main()
{
// Make this huge struct `static` so that the buffers it contains will be
// `static` so that they are neither on the stack **nor** the heap, thereby
// preventing stack overflow in the event you make them larger than the stack
// size, which is ~7.4 MB for Linux. See my answer here:
//
static file_t file;
const char FILENAME[] = "path/to/some/file.txt";
// OR:
// const char FILENAME[] = __FILE__; // read this source code file itself
file_store_path(&file, FILENAME);
printf("Loading file at path \"%s\".\n", file.path);
// open the file and copy its entire contents into the `file` object
file_load(&file);
printf("Printing the entire file:\n");
file_print_all(&file);
printf("\n");
printf("Printing just this 1 line number:\n");
file_print_line(&file, 256);
printf("\n");
printf("Printing 4 lines starting at this line number:\n");
file_print_lines(&file, 256, 4);
printf("\n");
return 0;
}
它是如何工作的?
我不使用任何二维数组或任何东西。相反,我创建了一个静态的一维字符数组 char file_str[MAX_NUM_CHARS];
,它包含整个文件中的所有字符,并且我创建了一个 char *
的静态一维数组(指向 char
)作为 char* line_array[MAX_NUM_LINES];
,其中包含指向整个文件中每一行的第一个字符的指针,其中这些指针指向的字符在 file_str
数组中 。 然后,我一次将文件一个字符读入 file_str
数组。每次我看到 \n
换行符时,我就知道下一个字符是新行的开始,所以我将指向该字符的指针添加到 line_array
数组中。
为了重新打印文件,我遍历了 line_array
数组,打印每一行的所有字符,一次一行。
可以选择完全摆脱 line_array
数组,只使用 file_str
字符数组。您仍然可以选择一行来打印并打印它。然而,这种方法的缺点是找到要打印的行的开头将花费 O(n) 时间,因为您必须从第一个 char
开始在文件中一直读到感兴趣的行,通过计算换行符的数量来计算行数 \n
您看到的字符。另一方面,我的方法需要 O(1) 时间并通过对 line_array
数组的简单索引直接索引到感兴趣行的前面。
上面的数组存储在一个file_t
结构体中,定义如下:
#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL) // 2 MB
#define MAX_PATH_LEN (1000)
typedef struct file_s
{
/// The path to the file to open.
char path[MAX_PATH_LEN];
/// All characters read from the file.
char file_str[MAX_NUM_CHARS]; // array of `char`
/// The total number of chars read into the `file_str` string, including
// null terminator.
size_t num_chars;
/// A ptr to each line in the file.
char* line_array[MAX_NUM_LINES]; // array of `char*` (ptr to char)
/// The total number of lines in the file, and hence in the `line_array`
// above.
size_t num_lines;
} file_t;
以下是我在程序开始时打印出的关于 file_t
的一些统计数据:
The size of the
file_t
struct is 2081016 bytes (2.081016 MB; 1.984612 MiB).
Max file size that can be read into this struct is 2000000 bytes or 10000 lines, whichever limit is hit first.
file
对象是静态的,所以如果你让你的 file_t
结构非常大以处理大量文件(如果你愿意,甚至可以是几千兆字节),你就没有堆栈溢出,因为 Linux 的线程堆栈大小限制为 ~7.4 MB。在这里查看我的回答:C/C++ maximum stack size of program.
我使用静态内存分配,而不是动态的,原因如下:
- run-time速度更快,因为大量动态内存分配会增加大量开销。
- 使用静态内存分配是确定性的,使得这种实现方式有利于 safety-critical、memory-constrained、real-time、确定性、嵌入式设备和程序。
- 它可以通过一个简单的索引在 O(1) 时间内访问文件中任何行的第一个字符 一个数组。
- 如果需要,它可以通过动态内存分配进行扩展(更多内容见下文)。
文件被打开并通过file_load()
函数加载到file
对象中。该函数看起来像这样,包括强大的错误处理:
/// Read all characters from a file on your system at the path specified in the
/// file object and copy this file data **into** the passed-in `file` object.
void file_load(file_t* file)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
FILE* fp = fopen(file->path, "r");
if (fp == NULL)
{
printf("ERROR in function %s(): Failed to open file (%s).\n",
__func__, strerror(errno));
return;
}
// See: https://en.cppreference.com/w/c/io/fgetc
int c; // note: int, not char, required to handle EOF
size_t i_write_char = 0;
size_t i_write_line = 0;
bool start_of_line = true;
const size_t I_WRITE_CHAR_MAX = ARRAY_LEN(file->file_str) - 1;
const size_t I_WRITE_LINE_MAX = ARRAY_LEN(file->line_array) - 1;
while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
{
// 1. Write the char
if (i_write_char > I_WRITE_CHAR_MAX)
{
printf("ERROR in function %s(): file is full (i_write_char = "
"%zu, but I_WRITE_CHAR_MAX is only %zu).\n",
__func__, i_write_char, I_WRITE_CHAR_MAX);
break;
}
file->file_str[i_write_char] = c;
// 2. Write the ptr to the line
if (start_of_line)
{
start_of_line = false;
if (i_write_line > I_WRITE_LINE_MAX)
{
printf("ERROR in function %s(): file is full (i_write_line = "
"%zu, but I_WRITE_LINE_MAX is only %zu).\n",
__func__, i_write_line, I_WRITE_LINE_MAX);
break;
}
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
i_write_line++;
}
// end of line
if (c == '\n')
{
// '\n' indicates the end of a line, so prepare to start a new line
// on the next iteration
start_of_line = true;
}
i_write_char++;
}
file->num_chars = i_write_char;
file->num_lines = i_write_line;
fclose(fp);
}
通过动态内存分配扩展它
在很多情况下,上面的静态内存分配就足够了。最好尽可能避免动态内存分配。但是,如果您打算使用动态内存分配,请 而不是 分配大量的小块!最好分配一两个大块。速度测试以证明这一点——总是验证人们所说的——包括我自己!不过,作为这方面的证据,也请阅读这篇文章并查看情节:https://github.com/facontidavide/CPP_Optimizations_Diary/blob/master/docs/reserve.md。 “NoReserve”有很多小的动态内存分配,现有数据的后续内存副本到这些新的内存位置,“WithReserve”有一个大的动态内存分配up-front 反而:
有时动态内存分配 是 谨慎的,但是。我们怎样才能做到最好?假设您需要打开 1000 个文件并同时打开它们,文件大小从几字节到几 GB 不等。在那种情况下,对所有 1000 个文件使用静态内存分配不仅不好,而且 几乎不可能 (至少,尴尬和 space-inefficient)。您应该做的是使 file_t
足够大以容纳几 GB 大小的最大文件,然后 静态分配它的一个实例用作缓冲区 ,然后执行此操作:打开每个文件(一次一个)并将其加载到您拥有的 单个 file
对象后,动态地 malloc()
确切 该文件所需的内存量,以及 strncpy()
或 memcpy()
从初始静态 file
对象到动态对象的所有数据文件需要的确切内存量,没有浪费。这样,静态 file
对象只是作为占位符或缓冲区,用于读取文件,同时计算字节数和行数,因此您可以动态分配足够的内存对于那些字节和行。
完整代码
这是完整的代码。这个程序打开源cde 文件本身并将其全部打印出来,在每行的开头打印行号只是为了好玩。
read_file_into_c_string_and_array_of_lines.c, from my eRCaGuy_hello_world 回购:
#include <errno.h>
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
#include <string.h> // for `strerror()`
// Get the number of elements in any C array
// - Usage example: [my own answer]:
// https://arduino.stackexchange.com/questions/80236/initializing-array-of-structs/80289#80289
#define ARRAY_LEN(array) (sizeof(array) / sizeof(array[0]))
/// Max and min gcc/clang **statement expressions** (safer than macros) for C. By Gabriel Staples.
/// See:
#define MAX(a, b) \
({ \
__typeof__(a) _a = (a); \
__typeof__(b) _b = (b); \
_a > _b ? _a : _b; \
})
#define MIN(a, b) \
({ \
__typeof__(a) _a = (a); \
__typeof__(b) _b = (b); \
_a < _b ? _a : _b; \
})
/// Bytes per megabyte
#define BYTES_PER_MB (1000*1000)
/// Bytes per mebibyte
#define BYTES_PER_MIB (1024*1024)
/// Convert bytes to megabytes
#define BYTES_TO_MB(bytes) (((double)(bytes))/BYTES_PER_MB)
/// Convert bytes to mebibytes
#define BYTES_TO_MIB(bytes) (((double)(bytes))/BYTES_PER_MIB)
#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL) // 2 MB
#define MAX_PATH_LEN (1000)
typedef struct file_s
{
/// The path to the file to open.
char path[MAX_PATH_LEN];
/// All characters read from the file.
char file_str[MAX_NUM_CHARS]; // array of `char`
/// The total number of chars read into the `file_str` string, including
// null terminator.
size_t num_chars;
/// A ptr to each line in the file.
char* line_array[MAX_NUM_LINES]; // array of `char*` (ptr to char)
/// The total number of lines in the file, and hence in the `line_array`
// above.
size_t num_lines;
} file_t;
/// Copy the file path pointed to by `path` into the `file_t` object.
void file_store_path(file_t* file, const char *path)
{
if (file == NULL || path == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
strncpy(file->path, path, sizeof(file->path));
}
/// Print the entire line at 1-based line number `line_number` in file `file`, including the
/// '\n' at the end of the line.
void file_print_line(const file_t* file, size_t line_number)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
// Ensure we don't read outside the `file->line_array`
if (line_number > file->num_lines)
{
printf("ERROR in function %s(): line_number (%zu) is too large (file->num_lines = %zu).\n",
__func__, line_number, file->num_lines);
return;
}
size_t i_line = line_number - 1;
char* line = file->line_array[i_line];
if (line == NULL)
{
printf("ERROR in function %s(): line_array contains NULL ptr for line_number = %zu at "
"index = %zu.\n", __func__, line_number, i_line);
return;
}
// print all chars in the line
size_t i_char = 0;
while (true)
{
if (i_char > file->num_chars - 1)
{
// outside valid data
break;
}
char c = line[i_char];
if (c == '\n')
{
printf("%c", c);
break;
}
else if (c == '[=14=]')
{
// null terminator
break;
}
printf("%c", c);
i_char++;
}
}
/// Print `num_lines` number of lines in a file, starting at 1-based line number `first_line`,
/// and including the '\n' at the end of each line.
/// At the start of each line, the line number is also printed, followed by a colon (:).
void file_print_lines(const file_t* file, size_t first_line, size_t num_lines)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
if (num_lines == 0 || file->num_lines == 0)
{
printf("ERROR in function %s(): num_lines passed in == %zu; file->num_lines = %zu.\n",
__func__, num_lines, file->num_lines);
return;
}
// Ensure we don't read outside the valid data
size_t last_line = MIN(first_line + num_lines - 1, file->num_lines);
// printf("last_line = %zu\n", last_line); // DEBUGGING
for (size_t line_number = first_line; line_number <= last_line; line_number++)
{
printf("%4lu: ", line_number);
file_print_line(file, line_number);
}
}
/// Print an entire file.
void file_print_all(const file_t* file)
{
printf("num_chars to print = %zu\n", file->num_chars);
printf("num_lines to print = %zu\n", file->num_lines);
printf("========== FILE START ==========\n");
file_print_lines(file, 1, file->num_lines);
printf("=========== FILE END ===========\n");
}
/// Read all characters from a file on your system at the path specified in the
/// file object.
void file_load(file_t* file)
{
    if (file == NULL)
    {
        printf("ERROR in function %s(): NULL ptr.\n", __func__);
        return;
    }
    FILE* fp = fopen(file->path, "r");
    if (fp == NULL)
    {
        printf("ERROR in function %s(): Failed to open file (%s).\n",
               __func__, strerror(errno));
        return;
    }
    // See: https://en.cppreference.com/w/c/io/fgetc
    int c; // note: int, not char, required to handle EOF
    size_t i_write_char = 0;
    size_t i_write_line = 0;
    bool start_of_line = true;
    const size_t I_WRITE_CHAR_MAX = ARRAY_LEN(file->file_str) - 1;
    const size_t I_WRITE_LINE_MAX = ARRAY_LEN(file->line_array) - 1;
    while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
    {
        // 1. Write the char
        if (i_write_char > I_WRITE_CHAR_MAX)
        {
            printf("ERROR in function %s(): file is full (i_write_char = "
                   "%zu, but I_WRITE_CHAR_MAX is only %zu).\n",
                   __func__, i_write_char, I_WRITE_CHAR_MAX);
            break;
        }
        file->file_str[i_write_char] = c;
        // 2. Write the ptr to the line
        if (start_of_line)
        {
            start_of_line = false;
            if (i_write_line > I_WRITE_LINE_MAX)
            {
                printf("ERROR in function %s(): file is full (i_write_line = "
                       "%zu, but I_WRITE_LINE_MAX is only %zu).\n",
                       __func__, i_write_line, I_WRITE_LINE_MAX);
                break;
            }
            file->line_array[i_write_line] = &(file->file_str[i_write_char]);
            i_write_line++;
        }
        // end of line
        if (c == '\n')
        {
            // '\n' indicates the end of a line, so prepare to start a new line
            // on the next iteration
            start_of_line = true;
        }
        i_write_char++;
    }
    file->num_chars = i_write_char;
    file->num_lines = i_write_line;
    fclose(fp);
}
// Make this huge struct `static` so that the buffers it contains will be `static`, meaning they are
// neither on the stack **nor** the heap. This prevents stack overflow in the event you make them
// larger than the stack size, which is ~7.4 MB for Linux and is generally even smaller on other
// systems. See my answer here:
//
static file_t file;
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
    printf("The size of the `file_t` struct is %zu bytes (%.6f MB; %.6f MiB).\n"
           "Max file size that can be read into this struct is %zu bytes or %zu lines, whichever "
           "limit is hit first.\n\n",
           sizeof(file_t), BYTES_TO_MB(sizeof(file_t)), BYTES_TO_MIB(sizeof(file_t)),
           sizeof(file.file_str), ARRAY_LEN(file.line_array));
    const char FILENAME[] = __FILE__;
    file_store_path(&file, FILENAME);
    printf("Loading file at path \"%s\".\n", file.path);
    // open the file and copy its entire contents into the `file` object
    file_load(&file);
    printf("Printing the entire file:\n");
    file_print_all(&file);
    printf("\n");
    printf("Printing just this 1 line number:\n");
    file_print_line(&file, 256);
    printf("\n");
    printf("Printing 4 lines starting at this line number:\n");
    file_print_lines(&file, 256, 4);
    printf("\n");
    // FOR TESTING: intentionally cause some errors by trying to print some lines for an unpopulated
    // file object. Example errors:
    // 243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
    // 244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
    // 245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
    // 246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
    // Note: for kicks (since I didn't realize this was possible), I'm also using the variable name
    // `$` for this `file_t` object. `$` in identifiers is a compiler extension (GCC and Clang
    // accept it), not standard C. This local is also left unpopulated (uninitialized), so its
    // contents are indeterminate and the exact errors printed can vary.
    printf("Causing some intentional errors here:\n");
    file_t $;
    file_print_lines(&$, 243, 4);
    return 0;
}
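Separately from the full program above, the core idea in file_load() (keep every character in one buffer, and store a pointer to the first character of each line in a second array) can be reduced to a few lines. The following is only a minimal sketch of that idea, not the code from the repo: the name index_lines and its parameters are hypothetical, and it works on a buffer that has already been filled rather than reading from a FILE*.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the original program): store a pointer to
 * the first character of every line found in buf[0..len-1] into lines[].
 * Returns the number of pointers stored, which is never more than max_lines. */
size_t index_lines(char *buf, size_t len, char *lines[], size_t max_lines)
{
    size_t num_lines = 0;
    int start_of_line = 1; /* the first char of the buffer begins line 1 */

    for (size_t i = 0; i < len; i++)
    {
        if (start_of_line)
        {
            if (num_lines == max_lines)
            {
                break; /* the line table is full */
            }
            lines[num_lines++] = &buf[i];
            start_of_line = 0;
        }
        if (buf[i] == '\n')
        {
            start_of_line = 1; /* the next char, if any, begins a new line */
        }
    }
    return num_lines;
}
```

Because the lines are not copied, lines[n] gives O(1) access to the start of line n + 1, but each line ends at its '\n' (or at the end of the valid data), not at a null terminator.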
Build and run commands:
- In C:
mkdir -p bin && gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
- In C++:
mkdir -p bin && g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
Example run command and output (most of the lines in the middle have been removed, since most of the output is just a printout of the source code itself):
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
The size of the `file_t` struct is 2081016 bytes (2.081016 MB; 1.984612 MiB).
Max file size that can be read into this struct is 2000000 bytes or 10000 lines, whichever limit is hit first.
Loading file at path "read_file_into_c_string_and_array_of_lines.c".
Printing the entire file:
num_chars to print = 15603
num_lines to print = 425
========== FILE START ==========
1: /*
2: This file is part of eRCaGuy_hello_world: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world
3:
4: GS
5: 2 Mar. 2022
6:
7: Read a file in C into a C-string (array of chars), while also placing pointers to the start of each
8: line into another array of `char *`. This way you have all the data plus the
9: individually-addressable lines. Use static memory allocation, not dynamic, for these reasons:
10: 1. It's a good demo.
11: 1. It's faster at run-time since lots of dynamic memory allocation can add substantial overhead.
12: 1. It's deterministic to use static memory allocation, making this implementation style good for
13: safety-critical, memory-constrained, real-time, deterministic, embedded devices and programs.
14: 1. It can access the first character of any line in the file in O(1) time via a simple index into
15: an array.
16: 1. It's extensible via dynamic memory allocation if needed.
17:
18: STATUS: works!
19:
20: To compile and run (assuming you've already `cd`ed into this dir):
21: 1. In C:
22: ```bash
23: gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
24: ```
25: 2. In C++
26: ```bash
27: g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
28: ```
29:
.
.
.
406: // 300:
407: // 301: eRCaGuy_hello_world/c$ g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
408: // 302:
409: // 303:
410: // 304: */
411: // =========== FILE END ===========
412: //
413: // Printing just one line now:
414: // 255:
415: //
416: // Causing some intentional errors here:
417: // 243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
418: // 244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
419: // 245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
420: // 246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
421: //
422: //
423: // OR, in C++:
424: //
425: // [SAME AS THE C OUTPUT]
=========== FILE END ===========
Printing just this 1 line number:
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
Printing 4 lines starting at this line number:
256: file->line_array[i_write_line] = &(file->file_str[i_write_char]);
257: i_write_line++;
258: }
259:
Causing some intentional errors here:
243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
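The single-line printouts above work even though the lines are not individually null-terminated: every line pointer points into one shared buffer, so printing has to stop at the next '\n' or at the end of the valid data. As an alternative to the character-by-character loop in file_print_line(), here is a hedged sketch of computing that bound with memchr(). The helper name line_length and the buf_end parameter are assumptions made for illustration, and unlike the original loop it does not stop early at a '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the line starting at `line`, counting the
 * trailing '\n' if there is one, and never reading at or past `buf_end`
 * (a pointer to one past the last valid char in the shared buffer). */
size_t line_length(const char *line, const char *buf_end)
{
    size_t remaining = (size_t)(buf_end - line);
    const char *newline = memchr(line, '\n', remaining);
    if (newline != NULL)
    {
        return (size_t)(newline - line) + 1; /* include the '\n' */
    }
    return remaining; /* last line, with no trailing newline */
}
```

With that length, a line can be written in one call, for example fwrite(line, 1, line_length(line, buf_end), stdout).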
References
(not a complete list by any means)
- read_file_into_c_string_and_array_of_lines.c, from my eRCaGuy_hello_world repo
- https://en.cppreference.com/w/c/io/fopen
- https://en.cppreference.com/w/c/string/byte/strerror - shows a good usage example, printf("File opening error: %s\n", strerror(errno));, for when fopen() fails to open the file.
- https://en.cppreference.com/w/c/io/fgetc
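The strerror(errno) pattern from the reference above can be wrapped so that every caller reports open failures the same way. This is a small sketch under stated assumptions: the function name open_for_reading is hypothetical, and it relies on fopen() setting errno on failure, which POSIX specifies but ISO C does not strictly require.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: open `path` for reading. On failure, print the reason
 * to stderr and return NULL so the caller can bail out early. */
FILE *open_for_reading(const char *path)
{
    errno = 0; /* clear any stale error code before the call */
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
    {
        fprintf(stderr, "File opening error (%s): %s\n", path, strerror(errno));
    }
    return fp;
}
```

The caller still has to check the result for NULL before reading, and fclose() the stream when done.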