Is there a variant of sscanf with pointers into the input string instead of buffers?

sscanf works like this:

#include <stdio.h>

int main(void) {
    char buf1[1024] = {0};
    char buf2[1024] = {0};
    char buf3[1024] = {0};
    const char *str = "abc, 123; xyz";
    // Field widths (1023) keep each conversion inside its 1024-byte buffer.
    if (sscanf(str, "%1023[^,], %1023[^;]; %1023s", buf1, buf2, buf3) != 3)
        return 1;
    printf("'%s' '%s' '%s'", buf1, buf2, buf3); // Prints: "'abc' '123' 'xyz'"
    return 0;
}

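I'd like to know whether there is a function that does not need to copy the contents of str into buffers (buf1, buf2, buf3) and does not allocate any new memory. Instead, it would just set pointers (ptr1, ptr2, ptr3) to point at the matching parts of str and null-terminate whatever follows each match.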

int main(const int argc, const char *argv[]) {
    char *ptr1 = NULL;
    char *ptr2 = NULL;
    char *ptr3 = NULL;
    char str[] = "abc, 123; xyz"; // a writable array: a string literal must not be modified
    //
    // str = "abc, 123; xyz\0"
    //
    _sscanf(str, "%[^,], %[^;]; %s", &ptr1, &ptr2, &ptr3);
    //
    // str = "abc\0 123\0 xyz\0"
    //        ^     ^     ^
    //       ptr1  ptr2  ptr3
    //
    printf("'%s' '%s' '%s'", ptr1, ptr2, ptr3); // Prints: "'abc' '123' 'xyz'"

    return 0;
}

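I know it's possible to use functions such as strtok_r, or the regex.h library, but I think this would be more convenient in cases where the input string may be modified.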

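It isn't pretty, but the %n specifier can be used to capture the indices where each token starts and ends. Error checking should make sure that the index and end values are not -1.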

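#include <stdio.h>

int main(void) {
    int index1 = -1;
    int end1 = -1;
    int index2 = -1;
    int end2 = -1;
    int index3 = -1;
    int end3 = -1;
    const char *str = "abc, 123; xyz";
    // %*[...] and %*s match without storing; each %n records the current offset.
    sscanf(str, " %n%*[^,]%n, %n%*[^;]%n; %n%*s%n", &index1, &end1, &index2, &end2, &index3, &end3);
    // The %n values are stored in order, so if end3 was never set the match failed.
    if (end3 == -1) {
        fprintf(stderr, "input did not match\n");
        return 1;
    }
    printf("'%.*s' '%.*s' '%.*s'", end1 - index1, str + index1, end2 - index2, str + index2, end3 - index3, str + index3); // Prints: "'abc' '123' 'xyz'"
    return 0;
}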

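There is no standardized variant that leaves you with char * pointers into the original string. POSIX has a variant that allocates memory for each string item and copies the data into it.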

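The functionality of sscanf() matches that of fscanf() and the other variants; to a very large extent, what applies to one applies to all. What you're looking for, however, could not be applied to the file-based variants, so it doesn't exist.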


There is a variant of sscanf() that allocates memory for strings. It is the POSIX 2008 version of sscanf() with the m modifier.

[CX] ⌦ The %c, %s, and %[ conversion specifiers shall accept an optional assignment-allocation character 'm', which shall cause a memory buffer to be allocated to hold the string converted including a terminating null character. In such a case, the argument corresponding to the conversion specifier should be a reference to a pointer variable that will receive a pointer to the allocated buffer. The system shall allocate a buffer as if malloc() had been called. The application shall be responsible for freeing the memory after usage. If there is insufficient memory to allocate a buffer, the function shall set errno to [ENOMEM] and a conversion error shall result. If the function returns EOF, any memory successfully allocated for parameters using assignment-allocation character 'm' by this call shall be freed before the function returns. ⌫

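The [CX] notation marks this as an extension to the C standard (so the m modifier is not part of standard C and is not supported everywhere), and the ⌦ and ⌫ symbols mark the extent of the extension.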

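Consequently, if your implementation supports it (Linux does, for example; macOS Sierra does not), this variant of sscanf() allocates correctly sized buffers for you, and it takes char ** arguments.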

The manual page on Linux says:

An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char data[] = "The hills are alive with the sound of music";
    char *w[9];

    if (sscanf(data, "%ms %ms %ms %ms %ms %ms %ms %ms %ms",
               &w[0], &w[1], &w[2], &w[3], &w[4], &w[5], &w[6], &w[7], &w[8]) != 9)
    {
        fprintf(stderr, "Oops!\n");
        return 1;
    }
    printf("Forwards: %s\n", data);
    printf("Reversed:");
    for (int i = 8; i >= 0; i--)
        printf(" %s", w[i]);
    putchar('\n');
    for (int i = 0; i < 9; i++)
        free(w[i]);
    return 0;
}

Output:

Forwards: The hills are alive with the sound of music
Reversed: music of sound the with alive are hills The