Fortran: check data format while reading data with 'read'

Question

I have a specific data format, say 'n' (arbitrary) rows and '4' columns. If 'n' is '10', the example data would look like this.

 1.01e+00 -2.01e-02 -3.01e-01    4.01e+02
 1.02e+00 -2.02e-02 -3.02e-01    4.02e+02
 1.03e+00 -2.03e-02 -3.03e-01    4.03e+02
 1.04e+00 -2.04e-02 -3.04e-01    4.04e+02
 1.05e+00 -2.05e-02 -3.05e-01    4.05e+02
 1.06e+00 -2.06e-02 -3.06e-01    4.06e+02
 1.07e+00 -2.07e-02 -3.07e-01    4.07e+02
 1.08e+00 -2.08e-02 -3.08e-01    4.07e+02
 1.09e+00 -2.09e-02 -3.09e-01    4.09e+02
 1.10e+00 -2.10e-02 -3.10e-01    4.10e+02

The constraints on this input are

The data should have '4' columns.
The data is separated by white spaces.

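I want to implement a feature that checks whether every line of the input file has '4' columns, and I built my own based on the answer by 'M.S.B' in the post "Reading data file in Fortran with known number of lines but unknown number of entries in each line".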

program readtest

  use :: iso_fortran_env

  implicit none

  character(len=512)     :: buffer

  integer                :: i, i_line, n, io, pos, pos_tmp, n_space
  integer,parameter      :: max_len = 512
  character(len=max_len) :: filename

  filename = 'data_wrong.dat'

  open(42, file=trim(filename), status='old', action='read')

  print *, '+++++++++++++++++++++++++++++++++++'
  print *, '+ Count lines                     +'
  print *, '+++++++++++++++++++++++++++++++++++'
  n       = 0
  i_line  = 0
  do
    pos     = 1
    pos_tmp = 1

    i_line = i_line+1
    read(42, '(a)', iostat=io) buffer

    ! (*1) Count blank spaces.
    n_space = 0
    do
      pos = index(buffer(pos+1:), " ") + pos
      if (pos /= 0) then
        if (pos > pos_tmp+1) then
          n_space = n_space+1
          pos_tmp = pos
        else
          pos_tmp = pos
        end if
      endif
      if (pos == max_len) then
        exit
      end if
    end do
    pos_tmp = pos

    if (io /= 0) then
      exit
    end if

    print *, '> line : ', i_line, ' n_space : ', n_space

    n = n+1
  end do

  print *, ' >> number of line = ', n

end program

If I run the above program on an input file that contains erroneous lines like the following,

1.01e+00 -2.01e-02 -3.01e-01  4.01e+02
1.02e+00 -2.02e-02 -3.02e-01  4.02e+02
1.03e+00 -2.03e-02 -3.03e-01  4.03e+02
1.04e+00 -2.04e-02 -3.04e-01  4.04e+02
1.05e+00 -2.05e-02 -3.05e-01  4.05e+02
1.06e+00 -2.06e-02 -3.06e-01  4.06e+02
1.07e+00 -2.07e-02 -3.07e-01  4.07e+02
1.0      2.0       3.0
1.08e+00 -2.08e-02 -3.08e-01  4.07e+02  1.00
1.09e+00 -2.09e-02 -3.09e-01  4.09e+02
1.10e+00 -2.10e-02 -3.10e-01  4.10e+02

the output looks like this,

 +++++++++++++++++++++++++++++++++++
 + Count lines                     +
 +++++++++++++++++++++++++++++++++++
 > line :            1  n_space :            4
 > line :            2  n_space :            4
 > line :            3  n_space :            4
 > line :            4  n_space :            4
 > line :            5  n_space :            4
 > line :            6  n_space :            4
 > line :            7  n_space :            4
 > line :            8  n_space :            3   (*2)
 > line :            9  n_space :            5   (*3)
 > line :           10  n_space :            4
 > line :           11  n_space :            4
  >> number of line =           11

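You can see that the wrong lines are detected properly, as I expected (see (*2) and (*3)), and I can write 'if' statements to produce some error messages.

But I think my code is 'extremely' ugly, because I have to do something like (*1) in the code to count consecutive white spaces as a single one. I think there should be a more elegant way to ensure that each line contains only '4' columns, say,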

read(*,'4(X, A)') line

(which did not work)

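My program also fails if a line is longer than 'max_len' (set to '512' in this case). Admittedly, '512' should be enough for most practical purposes, but I would like my checking subroutine to be robust in this respect as well.

So, I want to improve my subroutine in at least these aspects

Make it more elegant (not like (*1))
Make it more general (especially with respect to 'max_len')

Does anyone have experience building this kind of input-checking subroutine?

Any comments would be highly appreciated.

Thank you for reading the question.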

Answer 1

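Without knowing the exact data format, I think it would be rather difficult to achieve what you want (or at least, I would not know how to do it).

In the most general case, I think your space-counting idea is the most robust and correct one. It can be adapted to avoid the maximum string length problem you describe.

In the following code I go through the data as an unformatted, stream-access file. Basically you read every character and take note of new_lines and spaces. As you did, you use spaces to count the columns (skipping double spaces) and new_line characters to count the rows. However, here we do not read the entire line as a string and then walk through it to find spaces; we read one character at a time, which avoids the fixed string length problem, and we also end up with a single loop. Hope it helps.

EDIT: now handles white spaces at the end of a line and empty lines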

program readtest

  use :: iso_fortran_env

  implicit none

  character              :: old_char, new_char
  integer                :: line, io, cols
  logical                :: beg_line
  integer,parameter      :: max_len = 512
  character(len=max_len) :: filename

  filename = 'data_wrong.txt'

  ! Output format to be used later
  100 format (a, 3x, i0, a, 3x , i0)

  open(42, file=trim(filename), status='old', action='read', &
      form="unformatted", access="stream")

  ! set utils
  old_char = " "
  line = 0
  beg_line = .true.
  cols = 0

  ! Start scanning char by char
  do
     read(42, iostat = io) new_char

     ! Exit on EOF (io < 0) or on a read error (io > 0)
     if (io /= 0) then
         exit
     end if

     ! Deal with empty lines
     if  (beg_line .and. new_char==new_line(new_char)) then
         line = line + 1
         write(*, 100, advance="no") "Line number:", line, &
             "; Columns number:", cols
         write(*,'(6x, a)') "EMPTYLINE"

     ! Deal with beginning of line for white spaces
     elseif  (beg_line) then
         beg_line = .false.

     ! this indicates new columns
     elseif (new_char==" " .and. old_char/=" ") then
         cols = cols + 1

     ! End of line: time to print
     elseif (new_char==new_line(new_char)) then
         if (old_char/=" ") then
             cols = cols+1
         endif
         line = line + 1

         ! Printing out results
         write(*, 100, advance="no") "Line number:", line, &
             "; Columns number:", cols
         if (cols == 4) then
             write(*,'(6x, a5)') "OK"
         else
             write(*,'(6x, a5)') "ERROR"
         end if

         ! Restart with a new line (reset counters)
         cols = 0
         beg_line = .true.
     end if
     old_char = new_char
  end do
end program

Here is the output of this program:

Line number:   1; Columns number:   4         OK
Line number:   2; Columns number:   4         OK
Line number:   3; Columns number:   4         OK
Line number:   4; Columns number:   4         OK
Line number:   5; Columns number:   4         OK
Line number:   6; Columns number:   4         OK
Line number:   7; Columns number:   4         OK
Line number:   8; Columns number:   3      ERROR
Line number:   9; Columns number:   5      ERROR
Line number:   10; Columns number:   4         OK
Line number:   11; Columns number:   4         OK

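If you knew your data format, you could read each line into a vector of dimension 4 and use the iostat variable to print an error for every line on which iostat is an integer greater than 0.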
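A minimal sketch of that idea, assuming each line has already been read into a character variable. The names `column_check` and `has_four_columns` are made up for illustration. A list-directed read straight from the file would silently continue onto the next record when a line is short, so this sketch reads from the string instead. On such an internal read a short line shows up as an end-of-file status (negative iostat) rather than a positive one, which is why the check uses `is_iostat_end`.

```fortran
module column_check
  implicit none
  private
  public :: has_four_columns
contains
  ! Hypothetical helper: .true. when `line` holds exactly four values
  ! that list-directed input accepts as reals.
  logical function has_four_columns(line) result(ok)
    character(len=*), intent(in) :: line
    real    :: v(5)
    integer :: io4, io5

    ! Four values must be readable without any error ...
    read(line, *, iostat=io4) v(1:4)
    ! ... and asking for a fifth must run off the end of the string.
    read(line, *, iostat=io5) v(1:5)
    ok = (io4 == 0) .and. is_iostat_end(io5)
  end function has_four_columns
end module column_check

program demo_column_check
  use column_check, only: has_four_columns
  implicit none
  print *, has_four_columns(' 1.01e+00 -2.01e-02 -3.01e-01    4.01e+02')
  print *, has_four_columns('1.0      2.0       3.0')
  print *, has_four_columns('1.08e+00 -2.08e-02 -3.08e-01  4.07e+02  1.00')
end program demo_column_check
```

With a standard-conforming compiler this should print T, F, F for the three sample lines. Lines that contain commas or a slash remain dusty corners, because list-directed input gives those characters a special meaning.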

Answer 2

Rather than counting spaces, you can use substring manipulation to get what you want. A simple example follows:

program foo

  implicit none

  character(len=512) str    ! Assume str is sufficiently long buffer
  integer fd, cnt, m, n

  open(newunit=fd, file='test.dat', status='old')

  do
     cnt = 0
     read(fd,'(A)',end=10) str
     str = adjustl(str)      ! Eliminate possible leading whitespace
     do 
        n = index(str, ' ')  ! Find first space
        if (n /= 0) then
           write(*, '(A)', advance='no') str(1:n)
           str = adjustl(str(n+1:))
        end if
        if (len_trim(str) == 0) exit    ! Trailing whitespace
        cnt = cnt + 1
     end do
     ! cnt counts the gaps between tokens, so 4 columns give cnt == 3
     if (cnt /= 3) then
        write(*,'(A)') '   Error'
     else
        write(*,*)
     end if
  end do

10 close(fd) 

end program foo
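The same substring idea can be packaged as a function that only counts tokens, without modifying or printing the buffer. This is a sketch and not part of the answer above: `token_count` and `count_columns` are hypothetical names, and it assumes columns are separated by spaces or tabs. It relies on the intrinsics `verify` (first character not in a set) and `scan` (first character in a set).

```fortran
module token_count
  implicit none
  private
  public :: count_columns
contains
  ! Counts maximal runs of non-blank characters in `line`.
  pure integer function count_columns(line) result(n)
    character(len=*), intent(in) :: line
    character(len=*), parameter  :: blanks = ' ' // achar(9)   ! space and tab
    integer :: pos, first, width

    n   = 0
    pos = 1
    do while (pos <= len(line))
      ! Offset of the next non-blank character, 0 if none is left.
      first = verify(line(pos:), blanks)
      if (first == 0) exit
      n   = n + 1
      pos = pos + first - 1            ! pos now sits on the token start
      ! Offset of the first blank after the token, 0 if the token ends the line.
      width = scan(line(pos:), blanks)
      if (width == 0) exit
      pos = pos + width                ! continue just past that blank
    end do
  end function count_columns
end module token_count

program demo_token_count
  use token_count, only: count_columns
  implicit none
  print *, count_columns(' 1.01e+00 -2.01e-02 -3.01e-01    4.01e+02')
  print *, count_columns('1.0      2.0       3.0')
  print *, count_columns('')
end program demo_token_count
```

Because the function takes an assumed-length argument, it works with a fixed buffer as well as with a deferred-length string, so the caller decides how the line is read. It counts tokens only and does not verify that each token is a number.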

Answer 3

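This should read a line of any reasonable length (up to the line limit your compiler defaults to, which is usually 2 GB nowadays). You could change it to stream I/O to have no limit, but most Fortran compilers cannot read stream I/O from standard input, which is what this example reads from. So if the line looks like a list of numbers, it should read them, tell you how many it read, and let you know whether there was an error reading any value as a number (a character string, a string too large for a REAL value, ...). All the parts here are explained on the Fortran Wiki, but to keep it short this is a stripped-down version that puts the pieces together. The oddest behavior is that if you enter something like this, with a slash in it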

10 20,,30,40e4    50 / this is a list of numbers

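it will treat everything after the slash as a comment and will not generate a non-zero status return, while returning five values. For a more detailed explanation of the code, I think the annotated sections on the Wiki explain how it works. In the search there, look for "getvals" and "readline".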
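The slash behavior comes from list-directed input itself and can be seen in isolation. The following is only an illustrative sketch: `slash_demo` and `assigned_count` are hypothetical names, and the sentinel trick assumes the sentinel value never appears in the data.

```fortran
module slash_demo
  implicit none
  private
  public :: assigned_count
contains
  ! Returns how many of four integers a list-directed read actually
  ! assigned, or -1 when the read reports a non-zero status.
  integer function assigned_count(line) result(n)
    character(len=*), intent(in) :: line
    integer, parameter :: sentinel = -huge(0)   ! marks "never assigned"
    integer :: v(4), ios

    v = sentinel
    read(line, *, iostat=ios) v   ! a slash ends the read; later items stay untouched
    if (ios /= 0) then
      n = -1
    else
      n = count(v /= sentinel)
    end if
  end function assigned_count
end module slash_demo

program demo_slash
  use slash_demo, only: assigned_count
  implicit none
  print *, assigned_count('10 20 30 40')
  print *, assigned_count('10 20 / 30 40')
end program demo_slash
```

The first call should report 4 values and the second 2, both with a zero status, which is why a status check alone cannot catch a stray slash.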

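So with this program you can read a line, and if the return status is zero and the number of values read is four you should be good, apart from a few dusty corners where the lines clearly do not look like a list of numbers.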

module M_getvals
implicit none
private
public getvals, readline
contains
subroutine getvals(line,values,icount,ierr)
character(len=*),intent(in)     :: line
real                            :: values(:)
integer,intent(out)             :: icount, ierr
character(len=:),allocatable    :: buffer
character(len=len(line))        :: words(size(values))
integer                         :: ios, i
   ierr=0
   words=' '                            
   buffer=trim(line)//"/"               
   read(buffer,*,iostat=ios) words      
   icount=0
   do i=1,size(values)                 
      if(words(i).eq.'') cycle
      read(words(i),*,iostat=ios)values(icount+1)
      if(ios.eq.0)then
         icount=icount+1
      else
         ierr=ios
         write(*,*)'*getvals* WARNING:['//trim(words(i))//'] is not a number'
      endif
   enddo
end subroutine getvals
subroutine readline(line,ier)
character(len=:),allocatable,intent(out) :: line
integer,intent(out)                      :: ier
integer,parameter                        :: buflen=1024
character(len=buflen)                    :: buffer
integer                                  :: last, isize
   line=''
   ier=0
   INFINITE: do
      read(*,iostat=ier,fmt='(a)',advance='no',size=isize) buffer
      if(isize.gt.0)line=line//buffer(:isize)
      if(is_iostat_eor(ier))then
         last=len(line)
         if(last.ne.0)then
            if(line(last:last).eq.'\')then
               line=line(:last-1)
               cycle INFINITE
            endif
         endif
         ier=0
         exit INFINITE
     elseif(ier.ne.0)then
        exit INFINITE
     endif
   enddo INFINITE
   line=trim(line)
end subroutine readline
end module M_getvals
program tryit
use M_getvals, only: getvals, readline
implicit none
character(len=:),allocatable :: line
real,allocatable             :: values(:)
integer                      :: icount, ier, ierr
   INFINITE: do
      call readline(line,ier)
      if(allocated(values))deallocate(values)
      allocate(values(len(line)/2+1))
      if(ier.ne.0)exit INFINITE
      call getvals(line,values,icount,ierr)
      write(*,'(*(g0,1x))')'VALUES=',values(:icount),'NUMBER OF VALUES=',icount,'STATUS=',ierr
   enddo INFINITE
end program tryit

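Honestly, it should work reasonably with just about any line you throw at it.

PS: If you are always reading four values, using list-directed I/O, checking the iostat= value on the READ, and checking whether you hit EOR would be very simple (just a few lines). But since you said you want to read lines of arbitrary length, I am assuming that four values per line was just an example and that you want something very generic.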

file-io

fortran