Download multiple files from FTP server (using sockets)
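I want to download multiple files from an FTP server using sockets directly. I'm using Swift 5 with the BlueSocket library, which is basically a thin wrapper, so the commands are the same as if I did everything through, e.g., the Windows console.
FTP commands: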
Login + connect cmdSocket
cmdSocket send: PASV
cmdSocket receive: 227 Entering Passive Mode
cmdSocket send: TYPE I
cmdSocket receive: 200 Type set to I
Connect dataSocket to Passive Mode IP/port
cmdSocket send: CWD myFolder
cmdSocket receive: 250 CWD command successful
Loop over all files:
cmdSocket send: RETR myFileX
cmdSocket receive: Either "150 Downloading in BINARY file" or "125 Data connection already open; Transfer starting"
dataSocket: Receive data and save it to storage
cmdSocket receive: 226 Transfer complete
This works fine for the first file ("myFile1"), but everything changes in the second loop iteration ("myFile2"):
cmdSocket send: RETR myFile2
cmdSocket receive: 150 Opening BINARY mode data connection.
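Now the dataSocket doesn't return any bytes, and sometimes the cmdSocket also receives "425 Can't open data connection" in addition to the "150". I tried using "myFile1" twice, with the same result.
I guess the order of the commands is off, but what exactly is wrong? Do I have to set the type inside the loop for every file? Do I have to open a new data socket for each file, or send some kind of "reset" command after receiving the "226" for the first file?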
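FTP works like this:
You set up a TCP connection to the FTP server.
You send commands over that socket, and the server replies over the same socket.
You can use the PORT command: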
Client: PORT 192,168,1,2,7,139 //The client wants the server to send to port number 1931 on the client machine. 7 and 139 are the high and low bytes of the port: 0x07 0x8B = 0x078B = 7*256 + 139 = 1931
Server: 200 PORT command successful.
Client: RETR Yoyodyne.TXT //Download "Yoyodyne.TXT."
Server: 150 Opening ASCII mode data connection for Yoyodyne.TXT. //The server now connects out from its port 20 on 172.16.62.36 to port 1931 on 192.168.1.2.
Server: 226 Transfer completed. //That succeeded: the data has been sent over the established data connection, and the connection is closed.
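The port arithmetic in the PORT line can be sketched in Swift as follows. This is only an illustration under my own naming: `portArgument(host:port:)` and `port(fromHigh:low:)` are hypothetical helpers, not part of BlueSocket or any FTP library.

```swift
import Foundation

/// Builds the argument of a PORT command ("h1,h2,h3,h4,p1,p2") from a dotted
/// IPv4 address and a TCP port. Returns nil if the address is not four octets.
func portArgument(host: String, port: UInt16) -> String? {
    let parts = host.split(separator: ".", omittingEmptySubsequences: false)
    let octets = parts.compactMap { UInt8($0) }
    guard parts.count == 4, octets.count == 4 else { return nil }
    let high = port / 256   // p1: upper byte of the port number
    let low = port % 256    // p2: lower byte of the port number
    let fields = octets.map { String($0) } + [String(high), String(low)]
    return fields.joined(separator: ",")
}

/// Recovers the port number from the last two fields of a PORT argument.
func port(fromHigh p1: UInt8, low p2: UInt8) -> UInt16 {
    return UInt16(p1) * 256 + UInt16(p2)
}
```

For the transcript above, `portArgument(host: "192.168.1.2", port: 1931)` yields the string sent after `PORT`.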
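So a new RETR first needs a new PORT command. Or, if it doesn't, the previous PORT command may be persistent, but you still have to reconnect.
Each file gets its own socket over which its data is sent/received, and that socket is closed after each file has been sent. So you need a new socket, a new connection, every time.
More details here: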
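Applied to the passive-mode loop from the question, that means moving PASV and the data-socket connect inside the loop, while TYPE I and CWD stay outside because they are session state. Below is a minimal sketch of that ordering. `FTPTransport`, its four methods, `FTPError`, and `downloadAll` are hypothetical names I made up to keep the example free of real network code; they are not BlueSocket API, and you would back them with your cmdSocket and dataSocket.

```swift
import Foundation

/// Hypothetical abstraction over the control and data sockets.
protocol FTPTransport {
    /// Sends one command on the control connection, returns the first reply line.
    func send(_ command: String) throws -> String
    /// Reads the next reply line from the control connection.
    func readReply() throws -> String
    /// Opens a NEW data connection to the endpoint announced in a 227 reply.
    func openData(for passiveReply: String) throws
    /// Reads from the data connection until the server closes it.
    func readDataUntilClosed() throws -> Data
}

enum FTPError: Error { case unexpectedReply(String) }

/// Downloads each file over its own data connection (stream mode).
func downloadAll<T: FTPTransport>(_ files: [String], over ftp: T) throws -> [String: Data] {
    var result: [String: Data] = [:]
    for name in files {
        // Fresh PASV per file: the previous data connection was closed as EOF.
        let pasv = try ftp.send("PASV")
        guard pasv.hasPrefix("227") else { throw FTPError.unexpectedReply(pasv) }
        try ftp.openData(for: pasv)

        let start = try ftp.send("RETR \(name)")
        guard start.hasPrefix("150") || start.hasPrefix("125") else {
            throw FTPError.unexpectedReply(start)
        }
        result[name] = try ftp.readDataUntilClosed()

        // The transfer is only confirmed once the control connection says 226.
        let done = try ftp.readReply()
        guard done.hasPrefix("226") else { throw FTPError.unexpectedReply(done) }
    }
    return result
}
```

With this ordering the second RETR no longer finds a dead data socket, which is the likely cause of the empty read and the occasional 425 in the question.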
https://www.ncftp.com/libncftp/doc/ftp_overview.html
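By default, FTP uses the STREAM transfer mode for data transfers. In STREAM mode, end-of-file is signaled by closing the data connection. Therefore, in STREAM mode, only one file can be sent per data connection.
To solve this, you must either:
issue a new PORT/PASV command to establish a new data connection for each individual file, or
issue a MODE command to switch to the BLOCK or COMPRESSED transfer mode before transferring files. Neither mode signals EOF by closing the data connection; instead, an explicit marker is sent on the data connection at the end of each file, which allows multiple files to be transferred over a single data connection.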
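For the first option in passive mode, every file needs its own 227 reply parsed into a host and port before connecting the data socket. Here is a small sketch of that parsing; `parsePassiveReply` is a hypothetical helper name, and it assumes the server wraps the six numbers in parentheses as in "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)", which not every server is guaranteed to do.

```swift
import Foundation

/// Extracts host and port from a reply such as
/// "227 Entering Passive Mode (192,168,1,2,7,139)".
/// Returns nil for any other reply or a malformed address list.
func parsePassiveReply(_ reply: String) -> (host: String, port: Int)? {
    guard reply.hasPrefix("227"),
          let open = reply.firstIndex(of: "("),
          let close = reply.lastIndex(of: ")"),
          open < close else { return nil }
    let inner = reply[reply.index(after: open)..<close]
    let parts = inner.split(separator: ",", omittingEmptySubsequences: false)
    let fields = parts.compactMap { UInt8($0.trimmingCharacters(in: .whitespaces)) }
    guard parts.count == 6, fields.count == 6 else { return nil }
    // First four fields are the IPv4 octets, last two the port bytes (high, low).
    let host = fields[0..<4].map { String($0) }.joined(separator: ".")
    let port = Int(fields[4]) * 256 + Int(fields[5])
    return (host, port)
}
```

The returned pair is what you would pass to the data socket's connect call for that one file; after the server closes the connection, send PASV again for the next file.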
For details, read the official FTP protocol specification, RFC 959, in particular Section 3.3 "Data Connection Management" and Section 3.4 "Transmission Modes":
3.3. DATA CONNECTION MANAGEMENT
Default Data Connection Ports: All FTP implementations must
support use of the default data connection ports, and only the
User-PI may initiate the use of non-default ports.
Negotiating Non-Default Data Ports: The User-PI may specify a
non-default user side data port with the PORT command. The
User-PI may request the server side to identify a non-default
server side data port with the PASV command. Since a connection
is defined by the pair of addresses, either of these actions is
enough to get a different data connection, still it is permitted
to do both commands to use new ports on both ends of the data
connection.
Reuse of the Data Connection: When using the stream mode of data
transfer the end of the file must be indicated by closing the
connection. This causes a problem if multiple files are to be
transfered in the session, due to need for TCP to hold the
connection record for a time out period to guarantee the reliable
communication. Thus the connection can not be reopened at once.
There are two solutions to this problem. The first is to
negotiate a non-default port. The second is to use another
transfer mode.
A comment on transfer modes. The stream transfer mode is
inherently unreliable, since one can not determine if the
connection closed prematurely or not. The other transfer modes
(Block, Compressed) do not close the connection to indicate the
end of file. They have enough FTP encoding that the data
connection can be parsed to determine the end of the file.
Thus using these modes one can leave the data connection open
for multiple file transfers.
3.4. TRANSMISSION MODES
The next consideration in transferring data is choosing the
appropriate transmission mode. There are three modes: one which
formats the data and allows for restart procedures; one which also
compresses the data for efficient transfer; and one which passes
the data with little or no processing. In this last case the mode
interacts with the structure attribute to determine the type of
processing. In the compressed mode, the representation type
determines the filler byte.
All data transfers must be completed with an end-of-file (EOF)
which may be explicitly stated or implied by the closing of the
data connection. For files with record structure, all the
end-of-record markers (EOR) are explicit, including the final one.
For files transmitted in page structure a "last-page" page type is
used.
NOTE: In the rest of this section, byte means "transfer byte"
except where explicitly stated otherwise.
For the purpose of standardized transfer, the sending host will
translate its internal end of line or end of record denotation
into the representation prescribed by the transfer mode and file
structure, and the receiving host will perform the inverse
translation to its internal denotation. An IBM Mainframe record
count field may not be recognized at another host, so the
end-of-record information may be transferred as a two byte control
code in Stream mode or as a flagged bit in a Block or Compressed
mode descriptor. End-of-line in an ASCII or EBCDIC file with no
record structure should be indicated by <CRLF> or <NL>,
respectively. Since these transformations imply extra work for
some systems, identical systems transferring non-record structured
text files might wish to use a binary representation and stream
mode for the transfer.
The following transmission modes are defined in FTP:
3.4.1. STREAM MODE
The data is transmitted as a stream of bytes. There is no
restriction on the representation type used; record structures
are allowed.
In a record structured file EOR and EOF will each be indicated
by a two-byte control code. The first byte of the control code
will be all ones, the escape character. The second byte will
have the low order bit on and zeros elsewhere for EOR and the
second low order bit on for EOF; that is, the byte will have
value 1 for EOR and value 2 for EOF. EOR and EOF may be
indicated together on the last byte transmitted by turning both
low order bits on (i.e., the value 3). If a byte of all ones
was intended to be sent as data, it should be repeated in the
second byte of the control code.
If the structure is a file structure, the EOF is indicated by
the sending host closing the data connection and all bytes are
data bytes.
3.4.2. BLOCK MODE
The file is transmitted as a series of data blocks preceded by
one or more header bytes. The header bytes contain a count
field, and descriptor code. The count field indicates the
total length of the data block in bytes, thus marking the
beginning of the next data block (there are no filler bits).
The descriptor code defines: last block in the file (EOF) last
block in the record (EOR), restart marker (see the Section on
Error Recovery and Restart) or suspect data (i.e., the data
being transferred is suspected of errors and is not reliable).
This last code is NOT intended for error control within FTP.
It is motivated by the desire of sites exchanging certain types
of data (e.g., seismic or weather data) to send and receive all
the data despite local errors (such as "magnetic tape read
errors"), but to indicate in the transmission that certain
portions are suspect). Record structures are allowed in this
mode, and any representation type may be used.
The header consists of the three bytes. Of the 24 bits of
header information, the 16 low order bits shall represent byte
count, and the 8 high order bits shall represent descriptor
codes as shown below.
Block Header
+----------------+----------------+----------------+
| Descriptor     |    Byte Count                   |
|         8 bits |                      16 bits    |
+----------------+----------------+----------------+
The descriptor codes are indicated by bit flags in the
descriptor byte. Four codes have been assigned, where each
code number is the decimal value of the corresponding bit in
the byte.
Code     Meaning
 128     End of data block is EOR
  64     End of data block is EOF
  32     Suspected errors in data block
  16     Data block is a restart marker
With this encoding, more than one descriptor coded condition
may exist for a particular block. As many bits as necessary
may be flagged.
The restart marker is embedded in the data stream as an
integral number of 8-bit bytes representing printable
characters in the language being used over the control
connection (e.g., default--NVT-ASCII). <SP> (Space, in the
appropriate language) must not be used WITHIN a restart marker.
For example, to transmit a six-character marker, the following
would be sent:
+--------+--------+--------+
|Descrptr|  Byte count     |
|code= 16|  = 6            |
+--------+--------+--------+

+--------+--------+--------+
| Marker | Marker | Marker |
| 8 bits | 8 bits | 8 bits |
+--------+--------+--------+

+--------+--------+--------+
| Marker | Marker | Marker |
| 8 bits | 8 bits | 8 bits |
+--------+--------+--------+
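
As an aside from the quoted specification, here is how a client could use the block header to find file boundaries on a data connection that stays open. This is a sketch under my own assumptions: `BlockDescriptor` and `splitBlockModeFiles` are hypothetical names, the byte count is read as big-endian (high byte first), restart markers are skipped rather than stored, and the server must actually accept `MODE B`, which many do not.

```swift
import Foundation

/// Descriptor bits from the block header table above.
struct BlockDescriptor: OptionSet {
    let rawValue: UInt8
    static let endOfRecord   = BlockDescriptor(rawValue: 128)
    static let endOfFile     = BlockDescriptor(rawValue: 64)
    static let suspectData   = BlockDescriptor(rawValue: 32)
    static let restartMarker = BlockDescriptor(rawValue: 16)
}

/// Splits buffered block-mode bytes into complete files.
/// `consumed` is the offset just past the last complete file; bytes after it
/// belong to a file that has not finished yet and should be kept by the caller.
func splitBlockModeFiles(_ bytes: [UInt8]) -> (files: [[UInt8]], consumed: Int) {
    var files: [[UInt8]] = []
    var current: [UInt8] = []
    var offset = 0
    var consumed = 0
    while offset + 3 <= bytes.count {
        let descriptor = BlockDescriptor(rawValue: bytes[offset])
        let count = Int(bytes[offset + 1]) << 8 | Int(bytes[offset + 2])
        let bodyStart = offset + 3
        // Stop if the block body has not fully arrived yet.
        guard bodyStart + count <= bytes.count else { break }
        if !descriptor.contains(.restartMarker) {
            current.append(contentsOf: bytes[bodyStart..<bodyStart + count])
        }
        offset = bodyStart + count
        if descriptor.contains(.endOfFile) {
            files.append(current)   // EOF flag closes the file, not the socket
            current = []
            consumed = offset
        }
    }
    return (files, consumed)
}
```
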
3.4.3. COMPRESSED MODE
There are three kinds of information to be sent: regular data,
sent in a byte string; compressed data, consisting of
replications or filler; and control information, sent in a
two-byte escape sequence. If n>0 bytes (up to 127) of regular
data are sent, these n bytes are preceded by a byte with the
left-most bit set to 0 and the right-most 7 bits containing the
number n.
Byte string:
 1       7                8                     8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+
|0|       n     | |    d(1)       | ... |      d(n)     |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+
                              ^             ^
                              |---n bytes---|
                                  of data
String of n data bytes d(1),..., d(n)
Count n must be positive.
To compress a string of n replications of the data byte d, the
following 2 bytes are sent:
Replicated Byte:
  2       6               8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|1 0|     n     | |       d       |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
A string of n filler bytes can be compressed into a single
byte, where the filler byte varies with the representation
type. If the type is ASCII or EBCDIC the filler byte is <SP>
(Space, ASCII code 32, EBCDIC code 64). If the type is Image
or Local byte the filler is a zero byte.
Filler String:
  2       6
+-+-+-+-+-+-+-+-+
|1 1|     n     |
+-+-+-+-+-+-+-+-+
The escape sequence is a double byte, the first of which is the
escape byte (all zeros) and the second of which contains
descriptor codes as defined in Block mode. The descriptor
codes have the same meaning as in Block mode and apply to the
succeeding string of bytes.
Compressed mode is useful for obtaining increased bandwidth on
very large network transmissions at a little extra CPU cost.
It can be most effectively used to reduce the size of printer
files such as those generated by RJE hosts.
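
To make the compressed-mode framing concrete, here is a sketch of a decoder for one file. `decodeCompressedFile` is a hypothetical name, and the EOF handling is my reading of the text above: an escape sequence whose descriptor has the EOF bit (64) set marks the string that follows it as the last one of the file, so the decoder stops after that string. As with block mode, this only helps if the server accepts `MODE C`.

```swift
import Foundation

/// Decodes compressed-mode bytes for a single file.
/// `filler` is 0 for Image/Local type and 32 (space) for ASCII.
/// Returns nil if the input ends in the middle of a string or escape sequence.
func decodeCompressedFile(_ bytes: [UInt8], filler: UInt8 = 0)
    -> (data: [UInt8], reachedEOF: Bool, consumed: Int)? {
    var out: [UInt8] = []
    var eofPending = false
    var i = 0
    while i < bytes.count {
        let head = bytes[i]
        i += 1
        if head == 0 {
            // Escape byte: the next byte carries block-mode descriptor codes.
            guard i < bytes.count else { return nil }
            if bytes[i] & 64 != 0 { eofPending = true }
            i += 1
            continue
        }
        if head & 0x80 == 0 {
            // 0nnnnnnn: n literal data bytes follow.
            let n = Int(head)
            guard i + n <= bytes.count else { return nil }
            out.append(contentsOf: bytes[i..<i + n])
            i += n
        } else if head & 0x40 == 0 {
            // 10nnnnnn d: n replications of the byte d.
            guard i < bytes.count else { return nil }
            out.append(contentsOf: repeatElement(bytes[i], count: Int(head & 0x3F)))
            i += 1
        } else {
            // 11nnnnnn: n filler bytes.
            out.append(contentsOf: repeatElement(filler, count: Int(head & 0x3F)))
        }
        if eofPending { return (out, true, i) }
    }
    return (out, false, i)
}
```

The `consumed` offset tells the caller where the next file starts in the same buffer, which is what lets several files share one data connection.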