PostgreSQL: Enforce Order of Rows in Table
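I am using PostgreSQL 9.6.1 on my Windows 7 laptop to compile and analyze large datasets from several sources. One of my clients noticed that in the final report I gave her, some people from her state were mixed in with people from other states.
For this report, I created the final table: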
CREATE UNLOGGED TABLE LPIS_IssuanceDetail (
ID SERIAL PRIMARY KEY,
Zone TEXT DEFAULT NULL,
State TEXT DEFAULT NULL,
LastName TEXT DEFAULT NULL,
FirstName TEXT DEFAULT NULL,
Email TEXT DEFAULT NULL,
UPN TEXT DEFAULT NULL,
LincPassUsed TEXT DEFAULT NULL,
EmployeeID TEXT DEFAULT NULL,
EmploymentType TEXT DEFAULT NULL,
NonEmployeeCategory TEXT DEFAULT NULL,
EmploymentStatus TEXT DEFAULT NULL,
ISAComplete TEXT DEFAULT NULL,
ISACompletionDate TIMESTAMP WITHOUT TIME ZONE,
LincPassStatus TEXT DEFAULT NULL,
ERO TEXT DEFAULT NULL,
Sponsored TEXT DEFAULT NULL,
Enrolled TEXT DEFAULT NULL,
Adjudicated TEXT DEFAULT NULL,
ShipToSite TEXT DEFAULT NULL,
ValidSite TEXT DEFAULT NULL,
CardExpiration DATE,
CertExpiration DATE,
LastEnrollment DATE,
EnrollmentExpiration DATE,
NewEnrollment TEXT DEFAULT NULL,
Sponsor TEXT DEFAULT NULL,
ContractEnd DATE,
ContractID TEXT DEFAULT NULL,
ContractPOC TEXT DEFAULT NULL
);
Then I populate this table with data from the master data table:
INSERT INTO LPIS_IssuanceDetail (
Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
)
SELECT
Zone, StateName, MAS_LastName, MAS_FirstName, MAS_Email, MAS_UPN,
LincPassUsed, MAS_EmployeeID, MAS_Category, MAS_OrgRelType,
MAS_EmploymentStatus, ISAComplete, ISA_CompletionDate, MAS_IssuanceStatus,
MAS_FedEmerResponse, Sponsored, Enrolled, Adjudicated, MAS_ShipToCityState,
MAS_ValidShipToSite, MAS_CertExpireDate, MAS_LastEnrollmentDate, MAS_EnrollExpireDate,
MAS_CardExpireDate, MAS_NewEnrollment, MAS_Sponsor, MAS_PeriodofPerformanceEndDate,
MAS_ContractID, MAS_ContractPOC
FROM LPIS_MasterData
ORDER BY Zone, StateName, MAS_LastName, MAS_FirstName;
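Sure enough, when I scrolled down the table, I found individual records scattered out of order, as in this sample, where a record from Maine is out of place: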
id | zone | state | lastname | firstname
11849 | 3 | Georgia | Roberts | George
11850 | 3 | Georgia | Smith | Dan
11922 | 3 | Maine | Edwards | John
11851 | 3 | Georgia | Snowden | Ed
11852 | 3 | Georgia | Williams | Casey
As a test, I dumped just the first four columns into a separate table, like this:
CREATE UNLOGGED TABLE LPIS_DetailTest (
ID SERIAL PRIMARY KEY,
Zone TEXT DEFAULT NULL,
State TEXT DEFAULT NULL,
LastName TEXT DEFAULT NULL,
FirstName TEXT DEFAULT NULL
);
INSERT INTO LPIS_DetailTest (
Zone, State, LastName, FirstName
)
SELECT
Zone, State, LastName, FirstName
FROM LPIS_IssuanceDetail
ORDER BY Zone, State, LastName, FirstName;
All the rows there are in the expected order:
id | zone | state | lastname | firstname
11849 | 3 | Georgia | Roberts | George
11850 | 3 | Georgia | Smith | Dan
11851 | 3 | Georgia | Snowden | Ed
11852 | 3 | Georgia | Williams | Casey
11853 | 3 | Georgia | Spaid | Dennis
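Why does this smaller table sort correctly with the same ORDER BY clause, while the larger table has some rows out of order?
The database and all tables use UTF8 encoding.
I have checked everything I can think of, but cannot figure out why the ORDER BY clause produces such odd results. What else can I check?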
EDIT: Additional information
In my script, after the INSERT INTO ... SELECT ... statement, I use COPY to dump the data to a CSV file, like this:
-- Export data to CSV files
COPY LPIS_IssuanceDetail (
Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
)
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv'
WITH (
FORMAT CSV,
DELIMITER ',',
NULL '',
HEADER TRUE,
QUOTE '"',
ENCODING 'UTF8'
);
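Then, when I import this CSV file into a spreadsheet for the final presentation, I have to sort the data manually on the ID column and then delete that column.
New question:
Is there an option I can use in the INSERT INTO statement to strictly preserve the row order specified in the ORDER BY clause?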
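A table has no inherent row order. The ORDER BY in an INSERT ... SELECT only affects the order in which rows are handed to the insert (which is why the ID values follow the sort order); it does not control the order in which a later SELECT or COPY returns them, and a query without ORDER BY returns rows in an unspecified order. So there is no INSERT option for this. If you want the data in the CSV file sorted, export a SELECT query that has its own ORDER BY: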
COPY (select Zone, State, LastName, FirstName, Email, UPN, LincPassUsed, EmployeeID,
EmploymentType, NonEmployeeCategory, EmploymentStatus, ISAComplete,
ISACompletionDate, LincPassStatus, ERO, Sponsored, Enrolled, Adjudicated,
ShipToSite, ValidSite, CertExpiration, LastEnrollment, EnrollmentExpiration,
CardExpiration, NewEnrollment, Sponsor, ContractEnd, ContractID, ContractPOC
from LPIS_IssuanceDetail
ORDER BY Zone, State, LastName, FirstName
)
TO 'C:/Users/Michael.Sheaver/Documents/LincPass/Datasets/Compiled Reports/LPIS_IssuanceDetail.csv'
WITH (FORMAT CSV, DELIMITER ',', NULL '', HEADER TRUE, QUOTE '"', ENCODING 'UTF8');
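A minimal, self-contained sketch of the same pattern, using a hypothetical temporary table order_demo instead of the real report tables: rows are inserted through a sorted SELECT, and the order is then imposed again in the query that reads them. Selecting the system column ctid (each row's physical location) is one way to check where rows actually sit. Reading back with ORDER BY ID assumes the SERIAL default was evaluated as the sorted rows arrived; that matches the IDs in the question's sample, but treat it as an assumption rather than a documented guarantee.

```sql
-- Hypothetical demo table; stands in for LPIS_IssuanceDetail.
CREATE TEMP TABLE order_demo (
ID SERIAL PRIMARY KEY,
State TEXT,
LastName TEXT
);

-- Feed the insert from a sorted SELECT, as the original script does.
INSERT INTO order_demo (State, LastName)
SELECT State, LastName
FROM (VALUES ('Maine', 'Edwards'),
             ('Georgia', 'Smith'),
             ('Georgia', 'Roberts')) AS v(State, LastName)
ORDER BY State, LastName;

-- Diagnostic: ctid is each row's physical location in the table.
-- Without ORDER BY, no particular output order is promised.
SELECT ctid, ID, State, LastName FROM order_demo;

-- Impose the order in the reading query; the same SELECT can be
-- wrapped in COPY (...) TO to export it, leaving out the ID column.
SELECT State, LastName FROM order_demo ORDER BY ID;
```

Ordering the exported query by ID reproduces the manual spreadsheet sort described in the question and drops the ID column in the same step. Ordering by Zone, State, LastName, FirstName, as the answer does, has the advantage of not depending on how the IDs were assigned.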