Oracle: Quickly Deleting Duplicate Records

xiaoxiao  2021-04-11

On one project, a colleague was migrating data and the data for one table was accidentally imported twice, so every record in that table had a duplicate. The table holds tens of millions of rows and belongs to a production system, so deleting all the records was not an option; the duplicate records had to be removed, and quickly.

This post summarizes the methods for deleting duplicate records, along with the advantages and disadvantages of each.

For convenience of description, assume a table TBL with three columns, col1, col2 and col3, where col1 and col2 form the primary key and there is an index on col1, col2.

1. Create a temporary table

You can copy the deduplicated data into a temporary table, clear the original table, and then load the data back into the original table. The SQL statements are as follows:

CREATE TABLE TBL_TMP AS SELECT DISTINCT * FROM TBL; -- Copy the distinct records into a temporary table

TRUNCATE TABLE TBL; -- Clear the table's records

INSERT INTO TBL SELECT * FROM TBL_TMP; -- Insert the data from the temporary table back

This method meets the requirement, but for a table with tens of millions of records it is clearly slow. On a production system it places a heavy load on the server, so it is not feasible.
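The three steps can be tried end to end on a toy table. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle, the sample rows are made up, and because SQLite has no TRUNCATE a plain DELETE takes its place.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Oracle instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER, col2 INTEGER, col3 TEXT)")

# Hypothetical sample data; every row is loaded twice to mimic the double import.
rows = [(1, 1, "a"), (1, 2, "b"), (2, 1, "c")]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows + rows)

# Step 1: copy the distinct rows into a temporary table.
conn.execute("CREATE TABLE tbl_tmp AS SELECT DISTINCT * FROM tbl")
# Step 2: empty the original table (SQLite has no TRUNCATE, so DELETE is used).
conn.execute("DELETE FROM tbl")
# Step 3: load the deduplicated rows back and drop the helper table.
conn.execute("INSERT INTO tbl SELECT * FROM tbl_tmp")
conn.execute("DROP TABLE tbl_tmp")
conn.commit()

remaining = conn.execute(
    "SELECT col1, col2, col3 FROM tbl ORDER BY col1, col2"
).fetchall()
print(remaining)
```

Every row is written twice more (once into the temporary table, once back), which is why the cost grows with the full table size rather than with the number of duplicates.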

2. Use ROWID

In Oracle, every record has a ROWID. The ROWID is unique across the entire database and identifies the data file, block, and row in which the record is stored. In duplicate records all column values may be identical, but the ROWIDs never are. The SQL statement is as follows:

DELETE FROM TBL WHERE ROWID IN (SELECT A.ROWID FROM TBL A, TBL B WHERE A.ROWID > B.ROWID AND A.COL1 = B.COL1 AND A.COL2 = B.COL2);

This SQL statement is suitable when each record has only one duplicate. It still gives the correct result when a record is repeated n times, but the self-join then compares every pair of duplicates, so when n is large or unknown it is worth considering the following method.
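To see how the self-join behaves when a key is repeated more than twice, the sketch below runs the same statement on a toy table. The assumptions are mine: SQLite (through Python's sqlite3 module) stands in for Oracle, the sample rows are made up, and SQLite's rowid is an integer row key rather than Oracle's physical row address, although it plays the same role here.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 INTEGER, col2 INTEGER, col3 TEXT)")

# Hypothetical data: key (1, 1) appears three times, (1, 2) twice, (2, 1) once.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"), (1, 2, "b"), (1, 2, "b"), (2, 1, "c")],
)

# Number of (a, b) pairs the self-join produces: n * (n - 1) / 2 per key.
pairs = conn.execute(
    "SELECT COUNT(*) FROM tbl a, tbl b "
    "WHERE a.rowid > b.rowid AND a.col1 = b.col1 AND a.col2 = b.col2"
).fetchone()[0]

# Delete every row that has a twin with a smaller rowid.
deleted = conn.execute(
    "DELETE FROM tbl WHERE rowid IN ("
    "  SELECT a.rowid FROM tbl a, tbl b"
    "  WHERE a.rowid > b.rowid AND a.col1 = b.col1 AND a.col2 = b.col2)"
).rowcount
conn.commit()

remaining = conn.execute(
    "SELECT col1, col2, col3 FROM tbl ORDER BY col1, col2"
).fetchall()
print(pairs, deleted, remaining)
```

On this data the join yields 4 pairs to delete 3 rows; the gap between those two numbers widens quickly as the number of copies per key grows.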

3. Use the MAX or MIN function

This approach also relies on ROWID; the difference from the previous one is that it is implemented with the MAX or MIN function. The SQL statement is as follows:

DELETE FROM TBL A WHERE ROWID NOT IN (SELECT MAX(B.ROWID) FROM TBL B WHERE A.COL1 = B.COL1 AND A.COL2 = B.COL2); -- MAX can also be replaced with MIN here

Or use the following statement:

DELETE FROM TBL A WHERE ROWID < (SELECT MAX(B.ROWID) FROM TBL B WHERE A.COL1 = B.COL1 AND A.COL2 = B.COL2); -- If MAX is changed to MIN here, the "<" in the preceding WHERE clause must be changed to ">"

The following method is basically the same as the one above, but it uses GROUP BY, which reduces the explicit comparison conditions and improves efficiency. The SQL statement is as follows:

DELETE FROM TBL WHERE ROWID NOT IN (SELECT MAX(ROWID) FROM TBL T GROUP BY T.COL1, T.COL2);

There is also a method that is better suited to the case where few records in the table are duplicated and an index exists. Suppose there is an index on col1, col2 and few records in TBL are duplicated; the following statement also uses GROUP BY to improve efficiency:

DELETE FROM TBL WHERE (COL1, COL2) IN (SELECT COL1, COL2 FROM TBL GROUP BY COL1, COL2 HAVING COUNT(*) > 1) AND ROWID NOT IN (SELECT MIN(ROWID) FROM TBL GROUP BY COL1, COL2 HAVING COUNT(*) > 1);
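As a sanity check that the MAX/MIN and GROUP BY variants in this section leave the same rows behind, the sketch below runs each one against a fresh copy of a toy table. The assumptions are mine: SQLite (through Python's sqlite3 module) stands in for Oracle, the sample rows and the dedupe helper are hypothetical, and the outer table is referenced as tbl instead of through the alias A to stay within syntax that SQLite accepts. It says nothing about relative speed on a real Oracle table.

```python
import sqlite3

# Hypothetical data: key (1, 1) three times, (1, 2) twice, (2, 1) once.
ROWS = [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"), (1, 2, "b"), (1, 2, "b"), (2, 1, "c")]

# The four statements from this section, adapted to SQLite.
STATEMENTS = {
    "correlated NOT IN MAX": (
        "DELETE FROM tbl WHERE rowid NOT IN ("
        " SELECT MAX(b.rowid) FROM tbl b"
        " WHERE tbl.col1 = b.col1 AND tbl.col2 = b.col2)"
    ),
    "correlated < MAX": (
        "DELETE FROM tbl WHERE rowid < ("
        " SELECT MAX(b.rowid) FROM tbl b"
        " WHERE tbl.col1 = b.col1 AND tbl.col2 = b.col2)"
    ),
    "GROUP BY": (
        "DELETE FROM tbl WHERE rowid NOT IN ("
        " SELECT MAX(rowid) FROM tbl GROUP BY col1, col2)"
    ),
    "GROUP BY ... HAVING": (
        "DELETE FROM tbl WHERE (col1, col2) IN ("
        " SELECT col1, col2 FROM tbl GROUP BY col1, col2 HAVING COUNT(*) > 1)"
        " AND rowid NOT IN ("
        " SELECT MIN(rowid) FROM tbl GROUP BY col1, col2 HAVING COUNT(*) > 1)"
    ),
}


def dedupe(statement):
    """Build a fresh table, run one DELETE statement, return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (col1 INTEGER, col2 INTEGER, col3 TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    conn.execute(statement)
    conn.commit()
    result = conn.execute(
        "SELECT col1, col2, col3 FROM tbl ORDER BY col1, col2"
    ).fetchall()
    conn.close()
    return result


results = {name: dedupe(sql) for name, sql in STATEMENTS.items()}
for name, rows in results.items():
    print(name, rows)
```

All four variants keep exactly one row per (col1, col2) key; they differ only in which copy survives (largest or smallest rowid) and in how much work the database does to find it.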

