I recently had to sort out a database for a client. It was a mess.
The database was a list of people to contact, sort of like a telemarketing system (their choice, not mine…). The problem they were having was that it contained a lot of entries for people with different names at the same address with the same phone number. As you can imagine, calling the same people over and over again just because their phone number has come up again is not the best way to make friends.
The solution was to remove the duplicate phone numbers. Easy, right? Well, not quite. After looking around at the various methods on offer, it seemed that everyone who had done this before had used temporary tables to accomplish it. It was a big pain. You needed to create a new table, copy the data across with a DISTINCT keyword, then copy it back, hoping not to lose something along the way.
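For comparison, the temporary-table round trip looks roughly like this. It is only a sketch: `contacts` and `contacts_tmp` are made-up names, and it assumes MySQL syntax.

```sql
-- Hypothetical names: `contacts` is the messy table, `contacts_tmp` the scratch copy.

-- 1. Create an empty table with the same columns and indexes.
CREATE TABLE contacts_tmp LIKE contacts;

-- 2. Copy the data across, letting DISTINCT drop rows that are
--    identical in every column.
INSERT INTO contacts_tmp
SELECT DISTINCT * FROM contacts;

-- 3. Empty the original and copy the cleaned rows back.
DELETE FROM contacts;
INSERT INTO contacts
SELECT * FROM contacts_tmp;

-- 4. Throw the scratch table away.
DROP TABLE contacts_tmp;
```

Note that DISTINCT compares whole rows, so two people with different names and the same phone number both survive the copy. To get one row per phone number you would have to select on the phone column alone and then work out how to get the other columns back, which is a large part of why this route is such a pain.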
There is a better way!
That is right, there is a very easy way to get around this. The only problem is that it doesn’t do anything gracefully, so while it is better, it may not be the right tool for you to use.
Only use this method if you’re not that concerned about losing the other data (names, addresses and so on) stored in the duplicate rows that get thrown away.
I can’t stress enough that this is very much a brute-force method with all the subtlety of a standard house brick.
All it takes is adding a unique index to that column, and that can remove all of the duplicates. However, you do also need to add in another keyword. The way to do it is:
ALTER IGNORE TABLE table_name ADD UNIQUE INDEX (column_name);
It’s the ‘IGNORE’ keyword that suppresses the errors that are normally thrown when the column you are indexing contains duplicate values. Because the index has to be unique, only the first row found for each value is kept and all of the other duplicate entries are thrown away.
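Here is a slightly fuller sketch of how this might look on a real table. The names `contacts`, `contacts_backup` and `phone` are placeholders, not anything from the client's database. One thing to check first: `ALTER IGNORE TABLE` is a MySQL extension that was deprecated in MySQL 5.6.17 and removed in 5.7.4, so this only applies to older servers.

```sql
-- Hypothetical names: table `contacts`, column `phone`.

-- See how many rows share each phone number before deleting anything.
SELECT phone, COUNT(*) AS copies
FROM contacts
GROUP BY phone
HAVING COUNT(*) > 1;

-- Keep a full copy, because the ALTER below cannot be rolled back.
CREATE TABLE contacts_backup LIKE contacts;
INSERT INTO contacts_backup SELECT * FROM contacts;

-- Add the unique index; IGNORE keeps the first row for each phone
-- number and silently deletes the others.
ALTER IGNORE TABLE contacts ADD UNIQUE INDEX (phone);
```

The first query is just a preview, so you know roughly how many rows are about to disappear. The backup copy is optional, but given how blunt this tool is, it seems like cheap insurance.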
Now, I know that this is very much a “brute force” method, and doesn’t filter for the records that you might want to keep, but if you want that sort of subtle nuance, you’ll need to put a lot more work in than this.
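As a rough idea of what that extra work might look like, here is a sketch that picks which record survives instead of leaving it to the database. It assumes the table has an auto-increment `id` column, which is purely an assumption, as are the names `contacts` and `phone`. It keeps the row with the lowest `id` for each phone number.

```sql
-- Hypothetical names: table `contacts`, columns `id` (auto-increment key)
-- and `phone`.

-- Delete every row for which another row has the same phone number and
-- a lower id, so only the lowest id per phone number is left.
DELETE newer
FROM contacts AS newer
JOIN contacts AS older
  ON older.phone = newer.phone
 AND older.id < newer.id;

-- With the duplicates gone, a plain unique index stops them coming back.
ALTER TABLE contacts ADD UNIQUE INDEX (phone);
```

Flipping the comparison to `older.id > newer.id` would keep the highest `id` instead, and the same join could compare some other column if you have a better way of deciding which record is the keeper.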