Another example using a varchar column:
SELECT ID
FROM Books
GROUP BY ID
HAVING COUNT(ID) > 1
Ok the above is great for small tables, to manually track down records, or maybe as part of a larger query or subquery. However odds are you are going to need additional columns of data and probably the actual duplicate rows themselves. Well initial thought might be to expand the simple example query above to include the additional fields, but you will quickly find out that the query will yield no results. This is because the query above groups on the columns in question having a count greater than 1. Well if you add additional columns to the query that do not contain duplicates, then this condition is no longer 'True' and thus no results are returned.
SELECT Title
FROM Books
GROUP BY Title
HAVING COUNT(Title) > 1
The fix is to Join in another copy of the same table. One table's purpose is to focus on the duplicate rows, and the second table's purpose is to focus on the additional columns needed in the results. When joining on the same table an Alias must be given to distinguish between the two. Here is the expanded example from above, that will return all of the *actual* duplicate rows, and any additional information that was sought:
Lastly, here is a template of the above query that you might want to keep handy as sort of a 'fill-in-the-blanks' template (remove brackets - they are just placeholders and not required syntax) for your own 'finding duplicate rows' needs:
SELECT bAll.ID, bAll.PublishDate, bAll.Title, bAll.Price
FROM Books bAll
INNER JOIN (SELECT Title
FROM Books
GROUP BY Title
HAVING COUNT(Title) > 1) bDups
ON bAll.Title = bDups.Title
ORDER BY bAll.Title
I welcome any SQL experts to comment on streamlined ways to accomplish the identical task; I can certainly update the post with additional information. However with the plethora of examples available, too many seemed to be of the basic flavor example and I wanted to introduce the additional functionality that is probably often sought after.
SELECT [AliasAllTable].[Field1], [AliasAllTable].[Field2], [AliasAllTable].[Field3]
FROM [MainTable] [AliasMainTable]
INNER JOIN (SELECT [DuplicateFieldName]
FROM [MainTable]
GROUP BY [DuplicateFieldName]
HAVING COUNT([DuplicateFieldName]) > 1) [AliasDuplicateTable]
ON [AliasAllTable].[DuplicateFieldName] = [AliasDuplicateTable].[DuplicateFieldName]
ORDER BY [AliasAllTable].[DuplicateFieldName]
That's how I do it, too, basically.
ReplyDeleteThanks Allen great post, far from SQL expert.
ReplyDeleteYep, I am a developer at heart and know just enough SQL to be dangerous! If you or anyone has a more streamlined or current way of completing the task, please post back with the example and I will update the post as I mentioned.
ReplyDeleteI usually use the ROW_NUMBER method to find duplicates. Example:
ReplyDelete;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY title
ORDER BY ( SELECT 0 ) ) RN, book_id, title, author_last_name, author_first_name, rating
FROM books
)
SELECT book_id, title, author_last_name, author_first_name, rating
FROM CTE
WHERE RN > 1