A complete guide on “how to remove duplicates in excel.”
The evolution of technology has facilitated businesses to opt for advanced tools and automation, and here’s where data function becomes fundamental. One such tool businesses and institutions have been utilizing judiciously for extracting useful information from data is the MS Excel Spreadsheet.
For those who aren’t acquainted with this software program, MS Excel is an electronic spreadsheet that helps you organize data, visually represent them through charts and graphs, and perform various functions and formulas.
The spreadsheet contains rows and columns in which data is entered, with a total of 1048576 rows and 16383 columns. The composition of a row and column creates a cell, which is referenced by the column letter and the row number (Example: A1, B2, D1, etc.).
When working with a data set, turning the raw data into a meaningful one can be quite a challenging task. Here is where Excel comes in handy to clean the data by removing duplicate values using the remove duplicates function. Additionally, we will go over alternative ways to remove duplicates in excel.
How to remove duplicates in excel with the remove duplicates function:
To get started, we will create a new excel workbook. On the sidebar menu, select New -> click on Blank workbook.
You will then see a blank sheet; by default, your cursor will activate the A1 cell. I have already created a random data set for explaining through an example as we progress.
Note: Before I delete duplicates, I will copy my entire table to another sheet. This ensures that my original data set stays intact.
Let us get started with removing duplicates in excel, then.
- Click on any single cell in your table which contains data.
- Select the Data tab at the top side of your screen.
- In the Data Tools group, you will find the Remove Duplicates function
- After you click the Remove Duplicates button, you will notice the Remove Duplicates dialogue box appears. Here we will select the columns from which we want to delete the duplicate values. Since my table has headers, I have checked the My data has headers option.
- After you click OK, another dialog box appears from which we can know how many unique values remain and how many duplicate values were removed.
Note: Unless your table contains primary keys or you specifically know which duplicate rows to remove in the excel sheet, it is advisable to select all the rows to match the duplicate entries and to remain with unique values only. Else you might end up with something like this instead:
Here you will see that, from the checkboxes, I only selected the brand column instead of all the columns; while there were duplicate entries in this column, there were also unique values in the same row. Hence, we remain with an undesired output.
Primary keys are values in one or more columns that help to identify each row separately in a table uniquely. For example, in my table, the primary key is the product_id.
How to remove duplicates in excel with the advanced filter option:
Another way to find duplicates and remove them from the rows in excel is through the Advanced filtering option.
- To find the Advanced filter option, head to the Data tab, where you will see the Advanced filter option in Sort & Filter group.
- Click on a cell, click on the Advanced filter option, and a dialog box will appear. You can see all the selected cells in the List range, which you can change as per your preference.
- Leave the Criteria range blank, check the Unique records only box, and click OK to remove duplicate entries. You also have the option to copy the unique values to different cells. To achieve this, click the Copy to another location option from the dialog box and select the range of the cells where you want to display the unique values in the Copy to section.
How to remove duplicates in excel using conditional formatting:
You can also find and remove duplicate values with the help of conditional formatting in excel. There are two ways of approaching this about this-
With Primary key:
1) If your data contains a primary key, select the column with the primary key in it, and under the Home tab, you will find the Conditional Formatting option in the Styles group.
2) Then select the Highlight Cells Rules option, and from there, select the Duplicate Values option
3) A dialog box will appear, which gives us an option to format cells by Unique value or Duplicate value. Unless you want to customize it, leave it on the default setting, and click OK.
4) You will see that the cell colour of the duplicate rows has changed to Red along with the original values.
5) Right-click on the selected cells, select the Filter option, and click Filter by Selected Cell’s Color. Then right-click again and select the Sort option, choose Sort Smallest to Largest or vice-versa according to your preference.
Note: Since you have not selected all the columns, another dialogue box might appear asking you to continue with the current selection or expand it. Select expand, else only the selected range will get sorted.
6) Select the first row and then hold Ctrl on your keyboard to select duplicate values from the remaining rows; alternatively, right-click and select delete.
7) Your duplicate data will be removed and left with only the unique records. Finally, right-click on the column you had placed the filter and select the Clear Filter to show all the unique rows.
Without Primary Key
1) If you don’t have a primary key, it is pretty much the same steps as above except for one change– we will create a makeshift primary key.
2) Add a new column, name it anything you wish, and then in the first cell, enter “=ROW()” inside the parenthesis, select any cell from that row, and click enter.
3) You will notice a + sign on the right bottom edge of the cell, left-click on it and drag it downwards; you will notice the cells getting filled automatically.
4) Repeat the steps we went through for “With Primary key.”
Excel remains one of the number one tools for data cleaning; finding and removing duplicate values in excel is quite simple, and the same result can be achieved through different methods, as we saw above. Let us know your preferred method!