Remove duplicates from unsorted array c++

Problem
Given an unsorted array of size n. Write a program to remove all the duplicates from the array.

Example,

Input
{ 6, 6, 7, 6, 9, 1, 9, 0, 0, 1, 4, 5, 1 }

Output
{ 6, 7, 9, 1, 0, 4, 5}

The problem can be solved in many ways. We will discuss some of them with a detailed explanation.

Brute Force ( with extra storage )

In this approach, we use two nested loops. For each element in the array, we iterate all the elements that appear on the left side of the current element to see if it is repeated or not. If no match is found, then it means it is the first occurrence of the current element in the array. Therefore, we push the current element into a temporary array. If we found a match, we do nothing and simply move to the next iteration. Doing this ensures that if an element is repeated then only its leftmost occurrence is considered and the other occurrences are ignored.

Steps

  1. Let arr[] be the input array of size n.
  2. Let temp[] be a temporary array.
  3. Perform the following operation for each element in arr[]
    1. Check if the current element is equal to any element that appears before it in the array.
    2. If the above condition is true, then it means the current element is repeated and already included in temp. Do nothing.
    3. If the condition is false, push the current element to temp.
  4. Set n = temp.SIZE. The size of the array may have changed.
  5. Set arr = temp.
#include <iostream>
#include <vector>
using namespace std;

void remove_duplicates(int* arr, int& n) {

	// temporary list to store all the unique elements
	vector<int> temp;

	// iterate each element of arr[]
	for (int i = 0; i < n; ++i) {

		// checking if there exist an element arr[j] ( j < i )
		// that is equal to arr[i]
		int flag = 1;
		for (int j = 0; j < i; ++j) {
			if (arr[j] == arr[i]) {
				flag = 0;
				break;
			}
		}

		// flag == 0 means arr[i] is repeated
		// flag == 1 means that no element that appears
		// on left side of arr[i] is equal to arr[i]
		// therefore, we push arr[i] to temp
		if (flag)
			temp.push_back(arr[i]);

	}

	// after the completion of above loop
	// temp will contain all the unique elements in arr[] without repetition
	// size of temp may be smaller than the original array
	// so we set n = temp.size() and overwrite arr[] with temp
	n = temp.size();
	for (int i = 0; i < n; ++i)
		arr[i] = temp[i];

}

int main() {

	int arr[] = { 6, 6, 7, 6, 9, 1, 9, 0, 0, 1, 4, 5, 1 };
	int n = sizeof(arr) / sizeof(int);

	cout << "Input Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

	remove_duplicates(arr, n);

	cout << "Output Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

}

Output

Input Array: 6 6 7 6 9 1 9 0 0 1 4 5 1
Output Array: 6 7 9 1 0 4 5

Time Complexity: O(N2)
Space Complexity: O(N)

Brute Force ( without extra space )

The problem with the above approach is that it requires extra storage of size O(N). What if we want to remove duplicates without using extra storage?

The solution is very simple. Instead of storing the elements in a temporary list, we store the elements in the original array. We initialize a temporary variable K = 0. Instead of pushing the element to temp[], we insert the element at position K and increment K. See the code for better understanding.

#include <iostream>
#include <vector>
using namespace std;

void remove_duplicates(int* arr, int& n) {

	int k = 0;

	// iterate each element of arr[]
	for (int i = 0; i < n; ++i) {

		// checking if there exist an element arr[j] ( j < i )
		// that is equal to arr[i]
		int flag = 1;
		for (int j = 0; j < i; ++j) {
			if (arr[j] == arr[i]) {
				flag = 0;
				break;
			}
		}

		// flag == 0 means arr[i] is repeated
		// flag == 1 means that no element that appears
		// on left side of arr[i] is equal to arr[i]
		// therefore, we swap arr[k] and arr[i] then increment k
		if (flag)
			swap(arr[k++], arr[i]);

	}

	n = k;

}

int main() {

	int arr[] = { 1, 1, 2, 1, 0, 3, 5, 5, 6, 1, 4, 5, 1 };
	int n = sizeof(arr) / sizeof(int);

	cout << "Input Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

	remove_duplicates(arr, n);

	cout << "Output Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

}

Output

Input Array: 1 1 2 1 0 3 5 5 6 1 4 5 1
Output Array: 1 2 0 3 5 6 4

Time Complexity: O(N2)
Space Complexity: O(1)

Note: In the above program we are swapping arr[k++] and arr[i]. We are not doing arr[k++] = arr[i]. This is because if we overwrite arr[k] then if the original value at kth position is repeated in the future, the program will not be able to detect it.

Sorting Approach

The above two approach works fine if the size of the array is small but may exceed the time limit for large input.

In this approach, we first sort the array and then traverse each element. Since the array is sorted, we don’t have to use another loop to check if the current element is repeated or not.

#include <iostream>
#include <algorithm>
using namespace std;

void remove_duplicates(int* arr, int& n) {

	sort(arr, arr + n);
	
	int k = 0;
	
	if(n>0)
		arr[k++] = arr[0];
	
	// compare adjacent elements
	for (int i = 1; i < n; ++i) {		
		// if the current element is not equal to the previous element
		// set arr[k] to arr[i]
		if(arr[i] != arr[i-1])
			arr[k++] = arr[i];
	}

	n = k;

}

int main() {

	int arr[] = { 5, 5, 1, 3, 3, 2, 5, 2, 6, 1, 4, 5, 1 };
	int n = sizeof(arr) / sizeof(int);

	cout << "Input Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

	remove_duplicates(arr, n);

	cout << "Output Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

}

Output

Input Array: 5 5 1 3 3 2 5 2 6 1 4 5 1
Output Array: 1 2 3 4 5 6

Time Complexity: O(N) + Time to Sort
Space Complexity: O(1) + Space Required to Sort

Suppose we used HeapSort to sort the array. The time complexity of heap sort is O(NlogN) and space complexity is O(1). Thus, the final complexity of the above program will be

Time Complexity: O(N) + O(N*logN) = O(N*logN)
Space Complexity: O(1) + O(1) = O(1)

Note: You can sort array in many ways. The complexity of the above program will change depending on which algorithm you used. Also, using sorting algorithms like the bubble sort, selection sort is not recommended since their time complexity is O(N2).

There is one problem in this method, that is the order of the elements is not maintained. The output is in sorted order.

Using Hashing

In this method, we first initialize a large array with all zeros. We use this large array to tell if the value is repeated or not.

Steps

  1. Let arr[] be the input array of size N.
  2. Initialize a large array with all zeros. Let the name of the array be hash[].
  3. Perform the following steps for each element of arr[]
    1. Let X = current element of arr[]
    2. If hash[X] == 1, then it means the value X is repeated. Thus, we do nothing and move to the next element.
    3. If hash[X] == 0, then it means the value X is a new value. Therefore, we push X into the final list.
#include <iostream>
using namespace std;

void remove_duplicates(int *arr, int &n) {

	int hash[100000] = {0};
	int k = 0;
	
	// iterate each element of arr[]
	for(int i=0;i<n;++i){
		
		// if hash[a[i]] = 1, then it means a[i] is already present in the final list
		// thus do nothing and move to next iteration
		// if hash[a[i]] = 0, then it means this is first occurence of a[i]
		// we insert a[i] to the final list and change hash[a[i]] to 1 
		if(hash[arr[i]]==0){
			hash[arr[i]] = 1;
			arr[k++] = arr[i];
		}
		
	}
	
	n = k;

}

int main() {

	int arr[] = { 5, 1, 7, 6, 0, 1, 9, 0, 9, 1, 4, 5, 6 };
	int n = sizeof(arr) / sizeof(int);
	
	cout<<"Input Array: ";
	for(int i=0;i<n;++i)	cout<<arr[i]<<' ';
	cout<<endl;
	
	remove_duplicates(arr,n);
	
	cout<<"Output Array: ";
	for(int i=0;i<n;++i)	cout<<arr[i]<<' ';
	cout<<endl;
	
}

Output

Input Array: 5 1 7 6 0 1 9 0 9 1 4 5 6
Output Array: 5 1 7 6 0 9 4

Time Complexity: O(N)
Space Complexity: O(M)
M is the maximum value in the input array

There are 2 problems in this approach

  1. The size of the hash array should be greater than the maximum value in the array. Suppose if the maximum value in the array is in range ~1012. It is not possible to initialize an array of that size.
  2. The second problem is all the values in the array should be positive. But this problem can be easily solved by adding a positive value to each element of the array, making all values positive. Then later subtracting the same value from each element of the array.

Using set

In this approach, we traverse each element of the array and insert it into the set. Before we push the element to the set, we check if it already exists in the set or not. If the element is already in the set, we do nothing. If the element is not in the set, we insert the element into the set and push the element into the final list. This makes sure that each element is only considered once.

The insert() and find() operation on set takes O(logN) comparisons.

#include <iostream>
#include <set>
using namespace std;

void remove_duplicates(int* arr, int& n) {

	set<int> s;
	int k = 0;

	for (int i = 0; i < n; ++i) {
		// if s.find(x) == s.end() then it means that
		// x is not present in the set
		if (s.find(arr[i]) == s.end()) {
			s.insert(arr[i]);
			arr[k++] = arr[i];
		}
	}

	n = k;

}

int main() {

	int arr[] = { 5, 5, 1, 3, 3, 2, 5, 2, 6, 1, 4, 5, 1 };
	int n = sizeof(arr) / sizeof(int);

	cout << "Input Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

	remove_duplicates(arr, n);

	cout << "Output Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

}

Output

Input Array: 5 5 1 3 3 2 5 2 6 1 4 5 1
Output Array: 5 1 3 2 6 4

Time Complexity: O(N*logN)
Space Complexity: O(N)

Although the space complexity of this method is more than the previous approach, the advantage of using a set is that the order of elements is maintained.

Using Unordered Map

This method is similar to using a set. The main difference is that set uses a binary tree to store data. The unordered_map uses internal hashing. The complexity of insert() and find() operation in an unordered_map is O(1).

#include <iostream>
#include <unordered_map>
using namespace std;

void remove_duplicates(int* arr, int& n) {

	unordered_map<int, int> m;
	int k = 0;

	// iterate each element of arr[]
	for (int i = 0; i < n; ++i) {

		// if m.find(x) == m.end() then it means that
		// x is present in the map
		if (m.find(arr[i]) == m.end()) {
			m[arr[i]] = 1; // inserting arr[i] into the map
			arr[k++] = arr[i];
		}

	}

	n = k;

}

int main() {

	int arr[] = { 6, 1, 7, 9, 7, 1, 0, 9, 0, 1, 4, 0, 6 };
	int n = sizeof(arr) / sizeof(int);

	cout << "Input Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

	remove_duplicates(arr, n);

	cout << "Output Array: ";
	for (int i = 0; i < n; ++i)	cout << arr[i] << ' ';
	cout << endl;

}

Output

Input Array: 6 1 7 9 7 1 0 9 0 1 4 0 6
Output Array: 6 1 7 9 0 4

Time Complexity: O(N)
Space Complexity: O(N)

There is only one problem in this approach. Although the space complexity of unordered_map is O(N) but in practice, it occupies a considerably larger amount of memory compared to other methods with the same space complexity.

We can reduce the space by using an ordered map instead of unordered_map but the ordered map works similarly to the set. Thus, the time complexity will increase to O(N*logN).

Read.

Leave a Comment

Your email address will not be published. Required fields are marked *