Intermediate Data Cleaning in Excel

πŸš€Intermediate Data Cleaning in Excel – Smarter Functions Every Analyst Must Know

Hello friends πŸ‘‹ Welcome back to Brundha_Sri Learning Hub!

In the previous blog, we learned the basic cleaning tools like TRIM, PROPER, SUBSTITUTE, etc. But in real-world projects, messy data often goes beyond extra spaces – you’ll see mixed text and numbers, duplicates, wrong formats, and unstructured fields.

That’s why today, we’re moving one level deeper πŸš€ with Intermediate Data Cleaning Functions.

These functions don’t just clean – they control, validate, and structure data so that your analysis becomes rock-solid.


πŸ”Ή 1. VALUE() – Convert Text to Numbers

πŸ‘‰ Meaning: Converts numbers stored as text into real numeric values.
πŸ‘‰ Useful when you import data from web/PDF/CSV, where numbers look correct but Excel treats them as text.

Syntax:

=VALUE(text)

Example:
Raw Data → "25" (stored as text, left-aligned)
Formula → =VALUE(A2)
Result → 25 (number, right-aligned ✅)

πŸ’‘ Real-time use: Cleaning age, salary, or price columns imported as text.


πŸ”Ή 2. LEFT(), RIGHT(), MID() – Extract Text

πŸ‘‰ Meaning: Pulls specific parts of text from the left, right, or middle.

Syntax:

=LEFT(text,num_chars)  
=RIGHT(text,num_chars)  
=MID(text,start_num,num_chars)  

Example (Phone Numbers):
Raw Data → +91-9876543210

  • =RIGHT(A2,10)9876543210 ✅ (removes country code)

  • =LEFT(A2,3)+91 ✅ (extracts code)

  • =MID(A2,5,5)98765 ✅ (middle part)

πŸ’‘ Real-time use: Splitting product codes, extracting country codes, invoice IDs, etc.


πŸ”Ή 3. LEN() – Count Characters

πŸ‘‰ Meaning: Returns the number of characters in a string (including spaces).

Syntax:

=LEN(text)

Example:
Raw Data → 9876543210
Formula → =LEN(A2)
Result → 10

πŸ’‘ Real-time use: Phone/email validation (check if phone numbers = 10 digits).


πŸ”Ή 4. FIND() vs SEARCH() – Locate Text

πŸ‘‰ Meaning: Finds the position of a character/text inside a string.

  • FIND() → Case-sensitive

  • SEARCH() → Not case-sensitive

Syntax:

=FIND(find_text,within_text)  
=SEARCH(find_text,within_text)  

Example (Email Check):
Raw Data → emma.brown@gmail.com
Formula → =SEARCH("@",A2)
Result → 11 ✅ (position of “@”)

πŸ’‘ Real-time use: Validate if emails contain “@” or “.com”.


πŸ”Ή 5. ROUND(), ROUNDUP(), ROUNDDOWN() – Clean Decimals

πŸ‘‰ Meaning: Controls decimal places by rounding.

Syntax:

=ROUND(number,num_digits)  
=ROUNDUP(number,num_digits)  
=ROUNDDOWN(number,num_digits)  

Example:
Raw Data → 25.678

  • =ROUND(A2,2)25.68

  • =ROUNDUP(A2,2)25.68

  • =ROUNDDOWN(A2,2)25.67

πŸ’‘ Real-time use: Cleaning prices, salaries, percentages for reports.


πŸ”Ή 6. CONCAT() & TEXTJOIN() – Join Data

πŸ‘‰ Meaning: Combines multiple text fields into one.

Syntax:

=CONCAT(text1,text2,…)  
=TEXTJOIN(delimiter,ignore_empty,text1,text2,…)  

Example (Names):
First Name → Emma
Last Name → Brown

  • =CONCAT(A2," ",B2)Emma Brown

  • =TEXTJOIN(" ",TRUE,A2:B2)Emma Brown

πŸ’‘ Real-time use: Combine first+last names, join address fields, or merge IDs.


πŸ”Ή 7. REMOVE DUPLICATES (Excel Tool)

πŸ‘‰ Meaning: Deletes duplicate rows from a dataset.

Steps:

  1. Select dataset

  2. Go to Data → Remove Duplicates

  3. Choose columns to check

  4. Done ✅

πŸ’‘ Real-time use: Removing duplicate customer names, phone numbers, transactions.


πŸ“ Example Dataset

Before Cleaning (Messy Data)

Name (Raw) Phone (Raw) Age (Raw) Salary (Raw) Email (Raw)
John Smith +91-9876543210 "25" 25000.5678 john.smith@GMAIL.com
Emma Brown +1-202-555-0198 30 30000.4499 emma.brown@gmail.com
John Smith 9876543210 "25" 25000.5678 john.smith@GMAIL.com
Rahul Sharma +91-987.654.3210 28 28999.9999 rahul.sharma@ outlook.com

After Cleaning (Using Functions)

Name (Cleaned) Phone (Cleaned) Age (Number) Salary (Rounded) Email (Validated)
John Smith 9876543210 25 25000.57 john.smith@gmail.com
Emma Brown 2025550198 30 30000.45 emma.brown@gmail.com
Rahul Sharma 9876543210 28 29000.00 rahul.sharma@outlook.com

✅ Duplicates removed (only 3 rows remain)
✅ Numbers converted
✅ Emails validated
✅ Salaries rounded


🎀 Wrap-Up

“Friends, now you’re not just cleaning – you’re controlling the data. Functions like VALUE, LEFT/RIGHT/MID, LEN, SEARCH, ROUND, CONCAT, and Remove Duplicates give you full power to transform messy fields into reliable datasets.

Next time you see country codes in phone numbers, duplicate names, or inconsistent decimals – you’ll know exactly what to do πŸ’ͺ.”

πŸ‘‰ In Next, we’ll go into Advanced Data Cleaning Functions (IF, ISNUMBER, ISTEXT, TEXT Functions for validation, FILTER, UNIQUE, Power Query basics). Stay tuned πŸš€


Comments

Popular posts from this blog

Excel for Data Analysts: The Ultimate Beginner’s Guide

Learn Excel for Data Analytics: Step-by-Step Roadmap

Exploring the Excel Ribbon Menu: A Complete Beginner’s Guide