Intermediate Data Cleaning in Excel
πIntermediate Data Cleaning in Excel – Smarter Functions Every Analyst Must Know
Hello friends π Welcome back to Brundha_Sri Learning Hub!
In the previous blog, we learned the basic cleaning tools like TRIM, PROPER, SUBSTITUTE, etc. But in real-world projects, messy data often goes beyond extra spaces – you’ll see mixed text and numbers, duplicates, wrong formats, and unstructured fields.
That’s why today, we’re moving one level deeper π with Intermediate Data Cleaning Functions.
These functions don’t just clean – they control, validate, and structure data so that your analysis becomes rock-solid.
πΉ 1. VALUE() – Convert Text to Numbers
π Meaning: Converts numbers stored as text into real numeric values.
π Useful when you import data from web/PDF/CSV, where numbers look correct but Excel treats them as text.
Syntax:
=VALUE(text)
Example:
Raw Data → "25"
(stored as text, left-aligned)
Formula → =VALUE(A2)
Result → 25
(number, right-aligned ✅)
π‘ Real-time use: Cleaning age, salary, or price columns imported as text.
πΉ 2. LEFT(), RIGHT(), MID() – Extract Text
π Meaning: Pulls specific parts of text from the left, right, or middle.
Syntax:
=LEFT(text,num_chars)
=RIGHT(text,num_chars)
=MID(text,start_num,num_chars)
Example (Phone Numbers):
Raw Data → +91-9876543210
-
=RIGHT(A2,10)
→9876543210
✅ (removes country code) -
=LEFT(A2,3)
→+91
✅ (extracts code) -
=MID(A2,5,5)
→98765
✅ (middle part)
π‘ Real-time use: Splitting product codes, extracting country codes, invoice IDs, etc.
πΉ 3. LEN() – Count Characters
π Meaning: Returns the number of characters in a string (including spaces).
Syntax:
=LEN(text)
Example:
Raw Data → 9876543210
Formula → =LEN(A2)
Result → 10
✅
π‘ Real-time use: Phone/email validation (check if phone numbers = 10 digits).
πΉ 4. FIND() vs SEARCH() – Locate Text
π Meaning: Finds the position of a character/text inside a string.
-
FIND()
→ Case-sensitive -
SEARCH()
→ Not case-sensitive
Syntax:
=FIND(find_text,within_text)
=SEARCH(find_text,within_text)
Example (Email Check):
Raw Data → emma.brown@gmail.com
Formula → =SEARCH("@",A2)
Result → 11
✅ (position of “@”)
π‘ Real-time use: Validate if emails contain “@” or “.com”.
πΉ 5. ROUND(), ROUNDUP(), ROUNDDOWN() – Clean Decimals
π Meaning: Controls decimal places by rounding.
Syntax:
=ROUND(number,num_digits)
=ROUNDUP(number,num_digits)
=ROUNDDOWN(number,num_digits)
Example:
Raw Data → 25.678
-
=ROUND(A2,2)
→25.68
✅ -
=ROUNDUP(A2,2)
→25.68
✅ -
=ROUNDDOWN(A2,2)
→25.67
✅
π‘ Real-time use: Cleaning prices, salaries, percentages for reports.
πΉ 6. CONCAT() & TEXTJOIN() – Join Data
π Meaning: Combines multiple text fields into one.
Syntax:
=CONCAT(text1,text2,…)
=TEXTJOIN(delimiter,ignore_empty,text1,text2,…)
Example (Names):
First Name → Emma
Last Name → Brown
-
=CONCAT(A2," ",B2)
→Emma Brown
✅ -
=TEXTJOIN(" ",TRUE,A2:B2)
→Emma Brown
✅
π‘ Real-time use: Combine first+last names, join address fields, or merge IDs.
πΉ 7. REMOVE DUPLICATES (Excel Tool)
π Meaning: Deletes duplicate rows from a dataset.
Steps:
-
Select dataset
-
Go to Data → Remove Duplicates
-
Choose columns to check
-
Done ✅
π‘ Real-time use: Removing duplicate customer names, phone numbers, transactions.
π Example Dataset
Before Cleaning (Messy Data)
Name (Raw) | Phone (Raw) | Age (Raw) | Salary (Raw) | Email (Raw) |
---|---|---|---|---|
John Smith | +91-9876543210 | "25" | 25000.5678 | john.smith@GMAIL.com |
Emma Brown | +1-202-555-0198 | 30 | 30000.4499 | emma.brown@gmail.com |
John Smith | 9876543210 | "25" | 25000.5678 | john.smith@GMAIL.com |
Rahul Sharma | +91-987.654.3210 | 28 | 28999.9999 | rahul.sharma@ outlook.com |
After Cleaning (Using Functions)
Name (Cleaned) | Phone (Cleaned) | Age (Number) | Salary (Rounded) | Email (Validated) |
---|---|---|---|---|
John Smith | 9876543210 | 25 | 25000.57 | john.smith@gmail.com |
Emma Brown | 2025550198 | 30 | 30000.45 | emma.brown@gmail.com |
Rahul Sharma | 9876543210 | 28 | 29000.00 | rahul.sharma@outlook.com |
✅ Duplicates removed (only 3 rows remain)
✅ Numbers converted
✅ Emails validated
✅ Salaries rounded
π€ Wrap-Up
“Friends, now you’re not just cleaning – you’re controlling the data. Functions like VALUE, LEFT/RIGHT/MID, LEN, SEARCH, ROUND, CONCAT, and Remove Duplicates give you full power to transform messy fields into reliable datasets.
Next time you see country codes in phone numbers, duplicate names, or inconsistent decimals – you’ll know exactly what to do πͺ.”
π In Next, we’ll go into Advanced Data Cleaning Functions (IF, ISNUMBER, ISTEXT, TEXT Functions for validation, FILTER, UNIQUE, Power Query basics). Stay tuned π
Comments
Post a Comment