Feature Engineering

Alchemy for ML: Turning Raw Data into Signal

If data cleaning is the science of ML preparation, Feature Engineering is the art.

Feature Engineering is the process of using domain knowledge to create new variables (features) that make machine learning algorithms work better. A brilliant algorithm with poor features will often be beaten by a simple algorithm with brilliant features.

Encoding: Speaking the Machine’s Language

Machine learning models are mathematical. They require numbers, not words. You cannot simply feed the word “Red” or “Blue” into a neural network.

We use Encoding to translate:

  • Label Encoding: Converting categories to numbers (e.g., Small=1, Medium=2, Large=3). This is great for ordinal data where the order matters.

  • One-Hot Encoding: Creating a new binary column for every category (e.g., “Is_Red”, “Is_Blue”). This is safer for categories without a natural order, preventing the model from assuming that Blue (2) is “greater than” Red (1).
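As a minimal sketch of the two encodings above, the following uses plain Python so the mechanics stay visible. The sample data, the `SIZE_ORDER` mapping, and the `one_hot` helper are all illustrative assumptions, not part of any library.

```python
# Hypothetical example data: sizes have a natural order, colours do not.
sizes = ["Small", "Large", "Medium", "Small"]
colours = ["Red", "Blue", "Red", "Green"]

# Label encoding: an explicit mapping that preserves the real-world order.
SIZE_ORDER = {"Small": 1, "Medium": 2, "Large": 3}
encoded_sizes = [SIZE_ORDER[s] for s in sizes]


def one_hot(values):
    """Return (column_names, rows) with one binary column per distinct category."""
    categories = sorted(set(values))
    columns = [f"Is_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


colour_columns, encoded_colours = one_hot(colours)

print(encoded_sizes)
print(colour_columns)
print(encoded_colours)
```

Each one-hot row contains exactly one 1, so no colour is ever ranked above another. In practice, libraries such as pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`, `OrdinalEncoder`) provide equivalents.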

Scaling: Levelling the Playing Field

Imagine a dataset with two columns: “Age” (mostly 20-80) and “Salary” (mostly 30,000-100,000).

Because the numbers in “Salary” are so much bigger, many models (particularly those based on distances or gradient descent) can let Salary dominate simply because of its units, not because it matters more than Age. To prevent this, we use Scaling to put all features on a comparable range, typically rescaled to between 0 and 1 (Normalisation) or centred on zero with a standard deviation of 1 (Standardisation).
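The two scaling approaches can be sketched in a few lines of plain Python. The sample ages and salaries and the function names `normalise` and `standardise` are assumptions made for illustration, and both functions assume the column contains at least two distinct values.

```python
from statistics import mean, pstdev

# Hypothetical sample columns on very different scales.
ages = [22, 35, 47, 58, 80]
salaries = [30_000, 42_000, 55_000, 75_000, 100_000]


def normalise(values):
    """Min-max scaling: the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    # Assumes hi != lo; a constant column cannot be rescaled this way.
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Z-score scaling: subtract the mean, then divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # Assumes sigma != 0, i.e. the column is not constant.
    return [(v - mu) / sigma for v in values]


print(normalise(ages))
print(normalise(salaries))
print(standardise(salaries))
```

After either transformation, Age and Salary sit on comparable scales. scikit-learn offers the same operations as `MinMaxScaler` and `StandardScaler`.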

Creating Interaction Features

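Sometimes the strongest signal comes from combining two variables. If you are predicting how quickly a house will sell, the “Asking Price” and “Total Square Footage” are both useful. But creating a new feature called “Price Per Square Foot” (Price divided by Size) might reveal a much stronger pattern about the value of the property. This only works when price is an input: if price is the thing you are predicting, a feature built from it simply hands the model the answer.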
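A ratio feature like this takes only a few lines to build. The sketch below uses hypothetical listings and field names (`price`, `square_feet`, `price_per_sqft`), and it assumes price is available as an input column rather than being the value the model must predict.

```python
# Hypothetical listings; the field names are illustrative only.
listings = [
    {"price": 300_000, "square_feet": 1_500},
    {"price": 450_000, "square_feet": 3_000},
    {"price": 200_000, "square_feet": 800},
]


def add_price_per_sqft(rows):
    """Return copies of the rows with a price_per_sqft ratio feature added."""
    enriched = []
    for row in rows:
        sqft = row["square_feet"]
        # Guard against division by zero when the size is missing or recorded as 0.
        ratio = row["price"] / sqft if sqft else None
        enriched.append({**row, "price_per_sqft": ratio})
    return enriched


features = add_price_per_sqft(listings)
for row in features:
    print(row)
```

The second listing has the highest price but the lowest price per square foot, which is the kind of pattern neither raw column shows on its own.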

This is where domain expertise shines. You are not just processing data; you are translating real-world logic into inputs the machine can understand.

Next up: We are almost ready to train. But before we do, we must ensure we aren’t cheating. Part 6 covers Splitting and Data Leakage.
