What if you could train, evaluate, and select the best machine learning model for your business problem — without writing a single line of ML code? Azure AutoML makes this a reality. In this post, we walk through a real fraud detection use case end-to-end, from data registration to model evaluation, using both the Azure Studio UI and the Python SDK.
1. What is Automated Machine Learning?
Machine learning model development is traditionally a resource-intensive process. Data scientists spend weeks selecting algorithms, engineering features, tuning hyperparameters, and validating results. For businesses without large ML teams, this creates a significant barrier to entry.
Automated Machine Learning (AutoML) solves this by systematically iterating through combinations of algorithms, preprocessing steps, and hyperparameters — selecting the best-performing pipeline based on a metric you define. Azure AutoML, part of Microsoft’s Azure Machine Learning platform, industrialises this process at scale.
Under the hood, Azure AutoML handles three major phases automatically:
- Featurization — imputing missing values, encoding categoricals, normalising numerics, detecting data types
- Algorithm sweeping — trying LightGBM, XGBoost, RandomForest, ExtremeRandomTrees, and more
- Ensembling — optionally combining top models into a voting or stacking ensemble for higher accuracy
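To make the sweep concrete, here is a minimal sketch of the loop AutoML effectively runs, using scikit-learn stand-ins (the real service searches LightGBM, XGBoost, and many more pipelines with far richer featurization; the candidate names and parameters below are illustrative only):

```python
# Conceptual sketch of an AutoML sweep: featurize, try candidates, rank by metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for fraud transactions
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

candidates = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "ExtremeRandomTrees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # Featurization (imputation + scaling) is bundled with every trial
    pipeline = make_pipeline(SimpleImputer(), StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc").mean()

best = max(scores, key=scores.get)
print(f"Best pipeline: {best} (AUC {scores[best]:.3f})")
```

Azure adds the third phase on top: the top-ranked trials can be combined into a voting or stacking ensemble, which usually edges out any single model.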
2. When Should Businesses Use AutoML?
AutoML is exceptionally well-suited to a broad range of business problems. Here are the most impactful use cases:
Fraud & Risk Detection
Identify anomalous transactions or account activity. AutoML handles class imbalance and supports metrics like AUC that matter for rare-event detection.
Customer Churn Prediction
Predict which customers are likely to leave before they do. AutoML rapidly prototypes classification models from CRM data without custom ML engineering.
Demand Forecasting
Forecast inventory, staffing, or energy demand. Azure AutoML supports time-series forecasting with automatic lag detection and seasonality handling.
Healthcare Risk Stratification
Classify patient risk levels for readmission or disease progression — accelerating model development while maintaining clinical interpretability.
Price Optimisation
Regression AutoML models predict optimal price points based on competitor data, seasonality, and demand signals.
Predictive Maintenance
Classify equipment failure likelihood from sensor data. AutoML evaluates tree-based and boosting models best suited for structured IoT data.
3. Business Advantages of Azure AutoML
| Advantage | What it means for your business |
|---|---|
| Speed to insight | Go from raw data to a ranked set of trained models in hours, not weeks |
| No ML expertise required | Business analysts and engineers can run experiments without data science backgrounds |
| Cost efficiency | Pay only for compute used during training; no full-time ML team overhead |
| Reproducibility | Every trial is logged, versioned, and auditable in Azure ML Studio |
| Responsible AI built-in | Feature importance, model explainability, and fairness metrics generated automatically |
| Enterprise-grade MLOps | Best model can be registered, deployed, and monitored directly from the same platform |
4. The Use Case: Bank Account Fraud Detection
For this walkthrough, we use the Bank Account Fraud Dataset (NeurIPS 2022) — a large-scale synthetic dataset with approximately one million transactions and 30 features including payment type, employment status, housing status, device OS, and more.
The target variable is fraud_bool — a binary label indicating whether a bank account application is fraudulent. We use AUC weighted as our primary metric, which accounts for class imbalance and measures the model’s ability to rank fraudulent cases above legitimate ones.
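A quick illustration of why a ranking metric like AUC beats raw accuracy on data this imbalanced — a toy sketch with synthetic labels and scores, not the actual dataset:

```python
# Why accuracy misleads at a ~1% fraud rate, while AUC measures ranking quality.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)   # ~1% fraud labels
# A model that tends to score fraud higher than non-fraud
y_score = rng.random(10_000) + 0.3 * y_true

# Predicting "nothing is fraud" is ~99% accurate — and completely useless
always_legit = np.zeros_like(y_true)
print(f"Accuracy of 'never fraud': {accuracy_score(y_true, always_legit):.3f}")
print(f"AUC of the ranking model:  {roc_auc_score(y_true, y_score):.3f}")
```

AUC rewards placing fraudulent cases above legitimate ones regardless of how rare they are, which is exactly the behaviour a fraud model needs.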
5. Registering Data as an MLTable Asset
Azure AutoML requires data registered as an MLTable asset — not a raw file. An MLTable is a versioned, schema-aware data asset that tells Azure how to read, parse, and validate your data before training begins.
Step 1: Re-partition the Parquet file
Azure AutoML enforces a 20 MB row group size limit on Parquet files. Large datasets must be re-partitioned first:
```python
import os

import pyarrow.parquet as pq

table = pq.read_table("path/to/your/data.parquet")

os.makedirs("./train_mltable", exist_ok=True)
pq.write_table(
    table,
    "./train_mltable/train_data.parquet",
    row_group_size=50_000  # 50k rows per group keeps each group well under 20 MB here
)

print(f"Rows: {len(table)}")
print(f"Size: {os.path.getsize('./train_mltable/train_data.parquet') / 1e6:.1f} MB")
```

Step 2: Create the MLTable configuration file
```yaml
# ./train_mltable/MLTable
paths:
  - file: ./train_data.parquet
transformations:
  - read_parquet:
      include_path_column: false
```

Step 3: Register as a versioned data asset
```python
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>"
)

ml_client.data.create_or_update(
    Data(
        path="./train_mltable",
        type=AssetTypes.MLTABLE,
        name="fraud_train_mltable",
        version="1",
        description="Fraud detection training data"
    )
)
print("MLTable registered successfully")
```

📸 The registered fraud_train_mltable data asset in Azure ML Studio — Version 1, Type: Table.
6. Launching AutoML from Azure Studio (UI Walkthrough)
Azure Machine Learning Studio provides a guided, no-code interface for submitting AutoML jobs. Navigate to Jobs → + New job → Train automatically.
Step 1 — Training Method
Azure offers three training methods. Select Train automatically to launch the AutoML wizard — this submits a fully managed AutoML job without writing a single line of code.
📸 Step 1: Select ‘Train automatically’ to launch the AutoML wizard.
Step 2 — Basic Settings
Assign a meaningful job name and create a new experiment to group related runs. Experiments act as logical containers — all trials from this job will appear grouped under this experiment name in Studio.
📸 Step 2: Job named ‘fraud-automl-classification-UI’ under experiment ‘fraud-detection-automl’.
Step 3 — Task Type & Data
Select Classification as the task type, then choose your registered MLTable asset. Only MLTable-type assets appear — raw URI_FILE assets are filtered out as unsupported. This is why correct data registration matters.
📸 Step 3: Classification task selected with fraud_train_mltable showing as a supported Table asset.
Step 4 — Task Settings
Set the target column to fraud_bool. Key configuration options include:
- Primary metric: AUC weighted — optimal for imbalanced fraud data
- Validation: 5-fold cross validation — each model scored across 5 non-overlapping data splits
- Featurization: Auto — Azure handles encoding, imputation, and scaling automatically
📸 Step 4: Target column set to fraud_bool, with 5-fold cross validation and AUC weighted as primary metric.
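Azure's exact splitting logic is internal, but scikit-learn's `StratifiedKFold` shows the idea behind 5-fold cross validation on imbalanced data: every fold preserves the fraud/non-fraud ratio, so each of the 5 validation scores is computed against a representative sample (toy data below, not the wizard's internals):

```python
# Stratified 5-fold split: the rare class is spread evenly across folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 95 + [1] * 5)          # 5% positive class
X = np.arange(len(y)).reshape(-1, 1)      # dummy features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_frauds = []
for i, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    fold_frauds.append(int(y[val_idx].sum()))
    print(f"Fold {i}: {len(val_idx)} validation rows, {fold_frauds[-1]} fraud case(s)")
```

Without stratification, a random fold on 1%-fraud data could easily contain zero fraud cases, making its AUC score meaningless.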
Step 5 — Compute
Select your Azure ML compute cluster. We use compute1 — a Standard_DS3_v2 with 4 vCPUs, 14 GB RAM at $0.27/hr.
📸 Step 5: compute1 cluster selected — Standard_DS3_v2, 4 vCPUs, 14GB RAM, $0.27/hr.
Step 6 — Review & Submit
The review screen summarises all configuration before submission. Verify task type, data asset, target column, validation strategy, and compute before clicking Submit training job.
📸 Step 6: Final review showing all settings — Classification, fraud_train_mltable, fraud_bool target, 5-fold CV, compute1.
7. The Job in Action
Once submitted, the job overview page shows real-time status. Azure begins by provisioning compute, then starts iterating through algorithm and preprocessing combinations. The Tags panel live-updates with each completed trial’s algorithm, score, and preprocessor.
📸 Live job overview showing Status: Running, Primary metric: AUC weighted, Featurization: Auto, with real-time trial tags.
8. Submitting an AutoML Job via Python SDK
For teams that prefer code-first workflows or need to integrate AutoML into CI/CD pipelines, the Azure ML Python SDK offers full programmatic control:
```python
from azure.ai.ml import Input, MLClient
from azure.ai.ml.automl import classification
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# 1. Connect to the workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>"
)

# 2. Reference the registered MLTable
training_data = Input(
    type=AssetTypes.MLTABLE,
    path="azureml:fraud_train_mltable:1"
)

# 3. Configure the AutoML classification job
automl_job = classification(
    compute="compute1",
    experiment_name="fraud-detection-automl",
    training_data=training_data,
    target_column_name="fraud_bool",
    primary_metric="AUC_weighted",
    n_cross_validations=5,
    enable_model_explainability=True,
)

# 4. Set limits to control cost and runtime
automl_job.set_limits(
    timeout_minutes=120,
    trial_timeout_minutes=15,
    max_trials=20,
    max_concurrent_trials=1,
    enable_early_termination=True
)

# 5. Submit
returned_job = ml_client.jobs.create_or_update(automl_job)
print(f"Job submitted: {returned_job.name}")
print(f"Studio URL: {returned_job.studio_url}")
```

Polling Job Status
```python
import time

job_name = returned_job.name

# ml_client.jobs.stream(job_name) blocks until completion and tails the logs;
# the loop below is a lighter-weight alternative for periodic status checks.
while True:
    job = ml_client.jobs.get(job_name)
    print(f"Status: {job.status}")
    if job.status in ["Completed", "Failed", "Canceled"]:
        break
    time.sleep(30)
```

Retrieving & Registering the Best Model
```python
import mlflow
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model

# The best trial's run ID is recorded as an MLflow tag on the parent job
# (requires the mlflow and azureml-mlflow packages)
mlflow.set_tracking_uri(
    ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
)
parent_run = mlflow.tracking.MlflowClient().get_run(job_name)
best_child_run_id = parent_run.data.tags["automl_best_child_run_id"]

# Register the best model to the Azure ML registry
registered_model = ml_client.models.create_or_update(
    Model(
        path=f"azureml://jobs/{best_child_run_id}/outputs/artifacts/paths/outputs/mlflow-model/",
        name="fraud-detection-model",
        description="Best AutoML model — fraud detection experiment",
        type=AssetTypes.MLFLOW_MODEL
    )
)
print(f"Registered: {registered_model.name} v{registered_model.version}")
```

9. Interpreting the Results
Once trials complete, the Models + child jobs tab presents a ranked leaderboard of all trained models. Each row shows algorithm, preprocessing scaler, AUC score, training duration, and key hyperparameters.
📸 Models + child jobs tab — 6 trained models ranked by AUC weighted, with algorithm names, durations, and hyperparameters.
In our experiment, the leaderboard was topped by a MaxAbsScaler + LightGBM pipeline at an AUC weighted of 0.893, with the remaining five trials ranked below it.
Best Model Summary
Clicking the top model reveals a full summary — algorithm pipeline, AUC score, sampling percentage, and deployment status. From here you can deploy, download, or explain the model in one click.
📸 Best model detail — MaxAbsScaler + LightGBM, AUC weighted: 0.89301, 100% sampling, ready to deploy or register.
Metrics Dashboard
The metrics tab provides over 15 performance indicators alongside six diagnostic charts: ROC curve, Precision-Recall curve, Calibration curve, Confusion Matrix, Cumulative Gains, and Lift curve.
📸 Metrics dashboard for best model — ROC, Precision-Recall, Calibration, Lift curves, Confusion Matrix, and 15+ metric tiles.
| Metric | Value | What it means |
|---|---|---|
| Accuracy | 0.9888 | Overall correct predictions — misleading for imbalanced data |
| AUC weighted | 0.8930 | Primary metric — model’s ability to rank fraud above non-fraud |
| AUC micro | 0.9975 | Micro-averaged across classes |
| Precision (weighted) | 0.9832 | Of predicted fraud cases, how many are real fraud |
| Recall (macro) | 0.5235 | Of actual fraud cases, how many did the model catch |
| Log loss | 0.0458 | Prediction confidence — lower is better |
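The gap between accuracy (0.9888) and macro recall (0.5235) is the classic imbalance signature: a model can be "almost always right" while still missing half the fraud. A toy confusion matrix makes the effect easy to reproduce (hypothetical numbers, not the experiment's actual predictions):

```python
# High accuracy, mediocre macro recall: the imbalanced-data trap in miniature.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

y_true = np.array([0] * 980 + [1] * 20)
# A model that catches only half of the 20 fraud cases
y_pred = np.concatenate([np.zeros(980, dtype=int),
                         np.array([1] * 10 + [0] * 10)])

print(confusion_matrix(y_true, y_pred))
print(f"Accuracy:       {accuracy_score(y_true, y_pred):.3f}")
print(f"Recall (macro): {recall_score(y_true, y_pred, average='macro'):.3f}")
```

This is why the leaderboard is ranked by AUC weighted rather than accuracy, and why recall deserves a close look before deployment.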
Outputs & Artifacts
Azure AutoML automatically saves a rich set of artifacts for the best model under the Outputs + logs tab — everything needed to reproduce, deploy, or extend the model.
📸 Outputs + logs — model.pkl, mlflow-model/, scoring scripts, featurization_summary.json, pipeline_graph.json, and more.
| Artifact | Purpose |
|---|---|
| model.pkl | Serialised trained model — ready for inference |
| mlflow-model/ | MLflow-packaged model for standardised deployment |
| scoring_file_v1_0_0.py | Auto-generated inference script |
| featurization_summary.json | Transformations applied to each feature |
| conda_env_v1_0_0.yml | Exact environment spec for reproducible inference |
| pipeline_graph.json | Full preprocessing + model pipeline definition |
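Downstream consumption of model.pkl follows the usual pickle round trip. Here is a runnable sketch with a stand-in scikit-learn model; the real artifact must be unpickled inside the environment pinned by conda_env_v1_0_0.yml, and the auto-generated scoring script handles that for you:

```python
# The model.pkl round trip: AutoML writes it, your batch scorer reads it.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
stand_in = LogisticRegression(max_iter=1000).fit(X, y)

with open("model.pkl", "wb") as f:      # what AutoML persists for the best model
    pickle.dump(stand_in, f)

with open("model.pkl", "rb") as f:      # what a scoring script does at inference
    model = pickle.load(f)

probs = model.predict_proba(X[:5])[:, 1]  # per-row fraud probabilities
print(probs.round(3))
```

For production deployment, the mlflow-model/ directory is usually the better entry point, since MLflow packaging carries the environment spec and input signature alongside the weights.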
10. Conclusion
Azure AutoML eliminates the most time-consuming parts of the ML lifecycle — algorithm selection, hyperparameter tuning, and cross-validation — without sacrificing transparency or control. In our fraud detection experiment, six production-quality models were trained, evaluated, and ranked in under 20 minutes of wall-clock time, with the best model achieving an AUC of 0.893 on a highly imbalanced dataset.
For businesses looking to operationalise machine learning without building large data science teams, Azure AutoML offers a compelling combination of speed, governance, and enterprise integration. The full audit trail, versioned artifacts, and built-in explainability make it suitable for regulated industries including banking, insurance, and healthcare.