- Mileage is the most important feature in the UK car market, followed by the year of production (F-score).
- Multiple data processing methods were applied: cardinality reduction, encoding, quantile processing, and min-max scaling.
- Mutual information (MI) scores and Pearson correlation coefficients differ.
- XGB Regressor shows the best metrics among the models after Bayesian optimization. The ensemble model was then retrained, reaching the best indicators: R2 = 0.449 and RMSE = 0.214.
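The difference between MI scores and Pearson coefficients can be illustrated on synthetic data. The sketch below is not the study's analysis: the histogram-based estimator, the function names, the bin count, and the generated data are all assumptions chosen for illustration. It shows that a purely quadratic relationship has a Pearson coefficient near zero while its mutual information stays clearly positive.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def binned_mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()             # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = p_xy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))


# Synthetic data (assumption): one linear and one quadratic dependence on x.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
y_linear = 2.0 * x + rng.normal(0.0, 0.1, size=x.size)
y_quadratic = x**2 + rng.normal(0.0, 0.05, size=x.size)

for name, y in [("linear", y_linear), ("quadratic", y_quadratic)]:
    print(f"{name}: Pearson = {pearson(x, y):.3f}, "
          f"MI = {binned_mutual_information(x, y):.3f} nats")
```

Pearson correlation only measures linear association, whereas mutual information responds to any statistical dependence, so the two rankings of features need not agree.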
The United Kingdom’s automotive market has been experiencing fluctuations in car prices due to various factors such as economic conditions, government policies, and technological advancements. Accurate car price prediction is essential for buyers, sellers, manufacturers, and other stakeholders in the industry to make informed decisions. This study aims to develop a data-driven model that can accurately predict car prices in the UK automotive market, taking into account variables such as make, model, age, mileage, fuel type, transmission type, and other relevant factors.
To predict car prices in the UK automotive market, we propose the following approach, which uses Bayesian optimization to tune machine learning models:
- Data Collection and Preprocessing: Missing-value handling, data type transformation, cardinality reduction, label encoding, outlier handling (quantile processing).
- Feature Engineering: Feature scaling, heatmap correlation, mutual information analysis.
- Model Selection: XGB Regressor, LGBM Regressor, Gradient Boosting Regressor, Ada Boost Regressor, K Neighbors Regressor, Random Forest Regressor, Bayesian Ridge, Ridge, Lasso, Elastic Net, and Support Vector Regression.
- Hyperparameter Tuning with Bayesian Optimization.
- Model Training and Validation.
- Model Evaluation.
- Model Deployment into a web app.
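The hyperparameter tuning step can be sketched as a small Bayesian optimization loop. This is a minimal illustration under stated assumptions, not the study's implementation. It tunes the regularization strength of a closed-form ridge regression on synthetic data instead of the boosted models listed above, and it uses a hand-written Gaussian process surrogate with an expected-improvement acquisition function. All function names, the kernel length scale, and the search range are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (assumption: a stand-in, not the car dataset).
X = rng.normal(size=(200, 30))
y = X @ rng.normal(size=30) + rng.normal(scale=5.0, size=200)
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]


def validation_rmse(log_alpha):
    """Fit closed-form ridge with alpha = 10**log_alpha; return validation RMSE."""
    A = X_train.T @ X_train + (10.0 ** log_alpha) * np.eye(X_train.shape[1])
    w = np.linalg.solve(A, X_train.T @ y_train)
    return float(np.sqrt(np.mean((X_val @ w - y_val) ** 2)))


def rbf(a, b, length=1.0):
    """RBF kernel matrix between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and std of a zero-mean, unit-variance GP surrogate."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the current best value (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, low, high, n_init=4, n_iter=12):
    """Minimize a 1-D objective: random start, then GP + EI proposals."""
    xs = [float(v) for v in rng.uniform(low, high, size=n_init)]
    ys = [objective(v) for v in xs]
    grid = np.linspace(low, high, 400)  # candidate points for the acquisition
    for _ in range(n_iter):
        x_obs, y_obs = np.array(xs), np.array(ys)
        y_std = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)  # standardize
        mean, std = gp_posterior(x_obs, y_std, grid)
        x_next = float(grid[np.argmax(expected_improvement(mean, std, y_std.min()))])
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best], xs, ys


best_log_alpha, best_rmse, tried_x, tried_y = bayes_opt(validation_rmse, -4.0, 4.0)
print(f"best alpha ~ 10^{best_log_alpha:.2f}, validation RMSE = {best_rmse:.3f}")
```

Each iteration fits the surrogate to the scores observed so far and evaluates the model only at the candidate with the highest expected improvement, which is what distinguishes this from grid or random search. In practice a dedicated optimization library would replace the hand-written surrogate.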
Several stakeholders would benefit from accurate and reliable UK car price predictions. These stakeholders and their associated benefits include:
- Car Buyers: Accurate car price predictions empower buyers to make informed decisions when purchasing a vehicle. They can evaluate whether a car is fairly priced, avoiding overpaying or falling victim to fraudulent deals.
- Car Sellers (Individuals and Dealerships): Sellers benefit from a data-driven price estimation, ensuring they list their vehicles at competitive market prices. This helps them attract potential buyers and facilitates quicker sales transactions.
- Car Manufacturers: Accurate predictions enable manufacturers to optimize inventory levels and better anticipate future demand.
The data was obtained from Kaggle. It is divided into a training set and a test set, with 19,237 and 8,245 instances, respectively. The features include Levy, Manufacturer, Model, Prod. year, Category, Leather interior, Fuel type, Engine volume, Mileage, Cylinders, Gear box type, Drive wheels, Doors, Wheel, Color and Airbags.
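The preprocessing steps named in this study (cardinality reduction, label encoding, quantile processing, and min-max scaling) could look roughly like the following on three of these columns. This is a sketch under assumptions. The rows are invented toy values, the helper names and thresholds are hypothetical, and quantile processing is interpreted here as clipping to a quantile range, which the source does not specify.

```python
import pandas as pd

# Toy rows (assumption): only the column names come from the dataset description.
df = pd.DataFrame({
    "Manufacturer": ["TOYOTA", "TOYOTA", "BMW", "BMW", "FORD", "LADA", "TOYOTA", "SAAB"],
    "Mileage": [120000, 85000, 60000, 2000000, 150000, 98000, 40000, 110000],
    "Prod. year": [2010, 2014, 2016, 2008, 2005, 1999, 2018, 2003],
})


def reduce_cardinality(series, min_count=2, other="Other"):
    """Merge categories seen fewer than min_count times into a single label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other)


def clip_quantiles(series, lower=0.05, upper=0.95):
    """Clip values that fall outside the given quantile range."""
    return series.clip(series.quantile(lower), series.quantile(upper))


def min_max_scale(series):
    """Rescale values linearly to the interval [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())


# Cardinality reduction followed by label encoding of the categorical column.
df["Manufacturer"] = reduce_cardinality(df["Manufacturer"])
df["Manufacturer_code"] = df["Manufacturer"].astype("category").cat.codes

# Outlier clipping and scaling of the numeric columns.
df["Mileage"] = min_max_scale(clip_quantiles(df["Mileage"]))
df["Prod. year"] = min_max_scale(df["Prod. year"])

print(df)
```

In a real train/test setting, the rare-category list, quantile bounds, and min-max range would be computed on the training set only and then reused on the test set, so that no test information leaks into preprocessing.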