- The Mercado Libre marketplace is a valuable, up-to-date source of data on car sale prices and vehicle features.
- A web scraping script collected each item’s sale price, link, photos, and car features. The same procedure can be applied to other products, and the search can be restricted to a specific location (Buenos Aires, Argentina, for this project).
- Multiple data cleaning techniques and exploratory data analysis (EDA) were applied to identify trends in the raw data.
- A price prediction pipeline based on linear modeling was developed and reached satisfactory prediction metrics. A simple web app was built with Streamlit to provide an intuitive user interface.
The lack of comprehensive data and price uncertainty for new and used cars have become significant concerns in the current automotive market. Several factors, including rapid technological advancements, fluctuating market conditions, and changing consumer preferences, contribute to this issue. Additionally, inconsistencies in car valuation methods and limited access to historical pricing data further exacerbate the problem.
As a result, determining the appropriate pricing for new and used cars has become increasingly challenging. Sellers face the risk of undervaluing their vehicles, leading to potential revenue loss, while buyers may be deterred by overpriced cars or feel dissatisfied after purchase. This situation calls for improved data collection, analysis, and sharing to establish fair and accurate car pricing, ultimately benefiting all stakeholders in the market.
A three-step solution approach was adopted:
- Data Collection through Web Scraping: Use web scraping tools to collect current market data on sale prices and car features from automotive marketplaces.
- Data Analysis: Identify patterns, correlations, and trends that can affect car prices. Remove inconsistencies and outliers, and handle missing values, so that the dataset is reliable enough for developing a predictive model.
- Machine Learning Model for Car Price Prediction: Build a pipeline with a machine learning model and a sample web app.
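The second and third steps can be illustrated with a minimal sketch using only the Python standard library. It is not the project's actual pipeline: the listings, the field names (`year`, `km`, `price`), the median imputation, the 1.5 × IQR outlier rule, and the single-feature least-squares fit are all assumptions chosen to keep the example small.

```python
from statistics import median, quantiles

# Hypothetical listings; field names and values are illustrative only.
listings = [
    {"year": 2015, "km": 90000, "price": 9000},
    {"year": 2018, "km": 60000, "price": 12000},
    {"year": 2020, "km": None, "price": 15000},    # missing mileage
    {"year": 2019, "km": 40000, "price": 14000},
    {"year": 2017, "km": 70000, "price": 900000},  # implausible price
    {"year": 2016, "km": 80000, "price": 10000},
]


def impute_median(rows, field):
    """Replace missing values of `field` with the median of the known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def drop_iqr_outliers(rows, field, k=1.5):
    """Keep rows whose `field` lies within k * IQR of the quartiles."""
    q1, _, q3 = quantiles([r[field] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[field] <= high]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Clean the data first, then fit price as a linear function of model year.
clean = drop_iqr_outliers(impute_median(listings, "km"), "price")
slope, intercept = fit_line(
    [r["year"] for r in clean], [r["price"] for r in clean]
)


def predict(year):
    """Predict a reference price from the model year alone."""
    return slope * year + intercept
```

A real model would use many more features (brand, model, mileage, fuel type, and so on) and a held-out test set to measure prediction quality. The sketch only shows the order of operations: impute, filter, fit, predict.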
This project was developed for a local car agency located in Argentina. The app provides an accurate reference price for new and used cars, which supports faster sales for the company by increasing the likelihood of attracting potential buyers. The insights provided by the model enable sellers to make informed decisions about vehicle offerings, promotions, or trade-in valuations.
Sale price data for new (0 km) and used cars, along with their features, were obtained from Mercado Libre (the most significant marketplace in South America). A dedicated web scraping script was developed to get the price and features of each published car (web scraping script can be accessed here). The script works in the following steps:
- The user enters the product to search for, the geographical location, and the country.
- The script runs the search on the marketplace and collects the results across multiple pages.
- The script follows each result link to get the item’s price, car features, and photos through HTML parsing.
- The features are organized into a dictionary.
- The collected data is exported to a CSV file.
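The last three steps (HTML parsing, building a dictionary, exporting to CSV) can be sketched as follows. This is not the project's script: the sample markup, the class names (`title`, `price`, `specs`), and the helpers `ItemParser`, `parse_item`, and `to_csv` are hypothetical, and the real marketplace pages use different markup that changes over time. Searching and pagination over the network are left out so the example runs offline.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical item page; real marketplace markup differs.
SAMPLE_ITEM_HTML = """
<h1 class="title">Example Car 1.6</h1>
<span class="price">15000</span>
<table class="specs">
  <tr><th>Year</th><td>2018</td></tr>
  <tr><th>Kilometers</th><td>60000</td></tr>
</table>
"""


class ItemParser(HTMLParser):
    """Collects the title, price, and spec-table rows of one item page."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._field = None      # field whose text is expected next
        self._spec_name = None  # last <th> text seen in the spec table

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("title", "price"):
            self._field = css_class
        elif tag in ("th", "td"):
            self._field = tag

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "th":
            self._spec_name = text
        elif self._field == "td":
            self.item[self._spec_name] = text
        else:
            self.item[self._field] = text
        self._field = None


def parse_item(html):
    """Return the features of one item page as a dictionary."""
    parser = ItemParser()
    parser.feed(html)
    return parser.item


def to_csv(items):
    """Serialize a list of feature dictionaries to CSV text."""
    fieldnames = sorted({key for item in items for key in item})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


item = parse_item(SAMPLE_ITEM_HTML)
csv_text = to_csv([item])
```

Collecting the column names from all items before writing means that a listing with a missing feature still produces a valid row, with an empty cell for that column.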
In this sample, the data obtained is geographically limited to the city of Buenos Aires (the capital of Argentina). However, the methodology presented in this work can be applied to other cities or countries.