

Simple and accurate predictions for air pollution levels (PM2.5 particles) for the next several hours.
Other apps on the market are complicated and report the AQI (Air Quality Index), which is hard for users to interpret. For companies, it is also difficult to measure the AQI's inputs consistently and reliably.
Smoglens predicts only the concentration of tiny PM2.5 particles (particulate matter with a diameter of 2.5 micrometers or less). These particles are very harmful, but they can be filtered out with air purifiers indoors or respirators outdoors. The other pollutants that go into the AQI cannot be easily filtered out with commonly available air purifiers. We focus on predicting what people can actually act on.
We pulled more than 2 years of localized time series data from different APIs covering weather, traffic, air pollution, and other sources (the other sources are not yet used in the presented model).
The data was messy: sensors failed, measurements were missing, and APIs had bugs. To fill the gaps and aggregate readings reported at different coordinates, we built a spatial interpolation system on a hexagonal grid (H3 indexing).
For the predictions, we built an ensemble of several models (LightGBM, XGBoost, CatBoost) that vote on the PM2.5 prediction for the next 6 hours. The biggest impact came from feature selection and engineering: we engineered 69 features from the available data, covering PM2.5 lags, rolling statistics, weather features, traffic features, and others. This helped us build a usable model.
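To make the idea of lag and rolling features concrete, here is a minimal sketch in plain Python. It is not the Smoglens pipeline: the function names, the column names, the chosen lags, the window size, and the sample readings are all assumptions made for illustration. Missing readings are represented as `None`, and the rolling mean looks only at past hours so that no information from the current or future hours leaks into a feature.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

Reading = Optional[float]  # None marks a missing hourly measurement


def lag(series: Sequence[Reading], k: int) -> List[Reading]:
    """Value observed k hours earlier; None where there is no history yet."""
    return [series[i - k] if i >= k else None for i in range(len(series))]


def rolling_mean(series: Sequence[Reading], window: int) -> List[Reading]:
    """Mean of the previous `window` hours (current hour excluded), skipping gaps."""
    out: List[Reading] = []
    for i in range(len(series)):
        past = [v for v in series[max(0, i - window):i] if v is not None]
        out.append(mean(past) if past else None)
    return out


def build_features(
    pm25: Sequence[Reading],
    lags: Sequence[int] = (1, 2, 3),   # hypothetical choice of lags
    window: int = 3,                   # hypothetical rolling window
) -> List[Dict[str, Reading]]:
    """Return one feature row per hour, keyed by feature name."""
    columns = {f"pm25_lag_{k}": lag(pm25, k) for k in lags}
    columns[f"pm25_roll_mean_{window}"] = rolling_mean(pm25, window)
    return [
        {name: column[i] for name, column in columns.items()}
        for i in range(len(pm25))
    ]


# Made-up hourly PM2.5 readings with one sensor gap.
hourly_pm25 = [12.0, 15.0, None, 18.0, 21.0, 24.0]
rows = build_features(hourly_pm25)
```

Each row in `rows` can then be joined with weather and traffic features for the same hour before training. In a real pipeline the same computation would more likely be done with a dataframe library, and the gaps would first be filled by the spatial interpolation step described above.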
Demo day video
Tech stack









