Introduction
We have built a complete solution revolving around weather data from both connected weather stations and meteorological models. These data are either used directly or fed into agronomic models to highlight the need for irrigation or the potential spread of diseases.
Methodology
In this talk, we will present the different technological pieces that allow farmers to remotely access, through web and mobile applications, the weather conditions on their fields:
- An event-driven architecture based on Flink and Kafka for managing data sent by the stations, performing temporal aggregations, and storing the results in PostgreSQL databases and Delta Lake
- An API based on FastAPI for serving the data to the various components of the application
- A batch processing architecture based on Airflow and Kedro for monitoring the health of our 30,000 stations and processing data extractions from partners
- A compute engine based on Spark over Databricks for large-scale analysis of station data and assessment of the performance of weather forecast models
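To make the streaming step concrete, here is a minimal sketch of a temporal aggregation in plain Python. The production pipeline uses Flink windows over Kafka topics; the function name, the reading tuples, and the hourly window are illustrative assumptions, not the actual job.

```python
from collections import defaultdict

def tumbling_hourly_means(readings):
    """Group (station_id, epoch_seconds, temperature) readings into
    one-hour tumbling windows and average the temperature per window.
    Illustrates the kind of temporal aggregation a streaming job
    performs before the results are persisted. Names are hypothetical."""
    buckets = defaultdict(list)
    for station_id, ts, temp in readings:
        window_start = ts - ts % 3600  # align the timestamp to the hour
        buckets[(station_id, window_start)].append(temp)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

readings = [
    ("st-001", 0, 10.0),     # first hour
    ("st-001", 1800, 12.0),  # same hour
    ("st-001", 3600, 20.0),  # next hour
]
print(tumbling_hourly_means(readings))
# {('st-001', 0): 11.0, ('st-001', 3600): 20.0}
```

In a real Flink job the window assignment and the per-key state are handled by the framework, with watermarks to deal with late station messages; only the reduce logic resembles the sketch above.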
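The fleet-health monitoring can be pictured as a scheduled check over each station's last heartbeat. This is a hypothetical sketch of the kind of task an Airflow DAG might run, not the actual monitoring code; the six-hour threshold and all names are assumptions.

```python
def silent_stations(last_seen, now, max_silence_s=6 * 3600):
    """Return the ids of stations whose latest message is older than
    max_silence_s seconds -- the kind of fleet-health check a scheduled
    batch task can run over the whole station network.
    last_seen maps station id to the epoch time of its last message."""
    return sorted(sid for sid, ts in last_seen.items() if now - ts > max_silence_s)

last_seen = {"st-001": 100_000, "st-002": 50_000, "st-003": 99_000}
print(silent_stations(last_seen, now=100_000))
# ['st-002']
```

Run daily or hourly by the scheduler, such a check turns 30,000 heterogeneous stations into a short actionable list of devices needing maintenance.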
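Assessing forecast models against station measurements comes down to scoring paired series. As one basic example of such a score (the talk does not specify which metrics are used), here is a root-mean-square error over index-paired values:

```python
import math

def rmse(forecasts, observations):
    """Root-mean-square error between forecast values and station
    observations, paired by index -- one common score for comparing
    a forecast model against ground measurements."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must be paired")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Three forecast temperatures vs. what the station actually measured:
print(rmse([21.0, 19.0, 25.0], [20.0, 20.0, 23.0]))  # ~1.414 (sqrt(2))
```

At fleet scale this per-station computation is embarrassingly parallel, which is why a Spark cluster is a natural fit for evaluating forecasts against tens of thousands of stations at once.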
By the end of the talk, we hope the audience will have an overview of the decisions behind our technology choices and of how their combination provides useful information to our farmers.

Originality / perspective
Our system offers a consistent and resilient approach to handling large amounts of weather data coming from both connected stations and forecast providers. Thanks to a network of 30,000 stations, we have started long-term work on providing accurate, ultra-local weather data and forecasts with models specifically tailored to agricultural use cases. By leveraging distributed computing, we aim to extend our network with virtual stations based on dense measurements and to refine forecasts to account for sub-grid variations.
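One simple way a "virtual station" could be derived from dense surrounding measurements is inverse-distance weighting. This is an illustrative sketch under that assumption, not the method the system actually uses:

```python
def idw_estimate(neighbors, power=2):
    """Estimate a value at an unobserved point as the inverse-distance
    weighted mean of nearby station measurements -- one simple scheme
    for deriving a 'virtual station' from dense surrounding readings.
    neighbors: list of (distance_km, value) pairs with distance > 0."""
    weights = [(1.0 / d ** power, v) for d, v in neighbors]
    total = sum(w for w, _ in weights)
    return sum(w * v for w, v in weights) / total

# A virtual point 1 km from a 10.0 reading and 2 km from a 16.0 reading:
print(idw_estimate([(1.0, 10.0), (2.0, 16.0)]))  # 11.2
```

The closer station dominates the estimate, which matches the intuition that ultra-local conditions are best approximated by the nearest dense measurements.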