ADR-001: Adopt a Database-First Architecture Using PostgreSQL

Status: Accepted
Date: 2026-03-06

Context

The early exploration phase of the project relied on CSV files generated by scraping scripts. This approach allowed quick experimentation and made it easy to demonstrate to stakeholders that structured data could be extracted from regatta result pages.

However, as the project evolved from experimentation toward a structured data platform, several limitations of a file-based workflow became clear:

  • Difficult traceability between raw data and processed data
  • Increasing complexity managing multiple CSV files
  • Limited ability to enforce relational structure and constraints
  • Poor scalability as the dataset grows
  • Difficult integration with APIs and future web applications

The project required a more robust and scalable system of record capable of supporting ingestion pipelines, entity normalisation, and application-layer access.

Decision

The system adopts a database-first architecture, using PostgreSQL as the central system of record.

In this model:

  • All raw ingestion data is stored in the database
  • Normalised canonical entities are stored in relational tables
  • CSV files are used only as temporary artefacts for data review or mapping workflows
  • All application logic and APIs operate on top of the database
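The raw-versus-canonical split above can be sketched as two tables plus a mapping step. This is a minimal illustration, not the project's actual schema: all table and column names (`raw_results`, `race_results`, `payload`, and so on) are assumptions, and the sail-racing fields are invented for the example.

```python
# Hypothetical two-layer schema: raw ingestion rows land unmodified as JSONB,
# and canonical entities are derived from them into a relational table that
# keeps a foreign key back to its source row for traceability.
RAW_DDL = """
CREATE TABLE IF NOT EXISTS raw_results (
    id          BIGSERIAL PRIMARY KEY,
    source_url  TEXT NOT NULL,
    scraped_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    payload     JSONB NOT NULL          -- untouched scraper output
);
"""

CANONICAL_DDL = """
CREATE TABLE IF NOT EXISTS race_results (
    id          BIGSERIAL PRIMARY KEY,
    raw_id      BIGINT NOT NULL REFERENCES raw_results(id),  -- traceability link
    regatta     TEXT NOT NULL,
    sail_number TEXT NOT NULL,
    position    INTEGER NOT NULL CHECK (position > 0)
);
"""

def to_canonical(raw_id: int, payload: dict) -> dict:
    """Map one raw payload to a canonical row, preserving the raw_id link."""
    return {
        "raw_id": raw_id,
        "regatta": payload["regatta"].strip(),
        "sail_number": payload["sail_number"].upper(),
        "position": int(payload["position"]),
    }
```

The point of the `raw_id` column is that every canonical row can be traced back to the exact scraped payload it came from, which is the traceability property a pile of CSV files cannot enforce.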

PostgreSQL was selected because:

  • It is a mature and widely used relational database
  • It supports complex relational modelling
  • It provides excellent JSON support for raw ingestion data
  • It integrates easily with modern Python frameworks such as FastAPI
  • It supports future scalability and deployment options
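The JSON support mentioned above is what keeps the raw layer queryable before any normalisation happens. The sketch below shows the pattern using PostgreSQL's `->>` operator and `jsonb_exists()` function; the table and column names are the same illustrative ones as above, and a real run would need a live connection via a driver such as psycopg.

```python
# Illustrative query pattern: filter raw JSONB payloads by the presence of a
# key and project its value, without the row ever having been normalised.
# Table/column names are assumptions carried over from the sketch schema.
FIND_BY_KEY = (
    "SELECT id, payload ->> %(key)s AS value "
    "FROM raw_results "
    "WHERE jsonb_exists(payload, %(key)s)"
)

def fetch_raw_values(conn, key: str) -> list:
    """Return (id, value) pairs for raw rows whose payload contains `key`."""
    with conn.cursor() as cur:
        cur.execute(FIND_BY_KEY, {"key": key})
        return cur.fetchall()
```

Passing the key as a bound parameter (rather than interpolating it into the SQL string) keeps the query safe against injection from scraper-derived input.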

Consequences

Adopting a database-first architecture provides several benefits:

  • Clear separation between raw data and canonical data
  • Improved traceability and reproducibility of transformations
  • Strong relational structure for modelling entities and relationships
  • Simplified integration with APIs and future applications
  • Better scalability as the dataset grows

However, it also introduces some trade-offs:

  • Increased initial complexity compared to simple CSV workflows
  • Need for database schema design and migrations
  • Additional infrastructure requirements for deployment

Overall, the database-centric architecture provides a solid foundation for building a scalable data platform and supports the long-term evolution of the project.
