Case study: SGN

Data Warehouse Migration to AWS

The Client

SGN (previously known as Scotia Gas Networks) is a UK gas distribution company which manages the network that distributes natural and green gas to 5.9 million homes and businesses across Scotland and the south of England.

The Challenge

SGN wanted to migrate their Data Warehouse to AWS. Large volumes of data were on a legacy on-prem database, metadata was stored within teams and as expert knowledge, reporting was time-consuming and laborious. There were thousands of lines of evolved PL/SQL code written for data wrangling. In addition, this was in the energy sector, so reporting was a safety and regulatory priority.

The Cloudwick Solution

Cloudwick proposed the solution to replace the legacy on-premise data warehouse with Amazon Redshift, a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. The solution is supported by AWS Data Pipeline to orchestrate the same transformations which were being done on-prem. The complex ETL processes are migrated to PySpark to use Spark SQL and make minimal changes to the original code. We chose that solution in order to maximize future maintainability, but we also future-proofed the work by choosing PySpark as our toolkit.

Cloudwick employed the modern CI/CD pipelines to reliably deploy the code. This allowed us to use test cases and verify that the procedures matched the original results and have confidence in the release management process. Cloudwick utilised GitLab as our source code management system, and used its CI/CD capabilities to automatically test and push passing master builds to AWS S3 for deployment.

We implemented the AWS Data Pipeline which was used to automatically provision EC2 instances with our latest required configuration, and run the latest deployed version of our PySpark code automatically. Having an automated process significantly reduced Cloudwick’s code-compile-debug cycle and resulted in a more refined end result. Database Migration Service was used to import the data into S3 from the legacy database. Infrastructure-as-Code allowed us to replicate our test environment precisely into the staging environment so our developers could work with dummy data and ensure that the deploy was reliable.

Finally, AWS CloudWatch along with AWS Simple Notification Service was used to inform the relevant data owners or responsible parties of successful data loads, or alert them to any problems.

Benefits

The database migration was done successfully, with a CI/CD process that allowed SGN to keep a strong foundation of continuous integration development process with a dependable release management process. The migration enabled SGN to perform reporting from a central location and making future scalability much less problematic.

The Next Steps

SGN is planning to ingest more data sources and perform transformations using AWS Glue, so Cloudwick is planning to have a more elaborate deploy procedure with a user acceptance phase using CodePipeline, while using CloudFormation to spin up development and test environments on demand, in order to take full advantage the CI/CD process.