In part one of this blog series, we looked at our approach to migration by leveraging Azure Data Factory, tuning the performance of the Data Factory pipeline, using Parquet files, and increasing compute power.
In this blog, we'll look at some of the challenges encountered during the migration, the cost-saving actions we took, and our approach to data validation.
A vital part of the project was ensuring that the integrity and correctness of the data were maintained during the migration. As part of standard DataOps best practices, a thorough data validation exercise was performed to make sure that the data stayed meaningful and remained useful for driving business decisions.
The approach to data validation was two-fold:
In the first approach, all the distinct data types in the SQL Server databases in scope for migration were listed, and one row per data type was sampled at random. The same row was then fetched from Snowflake and matched against the source.
List of tables based on data type
select table_catalog, table_schema, table_name, column_name
from INFORMATION_SCHEMA.COLUMNS
where DATA_TYPE = 'money' and TABLE_NAME not like 'sys%'
Issues such as numeric-precision mismatches were uncovered in this validation process and were corrected for all instances of the affected data type.
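The row-matching step can be sketched as a small comparison helper. This is a minimal illustration, not the project's actual script: the function names and the numeric tolerance are assumptions, chosen because drivers may surface a SQL Server `money` value as `Decimal` while Snowflake returns a `float`.

```python
from decimal import Decimal

def values_match(src, dst, tol=Decimal("0.0001")):
    """Compare one value from SQL Server against its Snowflake counterpart.

    Numeric values are compared with a small tolerance, since the two
    drivers may return different numeric types (Decimal vs. float).
    Everything else falls back to a string comparison."""
    if src is None or dst is None:
        return src is None and dst is None
    if isinstance(src, (int, float, Decimal)) and isinstance(dst, (int, float, Decimal)):
        return abs(Decimal(str(src)) - Decimal(str(dst))) <= tol
    return str(src) == str(dst)

def rows_match(src_row, dst_row):
    """Field-by-field comparison of one sampled row from each system."""
    if len(src_row) != len(dst_row):
        return False
    return all(values_match(s, d) for s, d in zip(src_row, dst_row))
```

In practice, the sampled row would come from two cursors (e.g. `pyodbc` against SQL Server and the Snowflake Python connector), with `rows_match` applied to each pair.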
The second approach involved writing an automated script that produced output like the following (more details can be found here):
Validation Started for Table: Test1
----> Validation Passed for table Test1
Validation Started for Table: Test2
----> Validation Passed for table Test2
Validation Started for Table: Test3
----> Table has 0 records
Validation Started for Table: Test4
----> Validation Failed for table Test4
Validation Started for Table: Test5
----> Validation Passed for table Test5
============= Report =============
# Total Number of Tables Scanned: 5
# Tables with 0 records: 1
# Passed: 3
# Failed: 1
# Percentage 60.5556123
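The reporting loop behind output like the above can be sketched as follows. This is a simplified stand-in for the actual script: the function name, the inputs (per-table row counts for each system), and the use of row-count equality as the pass criterion are all assumptions made for illustration; a real run would obtain the counts through the SQL Server and Snowflake connectors.

```python
def validate_tables(src_counts, dst_counts):
    """Compare per-table row counts and print a pass/fail report.

    src_counts and dst_counts map table name -> row count in the
    source (SQL Server) and target (Snowflake) respectively."""
    passed = failed = empty = 0
    for table, src_n in src_counts.items():
        print(f"Validation Started for Table: {table}")
        if src_n == 0:
            print("----> Table has 0 records")
            empty += 1
        elif dst_counts.get(table) == src_n:
            print(f"----> Validation Passed for table {table}")
            passed += 1
        else:
            print(f"----> Validation Failed for table {table}")
            failed += 1
    total = len(src_counts)
    pct = 100.0 * passed / total if total else 0.0
    print("============= Report =============")
    print(f"# Total Number of Tables Scanned: {total}")
    print(f"# Tables with 0 records: {empty}")
    print(f"# Passed: {passed}")
    print(f"# Failed: {failed}")
    print(f"# Percentage {pct:.2f}")
    return passed, failed, empty
```

A fuller version would combine this row-count check with the per-row sampling from the first approach, so that both volume and content are verified.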