Database Migration From Microsoft Azure to Snowflake on AWS: Part 2

By: Amol Gawas on May 3, 2022

In part one of this blog series, we looked at our approach to the migration: leveraging Azure Data Factory, tuning the performance of the Data Factory pipeline, using Parquet files and increasing compute power.

In this blog, we’ll look at some of the challenges encountered during the migration, cost-saving actions and our approach to data validation.

Challenges Encountered 

  • Data Factory export failed for tables whose column names contained spaces or special characters. In such cases, aliases were used for the column names (replacing spaces with underscores) and the query method was used as the source.
  • Data Factory export failed for the “geography” column data type; such columns had to be cast to varchar and then exported to Blob storage.
  • Some column names in SQL Server were keywords in Snowflake, which blocked the migration. For example, LOCALTIME is a column name in the SQL Server database but also a keyword in Snowflake; this was addressed by enclosing the name in double quotes ("LOCALTIME") while creating the Snowflake table (see the sketch after this list).
  • During ingestion into Snowflake, the flat files were named after their tables, and files are listed by name pattern. For example, when listing the file for the table test_1, Snowflake would return every file whose name starts with test_1. It is therefore essential to make sure that each file name is unique.
  • If the wrong file was ingested with respect to the table, Snowflake did not return any error message or warning. It simply ingested the file and inserted NULL wherever a column mapping was not found.
  • While creating the tables, if a column name had spaces, escape characters were used so that the spaces could be preserved. For the ingestion of such columns, however, the column mapping from the flat file used the same aliases as the source query.
  • Triggers and computed columns are not supported in Snowflake and had to be eliminated while creating the tables.
  • Data types that did not exist in Snowflake had to be changed, keeping in mind the correctness of the data.
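
The sketch below pulls several of these workarounds together. It is a minimal, hypothetical example: the table, column and stage names (orders, Ship Address, GeoLocation, @azure_stage) are illustrative and not from the original migration.

-- SQL Server side: source query used for the Data Factory export.
-- An alias removes the space from the column name, and the geography
-- column is rendered as text (WKT) so it can be exported.
SELECT order_id,
       [Ship Address] AS Ship_Address,
       GeoLocation.STAsText() AS GeoLocation_WKT
FROM dbo.orders;

-- Snowflake side: reserved words such as LOCALTIME must be double-quoted.
CREATE TABLE orders (
    order_id        NUMBER,
    Ship_Address    VARCHAR,
    GeoLocation_WKT VARCHAR,
    "LOCALTIME"     TIMESTAMP_NTZ  -- quoted: LOCALTIME is a Snowflake keyword
);

-- Ingestion: name files explicitly so that a prefix such as test_1
-- cannot also match test_10, test_11 and so on.
COPY INTO orders
FROM @azure_stage
FILES = ('orders.parquet')
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;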

Cost-Saving Actions

  • Use of Parquet files facilitated a compression rate of up to 75%: the 800GB of data was compressed to 234GB across multiple files, which reduced Blob storage costs.
  • A detailed POC and analysis found that Snowflake ingestion was optimal with a Small warehouse for moderate-sized tables and a Medium warehouse for large tables, which kept Snowflake compute costs in check (see the sketch after this list).
  • A self-hosted integration runtime (IR) saved on data pipeline operating costs, as the compute power of an existing Azure VM was used.
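
A minimal sketch of the warehouse-sizing pattern, assuming a dedicated loading warehouse (the name load_wh and the table names are hypothetical):

-- Moderate-sized tables: ingest on a Small warehouse.
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'SMALL';
COPY INTO moderate_table FROM @azure_stage FILE_FORMAT = (TYPE = PARQUET);

-- Large tables: resize up to Medium only for the duration of the load.
ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'MEDIUM';
COPY INTO large_table FROM @azure_stage FILE_FORMAT = (TYPE = PARQUET);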

Cost Estimation

[The cost estimate appeared as an image in the original post and is not reproduced here.]

Data Validation

A vital part of the project was ensuring that the integrity and correctness of the data were maintained during the migration. As part of standard DataOps best practices, a thorough data validation exercise was performed to make sure that the data stayed meaningful and remained usable for driving business decisions.

The approach to data validation was two-fold:

In the first approach, all the distinct data types in the SQL Server database that were part of the migration were listed, and one row per data type was randomly sampled. The same row was then fetched from Snowflake and matched against the source.

List of tables and columns for a given data type:

SELECT table_catalog, table_schema, table_name, column_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'money' AND TABLE_NAME NOT LIKE 'sys%'
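
On the Snowflake side, a hedged companion query can locate the corresponding columns. It assumes that SQL Server money columns landed as a fixed-point NUMBER type after migration:

SELECT table_catalog, table_schema, table_name, column_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE data_type = 'NUMBER'   -- SQL Server 'money' typically maps to NUMBER
  AND table_name NOT ILIKE 'sys%';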

Issues uncovered in this validation process were corrected for all instances of the affected data type.

The second approach involved writing an automated script that accomplished the following:

  • The script touched each table of each database
  • A set of five records was randomly sampled from both sides and matched (a sampling sketch follows the report below)
  • If a table was empty, it was recorded as a table with zero records
  • If the sample sets did not match, the script wrangled the sets, sorting them one column at a time; if the samples then matched, the result was recorded as a pass. This wrangling took care of the scenario wherein the first n columns held duplicate records
  • Any failed validation attempt would be visible in a generated log report:

Validation Started for Table: Test1
----> Validation Passed for table Test1
Validation Started for Table: Test2
----> Validation Passed for table Test2
Validation Started for Table: Test3
----> Table has 0 records
Validation Started for Table: Test4
----> Validation Failed for table Test4
Validation Started for Table: Test5
----> Validation Passed for table Test5

============= Report =============
# Total Number of Tables Scanned: 5
# Tables with 0 records: 1
# Passed: 3
# Failed: 1
# Percentage 60.5556123
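
For reference, a minimal sketch of how five random rows might be sampled on each side; the original script's exact queries were not published, and the table name orders is hypothetical:

-- SQL Server: five random rows.
SELECT TOP 5 * FROM dbo.orders ORDER BY NEWID();

-- Snowflake: five random rows.
SELECT * FROM orders SAMPLE (5 ROWS);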
