Database Migration From Microsoft Azure to Snowflake on AWS: Part 2

By: Amol Gawas on May 3, 2022

In part one of this blog series, we looked at our approach to the migration: leveraging Azure Data Factory, tuning the performance of the Data Factory pipeline, using Parquet files and increasing computing power.

In this blog, we’ll look at some of the challenges encountered during the migration, cost-saving actions and our approach to data validation.

Challenges Encountered 

  • Data Factory export failed for tables whose column names contained spaces or special characters. In such cases, aliases were used for the column names (replacing spaces with underscores) and a query, rather than the table, was used as the source.
  • Data Factory export failed for columns of the "geography" data type; these had to be converted to varchar before being exported to Blob storage.
  • Some column names in SQL Server were reserved keywords in Snowflake, which prevented those tables from migrating as-is. For example, 'LOCALTIME' was a column name in SQL Server but is a keyword in Snowflake; this was handled by enclosing the identifier in double quotes ("") when creating the Snowflake table (see the sketch after this list).
  • During ingestion into Snowflake, the flat files were named after their tables and files were listed by name pattern. For example, when listing files for the table test_1, Snowflake returned every file whose name started with test_1, so it is essential to make sure that each file name is unique.
  • If the wrong file was ingested into a table, Snowflake did not return any error messages or warnings. It simply ingested the file and put NULL wherever a column mapping was not found.
  • While creating the tables, if a column name contained spaces, escape characters were used so that the spaces could be preserved. For ingestion into such columns, however, the column mapping from the flat file had to use the same aliases that were used in the source query.
  • Triggers and computed columns are not supported in Snowflake and had to be eliminated while creating the tables.
  • Data types that do not exist in Snowflake had to be mapped to equivalent types while preserving the correctness of the data.
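
To make a few of these workarounds concrete, here is a minimal sketch of the pattern described above. The table and column names (dbo.StoreLocations, Store Name, Shape) and the target types are hypothetical placeholders; only the LOCALTIME example comes from the actual migration.

-- Hypothetical Data Factory source query on the SQL Server side:
-- alias a column whose name contains a space and convert a geography
-- column to text before exporting to Blob storage.
SELECT
    [Store Name] AS Store_Name,
    [Shape].STAsText() AS Shape_WKT,  -- well-known text representation
    [LOCALTIME]
FROM dbo.StoreLocations;

-- Hypothetical Snowflake DDL: the reserved keyword used as a column name
-- is enclosed in double quotes; the types shown are illustrative only.
CREATE TABLE STORELOCATIONS (
    STORE_NAME   VARCHAR,
    SHAPE_WKT    VARCHAR,
    "LOCALTIME"  TIMESTAMP_NTZ
);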

Cost-Saving Actions

  • Use of Parquet files enabled compression rates of up to 75%: overall, 800GB of data was compressed to 234GB across multiple files, which reduced the Blob storage cost.
  • A detailed POC and analysis showed that Snowflake ingestion was optimal with a Small warehouse for moderately sized tables and a Medium warehouse for large tables, which kept Snowflake costs in check (see the sketch after this list).
  • A self-hosted integration runtime (IR) saved on the data pipeline operating cost, since the compute power of the Azure VM was used.
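
As an illustration of the warehouse sizing above, the following sketch shows how a load warehouse might be resized around a large-table load and how that table's Parquet files could be copied in from an external stage. The warehouse, stage, table and file-name pattern (LOAD_WH, AZURE_BLOB_STAGE, SALES_HISTORY) are hypothetical, not taken from the actual project.

-- Scale the load warehouse up before ingesting a large table.
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'MEDIUM';

-- Load the table's Parquet files from the external stage, matching
-- Parquet field names to table columns and selecting files by pattern.
COPY INTO SALES_HISTORY
FROM @AZURE_BLOB_STAGE/sales_history/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
PATTERN = '.*sales_history_.*[.]parquet';

-- Scale back down once the load completes to avoid paying for idle capacity.
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'SMALL';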

Cost Estimation

[The cost estimation figures appeared as a table in the original post and are not reproduced here.]

Data Validation

A vital part of the project was ensuring that the integrity and correctness of the data were maintained during the migration. As part of standard DataOps best practices, a thorough data validation exercise was performed to make sure the data stayed meaningful and remained useful for driving business decisions.

The approach to data validation was two-fold:

In the first approach, all the distinct data types in the SQL Server databases that were part of the migration were listed, and one row per data type was sampled at random. The same row was then fetched from Snowflake and matched.

List of tables based on data type

-- List every column of type 'money' in the current database, excluding system tables
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'money' AND TABLE_NAME NOT LIKE 'sys%';
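
The matching step itself is not shown in the post; the following is a minimal sketch of what it could look like for one money-typed column, assuming a hypothetical table dbo.Invoices with key column InvoiceID and amount column TotalDue.

-- SQL Server side: sample one row for the data type under test.
SELECT TOP (1) InvoiceID, TotalDue
FROM dbo.Invoices
ORDER BY NEWID();

-- Snowflake side: fetch the same row by its key and compare the values.
SELECT INVOICEID, TOTALDUE
FROM INVOICES
WHERE INVOICEID = 12345;  -- key value returned by the query above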

Issues uncovered in this validation process were corrected for all instances of the affected data type.

The second approach involved writing an automated script that performed the following checks (more details can be found here):

  • The script touched each table of each database.
  • A set of five records was randomly sampled from both sides and matched (a sketch of the kind of per-table queries involved follows the sample log report below).
  • If a table was empty, it was recorded as a table with zero records.
  • If the sample sets did not match, the script wrangled the sets by sorting them one column at a time and compared again; if the samples then matched, the table was recorded as passed. This wrangling handled the scenario in which the first n columns contained duplicate values.
  • Any failed validation attempt would be visible in the generated log report:

Validation Started for Table: Test1
----> Validation Passed for table Test1
Validation Started for Table: Test2
----> Validation Passed for table Test2
Validation Started for Table: Test3
----> Table has 0 records
Validation Started for Table: Test4
----> Validation Failed for table Test4
Validation Started for Table: Test5
----> Validation Passed for table Test5

============= Report =============
# Total Number of Tables Scanned: 5
# Tables with 0 records: 1
# Passed: 3
# Failed: 1
# Percentage 60.5556123
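
The post does not include the validation script itself, so the following is only a rough sketch of the kind of per-table queries it might issue on each side before comparing the result sets in code. The table dbo.Employee and key column EmployeeID are hypothetical, and the sample here is taken deterministically by key (the real script sampled at random) so that the two sides return comparable rows.

-- SQL Server side: pull a small sample of rows to compare.
SELECT TOP (5) *
FROM dbo.Employee
ORDER BY EmployeeID;

-- Snowflake side: pull the corresponding sample for the same table.
SELECT *
FROM EMPLOYEE
ORDER BY EMPLOYEE_ID
LIMIT 5;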
