
Apache Hop Integrates DevOps and DataOps – Techstrong TV

By: Mike Vizard on August 26, 2022

Matt Casters, chief solution architect for Neo4j, explains how the open source Apache Hop project advances the integration of DevOps and DataOps. The video is below, followed by a transcript of the conversation.

Mike Vizard: Hey, guys, thanks for the throw. We’re here with Matt Casters, who is Chief Solution Architect at Neo4j, and we’re talking about a new open source project called Apache Hop that helps you move data around more easily. Matt, welcome to the show.


Matt Casters: Thank you very much for having me, Michael.

Vizard: So walk us through what exactly is Apache Hop, what’s the problem we’re trying to solve, and it seems like we’re trying to move more data than ever, and when I was a kid people told me moving data was not necessarily a great idea, but we’re doing it anyway. So what’s the challenge and what are we trying to solve?

Casters: Oh yeah, yeah. So that’s a great observation. I remember when everything was going to be the new mainframe, do everything in one box, right? And the opposite happened: we saw more and more services on-prem, in the cloud, and all mixed up. We see more and more specialized services, in the cloud and on-prem, graph databases, and the advent of containerization, virtual machines and containers. So it’s become a really complicated ecosystem out there, and I think a lot of organizations are struggling to find the expertise to service and cater to all those needs.

I think what we’re trying to do is make the life of the developer easier by wrapping user interfaces and easy-to-use tools around all these technologies, so that you get UML-like diagrams that you can work with and that can be easily understood not only by the experts but also by people with a moderate level of education on the subject, right? For example, if you’re working with a Kafka queue you might need to know how it works in general, but you don’t need to dive into the Java or Python APIs to get the best out of it. I mean, that’s kind of it.

So lower the maintenance costs. And this comes down to the DevOps side of things. And while we try to make the life of the developer easy, we position ourselves exclusively on the side of the organization. We’re not here just to make life fun for the developer; we really try to make life better for the company or the organization that uses the software.

So with that in mind you come to the conclusion that, okay, we also need to focus on configuration management, life cycle management, version control, unit testing, security, the investment that companies put into often many years of data orchestration, data integration and ETL work, right? So that is what I think Hop has been trying to focus on for the last couple of years: making sure that all these technologies are standard in the tool and easy to use.

Vizard: Do you think that we’re reducing the need for specialists, ’cause back in the day you had to have somebody who knew how to program with ETL and that capability and now with this we’ve got more of a visual tool, and maybe some people still prefer CLIs and APIs, but this is designed for developers to just easily handle this task themselves without any outside intervention, right?

Casters: Yeah, that’s the goal. We fully acknowledge that it’s like a 95/5-percent kind of deal, right? You’re never going to do the last 5 percent without some scripting or coding. But solving the 95 percent is a big deal; it means that all of a sudden you open up possibilities to people we would never have expected to use Apache Hop. Right? I’ve seen salespeople use it to extract the sales numbers from Salesforce. I would’ve never expected that in the past. It’s like, “Oh, they can figure it out.” “Oh, I can just enter the details: username, password.” “Oh, I can read this,” and before you know it people start to use it.

Vizard: Citizen developers are here with us today. One of the things about Hop is that it’s lightweight and I don’t think people appreciate just how much heavy lifting went into ETL historically. So why does it matter that this thing is lightweight and metadata-driven and what’s the impact? 

Casters: Well, at some point we saw the advent of lambdas, you know, microservices. But also, in its previous life, Kettle was used for things like tracking trains, in more IoT-like settings. Call these edge devices, maybe Raspberry Pis or very tiny devices. And there’s a real use case out there for a device that doesn’t have to do a lot; it’s just trimmed down to the bare essentials for what it needs to do. Maybe it would measure temperature and air humidity and pressure and then send that off every 30 seconds or something, using some REST call.

And then this whole Hop library trims down to something like 30 to 40 megabytes, starts up fast, and uses next to no memory. The same applies if, say, you run it on an AWS Lambda or in a small service. And it also gives us the ability to create new packages, right? You can not only remove functionality, but you can add functionality quite easily as well. And that is really important. We’ve also already seen people implement machine learning libraries or Python integration as plug-ins for Hop. So this will be our next goal: to create a marketplace around that plug-in kind of architecture.
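To make the edge scenario concrete, here is a minimal sketch in Python of the kind of trimmed-down sensor loop Casters describes: read a few values and post them over a REST call every 30 seconds. The endpoint URL, payload fields and read_sensors helper are hypothetical placeholders, not part of Hop.

```python
import time
import requests  # third-party HTTP client: pip install requests

# Hypothetical ingestion endpoint; substitute your own collector service.
INGEST_URL = "https://example.com/api/readings"

def read_sensors() -> dict:
    """Stand-in for real sensor reads on a Raspberry Pi or similar device."""
    return {"temperature_c": 21.4, "humidity_pct": 48.0, "pressure_hpa": 1013.2}

while True:
    payload = read_sensors()
    payload["ts"] = int(time.time())
    try:
        # One small REST call every 30 seconds, as in the interview's example.
        requests.post(INGEST_URL, json=payload, timeout=5)
    except requests.RequestException as exc:
        print(f"send failed, will retry next cycle: {exc}")
    time.sleep(30)
```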

Vizard: Do you think that the functions of DevOps and DataOps are starting to converge more? Is this becoming more of a team sport or is DataOps just going to be folded into DevOps? What’s the relationship?

Casters: Well, what I’ve seen personally is that now that we have the ability to log everything that a workflow or pipeline does into a graph, a Neo4j graph, it makes it easy to see where the error was, right? You can use graph algorithms to figure out not just where an error was, but what the execution path was, in milliseconds. So there’s the convergence: the developer has an easy time figuring out what went wrong, but also, on Monday morning when you come into the office and see that the Saturday night run failed for some reason, or the monthly load of the data warehouse or whatever people are doing, you don’t have to trawl through millions of lines of logging text for a large workflow; you just see straightaway what went wrong and where it went wrong. That saves a lot of time.

So is that DevOps? For sure. For sure DevOps. It’s also ease of development. There’s really an overlap, and we need to do better at providing a lot of these tools. They’re still too primitive in a lot of our software, as if logging text is enough, right? We can do so much more. [Laughs] We’re executing the stuff; we know where it went wrong. Why guess? Why go through the effort? [Laughs]
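As a sketch of what querying such an execution graph could look like, the following uses the official Neo4j Python driver. The node labels and properties (Execution, Transform, status and so on) are hypothetical stand-ins; the interview does not spell out Hop’s actual logging schema.

```python
from neo4j import GraphDatabase  # official driver: pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Hypothetical schema: one Execution node per run, linked to its transforms.
FAILED_STEPS = """
MATCH (e:Execution {name: $workflow})-[:EXECUTED]->(t:Transform)
WHERE t.status = 'ERROR'
RETURN t.name AS step, t.errorMessage AS message
"""

with driver.session() as session:
    for record in session.run(FAILED_STEPS, workflow="saturday_night_load"):
        # Jump straight to the failing step instead of trawling log files.
        print(record["step"], "->", record["message"])

driver.close()
```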

Vizard: There’s a lot of focus these days on the volume of data that people are trying to work with, but it seems to me it’s also the speed at which that data is arriving, and people are trying to work in near-real time, and the data is constantly being updated. We’re moving beyond batch. So are we really prepared for not just the volume of data, but the speed at which that data is arriving? And for that matter, there’s a lot more types of that data, right?

Casters: Yeah. So Hop from the beginning opted to use Apache Beam as an abstraction layer, so that we could execute on Spark, Flink, and Dataflow for the really fast use cases, but also for real-time streaming; Flink streaming, Spark Streaming and Cloud Dataflow are really cool. And this comes along with the advent of all the streaming backends and messaging services: Kafka, Pulsar, Google Cloud Pub/Sub, that whole repertoire, right? I probably forgot a few, like Azure. [Laughs]
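This is not Hop’s internal code, just a minimal Apache Beam sketch in Python to illustrate the abstraction Casters describes: the pipeline definition stays the same, and only the runner option decides whether it executes locally or on Spark, Flink, or Google Cloud Dataflow.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Swap "DirectRunner" for "SparkRunner", "FlinkRunner" or "DataflowRunner"
# to move the same pipeline onto a different execution engine.
opts = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=opts) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["alpha", "beta", "gamma"])
        | "Upper" >> beam.Map(str.upper)
        | "Print" >> beam.Map(print)
    )
```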

But yeah, so these services have become really popular, making it possible to create cool hub-and-spoke architectures where you can send stuff in all directions. So it falls under the streaming category, and you can do all sorts of cool load-it-and-see exercises there. And opening that up is quite easy for us in Hop; all the pipelines are streaming, whether they run in batch or not. So yeah.

Vizard: So what’s your best advice to folks as they look at these DevOps/DataOps challenges? What do you see as the best practices to make sure people do the right thing? And for that matter, what kind of mistakes are people making that maybe they should avoid?

Casters: So I think when it comes to these best practices, we’re talking about the classical list: put everything in version control. You know, keep your passwords and usernames and everything separate. Don’t hardcode anything. These are not specific to ETL or data orchestration; they’re general programming best practices, development best practices. So even if it is graphical programming, it still applies. Same with unit testing and integration testing.

What we’ve seen in the past is that if you want to do, for example, unit testing, there’s a lot of plumbing that you need to put in place: specialized Docker containers and so on and so on. What we’ve tried to do is always look at the return on investment of metadata. So what is the return on investment for setting up a lot of infrastructure and a lot of manual plumbing? It’s very low, right? You see some benefit from a unit test like that, but it’s not great. So by building integration testing and unit testing into Hop, into the GUI, making them very easy to create, we’re saying, “Okay, now you really don’t have any reason not to do it anymore.” Right? [Laughs] Well, apart from laziness or not knowing what the value is. But at least those best practices are no different from any other development job that you might have.
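The keep-secrets-separate advice is easy to picture in any language; here is a tiny Python sketch that pulls connection details from the environment instead of hardcoding them. The variable names are made up for illustration; Hop itself supports a similar pattern through variable substitution in its metadata.

```python
import os

# Credentials come from the environment (or a secrets manager) at run time,
# so nothing sensitive ever lands in version control alongside the pipelines.
db_user = os.environ["DB_USER"]          # hypothetical variable names
db_password = os.environ["DB_PASSWORD"]
db_url = os.environ.get("DB_URL", "jdbc:postgresql://localhost:5432/warehouse")

print(f"connecting to {db_url} as {db_user}")  # never echo the password
```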

Vizard: All right, so we’ve finally reached a point where we’re going to automate the orchestration of data just like any other DevOps process. So it’s onwards and upwards. Matt, thanks for being on the show.

Casters: Hey, you’re welcome. It’s good to talk to you, Michael. 

Vizard: All right. Back to you guys in the studio. 


Filed Under: Blogs, Doin' DevOps Tagged With: Apache Hop, data orchestration, devops, Neo4j
