PingCAP’s Innovative TiDB Database – Techstrong.TV

By: Mitch Ashley on October 20, 2022

PingCAP CEO Max Liu discusses PingCAP’s innovative TiDB database and cloud technologies for OLTP + OLAP, and PingCAP’s commitment to open source going back more than seven years. Max shares some exciting news about the very first HTAP Summit, coming to the Bay Area soon. The video is below followed by a transcript of the conversation.

Mitch Ashley: Well it’s a great pleasure being joined by Max Liu. Max is cofounder and CEO of PingCAP. Welcome Max.

Max Liu: Hi Mitch, good to see you.

Ashley: Good to see you, thanks for joining us. So, we’re gonna talk about database, one of my favorite topics. Before we do that, would you introduce yourself? Tell us a little bit about you and also tell us a little bit about PingCAP.

Liu: Well, thanks for having me. I’m Max Liu, CEO and cofounder of PingCAP. And our product is TiDB, T-I-D-B, which is an open source distributed database. You know, I’ve been a software engineer for more than 15 years and I still enjoy coding. Before I started PingCAP I spent lots of my time, you know, designing table schemas carefully and fixing those database scaling issues and trying to make coding faster. And I saw so many engineers, you know, developers, wasting their time again and again doing the same thing.

So, we started the TiDB project to build a dream database for engineers, to handle scalability from terabytes of data to petabytes of data and to let developers focus on code and SQL for their business logic. So you can get more sleep. Not only that, you can still keep your hair, not like me, right? So, enjoy other fun.

And, also, many people have found that our company name is interesting. PingCAP is actually composed of two parts, ping and CAP. So, CAP is the C-A-P theorem, you know? Standing for consistency, availability, and partition tolerance. You know, it’s like an ideal case for a distributed database, right? So, ping is just the network command: while you are trying to connect to anything you will ping it, right? So, I love this theorem so much. And, we want to keep approaching CAP as closely as possible. So, that is why we, you know, use the name PingCAP. That is where the name comes from.

Ashley: Makes total sense.

Liu: Yeah, it’s quite technical, right?

Ashley: [Laughs] Well, you know, it carries meaning and that’s one of the great things. And, what I can sense already about your story, Max, is, it’s great to talk to fellow entrepreneurs who – you know, it’s one thing to say, “That’s a great idea, let’s go build a product for that.” It’s another thing when you’ve experienced that challenge, you’ve lived this problem, you’ve spent maybe many hours, maybe many years, right? Kinda being the developer but trying to be a DBA, and a developer, and a data analyst, and a data designer. But you’re not really, you’re a developer, right? You don’t want all the hassles of all those other jobs. So, how do you make it easier, how do you make it better for people? Which sounds like what you’ve tackled, what you’ve done with PingCAP and with TiDB. Am I on path here?

Liu: Yeah, it’s not easy.

Ashley: [Laughs]. Well, let’s talk about the database market in general and, as I mentioned to you earlier, you know, I started doing database work long, long ago. It’s not a new topic by any stretch. And, of course, there are a lot of database products in the market. So, what’s unique and different about TiCAP, TiDB, excuse me, and PingCAP that helps you stand out? So, why would the developers say, “Oh, that’s what I want. I’d like to use TiDB ’cause that does these things for me”?

Liu: Well, that depends on the values you provide for your customers or developers. So, there are many key values that PingCAP brings to our customers, users, and a wide range of the community and, of course, through open source. So, first of all, we are open source believers. Now, open source is the core philosophy of PingCAP, which is always at the wheel of TiDB’s development. So, besides TiDB we also contribute a lot to the community.

We have donated two open source projects to the CNCF. One is TiKV, a distributed key-value store. And, the other is Chaos Mesh, a chaos testing platform. You know, those projects can help other software developers to build more scalable, more resilient systems. While you’re building a distributed system you’re trying to, you know, simulate, kind of, you know, disk faults and the network going down and recovering, things like that. So, that’s why we need, you know, a chaos testing platform.

So, second, although we know the database market, you know, is so crowded, right? There is no other product that can be used as a primary database to solve both transactional processing and analytical processing like TiDB. Now, as a foundation, TiDB is designed as a scalable online transaction database. So, it can easily support hundreds of terabytes to, you know, petabytes of data while still serving millions of requests per second. So, what’s more, you can even run realtime analysis on the same database without moving the data from, you know, TiDB to some other OLAP data warehouse.

Let’s imagine you are building a SaaS system: there is always an operational dashboard for your customer, right? When you log into any SaaS system you get a summary, right? That is the operational dashboard. You need to generate that dashboard, in realtime, on the same database. It is more natural for developers to operate on their data in the same source of truth. So, this technology is called HTAP, hybrid transactional and analytical processing. So, the concept is not new, but the implementation in a cloud native way is fresh. I think it’s a disruptive trend in the database industry to make, you know, everyone’s life and work easier.
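Editor’s illustration: the dashboard scenario Liu describes can be sketched in a few lines. This is only a minimal sketch, and it uses Python’s built-in `sqlite3` module as a stand-in for a real HTAP database; the `orders` table and the function names are hypothetical and are not part of TiDB. The point is simply that the transactional write and the analytical summary run against the same database, with no export step in between.

```python
import sqlite3


def build_demo_db():
    # In-memory SQLite database standing in for the single shared database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def place_order(conn, customer, amount):
    # Transactional (OLTP-style) write: one short insert, committed atomically.
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )
    return cur.lastrowid


def dashboard_summary(conn):
    # Analytical (OLAP-style) read: aggregate over the same live table.
    return conn.execute(
        "SELECT customer, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    place_order(db, "acme", 10.0)
    place_order(db, "acme", 5.0)
    place_order(db, "globex", 7.5)
    for customer, order_count, total in dashboard_summary(db):
        print(customer, order_count, total)
```

In a system that separates OLTP from OLAP, `dashboard_summary` would typically run against a warehouse fed by an ETL job, so the summary could lag behind the most recent orders.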

Ashley: That’s really interesting to combine those two things together because, oftentimes, you think of the old data warehousing or data lakes or different environments to do analytics in. But actually being able to do analytics on top of transactional applications and data in the same environment, what are some of the things you have to do to be able to handle those two different kinds of workloads? ’Cause they can be very different, right? We want really fast transactional responses.

Liu: Yeah.

Ashley: Sometimes, pretty complex analytical questions that we’re asking, right?

Liu: Yes, you’re right. So, we basically have two different storage engines, a row store and a column store, but with a smart optimizer on top of both the row store and the column store, right? So, if there is a query, the optimizer will predict whether this is an OLTP query or an OLAP query. Or, even better, it can be a hybrid query: we can query, you know, just a single row from the row store and do some, you know, aggregation on the column store. It’s a little bit technical, you know.
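Editor’s illustration: the idea of an optimizer choosing between a row store and a column store can be sketched with a toy cost model. This is not TiDB’s actual optimizer; the `QueryShape` fields, the cost formulas, and the `COLUMN_STORE_SETUP_COST` constant are all assumptions made up for this example. The intuition it tries to capture is that a row store reads every column of each row it touches, while a column store reads only the referenced columns but pays some fixed overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    rows_scanned: int   # estimated number of rows the query must read
    columns_read: int   # number of columns the query actually references
    total_columns: int  # number of columns in the table


# Hypothetical fixed overhead for starting a column-store scan.
COLUMN_STORE_SETUP_COST = 1_000


def row_store_cost(q: QueryShape) -> int:
    # A row store reads whole rows, so every column of each row is touched.
    return q.rows_scanned * q.total_columns


def column_store_cost(q: QueryShape) -> int:
    # A column store reads only the referenced columns, plus a fixed setup cost.
    return COLUMN_STORE_SETUP_COST + q.rows_scanned * q.columns_read


def choose_engine(q: QueryShape) -> str:
    # Pick whichever engine the toy model estimates to be cheaper.
    return "row" if row_store_cost(q) <= column_store_cost(q) else "column"


if __name__ == "__main__":
    point_lookup = QueryShape(rows_scanned=1, columns_read=10, total_columns=10)
    big_aggregate = QueryShape(rows_scanned=1_000_000, columns_read=1, total_columns=10)
    print(choose_engine(point_lookup))
    print(choose_engine(big_aggregate))
```

Under these made-up costs, a single-row lookup goes to the row store and a wide aggregation goes to the column store. A hybrid query of the kind Liu mentions could be modeled by costing each part of the plan separately.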

Ashley: No, no, I get it. So, I mean, I assume your analytical queries might tend to be more by column versus your transactional by row. Is that a good generalization? I mean, it starts there?

Liu: Yeah, yeah, exactly.

Ashley: Interesting. And, you can mix those and do hybrid or both. So, is it views into the data or is the data redundant so that they can handle different loads on the two types of storage or is it just views into the same data?

Liu: So basically, we have a replication algorithm. It is called Raft. So, we use Raft to replicate data and store it both as a row store and as a column store so, you have two copies, right? So, then you can design an optimizer to choose what kind of data, which piece should I use, right?
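Editor’s illustration: the “two copies” idea can be sketched as one ordered log whose entries are applied to two replicas with different layouts. This is a deliberately simplified sketch and not an implementation of Raft: leader election, quorum acknowledgement, and failure handling are all omitted, and the class names are hypothetical. It handles inserts only and assumes every entry has the same fields.

```python
class RowReplica:
    """Stores each record whole, keyed by id: good for point lookups."""

    def __init__(self):
        self.rows = {}

    def apply(self, entry):
        self.rows[entry["id"]] = dict(entry)

    def get(self, row_id):
        return self.rows[row_id]


class ColumnReplica:
    """Stores one list per column: good for scans and aggregations."""

    def __init__(self):
        self.columns = {}

    def apply(self, entry):
        # Inserts only: each entry appends one value to every column list.
        for name, value in entry.items():
            self.columns.setdefault(name, []).append(value)

    def total(self, column):
        return sum(self.columns[column])


class ReplicatedLog:
    """Single ordered log; every entry is applied to every replica in order."""

    def __init__(self, replicas):
        self.entries = []
        self.replicas = list(replicas)

    def append(self, entry):
        self.entries.append(entry)
        for replica in self.replicas:
            replica.apply(entry)


if __name__ == "__main__":
    row_copy, column_copy = RowReplica(), ColumnReplica()
    log = ReplicatedLog([row_copy, column_copy])
    log.append({"id": 1, "customer": "acme", "amount": 10.0})
    log.append({"id": 2, "customer": "globex", "amount": 7.5})
    print(row_copy.get(2))              # point lookup served by the row copy
    print(column_copy.total("amount"))  # aggregation served by the column copy
```

Because both replicas consume the same log in the same order, they hold the same data in two shapes, which is what lets an optimizer pick either copy per query.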

Ashley: So that –

Liu: Just by, basically, replicating them, you know.

Ashley: All that admin work to set up your analytics environment. Essentially, you get that with the database, right? It comes with it?

Liu: Yeah, but –

Ashley: You’re doing the replication for them.

Liu: Yeah. From the user perspective, they operate on the same database. There is no – I don’t need to build up my skill set, you know? Like, I need to know how to do manual sharding for an OLTP database and use some kind of ETL tools to load the data into an OLAP data warehouse. And, I need to learn a different kind of SQL, a different kind of, you know, query optimization, you know, for two databases, to optimize them, right? And, for so many things you need a totally different skill set.

Ashley: Tell us a bit about the cloud part of this strategy. So, are you primarily or only working in the cloud, or do you also work on premises in customers’ own data centers? Where does TiDB live?

Liu: This is a good question. We invest a lot in TiDB Cloud. TiDB Cloud is a cloud service on top of TiDB, which allows us to, you know, provide, you know, a faster time to value for our customers. And meanwhile, reduce the burden of maintenance. You know, maintaining a distributed system is kind of a pain, right? So, just like other open source infrastructure companies such as Elastic, Confluent, et cetera, as the team behind, you know, TiDB, we treat the relationship between open source and cloud strategy seriously. So, I would say that PingCAP will ensure the core components of TiDB are 100 percent open source, without any functional loss. We actually achieve this by three actions.

First, we build an active open source community. You know, TiDB is backed by more than 800 contributors across multiple countries and industries. We actually built a demo on top of TiDB Cloud which is called OSSInsight.io. So, everyone can easily check any open source GitHub repository with details of stars, contributions, commits, and you can even compare two different projects and so on and so forth. And, the demo itself is open source too. You know, as I mentioned, you know, we are open source believers, right?

So, second, we make sure TiDB is environment agnostic. So, the goal of TiDB is to achieve a consistent user experience across multiple deployment forms. So, basically you can deploy TiDB anywhere: in public cloud, in private cloud, in VMs, containers, and bare metal.

Third, I think this is the most important one, TiDB is designed as an open system. So, we keep investing in integration with other ecosystems such as Kafka, Flink, Spark, Snowflake, Datadog, and so on. So, this openness and these capabilities also apply to our cloud service as well.

Ashley: Very interesting. One of my questions is, are there different users for the analytics capabilities versus the high speed transaction or, do they tend to be the same groups of people? Is it primarily developers using both, or do you end up with different users for different capabilities?

Liu: Well, they’re, kind of, different users. Databases are so generic, you know: any digital native business has a database, and they have different user scenarios. Now, for those huge companies, big companies, they enjoy the scalability of the OLTP features. So, they don’t need to worry about how to scale their system. Those big companies, especially internet companies, they have a big dev team, right? They have a database team. So, the database team cares about the scalability of the database, but the big data system team, they care about the analysis of big data.

But, for those, you know, medium companies and startups, they just want a single database that can handle everything. So, I don’t need to, you know, hire more engineers. I have no resources, you know, as one of those small companies, medium companies. I have no resources to hire, to build two different large teams, one for the OLTP database and one for big data, right? So, they want a single technology, a single database, to solve everything. So, it’s different.

Ashley: Makes sense. You mentioned distributed database. Then also, since you do your own data replication across the environments, the column and row environments, I assume you can also do distribution across different locations in cloud providers or a cloud provider. So, if you want it distributed closer to the edge or, you know, closer to the data center or in different geographic locations, that’s also part of TiDB, is that correct?

Liu: Yeah. You can deploy TiDB to different zones. And this is basically a default capability for a distributed database. If you don’t have this kind of capability nobody’s going to use it, right? But, for edge functions, edge scenarios, currently, we don’t have the ability to support that. It’s just everything on cloud or deployed by yourself.

Ashley: I guess, as the cloud comes closer to the edge you can be on the edge that way, right?

Liu: Yeah. The open source database TiDB chose to be compatible with the MySQL protocol. So, all of those MySQL users, they don’t need to, you know, do lots of migrations, right? They can simply, you know, just move the data to TiDB and everything just works. And you know, MySQL is a TCP protocol, right? So, usually in edge function scenarios, they are talking about using an HTTP protocol. So, you need a kind of proxy to route the requests into, you know, TiDB Cloud.

Ashley: Interesting work. Tell me a little bit about the open source versus the commercial version. What are the differences in the product? Are you doing newer features, kind of, things that are more experimental, that you’re trying out in the market first in the open source, or are there more management capabilities in the paid-for version? How do you distinguish the two?

Liu: Well, actually there are three different versions. Let me explain a little bit more. So, the first one is TiDB community edition. So, it has all the core capabilities so that developers can enjoy, you know, the latest TiDB features and are able to contribute back. So, you can even get a nightly version every day to enjoy the new features, right?

The second one, for sure, is TiDB Cloud. So, the new features are first released in the community version and after, you know, validation and polishing by a large number of community users they will come into TiDB Cloud. And then, what’s next is the TiDB long term support version. So, this has, kind of, our most stable features, with professional service and support provided by PingCAP for enterprise users. So, these kinds of users, they might not be interested in the new features immediately. I will wait, right, wait until it’s extremely stable. Maybe a year later I will use it in production, right?

Ashley: Makes sense for maybe a finance company, someone in the finance industry or manufacturing.

Liu: Yeah, you’re right. Especially for those banks. So, this release model actually helps us to get feedback faster from, you know, our own day-to-day operations of TiDB Cloud and from, you know, the community. And, that also creates a faster feedback loop into TiDB’s roadmap. So, the verified and polished features will be pushed to enterprise users through the LTS release faster. This is kind of like, you know, a flywheel, right? So, we can drive a faster time to value for both community users and enterprise users no matter how TiDB is deployed. So, to summarize, we have three releases for different deployments but a consistent user experience in the process.

Ashley: Excellent. Well, I wish we had more time, love to hear more about it. Hope you’ll come back. Where can folks find out more, download the open source, or try out the cloud version? How can they check things out with you?

Liu: Well, thank you Mitch. Good to meet you, talk to you. Thank you.

Ashley: You bet. And so, folks can go to what, PingCAP.com? That correct? Your website?

Liu: Oh, oh, oh, one thing. So, I’m very excited to share that we will host the very first HTAP Summit on November 1st. Yeah, you might be in the area. It’s at a very, you know, meaningful venue, the Computer History Museum, which I enjoyed a lot, with so many historical moments and memories. So, at the summit we will, you know, be joined by many dev industry leaders and developers worldwide. And, it also includes some important customers from Asia, from Europe, from the United States, to discuss disruption and innovation. And, of course, all of you are more than welcome to check the HTAP Summit page on PingCAP.com. And, I do look forward, you know, to connecting with all of you in November.

Ashley: Well, excellent. I hope you have a great conference on the first of November and folks head out.

Liu: Oh, thank you very much.

Ashley: As well as head over to PingCAP and check out TiDB. Thanks again Max, we appreciate you being with us. Max Liu who is cofounder and CEO of PingCAP. Thanks again Max.

Liu: Thanks Mitch.
