Welcome to The Long View—where we peruse the news of the week and strip it to the essentials. Let’s work out what really matters.
This week: Data centers cause climate change, and Meta is rolling out Precision Time Protocol.
1. Doin’ DevOps Warms the Planet
First up this week: Data centers are making drought conditions worse. So says a Virginia Tech prof.
Analysis: Cheap chillers waste water
We all know that energy-hungry infrastructure is a worry. But climate change is causing droughts, which are a growing problem for data centers in water-stressed locations. One fix is chiller designs that don’t evaporate water — but the irony is that those chillers use more energy.
Diana Olick: Microsoft, Meta and others face rising drought risk to their data centers
“In ‘severe’ drought”
Drought conditions are worsening in the U.S., and that is having an outsized impact on … data centers [which] generate massive amounts of heat. … Water is the cheapest and most common method used to cool [them].
In just one day, the average data center could use 300,000 gallons of water to cool itself — the same water consumption as 100,000 homes, according to researchers at Virginia Tech. … Realizing the water risk in New Mexico, Meta … ran a pilot program on its Los Lunas data center to reduce relative humidity from 20% to 13%, lowering water consumption. It has since implemented this in all of its [data] centers.
Just over half … of the nation is in drought conditions, and over 60% of the lower 48 states. … That is a 9% increase from just one month ago. Much of the west and Midwest is in ‘severe’ drought.
David Lumb: Internet Outages Could Spread as Temperatures Rise
“Water is projected to get scarcer”
2022 is expected to be the sixth-hottest year on record as average temperatures reached 1.57 degrees Celsius above the 20th century average. We’re on track to normalize that temperature gain every year. … And it could get worse.
As our world warms up, power outages and water shortages have ravaged many parts of the planet. Data centers may be among the first to feel the … pinch. They need lots of energy to keep their servers powered, air conditioning and often water to cool the servers. … As climate change threatens energy availability, Big Tech has engaged more sustainable strategies. These include shifting more of their energy reliance to renewables like solar and wind … recycling more water and tinkering with other cooling options.
As one-fifth of the data centers in the country get their water from moderately to highly stressed regions supplying water … US cities are already getting nervous. [And] water is projected to get scarcer. But droughts are hard to evade when you also need to be as close as possible to customers you’re serving.
Horse’s mouth? Virginia Tech Assistant Professor Landon Marston:
It takes a massive amount of water to produce the electricity needed, which means that data centers indirectly use a lot of water through their large electricity demand. … When locating new data centers … environmental considerations should be included in the discussion alongside infrastructure, regulatory, workforce, client proximity, and tax considerations.
How are these data centers using water, exactly? aaarrrgggh explaaaiiinnns:
Water use is from evaporation in cooling towers, plus blow-down needed to reduce suspended solids in condenser water as water evaporates. There are technologies available to reduce blow-down some … but it has an energy penalty.
You can also design systems so that you only use evaporation mode cooling when outside temperatures are over ~100F, and use a dry-cooling mode the rest of the time. Both changes increase electricity use to reduce water use.
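The hybrid design aaarrrgggh describes boils down to a simple control decision: burn water only when dry cooling can’t keep up. Here’s a minimal sketch of that logic — the threshold and mode names are illustrative assumptions, not vendor specs:

```python
# Hybrid cooling: dry mode saves water but needs more fan energy;
# evaporative mode saves energy but consumes water (evaporation +
# blow-down). Threshold of ~100F is the figure from the comment above.
DRY_MODE_MAX_F = 100.0

def cooling_mode(outside_temp_f: float) -> str:
    """Pick a cooling mode based on outside air temperature."""
    return "evaporative" if outside_temp_f > DRY_MODE_MAX_F else "dry"

for temp in (75.0, 95.0, 105.0):
    print(f"{temp}F -> {cooling_mode(temp)}")
```

In a mild climate this keeps the plant in dry mode most of the year, which is exactly the trade the comment flags: less water, more electricity.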
Sounds like it’s about money. Ohhh, u/E_Snap:
Same thing that almond and avocado farmers have done in California. They’ve convinced the state that giving them as much free water as they can take is more important than flushing your toilet at home.
But BeepBoopBeep’s idea ignores latency:
I have no clue why they don’t build data centers in the Midwest—with the largest source of water on the planet—and just run the water through external free cooling when the winters cool down the water for free. As long as the water isn’t polluted when it’s returned to its original source, it’s fine.
2. Precision Time Protocol at Meta — Why?
Precision Time Protocol (PTP) is a telecoms thing, right? Why does Meta care about rolling it out to all its infrastructure? Surely good old NTP is accurate enough?
Analysis: Eventual consistency needs accurate time for perf
It turns out that Meta gets way better performance by shrinking the difference between nodes’ clocks. Waiting for database consistency means padding every pause by the worst-case clock inaccuracy — so improving accuracy shrinks the pad, with a huge effect on perf.
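To see why the pad matters, here’s a toy sketch (not Meta’s code) of a replica deciding how long it must still wait before serving a read that is guaranteed to see an earlier write. The error bounds are illustrative assumptions — the point is the ratio between them:

```python
# A replica can safely serve a read at timestamp T only once it knows no
# write before T is still in flight on another node, so it waits out the
# worst-case clock disagreement. Illustrative bounds:
NTP_MAX_ERROR_S = 0.010        # ~10 ms worst-case offset under NTP
PTP_MAX_ERROR_S = 0.000_000_1  # ~100 ns worst-case offset under PTP

def read_wait(write_ts: float, now: float, max_clock_error: float) -> float:
    """Seconds a replica must still wait before a consistent read."""
    return max(0.0, (write_ts + max_clock_error) - now)

# Same write, same read moment: the NTP pad dwarfs the PTP pad.
print(read_wait(write_ts=100.0, now=100.001, max_clock_error=NTP_MAX_ERROR_S))
print(read_wait(write_ts=100.0, now=100.001, max_clock_error=PTP_MAX_ERROR_S))
```

Under NTP the read still has ~9 ms to wait; under PTP it can proceed immediately — which is the shape of the “roughly 100x” difference Meta reports below.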
Sebastian Moss: Meta to deploy new network timing protocol
“Uses hardware timestamping and transparent clocks”
While Network Time Protocol (NTP) allows for precision within milliseconds, PTP allows for precision within nanoseconds. … Servers need to keep accurate and coordinated time.
PTP was actually first deployed in 2002. … A Stratum network computer holds the current time and sends a time reference to any other computer on a network that asks what time it is, via a network data packet. [But] latency impacted the speed at which systems could be informed of the time.
PTP uses hardware timestamping and transparent clocks to improve consistency and symmetry, respectively. … PTP is already pushed by the telecom industry as networks transition to 5G connectivity, as its added precision and accuracy are necessary for higher-bandwidth 5G.
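The arithmetic behind PTP’s exchange (IEEE 1588) is pleasingly simple. The master stamps a Sync message at t1; the client receives it at t2, sends a Delay_Req at t3, and the master stamps its arrival at t4. Assuming a symmetric path — which is exactly what hardware timestamps and transparent clocks exist to make true — the offset and delay fall straight out. A sketch with made-up nanosecond values:

```python
# IEEE 1588 offset/delay calculation. Path symmetry is assumed;
# transparent clocks in switches correct for queuing so it holds.
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Client clock offset and one-way delay from the four timestamps."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Example: client clock runs 500 ns fast, one-way path delay is 1000 ns.
offset, delay = ptp_offset_and_delay(t1=0, t2=1500, t3=2000, t4=2500)
print(offset, delay)  # 500.0 1000.0
```

NTP computes the same quantities, but from software timestamps taken after queuing and scheduling noise — hence milliseconds versus nanoseconds.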
Meta’s Oleg Obleukhov and Ahmad Byagowi rent the curtain asunder:
[PTP] allows us to synchronize the systems that drive our products and services down to nanosecond precision. PTP’s predecessor, Network Time Protocol (NTP), provided us with millisecond precision, but as we scale … we need to ensure that our servers are keeping time as accurately and precisely as possible … for everyone, across time zones and around the world.
Imagine a situation in which a client writes data and immediately tries to read it. In large distributed systems, chances are high that the write and the read will land on different back-end nodes. … Adding precise and reliable timestamps on a back end and replicas allows us to simply wait until the replica catches up. … One could argue that we don’t really need PTP for that. NTP will do just fine. … But experiments we ran comparing our state-of-the-art NTP implementation and an early version of PTP showed a roughly 100x performance difference.
There are several additional use cases, including event tracing, cache invalidation, privacy violation detection improvements, latency compensation … and simultaneous execution in AI, many of which will greatly reduce hardware capacity requirements. This will keep us busy for years ahead.
In other news, the world has voted to stop adding leap seconds. Which is giving hoytech flashbacks:
In 2015 I was working at a “fintech” company and a leap second was announced. … When the previous leap second was applied, a bunch of our Linux servers had kernel panics for some reason, so needless to say everyone was really concerned about a leap second happening during trading.
I spent a month in the lab, simulating the leap second by fast forwarding clocks for all our different applications, testing different NTP implementations. … I had heaps of meetings with our partners trying to figure out what their plans were … and test what would happen if their clocks went backwards. I had to learn about how to install the leap seconds file into a bunch of software I never even knew existed, write various recovery scripts, and at one point was knee-deep in ntpd and Solaris kernel code.
The day before it was scheduled, the whole trading world agreed to halt the markets for 15 minutes before/after the leap second, so all my work was for nothing. I’m not sure what the moral is here.