
Announcing Google Cloud Spanner as a Vault storage backend



HashiCorp Vault is a powerful open source tool for secrets management, popular with many Google Cloud Platform (GCP) customers today. Vault provides "secret management as a service," acting as a static secret store for encrypted key-value pairs; a secret generation tool that creates dynamic, on-the-fly credentials; and a pass-through encryption service so applications don't need to roll their own encryption. We strive to make Google Cloud an excellent platform on which to operationalize Vault.

Using Vault requires up-front configuration choices, such as which storage backend to use for data persistence. Some storage backends support high availability while others are single-tenant; some operate entirely in your own datacenter while others require outbound connections to third-party services; some require operational knowledge of the underlying technologies while others work without configuration. These options forced you to weigh tradeoffs across consistency, availability, scalability, replication, operationalization and institutional knowledge . . . until now.

Today we're pleased to announce Cloud Spanner as a storage backend for HashiCorp Vault. Building on the scalability and consistency of Google Cloud Spanner, Vault users gain all the benefits of a traditional relational database, the scalability of a globally-distributed data store and the availability (99.999% SLA for multi-region configurations) of a fully managed service.

With support for high-performance transactions and global consistency, using Cloud Spanner as a Vault storage backend brings a number of features and benefits:
  • High availability - In addition to Cloud Spanner's built-in high availability for data persistence, the storage backend also supports running Vault itself in high availability mode. By default, Vault runs as a single instance, relying on the storage backend to provide distributed locking and leader election. Cloud Spanner's global distribution and strongly consistent transactions allow for a highly available Vault cluster with just a single line of configuration.
  • Transactional support - Vault backends optionally support batch transactions for update and delete operations. Without transactional support, large operations—such as deleting an entire prefix or bootstrapping a cluster—can result in hundreds of requests. This can bottleneck the system or overload the underlying storage backend. The Cloud Spanner Vault storage backend supports Vault's transactional interface, meaning it collects a batch of related update/delete operations and issues a single API call to Spanner. Not only does this reduce the number of HTTP requests and networking overhead, but it also ensures a much speedier experience for bulk operations.
  • Enterprise-grade security - Cloud Spanner follows the same security best practices as other Google products. Data stored at rest in Cloud Spanner is encrypted by default, and Cloud Spanner uses IAM for granular permission management. Google's infrastructure has many security differentiators, including its custom-designed Titan security chip and its private network backbone.
  • Google supported - This backend was designed and developed by Google developers, and is available through the Google open-source program. It's open for collaboration to the entire Vault community with the added benefit of support from the Google engineering teams.

Getting started


To get started, download and install the latest version of HashiCorp Vault. The Google Cloud Spanner Vault storage backend was added in Vault 0.9.4 (released on February 20, 2018), so ensure you're running Vault 0.9.4 or later before you continue.

Next, create a Cloud Spanner instance, database and schema for storing your Vault data using the gcloud CLI tool (the table definitions below follow the storage backend's documentation). You can also create the instance and schema using the web interface or the API directly:

$ gcloud spanner instances create my-instance \
  --config=nam3 \
  --description=my-instance \
  --nodes=3

$ gcloud spanner databases create my-database --instance=my-instance

$ gcloud spanner databases ddl update my-database --instance=my-instance \
  --ddl="$(cat <<EOF
CREATE TABLE Vault (
  Key   STRING(MAX) NOT NULL,
  Value BYTES(MAX),
) PRIMARY KEY (Key);

CREATE TABLE VaultHA (
  Key       STRING(MAX) NOT NULL,
  Value     STRING(MAX),
  Identity  STRING(36) NOT NULL,
  Timestamp TIMESTAMP NOT NULL,
) PRIMARY KEY (Key);
EOF
)"

Next, create a Vault configuration file with the Google Cloud Spanner storage backend configuration:

# config.hcl
storage "spanner" {
  database   = "projects/my-default-project/instances/my-instance/databases/my-database"
}
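
To run Vault itself in high availability mode, as described above, the backend documentation calls for just one more line in the same stanza:

# config.hcl
storage "spanner" {
  database   = "projects/my-default-project/instances/my-instance/databases/my-database"

  # Use Spanner for Vault's distributed locking and leader election
  ha_enabled = "true"
}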

Start Vault with the configuration file. This example uses Vault's built-in development mode, which does not represent best practices or a production installation, but it's the fastest way to try the new Cloud Spanner Vault storage backend.

$ export VAULT_ADDR=http://127.0.0.1:8200
$ sudo vault server -dev -config=config.hcl

During this process, Vault authenticates and connects to Cloud Spanner to populate the data storage layer. After a few seconds, you can view the table data in the web interface and see that data has been populated. Vault is now up and running. Again, this is not a production-grade Vault installation; for details on a production-grade installation, please read the Vault production hardening guide. You can now create, read, update and delete secrets:

$ vault write secret/my-secret foo=bar
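
You can then read the secret back, and later remove it, with the matching commands:

$ vault read secret/my-secret
$ vault delete secret/my-secret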

To learn more about the backend configuration options, read the HashiCorp Vault Google Cloud Spanner storage backend documentation. To learn more about Google Cloud Spanner, check out the Google Cloud Spanner documentation.


Toward a great Vault experience on GCP


The Cloud Spanner Vault storage backend enables organizations to leverage the consistency, availability, scalability, replication and security of Cloud Spanner while also supporting Vault's own high availability requirements. We're delighted to continue our long-standing partnership with HashiCorp, and we're excited to see how this new storage backend enables organizations to be more successful with Vault on GCP. Be sure to follow us on Twitter and open a GitHub issue if you have any questions.

How Google Cloud Storage offers strongly consistent object listing thanks to Spanner



Here at Google Cloud, we're proud of the fact that all of our listing operations are consistent across Google Cloud Storage. They're consistent across all Cloud Storage bucket locations, including regional and multi-regional buckets, and they're consistent whether you're listing buckets in a project or listing objects within a bucket. If you create a Cloud Storage bucket or object and then request a list of resources, your resource will be in that response.

Why is this important? Strong list consistency is a big deal when you run data and analytics workloads. Here's an explanation from Johannes Fabian Rußek, Technical Product Owner at Spotify, on why strongly consistent listing operations are so important to his business:
When you do not have consistent listings, there is a possibility of missing files. You cannot rely on the consistency of the data being read as you develop your products. Even worse, inconsistent listings lead to unforeseen issues. For example, our processing tooling will succeed in reading partial data and may produce seemingly valid outputs. Problems like these have a tendency to quickly propagate throughout the dependency tree.
When that happens, in the best case we notice the failure and recompute all datasets produced within the dependency tree. In the worst case, the failure goes unnoticed and we create invalid reports and statistics. Considering the large number of data pipelines we run, even with a low probability of that happening, a lack of list consistency in cloud storage offerings was a major blocker for data processing at Spotify.
Not all cloud storage services provide list-after-write consistency, which can cause challenges for some common use cases. Typically, when a user uploads an object to a cloud storage bucket, an unpredictable and unbounded amount of time passes before that object shows up in that bucket’s list of contents. This is a very weak consistency model called “eventual consistency.” In practice, if a user uploads a new object and then tries to find it from a browser on another computer, they might not see the object that they just uploaded. Similar issues impact workloads distributed across multiple compute nodes. By offering strong list consistency across all Google Cloud Storage objects, you avoid having to wrangle with these sorts of problems. Again, here’s Spotify’s Johannes Fabian Rußek:
We considered multiple workarounds, such as using a global consistency cache based on NFS, porting Netflix’s s3mper as well as persisting listings in a manifest file stored alongside the data. All of the considered solutions were suboptimal as they either introduced a single point of failure or required us to put significant resources into developing our own solution and adjusting our tooling. Strong list consistency in Cloud Storage means we can continue using our existing data-processing stack without modifications and without worrying that data may be corrupted.

List consistency on Cloud Storage is an essential feature for data processing at Spotify. We use a Hadoop-based data processing stack built on top of the Hadoop Distributed File System, which means we rely on its filesystem-like guarantees. Consistency is critical to running our business, and its absence creates many challenges.

Spanner: the secret to Cloud Storage strong list consistency 

Until last year, Cloud Storage stored information about its buckets and objects (metadata) in a system built on an internal Google technology called "Megastore." Megastore enabled Cloud Storage to provide important features like read-after-write consistency quickly and at very high volume. But as is typical for object storage, Cloud Storage provided only eventual consistency for list-after-write operations.

Last year we migrated all of Cloud Storage metadata to Spanner, Google’s globally distributed and strongly consistent relational database. Spanner’s specialty is scaling horizontally while providing strong consistency guarantees and high availability. The same technology is available to Google Cloud customers today as Cloud Spanner.

Migrating to Spanner afforded Cloud Storage some key new features, most notably strong list consistency. Among the workloads that benefit are MapReduce and Hadoop jobs, in which many workers produce separate pieces of work that are later collected by another processor. With strong list consistency, workers can independently upload their results to a bucket, confident that a collecting job will always be able to collate all of the results, with no exceptions. This is another example of how strong (external) consistency makes application development more efficient.

Summary


Cloud Storage now provides strong consistency for the following operations in all types of buckets in all regions:

  • Read-after-write consistency: Reading an object after writing it has completed, for both new objects and overwrites of existing objects 
  • Read-after-update consistency: Reading an object’s metadata after updating its metadata 
  • Read-after-delete consistency: Reading an object will fail with a 404 immediately after it has been deleted 
  • List-after-write consistency: Fetching a list of buckets and objects will always reflect any changes that have previously completed 
  • Granting additional permissions for access to resources: For example, when you grant a new user permission to read an object, that user can immediately read the object
  • Bounded consistency for some operations: Some operations and permissions changes, such as read operations on publicly cached objects (designed to achieve top cache performance), provide bounded consistency rather than strong consistency. Details can be found here.
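
To make list-after-write consistency concrete, here's a minimal sketch using the Cloud Storage Python client library (the bucket and object names are placeholders):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")  # placeholder bucket name

# Upload a new object...
bucket.blob("results/part-0001").upload_from_string("worker output")

# ...then list immediately: strong list-after-write consistency guarantees
# that the new object appears in the listing, with no waiting or retries.
names = [blob.name for blob in bucket.list_blobs(prefix="results/")]
assert "results/part-0001" in names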

We’re excited by the new functionality that hosting Cloud Storage metadata on Spanner brings to the table. You can learn more about Cloud Storage’s consistency model in our documentation: https://cloud.google.com/storage/docs/consistency.

Why we used Elastifile Cloud File System on GCP to power drug discovery



[Editor’s note: Last year, Silicon Therapeutics talked about how they used Google Cloud Platform (GCP) to perform massive drug discovery virtual screening. In this guest post, they discuss the performance and management benefits they realized from using the Elastifile Cloud File System and CloudConnect. If you’re looking for a high-performance file system that integrates with GCP, read on to learn more about the environment they built.]

Here at Silicon Therapeutics, we've seen the benefits of GCP as a platform for delivering massive scale-out compute, and have used it as an important component of our drug discovery workload. For example, in our past post we highlighted the use of GCP for screening millions of compounds against a conformational ensemble of a flexible protein target to identify putative drug molecules.

However, as with a lot of high-performance computing workflows, we encounter data challenges. There are many data management and storage considerations involved in running one of our core applications: molecular dynamics (MD) simulations, which propagate the atoms in a molecular system over time. The time evolution of the atoms is determined by numerically solving Newton's equations of motion, with forces between the atoms calculated using molecular mechanics force fields. These calculations typically generate thousands of snapshots of the atomic coordinates, each containing tens of thousands of atoms, resulting in relatively large trajectory files. As such, running MD on a large dataset (e.g., the entirety of the ~100,000 structures in the Protein Data Bank (PDB)) can generate over a petabyte of data.

In scientific computing, decreasing the overall time-to-result and increasing accuracy are crucial in helping to discover treatments for illnesses and diseases. In practice, doing so is extremely difficult due to the ever-increasing volume of data and the need for scalable, high-performance, shared data access and complex workflows. Infrastructure challenges, particularly around file storage, often consume valuable time that could be better spent on core research, thus slowing the progress of critical science.

Our physics-based workflows create parallel processes that generate massive amounts of data, quickly. Supporting these workflows requires flexible, high-performance IT infrastructure. Furthermore, analyzing the simulation results to find patterns and discover new druggable targets means sifting through all that data—in the case of this run, over one petabyte. That kind of infrastructure would be prohibitively expensive to build internally.

The public cloud is a natural fit for our workflows, since in the cloud, we can easily apply thousands of parallel compute nodes to a simulation or analytics job. However, while cloud is synonymous with scalable, high-performance compute, delivering complementary scalable, high-performance storage in the cloud can be problematic. We’re always searching for simpler, more efficient ways to store, manage, and process data at scale, and found that the combination of GCP and the Elastifile cross-cloud data fabric could help us resolve our data challenges, thus accelerating the pace of research.
Our HPC architecture used Google Compute Engine CPUs and GPUs, Elastifile for distributed file storage, and Google Cloud Storage plus Elastifile to manage inactive data.

Why high-performance, scale-out file storage is crucial


To effectively support our bursty molecular simulation and analysis workflows, we needed a cloud storage solution that could satisfy three key requirements:

  • File-native primary storage - Like many scientific computing applications, the analysis software for our molecular simulations was written to generate and ingest data in file format from a file system that ensures strict consistency. These applications won’t be refactored to interface directly with object storage systems like Google Cloud Storage any time soon—hence the need for a cloud-based, POSIX-compliant file system. 
  • Scalable global namespace - Stitching together file servers on discrete cloud instances may suffice for simple analyses on small data sets. However, the do-it-yourself method comes up short as datasets grow and when you need to share data across applications (e.g., in multi-stage workflows). We needed a modern, fully-distributed, shared file system to deliver the scalable, unified namespace that our workflows require. 
  • Cost-effectiveness - Finally, when managing bursty workloads at scale, rigid storage infrastructure can be prohibitively expensive. Instead, we needed a solution that could be rapidly deployed/destroyed, to keep our infrastructure costs aligned to demand. And ideally, for maximum flexibility, we also wanted a solution that could facilitate data portability, both 1) between sites and clouds, and 2) between formats—file format for “active” processing and object format for cost-optimized “inactive” storage/archival/backup.


Solving the file storage problem


To meet our storage needs and support the evolving requirements of our research, we worked with Elastifile, whose cross-cloud data fabric was the backbone of our complex molecular dynamics workflow.

The heart of the solution is the Elastifile Cloud File System (ECFS), a software-only, distributed file system designed for performance and scalability in cloud and hybrid-cloud environments. Built to support the noisy, heterogeneous environments encountered at cloud-scale, ECFS is well-suited to primary storage for data-intensive scientific computing workflows. To facilitate data portability and policy-based controls, Elastifile file systems are exposed to applications via Elastifile “data containers.” Each file system can span any number of cloud instances within a single namespace, while maintaining the strict consistency required to support parallel, transactional applications in complex workflows.

By deploying ECFS on GCP, we were able to simplify and optimize a molecular dynamics workflow, which we then applied to 500 unique proteins as a proof of concept for the aforementioned PDB-wide screen. For this computation, we used a SLURM cluster running on GCP. The compute nodes were 16 n1-highcpu-32 instances, each with 8 K80 GPUs attached, for a total of 128 GPUs and 512 vCPUs. Storage capacity was provided by a 6 TB Elastifile data container mounted on all the compute nodes.
Defining SLURM configuration to allocate compute and storage resources
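
For reference, a single node of that shape could be provisioned with a gcloud command along these lines (the instance name and zone are assumptions, and attaching K80s requires a terminate-on-maintenance policy):

$ gcloud compute instances create md-node-01 \
    --zone=us-east1-c \
    --machine-type=n1-highcpu-32 \
    --accelerator=type=nvidia-tesla-k80,count=8 \
    --maintenance-policy=TERMINATE
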
Before Elastifile, provisioning and managing storage for such workflows was a complex, manual process. We partitioned the input datasets by hand and created several different clusters, each with its own disks, because a single large disk often led to NFS issues, particularly with metadata at scale. Once each cluster's outputs were complete, we stored the disks as snapshots; for access, we spun up an instance and shared the credentials. This access pattern was error-prone as well as insecure, and at scale, manual processes like these are time-consuming and introduce the risk of critical errors and data loss.

With Elastifile, however, deploying and managing storage resources was quick and easy. We simply specified the desired storage capacity, and the ECFS cluster was automatically deployed, configured and made available to the SLURM-managed compute resources, all in a matter of minutes. If we want, we can expand the cluster later for additional capacity with the push of a button, which future-proofs the infrastructure to handle dynamically changing workflow requirements and data scale. By simplifying and automating the deployment process for a cloud-based file system, Elastifile reduced the complexity and risk associated with manual storage provisioning.
Specifying desired file system attributes and policies via Elastifile's unified management console
In addition, by leveraging Elastifile's CloudConnect service, we were able to seamlessly promote and demote data between ECFS and Cloud Storage, minimizing infrastructure costs. CloudConnect makes it easy to move data from Elastifile's data container to Cloud Storage buckets, and once the data has moved, we can tear down the Elastifile infrastructure, eliminating unnecessary costs.
Leveraging Elastifile's CloudConnect UI to monitor progress of data "check in" and "check out" operations between file and object storage
This data movement is essential to our operations, since we need to visualize and analyze subsets of this data on our local desktops. Moving forward, leveraging Elastifile’s combination of data performance, parallelism, scalability, shareability and portability will help us perform more—and larger-scale—molecular analyses in shorter periods of time. This will ultimately help us find better drug candidates, faster.
Visualizing the protein structure, based on the results of the molecular dynamics analyses
As a next step, we'll work to scale the workflow to all of the unique protein structures in the PDB and perform deep-learning analysis on the resulting data to find patterns associated with protein dynamics, druggability and tight-binding ligands.

To learn more about how Elastifile supports highly-parallel, on-cloud molecular analysis on GCP, check out this demo video and be sure to visit them at www.elastifile.com.

Why you should pick strong consistency, whenever possible



Do you like complex application logic? We don’t either. One of the things we’ve learned here at Google is that application code is simpler and development schedules are shorter when developers can rely on underlying data stores to handle complex transaction processing and keeping data ordered. To quote the original Spanner paper, “we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”1

Put another way, data stores that provide transactions and consistency across the entire dataset by default lead to fewer bugs, fewer headaches and easier-to-maintain application code.

Defining database consistency


But to have an interesting discussion about consistency, it’s important to first define our terms. A quick look at different databases on the market shows that not all consistency models are created equal, and that some of the related terms can intimidate even the bravest database developer. Below is a short primer on consistency:



Term: Consistency
Definition: Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways; any data written to the database must be valid according to all defined rules.2
What Cloud Spanner supports: External consistency, which is strong consistency plus additional properties (including serializability and linearizability). All transactions across a Cloud Spanner database satisfy this property, not just those within a replica or region.

Term: Serializability
Definition: Serializability is an isolation property of transactions, where every transaction may read and write multiple objects. It guarantees that transactions behave the same as if they had executed in some serial order, even if that order differs from the order in which the transactions were actually run.3
What Cloud Spanner supports: External consistency, a stronger property than serializability: all transactions appear as if they executed in a serial order, even if some of the reads, writes and other operations of distinct transactions actually occurred in parallel.

Term: Linearizability
Definition: Linearizability is a recency guarantee on reads and writes of a register (an individual object). It doesn't group operations together into transactions, so it does not prevent problems such as write skew unless you take additional measures such as materializing conflicts.4
What Cloud Spanner supports: External consistency, a stronger property than linearizability, because linearizability says nothing about the behavior of transactions.

Term: Strong consistency
Definition: All accesses are seen by all parallel processes (or nodes, processors, etc.) in the same order (sequentially).5 In some definitions, a replication protocol exhibits "strong consistency" if the replicated objects are linearizable.
What Cloud Spanner supports: The default mode for reads in Cloud Spanner is "strong," which guarantees that a read observes the effects of all transactions that committed before the start of the operation, independent of which replica receives the read.

Term: Eventual consistency
Definition: Eventual consistency means that if you stop writing to the database and wait for some unspecified length of time, then eventually all read requests will return the same value.6
What Cloud Spanner supports: Bounded-staleness reads, which offer performance benefits similar to eventual consistency but with much stronger consistency guarantees.


Cloud Spanner, in particular, provides external consistency, which provides all the benefits of strong consistency plus serializability. All transactions (across rows, regions and continents) in a Cloud Spanner database satisfy the external consistency property, not just those within a replica. External consistency states that Cloud Spanner executes transactions in a manner that's indistinguishable from a system in which the transactions are executed serially, and furthermore, that the serial order is consistent with the order in which transactions can be observed to commit. External consistency is a stronger property than both linearizability and serializability.

Consistency in the wild


There are lots of use cases that call for external consistency. For example, a financial application might need to show users' account balances. When users make a deposit, they want to see the result of this deposit reflected immediately when they view their balance (otherwise they may fear their money has been lost!). There should never appear to be more or less money in aggregate in the bank than there really is. Another example might be a mail or messaging app: You click "send" on your message, then immediately view "sent messages" because you want to double check what you wrote. Without external consistency, the app’s request to retrieve your sent messages may go to a different replica that's behind on getting all state changes, and have no record of your message, resulting in a confusing and reduced user experience.

But what does it really mean from a technical standpoint to have external consistency? When performing read operations, external consistency means that you're reading the latest copy of your data in global order. It provides the ability to read the latest change to your data across rows, regions and continents. From a developer’s perspective, it means you can read a consistent view of the state of the entire database (not just a row or object) at any point in time. Anything less introduces tradeoffs and complexity in the application design. That in turn can lead to brittle, hard-to-maintain software and can cause innumerable maintenance headaches for developers and operators. Multi-master architectures and multiple levels of consistency are workarounds for not being able to provide the external consistency that Cloud Spanner does.

What’s the problem with using something less than external consistency? When you choose a relaxed/eventual consistency mode, you have to understand which consistency mode you need to use for each use case and have to hard code rigid transactional logic into your apps to guarantee the correctness and ordering of operations. To take advantage of "transactions" in database systems that have limited or no strong consistency across documents/objects/rows, you have to design your application schema such that you never need to make a change that involves multiple "things" at the same time. That’s a huge restriction and workarounds at the application layer are painful, complex, and often buggy.

Further, these workarounds have to be carried everywhere in the system. For example, take the case of adding a button to set your color scheme in an admin preferences panel. Even a simple feature like this is expected to be carried over immediately across the app and other devices and sessions. It needs a synchronous, strongly consistent update—or a makeshift way to obtain the same result. Using a workaround to achieve strong consistency at the application level adds a velocity-tax to every subsequent new feature—no matter how small. It also makes it really hard to scale the application dev team, because everyone needs to be an expert in these edge cases. With this example, a unit test that passes on a developer workstation does not imply it will work in production at scale, especially in high concurrency applications. Adding workarounds to an eventually consistent data store often introduces bugs that go unnoticed until they bite a real customer and corrupt data. In fact, you may not even recognize the workaround is needed in the first place.

Lots of application developers are under the impression that the performance hit of external or strong consistency is too high. And in some systems, that might be true. Additionally, we're firm believers that having choice is a good thing—as long as the database does not introduce unnecessary complexity or introduce potential bugs in the application. Inside Google, we aim to give application developers the performance they need while avoiding unnecessary complexity in their application code. To that end, we’ve been researching advanced distributed database systems for many years and have built a wide variety of data stores to get strong consistency just right. Some examples are Cloud Bigtable, which is strongly consistent within a row; Cloud Datastore, which is strongly consistent within a document or object; and Cloud Spanner, which offers strong consistency across rows, regions and continents with serializability. [Note: In fact, Cloud Spanner offers a stronger guarantee of external consistency (strong consistency + serializability), but we tend to talk about Cloud Spanner having strong consistency because it's a more broadly accepted term.]


Strongly consistent reads and Cloud Spanner


Cloud Spanner was designed from the ground up to serve strong reads (i.e., strongly consistent reads) by default with low latency and high throughput. Thanks to the unique power of TrueTime, Spanner provides strong reads for arbitrary queries without complex multi-phase consensus protocols and without locks of any kind. Cloud Spanner's use of TrueTime also provides the added benefit of being able to do global bounded-staleness reads.

Better yet, Cloud Spanner offers strong consistency for both multi-region and regional configurations. Other globally distributed databases present a dilemma to developers: customers who want to read data from geographically distributed regions forfeit strongly consistent reads, and customers who opt for strongly consistent reads forfeit the ability to read from remote regions.

To take maximum advantage of the external consistency guarantees that Cloud Spanner provides and to maximize your application's performance, we offer the following two recommendations:
  1. Always use strong reads whenever possible. Strong reads, which provide strong consistency, ensure that you're reading the latest copy of your data. Strong consistency makes application code simpler and applications more trustworthy.
  2. If latency makes strong reads infeasible in some situations, use reads with bounded staleness to improve performance in places where the very latest data isn't necessary. Bounded-staleness semantics ensure you read a consistent, guaranteed prefix of the data (for example, as of some point within a specified period of time), as opposed to eventual consistency, where you have no guarantees and your app can read almost anything forwards or backwards in time from when you queried it. (See the sketch below.)
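
Here's a minimal sketch of both read modes using the Cloud Spanner Python client library (the instance, database, table and column names are placeholders):

import datetime
from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("test-db")

# Default: a strong read, guaranteed to observe every transaction
# that committed before the read started.
with database.snapshot() as snapshot:
    rows = list(snapshot.execute_sql("SELECT Id, Balance FROM Accounts"))

# Exact staleness: read a consistent snapshot of the entire database as it
# was 15 seconds ago. Often lower latency, and never internally inconsistent.
with database.snapshot(exact_staleness=datetime.timedelta(seconds=15)) as snapshot:
    rows = list(snapshot.execute_sql("SELECT Id, Balance FROM Accounts"))
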
Forgoing strong consistency has some real risks. Strong reads across a database ensure that you're reading the latest copy of your data and that the referential integrity of the entire dataset is maintained, making it easier to reason about concurrent requests. Using weaker consistency models introduces the risk of software bugs and can be a waste of developer hours and, potentially, customer trust.

What about writes?


Strong consistency is even more important for write operations—especially read-modify-write transactions. Systems that don't provide strong consistency in such situations create a burden for application developers, as there's always a risk of putting your data into an inconsistent state.

Perhaps the most insidious type of problem is write skew. In write skew, two transactions read a set of objects and make changes to some of those objects. However, the modifications that each transaction makes affect what the other transaction should have read. For example, consider a database for an airline based in San Francisco. It’s the airline’s policy to always have a free plane in San Francisco, in the event that this spare plane is needed to replace another plane with maintenance problems or for some other need. Imagine two transactions that are both reserving planes for upcoming flights out of San Francisco:

Begin Transaction
  SELECT * FROM Airplanes WHERE location = "San Francisco" AND Availability = "Free";
  If number of free airplanes > 1:  # enforce the "one free plane" rule
    Pick one airplane
    Set its Availability to "InUse"
    Commit
  Else:
    Rollback


Without strong consistency (and, in particular, serializable isolation for these transactions), both transactions could successfully commit, thus potentially breaking our one free plane rule. There are many more situations where write skew can cause problems.7

Because Cloud Spanner was built from the ground up to be a relational database with strong, transactional consistency—even for complex multi-row and multi-table transactions—it can be used in many situations where a NoSQL database would cause headaches for application developers. Cloud Spanner protects applications from problems like write skew, which makes it appropriate for mission-critical applications in many domains including finance, logistics, gaming and merchandising.

How does Cloud Spanner differ from multi-master replication?


One topic that's often combined with scalability and consistency discussions is multi-master replication. At its core, multi-master replication is a strategy used to reduce mean time to recovery for vertically scalable database systems. In other words, it’s a disaster recovery solution, and not a solution for global, strong consistency. With a multi-master system, each machine contains the entire dataset, and changes are replicated to other machines for read-scaling and disaster recovery.

In contrast, Cloud Spanner is a truly distributed system, where data is distributed across multiple machines within a replica, and also replicated across multiple machines and multiple data centers. The primary distinction between Cloud Spanner and multi-master replication is that Cloud Spanner uses Paxos to synchronously replicate writes out of region, while still making progress in the face of single server/cluster/region failures. Synchronous out-of-region replication means that consistency can be maintained, and strongly consistent data can be served without downtime, even when a region is unavailable—no acknowledged writes are delayed or lost due to the unavailable region. Cloud Spanner's Paxos implementation elects a leader, so it's not necessary to do time-intensive quorum reads to obtain strong consistency. Additionally, Cloud Spanner shards data horizontally across servers, so individual machine failures impact less data. While a node is recovering, replicas on other clusters that contain that dataset can assume mastership easily and serve strong reads without any visible downtime to the user.

A strongly consistent solution for your mission-critical data


For storing critical, transactional data in the cloud, Cloud Spanner offers a unique combination of external, strong consistency, relational semantics, high availability and horizontal scale. Stringent consistency guarantees are critical to delivering trustworthy services. Cloud Spanner was built from the ground up to provide those guarantees in a high-performance, intuitive way. We invite you to try it out and learn more.

See more on Cloud Spanner and external consistency.

1 https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
2 https://en.wikipedia.org/wiki/Consistency_(database_systems)
3 Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017, p. 329.
4 Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017, p. 329.
5 https://en.wikipedia.org/wiki/Strong_consistency
6 Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017, p. 322.
7 Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly, 2017, p. 246.

With Multi-Region support in Cloud Spanner, have your cake and eat it too



Today, we’re thrilled to announce the general availability of Cloud Spanner Multi-Region configurations. With this release, we’ve extended Cloud Spanner’s transactions and synchronous replication across regions and continents. That means no matter where your users may be, apps backed by Cloud Spanner can read and write up-to-date (strongly consistent) data globally and do so with minimal latency for end users. In other words, your app now has an accurate, consistent view of the data it needs to support users whether they’re around the corner or around the globe. Additionally, when running a Multi-Region instance, your database is able to survive a regional failure.

This release also delivers an industry-leading 99.999% availability SLA with no planned downtime. That’s 10x less downtime (< 5min / year) than database services with four nines of availability.

Cloud Spanner is the first and only enterprise-grade, globally distributed and strongly consistent database service built specifically for the cloud that combines the benefits and familiarity of relational database semantics with non-relational scale and performance. It now supports a wider range of application workloads, from a single node in a single region to massive instances that span regions and continents. At any scale, Cloud Spanner behaves the same, delivering a single database experience.


Since we announced the general availability of Cloud Spanner in May, customers from startups to enterprises have rethought what a database can do, and have been migrating their mission-critical production workloads to it. For example, Mixpanel, a business analytics service, moved its sharded MySQL database to Cloud Spanner to handle user-id lookups when processing events from its customers' end users' web browsers and mobile devices.

No more trade-offs


For years, developers and IT organizations were forced to make painful compromises between the horizontal scalability of non-relational databases and the transactions, structured schema and complex SQL queries offered by traditional relational databases. With the increase in volume, variety and velocity of data, companies had to layer additional technologies and scale-related workarounds to keep up. These compromises introduced immense complexity and only addressed the symptoms of the problem, not the actual problem.

This summer, we announced an alliance with marketing automation provider Marketo, Inc., which is migrating to GCP and Cloud Spanner. Companies around the world rely on Marketo to orchestrate, automate and adapt their marketing campaigns via the Marketo Engagement Platform. To meet the demands of its customers today and tomorrow, Marketo needed to be able to process trillions of activities annually, creating an extreme-scale big data challenge. When it came time to scale its platform, Marketo did what many companies do: it migrated to a non-relational database stack. But if your data is inherently transactional, moving to a system without transactions makes keeping data ordered and readers consistent very hard.

"It was essential for us to have order sequence in our app logic, and with Cloud Spanner, it’s built in. When we started looking at GCP, we quickly identified Cloud Spanner as the solution, as it provided relational semantics and incredible scalability within a managed service. We hadn’t found a Cloud Spanner-like product in other clouds. We ran a successful POC and plan to move several massive services to Cloud Spanner. We look forward to Multi-Region configurations, as they give us the ability to expand globally and reduce latencies for customers on the other side of the world" 
— Manoj Goyal, Marketo Chief Product Officer

Mission-critical high availability


For global businesses, reliability is expected, but maintaining that reliability while also rapidly scaling can be a challenge. Evernote, a cross-platform app for individuals and teams to create, assemble, nurture and share ideas in any form, migrated to GCP last year. In the coming months, it will mark the next phase of its move to the cloud by migrating to a single Cloud Spanner instance to manage more than 8 billion pieces of its customers' notes, replacing over 750 MySQL instances in the process. Cloud Spanner Multi-Region support gives Evernote the confidence it needs to make this bold move.
"At our size, problems such as scalability and reliability don't have a simple answer, Cloud Spanner is a transformational technology choice for us. It will give us a regionally distributed database storage layer for our customers’ data that can scale as we continue to grow. Our whole technology team is excited to bring this into production in the coming months."
Ben McCormack, Evernote Vice President of Operations

Strong consistency with scalability and high performance


Cloud Spanner delivers scalability and global strong consistency so apps can rely on an accurate and ordered view of their data around the world with low latency. Redknee, for example, provides enterprise software to mobile operators to help them charge their subscribers for their data, voice and texts. Its customers' network traffic currently runs through traditional database systems that are expensive to operate and come with processing capacity limitations.
“We want to move from our current on-prem per-customer deployment model to the cloud to improve performance and reliability, which is extremely important to us and our customers. With Cloud Spanner, we can process ten times more transactions per second (using a current benchmark of 55k transactions per second), allowing us to better serve customers, with a dramatically reduced total cost of ownership." 
— Danielle Royston, CEO, Redknee

Revolutionize the database admin and management experience


Standing up a globally consistent, scalable relational database instance is usually prohibitively complex. With Cloud Spanner, you can create an instance in just a few clicks and then scale it simply using the Cloud Console or programmatically. This simplicity revolutionizes database administration, freeing up time for activities that drive the business forward and enabling new and unique end-user experiences.

A different way of thinking about databases


We believe Cloud Spanner is unique among databases and cloud database services, offering a global relational database, not just a feature to eventually copy or replicate data around the world. At Google, Spanner powers apps that process billions of transactions per day across many Google services. In fact, it has become the default database internally for apps of all sizes. We’re excited to see what your company can do with Cloud Spanner as your database foundation.

Want to learn more? Check out the many whitepapers discussing the technology behind Cloud Spanner. Then, when you’re ready to get started, follow our Quickstart guide to Cloud Spanner, or Kelsey Hightower’s post How to get started with Cloud Spanner in 5 minutes.

Commvault and Google Cloud partner on cloud-based data protection and simpler “lift and shift” to the cloud



Today at Commvault Go 2017, we announced a new strategic alliance with Commvault to enable you to benefit from advanced data protection in the cloud as well as on-premises, and to make it easier to “lift-and-shift” workloads to Google Cloud Platform (GCP).

At Google Cloud, we strive to provide you with the best offerings not just to store but also to use your data. For example, if you’re looking for data protection, you can benefit from our unique Coldline class as part of Google Cloud Storage, which provides immediate access to your data at archival storage prices. You can test this for free. Try serving an image or video directly from the Coldline storage tier and it will return within milliseconds. Then there’s our partner Forsythe, whose data analytics-as-a-service offering allows you to bring your backup data from Commvault to Google Cloud Storage and then analyze it using GCP machine learning and data loss prevention services.

We work hard with our technology partners to deliver solutions that are easy to use and cost-effective. We're working with Commvault on a number of initiatives, specifically:
  • Backup to Google Cloud Storage Coldline: If you use Commvault, you can now use Coldline in addition to Regional and Nearline classes as your storage target. Check out this video to see how easy it is to set up Cloud Storage with Commvault.
  • Protect workloads in the cloud: As enterprises move their applications to Google Compute Engine, you can use the same data protection policies that you use on-premises with Commvault's data protection software. Commvault supports a wide range of common enterprise applications, from SAP, Exchange, SQL, DB2 and PostgreSQL to big data applications such as GPFS, MongoDB, Hadoop and many more.
  • G Suite backup with Commvault: You can now use the Commvault platform to back up and recover data from G Suite applications such as Gmail and Drive.
We're excited to work with Commvault to bring more capabilities to our joint customers in the future, such as enhanced data visibility via analytics and the ability to migrate and/or recover VMs in Compute Engine for on-premises workloads.

If you’re planning to attend Commvault Go this week, visit our booth to learn more about our partnership with Commvault and how to use GCP for backup and disaster recovery with Commvault!

Cloud SQL for PostgreSQL adds high availability and replication


Cloud SQL for PostgreSQL users, we heard you loud and clear: we've added support for high availability (HA) and read replicas, helping you ensure your database workloads are fault tolerant.

The beta release of high availability provides isolation from failures, while read replicas provide additional read performance for demanding workloads.
"As a global retail company, who uses digital innovation and data collaboration to enrich consumers' experiences with retailers and venues, we have very high availability requirements and we trust Cloud SQL for PostgreSQL with our data." 
— Peter McInerney, Senior Director of Technical Operations at Westfield Retail Solutions
"We love Postgres and rely upon it for many production workloads. While our Compute Engine VMs running Postgres have never gone down, the added peace of mind provided by HA and read replicas combined with reduction in operations makes the decision to move to the new Cloud SQL for PostgreSQL a simple one." 
— Jason Vertrees, CTO at RealMassive
Additional enhancements include database instance cloning and higher-performance instances with up to 64 vCPU cores and 416 GB of RAM. Cloud SQL for PostgreSQL is also now covered by the Google Cloud Business Associate Agreement (BAA) for HIPAA-covered customers. And in case you missed it, we added support for 19 extensions this summer.
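
Instance cloning, for example, can be done with a single gcloud command along these lines (the instance names are placeholders):

$ gcloud sql instances clone my-instance my-instance-clone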


Understanding the high availability configuration


The high availability configuration on Cloud SQL for PostgreSQL is backed by Google's new Regional Disks, which synchronously replicate data at the block level between two zones in a region. Cloud SQL continuously health-checks HA instances and automatically fails over if an instance is unhealthy. The combination of synchronous disk replication and automatic failover provides isolation from many types of infrastructure, hardware and software failures.

What triggers a failover?

Both the primary and standby instances of Cloud SQL for PostgreSQL send heartbeat signals, which are evaluated to determine the availability state of the primary instance. If Cloud SQL fails to detect multiple heartbeats from the primary instance (and the standby instance is healthy), it starts a failover operation.

What happens during and after a failover?


During failover, Cloud SQL transfers the IP address and name of the primary instance to the standby instance and initializes the database. After failover, an application resumes its connection to the new primary instance without needing to change its connection string, because the IP address moves automatically. Regional disks, meanwhile, ensure that all previously committed database transactions, right up to the time of the failure, are persisted and available after failover.

How to create a high availability instance


Creating a new HA instance is easy, as is upgrading existing Cloud SQL for PostgreSQL single instances. When creating or editing an instance, expand the "Configuration options" and select "High availability (regional)" in the Availability section:
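
If you prefer the command line, an equivalent instance can be created with gcloud along these lines (the instance name, machine size and region are placeholders):

$ gcloud sql instances create my-postgres \
    --database-version=POSTGRES_9_6 \
    --availability-type=REGIONAL \
    --cpu=2 --memory=8GiB \
    --region=us-central1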


Quickly scale out with read replicas


Read replicas are useful for scaling out read load and for ad-hoc reporting. To create a read replica, select your primary Cloud SQL for PostgreSQL instance and click "Replicas" on the Instance details page.
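
The same can be scripted: a read replica is simply a new instance that points back at its primary (instance names are placeholders):

$ gcloud sql instances create my-postgres-replica \
    --master-instance-name=my-postgres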

By combining high availability instances and read replicas, you can build a fault-tolerant, high performance PostgreSQL database cluster:

Get started


Sign up for a $300 credit to try Cloud SQL and the rest of GCP. Start with inexpensive micro instances for testing and development, and then, when you’re ready, you can easily scale them up to serve performance-intensive applications. As a bonus, everyone gets the 100% sustained use discount during the beta period, regardless of usage.

With Cloud SQL, we still feel like we're just getting started. We hope you'll come along for the ride and let us know what you need to be successful on the Issue Tracker and in the Cloud SQL discussion group. Please keep the feedback coming!

Cloud SQL for PostgreSQL updated with new extensions



Among relational databases, PostgreSQL is the open-source solution of choice for a wide range of workloads. Back in March, we added support for PostgreSQL in Cloud SQL, our managed database service, with a limited set of features and extensions. Since then, we’ve been amazed by your interest, with many of you taking the time to suggest desired PostgreSQL extensions on the Issue Tracker and the Cloud SQL discussion group. This feedback has resulted in us adding the following 19 extensions, across four categories:
  • PostGIS: better support for geographic applications
  • Data type: a variety of new data types
  • Language: enhanced functionality with new processing languages
  • Miscellaneous: text search, cryptographic capabilities and integer aggregators, to name but a few
An extension is a piece of software that adds functionality, often data types and procedural languages, to PostgreSQL itself. If you already have a Cloud SQL for PostgreSQL database instance running, you can enable one or more of these extensions.
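
Enabling an extension is an ordinary PostgreSQL statement run against your database; for example, to turn on PostGIS:

CREATE EXTENSION IF NOT EXISTS postgis;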

We're continuing our journey with PostgreSQL on Cloud SQL. As we prepare for general availability, we’re working on automatic failover for high availability, read replicas, additional extensions and precise restores with point-in-time recovery. Stay tuned!

Thanks for your feedback and please keep it coming on the Issue Tracker and in the Cloud SQL discussion group! Your input helps shape the future of Cloud SQL and all Google Cloud products.

Guest post: How Seenit uses Google Cloud Platform and Couchbase to power our video collaboration platform



Editor’s Note: In this guest post, Seenit CTO Dave Starling walks us through how they use Google Cloud Platform (GCP) and Couchbase to build their innovative crowdsourced video platform.

Since we started Seenit in 2014, our goal has been to give businesses the tools to tell interesting stories through crowdsourced video. But getting there wasn't simple. What we envisioned for Seenit didn't exist at the time we started, challenging us to define our product architecture from scratch. We learned a lot, which is why today I thought I'd share a little about how we're using Couchbase and GCP to bring Seenit to life.

When we first began looking at what we wanted to build as a platform, we came up with a list of requirements for our database and cloud provider. We chose to run Couchbase on GCP because it offered us a distributed architecture that's highly scalable and globally available. Our clients are typically large enterprises, sometimes in dozens of countries all over the world. We wanted to make sure that everyone, no matter where they are, gets a consistently good user experience.

By applying Couchbase’s N1QL and Full Text Search (FTS) with Google Cloud Machine Learning APIs, our customers can easily filter submissions by objects, words or phrases. And because everything is on GCP, we can duplicate our entire platform within minutes on 12 VMs.

Here’s how it works:

  1. We use Google Compute Engine to autoscale between two and 20 servers.
  2. Google Cloud Storage allows for unified object storage and retrieval. Near-infinite scalability means the service can handle everything from small applications to exabyte-scale systems.
  3. Couchbase’s Full Text Search (FTS) enables us to examine all the words in every document and match them with designated criteria.
  4. Cloud Machine Learning APIs sort clips by objects, gender of speakers and sentiment. The APIs all speak the same language so communication is seamless.

Last year, when we began looking for a machine learning platform, we wanted something that would talk JSON, store JSON and search JSON. We knew a machine learning platform that did all of that would integrate nicely into our Couchbase system. TensorFlow fit our criteria. We love that it isn’t restricted. We can build our own domain-specific models and use Google tools to train them.

Although TensorFlow is an open source machine learning platform, we use it through Cloud Machine Learning Engine. It's a fully managed service, which is great for us because we don't need to build and manage our own hardware. This allows us to do a lot of manipulation and extract a lot of really interesting data. It's fully integrated with Couchbase, especially with full text search but also with N1QL, so we can search, extract intelligence and provide value to our customers. It's a serverless architecture with the added advantage of Google's custom hardware.

It’s also been great that we feel engaged with the community and product and engineering teams. As a startup, it’s important to feel like you can stand on the shoulders of giants, so to speak. The support we get from organizations like Google and Couchbase allow us to do lots of things that we otherwise wouldn’t be able to do with the resources we had.

There’s plenty more to share, but I’ll stop here. If you want to learn more, you might want to check out the joint talk GCP Product Manager Anil Dhawan and I recently gave at Couchbase Connect.

I also recommend checking out Couchbase and other tools on Cloud Launcher. You can use free trial credits to play around and even deploy something of your own. Good luck!

How to get started with Cloud Spanner in 5 minutes




The general availability of Cloud Spanner is really good news for developers. For the first time, you have direct access to horizontally-scaling, cloud-native infrastructure that provides global transactions (think apps that involve payments, inventory, ticketing, or financial trading) and that defaults to the “gold standard” of strong/external consistency without compromising latency. Try that with either a traditional RDBMS or non-relational database.


Thanks to the GCP Free Trial that offers $300 in credits for one year, you can get your feet wet with a single-node Cloud Spanner instance over the course of a couple of weeks. Here's how to do that, using the Spanner API via gcloud. (Click here for the console-based approach.)


  1. In Cloud Console, go to the Projects page and either create a new project or open an existing project by clicking on the project name.
  2. Open a terminal window and set your project as the default for gcloud. Do this by substituting your project ID (not project name) with the command:

gcloud config set project [MY_PROJECT_ID]

  3. Enable billing for your project.
  4. Enable the Cloud Spanner API for your project.
  5. Set up authentication and authorization (Cloud Spanner uses OAuth 2.0 out of the box) with the following command:


    gcloud auth application-default login

    API client libraries now automatically pick up the created credentials. You need to run the command only once per local user environment. (Note: This approach is suitable for local development; for production use, you’ll want to use a different method for auth.)
  6. Next, create a single-node instance:


    gcloud spanner instances create test-instance \
    --config=regional-us-central1 \
    --description="Test Instance" --nodes=1

  7. Finally, create a database. To create a database called test-db:


    gcloud spanner databases create test-db --instance=test-instance
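
To confirm the database is ready, you can run a trivial query against it:

    gcloud spanner databases execute-sql test-db --instance=test-instance \
      --sql='SELECT 1'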


That’s it — you now have your very own Cloud Spanner database. Again, your GCP credit should allow you to run it cost-free for a couple weeks. From there, you can download sample data and interact with it using the language of your choice.