Category Archives: Google Cloud Platform Blog

Product updates, customer stories, and tips and tricks on Google Cloud Platform

Get higher availability with Regional Persistent Disks on Google Kubernetes Engine



Building highly available stateful applications on Kubernetes has long been a challenge. As a result, many enterprises write complex application logic on top of the Kubernetes APIs to run applications such as databases and distributed file systems.

Today, we’re excited to announce the beta launch of Regional Persistent Disks (Regional PD) for Kubernetes Engine, making it easier for organizations of all sizes to build and run highly available stateful applications in the cloud. Whereas standard Persistent Disks (PD) are zonal resources, so applications built on top of PDs can become unavailable in the event of a zonal failure, Regional PDs provide network-attached block storage with synchronous replication of data between two zones in a region. This approach maximizes application availability without sacrificing consistency. Regional PD automatically handles transient storage unavailability in a zone, and provides an API to facilitate cross-zone failover (learn more about this in the documentation).

Regional PD has native integration with the Kubernetes master that manages health monitoring and failover to the secondary zone in case of an outage in the primary zone. With a Regional PD, you also take advantage of replication at the storage layer, rather than worrying about application-level replication. This offers a convenient building block for implementing highly available solutions on Kubernetes Engine, and can provide cross-zone replication to existing legacy services. Internally, for instance, we used Regional PD to implement high availability in Google Cloud SQL for Postgres.

To understand the power of Regional PD, imagine you want to deploy WordPress with MySQL in Kubernetes. Before Regional PD, if you wanted to build an HA configuration, you needed to write complex application logic, typically using custom resources, or use a commercial replication solution. Now, you can simply use Regional PD as the storage backend for WordPress and MySQL. Because of the block-level replication, the data is always present in a second zone, so you don’t need to worry if there is a zonal outage.

With Regional PD, you can build a two-zone HA solution by simply changing the storage class definition in the dynamic provisioning specification—no complex Kubernetes controller management required!

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: repd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
  zones: us-central1-a, us-central1-b
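
Once the storage class exists, a workload can claim a regional disk through a PersistentVolumeClaim that references it by name. Here is a minimal sketch (the claim name and requested size are illustrative):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mysql-data            # illustrative name
spec:
  storageClassName: repd      # the StorageClass defined above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi          # illustrative size
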
Alternatively, you can manually provision a Regional PD using the gcloud command line tool:
gcloud beta compute disks create gce-disk-1 \
    --region europe-west1 \
    --replica-zones europe-west1-b,europe-west1-c
Regional PDs are now available in Kubernetes Engine clusters. To learn more, check out the documentation, Kubernetes guide and sample solution using Regional PD.

Better cost control with Google Cloud Billing programmatic notifications



By running your workloads on Google Cloud Platform (GCP) you have access to the tools you need to build and scale your business. At the same time, it’s important to keep your costs under control by informing users and managing their spending.

Today, we’re adding programmatic budget notifications to Google Cloud Billing, a powerful feature that helps you stick to your budget and take automatic action when your spending gets off track.

Monitor your costs
You can use Cloud Billing budget notifications with third-party or homegrown cost-management solutions, as well as Google Cloud services. For example, as an engineering manager, you can set up budget notifications to alert your entire team through Slack every time you hit 80 percent of your budget.

Control your costs
You can also configure automated actions based on the notifications to control your costs, such as selectively turning off particular resources or terminating all resources for a project. For example, as a PhD student working at a lab with a fixed grant amount, you can use budget notifications to trigger a cap to terminate your project when you use up your grant. This way, you can be confident that you won’t go over budget.

Work with your existing workflow and tools
To make it easy to get started with budget notifications, we’ve included examples of reference architectures for a few common use cases in our documentation:
  • Monitoring - listen to your Pub/Sub notifications with Cloud Functions
  • Forward notifications to Slack - send custom billing alerts with the current spending for your budget to a Slack channel
  • Cap (disable) billing on a project - disable billing for a project and terminate all resources to make sure you don’t overspend
  • Selectively control resources - terminate expensive resources without disabling your whole environment
Get started
You can set up programmatic budget notifications in a few simple steps:

  1. Navigate to Billing in the Google Cloud Console and create your budget.
  2. Enable Cloud Pub/Sub, then set up a Cloud Pub/Sub topic for your budget.
  3. When creating your budget, you will see a new section, "Manage notifications," where you can configure your Cloud Pub/Sub topic.
  4. Set up a Cloud Function to listen to budget notifications and trigger an action (a minimal sketch follows this list).
  5. Cloud Billing sends budget notifications multiple times per day, so you will always have the most up-to-date information on your spending.
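
For reference, here is a minimal sketch of step 4, written in Python as a Pub/Sub-triggered Cloud Function. The function name and payload field names are illustrative; check the budget notification documentation for the exact message schema.

import base64
import json

def handle_budget_notification(event, context):
    """Triggered by a Cloud Pub/Sub message published for a Cloud Billing budget."""
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    # Field names are illustrative; see the budget notification docs for the schema.
    cost = float(payload.get('costAmount', 0))
    budget = float(payload.get('budgetAmount', 0))
    if budget and cost >= budget:
        print('Budget exceeded: spent {} of {}'.format(cost, budget))
        # Take action here, e.g. disable billing on the project via the
        # Cloud Billing API, or post an alert to a Slack channel.
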
You can get started today by reading the Google Cloud Billing documentation. If you’ll be at Google Cloud Next ‘18, be sure to come by my session on Google Cloud Billing and cost control.

Google Cloud named a leader in latest Forrester Research Public Cloud Platform Native Security Wave



Today, we are happy to announce that Forrester Research has named Google Cloud as one of just two leaders in The Forrester Wave™: Public Cloud Platform Native Security (PCPNS), Q2 2018 report, and rated Google Cloud highest in strategy. The report evaluates the native security capabilities and features of public cloud providers, such as encryption, identity and access management (IAM) and workload security.

The report finds that most security and risk (S&R) professionals now believe that “native security capabilities of large public cloud platforms actually offer more affordable and superior security than what S&R teams could deliver themselves if the workloads remained on-premises.”

The report particularly highlights that “Google has been continuing to invest in PCPNS. The platform’s security configuration policies are very granular in the admin console as well as in APIs. The platform has a large number of security certifications, broad partner ecosystem, offers deep native support for guest operating systems and Kubernetes containers, and supports auto-scaling (GPUs can be added to instances).”

In this wave, Forrester evaluated seven public cloud platforms against 37 criteria, looking at current offerings, strategy and market presence. Of the seven vendors, Google Cloud scored highest on strategy, and received the highest score possible for its strategic plans in the physical security, certifications and attestations, hypervisor security, guest OS workload security, network security, and machine learning criteria.

Further, Forrester cited Google Cloud’s security roadmap. As part of our roadmap, Google Cloud continues to redefine what’s possible in the cloud with unique security capabilities like Access Transparency, Istio, Identity-Aware Proxy, VPC Service Controls, and Asylo.

“The vendor plans: to 1) provide ongoing security improvements to the admin console using device trust, location, etc., 2) implement hardware-backed encryption key management, and 3) improve visibility into the platform by launching a unified risk dashboard.”

At Google, we have worked for over a decade to build a secure, scalable and flexible cloud foundation. Our belief is that if you put security first, everything else will follow. Security continues to be top of mind—from our custom hardware like our Titan chip, to data encryption both at rest and in transit by default. On this strong foundation, we offer enterprises a rich set of controls and capabilities to meet their security and compliance requirements.

You can download the full Forrester Public Cloud Platform Native Security Wave Q2 2018 report here. To learn more about GCP, visit our website, and sign up for a free trial.

Dialogflow adds versioning and other new features to help enterprises build vibrant conversational experiences



At Google I/O recently, we announced that Dialogflow has been adopted by more than half a million developers, and that number is growing rapidly. We also released new features that make it easier for enterprises to create, maintain, and improve a vibrant conversational experience powered by AI: versions and environments, an improved history tool for easier debugging, and support for training models with negative examples.

Versions and Environments BETA
Dialogflow’s new versions and environments feature gives enterprises a familiar approach for building, testing, deploying, and iterating conversational interfaces. Using this beta feature, you can deploy multiple versions of agents (which represent the conversational interface of your application, device, or bot in Dialogflow) to separate, customizable environments, giving you more control over how new features are built, tested, and deployed. For example, you may want to maintain one version of your agent in production and another in your development environment, do quality assurance on a version that contains just a subset of features, or develop different draft versions of your agent at the same time.
Creating versions and environments
Publishing a version


Managing versions within an environment

You can publish your agent either to Google Assistant production or to the Actions Console's Alpha and Beta testing environments. You can even invite and manage testers of your digital agents—up to 20 for Alpha releases, and up to 200 for Beta releases!

Get started right away: Explore this tutorial to learn more about how to activate and use versions and environments in Dialogflow.
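
For example, once you publish a version to an environment, a client can direct detect-intent requests at that environment by scoping the session path. Here is a rough Python sketch using the Dialogflow V2 client library; the environment-scoped session path format and the names below are assumptions, so check the API reference for the exact format.

import dialogflow_v2 as dialogflow

def detect_intent_in_environment(project_id, environment, session_id, text):
    """Sends a detect-intent request to a specific agent environment."""
    client = dialogflow.SessionsClient()
    # Assumed environment-scoped session path; '-' stands for an anonymous user.
    session = 'projects/{}/agent/environments/{}/users/-/sessions/{}'.format(
        project_id, environment, session_id)
    query_input = dialogflow.types.QueryInput(
        text=dialogflow.types.TextInput(text=text, language_code='en-US'))
    response = client.detect_intent(session=session, query_input=query_input)
    return response.query_result.fulfillment_text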

Improved conversation history tool for easier debugging
Dialogflow’s history tool now cleanly displays conversations between your users and your agent, and flags places where your agent was unable to match an intent. It also links to diagnostics via new integration with Google Stackdriver Logging so you can easily diagnose and quickly fix performance issues. We’ve also expanded the diagnostic information shown in the test console, so you can see the raw request being sent to your fulfillment webhook as well as the response.

Training with negative examples for improved precision
It can be frustrating for end users when certain phrases trigger unwanted intents. To improve your digital agent’s precision, you can now add negative examples as training phrases for fallback intents. For example, by providing the negative example “Buy bus ticket to San Francisco” in the Default Fallback Intent of an agent that only books flights, the agent will no longer classify that request as a purchase intent for an airplane ticket; instead, it will respond by clarifying which methods of transportation are supported.


Try Dialogflow today using a free credit
See the quickstart to set up a Google Cloud Platform project and quickly create a digital agent with Dialogflow Enterprise Edition. Remember, you get a $300 free credit to get started with any GCP product (good for 12 months).

Introducing Shared VPC for Google Kubernetes Engine



Containers have come a long way, from the darling of cloud-native startups to the de facto way to deploy production workloads in large enterprises. But as containers grow into this new role, networking continues to pose challenges in terms of management, deployment and scaling.

We are excited to introduce Shared Virtual Private Cloud (VPC) to help Google Kubernetes Engine tackle the scale and reliability needs of container networking for enterprise workloads.

Shared VPC for better control of enterprise network resources
In large enterprises, you often need to place different departments into different projects, for purposes of budgeting, access control, accounting, etc. And while isolation and network segmentation are recommended best practices, network segmentation can pose a challenge for sharing resources. In Compute Engine environments, Shared VPC networks let enterprise administrators give multiple projects permission to communicate via a single, shared virtual network without giving those projects control of critical resources such as firewalls. Now, with the general availability of Kubernetes Engine 1.10, we are extending the Shared VPC model to Kubernetes Engine clusters running version 1.8 and above, connecting clusters from multiple projects to a common VPC network.

Before Shared VPC, it was possible to approximate this setup, in a limited and less secure way, by bridging each project’s own VPC with Cloud VPNs. The problem with this approach was that it required N*(N-1)/2 connections to obtain full connectivity between N projects. An additional challenge was that the network for each cluster wasn’t fully configurable, making it difficult for one project to communicate with another without a NAT gateway in between. Security was another concern, since the organization administrator had no control over firewalls in the other projects.

Now, with Shared VPC, you can overcome these challenges and compartmentalize Kubernetes Engine clusters into separate projects, for the following benefits to your enterprise:
  • Sharing of common resources - Shared VPC makes it easy to use network resources that must be shared across various teams, such as a set of IPs within RFC 1918 shared across multiple subnets and Kubernetes Engine clusters.
  • Security - Organizations want to leverage more granular IAM roles to separate access to sensitive projects and data. By restricting what individual users can do with network resources, network and security administrators can better protect enterprise assets from inadvertent or deliberate acts that can compromise the network. While a network administrator can set firewalls for every team, a cluster administrator’s responsibilities might be to manage workloads within a project. Shared VPC provides you with centralized control of critical network resources under the organization administrator, while still giving flexibility to the various organization admins to manage clusters in their own projects.
  • Billing - Teams can use projects and separate Kubernetes Engine clusters to isolate their resource usage, which helps with accounting and budgeting needs by letting you view billing for each team separately.
  • Isolation and support for multi-tenant workloads - You can break up your deployment into projects and assign them to the teams working on them, lowering the chances that one team’s actions will inadvertently affect another team’s projects.

Below is an example of how you can map your organizational structure to your network resources using Shared VPC:

Here is what Spotify, one of the many enterprises using Kubernetes Engine, has to say:
“Kubernetes is our preferred orchestration solution for thousands of our backend services because of its capabilities for improved resiliency, features such as autoscaling, and the vibrant open-source community. Shared VPC in Kubernetes Engine is essential for us to be able to use Kubernetes Engine with our many GCP projects."
- Matt Brown, Software Engineer, Spotify

How does Shared VPC work?
The host project contains one or more shared network resources, while the service projects map to the different teams or departments in your organization. After setting up the correct IAM permissions for service accounts in both the host and service projects, the cluster admin can instantiate a number of Compute Engine resources in any of the service projects. This way, critical resources like firewalls are centrally managed by the network or security admin, while cluster admins are able to create clusters in the respective service projects.

Shared VPC is built on top of Alias IP. Kubernetes Engine clusters in service projects will need to be configured with a primary CIDR range (from which to draw Node IP addresses), and two secondary CIDR ranges (from which to draw Kubernetes Pod and Service IP addresses). The following diagram illustrates a subnet with the three CIDR ranges from which the clusters in the Shared VPC are carved out.

The Shared VPC IAM permissions model
To get started with Shared VPC, the first step is to set up the right IAM permissions on service accounts. For the cluster admin to be able to create Kubernetes Engine clusters in the service projects, the host project administrator needs to grant the compute.networkUser and container.hostServiceAgentUser roles in the host project, allowing the service project's service accounts to use specific subnetworks and to perform networking administrative actions to manage Kubernetes Engine clusters. For more detailed information on the IAM permissions model, follow along here.
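
As a rough sketch, granting those roles with gcloud might look like the following; the project IDs, project number, subnet name, and region are placeholders, and the exact command groups may differ across gcloud releases.

# Let the service project's Kubernetes Engine service account perform the
# networking administrative actions it needs in the host project.
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
    --member "serviceAccount:service-SERVICE_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
    --role "roles/container.hostServiceAgentUser"

# Let the service project's service accounts use a specific shared subnet.
gcloud beta compute networks subnets add-iam-policy-binding SHARED_SUBNET_NAME \
    --project HOST_PROJECT_ID \
    --region us-central1 \
    --member "serviceAccount:SERVICE_PROJECT_NUMBER@cloudservices.gserviceaccount.com" \
    --role "roles/compute.networkUser"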

Try it out today!
Create a Shared VPC cluster in Kubernetes Engine and get the ease of access and scale for your enterprise workloads. Don’t forget to sign up for the upcoming webinar, 3 reasons why you should run your enterprise workloads on Kubernetes Engine.

Google Kubernetes Engine 1.10 is generally available and ready for the enterprise



Today, we’re excited to announce the general availability of Google Kubernetes Engine 1.10, which lays the foundation for new features to enable greater enterprise usage. Here on the Kubernetes Engine team, we’ve long been thinking about challenges that are critical to the enterprise, such as security, networking, logging, and monitoring. Now, in parallel with the GA of Kubernetes Engine 1.10, we are introducing a train of new features to support enterprise use cases. These include:
  • Shared Virtual Private Cloud (VPC) for better control of your network resources
  • Regional Persistent Disks and Regional Clusters for higher availability and stronger SLAs
  • Node Auto-Repair GA, and Custom Horizontal Pod Autoscaler for greater automation
Better yet, these all come with the robust security that Kubernetes Engine provides by default.
Let’s take a look at some of the new features that we will add to Kubernetes Engine 1.10 in the coming weeks.
Networking: global hyperscale network for applications with Shared VPC
Large organizations with several teams prefer to share physical resources while maintaining logical separation of resources between departments. Now, you can deploy your workloads in Google’s global Virtual Private Cloud (VPC) in a Shared VPC model, giving you the flexibility to manage access to shared network resources using IAM permissions while still isolating your departments. Shared VPC lets your organization administrators delegate administrative responsibilities, such as creating and managing instances and clusters, to service project admins while maintaining centralized control over network resources like subnets, routes, and firewalls. Stay tuned for more on Shared VPC support this week, where we’ll demonstrate how enterprise users can separate resources owned by projects while allowing them to communicate with each other over a common internal network.

Storage: high availability with Regional Persistent Disks
To make it easier to build highly available solutions, Kubernetes Engine will provide support for the new Regional Persistent Disk (Regional PD). Regional PD, available in the coming days, provides durable network-attached block storage with synchronous replication of data between two zones in a region. With Regional PDs, you don’t have to worry about application-level replication and can take advantage of replication at the storage layer. This replication offers a convenient building block for implementing highly available solutions on Kubernetes Engine.
Reliability: improved uptime with Regional Clusters, node auto-repair
Regional clusters, to be generally available soon, allow you to create a Kubernetes Engine cluster with a multi-master, highly available control plane that spreads your masters across three zones in a region—an important feature for clusters with higher uptime requirements. Regional clusters also offer a zero-downtime upgrade experience when upgrading Kubernetes Engine masters. In addition to regional clusters, the node auto-repair feature is now generally available. Node auto-repair monitors the health of the nodes in your cluster and repairs any that are unhealthy.
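
For instance, here is a sketch of creating a regional cluster with node auto-repair enabled from the command line; the cluster name and region are placeholders, and the flags may still live in the beta command group depending on your gcloud release.

gcloud beta container clusters create my-regional-cluster \
    --region us-central1 \
    --enable-autorepair
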
Auto-scaling: Horizontal Pod Autoscaling with custom metrics
Our users have long asked for the ability to scale horizontally any way they like. In Kubernetes Engine 1.10, Horizontal Pod Autoscaler supports three different custom metrics types in beta: External (e.g., for scaling based on Cloud Pub/Sub queue length - one of the most requested use cases), Pods (e.g., for scaling based on the average number of open connections per pod) and Object (e.g., for scaling based on Kafka running in your cluster).
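
To illustrate, here is a minimal HorizontalPodAutoscaler sketch that scales a hypothetical Deployment on Cloud Pub/Sub backlog using the External metric type. It assumes the Stackdriver custom metrics adapter is running in the cluster; the Deployment name, subscription, and target value are placeholders.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub-worker          # hypothetical Deployment consuming the subscription
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
      metricSelector:
        matchLabels:
          resource.labels.subscription_id: my-subscription   # placeholder subscription
      targetAverageValue: 100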

Kubernetes Engine enterprise adoption

Since we launched it in 2014, Kubernetes has taken off like a rocket. It is becoming “the Linux of the cloud,” according to Jim Zemlin, Executive Director of the Linux Foundation. Analysts estimate that 54 percent of Fortune 100 companies use Kubernetes across a spectrum of industries including finance, manufacturing, media, and others.
Kubernetes Engine, the first production-grade managed Kubernetes service, has been generally available since 2015. Core-hours for the service have ballooned: in 2017 Kubernetes Engine core-hours grew 9X year over year, supporting a wide variety of applications. Stateful workload (e.g. databases and key-value stores) usage has grown since the initial launch in 2016, to over 40 percent of production Kubernetes Engine clusters.
Here is what a few of the enterprises that are using Kubernetes Engine have to say.
Alpha Vertex, a financial services company that delivers advanced analytical capabilities to the financial community, built a Kubernetes cluster of 150 64-core Intel Skylake processors in just 15 minutes and trains 20,000 machine learning models concurrently using Kubernetes Engine.
“Google Kubernetes Engine is like magic for us. It’s the best container environment there is. Without it, we couldn’t provide the advanced financial analytics we offer today. Scaling would be difficult and prohibitively expensive.”
- Michael Bishop, CTO and Co-Founder, Alpha Vertex
Philips Lighting builds lighting products, systems, and services. Philips uses Kubernetes Engine to handle 200 million transactions every day, including 25 million remote lighting commands.
“Google Kubernetes Engine delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency.”
- George Yianni, Head of Technology, Home Systems, Philips Lighting
Spotify, the digital music service that hosts more than 2 billion playlists and gives consumers access to more than 30 million songs, uses Kubernetes Engine for thousands of backend services.
“Kubernetes is our preferred orchestration solution for thousands of our backend services because of its capabilities for improved resiliency, features such as autoscaling, and the vibrant open source community. Shared VPC in Kubernetes Engine is essential for us to be able to use Kubernetes Engine with our many GCP projects.”
- Matt Brown, Software Engineer, Spotify
Get started today and let Google Cloud manage your enterprise applications on Kubernetes Engine. To learn more about Kubernetes Engine, join us for a deep dive into the Kubernetes 1.10 enterprise features in Kubernetes Engine by signing up for our upcoming webinar, 3 reasons why you should run your enterprise workloads on Kubernetes Engine.

Sharding of timestamp-ordered data in Cloud Spanner



Cloud Spanner was designed from the ground up to offer horizontal scalability and a developer-friendly SQL interface. Because Cloud Spanner is a managed service, Google Cloud handles most database management tasks, but it’s up to you to ensure that there are no hotspots, as described in Schema Design Best Practices and Optimizing Schema Design for Cloud Spanner. In this article, we’ll look at how to efficiently insert and retrieve records with timestamp ordering. We’ll start with the high-level guidance provided in Anti-pattern: timestamp ordering and explore the scenario in more detail with a concrete example.

Scenario

Let’s say we’re building an app that logs user activity along with timestamps and also allows users to query this activity by user id and time range. A good primary key for the table storing user activity (let’s call it LogEntries) is (UserId, Timestamp), as this gives us a uniform distribution of activity logs. Log entries arrive in timestamp order, but because UserId is the first component of the key, writes are naturally spread across the keyspace, resulting in a uniform key distribution.

Table LogEntries

UserId (PK)        Timestamp (PK)                  LogEntry
15b7bd1f-8473      2018-05-01T15:16:03.386257Z


Here’s a sample query to retrieve a list of log entries by user and time range:

SELECT UserId, Timestamp, LogEntry
FROM LogEntries
   WHERE UserId = '15b7bd1f-8473'
   AND Timestamp BETWEEN '2018-05-01T15:14:10.386257Z'
   AND '2018-05-01T15:16:10.386257Z';
This query takes advantage of the primary key and thus performs well.

Now let’s make things more interesting. What if we wanted to group users by the company they work for so we can segment reports by company? This is a fairly common use case for Cloud Spanner, especially with multi-tenant SaaS applications. To support this, we create a table with the following schema.
Table LogEntries


CompanyId (PK)    UserId (PK)        Timestamp (PK)                  LogEntry
Acme              15b7bd1f-8473      2018-05-01T15:16:03.386257Z


And here’s the corresponding query to retrieve the log entries:

SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries
   WHERE CompanyId = 'Acme'
   AND UserId = '15b7bd1f-8473'
   AND Timestamp BETWEEN '2018-05-01T15:14:10.386257Z'
   AND '2018-05-01T15:16:10.386257Z';


Here’s the query to retrieve log entries by CompanyId and time range (user field not specified):

SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries
   WHERE CompanyId = 'Acme'
   AND Timestamp BETWEEN '2018-05-01T15:14:10.386257Z'
   AND '2018-05-01T15:16:10.386257Z';
To support the above query, we add a separate secondary index. Initially, we include just two columns:

CREATE INDEX LogEntriesByCompany ON LogEntries(CompanyId, Timestamp)

Challenge: hotspots during inserts


The challenge here is that some companies may have many more users (orders of magnitude more) than others, resulting in a very skewed distribution of log entries. The problem is particularly acute during inserts, as described in the opening paragraph above. And even if Cloud Spanner helps out by creating additional splits, the nodes that service the new splits become hotspots due to the uneven key distribution.

The above diagram depicts a scenario where Company B has three times more users than Company A or Company C. Therefore, log entries corresponding to Company B grow at a higher rate, resulting in the hotspotting of nodes that service the splits where Company B’s log entries are being inserted.

Hotspot mitigation

There are multiple aspects to our hotspot mitigation strategy: schema design, index design and querying. Let’s look at each of these below.

Schema and index design 

As described in Anti-pattern: timestamp ordering, we’ll use application-level sharding to distribute data evenly. Let’s look at one particular approach for our scenario: instead of (CompanyId, UserId, Timestamp), we’ll use (UserId, CompanyId, Timestamp).

Table LogEntries (reorder columns CompanyId and UserId in Primary Key)


UserId (PK)        CompanyId (PK)    Timestamp (PK)                  LogEntry
15b7bd1f-8473      Acme              2018-05-01T15:16:03.386257Z


By placing UserId before CompanyId in the primary key, we can mitigate the hotspots caused by the non-uniform distribution of log entries across companies.

Now let’s look at the secondary index on CompanyId and timestamp. Since this index is meant to support queries that specify just CompanyId and timestamp, we cannot address the distribution problem by simply incorporating UserId. Keep in mind that indexes are also susceptible to hotspots and we need to design them so that their distribution is uniform.

To address this, we’ll add a new column, EntryShardId, where (in pseudo-code):
entryShardId = hash(CompanyId + timestamp) % num_shards
The hash function here could be a simple crc32 operation. Here’s a Python snippet illustrating how to calculate this hash before a log entry is inserted:
...
import datetime
import zlib
...
timestamp = datetime.datetime.utcnow()
companyId = 'Acme'
# crc32 expects bytes; mask to an unsigned 32-bit value, then map into one of 10 shards.
entryShardId = (zlib.crc32((companyId + timestamp.isoformat()).encode('utf-8')) & 0xffffffff) % 10
...
In this case, num_shards = 10. You can adjust this value based on the characteristics of your workload. For instance, if one company in our scenario generates 100 times more log entries on average than the other companies, then we would pick 100 for num_shards in order to achieve a uniform distribution across entries from all companies.

This hashing approach essentially takes the sequential, timestamp-ordered LogEntriesByCompany index entries for a particular company and distributes them across multiple application (or logical) shards. In this case, we have 10 such shards per company, resulting from the crc32 and modulo operations shown above.

Table LogEntries (with EntryShardId added)


CompanyId (PK)    UserId (PK)    Timestamp (PK)                  EntryShardId    LogEntry
'Acme'            1              2018-05-01T15:16:03.386257Z     8


And the index:
CREATE INDEX LogEntriesByCompany ON LogEntries(EntryShardId, CompanyId, Timestamp)
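
Putting the pieces together, here is a minimal Python sketch of an insert that computes EntryShardId before writing the row. It assumes the google-cloud-spanner client library and a database handle obtained from spanner.Client(); the table and column names are the ones defined above.

import datetime
import zlib

NUM_SHARDS = 10

def insert_log_entry(database, company_id, user_id, log_entry):
    """Writes one row to LogEntries with a computed EntryShardId."""
    timestamp = datetime.datetime.utcnow()
    shard_id = (zlib.crc32((company_id + timestamp.isoformat()).encode('utf-8'))
                & 0xffffffff) % NUM_SHARDS
    with database.batch() as batch:
        batch.insert(
            table='LogEntries',
            columns=('CompanyId', 'UserId', 'Timestamp', 'EntryShardId', 'LogEntry'),
            values=[(company_id, user_id, timestamp, shard_id, log_entry)])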

Querying

Evenly distributing data using a sharding approach is great for inserts but how does it affect retrieval? Application-level sharding is no good to us if we cannot retrieve the data efficiently. Let’s look at how we would query for a list of log entries by CompanyId and time range, but without UserId:

SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries@{FORCE_INDEX=LogEntriesByCompany}
   WHERE CompanyId = 'Acme'
   AND EntryShardId BETWEEN 0 AND 9
   AND Timestamp > '2018-05-01T15:14:10.386257Z'
   AND Timestamp < '2018-05-01T15:16:10.386257Z'
ORDER BY Timestamp DESC;

The above query illustrates how to perform a timestamp range retrieval while taking sharding into account. By including the EntryShardId range in the query, we tell Spanner to ‘look’ in all 10 logical shards to retrieve the timestamp entries for CompanyId ‘Acme’ for a particular range.

Cloud Spanner is a full-featured relational database service that relieves you of most—but not all—database management tasks. For more information on Cloud Spanner management best practices, check out the recommended reading.

Anti-pattern: timestamp ordering
Optimizing Schema Design for Cloud Spanner
Best Practices for Schema Design

Kubernetes best practices: terminating with grace



Editor’s note: Today is the second installment in a seven-part video and blog series from Google Developer Advocate Sandeep Dinesh on how to get the most out of your Kubernetes environment.

When it comes to distributed systems, handling failure is key. Kubernetes helps with this by using controllers that watch the state of your system and restart services that have stopped performing. At the same time, Kubernetes may forcibly terminate your application as part of the normal operation of the system.

In this episode of “Kubernetes Best Practices,” let’s take a look at how you can help Kubernetes do its job more efficiently and reduce the downtime your applications experience.


In the pre-container world, most applications ran on VMs or physical machines. If an application crashed, it took quite a while to boot up a replacement. If you only had one or two machines to run the application, this kind of time-to-recovery was unacceptable.

Instead, it became common to use process-level monitoring to restart applications when they crashed. If the application crashed, the monitoring process could capture the exit code and instantly restart the application.

With the advent of systems like Kubernetes, process monitoring systems are no longer necessary, as Kubernetes handles restarting crashed applications itself. Kubernetes uses an event loop to make sure that resources such as containers and nodes are healthy. This means you no longer need to manually run these monitoring processes. If a resource fails a health check, Kubernetes automatically spins up a replacement.

Check out this episode to see how you can set up custom health checks for your services.

The Kubernetes termination lifecycle

Kubernetes does a lot more than monitor your application for crashes. It can create more copies of your application to run on multiple machines, update your application, and even run multiple versions of your application at the same time!

This means there are many reasons why Kubernetes might terminate a perfectly healthy container. If you update your deployment with a rolling update, Kubernetes slowly terminates old pods while spinning up new ones. If you drain a node, Kubernetes terminates all pods on that node. If a node runs out of resources, Kubernetes terminates pods to free those resources (check out this previous post to learn more about resources).

It’s important that your application handle termination gracefully so that there is minimal impact on the end user and the time-to-recovery is as fast as possible!

In practice, this means your application needs to handle the SIGTERM message and begin shutting down when it receives it. This means saving all data that needs to be saved, closing down network connections, finishing any work that is left, and other similar tasks.

Once Kubernetes has decided to terminate your pod, a series of events takes place. Let’s look at each step of the Kubernetes termination lifecycle.

1 - Pod is set to the “Terminating” State and removed from the endpoints list of all Services

At this point, the pod stops getting new traffic. Containers running in the pod will not be affected.

2 - preStop Hook is executed

The preStop Hook is a special command or HTTP request that is sent to the containers in the pod.

If your application doesn’t gracefully shut down when receiving a SIGTERM, you can use this hook to trigger a graceful shutdown. Most programs gracefully shut down when receiving a SIGTERM, but if you are using third-party code or are managing a system you don’t have control over, the preStop hook is a great way to trigger a graceful shutdown without modifying the application.
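
For instance, a hypothetical container spec that uses a preStop hook to ask nginx to finish in-flight requests before the container exits might look like this (a sketch only; the right command depends on your application):

containers:
- name: web                    # illustrative container
  image: nginx:1.15            # illustrative image
  lifecycle:
    preStop:
      exec:
        command: ["/usr/sbin/nginx", "-s", "quit"]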

3 - SIGTERM signal is sent to the pod

At this point, Kubernetes will send a SIGTERM signal to the containers in the pod. This signal lets the containers know that they are going to be shut down soon.

Your code should listen for this event and start shutting down cleanly at this point. This may include stopping any long-lived connections (like a database connection or WebSocket stream), saving the current state, or anything like that.

Even if you are using the preStop hook, it is important that you test what happens to your application if you send it a SIGTERM signal, so you are not surprised in production!

4 - Kubernetes waits for a grace period

At this point, Kubernetes waits for a specified time called the termination grace period. By default, this is 30 seconds. It’s important to note that this happens in parallel to the preStop hook and the SIGTERM signal. Kubernetes does not wait for the preStop hook to finish.

If your app finishes shutting down and exits before the terminationGracePeriod is done, Kubernetes moves to the next step immediately.

If your pod usually takes longer than 30 seconds to shut down, make sure you increase the grace period. You can do that by setting the terminationGracePeriodSeconds option in the Pod YAML. For example, to change it to 60 seconds:
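
Here is a minimal Pod spec sketch; the pod name and image are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: my-app                          # placeholder name
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: my-app
    image: gcr.io/my-project/my-app     # placeholder image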


5 - SIGKILL signal is sent to pod, and the pod is removed


If the containers are still running after the grace period, they are sent the SIGKILL signal and forcibly removed. At this point, all Kubernetes objects are cleaned up as well.

Conclusion

Kubernetes can terminate pods for a variety of reasons, and making sure your application handles these terminations gracefully is core to creating a stable system and providing a great user experience.

Google Maps Platform now integrated with the GCP Console



Thirteen years ago, the first Google Maps mashup overlaid Craigslist housing data on top of our map tiles—before there was even an API to access them. Today, Google Maps APIs are some of the most popular on the internet, powering millions of websites and apps generating billions of requests per day.

Earlier this month, we introduced the next generation of our Google Maps business—Google Maps Platform—that included a series of updates to help you take advantage of new location-based features and products. We simplified our APIs into three product categories—Maps, Routes and Places—to make it easier for you to find, explore and add new features to your apps and sites. In addition, we merged our pricing plans into one pay-as-you-go plan for our core products. With this new plan, you get the first $200 of monthly usage for free, so you can try the APIs risk-free.

In addition, Google Maps Platform includes simplified products, tighter integration with Google Cloud Platform (GCP) services and tools, as well as a single pay-as-you-go offering. By integrating with GCP, you can scale your business and utilize location services as you grow—and, just as with any other GCP service, we no longer enforce usage caps.

You can also manage Google Maps Platform from Google Cloud Console—the same interface you already use to manage and monitor other GCP services. This integration provides a more tailored view to manage your Google Maps Platform implementation, so you can monitor individual API usage, establish usage quotas, configure alerts for more visibility and control, and access billing reports. All Google Maps Platform customers now receive free Google Maps Platform customer support, which you can also access through the GCP Console.

Check out the Google Maps Platform website where you can learn more about our products and also explore the guided onboarding wizard from the website to the console. We can’t wait to see how you will use Google Maps Platform with GCP to bring new innovative services to your customers.

Getting more value from your Stackdriver logs with structured data



Logs contain some of the most valuable data available to developers, DevOps practitioners, Site Reliability Engineers (SREs) and security teams, particularly when troubleshooting an incident. It’s not always easy to extract and use, though. One common challenge is that many log entries are blobs of unstructured text, making it difficult to extract the relevant information when you need it. But structured log data is much more powerful, and enables you to extract the most valuable data from your logs. Google Stackdriver Logging just made it easier than ever to send and analyze structured log data.

We’ve just announced new features so you can better use structured log data. You’ve told us that you’d like to be able to customize which fields you see when searching through your logs. You can now add custom fields in the Logs Viewer in Stackdriver. It’s also now easier to generate structured log data using the Stackdriver Logging agent.

Why is structured logging better?
Using structured log data has some key benefits, including making it easier to quickly parse and understand your log data. The chart below shows the differences between unstructured and structured log data. 

You can see here how much more detail is available at a glance:



Example from custom logs

Unstructured log data:

textPayload: A97A7743 purchased 4 widgets.

Structured log data:

jsonPayload: {
 "customerIDHash": "A97A7743",
 "action": "purchased",
 "quantity": "4",
 "item": "widgets"
}

Example from Nginx logs—now available as structured data through the Stackdriver logging agent

Unstructured log data:

textPayload: 127.0.0.1 10.21.7.112 - [28/Feb/2018:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Chrome/66.0"

Structured log data:

time: 1362020400 (28/Feb/2018:12:00:00 +0900)
jsonPayload: {
 "remote" : "127.0.0.1",
 "host"   : "10.21.7.112",
 "user"   : "-",
 "method" : "GET",
 "path"   : "/",
 "code"   : "200",
 "size"   : "777",
 "referer": "-",
 "agent"  : "Chrome/66.0"
}


Making structured logs work for you
You can send both structured and unstructured log data to Stackdriver Logging. Most logs Google Cloud Platform (GCP) services generate on your behalf, such as Cloud Audit Logging, Google App Engine logs or VPC Flow Logs, are sent to Stackdriver automatically as structured log data.

Since Stackdriver Logging also passes the structured log data through export sinks, sending structured logs makes it easier to work with the log data downstream if you’re processing it with services like BigQuery and Cloud Pub/Sub.

Using structured log data also makes it easier to alert on log data or create dashboards from your logs, particularly when creating a label or extracting a value with a distribution metric, both of which apply to a single field. (See our previous post on techniques for extracting values from Stackdriver logs for more information.)
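
If your application writes to Stackdriver Logging through the Cloud client libraries rather than through the agent, you can send structured payloads directly. Here is a minimal Python sketch using the google-cloud-logging library's log_struct method; the log name and fields are illustrative.

from google.cloud import logging

client = logging.Client()
logger = client.logger('purchase-events')   # illustrative log name

# Each key in the dict becomes a queryable field under jsonPayload.
logger.log_struct({
    'customerIDHash': 'A97A7743',
    'action': 'purchased',
    'quantity': 4,
    'item': 'widgets',
}, severity='INFO')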

Try Stackdriver Logging for yourself
To start using Stackdriver structured logging today, you’ll just need to install (or reinstall) the Stackdriver logging agent with the --structured flag. This also enables automatic parsing of common log formats, such as syslog, Nginx and Apache.

curl -sSO "https://dl.google.com/cloudagents/install-logging-agent.sh"
sudo bash ./install-logging-agent.sh --structured

For more information on installation and options, check out the Stackdriver structured logging installation documentation.

To test Stackdriver Logging and see the power of structured logs for yourself, you can try one of our most asked-for Qwiklab courses, Creating and alerting on logs-based metrics, for free, using a special offer of 15 credits. This offer is good through the end of May 2018. Or try our new structured logging features out on your existing GCP project by checking out our documentation.