Tag Archives: Storage & Databases

Introducing Transfer Appliance: Sneakernet for the cloud era



Back in the eighties, when network constraints limited data transfers, people took to the streets and walked their floppy disks where they needed to go. And Sneakernet was born.

In the world of cloud and exponential data growth, the size of the disk and the speed of your sneakers may have changed, but the solution is the same: Sometimes the best way to move data is to ship it on physical media.

Today, we’re excited to introduce Transfer Appliance, to help you ingest large amounts of data into Google Cloud Platform (GCP).
Transfer Appliance offers up to 480TB in 4U or 100TB in 2U of raw data capacity in a single rackmount device
Transfer Appliance is a rackable, high-capacity storage server that you set up in your data center. Fill it up with data, ship it back to us, and we upload your data to Google Cloud Storage. With a capacity of up to one petabyte compressed, Transfer Appliance helps you migrate your data orders of magnitude faster than over a typical network. The appliance encrypts your data at capture, and you decrypt it when it reaches its final cloud destination, helping it get to the cloud safely.

Like many organizations we talk to, you probably have large amounts of data that you want to use to train machine learning models. You have huge archives and backup libraries taking up expensive space in your data center. Or IoT devices flooding your storage arrays. There’s all this data waiting to get to the cloud, but it’s impeded by expensive, limited bandwidth. With Transfer Appliance, you can finally take advantage of all that GCP has to offer (machine learning, advanced analytics, content serving, archive and disaster recovery) without upgrading your network infrastructure or acquiring third-party data migration tools.

Working with customers, we’ve found that the typical enterprise has many petabytes of data, and available network bandwidth between 100 Mbps and 1 Gbps. Depending on the available bandwidth, transferring 10 PB of that data would take between three and 34 years, which is much too long.
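To put those numbers in perspective, here's a back-of-the-envelope calculation in Python. The 75% effective link utilization is our own assumption for illustration, not an official figure:

# Rough estimate of wire-transfer time for a given capacity and bandwidth.
# The 75% effective link utilization is an assumption, not a measured figure.
def transfer_years(petabytes, link_bps, utilization=0.75):
    bits = petabytes * 1e15 * 8               # decimal petabytes -> bits
    seconds = bits / (link_bps * utilization)
    return seconds / (3600 * 24 * 365)

for label, bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    print(f"10 PB over {label}: ~{transfer_years(10, bps):.0f} years")
# 10 PB over 100 Mbps: ~34 years
# 10 PB over 1 Gbps: ~3 years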

Estimated transfer times for given capacity and bandwidth
That’s where Transfer Appliance comes in. In a matter of weeks, you can have a petabyte of your data accessible in Google Cloud Storage, without consuming a single bit of precious outbound network bandwidth. Simply put, Transfer Appliance is the fastest way to move large amounts of data into GCP.

Compare the transfer times for 1 petabyte of data.
Customers tell us that space inside the data center is at a premium, and what space there is comes in the form of server racks. In developing Transfer Appliance, we built a device designed for the data center that slides into a standard 19” rack. Transfer Appliance will only live in your data center for a few days, but we want it to be a good houseguest while it’s there.

Customers have been testing Transfer Appliance for several months, and love what they see:
"Google Transfer Appliance moves petabytes of environmental and geographic data for Makani so we can find out where the wind is the most windy." Ruth Marsh, Technical Program Manager at Makani

"Using a service like Google Transfer Appliance meant I could transfer hundreds of terabytes of data in days not weeks. Now we can leverage all that Google Cloud Platform has to offer as we bring narratives to life for our clients."  Tom Taylor, Head of Engineering at The Mill
Transfer Appliance joins the growing family of Google Cloud Data Transfer services. Initially available in the US, the service comes in two configurations: 100TB of raw capacity (up to 200TB compressed) and 480TB of raw capacity (up to 1PB compressed). The 100TB model is priced at $300, plus shipping via FedEx (approximately $500); the 480TB model is priced at $1800, plus shipping (approximately $900). To learn more, visit the documentation.

We think you’re going to love getting to the cloud in a matter of weeks rather than years. Sign up to reserve a Transfer Appliance today. You can also sign up here for a GCP free trial.

From NoSQL to new SQL: How Spanner became a global, mission-critical database



Now that Cloud Spanner is generally available for mission-critical production workloads, it’s time to tell the story of how Spanner evolved into a global, strongly consistent relational database service.
Recently the Spanner team presented a new paper at SIGMOD ‘17 that offers some fascinating insights into this aspect of Spanner’s “database DNA” and how it developed over time.

Spanner was originally designed to meet Google’s internal requirements for a global, fault-tolerant service to power massive business-critical applications. Today Spanner also embraces the SQL functionality, strong consistency and ACID transactions of a relational database. For critical use cases like financial transactions, inventory management, account authorization and ticketing/reservations, customers will accept no substitute for that functionality.

For example, there's no “spectrum” of less-than-strong consistency levels that will satisfy the mission-critical requirement for a single transaction state that's maintained worldwide; only strong consistency will do. Hence, few if any customers would choose to use an eventually-consistent database for critical OLTP. For Cloud Spanner customers like JDA, Snap and Quizlet, this unique feature set is already resonating.
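To make that concrete, here's a minimal sketch of a strongly consistent read-modify-write transaction using the Cloud Spanner Python client; the instance, database, table and column names are hypothetical:

# A minimal sketch of a strongly consistent reservation transaction.
# Instance, database, table and column names are hypothetical.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("ticketing-instance").database("ticketing-db")

def reserve_seat(transaction):
    # This read observes a single, globally consistent transaction state;
    # if a concurrent writer commits first, the function is retried.
    rows = list(transaction.execute_sql(
        "SELECT SeatsLeft FROM Events WHERE EventId = 42"))
    seats_left = rows[0][0]
    if seats_left <= 0:
        raise ValueError("sold out")
    transaction.update(
        "Events", columns=["EventId", "SeatsLeft"],
        values=[[42, seats_left - 1]])

database.run_in_transaction(reserve_seat)

Because Spanner transactions are serializable, two clients running this code concurrently can never oversell the last seat, which is exactly the property an eventually consistent store can't provide.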

Here are a few highlights from the paper:


  • Although Spanner was initially designed as a NoSQL key-value store, new requirements led to an embrace of the relational model, as well. Spanner’s architects had a relatively specific goal: to provide a service that could support fault-tolerant, multi-row transactions and strong consistency across data centers (with significant influence, and code, from Bigtable). At the same time, internal customers building OLTP applications also needed a database schema, cross-row transactions and an expressive query language. Thus early in Spanner’s lifecycle, the team drew on Google’s experience building the F1 distributed relational database to bring robust relational semantics and SQL functionality into the Spanner architecture. “These changes have allowed us to preserve the massive scalability of Spanner, while offering customers a powerful platform for database applications,” the authors wrote, adding that, “From the perspective of many engineers working on the Google infrastructure, the SQL vs. NoSQL dichotomy may no longer be relevant.”
  • The Spanner SQL query processor, while recognizable as a standard implementation, has unique capabilities that contribute to low-latency queries. Features such as query range extraction (for runtime analysis of complex expressions that are not easily re-written) and query restarts (compensating for failures, resharding, and other anomalies without significant latency impact) mitigate the complexities of highly distributed queries that would otherwise contribute to latency. Furthermore, the query processor serves both transactional and analytical workloads for low-latency or long-running queries.
  • Long-term investments in SQL tooling have produced a familiar RDBMS-like user experience. As part of a companywide effort to standardize on common SQL functionality for all its relational services (Spanner, Dremel/BigQuery, F1, and so on), Spanner’s user experience emphasizes ANSI SQL constructs and support for nested data as a first-class citizen (see the sketch after this list). “SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data,” the authors wrote.
  • Spanner will soon rely on a new columnar format called Ressi designed for database-like access patterns (for hybrid OLAP/OLTP workloads). Ressi is optimized for time-versioned (rapidly changing) data, allowing queries to more efficiently find the most recent values. Later in 2017, Ressi will replace the SSTables format inherited from Bigtable, which, although highly robust, is not explicitly designed for performance.
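As an illustration of the nested-data support mentioned above, here's a hedged sketch that issues schema DDL for a parent table and an interleaved child table through the Python client (all names are invented for the example):

# Hedged sketch: schema DDL with an interleaved (nested) child table.
# Instance, database, table and column names are illustrative.
from google.cloud import spanner

database = spanner.Client().instance("my-instance").database("my-db")
operation = database.update_ddl([
    """CREATE TABLE Singers (
        SingerId INT64 NOT NULL,
        Name STRING(MAX),
    ) PRIMARY KEY (SingerId)""",
    """CREATE TABLE Albums (
        SingerId INT64 NOT NULL,
        AlbumId INT64 NOT NULL,
        Title STRING(MAX),
    ) PRIMARY KEY (SingerId, AlbumId),
    INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
])
operation.result()  # block until the schema change completes

Interleaving physically co-locates each album row with its parent singer row, which is how Spanner expresses nested data without giving up the relational model.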


All in all, “Our path to making Spanner a SQL system led us through the milestones of addressing scalability, manageability, ACID transactions, relational model, schema DDL with indexing of nested data, to SQL,” the authors wrote.

For more details, read the full paper here.

How to do serverless pixel tracking with GCP



Whether they’re opening a newsletter or visiting a shopping cart page, how users interact with web content is of great interest to publishers. One way to understand user behavior is with pixels: small 1x1 transparent images embedded into the web property. When loaded, the pixel calls a web server that records the request parameters passed in the URL so they can be processed later.
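For example, a pixel request might carry its payload as URL query parameters, along these lines (the host, path and parameter names are invented for illustration):

# Illustrative only: the kind of URL a tracking pixel embeds.
# Host, path and parameter names are invented.
from urllib.parse import urlencode

params = {"user_id": "u123", "event": "newsletter_open", "campaign": "spring"}
pixel_url = "https://pixel.example.com/pixel.png?" + urlencode(params)
print(pixel_url)
# https://pixel.example.com/pixel.png?user_id=u123&event=newsletter_open&campaign=spring

The resulting URL goes into an image tag in the page or email, so every load of that image becomes one loggable request.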

Adding a pixel is easy, but hosting it and processing the request can be challenging for various reasons:
  • You need to set up, manage and monitor your ad servers
  • Users are usually global, which means that you need ad servers around the world
  • User visits are spiky, so pixel servers must scale up to sustain the load and scale down to limit the spend.
Google Cloud Platform (GCP) services such as Container Engine and managed autoscaled instance groups can help with those challenges. But at Google Cloud, we think companies should avoid managing infrastructure whenever possible.

For example, we recently worked with GCP partner and professional services firm DoiT International to build a pixel tracking platform that relieves the administrator from setting up or managing any servers. Instead, this serverless pixel tracking solution leverages managed GCP services, including:
  • Google Cloud Storage: A global or regional object store that offers different storage classes such as Standard, Nearline and Coldline, with various prices and SLAs depending on your needs. In our case, we used Standard, which offers low millisecond latency
  • Google HTTP(S) Load Balancer: A global anycast IP load balancer service that can scale to millions of QPS with integrated logging. It can also be paired with Cloud CDN, which avoids unnecessary requests to Google Cloud Storage by caching pixels closer to users at Google edge locations
  • BigQuery: Google's fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics
  • Stackdriver Logging: A logging system that allows you to store, search, analyze, monitor and alert on log data and events from GCP and Amazon Web Services (AWS). It supports Google load balancers and can export data to Cloud Storage, BigQuery or Pub/Sub
Tracking pixels with these services works as follows:
  1. A client calls a pixel URL that's served directly by Cloud Storage.
  2. A Google Cloud Load Balancer in front of Cloud Storage records the request to Stackdriver Logging, whether there was a cache hit or not.
  3. Stackdriver Logging exports each request to BigQuery as it comes in; BigQuery acts as a storage and querying engine for ad-hoc analytics that can help business analysts better understand their users (see the example query below).
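For instance, once the logs land in BigQuery, an analyst could count pixel hits per user with a few lines of Python. The project, dataset, table and URL parameter names below are assumptions for illustration:

# A sketch of an ad-hoc analysis over load-balancer logs exported to
# BigQuery. Project, dataset, table and URL parameter names are assumed.
from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT
  REGEXP_EXTRACT(httpRequest.requestUrl, r'user_id=([^&]+)') AS user_id,
  COUNT(*) AS hits
FROM `my-project.pixel_logs.requests`
GROUP BY user_id
ORDER BY hits DESC
LIMIT 10
"""
for row in client.query(query):  # iterating waits for the job to finish
    print(row.user_id, row.hits)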


All those services are fully managed and do not require you to set up any instances or VMs.

Going forward, we plan to build more serverless solutions on top of GCP managed offerings. Let us know in the comments if there’s a solution that you’d like us to build!

Cloud Spanner is now production-ready; let the migrations begin!



Cloud Spanner, the world’s first horizontally-scalable and strongly-consistent relational database service, is now generally available for your mission-critical OLTP applications.

We’ve carefully designed Cloud Spanner to meet customer requirements for enterprise databases — including ANSI 2011 SQL support, ACID transactions, 99.999% availability and strong consistency — without compromising latency. As a combined software/hardware solution that includes atomic clocks and GPS receivers across Google’s global network, Cloud Spanner also offers additional accuracy, reliability and performance in the form of a fully-managed cloud database service. Thanks to this unique combination of qualities, Cloud Spanner is already delivering long-term value for our customers with mission-critical applications in the cloud, including customer authentication systems, business-transaction and inventory-management systems, and high-volume media systems that require low latency and high throughput. For example, Snap uses Cloud Spanner to power part of its search infrastructure.

Looking toward migration


In preparation for general availability, we’ve been working closely with our partners to make adoption as smooth and easy as possible. Thus today, we're also announcing our initial data integration partners: Alooma, Informatica and Xplenty.

Now that these partners are in the early stages of Cloud Spanner “lift-and-shift” migration projects for customers, we asked a couple of them to pass along some of their insights about the customer value of Cloud Spanner, as well as any advice about planning for a successful migration:

From Alooma:

“Cloud Spanner is a game-changer because it offers horizontally scalable, strongly consistent, highly available OLTP infrastructure in the cloud for the first time. To accelerate migrations, we recommend that customers replicate their data continuously between the source OLTP database and Cloud Spanner, thereby maintaining both infrastructures in the same state — this allows them to migrate their workloads gradually in a predictable manner.”

From Informatica:
“Informatica customers are stretching the limits of latency and data volumes, and need innovative enterprise-scale capabilities to help them outperform their competition. We are excited about Cloud Spanner because it provides a completely new way for our mutual customers to disrupt their markets. For integration, migration and other use cases, we are partnering with Google to help them ingest data into Cloud Spanner and integrate a variety of heterogeneous batch, real-time, and streaming data in a highly scalable, performant and secure way.”

From Xplenty:
"Cloud Spanner is one of those cloud-based technologies for which businesses have been waiting: With its horizontal scalability and ACID compliance, it’s ideal for those who seek the lower TCO of a fully managed cloud-based service without sacrificing the features of a legacy, on-premises database. In our experience with customers migrating to Cloud Spanner, important considerations include accounting for data types, embedded code and schema definitions, as well as understanding Cloud Spanner’s security model to efficiently migrate your current security and access-control implementation."

Next steps


We encourage you to dive into a no-cost trial to experience first-hand the value of a relational database service that offers strong consistency, mission-critical availability and global scale (contact us about multi-regional instances) with no workarounds — and with no infrastructure for you to deploy, scale or manage. (Read more about Spanner’s evolution inside Google in this new paper presented at the SIGMOD ‘17 conference today.) If you like what you see, a growing partner ecosystem is standing by for migration help, and to add further value to Cloud Spanner use cases via data analytics and visualization tooling.

Compute Engine machine types with up to 64 vCPUs now ready for your production workloads



Today, we're happy to announce general availability for our largest virtual machine shapes, including both predefined and custom machine types, with up to 64 virtual CPUs and 416 GB of memory.


64 vCPU machine types are available on our Haswell, Broadwell and Skylake (currently in Alpha) generation Intel processor host machines.

Tim Kelton, co-founder and Cloud Architect of Descartes Labs, an early adopter of our 64 vCPU machine types, had this to say:
"Recently we used the 64 vCPU instances during the building of both our global composite imagery layers and GeoVisual Search. In both cases, our parallel processing jobs needed tens of thousands of CPU hours to complete the task. The new 64 vCPU instances allow us to work across more satellite imagery scenes simultaneously on a single instance, dramatically speeding up our total processing times."
The new 64 vCPU machines are available for use today. If you're new to GCP and want to give these larger virtual machines a try, it’s easy to get started with our $300 credit for 12 months.

Google Cloud Natural Language API launches new features and Cloud Spanner graduates to GA



Today at Google Cloud Next London we're excited to announce product news that will help customers innovate and transform their businesses faster via the cloud: first, that Google Cloud Natural Language API is adding support for new languages and entity sentiment analysis, and second, that Google Cloud Spanner is graduating to general availability (GA).

Cloud Natural Language API beta


Since we launched Cloud Natural Language API, a fully managed service for extracting meaning from text via machine learning, we’ve seen customers such as Evernote and Ocado enhance their businesses in fascinating ways. For example, they use Cloud Natural Language API to analyze customer feedback and sentiment, extract key entities and metadata from unstructured text such as emails or web articles, and enable novel features (such as deriving action items from meeting notes).

These use cases, among many others, highlighted the need to expand language support and improve the quality of our base NLU technology. We've incorporated this feedback into the product and are pleased to announce the following new capabilities, now in beta:

  • Expanded language support for entity, document sentiment and syntax analysis for the following languages: Chinese (Simplified and Traditional), French, German, Italian, Korean and Portuguese. This is in addition to existing support for English, Spanish and Japanese.
  • Understand sentiment for specific entities, not just a whole document or sentence: We're introducing a new method that identifies entities in a block of text and also determines sentiment for those entities. Entity sentiment analysis is currently only available for the English language. For more information, see Analyzing Entity Sentiment.
  • Improved quality for sentiment and entity analysis: As part of the continuous effort to improve quality of our base models, we're also launching improved models for sentiment and entity analysis as part of this release.

Early access users of this new functionality, such as Wootric, are already using the expanded language support and new entity sentiment analysis feature to better understand customer sentiment around brands and products. For example, given customer feedback such as “the phone is expensive but has great battery life,” users can now determine that the sentiment for “phone” is negative while the sentiment for “battery life” is positive.
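Here's a minimal sketch of an entity sentiment call using the Cloud Natural Language Python client (treat it as illustrative; see Analyzing Entity Sentiment for the authoritative API reference):

# Minimal entity-sentiment sketch with the Natural Language Python client.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The phone is expensive but has great battery life.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
response = client.analyze_entity_sentiment(request={"document": document})
for entity in response.entities:
    # Sentiment score ranges from -1.0 (negative) to +1.0 (positive).
    print(f"{entity.name}: score={entity.sentiment.score:+.1f}")
# Expected shape of output: "phone" scores negative, "battery life" positive.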

As the API becomes more widely adopted, we're looking forward to seeing more interesting and useful applications of it.

Cloud Spanner enters GA

Announced in March at Google Cloud Next ‘17, Cloud Spanner is the world’s first fully managed, horizontally scalable relational database service for mission-critical online transaction processing (OLTP) applications. Cloud Spanner is specifically designed to meet customer requirements in this area for strong consistency, high availability and global scale, qualities that make it unique as a service.

During the beta period, we were thrilled to see customers unlock new use cases in the cloud with Cloud Spanner, including:

  • Powering mission-critical applications like customer authentication and provisioning for multi-national businesses
  • Building consistent systems for business transactions and inventory management in the financial services and retail industries
  • Supporting incredibly high-volume systems that need low latency and high throughput in the advertising and media industries

As with all our other services, GCP handles all the performance, scalability and availability needs automatically in a pay-as-you-go way.

On May 16, Cloud Spanner will reach a further milestone by becoming generally available for the first time. Currently we're offering regional instances, with multi-regional instances coming later this year. We've been Spanner users ourselves for more than five years to support a variety of mission-critical global apps, and we can’t wait to see what new workloads you bring to the cloud, and which new ones you build next!

Google Cloud Storage introduces Cloud Pub/Sub notifications



Google Cloud Storage has always been a high-performance and cost-effective place to store data objects. Now it’s also easy to build workflows around those objects that are triggered by creating or deleting them, or changing their metadata.

Suppose you want to take some action every time a change occurs in one of your Cloud Storage buckets. You might want to automatically update sales projections every day when sales uploads its new daily totals. You might need to remove a resource from a search index when an object is deleted. Or perhaps you want to update the thumbnail when someone makes a change to an image. The ability to respond to changes in a Cloud Storage bucket gives you increased responsiveness, control and flexibility.

Cloud Pub/Sub Support


We’re pleased to announce that Cloud Storage can now send change notifications to a Google Cloud Pub/Sub topic. Cloud Pub/Sub is a powerful messaging platform that allows you to build fast, reliable and more secure messaging solutions. Cloud Pub/Sub support introduces many new capabilities to Cloud Storage notifications, such as pulling from subscriptions instead of requiring users to configure webhooks, multiplexing copies of each message to many subscribers and filtering messages by event type or prefix.
You can get started sending Cloud Storage notifications to Cloud Pub/Sub by reading our getting started guide. Once you’ve enabled the Cloud Pub/Sub API and downloaded the latest version of the gcloud SDK, you can set up notification triggers from your Cloud Storage bucket to your Cloud Pub/Sub topic with the following command:

$> gsutil notification create -f json -t your-topic gs://your-bucket

From that point on, any changes to the contents of your Cloud Storage bucket trigger a message to your Cloud Pub/Sub topic. You can then create Cloud Pub/Sub subscriptions on that topic and pull messages from those subscriptions in your programs, like in this example Python app.
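As a sketch of that pull model, a small subscriber might look like this (project and subscription names are placeholders):

# Minimal subscriber sketch for Cloud Storage notifications delivered
# via Cloud Pub/Sub. Project and subscription names are placeholders.
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("your-project", "your-subscription")

def callback(message):
    # Notification metadata arrives as message attributes, e.g. eventType
    # (OBJECT_FINALIZE, OBJECT_DELETE, ...), bucketId and objectId.
    print(message.attributes.get("eventType"),
          message.attributes.get("objectId"))
    message.ack()

future = subscriber.subscribe(sub_path, callback=callback)
try:
    future.result(timeout=30)  # listen for 30 seconds, then stop
except TimeoutError:
    future.cancel()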

Cloud Functions

Cloud Pub/Sub is a powerful and flexible way to respond to changes in a bucket. However, for some tasks you may prefer the simplicity of deploying a small, serverless function that just describes the action you want to take in response to a change. For that, Google Cloud Functions supports Cloud Storage triggers.

Cloud Functions is a quick way to deploy cloud-based scripts in response to a wide variety of events, for example an HTTP request to a certain URL, or a new object in a Cloud Storage bucket.

Once you get started with Google Cloud Functions, you can learn about setting up a Cloud Storage trigger for your function. It’s as simple as adding a “--trigger-bucket” parameter when deploying your function:

$> gcloud beta functions deploy helloWorld --stage-bucket cloud-functions --trigger-bucket your-bucket

It’s fun to think about what’s possible when Cloud Storage objects aren’t just static entities, but can trigger a wide variety of tasks. We hope you’re as excited as we are!

Google Cloud Platform expands to Mars



Google Cloud Platform (GCP) is committed to meeting our customers’ needs—no matter where they are. Amidst our growing list of new regions, today we're pleased to announce our expansion to Mars. In addition to supporting some of the most demanding disaster recovery and data sovereignty needs of our Earth-based customers, we’re looking to the future cloud infrastructure needed for the exploration and ultimate colonization of the Red Planet.
Visit Mars with Google Street View
Mars has long captured the imagination as the most hospitable planet for future colonization, and expanding to Mars has been a top priority for Google. By opening a dedicated extraterrestrial cloud region, we're bringing the power of Google’s compute, network, and storage to the rest of the solar system, unlocking a plethora of possibilities for astronomy research, exploration of Martian natural resources and interplanetary life sciences. This region will also serve as an important node in an extensive network throughout the solar system.

Our first interplanetary data center—affectionately nicknamed “Ziggy Stardust”—will open in 2018. Our Mars exploration started as a 20% project with the Google Planets team, which mapped Mars and other bodies in space and found a suitable location in Gale Crater, near the landing site of NASA’s Curiosity rover.
Explore more of Mars in Google Maps
In order to ease the transition for our Earthling customers, Google Cloud Storage (GCS) is launching a new Earth-Mars Multi-Regional location. Users can store planet-redundant data across Earth and Mars, which means even if Earth experiences another asteroid strike like the one that wiped out the dinosaurs, your cat videos, selfies and other data will still be safe. Of course, we'll also store all public domain scientific data, history and arts free of charge so that the next global catastrophe doesn't send humanity back into the dark ages.

Customers can choose to store data exclusively in the new Mars region, outside of any controlled jurisdictions on Earth, ensuring that they're both compliant with and benefit from the terms of the Outer Space Treaty. The ability to store and process data on Mars enables low-latency data analysis pipelines and consumer apps to serve the expected influx of Mars explorers and colonists. How exciting would it be to stream movies of potatoes growing right from the craters and dunes of our new frontier?

One of our early access customers says “This will be a game changer for us. With GCS, we can store all the data collected from our rovers right on Mars and run big data analytics to query exabyte-scale datasets all in a matter of seconds. Our dream of colonizing Mars by 2020 can now become a reality.”
Walk inside our new data center in Google Street View
The Martian data center will become Google’s greenest facility yet by taking full advantage of its new location. The cold weather enables natural, unpowered cooling throughout the year, while the thin atmosphere and high winds allow the entire facility to be redundantly powered by entirely renewable sources.

But why stop at Mars? We're taking a moonshot at N+42 redundancy with galaxy-scale computing. While GCP is optimized for faster-than-light data coordination for databases, the Google Planets team is already hard at work mapping the rest of our solar system for future data center locations. Stay tuned and join our journey! We can’t wait to see the problems you solve and the breakthroughs you achieve.

P.S. Check out Curiosity’s journey across the Red Planet on Mars Street View.


Solution guide: Archive your cold data to Google Cloud Storage with Komprise



More than 56% of enterprises have more than half a petabyte of inactive data, but this “cold” data often lives on expensive primary storage platforms. Google Cloud Storage provides an opportunity to store this data cost-effectively and achieve significant savings, but storage and IT admins often face the challenge of how to identify cold data and move it non-disruptively.

Komprise, a Google Cloud technology partner, provides software that analyzes data across NFS and SMB/CIFS storage to identify inactive/cold data, and moves the data transparently to Cloud Storage, which can help to cut costs significantly. Working with Komprise, we’ve prepared a full tutorial guide that describes how customers can understand data usage and growth in their storage environment, get a customized ROI analysis and move this data to Cloud Storage based on specific policies.
Cloud Storage provides excellent options to customers looking to store infrequently accessed data at low cost using Nearline or Coldline storage tiers. If and when access to this data is needed, there are no access time penalties; the data is available almost immediately. In addition, built-in object-level lifecycle management in Cloud Storage reduces the burden for admins by enabling policy-based movement of data across storage classes. With Komprise, customers can bring lifecycle management to their on-premises primary storage platforms and seamlessly move this data to the cloud. Komprise deploys in under 15 minutes, works across NFS, SMB/CIFS and object storage without any storage agents, adapts to file-system and network loads to run non-intrusively in the background and scales out on-demand.
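As a sketch of that built-in lifecycle management, rules can be set from the Cloud Storage Python client; the bucket name and age thresholds here are illustrative:

# Hedged sketch: lifecycle rules that move objects to Nearline after 30
# days and to Coldline after 90. Bucket name and ages are illustrative.
from google.cloud import storage

bucket = storage.Client().get_bucket("my-archive-bucket")
bucket.lifecycle_rules = [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 30}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 90}},
]
bucket.patch()  # persist the updated lifecycle configuration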

Teams can get started through this self-service tutorial or watch this on-demand webinar featuring Komprise COO Krishna Subramanian and Google Cloud Storage Product Manager Ben Chong. As always, don’t hesitate to reach out to us to explore which enterprise workloads make the most sense for your cloud initiatives.

Solution guide: backing up Windows files using CloudBerry Backup with Google Cloud Storage



Modern businesses increasingly depend on their data as a foundation for their operation. The more critical the reliance is on that data, the more important it is to ensure that data is protected with backups. Unfortunately, even by taking regular backups, you're still susceptible to data loss from a local disaster or human error. Thus, many companies entrust their data to geographically distributed cloud storage providers like Google Cloud Platform (GCP). And when they do, they want convenient cloud backup automation tools that offer flexible backup options and quick on-demand restores.

One such tool is CloudBerry Backup (CBB), which has the following capabilities:

  • Creating incremental data copies with low impact on production workloads
  • Data encryption along all transfer paths
  • Flexible retention policy, allowing you to balance the volume of data stored and storage space used
  • Ability to carry out hybrid restores with the use of local and cloud storage resources

CBB includes a broad range of features out of the box, allowing you to address most of your cloud backup needs, and is designed to have low impact on production servers and applications.

CBB has a low-footprint backup client that you install on the desired server. After you provision a Google Cloud Storage bucket, attach it to CBB and create a backup plan to immediately start protecting your files in the cloud.

To simplify your cloud backup onboarding, check out the step-by-step tutorial on how to use CloudBerry Backup with Google Cloud Storage and easily restore any files.