
Implementing an event-driven architecture on serverless — the Smart Parking story



Part 2 


In this article, we’re going to explore how to build an event-driven architecture on serverless services to solve a complex, real-world problem. In this case, we’re building a smart city platform. An overview of the domain can be found in part one. If you haven’t read part one, please go take a look now. Initial reviews are in, and critics are saying “this may be the most brilliantly composed look at modern software development; where’s dinner?” (please note: in this case the "critics" are my dogs, Dax and Kiki).

Throughout this part, we’ll be slowly building up an architecture. In part three, we’ll dive deeper into some of the components and review some actual code. So let’s get to it. Where do we start? Obviously, with our input!

Zero step: defining our domain 


Before we begin, let’s define the domain of a smart city. As we learned in the previous post, defining the domain means establishing a clear language and terminology for referencing objects and processes in our software system. Of course, creating this design is typically more methodical, iterative, and far more in-depth. It would take a genius to just snap and put an accurate domain at the end of a blog post (it’s strange that I never learned to snap my fingers, right?).

Our basic flow for this project: a network of distributed IoT (Internet of Things) devices sends periodic readings, which are used to define the frames of larger correlated events throughout a city.
  • Sensor - electronic device that's capable of capturing and reporting one or more specialized readings 
  • Gateway - an internet-connected hub that's capable of receiving readings from one or more sensors and sending these packages to our smart cloud platform 
  • Device - the logical combination of a sensor and its reporting gateway (used to define a clear split between the onramp and processing) 
  • Readings - key-value pairs (e.g., { temperature: 35, battery: "low" } ) sent by sensors 
  • UpdateReadings - the command to update readings for a specific device 
  • ReadingsUpdated - the event that occurs in our system when new readings are received from a device (in response to an UpdateReadings command) 
  • Frame - a collection of correlated / collocated events (typically ReadingsUpdated) used to drive business logic through temporal reasoning [lots more on this later] 
  • Device Report - an analytic view of devices and their health metrics (typically used by technicians) 
  • Event Report - an analytic view of frames (typically used by business managers) 

If we connect all of these parts together in a diagram, and add some serverless glue (parts in bold), we get a nice overview of our architecture:

Of course, there's a fair bit of missing glue in the above diagram. For example, how do we take an UpdateReadings command and get it into Bigtable? This is where my favorite serverless service comes into play: Cloud Functions! How do we install devices? Cloud Functions. How do we create organizations? Cloud Functions. How do we access data through an API? Cloud Functions. How do we conquer the world? Cloud Functions. Yep, I’m in love!

Alright, now that we have our baseline, let’s spend the rest of this post exploring just how we go about implementing each part of our architecture and dataflows.

First step: inbound data


Our smart city platform is nothing more than a distributed network of internet-connected (IoT) devices. These devices are composed of one or more sensors that capture readings and their designated gateways that help package this data and send it through to our cloud.

For example, we may have an in-ground sensor used to detect a parked car. This sensor reports IR and magnetic readings that are transferred through RF (radio frequencies) to a nearby gateway. Another example is a smart trash can that monitors capacity and broadcasts when the bin is full.

The challenges of IoT-based systems have always been collecting data, updating in-field devices, and securing everything. We could write an entire series of articles on how to deal with these challenges. In fact, the burden of this task is the reason we haven’t seen many sophisticated, generic IoT platforms. But not anymore! The problem has been solved for us by those wonderful engineers at Google. Cloud IoT Core is a serverless service offered by Google Cloud Platform (GCP) that helps you skip all the annoying steps. It’s like jumping on top of the bricks!

Wait . . . does anyone get that reference anymore? Mario Brothers. The video game for the NES. You could jump on top of the ceiling of bricks to reach a secret pipe that let you skip a bunch of levels. It was a pipe because you were a plumber . . .  fighting a turtle dragon to save a princess. And you could throw fireballs by eating flowers. Trust me, it made perfect sense!

Anyway! Cloud IoT Core is the secret passage that lets you skip a bunch of levels and get to the good stuff. It scales automatically and is simple to use. Seriously, don’t spend any time managing your devices and securing your streams. Let Google do that for you.

So, sensors are observing life in our city and streaming this data to IoT Core. Where does it end up after IoT Core? In Cloud Pub/Sub, Google’s serverless queuing service. Think of it as a globally distributed subscription queue with guaranteed delivery. The result: our vast network of data streams has been converted to a series of queues that our services can subscribe to. This is our inbound pipeline. It scales nearly infinitely and requires no operational support. Think about that. We haven’t even written any code yet and we already have an incredibly robust architecture. Trust me, it took my team only a week to move our existing device onramp over to IoT Core—it’s that straightforward. And how many problems have we had? How many calls at 3 AM to fix the inbound data? Zero. They should call it opsless rather than serverless!
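To give you a feel for how little code a consumer needs, here's a minimal sketch of a Node.js subscriber using the current Pub/Sub client library (the subscription name 'device-telemetry' is a hypothetical one you'd create on the topic IoT Core publishes to):

const { PubSub } = require('@google-cloud/pubsub');

// Attach to the (hypothetical) 'device-telemetry' subscription and log
// each sensor payload as it arrives, acknowledging receipt.
new PubSub().subscription('device-telemetry').on('message', (message) => {
  console.log(`Received readings: ${message.data.toString()}`);
  message.ack();
});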

Anyway, we got our data streaming in. So far, our architecture looks like this:
While we’re exploring a smart city platform made from IoT devices, you can use this pipeline with almost any architecture. Just replace the IoT Core box with your onboarding service and output to Pub/Sub. If you still want that joy of serverless (and no calls at 3 AM), then consider using Google Cloud Dataflow as your onramp!

What is Dataflow? It's a serverless implementation of a Hadoop-like pipeline used for transforming and enriching streaming or batch data. Sounds really fancy, and it actually is. If you want to know just how fancy, grab any data engineer and ask for their war stories on setting up and maintaining a Hadoop cluster (it might take a while; bring popcorn). In our architecture, it can be used to onramp data from an external source, to help with efficient formation of aggregates (i.e., to MapReduce a large number of events), or to help with windowing for streaming data. This is huge. If you know anything about streaming data, then you’ll know the value of a powerful, flexible windowing service.

Ok, now that we got streaming data, let’s do something with it!

Second step: normalizing streaming data

How is a trash can like a street lamp? How is a parking sensor like a barometer? How is a phaser like a lightsaber? These are questions about normalization. We have a lot of streaming data, but how do we find a common way of correlating it all?

For IoT this is a complex topic and more information can be found in this whitepaper. Of course, the only whitepaper in most developers' lives comes on a roll. So here is a quick summary:

How do we normalize the data streaming from our distributed devices? By converting them all to geolocated events. If we know the time and location of a sensor reading, we can start to colocate and correlate events that can lead to action. In other words, we use location and time to help us build a common reference point for everything going on in our city.

Fortunately, many (if not all) devices will already need some form of decoding / translation. For example, consider our in-ground parking sensor. Since it's transmitting data over radio frequencies, it must optimize and encode data. Decoding could happen in the gateway, but we prefer a gateway to contain no knowledge of the devices it services. It should just act as the doorway to the world wide web (for all the Generation Z folks out there, that’s what the "www." in URLs stands for).

Ideally, devices would all natively speak "smart city" and no decoding or normalization would be needed. Until then, we still need to create this step. Fortunately, it is super simple with Cloud Functions.

Cloud Functions is a serverless compute offering from Google. It allows us to run a chunk of code whenever a trigger occurs. We simply supply the recipe and identify the trigger and Google handles all the scaling and compute resource allocation. In other words, all I need to do is write the 20-50 lines of code that makes my service unique and never worry about ops. Pretty sweet, huh?

So, what’s our trigger? A Pub/Sub topic. What’s our code? Something like this:

function decodeParkingReadings(triggerInput) {
  // Parse the Pub/Sub message, decode the sensor's byte-string payload,
  // wrap it in a normalized UpdateReadings command, and publish the
  // command to the DeviceCommands topic.
  return parsePubSubMessage(triggerInput)
    .then(decode)
    .then(convertToCommand)
    .then(PubSub.topic('DeviceCommands').publish);
}

If you’re not familiar with promises and async coding in JavaScript, the above code simply does the following:
  1. Parse the message from Pub/Sub 
  2. When this is done, decode the payload byte string sent by the sensor 
  3. When this is done, wrap the decoded readings with our normalized UpdateReadings command data struct 
  4. When this is done, send the normalized event to the Device Readings Pub/Sub 
Of course, you’ll need to write the code for the "decode" and "convertToCommand" functions. If there's no timestamp provided by the device, then it would need to be added in one of these two steps. We’ll get more in-depth into code examples in part three.
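In the meantime, here's a rough, hypothetical sketch of what those two functions could look like for our parking sensor (the payload layout is invented for illustration; a real device's encoding will differ):

// Hypothetical decoder for our in-ground parking sensor. The sensor packs
// its readings into a compact byte string to save radio bandwidth, so we
// pull fixed fields out of a Buffer.
function decode(message) {
  const bytes = Buffer.from(message.payload, 'base64');
  return {
    deviceId: message.deviceId,
    readings: {
      ir: bytes.readUInt16BE(0),       // infrared level
      magnetic: bytes.readUInt16BE(2), // magnetic field strength
      battery: bytes.readUInt8(4)      // battery level
    }
  };
}

// Wrap the decoded readings in our normalized UpdateReadings command,
// stamping the time here because this device doesn't report one.
function convertToCommand(decoded) {
  return {
    command: 'UpdateReadings',
    deviceId: decoded.deviceId,
    timestamp: new Date().toISOString(),
    readings: decoded.readings
  };
}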

So, in summary, the second step is to normalize all our streams by converting them into commands. In this case, all sensors are sending in a command to UpdateReadings for their associated device. Why didn’t we just create the event? Why bother making a command? Remember, this is an event-driven architecture. This means that events can only be created as a result of a command. Is it nitpicky? Very. But is it necessary? Yes. By not breaking the command -> event -> command chain, we make a system that's easy to expand and test. Without it, you can easily get lost trying to track data through the system (yes, a lot more on tracking data flows later).

So our architecture now looks like this:
Data streams coming into our platform are decoded using bespoke Cloud Functions that output a normalized, timestamped command. So far, we’ve only had to write about 30 - 40 lines of code, and the best part . . . we’re almost halfway complete with our entire platform.

Now that we have commands, we move onto the real magic. . . storage. Wait, storage is magic?

Third step: storage and indexing events


Now that we've converted all our inbound data into a sequence of commands, we’re 100% into event-driven architecture. This means that now we need to address the challenges of this paradigm. What makes event-driven architecture so great? It makes sense and is super easy to extend. What makes event-driven architecture painful? Doing it right has been a pain. Why? Because you only have commands and events in your system. If you want something more meaningful you need to aggregate these events. What does that mean? Let’s consider a simple example.

Let’s say you’ve got an event-driven architecture for a website that sells t-shirts. The orders come in as commands from a user-submitted web form. Updates also come in as commands. On the backend, we store only the events. So consider the following event sequence for a single online order:

1 - (Order Created) 
     orderNumber: 123foo
     items: [ item: redShirt, size: XL, quantity: 2 ]
     shippingAddress: 123 Bar Lane
2 - (Address Changed)
     orderNumber: 123foo
     shippingAddress: 456 Infinite Loop
3 - (Quantity Changed)
      orderNumber: 123foo
      items: [ item: redShirt, size: XL, quantity: 1 ]

You cannot get an accurate view of the current order by looking at only one event. If you only looked at #2 (Address Changed), you wouldn’t know the item quantities. If you only looked at #3 (Quantity Changed), you wouldn’t have the address.

To get an accurate view, you need to "replay" all the events for the order. In event-driven architecture, this process is often referred to as "hydrating." Alternatively, you can maintain an aggregate view of the order (the current state of the order) in the database and update it whenever a new command arrives. Both of these methods are correct. In fact, many event-driven architectures use both hydration and aggregates.
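To make hydration concrete, here's a minimal sketch that replays the order events above (the helper name is mine, and it assumes each event carries only the fields it changed):

// Hydrate an order by replaying its events in sequence. Because later
// events overwrite earlier ones, the result is the current state.
function hydrateOrder(events) {
  return events.reduce((order, event) => Object.assign({}, order, event), {});
}

// hydrateOrder([orderCreated, addressChanged, quantityChanged]) yields:
// { orderNumber: '123foo',
//   items: [ { item: 'redShirt', size: 'XL', quantity: 1 } ],
//   shippingAddress: '456 Infinite Loop' }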

Unfortunately, implementing consistent hydration and/or aggregation isn’t easy. There are libraries and even databases designed to handle this, but that was before the wondrous powers of serverless computing. Enter Google Cloud Bigtable and BigQuery.

Bigtable is the database service that Google uses to index and search the entire internet. Let that sink in. When you do a Google search, it's Bigtable that gets you the data you need in a blink of an eye. What does that mean? Unmatched power! We’re talking about a database optimized to handle billions of rows with millions of columns. Why is that so important? Because it lets us do event-driven architecture right!

For every command we receive, we create a corresponding event that we store in Bigtable.

Wow.
This blogger bloke just told us that we store data in a database.
Genius.

Why thank you! But honestly, it's the aspects of the database that matter. This isn’t just any database. Bigtable lets us optimize without optimizing. What does that mean? We can store everything and anything and access it with speed. We don’t need to write code to optimize our storage or build clever abstractions. We just store the data and retrieve it so fast that we can aggregate and interpret at access time.

Huh?

Let me give you an example that might help explain the sheer joy of having a database that you cannot outrun.

These days, processors are fast. So fast that the slowest part of computing is loading data from the disk (even SSD). Therefore, most of the world’s most performant systems use aggressive compression when interacting with storage. This means that we'll compress all writes going to disk to reduce read-time. Why? Because we have so much excess processing power, it's faster to decompress data rather than read more bytes from disk. Go back 20 years and tell developers that we would "waste time" by compressing everything going to disk and they would tell you that you’re mad. You’ll never get good performance if you have to decompress everything coming off the disk!

In fact, most platforms go a step further and use the excessive processing power to also encrypt all data going to disk. Google does this. That’s why all your data is secure at rest in their cloud. Everything written to disk is compressed AND encrypted. That goes for their data too, so you know this isn’t impacting performance.

Bigtable is very much the same thing for web services. Querying data is so fast that we can perform processing post-query. Previously, we would optimize our data models and index the heck out of our tables just to reduce query time inside the database. When the database can query across billions of rows in 6 ms, that changes everything. Now we just store and store and store and process later.

This is why storing your data in a database is amazing, if it's the right database!

So how do we deal with aggregation and hydration? We don’t. At least not initially. We simply accept our command, load any auxiliary lookups needed (often the sensor / device won’t be smart enough to know its owner or location), and then save it to Bigtable. Again, we’re getting a lot of power with very little code and effort.
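As a sketch of that storage step with the Node.js Bigtable client (the instance name, table name and row-key scheme are all hypothetical; a real row key would be designed around your read patterns):

const { Bigtable } = require('@google-cloud/bigtable');

const table = new Bigtable().instance('smart-city').table('events');

// Persist a ReadingsUpdated event for the device named in the command.
function storeReadingsUpdated(command) {
  return table.insert({
    key: `${command.deviceId}#${command.timestamp}`,
    data: {
      event: {
        type: 'ReadingsUpdated',
        readings: JSON.stringify(command.readings)
      }
    }
  });
}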

But wait, there’s more! I also mentioned BigQuery. This is the service that allows us to run complex SQL queries across massive datasets (even datasets stored in a non-SQL database). In other words, now that I’ve stored all this data, how do I get meaning from it? You could write a custom service, or just use BigQuery. It will let you perform queries and aggregations across terabytes of data in seconds.
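For instance, here's a hedged sketch of asking BigQuery for each device's average temperature over the last day (the dataset, table and column names are hypothetical):

const { BigQuery } = require('@google-cloud/bigquery');

// Average temperature per device over the last 24 hours.
new BigQuery().query({
  query: `SELECT deviceId, AVG(temperature) AS avgTemp
          FROM smart_city.readings_updated
          WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
          GROUP BY deviceId`
}).then(([rows]) => rows.forEach((row) => console.log(row)));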

So yes, for most of you, this could very well be the end of the architecture:
Seriously, that’s it. You could build any modern web service (social media, music streaming, email) using this architecture. It will scale infinitely and have a max response time for queries of around 3 - 4 seconds. You would only need to write the initial normalization and the required SQL queries for BigQuery. If you wanted even faster responses, you could target queries directly at Bigtable, but that requires a little more effort.

This is why storage is magic. Pick the right storage and you can literally build your entire web service in two weeks and never worry about scale, ops, or performance. Bigtable is my Patronus!!

Now we could stop here. Literally. We could make a meaningful and useful city platform with just this architecture. We’d be able to make meaningful reports and views on events happening throughout our city. However, we want more! Our goal is to make a smart city, one that automatically reacts to events.

Fourth step: temporal reasoning


Ok, this is where things start to get a little more complex. We have events—a lot of events. They are stored, timestamped and geolocated. We can query this data easily and efficiently. However, we want to make our system react.

This is where temporal reasoning comes in. The fundamental idea: we build business rules and insights based on the temporal relationship between grouped events. These collections of related events are commonly referred to as "intervals" or "frames." For example, if my interval is a lecture, the lecture itself can contain many smaller events:


We can also take the lecture in the context of an even larger interval, such as a work day:
And of course, these days can be held in the context of an even larger frame, like a work week.

Once we've built these frames (these collections of events), we can start asking meaningful questions. For example, "Has the average temperature been above the danger threshold for more than 5 minutes?", "Was maintenance scheduled before the spike in traffic?", "Did the vehicle depart before the parking time limit?"

For many of you, this process may sound familiar. This approach of applying business rules for streaming data has many similarities to a Complex Event Processing (CEP) service. In fact, a wonderful implementation of a CEP that uses temporal reasoning is the Drools Fusion module. Amazing stuff! Why not just use a CEP? Unfortunately, business rule management systems (BRMS) and CEPs haven't yet fully embraced the smaller, bite-size methodologies of microservices. Most of these systems require a single monolithic instance that demands absolute data access. What we need is a distributed collection of rules that can be easily referenced and applied by a distributed set of autoscaling workers.

Fortunately, writing the rules and applying the logic is easy once you have the grouped events. Creating and extending these intervals is the tricky part. For our smart city platform, this means having modules that define specific types of intervals and then add any and all related events.

For example, consider a parking module. This would take the readings from sensors that detect the arrival and departure of a vehicle and create a larger parking interval. An example of a parking interval might be:
We simply build a microservice that listens to ReadingsUpdated events and manages creation and extension of parking intervals. Then, we're free to make a service that reacts to the FrameUpdated events and runs temporal reasoning rules to see if a new command should be created. For example, "If there's no departure 60 minutes after arrival, broadcast an IssueTicket command."
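The rule itself is just a small function over the frame. A minimal sketch, assuming a hypothetical frame shape and invented event type names:

// Temporal rule for the parking module: if a frame has an arrival but no
// departure and no payment after 60 minutes, broadcast IssueTicket.
const LIMIT_MS = 60 * 60 * 1000;

function checkParkingFrame(frame) {
  const arrival = frame.events.find((e) => e.type === 'VehicleArrived');
  const departed = frame.events.some((e) => e.type === 'VehicleDeparted');
  const paid = frame.events.some((e) => e.type === 'PaymentMade');

  if (arrival && !departed && !paid &&
      Date.now() - Date.parse(arrival.timestamp) > LIMIT_MS) {
    return PubSub.topic('ParkingCommands').publish({
      command: 'IssueTicket',
      frameId: frame.id
    });
  }
}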

Of course, we may need to correlate events into an interval that are outside the scope of the initial sensor. In the parking example, we see "payment made." Payment is clearly not collected by the parking sensor. How do we manage this? By creating links between the interval and all known associated entities. Then, whenever a new event enters our system, we can add it to all related intervals (if the associated producer or its assigned groups are related). This sounds complex, but it's actually rather easy to maintain a complex set of linkages in Bigtable. Google does this at a significant scale (like the entire internet). Of course, this would be a lot simpler if someone provided a serverless graph database!

So, without diving too much into the complexities, we have the final piece of our architecture. We collect events into common groups (intervals), maintain a list of links to related entities (for updates), and apply simple temporal reasoning (business rules) to drive system behavior. Again, this would be a nightmare without using an event-driven architecture built on serverless computing. In fact, once we get a serverless graph database and distributed BRMS, we've solved the internet (spoiler alert: we'll all change into data engineers, AI trainers and UI guys).

[BTW, for more information, please consult the work of the godfather of computer-based temporal reasoning, James F. Allen. More specifically, his whitepapers An Interval-Based Representation of Temporal Knowledge and Maintaining Knowledge about Temporal Events]

Fifth step: the extras


While everything sounds easy, there are a few details I may have glossed over. I hope you found some! You’re an engineer, of course you did. Sorry, but this part is a little technical. You can skip it if you like! Or, just email me your doubts and I’ll reply!

A quick example: how do I query a subset of devices? We have everything stored in Bigtable, but how do I look at only one group? For example, what if I only wanted to look at devices or events downtown?

This is where grouping comes in. It’s actually really easy with Bigtable. Since Bigtable is NoSQL, it means that we can have sparse columns. In other words, we can have a family called "groups" and any custom set of column qualifiers in this family per row. This way, an event can belong to any number of groups. We look up the current groups when the command is received for the device and add the appropriate columns. This will hopefully make more sense when we go deeper in part three.
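As a sketch (reusing the hypothetical events table and command from the storage step, with made-up group names), each group simply becomes its own column qualifier under the "groups" family:

// Tag an event row with the device's current groups. Bigtable rows are
// sparse, so each row carries only the 'groups' columns it actually has.
function groupColumns(groups) { // e.g., ['downtown', 'parking']
  const columns = {};
  groups.forEach((g) => { columns[g] = 'true'; });
  return columns;
}

table.insert({
  key: `${command.deviceId}#${command.timestamp}`,
  data: {
    event: { type: 'ReadingsUpdated' },
    groups: groupColumns(['downtown', 'parking'])
  }
});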

Another area worth a passing mention is extension and testing. Why are serverless and event-driven architectures so easy to test and extend? The ease of testing comes from the microservices. Each component does one thing and does it well. It accepts either a command or an event and produces a simple output. For example, each of our event Pub/Subs has a Cloud Function that simply takes the events and stores them in Google Cloud Storage for archival purposes. This function is only 20 lines of code (mostly boilerplate) and has absolutely no impact on the performance of other parts of the system. Why no impact? It's serverless, meaning that it autoscales only for its needs. Also, thanks to the Pub/Sub queues, our microservice is taking a non-destructive replication of input (i.e., each microservice is getting a copy of the message without putting a burden on any other part of our architecture).
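That archival function really is tiny. A sketch, assuming a hypothetical bucket named 'event-archive' and the Node.js background-function signature:

const { Storage } = require('@google-cloud/storage');

const bucket = new Storage().bucket('event-archive');

// Pub/Sub-triggered Cloud Function: copy every event to Cloud Storage.
exports.archiveEvent = (message, context) => {
  const event = Buffer.from(message.data, 'base64').toString();
  return bucket.file(`${context.eventId}.json`).save(event);
};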

This zero impact is also why extension of our architecture is easy. If we want to build an entirely new subsystem, we simply branch off one of the Pub/Subs. This means a developer can rebuild the entire system if they want with zero impact and zero downtime for the existing system. I've done this [transitioned an entire architecture from Datastore to Bigtable¹], and it's liberating. Finally, we can rebuild and refactor our services without having to toss out the core of our architecture—the events. In fact, since the heart of our system is events published through serverless queues, we can branch our system just like many developers branch their code in modern version control systems (i.e., Git). We simply create new ways to react to commands and events. This is perfect for introducing new team members. These noobs [technical term for a new starter] can branch off a Pub/Sub and deploy code to the production environment on their first day with zero risk of disrupting the existing system. That's powerful stuff! No-fear coding? Some dreams do come true.

BUT—and this is a big one (fortunately, I like big buts)—what about integration testing? Building and testing microservices is easy. Hosting them on serverless is easy. But how do we monitor them and, more importantly, how do we perform integration testing on this chain of independent functions? Fortunately, that's what Part Three is for. We'll cover this all in great detail there.

Conclusion


In this post, we went deep into how we can make an event-driven architecture work on serverless through the context of a smart city platform. Phew. That was a lot. Hope it all made sense (if not, drop me an email or leave a comment). In summary, modern serverless cloud services allow us to easily build powerful systems. By leveraging autoscaling storage, compute and queuing services, we can make a system that outpaces any demand and provides world-class performance. Furthermore, these systems (if designed correctly) can be easy to create, maintain and extend. Once you go serverless, you'll never go back! Why? Because it's just fun!

In the next part, we'll go even deeper and look at the code required to make all this stuff work.


¹ A little more context on the refactor, for those who care to know. Google Datastore is a brilliant and extremely cost-efficient database. It is NoSQL like Bigtable but offers traditional-style indexing. For most teams, Datastore will be a magic bullet (solving all your scalability and throughput needs). It’s also ridiculously easy to use. However, as your data sets (especially for streaming) start to grow, you’ll find that the raw power of Bigtable cannot be denied. In fact, Datastore is built on Bigtable. Still, for most of us, Datastore will be everything we could want in a database (fast, easy and cheap, with infanite² scaling).

² Did I put a footnote in a footnote? Yes. Does that make it a toenote? Definitely. Is ‘infanite’ a word? Sort of. In·fa·nite (adjective) - practically infinite. Google’s serverless offerings are infanite, meaning that you’ll never hit the limit until your service goes galactic.

Cloud Endpoints: Introducing a new way to manage API configuration rollout



Google Cloud Endpoints is a distributed API gateway that you can use to develop, deploy, protect and monitor APIs that you expose. Cloud Endpoints is built on the same services that Google uses to power its own APIs, and you can now configure it to use a new managed rollout strategy that automatically uses the latest service configuration, without having to re-deploy or restart it.

Cloud Endpoints uses the distributed Extensible Service Proxy (ESP) to serve APIs with low latency and high performance. ESP is a service proxy based on NGINX, so you can be confident that it can scale to handle simultaneous requests to your API. ESP runs in its own Docker container for better isolation and scalability and is distributed in the Google Container Registry and Docker registry. You can run ESP on Google App Engine flexible, Google Kubernetes Engine, Google Compute Engine, open-source Kubernetes, or an on-premises server running Linux or Mac OS.

Introducing rollout_strategy: managed


APIs are a critical part of using cloud services, and Cloud Endpoints provides a convenient way to take care of API management tasks such as authorization, monitoring and rate limiting. With Cloud Endpoints, you can describe the surface of the API using an OpenAPI specification or a gRPC service configuration file. To manage your API with ESP and Cloud Endpoints, deploy your OpenAPI specification or gRPC service configuration file using the brand new command:

gcloud endpoints services deploy

This command generates a configuration ID. Previously, in order for ESP to apply a new configuration, you had to restart ESP with the generated configuration ID of the last API configuration deployment. If your service was deployed to the App Engine flexible environment, you had to re-deploy your service every time you deployed changes to the API configuration, even if there were no changes to the source code.

Cloud Endpoints' new rollout_strategy: managed option configures ESP to use the latest deployed service configuration. When you specify this option, ESP detects the change to a new service configuration within one minute, and automatically begins using it. We recommend that you specify this option instead of a specific configuration ID for ESP to use.
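For example, if your service runs on the App Engine flexible environment, this is a one-line addition to the endpoints_api_service section of app.yaml (the service name below is a placeholder):

endpoints_api_service:
  name: my-api.endpoints.my-project.cloud.goog
  rollout_strategy: managed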

With the new managed rollout deployment strategy, Cloud Endpoints becomes an increasingly frictionless API management solution that doesn’t require you to re-deploy your services or restart ESP on every API configuration change.

For information on deploying ESP with this new option, see the documentation for your API implementation.


API design: Which version of versioning is right for you?



There's a lot of advice on the web about API versioning, much of it contradictory and inconclusive: One expert says to put version identifiers in HTTP headers, another expert insists on version identifiers in URL paths, and a third says that versioning of APIs is not necessary at all. (For some examples of those divergent views, take a look at this blog post and its bibliography and this interview with the author of the original REST dissertation).

With all this information, who’s an API developer to believe?

Here on the Apigee team at Google Cloud, we design, use and see a lot of APIs, and have developed some strong opinions about the right way to version them. My aim is to bring some clarity and offer unequivocal practical advice.

A significant part of the confusion around API versioning stems from the fact that the word “versioning” is used to describe at least two fundamentally different techniques, each with different uses and consequences. Understanding the two different meanings of API versioning is a prerequisite to deciding how and when to use each.


Type 1 versioning: format versioning


Consider the example of a bank that maintains accounts for customers. These accounts have been around for a long time and customers can get information about these accounts through multiple channels (the postal service, the telephone or the bank website, for example).

In addition, the bank has a web API that allows access to accounts programmatically. At some point, the bank sees a way to improve the web API; to attract more developers to it, the bank decides to introduce a new version. I don't think there's an API designer who hasn't one day wished they had organized the information in some request or response differently.

The important point in this example is that version 1 and version 2 of the API both allow access to the same bank accounts. The API change introduces no new entities; versions 1 and 2 simply provide two different "formats" [my word¹] for manipulating the same bank accounts.

Further, any change made using the version 2 API changes the underlying account entity in ways that are visible to clients of the version 1 API. In other words, each new API version defines a new format for viewing a common set of entities. It’s in this sense that I use the phrase "format versioning" in the rest of this post.

Format versioning seems straightforward, but it presents a technical challenge: how do you ensure that any changes made to an entity using one version of the API are seen correctly through all other versions of the API, and how do you ensure that changes made by older clients don't undo or corrupt changes made by newer clients?

In practice, this means that format versioning often can’t accommodate major functional changes. Another problem is that if a lot of format versions are introduced, the server code can become bloated and complex, raising costs for the API publisher.

Type 2 versioning: entity versioning


Given that format versioning can be complex to implement, and that it cannot be used for substantial changes that would cause older API clients to stop working, it’s a good thing that there’s another way to version APIs.

Consider how car manufacturers do versioning. Car manufacturers typically introduce a new generation of popular models every 4 or 5 years. Car model generations are sometimes also called marks; they're a form of versioning. When you buy a specific car of a particular model, it’s a generation 2 car or a generation 3 car, but not both. Car manufacturers will recall cars to fix faults, but they can’t make your generation 2 car look and behave like a generation 3 car on demand.

This model can work well for software too. Extending the bank example, imagine that the bank wants to introduce checking accounts based on blockchain technology, which requires the underlying data for the account to be organized quite differently. If the API that was previously exposed for accounts made assumptions that are simply not compatible with the new technology, it's not going to be possible to read and manipulate the blockchain accounts using the old API. The bank’s solution is the same as the car company’s: introduce "version 2" checking accounts. Each account is either a conventional account or a blockchain account, but not both at the same time. Each version has its own API; the two APIs are the same where possible but different where necessary.

While "entity versioning" is attractive for its flexibility and simplicity, it also is not free; you still have to maintain the old versions for as long as people use them.

You could think of entity versioning as a limited form of software product lines (SPLs), where the product line evolves along a single time dimension. There's quite a bit of material on the general topic of SPLs on the internet.

New entity version or new entity?


From a technical perspective, introducing a new "entity version" is very similar to introducing a brand new entity type. The argument by Roy Fielding, the creator of REST, that you don't need to version APIs at all seems to be based on this idea.

Fielding makes an important point, but there are good reasons not to present a new API for similar concepts as something completely unrelated. We have all seen advertisements that say something like "Acme Motors introduces the all-new model Y," where model Y is an existing car model that Acme Motors has been selling for years. Not every statement made in car advertisements is accurate, but when car companies say "all-new," they're usually being truthful in the sense that there isn't a single part on the new model that is the same as on the old one.

So if the model is really "all-new," why is it still a model Y, and not some completely different model? The answer is that the manufacturer is choosing to emphasize brand continuity; although it's "all-new," the new model serves the same market segment in a similar way, and the manufacturer wants to leverage its existing investment in the model Y brand.

The bank has a similar choice to make when it introduces accounts based on blockchain technology. Although the new accounts are based on a different technology, they provide all of the basic capabilities of traditional accounts to the same customers. They may offer additional capabilities, or they may just hold the promise of greater security, or shorter delays in transferring money. The bank likely chooses to present them as a new version of their traditional bank accounts, rather than a completely different financial product.

One of the reasons why this form of versioning works well for car companies is that customers replace their cars every few years, so there's a natural migration from older to newer inventory over time. Banking customers may keep bank accounts for a long time, so the bank may want to offer automatic migration to move its customers onto the new version more quickly. If your company's products are digital technology products, the natural inventory turnover may be even more rapid, which makes this versioning model even more attractive.

When is versioning required?


It turns out that in practice, many APIs don’t need versioning of either kind. We can say with certainty that some of our own APIs (Apigee’s management API is one example) have never needed a subsequent version, or have needed one only in limited portions of the API.

One reason why many APIs never need versioning is that you can make many small enhancements to APIs in a backwards-compatible way, usually by adding new properties or new entities that older clients can safely ignore. Your first thought should always be to try to find a backwards-compatible way of introducing an API change without versioning; versioning of either sort should only be attempted if that fails. Fortunately, there are things you can do up front when designing your API to maximize the number of changes that can be made in a backwards-compatible way. One example is using PATCH instead of PUT for updates. You can also add a couple of random and meaningless properties to every entity representation; this will test the clients' ability to ignore future properties they haven't seen before.
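To see why PATCH helps, consider a client that was written before a new property was added to an entity (a hedged sketch; the resource and property names are invented):

The server's current representation, including a property the old client has never seen:

  { "owner": "Ann", "balance": 100, "overdraftLimit": 50 }

A stale client doing a full replace silently drops the new property:

  PUT /accounts/123
  { "owner": "Ann", "balance": 100 }

The same client using PATCH touches only the fields it knows about, leaving unknown properties intact:

  PATCH /accounts/123
  { "owner": "Ann" }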

One of the simplest changes that might motivate a version change is to change the name of a particular property or entity type. We all know that naming is hard, and finding the right names for concepts is difficult even when the concepts are clear. Names that were carefully chosen during development often become obsolete as concepts evolve and ideas become clearer. A format version can be an opportunity to clean up nomenclature without changing the fundamental meaning of entities and their properties. Format versioning lends itself to this sort of change—the same changes can be made by old and new clients using different names for the same things.

Another example that can motivate a version change is restructuring the representation of a resource. Here's an example of two ways of structuring the same information in JSON:

{"kind": "Book",
 "name": "The Adventures of Tom Sawyer",
 "characters": {
   "Tom Sawyer": {"address": "St Petersburg, Mo"},
   "Huckleberry Finn": {"father": "Pap Finn"}
  }
}

{"kind": "Book",
 "name": "The Adventures of Tom Sawyer",
 "characters": [
   {"name": "Tom Sawyer", "address": "St Petersburg, Mo"},
   {"name": "Huckleberry Finn", "father": "Pap Finn"}
  ]
}

One of these formats encodes the list of characters as a JSON object keyed by the characters' name, and the other encodes it as a JSON array. Neither is right or wrong. The first format is convenient for clients that always access the characters by name, but it requires clients to learn that the name of the character is to be found in the place that a property name is usually found in JSON, rather than as a property value. The second format does not favor one access pattern over another and is more self-describing; if in doubt, I recommend you use this one. This particular representation choice may not seem very important, but as an API designer you're faced with a large number of options, and you may sometimes wish you had chosen differently.

Sadly, there's no practical way to write API clients that are insensitive to name changes and changes in data representation like these. A version format allows you to make changes like this without breaking existing API clients.

Browsers are able to survive HTML webpage changes without versioning, but the techniques that make this work for browsers—e.g., the ability to download and execute client code that is specific to the current format of a particular resource, enormous investment in the technology of the browser itself, industry-level standardization of HTML, and the human user's ability to adapt to changes in the final outcome—are not available or practical for most API clients. An exception is when the API client runs in a web browser and is loaded on demand each time an API resource is accessed. Even then, you have to be willing to manage a tight coordination between the team producing the browser code and the team producing the API—this doesn't happen often, even for browser UI development within a single company.

A very common situation that usually requires an entity version change, rather than just a format version change, is when you split or merge entity hierarchies. In the bank example, imagine that Accounts belong to Customers, and each Account entity has a reference to the Customer it belongs to. Because some customers have many Accounts, the bank wants Accounts to be grouped into Portfolios. Now Accounts need to reference the Portfolio they belong to, not the Customer, and it's the Portfolio that references the Customer. Changes like this are hard to accommodate with format versions, because older clients will try to set a property linking an Account to a Customer and newer clients will try to set a property linking an Account to a Portfolio. You can sometimes find ways to make both sets of clients work in cases like this, but more often you are forced to introduce new entity versions, each of which is updated using only one API format.

The sort of structural changes that force a new entity version usually introduce new concepts and new capabilities that are visible to the user, whereas the changes handled by format version changes are more superficial.

In general, the more clients an API has, and the greater the independence of the clients from the API provider, the more careful the API provider has to be about API compatibility and versioning.

Providers of APIs sometimes make different choices if the consumers of the API are internal to the same company, or limited to a small number of partners. In that case they may be tempted to try to avoid versioning by coordinating with consumers of the API to introduce a breaking change. In our experience this approach has limited success; it typically causes disruption and a large coordination effort on both sides. Google uses this approach internally, but at considerable cost—this article describes some of Google's investments to make it work. It is usually much better for API providers to treat internal users and partners as if they were external consumers whose development process is independent.

Choosing the appropriate technique


You can probably see already that format versioning and entity versioning are fundamentally different techniques that solve different problems with different consequences, even though they both sail under the flag of versioning.

So when should you choose to do format versioning versus entity versioning? Usually the business requirements make the choice obvious.

In the case of the bank, it isn’t feasible to introduce a new entity version of an account in order to enable an API improvement. Accounts are stable and long-lived, and moving from old accounts to new ones is disruptive. A bank is unwilling to inconvenience its banking customers just to make life better for API developers. If the goal is just to improve the API, the bank should pick format versioning, which will limit the sort of changes that they make to superficial improvements.

The bank should consider introducing a new entity version if there's significant new value that it wants to expose to its banking customers, or if it's forced to do so for security or regulatory reasons. In the case of blockchain accounts, there may be publicity value as well as practical value. Entity version upgrades are less common than format versioning changes for established services, but they do happen; you may have received messages from your bank telling you about a significant technology upgrade to your accounts and alerting you to actions you need to take, or changes you will see.

Entity versioning puts an additional burden on API clients, because the older clients cannot work with the newer entities, even though they continue to work unchanged with the older ones. This puts pressure on client developers to produce a new client application or upgrade an existing one to work with the new API.

Entity versioning can work well for technology products, where the users of the API and the core customers are often one and the same and rapid obsolescence is considered normal.

How do you implement the different versions of versioning?


On the web, you often see conflicting advice on whether or not a version number should appear in the URLs of a web API. The primary alternative is to put the version ID in an HTTP header. The better choice depends on whether you're doing format versioning or entity versioning.

For format versioning, put the version identifier in an HTTP header, not in the URL. Continuing the banking example, it’s conceptually simpler for each account to have a single URL, regardless of which format the API client wants to see it in. If you put a format version identifier in the URL, you are effectively making each format of each entity a separate web resource, with some behind-the-scenes magic that causes changes in one to be reflected in the other.

Not only is this a more complex conceptual model for users, it also creates problems with links. Suppose that in addition to having an API for accounts, the bank also has an API for customer records, and that each account contains a link to the record for the customer that owns it. If the developer asks for the version 2 format of the account, what version should be used in the link to the customer record? Should the server assume that the developer will also want to use the version 2 format of the customer record and provide that link? What if customer records don't even have a version 2 format?

Some APIs that put version identifiers in URLs (OpenStack, for example, and at least one bank we know) solve the link problem by having a “canonical” URL for each entity that's used in links, and a set of version-specific URLs for the same entity that are used to access the entity's format versions. Clients that want to follow a link have to convert a canonical URL in a link into a version-specific URL by following a documented formula. This is more complex for both the provider and the client; it's simpler to use a header.

The usual objection to putting format version identifiers in a header is that it's no longer possible to simply type a URL into a browser to test the result of a GET on a specific version. While this is true, it's not very hard to add headers in the browser using plugins like Postman, and you'll probably have to set the Authorization and Accept headers anyway. If you're using the cURL shell command to test your API, adding headers is even simpler. You'll also need more than just the browser to send create, update or delete requests to your API, so optimizing for GET only helps for one scenario. Your judgement may be different, but I have never found it very onerous to set a header.

There's no standard request header that's ideal for the client to say what format version it wants. The standard "Accept" header specifies which media types the client can accept (e.g., json, yaml, xml, html, plain text), and the standard "Accept-Language" header denotes which natural languages the client can accept (e.g., French, English, Spanish). Some API designers (e.g., the authors of the Restify framework) use a non-standard header called "Accept-Version". If you're doing format versioning, I recommend this header. The standard "Accept" headers allow the client to give a list of values they accept, and even provide a weighting for each. This level of complexity isn’t necessary for "Accept-Version"; a single value is enough. If you're meticulous, you should set a corresponding "Content-Version" header in the response. Further, it can be useful for clients if the server also puts the format version in the body of the response; in fact, if the representation of one resource is embedded in another, the body is the only place to put it. [This argument applies to a number of the standard headers too: e.g., Etag, Location, and Content-Location.]
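Putting that together, a format-versioned exchange might look like this (the paths, values and body fields are illustrative, not a standard):

GET /accounts/123 HTTP/1.1
Accept: application/json
Accept-Version: v2

HTTP/1.1 200 OK
Content-Type: application/json
Content-Version: v2

{ "formatVersion": "v2", "owner": "Ann", "balance": 100 }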

By contrast, if you're doing entity versioning, the version identifier will appear somewhere in the URL of each entity—usually either in the domain name or the path. Users of the API do not have to be aware of this; for them, it's just the entity's URL. The version identifier will appear in the URL because the URL has to contain information for two different purposes: for routing requests to the correct part of the implementation for processing, and for identifying the entity within that implementation. Because requests for entities that belong to two different entity versions are almost always processed by a different part of the implementation or use different storage, the version identifier (or a proxy for it) must be somewhere in the URL for your routing infrastructure or implementation to use.

Coincidentally, banking provides a simple illustration of the principle that identifiers contain information for both routing and identification. If you have a checking account at a U.S. bank (the details are different in other countries, but the idea is similar), you'll find two numbers at the bottom of each check. The first is called the routing number. It identifies the institution that issued and can process this check. The second number identifies the check itself. Conceptually, entity URLs are like the numbers at the bottom of a check, though their formats may be different.

Do I have to define my versioning strategy up front?


You'll sometimes hear the advice that you must define a versioning strategy before you release your first version, or evolving your API will be impossible. This is not true.

You can always add a new versioning header later if you find the need to do format versioning and you can always add new URLs for new entities for a different entity version. Any requests that lack the format versioning header should be interpreted as meaning the first format version. Since instances of a new entity version get new URLs, you can easily introduce a version ID in those URLs without affecting the URLs of the entities of the first version. The new URLs may use a new hostname rather than adding path segments to URLs on the original hostname; whether or not you like that option will depend on your overall approach for managing hostnames.

Procrastination can be good


Laziness is not the only reason why you might not add versioning to the initial version of your API. If it turns out that versioning is never needed for your API, or for significant portions of your API, then the API will look better and be easier to use if it doesn’t include versioning in its initial release.

If you introduce an "Accept-Version" header in V1 of your API in anticipation of future "format versions" that never materialize, then you force your clients to set a header unnecessarily on every request.

Likewise, if you start all your URLs with the path prefix '/v1' in anticipation of future "entity version" introductions that never happen, then you make your URLs longer and uglier than they need to be.

More importantly, in both cases you introduce a complex topic to clients that you didn’t need to introduce.

Some more versioning tips


If you use versioning, make it clear what sort of versioning you use. If there's a field in your HTTP requests and responses that says "version: V1," what does that mean? Does V1 apply to the persistent entity itself (entity versioning), or does it reflect the format in which the client asked to see the entity (format versioning)? Having a clear understanding of which versioning scheme or schemes you use helps your users understand how to use your API as it evolves.

If you're using format versioning and entity versioning together, signal them with different mechanisms. Format versions should go in headers—Accept-Version and Content-Version—in the request and response. Format versions can also be included in the bodies of responses and requests, for those requests that have them. Entity versions (which are really part of the entity type) belong in the request and response bodies; they're part of the representation of the entity.

Do not try to put versioning identifiers of either kind or entity type identifiers into the standard Accept or Content-Type headers; those headers should only include standard media types like text/html or application/json. Avoid using values like application/v2+json or application/customer+json; the media-type is not the place to try to encode version or type information. Unfortunately, even some of the web standards do this the wrong way, for example application/json-patch+json.

Don't put words like "beta" or "alpha" in version IDs for either format versioning or entity versioning. When you move from alpha to beta, or beta to general availability, you're making a statement about your level of support for the API, or its likely stability. You don't want to be in a position where the API version changes just because your level of support changes; you only want to change the version if there's a technical or functional reason for changing it. To illustrate this point, imagine I am a customer who develops a number of client applications that are using the V1beta4 version of an interface—a late-beta version. The API provider declares the product to be GA, and introduces the V1 version of the API, which is actually exactly the same as the V1beta4 API, since there were no breaking API changes between V1beta4 and GA. The V1beta4 version of the API is still available, so my client applications don't break, but the language of the support agreement is clear—only users of the V1 version get full product support. The change to my client applications to upgrade to V1 is small—I only have to change the version number I'm using, which may even be as simple as recompiling with the latest release of the vendor-provided client libraries—but any change to my applications, no matter how small, needs to go through a full release process with QA testing, which costs me thousands of dollars. This is very annoying.

Hopefully this post helps bring a little more clarity to the topic of API versioning, and helps you with your design and implementation choices. Let us know what you think.

For more on API design, read the eBook, “Web API Design: The Missing Link” or check out more API design posts on the Apigee blog.

¹ Representation would be another word for this concept that might be better aligned with REST terminology. For reasons I can't explain, the term "representation versioning" is not as appealing to me as "format versioning". 

Now, you can automatically document your API with Cloud Endpoints



With Cloud Endpoints, our service for building, deploying and managing APIs on Google Cloud Platform (GCP), you get to focus on your API’s logic and design, and our team handles everything else. Today, we’re expanding “everything else” and announcing new developer portals where developers can learn how to interact with your API.

Developer portals are the first thing your users see when they try to use your API, and are an opportunity to answer many of their questions: How do I evaluate the API? How do I get working code that calls the API? And for you, the API developer, how do you keep this documentation up-to-date as your API develops and changes over time?

Much like with auth, rate-limiting and monitoring, we know you prefer to focus on your API rather than on documentation. We think it should be easy to stand up a developer portal that’s customized with your branding and content, and that requires minimal effort to keep its contents fresh.

Here’s an example of a developer portal for the Swagger Petstore (YAML):

The portal includes, from left to right, the list of methods and resources, any custom pages that the API developer has added, details of the individual API method and an interactive tool to try out the API live!

If you’re already using Cloud Endpoints, you can start creating developer portals immediately by signing up for this alpha. The portal will always be up-to-date; any specification you push with gcloud also gets pushed to the developer portal. From the portal, you can browse the documentation, try the APIs interactively alongside the docs, and share the portal with your team. You can point your custom domain at it, for which we provision an SSL certificate, and add your own pages for content such as tutorials and guides. And perhaps the nicest thing is that this portal works out of the box for both gRPC and OpenAPI—so your docs are always up-to-date, regardless of which flavor of APIs you use.

Please reach out to our team if you’re interested in testing out Cloud Endpoints developer portals. Your feedback will help us shape the product and prioritize new features over the coming months.

Introducing GCP’s new interactive CLI



If you develop applications on Google Cloud Platform (GCP), you probably spend a lot of time in the GCP command line. But as we grow our GCP services, the number of commands and flags is growing by leaps and bounds. So today, we’re introducing a new command line interface (CLI) that lets you discover—and use—all these commands more efficiently: gcloud interactive.

The Google Cloud SDK offers a variety of command line tools to interact with GCP, namely:

  • gcloud — GCP’s primary CLI 
  • gsutil — CLI to interact with Google Cloud Storage 
  • bq — CLI to interact with Google BigQuery 
  • kubectl — Kubernetes Engine’s CLI

Currently in public alpha, the new interactive CLI environment provides auto-prompts and in-line help for gcloud, gsutil, bq and kubectl commands. No more context-switching as you search for command names, required flags or argument types in help pages. Now all of this information is included as part of the interactive environment as you type!
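If you'd like to try it, update your Cloud SDK and launch the environment from your terminal (as an alpha feature, the command lives under the alpha group):

$ gcloud components install alpha
$ gcloud alpha interactive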
The interactive environment also supports standard bash features like:

  • intermixing gcloud and standard bash commands 
  • running commands like cd and pwd, and setting and using shell variables across command executions 
  • running and controlling background processes 
  • TAB-completing shell variables, and much more!

For example, you can assign the result of a command to a variable and later use this variable as input to a different command:

$ active_vms=$(gcloud compute instances list --format="value(NAME)" --filter="STATUS=RUNNING")
$ echo $active_vms

You can also create and run bash scripts while you're in the interactive environment.
For example, the following script iterates over all compute instances and restarts the ones that have been TERMINATED.

#!/bin/bash
# Collect the names of all instances whose status is TERMINATED.
terminated_vms=$(gcloud compute instances list --format="value(NAME)" --filter="STATUS=terminated")
for name in $terminated_vms
do
  echo "Instance $name will restart."
  # Look up the instance's zone, since 'instances start' requires it.
  zone=$(gcloud compute instances list --format="value(ZONE)" --filter="NAME=$name")
  gcloud compute instances start $name --zone $zone
done