BetterCloud processes ~1B events per day and makes even more API calls to support that. I am a platform architect, which means that I think about cloud architecture and microservice patterns a lot. Machine-to-machine communication is an important part of scaling to 12,500 events/second.
Internal Synchronous Communication
We are both the owners and consumers of the API
While BetterCloud uses Kafka for asynchronous communication, there are times when we need information right now. These are usually requests to perform credential and permission checks, or to enrich events with data from databases owned by other teams. In these cases we use JSON REST APIs with symmetric request signing, specifically HMAC.
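The post doesn't spell out BetterCloud's actual signing scheme, so what follows is only a minimal sketch of HMAC-SHA256 request signing under stated assumptions: the header names (`X-Key-Id`, `X-Timestamp`, `X-Signature`), the canonical string layout, the 5-minute clock-skew window, and the `KEYS` table are all hypothetical. The caller and the receiver share a secret; the receiver recomputes the MAC over the same bytes and compares it in constant time. Sending a key ID alongside the signature is one common way to make symmetric key rotation possible.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

# Hypothetical shared secrets, indexed by key ID. Keeping more than one active
# key lets callers move to a new key before the old one is retired (rotation).
KEYS: Dict[str, bytes] = {
    "k1": b"old-shared-secret",
    "k2": b"new-shared-secret",
}

MAX_SKEW_SECONDS = 300  # assumption: reject requests more than 5 minutes off


def canonical_string(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    """The exact bytes both sides sign: method, path, timestamp, and a body hash."""
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method.upper()}\n{path}\n{timestamp}\n{body_hash}".encode()


def sign_request(key_id: str, method: str, path: str, body: bytes,
                 timestamp: Optional[int] = None) -> Dict[str, str]:
    """Return the headers a calling service attaches to its request."""
    ts = int(time.time()) if timestamp is None else timestamp
    mac = hmac.new(KEYS[key_id], canonical_string(method, path, ts, body), hashlib.sha256)
    return {"X-Key-Id": key_id, "X-Timestamp": str(ts), "X-Signature": mac.hexdigest()}


def verify_request(headers: Dict[str, str], method: str, path: str, body: bytes,
                   now: Optional[int] = None) -> bool:
    """Recompute the signature with the named key and compare in constant time."""
    key = KEYS.get(headers.get("X-Key-Id", ""))
    if key is None:
        return False  # unknown or already-retired key
    try:
        ts = int(headers["X-Timestamp"])
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - ts) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated: limits replay of captured requests
    expected = hmac.new(key, canonical_string(method, path, ts, body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers.get("X-Signature", ""))
```

In this sketch, rotating a key means adding the new entry to every receiver's `KEYS`, switching callers to sign with it, and removing the old entry once no traffic still uses it.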
BetterCloud does most of these well, but here are a few challenges you should consider:
- Rate limiting
- Symmetric key rotation
- Changing contracts (data models, query parameters)
- Search query syntax
- Good documentation (<== this needs more bold)
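For the rate-limiting challenge, one widely used technique is a token bucket: each client may burst up to a fixed capacity, and tokens refill at a steady rate. The following is a minimal single-process sketch, not BetterCloud's implementation; the rates are hypothetical, keying buckets by the caller's ID is an assumption, and a horizontally scaled service would typically need a shared store rather than an in-memory dict. It is also not thread-safe as written.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per calling service (hypothetically keyed by its client or key ID)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

When `allow` returns `False`, an HTTP API would usually answer with status 429 (Too Many Requests), ideally with a `Retry-After` header so well-behaved callers know when to come back.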
We are only the consumers of the API
As of July 2019, BetterCloud integrates with 12 SaaS providers and ~30 APIs from those companies, e.g. Google Calendar, Google G Suite Admin, and Slack SCIM. Most of those use some form of OAuth 2, while others require basic auth, API keys, or special IAM service account access. We query those APIs for ~20TB of data on behalf of our customers each month.
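When a service calls OAuth 2 APIs on behalf of many customers, requesting a new access token for every call adds latency and can eat into the provider's limits. A minimal sketch of a per-credential token cache follows; `fetch` is a hypothetical stand-in for the provider-specific token request and is assumed to return `(access_token, expires_in_seconds)`, and the 60-second refresh margin is an arbitrary assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Token:
    access_token: str
    expires_at: float  # absolute time on the same clock the cache uses


class TokenCache:
    """Reuse an OAuth 2 access token until shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.time):
        # `fetch` is hypothetical: it performs the provider's token request.
        self.fetch = fetch
        self.refresh_margin = refresh_margin
        self.clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        now = self.clock()
        # Refresh a little early so a token never expires mid-request.
        if self._token is None or now >= self._token.expires_at - self.refresh_margin:
            access_token, expires_in = self.fetch()
            self._token = Token(access_token, now + expires_in)
        return self._token.access_token

    def invalidate(self) -> None:
        """Call after a 401 so the next get() fetches a fresh token."""
        self._token = None
```

In a multi-tenant integration you would keep one `TokenCache` per customer credential, since each customer's grant yields its own tokens.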
Some provider APIs are better than others, but here are the things to keep in mind when integrating with one (or many) third-party APIs:
- Quota: if you eat your lunch at 10am, you're gonna be hungry
- Rate limiting, especially if they enforce exponential backoff for "abusers"
- Intermittent errors
- Bugs (sorry, "features"). Seriously though, audit your code enough to know when it is "their" fault
- "We know about that bug, it's on our radar" => "We probably have a ticket rotting on some backlog somewhere..."
- Changes can happen without warning! Sometimes with warning, but also sometimes without.
- Do yourself a favor by investing in mocking tools early (before you run out of quota during testing)
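Rate limits and intermittent errors are usually handled by retrying with capped exponential backoff plus random jitter, so that many workers don't retry in lockstep. The following is a minimal sketch of that technique; `RetryableError` is a hypothetical marker your HTTP layer would raise for retryable failures (for example 429, 503, or timeouts), and the delay defaults are assumptions rather than any provider's published policy.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. HTTP 429/503 or timeouts."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call fn, retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            # Full jitter: pick a random delay up to the exponential cap so that many
            # workers retrying at once don't hit the provider at the same moment.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Because `fn` and `sleep` are injected, the retry logic can be unit tested with fakes instead of real provider calls, which also keeps tests from burning quota. If a provider sends a `Retry-After` header, honoring it is generally better than a computed delay.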
We are the owners of the API
BetterCloud has been building out a microservice architecture since late 2015, but until recently all of our REST endpoints served only internal use cases. In January 2019 we exposed our Platform API to customers, partners, and an internal professional services team.
Things to remember:
- Don't do the stuff listed in the "We are only the consumers of the API" section...
- Great documentation with lots of code examples
- Protect yourself: plan for DoS attacks and find some good penetration testers.
- Once you expose something it becomes a contract point for a LONG time in the future
Use APIs to automate internal processes
I lovingly stole this term from a conversation with some Postman employees. This could be anything from onboarding automation to cloud infrastructure deployments. At BetterCloud, we use APIs to help debug, manage, and remediate issues in our microservices. But keep in mind that MOST infrastructure tools, SaaS applications, and cloud providers expose management APIs, e.g. Elasticsearch and GCP.
- There might not be an /undo endpoint, so be careful
- Audit as much as you can so you know who is doing what, or at least who wrote that script that just deprovisioned your entire production Elasticsearch cluster.
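One lightweight way to act on both points is to wrap destructive management calls so that every invocation is recorded and the default is a dry run. The sketch below is illustrative only: `audited`, `delete_index`, and the in-memory `AUDIT_RECORDS` list are hypothetical names, and a real setup would ship records to a durable audit sink and call the actual management API where the placeholder comment sits.

```python
import functools
import json
import logging
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

audit_log = logging.getLogger("audit")
AUDIT_RECORDS: List[Dict[str, Any]] = []  # hypothetical stand-in for a durable audit sink


def audited(action: str) -> Callable:
    """Record who ran a management call, with what arguments, and whether it was a dry run."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, actor: str, dry_run: bool = True, **kwargs: Any) -> Any:
            record = {
                "at": datetime.now(timezone.utc).isoformat(),
                "actor": actor,  # required, so no call is anonymous
                "action": action,
                "args": [repr(a) for a in args],
                "kwargs": {k: repr(v) for k, v in kwargs.items()},
                "dry_run": dry_run,
            }
            AUDIT_RECORDS.append(record)
            audit_log.info(json.dumps(record))
            if dry_run:
                # There may be no /undo: describe the change unless explicitly told to act.
                return f"DRY RUN: would {action} with {args} {kwargs}"
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@audited("delete index")
def delete_index(name: str) -> str:
    # Hypothetical placeholder: a real script would call the cluster's management API here.
    return f"deleted {name}"
```

Making `actor` a required keyword and `dry_run=True` the default means that a script has to both identify itself and opt in explicitly before it can change anything.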
REST APIs are not the be-all and end-all of machine-to-machine communication. They are a great solution for flexible, lightweight synchronous communication, but they may fall short in certain use cases. Messaging technologies like Kafka, Pub/Sub, and RabbitMQ help absorb traffic spikes and handle back pressure. RPC-style protocols are used more often when high-volume, low-latency interactions are required. Beyond those patterns, technologies like GraphQL build on REST API concepts to provide an opinionated, type-safe query API.
In short, find the tool and technique that best solves your need. Many times it will end up being just a simple, familiar REST API.