How Do We Handle Scaling?
Objectives
- No data loss
Any spike in traffic beyond allocated capacity should, in the worst case, lead to delays and never to data loss.
- Isolation
Spikes in traffic should be isolated to the accounts or connected channels that cause them, leaving other channels to operate at normal speed.
- Data flow
  - Priority for real-time sync
  - Support for spikes in traffic
  - Fast upscale to avoid delays; slow downscale
Architecture
CrescoData’s platform runs exclusively on a serverless architecture and has been built from the ground up with this architecture in mind. The concept that best describes the overall design is nano-services, which means:
- each action has its own isolated container, including every API endpoint
- each of these containers can scale independently to accommodate the required traffic
- each datastore is an isolated NoSQL table
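The nano-service idea above can be sketched in a few lines. This is an illustrative Python sketch, not CrescoData's actual code: the action names, table names, and `make_handler` helper are all hypothetical, standing in for per-action Lambda-style handlers each bound to their own datastore.

```python
# Hypothetical sketch of the nano-service pattern: each action is its own
# isolated handler (one container per action), bound to its own NoSQL table.
# All names below are illustrative.

def make_handler(table_name, transform):
    """Build an isolated handler: one action, one table, no shared state."""
    def handler(event, context=None):
        record = transform(event)
        # In a real deployment this would be a write to the handler's own
        # NoSQL table; here we just return the write that would happen.
        return {"table": table_name, "item": record}
    return handler

# Two independent nano-services; each can scale on its own.
update_product = make_handler("products", lambda e: {"sku": e["sku"], "price": e["price"]})
update_order = make_handler("orders", lambda e: {"id": e["id"], "status": e["status"]})

print(update_product({"sku": "A1", "price": 10}))
# {'table': 'products', 'item': {'sku': 'A1', 'price': 10}}
```

Because every action is a separate handler with a separate datastore, a spike hitting `update_product` cannot slow down `update_order`: each scales, queues, and fails in isolation.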
Implementation and key concepts
- Error handling in architecture not in code
A key concept deployed throughout CrescoData’s platform is to handle errors (including retries due to traffic spikes) at the architecture level, utilising key components of the AWS offering rather than in-code logic. This:
- simplifies logic
- allows for scalability
- avoids data loss
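The pattern can be sketched as follows: the handler itself contains no retry logic at all and simply raises on failure, leaving redelivery to the surrounding infrastructure (for example, an SQS queue with a redrive policy and a dead-letter queue). The queue simulation below is purely illustrative, assumed for this sketch, and not an AWS API.

```python
# Architecture-level error handling: the handler has no try/except and no
# retry loop; failures propagate to the infrastructure, which redelivers the
# message or, after max_receives attempts, moves it to a dead-letter queue.

def handler(message, downstream):
    # No in-code retries: a failure simply raises.
    downstream(message)

def deliver_with_redrive(message, downstream, max_receives=3):
    """Simulate a queue redrive policy: retry up to max_receives, then DLQ."""
    dead_letter_queue = []
    for attempt in range(1, max_receives + 1):
        try:
            handler(message, downstream)
            return "delivered", attempt, dead_letter_queue
        except Exception:
            continue
    dead_letter_queue.append(message)  # no data loss: the message is retained
    return "dead-lettered", max_receives, dead_letter_queue

# A downstream that fails twice (e.g. a throttled API), then succeeds.
calls = {"n": 0}
def flaky(msg):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("throttled")

print(deliver_with_redrive({"sku": "A1"}, flaky))
# ('delivered', 3, [])
```

A traffic spike then shows up as a longer redelivery backlog (a delay), never as a dropped update, which is exactly the "delays, never data loss" objective.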
- Follow serverless implementation methodology
At its most basic, this means not using servers. More broadly, however, it also means avoiding single points of failure.
Translating this into the correct data flows means:
- avoiding state at any point during the sync flow, so that each update carries all the information required to be reprocessed; this also means that increasing traffic cannot introduce race conditions or errors caused by stale state
- translating bulk updates into individual record updates at the earliest possible stage, thereby avoiding single components with a large memory footprint and long processing times, and standardising the data flow into record updates
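Both points above can be illustrated with a small fan-out sketch. This is an assumption-laden example, not CrescoData's implementation: the field names (`account`, `channel`, `records`) are hypothetical, and the point is that each emitted record update is fully self-contained and can be reprocessed on its own.

```python
# Translate one bulk update into independent, self-contained record updates
# at the earliest possible stage. No state is shared across the flow: every
# record carries everything needed to (re)process it. Field names are
# illustrative.

def fan_out(bulk_update):
    """Split a bulk payload into independent, reprocessable record updates."""
    common = {k: v for k, v in bulk_update.items() if k != "records"}
    return [
        {**common, **record}  # each record is fully self-contained
        for record in bulk_update["records"]
    ]

bulk = {
    "account": "acct-1",
    "channel": "shopify",
    "records": [{"sku": "A1", "qty": 5}, {"sku": "B2", "qty": 0}],
}
for update in fan_out(bulk):
    print(update)
# {'account': 'acct-1', 'channel': 'shopify', 'sku': 'A1', 'qty': 5}
# {'account': 'acct-1', 'channel': 'shopify', 'sku': 'B2', 'qty': 0}
```

After the fan-out, every downstream component sees only small, uniform record updates, so no single component needs a large memory footprint regardless of how big the original bulk payload was.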
- Global platform auto-scaling
Auto-scaling is done platform-wide: metrics read in one area of the platform are used to scale others.
Using core AWS components throughout the platform means that each component can scale thereby providing a baseline of auto-scalability.
However, this is limited, as each component only has visibility over its own traffic and backlog. A good example is a datastore, which can only see the traffic currently reaching it, whereas the platform as a whole is aware of incoming traffic at a much earlier stage.
Auto-scaling is therefore often done in preparation for traffic, before it reaches the components that would otherwise be backlogged.
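A minimal sketch of this idea, under assumed names and thresholds (none of which are CrescoData's actual values): inbound traffic is measured at the ingress, where it is visible earliest, and downstream capacity is planned from that signal before the traffic arrives, upscaling fast and downscaling slowly.

```python
# Scale one component from signals observed in another: the ingress traffic
# rate (visible earliest) drives the capacity of downstream components before
# a backlog can form. Headroom and downscale factors are illustrative.

def plan_capacity(inbound_per_min, current_capacity,
                  headroom=1.5, downscale_step=0.9):
    """Fast upscale ahead of demand; slow, gradual downscale."""
    required = inbound_per_min * headroom
    if required > current_capacity:
        return int(required)  # scale up immediately, with headroom
    # Scale down gently, and never below what inbound traffic needs.
    return max(int(required), int(current_capacity * downscale_step))

print(plan_capacity(inbound_per_min=1000, current_capacity=800))   # 1500
print(plan_capacity(inbound_per_min=100, current_capacity=1500))   # 1350
```

The asymmetry matches the objectives above: a spike triggers an immediate jump to the required capacity plus headroom, while capacity decays in small steps afterwards so a quick second spike does not find the platform under-provisioned.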