Before reading this post, please first read: Architecting a data synchronisation process between SAP and multiple internal mobile applications.
With some of our existing offline-capable sales mobility systems, we’ve experienced issues when a large number of users synchronise data at once. Typically, all synchronisation activities would error and fail to complete, mainly due to hardware constraints.
These events have happened when:
- physical servers are unreachable for a period of time,
- a software update requires a sync to take place,
- or, potentially, a business requirement asks users to sync data.
You can address the issue in a number of ways:
- by scaling out hardware,
- by scaling up hardware,
- by re-architecting the solution,
- by implementing a control process,
- or by any combination of these.
In our most recent sales mobility projects we implemented a control process (a throttling mechanism) to combat the issue. The aim was to allow a certain mix of syncs (e.g. login, daily, hourly) to be in flight at any point in time while still being able to sync the whole fleet of users within an acceptable amount of time.
The table below shows the different variables that needed to be taken into account and how they affect each other. The main point to note is that increasing concurrency also increases the average sync time.
Total users: | 1000 | 1000 | 1000 | 1000 |
Concurrent users: | 20 | 50 | 100 | 200 |
Sync time (min): | 4 | 5.5 | 7.5 | 10 |
Elapsed time to complete a full sync for all users (hours): | 3.3 | 1.8 | 1.3 | 0.8 |
So, given a throttling level of 200 concurrent login syncs (our most expensive sync), we would be able to sync the fleet in under an hour (about 0.8 hours).
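The elapsed-time row appears to follow from the other three rows: divide the fleet into waves of concurrent users and multiply by the average sync time. The sketch below reproduces that arithmetic. It assumes syncs run in back-to-back full waves, which is a simplification, and the class and method names are made up for illustration.

```csharp
using System;

public static class FleetSyncEstimate
{
    // Hours needed to sync every user once, assuming users sync in
    // back-to-back waves of `concurrentUsers`, each wave taking `syncMinutes`.
    public static double ElapsedHours(int totalUsers, int concurrentUsers, double syncMinutes)
    {
        if (concurrentUsers <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(concurrentUsers));
        }

        double waves = (double)totalUsers / concurrentUsers;
        return waves * syncMinutes / 60.0;
    }
}
```

For example, `FleetSyncEstimate.ElapsedHours(1000, 200, 10)` gives 50 minutes expressed in hours, which the table rounds to 0.8.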
The implementation
The implementation of the throttling mechanism is extremely simple (I prefer simple). We could get away with simple because it is only being used by internal applications that we have control over.
The core of the mechanism is triggered via the login API, with some logic existing in the logout API. It heavily relies on the logging that occurs in the sync process.
During the login API call, made by the mobile app, a query is run against the database to determine the current ‘Syncs In Flight’ score. It does this by querying all the sync records from the last 10 minutes that still have a null end date and summing the ‘Sync Score’ value for each record’s application sync type.
The score is then compared to a global setting, ‘MaxSyncScore’ (located in the web application’s config file), to determine whether any more syncs are allowed. If more are allowed, the sync is logged to the database and a success response is returned to the client device. If no more syncs are allowed, a retry response is returned to the client that specifies how many minutes to wait before retrying.
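As a rough illustration of that calculation, here is an in-memory sketch rather than the real database query. `SyncLogEntry`, `SyncsInFlight.Score` and the score dictionary are hypothetical stand-ins, and the sketch assumes the 10-minute window applies to the sync's start date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row in the sync log.
public sealed record SyncLogEntry(string SyncTypeName, DateTime StartDate, DateTime? EndDate);

public static class SyncsInFlight
{
    // Sums the score of every sync that started inside the window and has
    // no end date yet. Throws KeyNotFoundException for an unknown sync type.
    public static int Score(
        IEnumerable<SyncLogEntry> log,
        IReadOnlyDictionary<string, int> syncScores,
        DateTime now,
        TimeSpan window)
    {
        DateTime cutoff = now - window;
        return log
            .Where(e => e.EndDate == null && e.StartDate >= cutoff)
            .Sum(e => syncScores[e.SyncTypeName]);
    }
}
```

The time window acts as a safety valve: a sync that crashed without ever recording an end date stops counting against the limit once it falls outside the window.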
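The allow-or-retry decision could be sketched as below. `SyncDecision`, `SyncThrottle.Decide` and the `retryAfterMinutes` parameter are assumptions made for illustration; the post does not show the actual response types or where the retry interval comes from.

```csharp
using System;

// Hypothetical response shape: either "go ahead" or "retry in N minutes".
public sealed record SyncDecision(bool Allowed, int RetryAfterMinutes);

public static class SyncThrottle
{
    public static SyncDecision Decide(int syncsInFlightScore, int maxSyncScore, int retryAfterMinutes)
    {
        // A new sync is allowed only while the in-flight score is strictly
        // below the configured maximum.
        return syncsInFlightScore < maxSyncScore
            ? new SyncDecision(true, 0)
            : new SyncDecision(false, retryAfterMinutes);
    }
}
```

Note that this check does not add the incoming sync's own score before comparing, so the in-flight total can overshoot the maximum by up to one sync's score.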
The login and logout process, with regard to the throttling mechanism, is shown in the activity diagram below.
This might be better explained by this snippet of code:
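    bool syncAllowed;

    // Serialise the check-and-log step so two logins cannot both pass the check at once.
    lock (syncLock)
    {
        int syncPointsInFlight = this.GetSyncsInFlightScore(settings);
        syncAllowed = syncPointsInFlight < SyncPointsInflightMax;

        if (syncAllowed)
        {
            // Log the new sync, recording the in-flight score at the time it started.
            int syncId = this.loggerService.LogSync(applicationName, syncTypeName, sessionId, threadId, serverName, syncPointsInFlight);
            this.SaveSyncIdToSession(syncId);

            // Drop the cached score so the next login sees the newly logged sync.
            this.RemoveSyncsInFlightScoreFromCache();
        }
    }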
As described above, the throttling mechanism leverages the existing infrastructure used when logging sync activities. The main change was extending the LogApplicationSyncTypes table with a ‘SyncScore’ field. This field assigns a weighting, or score, to a particular application’s sync type. The scores are relative to each other. For example, the login sync might be given a score of 4 and the hourly sync a score of 1, because load testing determined that the login sync is 4 times more expensive than an hourly sync.
When a sync is initiated, it is logged to the ‘LogSyncs’ table with a ‘Start Date’ and a null ‘End Date’. That way we can determine which syncs might still be in flight.
Looking at the table design below, you might also notice a field on the ‘LogSyncs’ table called ‘SyncPointsInFlight’. It is there to help us support the system: at any point in time we can understand what load the system was under, from a synchronisation perspective, for the apps that enlist in the process.
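The start and end bookkeeping can be illustrated with a small in-memory sketch. `LoggedSync` and `InMemorySyncLog` are hypothetical stand-ins for the ‘LogSyncs’ table, and the assumption that the logout call is what fills in the end date is inferred from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row in the 'LogSyncs' table.
public sealed class LoggedSync
{
    public int SyncId { get; init; }
    public DateTime StartDate { get; init; }
    public DateTime? EndDate { get; set; }
}

public sealed class InMemorySyncLog
{
    private readonly List<LoggedSync> rows = new();

    // Called at login: the row starts with a null end date.
    public int LogSyncStart(DateTime now)
    {
        var row = new LoggedSync { SyncId = rows.Count + 1, StartDate = now };
        rows.Add(row);
        return row.SyncId;
    }

    // Called at logout: filling in the end date takes the sync out of flight.
    public void LogSyncEnd(int syncId, DateTime now)
    {
        rows.Single(r => r.SyncId == syncId).EndDate = now;
    }

    // Rows with no end date are the ones that might still be running.
    public int CountPossiblyInFlight() => rows.Count(r => r.EndDate == null);
}
```

A sync that never calls the end step keeps its null end date indefinitely, which is why the in-flight query also limits itself to recent records.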
All in all the throttling mechanism has been a step forward in making our systems more reliable by protecting our infrastructure and I hope this helps some of you when faced with solving a similar problem.
This is part of a 3 part series on aspects of implementing a synchronisation process with throttling capabilities: