Over the last 12 months my team and I have been architecting the ability for internal mobile applications to synchronise data, largely stored in SAP-based systems, to local storage for offline use. The data synchronisation has mainly flowed from SAP ECC and SAP CRM to a sales mobility application and an equipment services mobile application.
The synchronisation process is composed of two main areas, each with its own nuances which I’ll delve into later. These two areas are:
- Master data synchronisation process that involves data being written from master systems to a staging environment.
- Client data synchronisation process that involves data being retrieved from a staging environment to the client consumer.
Here is a graphical representation of the architecture:
Before I go into each individual area it’s important to call out some implementation aspects of the architecture that required attention:
- Data change identification – being able to identify that a record has changed, and doing so consistently.
- Data change frequency – understanding whether data changes in real-time/hourly/daily/weekly/ad hoc, whether there are periods of increased frequency, etc.
- Data change volume – when data changes, how much of it changes?
- Total data volume – the total amount of data expected to be stored in the staging environment, per dataset, at any point in time.
- Data retention periods – will the interfaces used to stage the data also be responsible for deleting old data, or is that something the staging environment needs to handle?
Here is an obfuscated version of the data transfer analysis document used to capture the data:
Master data synchronisation process
The master data synchronisation process involved replicating master data from our ERP/CRM systems. The choice to stage the data was largely made for:
- data that changes relatively infrequently
- or data where there might be performance implications if the client connected directly to the source systems on demand, and where the data didn’t require any complex computation in the staging environment.
Here is an obfuscated version of the architecture:
Each dataset had an SAP-ABAP program (ABAP being the most common coding language for SAP development) written with a proxy into SAP-PI (an SAP middleware technology similar to Microsoft BizTalk) that mapped directly to a Microsoft SQL Server table or set of tables. These programs are scheduled into an SAP-CPS job chain (an SAP job scheduling tool similar to Windows Task Scheduler or SQL Server Agent).
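To make that mapping a little more concrete, here is a minimal sketch of the kind of SQL Server table a dataset might land in. The table and column names (dbo.Customer, ModifiedDate, etc.) are illustrative assumptions, not the actual schema:

```sql
-- Hypothetical SQL Server table that SAP-PI maps a dataset directly into.
-- Names, columns and types are illustrative only.
CREATE TABLE dbo.Customer
(
    CustomerId   INT           NOT NULL PRIMARY KEY,  -- key from the SAP source system
    CustomerName NVARCHAR(100) NOT NULL,
    SourceSystem CHAR(3)       NOT NULL,              -- e.g. ECC or CRM
    ModifiedDate DATETIME      NOT NULL
        CONSTRAINT DF_Customer_ModifiedDate DEFAULT (GETDATE())  -- stamps newly inserted rows; updates are covered further down
);
```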
We specifically chose not to have SAP-PI write to staging tables and then build other jobs to transform that data into core tables as is a common approach when interfacing data between systems. Some pros and cons of this decision include:
| Positives | Negatives |
| --- | --- |
| Simpler solution – we were using PI as middleware so it was decided to leverage it as much as possible | System performance – we had a few performance problems that required us to tweak one or more of these: |
| Provided greater agility both initially and when changes were required | We tried to limit the number of datasets being interfaced in real-time in an effort to minimise potential performance impacts. |
| Speed – because the solution had fewer moving parts it would complete in a shorter time than the alternate solution | |
| Leveraging existing system processes for data quality issues – by leveraging SAP-PI’s established processes we were able to fit into existing processes when there were data quality issues. | |
A gotcha we encountered was ensuring all the fields we cared about on the various datasets from SAP’s source systems were enlisted properly in SAP’s change tracking process. Bugs in this area were typically logged as a client-side issue, or against something further downstream than where the issue actually was, which at times required many hours to track down.
Before I move on to the client synchronisation process it’s best I call out its reliance on a ‘Modified Date’ field we have on all our master data tables, and how we ensured its consistency across datasets, particularly those from different systems.
To ensure a reliable ‘Modified Date’ was set on each imported record across datasets, we created update triggers on all master data tables in the staging environment, thereby creating a reliable reference point from which the client data synchronisation process could be architected.
NOTE: At the time of implementation SAP-PI’s MS SQL Server adapter was not able to embed a GETDATE() command in an insert or update statement, which would have been the most performant approach to the problem.
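As an illustration, here is a minimal sketch of one of those update triggers against the hypothetical dbo.Customer table sketched earlier; the trigger and column names are assumptions, not the production code:

```sql
-- Update trigger that stamps ModifiedDate whenever SAP-PI updates a row,
-- giving the client sync process a consistent change marker to query on.
CREATE TRIGGER dbo.trg_Customer_SetModifiedDate
ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE c
    SET    c.ModifiedDate = GETDATE()
    FROM   dbo.Customer AS c
    INNER JOIN inserted AS i
            ON i.CustomerId = c.CustomerId;   -- only touch the rows affected by the update
END;
```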
Client data synchronisation process
The client data synchronisation process is triggered differently from the master data process in that the client pulls the data from the staging environment, either via a particular interaction in the app or via a background scheduler.
There are a number of different types of syncs built into the apps:
- Login sync – the first sync that occurs when the app has no data, i.e. it happens just after a login.
- Daily sync – triggered by the device after an 8-hour period when it has connectivity and is active.
- Hourly sync – triggered by the device after a 2-hour period when it has connectivity and is active.
- Manual sync – triggered by the user from a ‘Sync’ button within the app.
Client synchronisations are controlled by the client, as you would imagine, and can include one or more of the synchronisation APIs available on the server. The APIs included in each sync type are dictated by the requirements of the application, the frequency of change of the data, etc.
Here is an obfuscated graphical representation of the setup that exists:
| Sync API Name | Login Sync | Daily Sync | Hourly Sync | Manual Sync |
| --- | --- | --- | --- | --- |
| Login | x | x | x | x |
| SyncCodesAndValues | x | x | x | |
| SyncProducts | x | x | x | |
| SyncCustomers | x | x | x | x |
| Logout | x | x | x | x |
Each synchronisation API has a common signature in both its request parameters and returned response.
The common fields in the request include:
- SyncId – used to instrument the sync process.
- SyncFromDate – the date the last sync occurred, i.e. the point in time from which changes should be returned.
- SyncToDate – the date at which the sync was initiated (the same date is sent into all APIs that form a sync type).
- PageSize – The total number of records to return in one response.
- PageNumber – used by the client in conjunction with the PageSize and the TotalRecords count from the response to pull down all the data necessary.
The common fields in the response include:
- TotalRecords – the total number of records available for the requested sync window, used by the client to work out how many pages to pull.
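To tie these fields together, here is a minimal sketch of the sort of query a sync API might run against the staging environment, using the hypothetical dbo.Customer table from earlier; parameter and column names are illustrative, not the actual implementation:

```sql
-- Hypothetical query behind a SyncCustomers-style API call.
-- @SyncFromDate/@SyncToDate bound the change window; @PageSize/@PageNumber drive paging.
DECLARE @SyncFromDate DATETIME = '2015-06-01T08:00:00',
        @SyncToDate   DATETIME = '2015-06-01T16:00:00',
        @PageSize     INT      = 500,
        @PageNumber   INT      = 1;

SELECT  c.CustomerId,
        c.CustomerName,
        c.ModifiedDate,
        TotalRecords = COUNT(*) OVER ()          -- lets the client work out how many pages remain
FROM    dbo.Customer AS c
WHERE   c.ModifiedDate >  @SyncFromDate          -- stamped by the insert default / update trigger
  AND   c.ModifiedDate <= @SyncToDate
ORDER BY c.CustomerId
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;                  -- requires SQL Server 2012 or later
```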
That covers the foundation of the client sync process, but here are a few other points worth mentioning:
- The SyncFromDate and SyncToDate are used to query the ‘Modified Date’ field on tables in the staging environment, as discussed earlier.
- The client device keeps track of the last time it completed a synchronisation per API. That way, if an API is called in multiple sync types, it only gets data from that API from the last time it completed a sync.
- When the client initiates a sync it subtracts 5 minutes from the last sync DateTime it has stored against an API. This creates some overlap between syncs to provide fault tolerance against timing difference issues that might occur in the system.
- On all but the Full sync the client stores the API response messages and only processes them at the end of the last successful API call.
- If an error occurs during a sync the client device will abandon the current sync and re-initiate it from the start.
Importance of the login API in the sync process
The login API plays an important role in the sync process, not only because it controls the authentication and authorisation elements of the process but also because it:
- Enlists the sync in a throttling process which I explain in more detail in this post:
- Tracks the version of the app being used which I explain in more detail in this post:
- Starts the logging of the user’s sync operation to the database.
Tables used to store client synchronisation operations
There are a series of tables used to log the synchronisation process. These include:
- LogApplications – stores the names of the apps that will call the sync APIs.
- LogSyncTypes – stores the names of the different syncs that exist.
- LogApplicationSyncTypes – describes the types of syncs an application is configured for. This table becomes more important in the throttling process.
- LogSyncs – stores one record for each sync that gets initiated by the login API.
- LogSyncDetails – stores a record for each sync API call, including the API’s URI and part of the actual request message.
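As a rough illustration of how these tables hang together, here is a minimal sketch of the schema; the column names, data types and relationships are assumptions for illustration rather than the production definitions:

```sql
-- Hypothetical definitions of the sync logging tables; the real columns will differ.
CREATE TABLE dbo.LogApplications
(
    LogApplicationId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ApplicationName  NVARCHAR(100)     NOT NULL          -- e.g. the sales or equipment services app
);

CREATE TABLE dbo.LogSyncTypes
(
    LogSyncTypeId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SyncTypeName  NVARCHAR(50)      NOT NULL             -- Login, Daily, Hourly, Manual
);

-- Which sync types each application is configured for (also used by the throttling process).
CREATE TABLE dbo.LogApplicationSyncTypes
(
    LogApplicationId INT NOT NULL REFERENCES dbo.LogApplications (LogApplicationId),
    LogSyncTypeId    INT NOT NULL REFERENCES dbo.LogSyncTypes (LogSyncTypeId),
    PRIMARY KEY (LogApplicationId, LogSyncTypeId)
);

-- One record per sync initiated by the login API.
CREATE TABLE dbo.LogSyncs
(
    LogSyncId        INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LogApplicationId INT               NOT NULL REFERENCES dbo.LogApplications (LogApplicationId),
    LogSyncTypeId    INT               NOT NULL REFERENCES dbo.LogSyncTypes (LogSyncTypeId),
    StartedDate      DATETIME          NOT NULL DEFAULT (GETDATE())
);

-- One record per sync API call within a sync.
CREATE TABLE dbo.LogSyncDetails
(
    LogSyncDetailId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LogSyncId       INT               NOT NULL REFERENCES dbo.LogSyncs (LogSyncId),
    ApiUri          NVARCHAR(400)     NOT NULL,
    RequestSnippet  NVARCHAR(MAX)     NULL                -- part of the actual request message
);
```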
This is part of a 3-part series on aspects of implementing a synchronisation process with throttling capabilities: