How to implement two way database synchronization with php?

TheJokerAeZ · December 13, 2015, 11:07am

I am building a website which needs be able to use CMS online as offline. So I would like to implement two-way database synchronization. How will I be possible to implement it in simple way and what will I need for it?

I only need to save data in online and offline database, which will be with mysqli? I would like to do this with PHP.

the Admin must be able to save data offline, which I think about local database. But when he is has connection with network, de data must be synchronizate with the online database server. But when user hasn’t a connection than he needs to send data to local database. Because he can’t connect online database without network connection.

droopsnoot · December 14, 2015, 10:15am

Isn’t there a database replication feature with MySQL that could handle keeping the two databases synchronised? I can’t get my head around exactly where your PHP code will be running - that is, will it always run on a local PHP server, but connect either to an online database or, if that fails, connect to the local version, or will the users try the online site and then the local one (which it might be possible to automate, I don’t know)?

Dormilich · December 14, 2015, 12:22pm

something like master/slave?

see also https://dev.mysql.com/doc/refman/5.7/en/replication.html

droopsnoot · December 14, 2015, 12:49pm

Kind of, but I was thinking that data updated onto the slave while the master is unavailable would find its way back to the master, so it doesn’t really matter which was which, once the comms catches up all the databases would be the same. I’ve only read it briefly, but it seems that doesn’t work in that way. The only experience I have (and I wasn’t directly involved in that) was with MS SQL Server replication, which did seem to work that way, with multiple subscribers that would replicate back to the publisher as required. I may have misunderstood it, to be fair.

TheJokerAeZ · December 14, 2015, 1:07pm

Master/slave is not possible for this, because i shall give you some example:

LOCAL TABLE: REMOTE TABLE:
----Projects---- ----Projects----
id | id |
title | title |

if there is 1 user working offline and 1 user is working online. The user that is working offline has synced yesterday the remote table. The online user that has added a new project has id:4
but offline user has not record with id: 4 and he tries to add a new project. The adding will be easy for syncing because you can get last record of remote table and append it after the last record. But edit shall bring problems. if online user has deleted a record which exits at local db.

I would like to use LOCALHOST db for local and a server for online.

droopsnoot · December 14, 2015, 1:24pm

Although you’re probably correct that master/slave won’t do for your scenario, for any database replication scenario you’re going to have to design the database to deal with these things, nothing is going to handle assigning IDs like that unless it can know what IDs both servers can assign, and it cannot know that when the other server is offline. So you must ensure that stuff used to identify records must be unique, perhaps by adding in a site code to differentiate between the two different records with id=4 in your example.

TheRedDevil · December 14, 2015, 3:31pm

How complicated this is, depends on the amount of tables and how complicated your database model is.

In short, you need to save the current state of the last update for each table on the “offline” database. Basically, you need to know which was the last ID for each table online and offline when you last synced.

Based on that information, you can then decide how to merge any new data on the next merge. Though this is the complicated process, since it is possible you need to move some of the offline records to new IDs, so you can insert those from the online database.

To make the matter worse, if the plan is to allow editing of records as well, you also need to double check if the record has been edited by someone else as well, and then ask the user which of the changes should be kept by showing them side by side.

Gandalf · December 14, 2015, 3:37pm

Perhaps maintaining a “journal” of changes (adds, mods and deletes) made to each database would help in synchronising. That way you can just read the journal from each server, rather than trying to check all the records on both databases to see what, if anything, has been changed.

droopsnoot · December 14, 2015, 7:07pm

Yes, that’s a common way, “log shipping” is a term I believe, you simply log everything that happens on one database, send it to the other one, and apply the same changes. That relies on never missing a transaction, though, so some mechanism for checking that, and what to do if you detect it, is required.

felgall · December 14, 2015, 8:06pm

It is more complicated than that. If you insist on using ids then you need to know all the other tables that use it as a foreign key so that you can ensure that all of those get changed as well when you have to update the value in one database because it is already used for different data in the other.

You make the synchronisation a whole lot simpler if you use real fields for the keys instead of creating ids.

TheRedDevil · December 14, 2015, 11:21pm

Yes, this is correct. The entire database model has to be synced/updated accordingly to the database model. That is why you need to store the changes made, and act on them. That include all of the tables both for possible changes to inserted records as well as edited/updated records.

If a record has to be moved using foreign keys will actually make this easier if the “On Update” setting is set to CASCADE. However this does not help in regards for checking for updates done to the related records, which has to be done anyway.

That depends, if you have a lot of relationships and/or tables, it will not make it any easier if you have to do this for each table. You also need to make certain you have a collision protection, making certain there will never be created identical keys. While GUID seems a obvious solution here, it has horrible performance impacts on your database compared to INT primary key.

What it does help for is that you do not need to change the ID (primary key) on records when syncing. Though if InnoDB is used together with Foreign Keys, all that need to be updated is the main record.

felgall · December 15, 2015, 12:59am

You do if the same key has been used for completely different data in the two databases.

At least if you use real field values as the keys then chances are if the same value has been added separately to both databases that the actual data will be at least intended to be the same. Which brings up a further complication - what happens if an attempt has been made to add the same data to both databases but either the keys don’t match because a meaningless int value is being used or the data is not quite the same for both - how do you detect that both are intended to be the same entry and what do you do when the data between them is slightly different?

TheRedDevil · December 15, 2015, 2:42pm

I think we have a misunderstanding, when I said ID, it is meant as a “identification value”. If we are talking about a user table, this could be the email (assuming it should be unique), on a many to one relationship, it would be a int.

I never said that using a known unique value as a primary key (assuming it is a value that will work well in a btree) is something that should not be done in this case.

The larger the database system, the more difficult it will be to not also rely on int keys. Even on a simple database model with only 50-100 tables, it becomes cumbersome to try to avoid it.

The process in the end, is not that very different if you use a char key or int key. The content need to be reviewed and then synced according to the business rules. If a record is similar on both systems, it still need to be reviewed and if the content is not identical, a manual merge has to be approved.

For speeding up a review process on similar records, a hash can be added as well, which contain all the columns in a set direction, case etc. That way to know if it can be merged without issues, all that is required is to check the hashes.

felgall · December 15, 2015, 5:42pm

In most cases an int only has value as a key if it uniquely identifies the rest of the data. With two databases to synchronise that only works if you can guarantee reserving that value in both databases at the start.

If you grab 4 on one database and I grab 4 on the other then that value is useless as how does anything know whether 4 on the one database means the same or something completely different on the other.

TheRedDevil · December 16, 2015, 12:33am

That is why it has to be synchronized as I mentioned in the first post, in which process these records would be rearranged as required by the business rules. This could be as easy as those that come first to the online db, keep the position and the new ones are moved, or even by sorting this by created date etc.

In the end, if we were to create a char/varchar key for all tables, we would run into exactly the same issue. There is cases where a relationships would have no natural key, in these cases you are stuck with GUID (horrible btree performance) or a homegrown algorithm for making it (could be everything from collision issues to btree issues), in this case you resort to a surrogate key anyway just making it a varchar/char instead of int.

No matter how we look at this, a sync and journal system has to be created. This system has to follow the business rules, and handle all the odd cases that could happen like the ones we have discussed. Which type is used (surrogate/natural) does not really matter, both has their pros and cons which has to be handled and addressed.

system · March 16, 2016, 7:36am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.