# SSSync
SSSync, a Simple and Stupid Synchronizer for data with multi-valued attributes.

## What is SSSync?
Simple and Stupid Synchronizer performs one-way synchronisation of any data that follows the "key => attributes => values set" data model. Each synchronisation task can have multiple sources and one destination. The multi-valued data model of SSSync is inspired by the one used for LDAP entries. Any stricter model can be used with SSSync, notably SQL result sets and, more generally, anything that looks like a data table.
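
To make the "key => attributes => values set" model concrete, here is a minimal Java sketch assuming sorted maps and sets. The `MvEntry` and `MvDataset` names are hypothetical and are not SSSync's actual classes. A single-valued SQL row fits the model as the degenerate case with one value per column.

```java
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical types illustrating "key => attributes => values set";
// they are not SSSync's actual classes.
final class MvEntry {
    // Attribute name => set of values, like a multi-valued LDAP attribute.
    // Sorted containers keep attribute and value order deterministic.
    final SortedMap<String, SortedSet<String>> attrs = new TreeMap<>();

    // Adds one value; a duplicate value is ignored because values form a set.
    MvEntry add(String attr, String value) {
        attrs.computeIfAbsent(attr, k -> new TreeSet<>()).add(value);
        return this;
    }
}

final class MvDataset {
    // Key (e.g. an LDAP uid or a SQL primary key) => entry, sorted by key.
    final SortedMap<String, MvEntry> entries = new TreeMap<>();

    // Returns the entry for a key, creating an empty one if needed.
    MvEntry entry(String key) {
        return entries.computeIfAbsent(key, k -> new MvEntry());
    }

    // A single-valued SQL row: each column becomes an attribute with one value.
    void addRow(String key, String[] columns, String[] values) {
        MvEntry e = entry(key);
        for (int i = 0; i < columns.length; i++) {
            e.add(columns[i], values[i]);
        }
    }
}
```

Because both levels are sorted, iterating `entries` yields keys in ascending order, which is the property a key-sorted comparison relies on.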

### Functional facts
 - Meant to be started periodically (e.g. nightly)
 - No daemon, no persistent state, no internal data change tracking
 - Dry-run mode and safeguards (max exec time, max insert/update/delete operations)
 - Minimal data interpretation and mapping (SQL already has a marvellous `AS` keyword)
 - Validates data against LDAP schemas and logs problems in human-readable form
 - Never rewrites data that is already up to date (master/slave friendly)
 - SysAdmin-friendly (crontab-aware, well-defined exit codes, log verbosity)
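
The dry-run mode and operation limits can be pictured as a gate applied to the counts of planned operations before anything is written. This is a hypothetical sketch: the `Safeguards` class and its limits are illustrative, not SSSync's real configuration keys, and the max exec time check is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical safeguard gate; the limits are illustrative and are not
// SSSync's real configuration keys. The max exec time check is omitted.
final class Safeguards {
    private final int maxInserts;
    private final int maxUpdates;
    private final int maxDeletes;

    Safeguards(int maxInserts, int maxUpdates, int maxDeletes) {
        this.maxInserts = maxInserts;
        this.maxUpdates = maxUpdates;
        this.maxDeletes = maxDeletes;
    }

    // Lists every exceeded limit; an empty list means the plan is acceptable.
    List<String> violations(int inserts, int updates, int deletes) {
        List<String> problems = new ArrayList<>();
        if (inserts > maxInserts) problems.add("inserts " + inserts + " > " + maxInserts);
        if (updates > maxUpdates) problems.add("updates " + updates + " > " + maxUpdates);
        if (deletes > maxDeletes) problems.add("deletes " + deletes + " > " + maxDeletes);
        return problems;
    }

    // Decides what happens to a computed plan: abort, report only, or write.
    String decide(int inserts, int updates, int deletes, boolean dryRun) {
        List<String> problems = violations(inserts, updates, deletes);
        if (!problems.isEmpty()) {
            return "ABORT: " + String.join(", ", problems);
        }
        return dryRun ? "DRY-RUN: nothing written" : "APPLY";
    }
}
```

With limits of 500 inserts, 200 updates and 50 deletes, a plan containing 51 deletes aborts whether or not dry-run mode is on, which is the kind of situation such a limit guards against (e.g. a source that unexpectedly returns almost nothing).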

### Technical facts
 - Structured configuration files in YAML: simple and stupid, like the rest
 - No embedded script language, no XML, no ORM mappings
 - Basically performs one full read of all sources at each run
 - Small code base (2k SLOC), low algorithmic complexity, key-sorted reads and comparisons
 - Memory footprint stays low (< 64 MiB) regardless of the number of input data elements
 - Performs within seconds, throughput typically limited by destination write rate
 - Written in Java, uses great libraries like [Unbound ID's LDAP SDK for Java](https://www.unboundid.com/products/ldapsdk)
 - Can be self-contained in a single directory, like portable apps
 - Packages for Linux (.deb, .rpm) and Windows (.msi) are planned
 - Will probably never take up more than 20 MiB of disk space
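
Key-sorted reads turn the comparison into a single merge pass: both sides advance in key order, and only one record per side needs to be held at a time. Below is a minimal sketch of that technique, assuming both the source and the destination can be read sorted by key with the same ordering (`String.compareTo` here). `SortedDiff` and `Rec` are hypothetical names, not SSSync's code. In this sketch, equal entries produce no operation, so up-to-date data is never rewritten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a key-sorted merge comparison; not SSSync's code.
final class SortedDiff {
    // One record: a key plus its attribute => values map.
    static final class Rec {
        final String key;
        final Map<String, Set<String>> attrs;
        Rec(String key, Map<String, Set<String>> attrs) { this.key = key; this.attrs = attrs; }
    }

    // Convenience factory for a single-attribute record.
    static Rec rec(String key, String attr, String... values) {
        Map<String, Set<String>> attrs = new TreeMap<>();
        attrs.put(attr, new TreeSet<>(Arrays.asList(values)));
        return new Rec(key, attrs);
    }

    // Both iterators MUST yield records in ascending key order (String.compareTo).
    // Operations are collected in a list for brevity only.
    static List<String> diff(Iterator<Rec> source, Iterator<Rec> dest) {
        List<String> ops = new ArrayList<>();
        Rec s = next(source);
        Rec d = next(dest);
        while (s != null || d != null) {
            int cmp = (s == null) ? 1 : (d == null) ? -1 : s.key.compareTo(d.key);
            if (cmp < 0) {          // key only in source: create it
                ops.add("INSERT " + s.key);
                s = next(source);
            } else if (cmp > 0) {   // key only in destination: remove it
                ops.add("DELETE " + d.key);
                d = next(dest);
            } else {                // same key: write only if attributes differ
                if (!s.attrs.equals(d.attrs)) ops.add("UPDATE " + s.key);
                s = next(source);
                d = next(dest);
            }
        }
        return ops;
    }

    private static Rec next(Iterator<Rec> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

A streaming variant would hand each operation to the destination writer as soon as it is found instead of collecting a list, so memory stays independent of the input size.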

## Connectors
### Already shipped
 - OpenLDAP (source/dest)
 - MySQL (source)
 - Oracle (source)
 - Fixed format CSV (source)
### Connectors not yet implemented
 - JDBC writer
 - LDIF reader (painful, because LDIF files can mix data and instructions)
 - Active Directory (because of the lack of a real test environment)
 - Arbitrary CSV format (lack of formalism implies huge dev/test effort)

## Limitations (of current release)
 - Values are represented and compared as Java Strings (so, UTF-16)
 - Don't expect much from binary blobs or non-printable content
 - No password hashing utilities (but nobody still stores cleartext passwords, right?)
 - No data manipulation and transformation in SSSync (maybe this is a feature)
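
Since values are Java Strings, `String.compareTo` orders them by UTF-16 code units, which can disagree with Unicode code point order for characters outside the Basic Multilingual Plane (encoded as surrogate pairs). The illustrative helper below (hypothetical, not part of SSSync) compares by code point instead; whether the difference matters depends on how each source or destination orders its keys.

```java
// Hypothetical helper, not part of SSSync: String.compareTo orders by UTF-16
// code units, while this method orders by Unicode code points.
final class Utf16Order {
    // Compares two strings code point by code point.
    static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Skip one or two chars depending on whether it was a surrogate pair.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // All compared code points are equal: the shorter string sorts first.
        boolean aDone = i >= a.length();
        boolean bDone = j >= b.length();
        if (aDone && bDone) {
            return 0;
        }
        return aDone ? -1 : 1;
    }
}
```

For example, U+FF61 (inside the BMP) sorts after U+1F600 with `compareTo`, because the leading surrogate 0xD83D is smaller than 0xFF61, but it sorts before U+1F600 by code point.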

## SSSync through examples
### Medium-sized university LDAP directory
 - Context: 10,000 people, 1,000 groups and structures, 4 different sync tasks
 - Main data source: human resources system stored in an Oracle database
 - Additional sources: 2 CSV files (technical accounts, manual corrections)
 - Destination: OpenLDAP (master node, with many replicas via syncrepl)
 - Full run time, including dry-run passes: 20 seconds (even in September, when there are 400 new students and 50 employee updates to sync per night)
<p align="center">
 <img src="http://www.pouzenc.fr/projects/sssync/SSSync_Doc/diagrams/example1_flow.svg"
      alt="SSSync process with 3 combined sources : Oracle + 2 CSV, one destination : slapd"
      width="75%"/>
</p>
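
This example combines three sources into one destination. One way to combine several key-sorted sources while keeping memory low is a k-way merge over a priority queue, holding only the current record of each source. The sketch below is hypothetical: `MultiSourceMerge` is not SSSync's code, and uniting the values of records that share a key is an assumption made for illustration. SSSync's actual combination rules (for instance, whether a manual-corrections source overrides HR data) are not described in this README.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: k-way merge of key-sorted sources that unites the value
// sets of records sharing a key. Not SSSync's actual combination logic.
final class MultiSourceMerge {
    static final class Rec {
        final String key;
        final Map<String, Set<String>> attrs;
        Rec(String key, Map<String, Set<String>> attrs) { this.key = key; this.attrs = attrs; }
    }

    // Convenience factory for a single-attribute record.
    static Rec rec(String key, String attr, String... values) {
        Map<String, Set<String>> attrs = new TreeMap<>();
        attrs.put(attr, new TreeSet<>(Arrays.asList(values)));
        return new Rec(key, attrs);
    }

    // Heap node: the current record of one source plus that source's iterator.
    private static final class Head {
        final Rec rec;
        final Iterator<Rec> rest;
        Head(Rec rec, Iterator<Rec> rest) { this.rec = rec; this.rest = rest; }
    }

    // Each source must be sorted by key; the output is sorted, one record per key.
    static List<Rec> merge(List<Iterator<Rec>> sources) {
        PriorityQueue<Head> heap =
            new PriorityQueue<Head>((a, b) -> a.rec.key.compareTo(b.rec.key));
        for (Iterator<Rec> it : sources) {
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<Rec> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            String key = heap.peek().rec.key;
            Map<String, Set<String>> combined = new TreeMap<>();
            // Pop every head holding this key, unite its values, refill from its source.
            while (!heap.isEmpty() && heap.peek().rec.key.equals(key)) {
                Head h = heap.poll();
                for (Map.Entry<String, Set<String>> e : h.rec.attrs.entrySet()) {
                    combined.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
                }
                if (h.rest.hasNext()) heap.add(new Head(h.rest.next(), h.rest));
            }
            out.add(new Rec(key, combined));
        }
        return out;
    }
}
```

The heap never holds more than one record per source, so memory depends on the number of sources rather than on the number of entries (the output list here is only for brevity).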

### Give me more examples
Please contact me if you have more examples to add here.