ScSLINT

ScSLINT is a scalable interlinking framework designed for time and memory efficiency. It can be used to detect all co-referent resources between two given repositories, Rsource and Rtarget. ScSLINT is developed as a generalization of SLINT+. It is written in C++ and currently runs on Windows and Linux.

Summary

The architecture of ScSLINT

  • Step 1. The property alignment generator creates the property alignments between Rsource and Rtarget.
  • Step 2. The similarity function generator assigns similarity measures to each property alignment.
  • Step 3. The candidate generator detects the candidates of potentially co-referent instances.
  • Step 4. The configuration creator generates a default configuration. A configuration describes the similarity functions, the similarity aggregator, and the co-reference filter.
  • Step 5. The similarity aggregator executes the similarity functions and computes the final matching score.
  • Step 6. The co-reference filter applies the declared constraints (e.g., thresholding and stable matching) to the matching scores and produces the final result.
  • Remark: ScSLINT can run fully automatically and also supports intervention (e.g., from the user) at each step. Users can plug a new mechanism (e.g., a similarity function or a similarity aggregator) into any step.
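The flow of Steps 3, 5, and 6 can be illustrated with a small Python sketch. This is not ScSLINT's C++ implementation or its API: the function names, the token-based candidate generation, the Jaccard similarity, the averaging aggregator, and the greedy one-to-one selection (standing in for stable matching) are all assumptions chosen for illustration. The property alignment and similarity functions of Steps 1, 2, and 4 are written by hand in a hypothetical `config` dictionary.

```python
from collections import defaultdict

def generate_candidates(source, target):
    """Step 3 (sketch): pair instances that share at least one token."""
    index = defaultdict(set)
    for tid, props in target.items():
        for value in props.values():
            for token in value.lower().split():
                index[token].add(tid)
    pairs = set()
    for sid, props in source.items():
        for value in props.values():
            for token in value.lower().split():
                for tid in index.get(token, ()):
                    pairs.add((sid, tid))
    return pairs

def jaccard(a, b):
    """Token-set Jaccard similarity (an illustrative similarity function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def aggregate(source, target, pairs, config):
    """Step 5 (sketch): run every similarity function and average the scores."""
    scores = {}
    for sid, tid in pairs:
        values = [fn(source[sid].get(sp, ""), target[tid].get(tp, ""))
                  for sp, tp, fn in config["functions"]]
        scores[(sid, tid)] = sum(values) / len(values)
    return scores

def co_reference_filter(scores, threshold):
    """Step 6 (sketch): threshold, then keep the best one-to-one pairs greedily."""
    kept = sorted(((s, p) for p, s in scores.items() if s >= threshold),
                  reverse=True)
    used_source, used_target, links = set(), set(), []
    for score, (sid, tid) in kept:
        if sid not in used_source and tid not in used_target:
            used_source.add(sid)
            used_target.add(tid)
            links.append((sid, tid, score))
    return links

# Toy repositories (hypothetical data).
source = {"s1": {"name": "Tokyo Japan"}, "s2": {"name": "Paris France"}}
target = {"t1": {"label": "Tokyo"}, "t2": {"label": "Paris France"},
          "t3": {"label": "Kyoto Japan"}}

# Hand-written stand-in for the output of Steps 1, 2, and 4.
config = {"functions": [("name", "label", jaccard)], "threshold": 0.4}

candidates = generate_candidates(source, target)
scores = aggregate(source, target, candidates, config)
result = co_reference_filter(scores, config["threshold"])
print(result)  # [('s2', 't2', 1.0), ('s1', 't1', 0.5)]
```

Because each step only consumes the previous step's output, any one of them can be swapped for a different mechanism without touching the others, which mirrors the intervention point made in the remark above.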


Systems and Algorithms that use ScSLINT as the base framework

  • cLearn: Heuristic-based configuration learning algorithm. cLearn finds the optimal configuration using labeled pairs of instances.
  • cLink: Supervised instance matching system.
  • ScLink: Scalable supervised instance matching system.

Benchmark

The following table shows the results of ScSLINT on the OAEI 2012 dataset, using the default configuration generated by ScSLINT. Two complex similarity functions are used for strings: Levenshtein and TF-IDF cosine. The running times of Steps 1, 3, 5, and 6 were measured on an Intel Core i7-4770 CPU with 8 GB RAM.
Dataset    | Size (x10^9) | Candidates (x10^6) | Similarity functions | Step 1 | Step 3 | Step 5 | Step 6
nyt.loc.gn | 32.69        | 32.2               | 12                   | 37s    | 7s     | 70s    | 3s
nyt.loc.db | 16.06        | 38.2               | 25                   | 43s    | 8s     | 268s   | 6s
nyt.org.db | 25.47        | 61.7               | 17                   | 46s    | 11s    | 404s   | 6s
nyt.peo.db | 41.66        | 46.9               | 22                   | 46s    | 11s    | 251s   | 6s
nyt.loc.fr | 154.97       | 222.7              | 23                   | 14s    | 111s   | 641s   | 29s
nyt.org.fr | 245.7        | 357.4              | 16                   | 14s    | 268s   | 1023s  | 46s
nyt.peo.fr | 401.89       | 620.1              | 18                   | 15s    | 507s   | 1578s  | 78s
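For readers unfamiliar with the two string measures used in the benchmark, the following Python sketch shows one common formulation of each. It is not taken from ScSLINT's source: the normalization of Levenshtein distance to a similarity in [0, 1], the whitespace tokenization, and the smoothed IDF weighting are assumptions, and ScSLINT may define these details differently.

```python
import math
from collections import Counter

def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def levenshtein_similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors with a smoothed IDF (one common variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(token for tokens in tokenized for token in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical label values, used only to exercise the functions.
docs = ["New York Times", "The New York Times", "Library of Congress"]
vectors = tfidf_vectors(docs)

print(levenshtein("kitten", "sitting"))  # 3
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # True
```

Levenshtein works at the character level and suits short values with spelling variation, while TF-IDF cosine works at the token level and down-weights tokens that occur in many values. Both are costlier than exact matching, which is consistent with Step 5 dominating the running times in the table.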

Download

Contact

Khai Nguyen:
Ryutaro Ichise:


Reference

[1] Khai Nguyen, Ryutaro Ichise, Bac Le. Interlinking Linked Data Sources Using a Domain-Independent System. In Proceedings of the 2nd Joint International Semantic Technology Conference. LNCS, vol. 7774, pp. 113-128. Springer (2013)
[2] Khai Nguyen, Ryutaro Ichise, Bac Le. SLINT: A Schema-Independent Linked Data Interlinking System. In Proceedings of the 7th International Workshop on Ontology Matching, CEUR-WS.org, vol. 946. (2012)
[3] Khai Nguyen, Ryutaro Ichise. ScSLINT: Time and Memory Efficient Interlinking Framework for Linked Data. In Proceedings of the 14th International Semantic Web Conference Posters and Demonstrations Track, CEUR-WS.org, vol. 1486. (2015)
[4] Khai Nguyen, Ryutaro Ichise. A Heuristic Approach for Configuration Learning of Supervised Instance Matching. In Proceedings of the 14th International Semantic Web Conference Posters and Demonstrations Track. (2015)
[5] Khai Nguyen, Ryutaro Ichise. Heuristic-based Configuration Learning for Linked Data Instance Matching. In Proceedings of the 5th Joint International Semantic Technology Conference. LNCS, vol. 9544, pp. 56-72. Springer (2015)
[6] Khai Nguyen, Ryutaro Ichise. Linked Data Entity Resolution System Enhanced by Configuration Learning Algorithm. IEICE Transactions on Information and Systems, Vol.E99-D, No.6, pp. 1521-1530. (2016)
[7] Khai Nguyen, Ryutaro Ichise. ScLink: supervised instance matching system for heterogeneous repositories. Journal of Intelligent Information Systems, DOI: 10.1007/s10844-016-0426-3. Springer. (2016)