<h2>We build <i>property graphs</i> in 3 steps</h2>
42
+
<h2>We build <i>property graphs</i> in 3 steps...</h2>
43
43
<div style="margin-top: 2%;"></div>
44
+
<h3>Extract, Transform, Load (ETL): Common Formats</h3>
44
45
<div>
45
-
1) Build the core of a graph by combining two or more structured datasets into a common schema or, in larger domains, a full-blown ontology. We Extract, Transform, Load (ETL) multiple large and small datasets from different sources and in different formats into a common property graph schema using tools like Python, Spark, Databricks or Snowflake. How much ETL is required varies by industry, from relatively little in cybersecurity applications to a significant amount in business graph applications like KYC / AML / financial compliance. A well-defined graph model makes it easy to access, query, analyze and model your business domain in a graph database such as Neo4j, TigerGraph, ArangoDB or Neptune.
46
+
<b>1)</b> Build the core of a graph by combining two or more structured datasets into a common schema or, in larger domains, a full-blown ontology. We Extract, Transform, Load (ETL) multiple large and small datasets from different sources and in different formats into a common property graph schema using tools like Python, Spark, Databricks or Snowflake. How much ETL is required varies by industry, from relatively little in cybersecurity applications to a significant amount in business graph applications like KYC / AML / financial compliance. A well-defined graph model makes it easy to access, query, analyze and model your business domain in a graph database such as Neo4j, TigerGraph, ArangoDB or Neptune.
46
47
</div>
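<div>
As a rough illustration of the transform step described above, the sketch below maps two differently shaped sources onto one node schema. The source names, field names, and the <code>Company</code> label are hypothetical, chosen only for this example; a real pipeline would typically run in Spark or a similar engine rather than plain Python.
</div>

```python
# Two hypothetical source datasets that describe companies with different field names.
crm_rows = [{"customer_id": 1, "full_name": " Acme Corp ", "country": "us"}]
registry_rows = [{"reg_no": "X-9", "company": "ACME Corporation", "jurisdiction": "US"}]


def company_node(source, key, name, country):
    """Map one source record onto a common (assumed) Company node schema."""
    return {
        "id": f"{source}:{key}",  # namespaced so ids from different sources cannot collide
        "label": "Company",
        "properties": {
            "name": name.strip(),
            "country": country.upper(),
            "source": source,
        },
    }


def transform(crm_rows, registry_rows):
    """Convert both sources into one list of nodes that share a single schema."""
    nodes = [
        company_node("crm", r["customer_id"], r["full_name"], r["country"])
        for r in crm_rows
    ]
    nodes += [
        company_node("registry", r["reg_no"], r["company"], r["jurisdiction"])
        for r in registry_rows
    ]
    return nodes


nodes = transform(crm_rows, registry_rows)
for node in nodes:
    print(node)
```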
47
48
<div style="margin-top: 2%;"></div>
48
49
<div>
@@ -60,12 +61,13 @@ background: home/bg.png
60
61
<img style="width: 70%;" alt="Transformed, Cleaned Data in Silver Tables" src="assets/slides/Entity-Resolution-Phase-1-Silver-ETL.png" />
61
62
</center>
62
63
<div style="margin-top: 2%;"></div>
64
+
<h3>Entity Resolution (ER): Deduplication</h3>
63
65
<div>
64
-
<b>2)</b> Extract a graph from text using Natural Language Processing (NLP) via a chain of operations: NER → IE → EL. Named entity recognition (NER) identifies entities corresponding to nodes. Information Extraction (IE) creates relationships [edges] between entities. Entity linking (EL) links nodes and edges extracted from text documents into the core graph established via ETL.
66
+
Entity Resolution (ER) is the process of merging duplicate nodes and splitting nodes that were mistakenly merged. Edges can be merged or split in the same way.
65
67
</div>
66
68
<div style="margin-top: 2%;"></div>
67
69
<div>
68
-
Initially a process of exploratory data analysis (EDA) reveals patterns that can be used to handle the combinatoric problems arising from the need in entity matching to compare every node in the graph with every node. The complexity of this comparison is n^2, where n is the number of nodes. This can quickly get out of hand with millions or billions of nodes! Blocking is a strategy to prune the set of nodes compared down to groups that are more manageable.
70
+
Traditional entity resolution involves two phases: blocking and matching. Exploratory data analysis (EDA) first reveals blocking patterns that group records and so limit the number of comparisons needed during the matching phase. Naively comparing every node with every other node takes on the order of n^2 comparisons, where n is the number of nodes. This quickly gets out of hand with millions or billions of nodes! Blocking is a strategy to prune the set of nodes compared down to groups that are more manageable.
69
71
</div>
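<div>
To make the effect of blocking concrete, here is a minimal sketch that counts candidate pairs with and without it. The records and the blocking rule (first three letters of the lowercased name) are assumptions made for illustration; this is simple key-based blocking, not the LSH blocking shown in the slide.
</div>

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; only the name field is used for blocking here.
records = [
    {"id": "a", "name": "Acme Corp"},
    {"id": "b", "name": "ACME Corporation"},
    {"id": "c", "name": "Globex"},
    {"id": "d", "name": "Globex Inc"},
    {"id": "e", "name": "Initech"},
]


def blocking_key(record):
    """Assumed blocking rule: first three letters of the lowercased name."""
    return record["name"].lower()[:3]


def candidate_pairs(records):
    """Group records by blocking key and only pair records within the same block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


# Naive matching compares every record with every other: n * (n - 1) / 2 pairs.
all_pairs = list(combinations([r["id"] for r in records], 2))
blocked_pairs = candidate_pairs(records)

print(len(all_pairs), "pairs without blocking")
print(len(blocked_pairs), "pairs with blocking:", blocked_pairs)
```

<div>
The trade-off is recall: a blocking rule that is too strict puts true duplicates in different blocks, and the matching phase never gets to compare them.
</div>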
70
72
<div style="margin-top: 2%;"></div>
71
73
<div style="margin-top: 2%;"></div>
@@ -100,6 +102,10 @@ background: home/bg.png
100
102
<img style="width: 70%;" alt="Raw Data in Bronze Tables" src="assets/slides/Entity-Resolution-Phase-3---LSH-Blocking.jpg" />
101
103
</center>
102
104
</div>
105
+
<div>
106
+
<b>3)</b> Enlarge the core knowledge graph from unstructured data sources using Natural Language Processing (NLP) via a chain of operations: NER → IE → EL. Named Entity Recognition (NER) identifies entities corresponding to nodes. Information Extraction (IE) creates relationships [edges] between entities. Entity Linking (EL) links nodes and edges extracted from text documents into the core graph established via ETL.