 <h2>We build <i>property graphs</i> in 3 steps</h2>
 <div style="margin-top: 2%;"></div>
-<div style="margin-bottom: 2%;">
-1) Transform myriad datasets into a common ontology. This means we Extract, Transform, Load (ETL) [or ELT] multiple, large and small datasets from different sources with different formats into a common property graph schema using tools like Python, PySpark, Databricks or Snowflake. How much ETL varies by industry from minimal with cybersecurity applications to simplified graph model with fewer makes it easy to access, query, analyze and model in a graph database such as Neo4j, TigerGraph, ArangoDB or Neptune.
+<div>
+1) Build the core of a graph by combining several structured datasets into a common schema or, in larger domains, a full-blown ontology. We extract, transform and load (ETL) multiple large and small datasets from different sources and formats into a common property graph schema using tools like Python, Spark, Databricks or Snowflake. The amount of ETL varies by industry, from relatively little in cybersecurity applications to a significant amount in business graph applications such as KYC, AML and financial compliance. A well-defined, simplified graph model makes it easy to access, query, analyze and model your business domain in a graph database such as Neo4j, TigerGraph, ArangoDB or Neptune.
+</div>
+<div style="margin-top: 2%;"></div>
+<div>
+Why not load your raw data directly into a graph database and do the ETL inside it? Graph databases aren't ETL platforms and were not designed for it; Python-based tools are. Modern ETL increasingly involves machine learning techniques rather than simple transformations. Graph databases are typically built on the Java Virtual Machine (JVM) or C++. Ask your data engineers how productive they would be doing ETL in Java or C++ versus Python. Python shines at ETL; the JVM and C++ shine at interactively querying clean graph data.
 </div>
 <div style="margin-top: 2%;"></div>
 <div>
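Here is a minimal sketch of step 1. It uses plain Python rather than Spark, Databricks or Snowflake so that it runs anywhere. Everything specific in it is hypothetical: the two inline sources (a CSV of companies and a JSON feed of directors), the `Node`/`Edge` classes, the `Company`/`Person` labels and the `DIRECTOR_OF` relationship. The sketch only shows two differently formatted datasets being mapped onto one property graph schema of labeled nodes and typed edges, ready for a bulk load into a graph database.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical common property graph schema."""
    id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A typed, directed edge between two node ids (hypothetical schema)."""
    src: str
    dst: str
    type: str
    props: dict = field(default_factory=dict)


# Hypothetical source 1: a CSV export of company records (note the messy values).
COMPANIES_CSV = """company_id,name,country
C1,Acme Corp,us
C2, Globex Ltd ,UK
"""

# Hypothetical source 2: a JSON feed of people and the companies they direct.
DIRECTORS_JSON = """[
  {"person": {"pid": "P9", "full_name": "Jane Doe"}, "director_of": ["C1", "C2", "C7"]}
]"""


def extract_transform(companies_csv, directors_json):
    """Map both sources onto one schema: Company and Person nodes, DIRECTOR_OF edges."""
    nodes = {}
    edges = []

    # CSV rows -> Company nodes; source columns are renamed and normalized.
    for row in csv.DictReader(io.StringIO(companies_csv)):
        node_id = f"company:{row['company_id'].strip()}"
        nodes[node_id] = Node(node_id, "Company", {
            "name": row["name"].strip(),
            "country": row["country"].strip().upper(),
        })

    # JSON records -> Person nodes plus one DIRECTOR_OF edge per listed company.
    for rec in json.loads(directors_json):
        person_id = f"person:{rec['person']['pid']}"
        nodes[person_id] = Node(person_id, "Person",
                                {"name": rec["person"]["full_name"].strip()})
        for company in rec["director_of"]:
            edges.append(Edge(person_id, f"company:{company}", "DIRECTOR_OF"))

    # Drop edges pointing at nodes that no source defined (C7 here).
    edges = [e for e in edges if e.src in nodes and e.dst in nodes]
    return list(nodes.values()), edges


if __name__ == "__main__":
    nodes, edges = extract_transform(COMPANIES_CSV, DIRECTORS_JSON)
    for n in nodes:
        print(n.label, n.id, n.props)
    for e in edges:
        print(e.src, f"-[{e.type}]->", e.dst)
```

In a production pipeline the same mapping would more likely run as a PySpark or SQL job. The resulting node and edge tables would then be exported in whatever bulk-import format your graph database expects.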
@@ -73,12 +77,12 @@ background: home/bg.png
 <div style="margin-top: 2%;"></div>
 <div>
 <center>
-<img style="width: 70%;" alt="Raw Data in Bronze Tables" src="assets/slides/Entity-Resolution-Phase-2---Manual-Matching.jpg" />
+<img style="width: 70%;" alt="Clean Data in Silver Tables" src="assets/slides/Entity-Resolution-Phase-2---Manual-Matching.jpg" />
 </center>
 </div>
 <div style="margin-top: 2%;"></div>
 <div>
-<b>3)</b> Entity resolution using network topology and natural language processing. Recent developments in Large Language Models [LLMs] and Graph Neural Networks (GNNs) allow us to encode nodes and edges as XML-like text using a language model and then combine them based on semantic inferences made by the LLM in combination with those made about the network via a GNN. LLMs have seen many similar documents as the nodes’ text representation on the world wide web.
+<b>2)</b> Entity resolution using network topology and natural language processing. Recent developments in Large Language Models (LLMs) and Graph Neural Networks (GNNs) let us serialize nodes and edges as XML-like text and encode that text with a language model. We can then merge nodes that refer to the same entity, based on the semantic inferences the LLM makes combined with the inferences a GNN makes about the network's structure. This works because LLMs have seen many documents on the World Wide Web that resemble the nodes' text representations.
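Here is a heavily simplified sketch of step 2. Nodes are serialized to XML-like text as described above. The LLM and the GNN, however, are swapped out for stand-ins so the example runs on the standard library alone. Token-level Jaccard similarity over the serialized text replaces the language model's semantic comparison. Jaccard similarity over each node's (relationship, neighbor) pairs replaces the GNN's view of network topology. The graph, the `alpha` weight and the 0.7 threshold are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical graph: two records of the same person from different sources, plus someone else.
GRAPH = {
    "p1": {"props": {"name": "Jane Doe", "city": "Boston"},
           "edges": [("DIRECTOR_OF", "c1"), ("DIRECTOR_OF", "c2")]},
    "p2": {"props": {"name": "Doe, Jane", "city": "Boston"},
           "edges": [("DIRECTOR_OF", "c1")]},
    "p3": {"props": {"name": "John Smith", "city": "Denver"},
           "edges": [("DIRECTOR_OF", "c3")]},
}


def node_to_xml(node_id, props, edges):
    """Serialize a node, its properties and its outgoing edges as XML-like text."""
    root = ET.Element("node", {"id": node_id})
    for key, value in sorted(props.items()):
        ET.SubElement(root, key).text = str(value)
    rels = ET.SubElement(root, "edges")
    for rel_type, target in sorted(edges):
        ET.SubElement(rels, "edge", {"type": rel_type, "to": target})
    return ET.tostring(root, encoding="unicode")


def tokens(text):
    """Lowercase alphanumeric tokens, so "Doe, Jane" and "Jane Doe" match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def match_score(graph, a, b, alpha=0.5):
    """Blend a text signal and a topology signal (stand-ins for an LLM and a GNN)."""
    xml_a = node_to_xml(a, graph[a]["props"], graph[a]["edges"])
    xml_b = node_to_xml(b, graph[b]["props"], graph[b]["edges"])
    # Text signal: compare only element text (property values), not tag names.
    text_sim = jaccard(tokens(" ".join(ET.fromstring(xml_a).itertext())),
                       tokens(" ".join(ET.fromstring(xml_b).itertext())))
    # Topology signal: overlap of (relationship, neighbor) pairs.
    topo_sim = jaccard(set(graph[a]["edges"]), set(graph[b]["edges"]))
    return alpha * text_sim + (1 - alpha) * topo_sim


def candidate_matches(graph, threshold=0.7):
    """Return node pairs whose blended score reaches the threshold."""
    matches = []
    for a, b in combinations(sorted(graph), 2):
        score = match_score(graph, a, b)
        if score >= threshold:
            matches.append((a, b, score))
    return matches


if __name__ == "__main__":
    print(node_to_xml("p2", GRAPH["p2"]["props"], GRAPH["p2"]["edges"]))
    print(candidate_matches(GRAPH))
```

With real models, the text signal would compare LLM embeddings of the two XML strings, and the topology signal would compare GNN node embeddings. The weighted blend and the thresholded candidate list could stay as they are.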