Commit 009c7c7

More info on why not just use a graph DB
1 parent 384e5b4 commit 009c7c7

1 file changed

+9
-5
lines changed

01_index.markdown

@@ -32,7 +32,7 @@ background: home/bg.png
 <div class="problem-definition">
 <div>
 <div style="margin-top: 4%;"></div>
-<h2>We build knowledge graph <i>factories</i></h2>
+<center><h2>We build property graph <i>factories</i></h2></center>
 <div>
 <center>
 <img style="width: 70%; margin-top: 1%;" alt="Knowledge Graph Construction Architecture" src="assets/slides/KG-Factory-System-Architecture-Diagram.jpg" />
@@ -41,8 +41,12 @@ background: home/bg.png
 <div style="margin-top: 3%;"></div>
 <h2>We build <i>property graphs</i> in 3 steps</h2>
 <div style="margin-top: 2%;"></div>
-<div style="margin-bottom: 2%;">
-1) Transform myriad datasets into a common ontology. This means we Extract, Transform, Load (ETL) [or ELT] multiple, large and small datasets from different sources with different formats into a common property graph schema using tools like Python, PySpark, Databricks or Snowflake. How much ETL varies by industry from minimal with cybersecurity applications to simplified graph model with fewer makes it easy to access, query, analyze and model in a graph database such as Neo4j, TigerGraph, ArangoDB or Neptune.
+<div>
+1) Build the core of a graph by combining more than one structured dataset into a common schema or, in larger domains, a full-blown ontology. We Extract, Transform, Load (ETL) multiple large and small datasets from different sources with different formats into a common property graph schema using tools like Python, Spark, Databricks or Snowflake. How much ETL is needed varies by industry, from relatively little in cybersecurity applications to a significant amount with business graph applications like KYC / AML / financial compliance. A well-defined graph model with fewer makes it easy to access, query, analyze and model your business domain in a graph database such as Neo4j, TigerGraph, ArangoDB or Neptune.
+</div>
+<div style="margin-top: 2%;"></div>
+<div>
+Why not load your raw data directly into a graph database and do ETL inside it? Graph databases aren't ETL platforms. They are not designed for it. Python-based tools are. Modern ETL increasingly involves machine learning techniques rather than simple transformations. Graph databases are typically built on top of the Java Virtual Machine (JVM) or in C++. Ask your data engineers how productive they will be doing ETL in Python versus Java or C++. Python shines at ETL. The JVM and C++ shine at interactively querying clean graph data.
 </div>
 <div style="margin-top: 2%;"></div>
 <div>
@@ -73,12 +77,12 @@ background: home/bg.png
 <div style="margin-top: 2%;"></div>
 <div>
 <center>
-<img style="width: 70%;" alt="Raw Data in Bronze Tables" src="assets/slides/Entity-Resolution-Phase-2---Manual-Matching.jpg" />
+<img style="width: 70%;" alt="Clean Data in Silver Tables" src="assets/slides/Entity-Resolution-Phase-2---Manual-Matching.jpg" />
 </center>
 </div>
 <div style="margin-top: 2%;"></div>
 <div>
-<b>3)</b> Entity resolution using network topology and natural language processing. Recent developments in Large Language Models (LLMs) and Graph Neural Networks (GNNs) allow us to encode nodes and edges as XML-like text using a language model and then combine them based on semantic inferences made by the LLM in combination with those made about the network via a GNN. LLMs have seen many documents on the World Wide Web similar to the nodes’ text representations.
+<b>2)</b> Entity resolution using network topology and natural language processing. Recent developments in Large Language Models (LLMs) and Graph Neural Networks (GNNs) allow us to encode nodes and edges as XML-like text using a language model and then combine them based on semantic inferences made by the LLM in combination with those made about the network via a GNN. LLMs have seen many documents on the World Wide Web similar to the nodes’ text representations.
 </div>
 <div style="margin-top: 2%;"></div>
 <div>
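
Step 1 in the diff above describes using Python to map datasets in different formats onto a common property graph schema before loading a graph database. A minimal sketch of that idea follows, using only the standard library. The sample data, the `Company` and `Person` labels, the `OFFICER_OF` edge type and the `node`/`edge` helpers are all hypothetical illustrations, not the site authors' schema or tooling.

```python
import csv
import io
import json

# Hypothetical inputs: two datasets describing companies in different formats.
CSV_SOURCE = "company_name,country\nAcme Corp,US\nGlobex,DE\n"
JSON_SOURCE = (
    '[{"legalName": "Initech", "jurisdiction": "US", '
    '"officers": ["Bill Lumbergh"]}]'
)


def node(label, key, **props):
    """Build a node in an assumed common property graph schema."""
    return {"label": label, "id": f"{label}:{key}", "properties": props}


def edge(label, src, dst, **props):
    """Build an edge between two nodes in the same assumed schema."""
    return {"label": label, "src": src["id"], "dst": dst["id"], "properties": props}


def from_csv(text):
    """Map CSV rows onto Company nodes; this source has no relationships."""
    nodes = [
        node("Company", row["company_name"],
             name=row["company_name"], country=row["country"])
        for row in csv.DictReader(io.StringIO(text))
    ]
    return nodes, []


def from_json(text):
    """Map JSON records onto Company and Person nodes plus OFFICER_OF edges."""
    nodes, edges = [], []
    for rec in json.loads(text):
        company = node("Company", rec["legalName"],
                       name=rec["legalName"], country=rec["jurisdiction"])
        nodes.append(company)
        for officer in rec.get("officers", []):
            person = node("Person", officer, name=officer)
            nodes.append(person)
            edges.append(edge("OFFICER_OF", person, company))
    return nodes, edges


def build_graph():
    """Combine both sources into one list of nodes and one list of edges."""
    nodes, edges = [], []
    for src_nodes, src_edges in (from_csv(CSV_SOURCE), from_json(JSON_SOURCE)):
        nodes.extend(src_nodes)
        edges.extend(src_edges)
    return nodes, edges


nodes, edges = build_graph()
```

The resulting `nodes` and `edges` lists share one shape regardless of source format, which is the property that makes a later bulk load into a graph database straightforward. A real pipeline would do the same mapping at scale with the tools the text names, such as Spark.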
