diff --git a/docs/approval.md b/docs/approval.md
index 7b6639f3..0307d89a 100644
--- a/docs/approval.md
+++ b/docs/approval.md
@@ -6,12 +6,10 @@ nav_order: 13
 
 # Approval of Clusters
 
-## 
+## [Zingg Enterprise Feature](#user-content-fn-1)[^1]
 
 
-
-
 ### The approval phase is run as follows:
 
 
-` `
\ No newline at end of file
+[^1]: Zingg Enterprise is an advanced version of Zingg Community with production-grade features
diff --git a/docs/modelexplain.md b/docs/modelexplain.md
index 792f1e75..25314b57 100644
--- a/docs/modelexplain.md
+++ b/docs/modelexplain.md
@@ -2,16 +2,15 @@
 title: Explanation
 parent: Step By Step Guide
 nav_order: 12
+description: To get a better understanding of how the data is trained and matched
 ---
 
 # Explanation of Models
 
-## To get a better understanding of how the data is trained and matched
-
 [Zingg Enterprise Feature](#user-content-fn-1)[^1]
 
-
-
 ### The explain phase is run as follows:
 
-` ./scripts/zingg.sh --phase --conf --mode explain `
\ No newline at end of file
+`./scripts/zingg.sh --phase --conf --mode explain`
+
+[^1]: Zingg Enterprise is an advanced version of Zingg Community with production-grade features
diff --git a/docs/relations.md b/docs/relations.md
index 48cb83d1..975abf60 100644
--- a/docs/relations.md
+++ b/docs/relations.md
@@ -7,6 +7,85 @@ description: When a single match model is not sufficient
 
 # Combining Different Match Models
 
-[Zingg Enterprise Feature](relations.md#user-content-fn-1)\[^1]
+[Zingg Enterprise Feature](#user-content-fn-1)[^1]
 
-## In many cases
+In many cases, we want to build the identity graph using a combination of different datasets, schemas and matching logic. An example could be having a source system which only contains userids and emails, another one with user name and phone numbers and a few others with person information with addresses. Another example could be some systems capturing spousal information, but others to be matched on the basis of lastname and address.
+
+In such cases, Zingg can build the entire graph and relate different models together. In the following case, results of a query with exact match on family Id and a matching model (household) using address and lastname are brought together.
+
+````json
+```
+{
+    "vertices" :
+    [
+        {
+            "name" : "spouse",
+            "vertexType" : "zingg_pipe",
+            "data" : [
+                {
+                    "name" : "spouse",
+                    "format" : "snowflake",
+                    "props": {
+                        "query": "select a.id as id, a.FNAME, a.LNAME, a.STNO, a.ADD1, a.CITY, a.STATE, a.ZINGG_ID_PERSON, b.id as z_id, b.fname as Z_FNAME,b.lname as Z_LNAME,b.stno as Z_STNO,b.add1 as Z_ADD1, b.city as Z_CITY,b.state as Z_STATE, b.ZINGG_ID_PERSON as Z_ZINGG_ID_PERSON from CUSTOMER_RELATE_PARTIAL a, CUSTOMER_RELATE_PARTIAL b where a.familyId = b.familyId"
+                    }
+                }
+            ],
+            "edges" :
+            { "edgeType" : "same_edge",
+                "edges":[
+                    {
+                        "dataColumn" : "zingg_personId",
+                        "column" : "zingg_personId",
+                        "name" : "zingg_personId1"
+                    },
+                    {
+                        "dataColumn" : "zingg_personId",
+                        "column" : "z_zingg_personId",
+                        "name" : "zingg_personId2"
+                    }
+                ]
+            }
+        },
+        {
+            "name" : "household",
+            "config" : "$ZINGG_ENTERPRISE_HOME$/zinggEnterprise/configHousehold.json",
+            "strategy" : {
+                "vDataStrategy" : "unique_edge",
+                "props" : {
+                    "column" : "zingg_personId",
+                    "edge" : "zingg_personId,z_zingg_personId"
+                }
+            },
+            "vertexType" : "zingg_match",
+            "edges" :
+            { "edgeType" : "same_edge",
+                "edges":[
+                    {
+                        "dataColumn" : "zingg_personId",
+                        "column" : "zingg_personId",
+                        "name" : "zingg_personId1"
+                    },
+                    {
+                        "dataColumn" : "zingg_personId",
+                        "column" : "z_zingg_personId",
+                        "name" : "zingg_personId2"
+                    }
+                ]
+            }
+        }
+    ],
+    "output" : [{
+        "name":"relatedCustomers",
+        "format":"snowflake",
+        "props": {
+            "table": "RELATED_CUSTOMERS_PARTIAL"
+        }
+    }],
+    "strategy":"pairs_and_vertices"
+}
+
+
+```
+````
+
+[^1]: Zingg Enterprise is an advanced version of Zingg Community with production-grade features
diff --git a/docs/runIncremental.md b/docs/runIncremental.md
index a475e87c..8e76e3b4 100644
--- a/docs/runIncremental.md
+++ b/docs/runIncremental.md
@@ -2,12 +2,13 @@
 title: Adding incremental data
 parent: Step By Step Guide
 nav_order: 10
+description: >-
+  Building a continuously updated identity graph with new, updated and deleted
+  records
 ---
 
 # Adding Incremental Data
 
-## Building a continuosly updated identity graph with new, updated and deleted records
-
 [Zingg Enterprise Feature](#user-content-fn-1)[^1]
 
 Rerunning matching on entire datasets is wasteful, and we lose the lineage of matched records against a persistent identifier. Using the[ incremental flow](https://www.learningfromdata.zingg.ai/p/zingg-incremental-flow) feature in [Zingg Enterprise](https://www.zingg.ai/company/zingg-enterprise), incremental loads can be run to match existing pre-resolved entities. The new and updated records are matched to existing clusters, and new persistent [**ZINGG\_IDs**](https://www.learningfromdata.zingg.ai/p/hello-zingg-id) are generated for records that do not find a match. If a record gets updated and Zingg Enterprise discovers that it is a more suitable match with another cluster, it will be reassigned. Cluster assignment, merge, and unmerge happens automatically in the flow. Zingg Enterprise also takes care of human feedback on previously matched data to ensure that it does not override the approved records.
diff --git a/docs/setup/match.md b/docs/setup/match.md
index ecec2778..832f16fe 100644
--- a/docs/setup/match.md
+++ b/docs/setup/match.md
@@ -2,12 +2,11 @@
 title: Find the matches
 parent: Step By Step Guide
 nav_order: 9
+description: Finds the records that match with each other.
 ---
 
 # Finding The Matches
 
-## Finds the records that match with each other.
-
 `./zingg.sh --phase match --conf config.json`
 
 As can be seen in the image below, matching records are given the same **z\_cluster** id. Each record also gets a **z\_minScore** and **z\_maxScore** which shows the _least/greatest_ it matched with other records in the same cluster.
diff --git a/docs/setup/train.md b/docs/setup/train.md
index 33259dc8..bb12590c 100644
--- a/docs/setup/train.md
+++ b/docs/setup/train.md
@@ -2,6 +2,7 @@
 title: Build and save the model
 parent: Step By Step Guide
 nav_order: 8
+description: So that the same model can be applied to new data
 ---
 
 # Building And Saving The Model
@@ -11,4 +12,3 @@ Builds up the Zingg models using the training data from the above phases and wri
 ```
 ./zingg.sh --phase train --conf config.json
 ```
-