-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLesson-RelMod.qmd
355 lines (302 loc) · 9.33 KB
/
Lesson-RelMod.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
---
title: "Biostat 823 - Relational data modeling"
author: "Hilmar Lapp"
institute: "Duke University, Department of Biostatistics & Bioinformatics"
date: "Sep 3, 2024"
format:
revealjs:
slide-number: true
editor: visual
knitr:
opts_chunk:
echo: TRUE
---
## Relational data model
- Relational models are based on first-order predicate logic
- Entities and predicates
- Relations (tables) and tuples (rows)
## Entity-relationship (E-R) model
- Entities are identifiable things of interest in the domain of interest
- Entities have attributes (properties):
- The set of attributes that uniquely identify the entity is the *natural key*.
- Relationships between entities
## Logical vs Physical E-R model I {.smaller}
::: columns
::: {.column width="50%"}
Logical data model:
- All entities (including keys), their attributes, and relationships (including cardinalities)
- Independent of database implementation
- Abstract model
:::
::: {.column width="50%"}
Physical data model:
- Usually derived from the logical model
- For instantiation in a relational database manage-ment system (RDBMS)
- Includes indexes and constraints
- Can derive multiple physical data models from same logical model
:::
:::
## Relationship cardinalities
- {0,1}:1
- {0,1}:*n*
- *n*:*n*
## Crow's foot notation {.smaller}
| Description | Symbol |
|-------------------------------------------------------|-----------------|
| Ring and dash: Minimum zero, maximum one (optional) | <img src="images/SQL/ring-and-dash.png" style="vertical-align: bottom"/> |
| Dash and dash: Exactly one (mandatory) | <img src="images/SQL/dash-and-dash.png" style="vertical-align: bottom"/> |
| Ring and crow's foot: Minimum zero, maximum many (optional) | <img src="images/SQL/ring-and-crow.png" style="vertical-align: bottom"/> |
| Dash and crow's foot: Minimum one, maximum many (mandatory) | <img src="images/SQL/dash-and-crow.png" style="vertical-align: bottom"/> |
::: aside
Adapted from [Visio: Create a diagram with crow's foot database notation](https://support.microsoft.com/en-us/office/create-a-diagram-with-crow-s-foot-database-notation-1ec22af9-3bd3-4354-b2b5-ed5752af6769)
:::
## Examples for Crow's foot notation
::: {layout-ncol="5"}
```{mermaid}
erDiagram
Instructor ||--o| Course : teaches
```
```{mermaid}
erDiagram
Instructor ||--|| Course : teaches
```
```{mermaid}
erDiagram
Instructor ||--o{ Course : teaches
```
```{mermaid}
erDiagram
Instructor }|--|{ Course : teaches
```
:::
## Resolving *n*:*n* relationships
RDBMSs do not directly support *n*:*n* relationships between two tables, so they must be resolved for a physical E-R model.
::: {layout-ncol="3"}
```{mermaid}
%%| fig-cap: "Logical model"
erDiagram
Instructor }|--|{ Course : teaches
```
```{mermaid}
%%| fig-cap: "Physical model, using [associative table](https://en.wikipedia.org/wiki/Associative_entity)"
erDiagram
INSTRUCTOR ||--|{ INSTRUCTOR_TO_COURSE : teaches
COURSE ||--|{ INSTRUCTOR_TO_COURSE : "taught by"
```
```{mermaid}
%%| fig-cap: "Logical and/or physical model"
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
```
:::
## Strong vs weak entities
```{mermaid}
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Instructor {
string Name
string Email PK
}
Course {
string Title
string Code PK
}
Lesson {
string Name
}
```
- `Instructor` and `Course` exist on their own, identifiable by attributes of their own: *strong* entities
- `Lesson` exists only within the context of a `Course`, not identifiable on its own: *weak* entity
## Foreign Keys
::: {layout-ncol="2"}
```{mermaid}
%%| fig-cap: "Foreign keys implicit"
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Instructor {
string Name
string Email PK
}
Course {
string Title
string Code PK
}
Lesson {
string Name
}
```
```{mermaid}
%%| fig-cap: "Explicit foreign keys and synthetic (a.k.a. surrogate) primary keys"
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Instructor {
integer Instructor_ID PK
string Name
string Email PK
}
Course {
integer Course_ID PK
string Title
string Code PK
}
Lesson {
string Name
integer Instructor_ID FK
integer Course_ID FK
}
```
:::
## Entity relationships can be self-referential
```{mermaid}
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Course |o--o{ Course : "required by"
Instructor {
integer Instructor_ID PK
string Name
string Email PK
}
Course {
integer Course_ID PK
string Title
string Code PK
integer Req_Course_ID FK
}
Lesson {
string Name
integer Instructor_ID FK
integer Course_ID FK
}
```
## E-R models can be very complex
- Consider for example the [BioSQL ERD](https://github.com/biosql/biosql/blob/master/doc/biosql-ERD.pdf)
- Represents entries in biological sequence databases (such as GenBank), their key-value annotations, cross-references, and feature annotations (which themselves have key-value annotations)
## Database normalization
- Goal is to minimize data redundancy, improve data integrity, prevent anomalies
- Enable a database to _enforce data integrity_ through uniqueness, not-null, and referential integrity constraints
- Formalized as *normal forms*
- [1st](https://en.wikipedia.org/wiki/First_normal_form), [2nd](https://en.wikipedia.org/wiki/Second_normal_form), and [3rd](https://en.wikipedia.org/wiki/Second_normal_form) Forms are most widely practiced
- Stated initially by Codd in 1970 (1NF) and 1971 (2NF, 3NF)
## First normal form
> A relation is in first normal form *iff* no attribute domain has relations as elements.
Attribute values must be "atomic" (not tables, arrays, lists, etc).
::: {layout-ncol="2"}
```{mermaid}
%%| fig-cap: "Unnormalized"
erDiagram
Instructor {
string Name
string Email
array Course_Lessons
}
```
```{mermaid}
%%| fig-cap: "Normalized to 1NF"
erDiagram
Instructor ||--|{ Course_Lesson : teaches
Instructor {
string Name
string Email
}
Course_Lesson {
string Course_Name
string Lesson_Name
}
```
:::
## Second normal form {.smaller}
> A relation is in 2NF *iff* it is in 1NF and it does not have any non-prime attribute functionally dependent on any proper subset of any candidate key of the relation.
If any table has a composite natural key, no column in that table depends on only a part of the composite key.
::: {layout-ncol="2"}
```{mermaid}
%%| fig-cap: "In 1NF but not 2NF"
erDiagram
Instructor ||--|{ Course_Lesson : teaches
Instructor {
string Name
string Email
}
Course_Lesson {
string Course_Name
string Lesson_Name
string Room
}
```
```{mermaid}
%%| fig-cap: "Normalized to 2NF"
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Instructor {
string Name
string Email
}
Course {
string Name
string Room
}
Lesson {
string Name
}
```
:::
## Third normal form {.smaller}
> A relation R is in 3NF *iff* it is in 2NF and every non-prime attribute of R is non-transitively dependent on every key of R.
For every table, any attribute that is not part of a natural key depends *directly* on every key for the table.
::: {layout-ncol="2"}
```{mermaid}
%%| fig-cap: "In 2NF but not 3NF"
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Instructor {
string Name
string Email
}
Course {
string Name
string Semester
string Room
string Room_Address
}
Lesson {
string Name
}
```
```{mermaid}
%%| fig-cap: "Normalized to 3NF"
erDiagram
Instructor ||--|{ Lesson : teaches
Course ||--|{ Lesson : "consists of"
Room |o--o{ Course : "used for"
Instructor {
string Name
string Email
}
Course {
string Name
string Semester
}
Room {
string Name
string Address
}
Lesson {
string Name
}
```
:::
## Logical vs Physical E-R model II {.smaller}
| Logical data model | Physical data model |
|--------------------------------------|----------------------------------|
| Entities and their attributes | Tables and columns |
| Natural keys (identifying attributes) | Unique key constraints |
| | Surrogate primary keys, auto-increment mechanism |
| Relationships | Foreign keys; associative tables for *n*:*n* relationships |
| Relationship cardinalities | Foreign key and `NOT NULL` constraints |
| | Indexes |