You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Dec 24, 2024. It is now read-only.
As we (@urig, @qguyk, @galwiner) have learned from examining issue #123, the Entropy API is currently confusing and somewhat even buggy around Nodes, their ids, stages and labels.
Problems
The Nodes table in Sqlite has an id column but that is not the primary key for the table.
The primary key for the Nodes table is a combination of id and experiment_id. This is a correct natural key for nodes but we prefer synthetic (artificial) keys.
Given an experiment, both stage and node id (the column, not the actual natural key combination of experiment_id and the column) are two different names for the same thing.
The SqlAlchemyDB method get_nodes_id_by_label() is broken because it returns node ids without experiment_ids. This does not follow the principle of least surprise :)
The idea behind stage is that in the context of a specific experiment, users can accidentally (?) use the same label for two different nodes. stage can be used to distinguish between different nodes in this case. I* think this solves a problem we should instead prevent.*
Proposed solution
a. Change the unique identifier for nodes from the id X experiment_id combination to a single auto-generated id column.
b. Remove all references to stage from the Entropy code base and replace them with node_label. That is, the label property of nodes.
c. Make it impossible for users to use the same label for two different nodes in any given experiment. We can enforce this in the
Sqlite DB and reflect it to users when they invoke the Graph constructor.
I think this solution will make our code base and our data schema smaller, more intuitive and easier to reason about.
Would love to hear what you think - Your feedback most welcome!
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
Hi,
As we (@urig, @qguyk, @galwiner) have learned from examining issue #123, the Entropy API is currently confusing and somewhat even buggy around
Nodes
, theirid
s,stage
s andlabel
s.Problems
The
Nodes
table in Sqlite has anid
column but that is not the primary key for the table.The primary key for the
Nodes
table is a combination ofid
andexperiment_id
. This is a correct natural key for nodes but we prefer synthetic (artificial) keys.Given an experiment, both
stage
and nodeid
(the column, not the actual natural key combination ofexperiment_id
and the column) are two different names for the same thing.The
SqlAlchemyDB
methodget_nodes_id_by_label()
is broken because it returns nodeid
s withoutexperiment_id
s. This does not follow the principle of least surprise :)The idea behind
stage
is that in the context of a specific experiment, users can accidentally (?) use the same label for two different nodes.stage
can be used to distinguish between different nodes in this case. I* think this solves a problem we should instead prevent.*Proposed solution
a. Change the unique identifier for nodes from the
id
Xexperiment_id
combination to a single auto-generatedid
column.b. Remove all references to
stage
from the Entropy code base and replace them withnode_label
. That is, thelabel
property of nodes.c. Make it impossible for users to use the same
label
for two different nodes in any given experiment. We can enforce this in theSqlite DB and reflect it to users when they invoke the
Graph
constructor.I think this solution will make our code base and our data schema smaller, more intuitive and easier to reason about.
Would love to hear what you think - Your feedback most welcome!
Beta Was this translation helpful? Give feedback.
All reactions