- Make `molgraph` compatible with Python > 3.10 and TensorFlow > 2.15.
- For `tensorflow > 2.15`, make sure to set `TF_USE_LEGACY_KERAS=1` (see the sketch below).
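A minimal sketch of one way to do this, assuming the environment variable is set before TensorFlow/Keras is imported:

```python
import os

# Must be set before tensorflow/keras is imported.
os.environ['TF_USE_LEGACY_KERAS'] = '1'

import molgraph
```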
- Update `PeptideModel` of `molgraph.applications.proteomics`.
- Make Python 3.10.* a requirement.
**molgraph.layers**

- Added layer `UpdateField`.
**molgraph.applications.proteomics**

- Fix some bugs and update the default config.
**molgraph.applications.proteomics**

- Two different types of peptide models now exist: one with, and one without, virtual/super nodes. To include super nodes, specify `super_nodes=True` for `PeptideGraphEncoder`, otherwise `False`. Depending on the `super_nodes` parameter, `PeptideModel` (aliased `PeptideGNN` or `PepGNN`) will return a Keras Sequential model with the corresponding readout layer. A sketch follows below.
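A minimal sketch of the two variants; the exact constructor arguments are an assumption and may differ:

```python
from molgraph.applications.proteomics import PeptideGraphEncoder

# With virtual/super nodes: the resulting model is expected to use a
# super-node-based readout (assumed keyword argument).
encoder = PeptideGraphEncoder(super_nodes=True)

# Without super nodes: a conventional readout is expected instead.
encoder = PeptideGraphEncoder(super_nodes=False)
```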
**molgraph.models.interpretability**

- Add `reduce_features` argument (default `True`) to `GradientActivationMapping`. It specifies whether the node feature dimension should be averaged.
**molgraph.applications.proteomics**

- `PeptideSaliency` is now based on the gradient (class) activation mapping algorithm. It considers all node features, including intermediate ones. Based on preliminary experiments, the saliencies look more reasonable now.
- Users can now add their own dictionary of AA/residue SMILES; see the README at `molgraph/applications/proteomics/`.
**molgraph.applications.proteomics**

- Remove `keras.Sequential` wrapping around the RNN and DNN of `PeptideGNN` to avoid a 'graph disconnect' error.
- `chemistry.MolecularGraphEncoder` now by default does not compute positional encodings. Pass an integer to `positional_encoding_dim` to compute positional encodings of that dimension.
- Add `MANIFEST.in` and modify `setup.py` to include JSON files.
**molgraph.applications.proteomics**

- Fix import issue.
**molgraph.applications**

- Add an application (`proteomics`). Applications are somewhat experimental to begin with and thus potentially subject to change. See the application README for updates.
**molgraph.models.interpretability**

- `Saliency` takes a new argument, `absolute` (`True`/`False`), which decides whether the gradients should be made absolute. Namely, if `absolute=False` (the default), saliency values can be both negative and positive.
**molgraph.layers**

- `SuperNodeReadout` added. This layer extracts "super node" features based on an indicator field. Basically, it performs a `boolean_mask` on node features, resulting in a `tf.Tensor` of shape `(n_subgraphs, n_supernodes, n_features)`. This tensor can then be passed to a sequence model such as an RNN. The idea is sketched below.
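A rough illustration of the underlying idea (not the actual `SuperNodeReadout` implementation):

```python
import tensorflow as tf

# 6 nodes with 8 features each; every second node is a "super node".
node_feature = tf.random.normal([6, 8])
super_node_mask = tf.constant([True, False, True, False, True, False])

# Keep only super node features, then group them per subgraph:
# here, 3 subgraphs with 1 super node each -> shape (3, 1, 8).
super_features = tf.boolean_mask(node_feature, super_node_mask)
super_features = tf.reshape(super_features, [3, 1, 8])
```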
**molgraph.models.interpretability** and **molgraph.layers.gnn**

- `GradientActivationMapping` now behaves as expected when using `GNN`. A private method was implemented that "watches" the intermediate inputs.
**molgraph.layers**

- The default kernel initializer is now again `'glorot_uniform'`. This is the default kernel initializer for `keras.layers.Dense` and seems to work well for the GNN layers as well. To use the previous default kernel initializer, specify `kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.005)`.
**molgraph.layers**

- A GNN layer ("
GNN
") is implemented to combine the output of the last GNN layer as well as all the intermediate GNN layers. Simply pass a list of GNN layers toGNN
: (GNN([..., GINConv(128), GINConv(128), ...])
) and pass it as a layer to e.g.keras.Sequential
.
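A minimal sketch of the described usage; the `Readout` layer here is an assumption (any readout producing graph-level features would do):

```python
from tensorflow import keras
from molgraph import layers

model = keras.Sequential([
    layers.GNN([
        layers.GINConv(128),
        layers.GINConv(128),
    ]),
    layers.Readout('mean'),   # assumed readout layer
    keras.layers.Dense(1),
])
```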
**molgraph.models.interpretability**

- `GradientActivationMapping` deprecates `layer_names` and will by default watch the node features of all graph tensors.
**molgraph.tensors**

- `GraphTensor` can now add fields with a prepended underscore in the input pipeline (after batching, etc.).
**molgraph.models**

- `molgraph.models.interpretability` models now work with multi-label data, as intended.
- `molgraph.models.interpretability.GradientActivationMapping` now computes alpha correctly, namely, for each subgraph separately (based on the graph indicator).
**molgraph.models**

- `molgraph.models.interpretability` models are now simplified and no longer wrapped in `tf.function` by default; if desired, the user may wrap them in `tf.function` themselves, as sketched below.
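A sketch, assuming `saliency` is a constructed interpretability module and `graph_tensor` a `GraphTensor` instance:

```python
import tensorflow as tf

saliency_fn = tf.function(saliency)   # opt back in to graph-mode execution
maps = saliency_fn(graph_tensor)
```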
**molgraph.layers**

- `molgraph.layers.GNNLayer`'s `get_config` and `from_config` methods are updated to allow for serialization of GNN models.
- `molgraph.layers.GNNInputLayer` and `molgraph.layers.GNNInput` were added to allow for serialization of GNN models.
- `molgraph.layers.StandardScaling`, `molgraph.layers.Threshold` and `molgraph.layers.CenterScaling` can now be loaded.
**molgraph.chemistry**

- `molgraph.chemistry.Tokenizer` now appropriately adds self-loops (if specified).
- `molgraph.layers.GATv2Conv` should now better correspond to the original GATv2 implementation.
- MolGraph should now install the appropriate TensorFlow version.
- MolGraph can now be installed (via pip) for GPU and CPU users: `pip install molgraph[gpu]` and `pip install molgraph`, respectively.
**molgraph.models**

- `molgraph.models.gin` now considers initial node features (which have been subject to a linear transformation) in its output.
**molgraph.tensors**

- `molgraph.tensors.graph_tensor` can no longer be stacked. To stack `GraphTensor` instances, perform `tf.concat` followed by `.separate()`, as sketched below.
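A sketch, with `graph_a` and `graph_b` as placeholder `GraphTensor` instances:

```python
import tensorflow as tf

# Concatenate the (disjoint) graphs, then separate into subgraphs again.
merged = tf.concat([graph_a, graph_b], axis=0)
merged = merged.separate()
```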
**molgraph.tensors**

- `molgraph.tensors.graph_tensor` now accepts lists of values, with sizes set to `None`.
**molgraph.tensors**

- `molgraph.tensors.graph_tensor` deprecates old features, attributes, etc. See the documentation for how to use the `GraphTensor`.
**molgraph.chemistry**

- `molgraph.chemistry.encoders` is now compatible with the latest RDKit version.
**molgraph.tensors**

- `GraphTensor` is now implemented with the `tf.experimental.ExtensionType` API. Be aware that this migration will likely break user code. The migration was deemed necessary to make the MolGraph API more robust, reliable and maintainable. The `GraphTensor` is by default in its non-ragged (disjoint) state when obtained from the `MolecularGraphEncoder`. A non-ragged `GraphTensor` is now, thanks to the `tf.experimental.ExtensionType` API, batchable. There is no need to `.separate()` it before using it with `tf.data.Dataset`. Furthermore, there is no need to add a `type_spec` to the `keras.Sequential` model. A sketch of the simplified input pipeline follows below.
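A sketch, with `graph_tensor` and `labels` as placeholders:

```python
import tensorflow as tf

# A non-ragged GraphTensor is now directly batchable; no .separate() needed.
ds = (
    tf.data.Dataset.from_tensor_slices((graph_tensor, labels))
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```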
**molgraph.layers**

- `molgraph.layers.GINConv` now optionally updates edge features at each layer, given that `use_edge_features=True` and edge features exist. Specify `update_edge_features=True` to update edge features at each layer. If `update_edge_features=False`, `GINConv` will behave as before, namely, edge features will only be updated if `edge_dim != node_dim`. Furthermore, `GINConv` uses edge features by default, given that edge features exist (specify `use_edge_features=False` to not use edge features). See the sketch below.
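A sketch of the flags described above:

```python
from molgraph import layers

# Use edge features (the default, if they exist) and also update them per layer.
conv = layers.GINConv(128, use_edge_features=True, update_edge_features=True)
```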
**molgraph.models**

- Note on `models`: models are currently being experimented with and will likely change in the future.
- `GIN` model implemented, based on the original paper. This model differs somewhat from a GIN implementation using `keras.Sequential` with `GINConv` layers, as it outputs node embeddings from each `GINConv` layer. These embeddings can be used for graph predictions, by performing readout on each embedding and concatenating them; or they can be used for node/edge predictions by simply concatenating the embeddings. The embeddings outputted from the GIN model are stored in `node_feature` as a 3-D `tf.Tensor` (or 4-D `tf.RaggedTensor`).
- `DMPNN` model changed to better match the implementation of the original paper (though some differences may still exist).
**molgraph.layers**

- `normalization` is now by default set to `None` for most GNN layers (which means no normalization will be applied).
**molgraph.chemistry**

- The `tf_records` module now implements `writer`: a context manager which makes it easier to write TF records to file. `writer` seems to work fine, though it might be subject to changes in the future. Note: `tf_records.write` can still be used, though the `device` argument is deprecated. To write TF records on CPU (given that a GPU is available and used by default), use `with tf_records.writer(path) as writer: writer.write(data={'x': ...}, encoder=...)` instead, as spelled out below.
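Spelled out, the described usage, with `path`, `smiles`, `labels` and `encoder` as placeholders:

```python
from molgraph.chemistry import tf_records

with tf_records.writer(path) as writer:
    writer.write(data={'x': smiles, 'y': labels}, encoder=encoder)
```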
**molgraph.models**

- `DGIN` model has been removed.
- `DMPNN` now takes different parameters.
- The first parameter of `DMPNN` and `MPNN` is `steps` and not `units` (`units` is the second parameter); see the sketch below.
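A sketch of the new parameter order (keyword arguments used for clarity; other arguments omitted):

```python
from molgraph import models

dmpnn = models.DMPNN(steps=4, units=128)
mpnn = models.MPNN(steps=4, units=128)
```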
**molgraph.layers**

- `from_config` of `molgraph.layers.gnn_layer` should now properly build/initialize the derived layer. Specifically, a `GraphTensorSpec` should now be passed to `build_from_signature()`.
**molgraph.models**

- `layer_names` of `molgraph.models.GradientActivationMapping` is now optional. If `None` (the default), the object will look for, and use, all layers subclassed from `GNNLayer`. If none are found, an error will be raised. A sketch follows below.
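A hedged sketch; the constructor is assumed to take the model to explain, and the exact signature may differ:

```python
from molgraph import models

# With layer_names=None (the default), all GNNLayer-subclassed layers are used.
gam = models.GradientActivationMapping(model)
activation_maps = gam(graph_tensor)
```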
**molgraph**

- Optional `positional_encoding` field of `GraphTensor` is renamed to `node_position`. A (Laplacian) positional encoding is included in a `GraphTensor` instance when e.g. the `positional_encoding_dim` argument of `chemistry.MolecularGraphEncoder` is not `None`. The positional encoding is still referred to as "positional" and "encoding" in `layers.LaplacianPositionalEncoding` and `chemistry.MolecularGraphEncoder`, though the actual data field added to the `GraphTensor` is `node_position`.
**molgraph.chemistry**

- `inputs` argument replaced with `data`.
**molgraph.chemistry**

- `molgraph.chemistry.tf_records.write()` no longer leaks memory. A large dataset (about 10 million small molecules, encoded as graph tensors) is expected to be written to TF records without exceeding 3 GB of memory usage.
**molgraph.chemistry**

- `molgraph.chemistry.tf_records.write()` now accepts `None` input for `encoder`. If `None` is passed, it is assumed that `data['x']` contains `GraphTensor` instances (and not e.g. SMILES strings). See the sketch below.
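A sketch, assuming `write()` takes the output path as its first argument and that `graph_tensors` already holds `GraphTensor` instances:

```python
from molgraph.chemistry import tf_records

tf_records.write(path, data={'x': graph_tensors, 'y': labels}, encoder=None)
```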
**molgraph.tensors**

- `node_position` is now an attribute of the `GraphTensor`. Note: `positional_encoding` can still be used to access the positional encoding (now `node_position` of a `GraphTensor` instance). However, it will be deprecated in the near future.
**molgraph.layers**

- `molgraph.layers.DotProductIncident` no longer takes `apply_sigmoid` as an argument. Instead it takes `normalize`, which specifies whether the dot product should be normalized, resulting in cosine similarities (values between -1 and 1).
**molgraph.models**

- `GraphAutoEncoder` (GAE) and `GraphVariationalAutoEncoder` (GVAE) are changed. The default `loss` is `None`, which means that a default loss function is used. This loss function simply tries to maximize the positive edge scores and minimize the negative edge scores. `predict` now returns the (positive) edge scores corresponding to the inputted `GraphTensor` instance. `get_config` now returns a dictionary, as expected. The default decoder is `molgraph.layers.DotProductIncident(normalize=True)`. Note: there is still some more work to be done with GAE/GVAE, e.g. improving the `NegativeGraphSampler` and (for GVAE) improving the `beta` schedule.
**molgraph.tensors**

- `GraphTensor.propagate()` now removes the `edge_weight` data component, as it has already been used.
**molgraph.models**

- `GraphMasking` (alias: `MaskedGraphModeling`) is now implemented. Like the autoencoders, this model pretrains an encoder; though instead of predicting links between nodes, it predicts randomly masked node and edge features. (Currently, it only works with tokenized node and edge features, via `chemistry.Tokenizer`.) This pretraining strategy is inspired by BERT for language modeling.
**molgraph.layers**

- `from_config` now works as expected for all GNN layers. Consequently, `gnn_model.from_config(gnn_model.get_config())` now works fine.
**molgraph.layers**

- `_build_from_vocabulary_size()` removed from `EmbeddingLookup`. `self.embedding` is instead created in `adapt()` or `build()`.
**molgraph**

- Make molgraph compatible with `tf>=2.9.0`. Previously, it was only compatible with `tf>=2.12.0`.
**molgraph.layers**

- `_get_reverse_edge_features()` of `edge_conv.py` now correctly obtains the reverse edge features.
- Missing numpy import is now added for some preprocessing layers.
**molgraph.models**

- Update DGIN and DMPNN. These models now behave more as expected.
**molgraph**

- Replace tensorflow/keras functions to make MolGraph compatible with TensorFlow 2.13.0. E.g. `keras.utils.register_keras_serializable` is replaced with `tf.keras.saving.register_keras_serializable`.
**molgraph.layers**

- New base layer for GNN layers: `molgraph.layers.GNNLayer`. Old base layer `molgraph.layers.BaseLayer` is removed. The new base layer can be used to define new GNN layers. In brief, these are the changes that may affect the user: (1) `subclass_call` and `subclass_build` are renamed to `_call` and `_build`; (2) `_build` is replaced by `build_from_signature`, which accepts a `GraphTensor` or `GraphTensorSpec` instead of tensors or tensor specs; which means that when building layers, you can obtain shapes from all nested data components corresponding to the `GraphTensor` input.
- Base layer file changed from `molgraph/layers/base.py` to `molgraph/layers/gnn_layer.py`.
- Layer ops file changed from `molgraph/layers/ops.py` to `molgraph/layers/gnn_ops.py`.
- `batch_norm` replaced with `normalization` for the built-in GNN layers as well as the base GNN layer. If set to `True`, `keras.layers.LayerNormalization` will be used. Specify `normalization='batch_norm'` to use `keras.layers.BatchNormalization`.
- The attribute `edge_feature` (as well as `node_feature`, `edge_src`, `edge_dst` and `graph_indicator`) always exists on a graph tensor instance. If you need to check whether e.g. edge features exist in the graph, check whether the attribute `edge_feature` is not None (`graph_tensor.edge_feature is not None`), instead of checking whether the attribute `edge_feature` exists (`hasattr(graph_tensor, 'edge_feature')`), which will always be True. See the sketch below.
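For example:

```python
# Correct presence check: the attribute always exists, but may be None.
if graph_tensor.edge_feature is not None:
    edge_dim = graph_tensor.edge_feature.shape[-1]

# Incorrect: always True, regardless of whether edge features exist.
has_edge_feature = hasattr(graph_tensor, 'edge_feature')
```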
**molgraph.models**

- Saliency and gradient activation models are now implemented as `tf.Module`s instead of `keras.Model`s. They no longer have a `predict()` method and should instead be called directly via `__call__()`. If batching is desired, loop over a `tf.data.Dataset` manually and within the loop pass the graph tensor instance (and optionally the label) as `__call__(x, y)`, as sketched below.
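Spelled out, the described pattern, assuming `saliency` is one of these modules and `dataset` yields `(graph_tensor, label)` pairs:

```python
maps = []
for x, y in dataset:              # manual batching via tf.data.Dataset
    maps.append(saliency(x, y))   # call directly; there is no predict() anymore
```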
**molgraph.chemistry**

- Removed deprecated chemistry featurizers and tokenizers (e.g. `chemistry.AtomicFeaturizer`, `chemistry.AtomicTokenizer`, etc.). Simply use `chemistry.Featurizer` or `chemistry.Tokenizer` instead (for both atoms and bonds).
**molgraph.layers**

- Allow derived GNN layers (inheriting from `GNNLayer`) to optionally pass `update_step` to override the default update step (`_DefaultUpdateStep` in `gnn_layer.py`). The custom `update_step` should be a `keras.layers.Layer` which takes as input both the updated node (or edge) features ("inputs") as well as the previous node (or edge) features ("states"/residuals). One example of a GNN layer which supplies a custom `update_step` (`_FeedForwardNetwork`) is `molgraph.layers.GTConv`. A sketch follows below.
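A hedged sketch of a custom `update_step`, assuming it can be passed as a keyword argument to a derived GNN layer:

```python
from tensorflow import keras
from molgraph import layers

class ResidualUpdate(keras.layers.Layer):
    """Combine updated features ('inputs') with previous features ('states')."""
    def call(self, inputs, states):
        return inputs + states

# Assumed plumbing; the exact argument handling may differ.
conv = layers.GINConv(128, update_step=ResidualUpdate())
```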
**molgraph.tensors**

- Added `propagate()` method to `GraphTensor`, which propagates node features within the graph. Most built-in GNN layers now utilize this method to propagate node features. See the sketch below.
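A sketch, with `graph_tensor` as a placeholder:

```python
# One propagation step: aggregate neighboring node features into each node.
graph_tensor = graph_tensor.propagate()
```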
**tests**

- Add more extensive/systematic unit tests.
**molgraph.tensors**

- `update()` method of `GraphTensor` now accepts keyword arguments. E.g. `graph_tensor.update(node_feature=node_feature_updated)` is valid.
- Added `is_ragged()` method to `GraphTensor`, which checks whether nested data is ragged, i.e., whether the graph tensor instance is in a ragged state.
- Added several properties to `GraphTensor` and `GraphTensorSpec`: `node_feature`, `edge_src`, `edge_dst`, `edge_feature`, `graph_indicator`. Previously these properties were accessed via `__getattr__`; now they are not. Conveniently, if they do not exist (e.g. `edge_feature` is non-existent in the graph tensor instance), `None` is returned. Note: new data components added by the user (e.g. `node_feature_updated`) can still be accessed as attributes. A sketch follows below.
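Taken together, a sketch (with `graph_tensor` and `node_feature_updated` as placeholders; `merge()` is assumed here to be the inverse of `separate()`):

```python
# Keyword-argument update:
graph_tensor = graph_tensor.update(node_feature=node_feature_updated)

# Ragged-state check:
if graph_tensor.is_ragged():
    graph_tensor = graph_tensor.merge()   # assumed inverse of .separate()

# Properties return None when absent, instead of raising:
if graph_tensor.edge_feature is None:
    print('no edge features')
```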
**Misc**

- Cleaned up code (mainly for the GNN layers) and modified/improved docstrings.
**molgraph.chemistry**

- `features.GasteigerCharge()` (and possibly other features) no longer gives `None`, `nan` or `inf` values.
**molgraph.models**

- Saliency and gradient activation mappings now work with the `tf.saved_model` API.
- Saliency and gradient activation mappings now work well with both ragged and non-ragged `GraphTensor`, as well as an optional label (for multi-label and multi-class classification). Note that these modules automatically set an `input_signature` for `__call__` upon first call.