Es item #111

Merged
merged 58 commits into from
Feb 3, 2020
Changes from 55 commits
2a811f1
Removed unused interface
carlvitzthum Oct 21, 2019
856d510
Added datastore param to PickStorage methods; move PickStorage to sto…
carlvitzthum Oct 21, 2019
684c752
Added datastore parameters to connection.py
carlvitzthum Oct 21, 2019
d5d9a56
initial test
carlvitzthum Oct 21, 2019
874832d
Fixed a couple imports, improvd test_pick_storage
carlvitzthum Oct 21, 2019
af3deff
request.datastore moved to storage.py. Misc fixes
carlvitzthum Oct 21, 2019
69fb912
Disable some annoying loggers, improve PickStorage, couple test-relat…
carlvitzthum Oct 21, 2019
2746e15
Confirm self.read has value in PickStorage.storage
carlvitzthum Oct 22, 2019
f90ea44
small test fix
carlvitzthum Oct 22, 2019
22da12c
Revised register_storage function to better handle existing PickStorage
carlvitzthum Oct 22, 2019
d900bf8
Use new register storage with esstorage and mpindexer
carlvitzthum Oct 22, 2019
ad4938d
test changes
carlvitzthum Oct 22, 2019
4b3e665
Test fix
carlvitzthum Oct 22, 2019
3d6bc0e
Storage reconfiguration and changes for ES-based items
carlvitzthum Oct 31, 2019
1b7c93a
Resolve merge conflicts with snovault v1.3.2 and refactor a bit of na…
carlvitzthum Oct 31, 2019
e03eb7c
Merge branch 'es_item' of https://github.com/4dn-dcic/snovault into e…
carlvitzthum Oct 31, 2019
6726132
Fix for get_by_uuid direct, add TestingLinkTargetElasticSearch
carlvitzthum Oct 31, 2019
46dd00b
test_create_es_item_without_es
carlvitzthum Oct 31, 2019
1651282
A couple more misc test fixes
carlvitzthum Oct 31, 2019
2e1dc8c
Fix to PickStorage.find_uuids_linked_to_item
carlvitzthum Nov 1, 2019
4717d1e
Fix collection name
carlvitzthum Nov 1, 2019
55604b5
One more small fix
carlvitzthum Nov 1, 2019
c8fa44e
Messy, but got something working. Cleanup is needed, especially for r…
carlvitzthum Nov 7, 2019
680516f
Refactoring, simplifying, fixing tests
carlvitzthum Nov 7, 2019
8832f7d
Fully remove linkFrom
carlvitzthum Nov 7, 2019
e69a02e
Test embedding with TestingLinkTargetElasticSearch
carlvitzthum Nov 12, 2019
00aab68
Misc cleanup
carlvitzthum Nov 13, 2019
d43b3a7
small test fix
carlvitzthum Nov 14, 2019
71bca4f
Polishing crud_views and connection, added agg_items to ES item tests
carlvitzthum Nov 19, 2019
968aab5
Doc changes to cached_views.py
carlvitzthum Nov 19, 2019
be1628c
doc updates for esstorage.py
carlvitzthum Nov 19, 2019
816d6ee
Slight refactor to mpindexer
carlvitzthum Nov 19, 2019
f3607df
Final doc refactors
carlvitzthum Nov 19, 2019
3294856
Resolve merge
carlvitzthum Nov 19, 2019
18ea498
Small fix for indexing-info when item is not yet indexed
carlvitzthum Nov 20, 2019
ac44fc2
Slight refactor of purge_uuid to remove from ES before DB
carlvitzthum Nov 20, 2019
097c6f7
Refactored docs a bit and only include updated ones
carlvitzthum Nov 21, 2019
7ed4990
Some progress on docs
carlvitzthum Nov 21, 2019
dada23c
Filled out storage overview doc
carlvitzthum Dec 3, 2019
16583f1
Small doc-related changes
carlvitzthum Dec 3, 2019
6374b9b
Added some placeholder docs and made rst formatting consistent
carlvitzthum Dec 3, 2019
397e17b
Fix merge conflict in esstorage.py
carlvitzthum Dec 3, 2019
5ce08f3
Correctly format inline code
carlvitzthum Dec 3, 2019
cb780c3
Change ES item designation to AbstractCollection.properties_datastore
carlvitzthum Dec 6, 2019
5b02900
Fixes for links/uuids for ES items, as well as adjustment to properti…
carlvitzthum Dec 6, 2019
a40bc2e
Check request.datastore first in PickStorage.storage; adjustments for…
carlvitzthum Dec 6, 2019
b692fe8
Doc changes for properties_datastore
carlvitzthum Dec 6, 2019
11fbbd5
Test and version updates
carlvitzthum Dec 6, 2019
1994b3a
Small fixes and refactors related to default properties_datastore=dat…
carlvitzthum Dec 6, 2019
bdc0b11
Addressed a couple of Will's PR comments
carlvitzthum Dec 6, 2019
540f067
Refactor TestingLinkTargetElasticSearch tests
carlvitzthum Dec 6, 2019
919c3f1
Handle ES-based collections in create mapping
carlvitzthum Dec 6, 2019
a654902
Use new Collcection.default_properties_datastore for uuid cache inval…
carlvitzthum Dec 11, 2019
e3c46c3
Resolve merge
carlvitzthum Jan 16, 2020
a46ebd6
More docs
carlvitzthum Jan 16, 2020
a8e2449
small review changes
willronchetti Feb 3, 2020
1a575cb
Merge branch 'master' into es_item
willronchetti Feb 3, 2020
dc1f627
fix import
willronchetti Feb 3, 2020
11 changes: 10 additions & 1 deletion docs/source/conf.py
@@ -49,4 +49,13 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ['_static']

# can add a logo on sidebar with:
# html_logo = docs/source/img/...

# Read the Docs configuration.
# See: https://sphinx-rtd-theme.readthedocs.io/en/stable/configuring.html
html_theme_options = {
'navigation_depth': 2
}
17 changes: 17 additions & 0 deletions docs/source/es_indexing.rst
@@ -0,0 +1,17 @@
Elasticsearch Indexing
======================

**Work in progress!**

Indexing is the process of building a complete document that contains multiple views of an item, then putting that document into Elasticsearch (ES). This is done whenever an item is created or changed, and acts as one of the backbones of Snovault, allowing searching of data and quick reading of complex views for items that are "cached" by using ES as a read storage.

.. image:: img/indexing.png

Figure 1: Diagram of the indexing process.

Code
-----------------
* `indexer.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/elasticsearch/indexer.py>`_: index endpoint and initialization, Indexer class
* `mpindexer.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/elasticsearch/mpindexer.py>`_: MPIndexer class and helper functions
* `indexer_queue.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/elasticsearch/indexer_queue.py>`_: QueueManager and endpoints for queueing and checking indexing
* `indexing_views.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/indexing_views.py>`_: index-data view and some other related endpoints
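The indexing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ``InMemoryIndex``, ``build_document`` and ``index_item`` are hypothetical names, the view names are illustrative, and a dictionary stands in for Elasticsearch. It is not the Snovault ``Indexer`` API.

```python
import json


class InMemoryIndex:
    """Hypothetical stand-in for an Elasticsearch index (not the Snovault API)."""

    def __init__(self):
        self._docs = {}

    def put(self, uuid, document):
        # Store a serialized copy, roughly as a document store would.
        self._docs[uuid] = json.dumps(document, sort_keys=True)

    def get(self, uuid):
        return json.loads(self._docs[uuid])


def build_document(item):
    """Assemble several views of one item into a single document.

    The view names used here are illustrative assumptions.
    """
    props = item['properties']
    return {
        'uuid': item['uuid'],
        'item_type': item['item_type'],
        'properties': props,                       # raw stored properties
        'object': dict(props, uuid=item['uuid']),  # properties plus identifier
    }


def index_item(index, item):
    """Build the full document, then put it into the index."""
    document = build_document(item)
    index.put(item['uuid'], document)
    return document
```

In this sketch, ``index_item`` would be called whenever an item is created or changed; reads of the "cached" views then go through ``InMemoryIndex.get`` instead of rebuilding them.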
File renamed without changes.
Binary file added docs/source/img/connection_storage.png
Binary file added docs/source/img/indexing.png
Binary file added docs/source/img/traversal.png
80 changes: 15 additions & 65 deletions docs/source/index.rst
@@ -1,75 +1,25 @@
Snovault Documentation
========================
Snovault
========================

Snovault is a JSON-LD Database Framework that serves as the backend for the 4DN Data portal and CGAP.

|Build status|_

.. |Build status| image:: https://travis-ci.org/4dn-dcic/snovault.svg?branch=master
.. _Build status: https://travis-ci.org/4dn-dcic/snovault

Installation Instructions
=========================

Currently these are for Mac OSX using homebrew. If using linux, install dependencies with a different package manager.

Step 0: Install Xcode (from App Store) and homebrew: http://brew.sh::

Step 1: Verify that homebrew is working properly::

$ sudo brew doctor


Step 2: Install or update dependencies::

$ brew install libevent libmagic libxml2 libxslt openssl postgresql graphviz python3
$ brew install freetype libjpeg libtiff littlecms webp # Required by Pillow
$ brew tap homebrew/versions
$ brew install elasticsearch@5.6

If you need to update dependencies::

$ brew update
$ brew upgrade

Step 3: Run buildout::

$ python3 bootstrap.py --buildout-version 2.9.5 --setuptools-version 36.6.0
$ bin/buildout

NOTE:
If you have issues with postgres or the python interface to it (psycogpg2) you probably need to install postgresql
via homebrew (as above)
If you have issues with Pillow you may need to install new xcode command line tools:
- First update Xcode from AppStore (reboot)
$ xcode-select --install
If you are running macOS Mojave, you may need to run the below command as well:
$ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /



If you wish to completely rebuild the application, or have updated dependencies:
$ make clean

Then goto Step 3.


Running tests
=============

To run specific tests locally::

$ bin/test -k test_name

To run with a debugger::

$ bin/test --pdb

Specific tests to run locally for schema changes::
Snovault is a JSON-LD Database Framework that serves as the backend for the `4DN Data portal <https://github.com/4dn-dcic/fourfront>`_ and `CGAP <https://github.com/dbmi-bgm/cgap-portal>`_. It is a very divergent fork of the work of the same name written by the ENCODE team at Stanford University. `See here <https://github.com/ENCODE-DCC/snovault>`_ for the original version.

$ bin/test -k test_load_workbook
Since Snovault is used for multiple deployments across a couple of projects, we use `GitHub releases <https://github.com/4dn-dcic/snovault/releases>`_ to version it. This page also acts as a changelog.

Run the Pyramid tests with::
To get started, read the following documentation on setting up and developing Snovault:

$ bin/test
.. toctree::
:titlesonly:
local_installation
testing
resources
storage_overview
traversal
resource_views
es_mapping
es_indexing
snowflakes
45 changes: 45 additions & 0 deletions docs/source/local_installation.rst
@@ -0,0 +1,45 @@
Local Installation
Collaborator comment:

I bet most of these instructions would be better if we borrowed the instructions I just created for Fourfront. (We could do that in a separate PR sometime. Not a blocker here, though.)

==================

Currently these are for macOS using homebrew. If using linux, install dependencies with a different package manager.

Snovault is known to work with Python 3.6.x and will not work with Python 3.7 or greater. If part of the HMS team, it is recommended to use Python 3.4.3, since that's what is running on our servers. A good tool to manage multiple python versions is `pyenv <https://github.com/pyenv/pyenv>`_. It is best practice to create a fresh Python virtualenv using one of these versions before proceeding to the following steps.
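
The version constraint above could be checked before installing with a small guard like the following. This is a sketch, not part of Snovault: ``is_supported_python`` is a hypothetical helper, and it assumes that any Python 3 release below 3.7 is acceptable, which is broader than the two versions the text names.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 3 releases below 3.7.

    Assumption: the text only names 3.6.x and 3.4.3 explicitly; other
    3.x releases below 3.7 are accepted here for simplicity.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor < 7
```

Calling ``is_supported_python()`` with no argument checks the running interpreter, so a setup script could exit early with a clear message when it returns ``False``.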

Step 0: Obtain AWS keys. These will need to be added to your environment variables or through the AWS CLI (installed later in this process).

Step 1: Verify that homebrew is working properly::

$ sudo brew doctor
Collaborator comment:

I did

~$ sudo brew doctor
Error: Running Homebrew as root is extremely dangerous and no longer supported.
As Homebrew does not drop privileges on installation you would be giving all
build scripts full access to your system.
~$ brew doctor
Your system is ready to brew.

I think we should change this line to say just

$ brew doctor



Step 2: Install or update dependencies::

$ brew install libevent libmagic libxml2 libxslt openssl postgresql graphviz
$ brew install freetype libjpeg libtiff littlecms webp # Required by Pillow
$ brew tap homebrew/versions
Collaborator comment:

~$ brew tap homebrew/versions
Error: homebrew/versions was deprecated. This tap is now empty as all its formulae were migrated.

I recommend that we just remove this line. I don't actually think it's needed.

$ brew install elasticsearch@5.6

If you need to update dependencies::

$ brew update
$ brew upgrade

Step 3: Run buildout::

$ python3 bootstrap.py --buildout-version 2.9.5 --setuptools-version 36.6.0
$ bin/buildout

NOTE:
If you have issues with postgres or the python interface to it (psycopg2) you probably need to install postgresql
via homebrew (as above)
If you have issues with Pillow you may need to install new xcode command line tools:
- First update Xcode from AppStore (reboot)
$ xcode-select --install
If you are running macOS Mojave, you may need to run the below command as well:
$ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /


If you wish to completely rebuild the application, or have updated dependencies:
$ make clean

Then go to Step 3.
10 changes: 10 additions & 0 deletions docs/source/resource_views.rst
@@ -0,0 +1,10 @@
Resource Views
===========================

**Work in progress!**

This document outlines the different base resource views and their sources. May be worth first reading the `traversal <https://snovault.readthedocs.io/en/latest/traversal.html>`_ and `storage <https://snovault.readthedocs.io/en/latest/storage_overview.html>`_ documentation.

**TODO: outline each resource view with context=Item.**

**TODO: Include relationship to storage and traversal (context and embed.py)**
12 changes: 12 additions & 0 deletions docs/source/resources.rst
@@ -0,0 +1,12 @@
Resources
===========================

**Work in progress!**

This document outlines different classes that compose a base Snovault item. Code is located in the following files:

- `resources.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/resources.py>`_: Root, AbstractCollection, Collection, Item classes
- `typeinfo.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/typeinfo.py>`_: AbstractTypeInfo, TypeInfo, TypesTool
- `config.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/config.py>`_: CollectionsTool, collection and abstract_collection decorators

**TODO: outline the role of each resource class. Include a complete example**
4 changes: 0 additions & 4 deletions docs/source/search_info.rst

This file was deleted.

9 changes: 4 additions & 5 deletions docs/source/snowflakes.rst
@@ -1,21 +1,20 @@
================
Snowflakes
================

General
^^^^^^^^
-----------------

Snowflakes used to be the front-end component of Snovault meant to serve as a demo. Since we at 4DN have our own Snovault-backed application (Fourfront, CGAP), snowflakes has been entirely removed from our version of Snovault. It is still present in ENCODE's version which you can find `here <https://github.com/ENCODE-DCC/snovault>`_ .

Removing Snowflakes from Snovault proved more challenging than one may expect. Some parts of snowflakes were actually required for snovault to run, such as ``root.py``. These files have all been migrated into Snovault.

Testing
^^^^^^^^
-----------------

In addition, several relevant tests that lived in Snowflakes have been migrated into Snovault. These tests include only those that are specific to Snovault and are not covered in existing Fourfront/CGAP testing. Properly configuring the tests proved challenging, as the previous test framework intertwined Snowflakes and Snovault in such a way that Snovault tests could not function without the presence of Snowflakes.

To fix this, several aspects of the tests have been refactored. We now load test schemas from files and have migrated many of the relevant fixtures from Snowflakes. ``config.py`` also required changes to account for behavior Snovault expected that it inherited from Snowflakes due to how includes work in PyTest.
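
Loading test schemas from files, as mentioned above, might look roughly like this. The helper name ``load_test_schemas`` and the one-JSON-file-per-schema layout are assumptions made for illustration; the actual Snovault fixtures may be organized differently.

```python
import json
from pathlib import Path


def load_test_schemas(schema_dir):
    """Map each '*.json' file name (without extension) to its parsed schema.

    Hypothetical helper: assumes one JSON schema per file in schema_dir.
    """
    schemas = {}
    for path in sorted(Path(schema_dir).glob('*.json')):
        with path.open() as handle:
            schemas[path.stem] = json.load(handle)
    return schemas
```

A test fixture could call this once per session and hand the resulting dictionary to the tests that need it.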

Test coverage for Snovault should still be fairly strong, especially when combined with that of Fourfront/CGAP. Some indexing tests are marked as flaky as we've found they experience intermittent failures. Updating how we clear the SQS queue has also helped to remidy this issue.
Test coverage for Snovault should still be fairly strong, especially when combined with that of Fourfront/CGAP. Some indexing tests are marked as flaky as we've found they experience intermittent failures. Updating how we clear the SQS queue has also helped to remedy this issue.

One issue of note that was not solved involved a particular logging related test that appears to pass on local and fail on Travis. The associated test is ``test_indexing_logging``. This tests makes a index post on the application and checks to see that a correct log message was emitted. The log message itself is emitted but for some reason on Travis it is truncated. Even spinning up Travis on an identical container could not reproduce the issue. The relevant line is marked in the test file.
One issue of note that was not solved involves a particular logging-related test that appears to pass locally and fail on Travis. The associated test is ``test_indexing_logging``. This test makes an index POST to the application and checks that the correct log message was emitted. The log message itself is emitted, but for some reason it is truncated on Travis. Even spinning up Travis on an identical container could not reproduce the issue. The relevant line is marked in the test file.