Google Cloud Datastore is a great NoSQL solution (hosted, scalable, free up to a point), but it can be tricky (i.e. there's lots of code glue needed) to get even the "Hello World" of data persistence up and running in PHP.
This library is intended to make it easier for you to get started with and to use Datastore in your applications.
Please note, this 2.0-series (dev-master) is in BETA and not yet suitable for production.
The documentation is not yet fully representative of the 2.x implementation.
- Examples
- New in 2.0
- Getting Started
- Defining Your Model
- Creating Records
- Queries, GQL & The Default Query
- Multi-tenant Applications & Data Namespaces
- Entity Groups, Hierarchy & Ancestors
- Transactions
- More About Google Cloud Datastore
- Unit Tests
- Footnotes
I find examples a great way to decide if I want to even try out a library, so here's a couple for you.
// Build a new entity
$obj_book = new GDS\Entity();
$obj_book->title = 'Romeo and Juliet';
$obj_book->author = 'William Shakespeare';
$obj_book->isbn = '1840224339';
// Write it to Datastore
$obj_store = new GDS\Store('Book');
$obj_store->upsert($obj_book);
See below for Alternative Array Syntax for creating Entities.
Now let's fetch all the Books from the Datastore and display their titles and ISBNs.
$obj_store = new GDS\Store('Book');
foreach($obj_store->fetchAll() as $obj_book) {
    echo "Title: {$obj_book->title}, ISBN: {$obj_book->isbn} <br />", PHP_EOL;
}
These initial examples assume you are either running a Google AppEngine application or in a local AppEngine dev environment. In both of these cases, we can auto detect the dataset and use the default Protocol Buffer Gateway (new in 2.0).
We use a GDS\Store to read and write GDS\Entity objects to and from Datastore.
These examples use the generic GDS\Entity class with a dynamic Schema. See Defining Your Model below for more details on custom Schemas and indexed fields.
Check out the examples folder for many more and fuller code samples.
A little more configuration is required if you want or need to use the JSON API instead of Protocol Buffers.
The Store needs a GDS\Gateway to talk to Google and the gateway needs a Google_Client for authentication.
$obj_client = GDS\Gateway\GoogleAPIClient::createGoogleClient(APP_NAME, ACCOUNT_NAME, KEY_FILE);
$obj_gateway = new GDS\Gateway\GoogleAPIClient($obj_client, DATASET_ID);
$obj_book_store = new GDS\Store('Book', $obj_gateway);
A simple guest book application
Application: http://php-gds-demo.appspot.com/
Code: https://github.com/tomwalder/php-gds-demo
New features in 2.0 include
- Faster! Google Protocol Buffer allows faster, low-level access to Datastore
- Easier to use - sensible defaults and auto-detection for AppEngine environments
- Fewer dependencies - no need for the Google PHP API Client, unless running remote or from non-AppEngine environments
- Suite of unit tests
- Optional drop-in JSON API Gateway for remote or non-AppEngine environments (this was the only Gateway in 1.x)
The library is almost fully backwards compatible. In fact, the main operations of the GDS\Store class are identical.
There is one BC-break in 2.0 - the re-ordering of construction parameters for the GDS\Store class.
GDS\Store::__construct(<Kind or Schema>, <Gateway>)
instead of
GDS\Store::__construct(<Gateway>, <Kind or Schema>)
This is because the Gateway is now optional, and has a sensible, automated default - the new Protocol Buffer implementation.
Are you sitting comfortably? Before we begin, you will need:
- a Google Account (doh), usually for running AppEngine - but not always
- a Project to work on with the "Google Cloud Datastore API" turned ON in the Google Developer Console
If you want to use the JSON API from remote or non-App Engine environments, you will also need
- a "Service account" and either
- (recommended, simpler) the JSON service key file, downloadable from the Developer Console
- or a P12 key file for that service account (see Service Accounts), along with the service account name
To install using Composer, use this require line, for production
"tomwalder/php-gds": "v1.2.1"
and for bleeding-edge features, 2.0 dev-master
"tomwalder/php-gds": "dev-master"
Because Datastore is schemaless, the library also supports fields/properties that are not explicitly defined. But it often makes a lot of sense to define your Entity Schema up front.
Here is how we might build the Schema for our examples, with a Datastore Entity Kind of "Book" and 3 fields.
$obj_schema = (new GDS\Schema('Book'))
->addString('title')
->addString('author')
->addString('isbn');
// The Store accepts a Schema object or Kind name as its first parameter
$obj_book_store = new GDS\Store($obj_schema);
By default, all fields are indexed. An indexed field can be used in a WHERE clause. You can explicitly configure a field to be not indexed by passing in FALSE as the second parameter to addString().
If you use a dynamic schema (i.e. do not define one, but just use the Entity name) then all fields will be indexed for that record.
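As a minimal sketch of the non-indexed option described above, the Schema from our example could keep title and isbn indexed while leaving author out of the indexes. It assumes the library was installed with Composer (hence the vendor/autoload.php path); the choice of author as the unindexed field is purely illustrative.

```php
<?php
// Assumption: php-gds was installed with Composer, so classes autoload from vendor/
require 'vendor/autoload.php';

$obj_schema = (new GDS\Schema('Book'))
    ->addString('title')          // indexed by default, usable in a WHERE clause
    ->addString('author', FALSE)  // explicitly not indexed, so not usable in a WHERE clause
    ->addString('isbn');          // indexed by default

$obj_book_store = new GDS\Store($obj_schema);
```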
Available Schema configuration methods:
GDS\Schema::addString
GDS\Schema::addInteger
GDS\Schema::addDatetime
GDS\Schema::addFloat
GDS\Schema::addBoolean
GDS\Schema::addStringList
Take a look at the examples folder for a fully operational set of code.
There is an alternative to directly constructing a new GDS\Entity and setting its member data, which is to use the GDS\Store::createEntity factory method as follows.
$obj_book = $obj_book_store->createEntity([
'title' => 'The Merchant of Venice',
'author' => 'William Shakespeare',
'isbn' => '1840224312'
]);
Support for DateTime object binding was added recently (also see query parameter binding below)
$obj_book = $obj_book_store->createEntity([
'title' => 'Some Book',
'author' => 'A N Other Guy',
'isbn' => '1840224313',
'published' => new DateTime('-5 years')
]);
At the time of writing, the GDS\Store object uses Datastore GQL as its query language. Here is an example:
$obj_book_store->fetchOne("SELECT * FROM Book WHERE isbn = '1853260304'");
And with support for named parameter binding (strings, integers)
$obj_book_store->fetchOne("SELECT * FROM Book WHERE isbn = @isbnNumber", [
'isbnNumber' => '1853260304'
]);
Support for DateTime object binding
$obj_book_store->fetchOne("SELECT * FROM Book WHERE published < @now", [
'now' => new DateTime()
]);
We provide a couple of helper methods for some common (root Entity) queries, single and batch (much more efficient than many individual fetch calls):
- GDS\Store::fetchById
- GDS\Store::fetchByIds - batch fetching
- GDS\Store::fetchByName
- GDS\Store::fetchByNames - batch fetching
When you instantiate a store object, like BookStore in our example, it comes pre-loaded with a default GQL query of the following form (this is "The Default Query"):
SELECT * FROM <Kind> ORDER BY __key__ ASC
Which means you can quickly and easily get one or many records without needing to write any GQL, like this:
$obj_store->fetchOne(); // Gets the first book
$obj_store->fetchAll(); // Gets all books
$obj_store->fetchPage(10); // Gets the first 10 books
When working with larger data sets, it can be useful to page through results in smaller batches. Here's an example paging through all Books in batches of 50.
$obj_book_store->query('SELECT * FROM Book');
while($arr_page = $obj_book_store->fetchPage(50)) {
    echo "Page contains ", count($arr_page), " records", PHP_EOL;
}
In a standard SQL environment, the above pagination would look something like this:
- SELECT * FROM Book LIMIT 0, 50 for the first page
- SELECT * FROM Book LIMIT 50, 50 for the second, and so on.
Although you can use a very similar syntax with Datastore GQL, it can be unnecessarily costly. This is because each row scanned when running a query is charged for. So, doing the equivalent of LIMIT 5000, 50 will count as 5,050 reads - not just the 50 we actually get back.
This is all fixed by using Cursors. The implementation is all encapsulated within the GDS\Gateway class so you don't need to worry about it.
Bottom line: the built-in pagination uses Cursors whenever possible for the fastest and cheapest results.
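To put rough numbers on that difference, here is a small arithmetic sketch. It makes no Datastore calls; it only applies the charging rule described above (every scanned row counts, including rows skipped by an offset). The function names are hypothetical and not part of this library.

```php
<?php
// Illustrative arithmetic only: no Datastore calls are made here.
// Assumption (from the text above): a query is charged for every row it scans,
// including the rows skipped by an offset.

/** Rows charged for one page fetched with an offset-style LIMIT. */
function offsetPageReads($int_offset, $int_page_size)
{
    return $int_offset + $int_page_size;
}

/** Rows charged to walk a whole result set page by page using offsets. */
function totalReadsWithOffsets($int_total_rows, $int_page_size)
{
    $int_reads = 0;
    for ($int_offset = 0; $int_offset < $int_total_rows; $int_offset += $int_page_size) {
        // The last page may be shorter than a full page
        $int_rows_returned = min($int_page_size, $int_total_rows - $int_offset);
        $int_reads += $int_offset + $int_rows_returned;
    }
    return $int_reads;
}

/** Rows charged to walk the same result set with cursors: each row is scanned once. */
function totalReadsWithCursors($int_total_rows)
{
    return $int_total_rows;
}

echo offsetPageReads(5000, 50), PHP_EOL;        // 5050
echo totalReadsWithOffsets(5050, 50), PHP_EOL;  // 257550
echo totalReadsWithCursors(5050), PHP_EOL;      // 5050
```

Under that charging rule, walking 5,050 Books in pages of 50 with offsets is charged for roughly fifty times as many reads as doing the same walk with cursors.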
Do not supply a LIMIT clause when calling
- GDS\Store::fetchOne - it's done for you (we add LIMIT 1)
- GDS\Store::fetchPage - again, it's done for you and it will cause a conflict.
Google Datastore supports segregating data within a single "Dataset" using something called Namespaces.
Generally, this is intended for multi-tenant applications where each customer would have separate data, even within the same "Kind".
This library supports namespaces, and they are configured per Gateway instance by passing in the optional 3rd namespace parameter.
ALL operations carried out through a Gateway with a namespace configured are done in the context of that namespace. The namespace is automatically applied to Keys when doing upsert/delete/fetch-by-key and to Requests when running GQL queries.
// Create a store for a particular customer or 'application namespace'
$obj_client = GDS\Gateway\GoogleAPIClient::createGoogleClient(APP_NAME, ACCOUNT_NAME, KEY_FILE);
$obj_namespaced_gateway = new GDS\Gateway\GoogleAPIClient($obj_client, DATASET_ID, 'customer-namespace');
$obj_namespaced_book_store = new GDS\Store('Book', $obj_namespaced_gateway);
Further examples are included in the examples folder.
Google Datastore allows for (and encourages) Entities to be organised in a hierarchy.
The hierarchy allows for some amount of "relational" data. e.g. a ForumThread entity might have one or more ForumPost entities as children.
Entity groups are quite an advanced topic, but can positively affect your application in a number of areas including
- Transactional integrity
- Strongly consistent data
At the time of writing, I support working with entity groups through the following methods
GDS\Entity::setAncestry
GDS\Entity::getAncestry
GDS\Store::fetchEntityGroup
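As a rough sketch of how these methods might fit together for the ForumThread / ForumPost case, consider the helper below. The helper name is hypothetical, and the exact signatures are assumptions on my part: that setAncestry() accepts the parent entity, and that fetchEntityGroup() takes the ancestor entity and returns its descendants of the store's Kind. Check the examples folder before relying on it.

```php
<?php
/**
 * Hypothetical helper: store a post as a child of an already-saved thread,
 * then read back the thread's entity group.
 *
 * Assumptions: $obj_post_store is a GDS\Store for the ForumPost Kind,
 * $obj_thread and $obj_post are GDS\Entity objects, setAncestry() accepts
 * the parent entity, and fetchEntityGroup() takes the ancestor entity.
 */
function addPostToThread($obj_post_store, $obj_thread, $obj_post)
{
    $obj_post->setAncestry($obj_thread);   // make the thread the post's parent
    $obj_post_store->upsert($obj_post);    // write the child entity
    return $obj_post_store->fetchEntityGroup($obj_thread);
}
```

The parent is passed in as an already-saved entity because a child's ancestry refers to the parent's key, which only exists once the parent has been written.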
The GDS\Store supports running updates and deletes in transactions.
To start a transaction
$obj_store->beginTransaction();
Then, any operation that changes data will commit and consume the transaction. So an immediate call to another operation WILL NOT BE TRANSACTIONAL.
// Data changed within a transaction
$obj_store->upsert($obj_entity);
// Not transactional
$obj_store->delete($obj_entity);
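If both writes need to be transactional, one reading of the rule above is to begin a fresh transaction before each of them. The sketch below does that; the helper name is hypothetical, and note that this gives two separate transactions rather than one atomic unit.

```php
<?php
/**
 * Hypothetical helper: run an upsert and a delete, each inside its own transaction.
 * $obj_store is expected to be a GDS\Store. These are two separate transactions,
 * so one can succeed while the other fails.
 */
function upsertThenDeleteTransactionally($obj_store, $obj_to_save, $obj_to_delete)
{
    $obj_store->beginTransaction();
    $obj_store->upsert($obj_to_save);    // commits and consumes the first transaction

    $obj_store->beginTransaction();      // start a fresh one for the next write
    $obj_store->delete($obj_to_delete);  // commits and consumes the second transaction
}
```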
Whilst you can use the GDS\Entity and GDS\Store classes directly, as per the examples above, you may find it useful to extend one or the other.
For example
class Book extends GDS\Entity { /* ... */ }
$obj_store->setEntityClass('\\Book');
This way, when you pull objects out of Datastore, they are objects of your defined Entity class.
The Schema holds the custom entity class name - this can be set directly, or via the Store object.
When you change a field from non-indexed to indexed you will need to "re-index" all your existing entities before they will be returned in queries run against that index by Datastore. This is due to the way Google update their BigTable indexes.
I've included a simple example (paginated) re-index script in the examples folder, reindex.php.
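For orientation, a paginated re-index can be as small as the sketch below. This is not the bundled reindex.php, just an illustration built from the query(), fetchPage() and upsert() calls shown earlier. The helper name is hypothetical, and it assumes the Store was created with the updated Schema (the field now marked as indexed).

```php
<?php
/**
 * Hypothetical helper: re-save every entity of a Kind so that Datastore
 * writes index entries for newly indexed fields.
 *
 * Assumption: $obj_store is a GDS\Store built with the updated Schema.
 * Returns the number of entities re-saved.
 */
function reindexKind($obj_store, $str_kind, $int_page_size = 50)
{
    $int_count = 0;
    $obj_store->query("SELECT * FROM {$str_kind}");
    // fetchPage() returns an empty result once every page has been read
    while ($arr_page = $obj_store->fetchPage($int_page_size)) {
        foreach ($arr_page as $obj_entity) {
            $obj_store->upsert($obj_entity);  // re-writing the entity refreshes its indexes
            $int_count++;
        }
    }
    return $int_count;
}
```

A call such as reindexKind($obj_book_store, 'Book') would walk every Book in pages of 50.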
What Google say:
"Use a managed, NoSQL, schemaless database for storing non-relational data. Cloud Datastore automatically scales as you need it and supports transactions as well as robust, SQL-like queries."
https://cloud.google.com/datastore/
A few highlighted topics you might want to read up on
- Entities, Data Types etc.
- More information on GQL
- GQL Reference
- Indexes
- Ancestors
- More about Datastore Transactions
A full suite of unit tests is in the works.
I am certainly more familiar with SQL and relational data models so I think that may end up coming across in the code - rightly so or not!
Thanks to @sjlangley for any and all input - especially around unit tests for Protocol Buffers.
Whilst I am using this library in production, it is my hope that other people find it of use. Feedback appreciated.