Mitigate the concurrency risk #1
Replies: 6 comments
-
@GioCirque answered by email

I think an approach more aligned with https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html is possibly a better solution. Imagine that you write a version number with each write to a record. The first time the record is written, it has version 0. Each subsequent write would expect to increment the version by 1, using a DynamoDB condition ("WHERE"-like) clause to restrict the write operation to only cases where the stored version equals the one the writer last read. This pattern would also enable optional concurrency control: imagine having required concurrency from a user perspective and the ability to override that from an administrative perspective.

I think involving a FIFO queue unduly complicates the infrastructure and could significantly increase the operating costs, while introducing several complexities that would further expand the solution footprint: things like repeat failures and a dead-letter queue, which then call into question the manageability of those failovers. How/who/where does all that get managed? It's certainly an option, and you could even do it with this package if it's what's right for your implementation. You could have your… end-client write to the FIFO queue. With something like that, you're not tracking a job number but relying on the integrity of the queue, and you're giving yourself a 10 (messages) * 1000 (initial Lambda concurrency) throughput rate at ~10k ops per second.

In any case, you're absolutely welcome to make a PR. Please be sure that any functionality, like queues, is introduced as an option that can be turned on for a use case: additive changes instead of breaking changes.
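The versioned-write pattern described above can be sketched as a helper that builds the parameters for a conditional DynamoDB update. This is a hedged illustration, not the package's actual code: the table, key, and attribute names (`k`, `v`, `version`) are hypothetical, and the object shape follows the AWS SDK's `UpdateItem`/`UpdateCommand` input.

```javascript
// Build a conditional-write request implementing optimistic locking.
// currentVersion === null means the record has never been written.
function buildVersionedWrite(tableName, key, value, currentVersion) {
  if (currentVersion === null) {
    // First write: only succeed if the record does not exist yet,
    // and store version 0.
    return {
      TableName: tableName,
      Key: { k: key },
      UpdateExpression: 'SET v = :val, version = :next',
      ConditionExpression: 'attribute_not_exists(version)',
      ExpressionAttributeValues: { ':val': value, ':next': 0 },
    };
  }
  // Subsequent writes: only succeed if the stored version still matches
  // what this writer last read, then bump it by 1. A concurrent writer
  // that got there first makes the condition fail, so the losing write
  // is rejected instead of silently clobbering data.
  return {
    TableName: tableName,
    Key: { k: key },
    UpdateExpression: 'SET v = :val, version = :next',
    ConditionExpression: 'version = :cur',
    ExpressionAttributeValues: {
      ':val': value,
      ':next': currentVersion + 1,
      ':cur': currentVersion,
    },
  };
}
```

A failed condition surfaces as a `ConditionalCheckFailedException`, which the caller would catch to re-read and retry (or to surface a conflict to the user, for the "optional" variant).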
-
You are definitely right! I understand that keeping the moving parts low would be better for the project, and adding an external queue management service introduces other unwanted constraints I hadn't seen. Optimistic locking seems a better place to start tackling this issue; it looks a lot like the way PouchDB handles conflicts, by the way. However, would it be sufficient? From what I can tell, write operations to a leveldb update multiple documents: the document you actually update, plus global documents (metastore documents, for example, and others). Thus locking a single document may not be enough to prevent database corruption. Would DynamoDB transactions solve this? I'm willing to collaborate and submit a PR, but I am discovering the AWS SDK. I may give it a try with some help though!
-
DynamoDB transactions MIGHT be of some help. It really depends on which aspect of potential corruption you're focusing on; there are several ways that data, and databases, can become corrupt. I think optimistic concurrency management, as described, would be sufficient for managing most concurrency problems. Transactions are really about making multiple record changes in concert, and that would be a concern area to address: an area to validate so that it doesn't break concurrency.
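To illustrate how the two ideas could combine, here is a hedged sketch (hypothetical names, not this package's API) that groups a multi-document batch into a single `TransactWriteItems` input, where each item carries its own optimistic-locking condition. DynamoDB applies such a transaction atomically, all-or-nothing, subject to its transaction item limit.

```javascript
// Turn a batch of versioned writes into one TransactWriteItems input.
// writes: array of { key, value, currentVersion } (all assumed to exist already).
function buildTransactionalBatch(tableName, writes) {
  return {
    TransactItems: writes.map(({ key, value, currentVersion }) => ({
      Update: {
        TableName: tableName,
        Key: { k: key },
        UpdateExpression: 'SET v = :val, version = :next',
        // Every item keeps its own version check: if any document in the
        // batch was modified concurrently, the whole transaction is rejected.
        ConditionExpression: 'version = :cur',
        ExpressionAttributeValues: {
          ':val': value,
          ':next': currentVersion + 1,
          ':cur': currentVersion,
        },
      },
    })),
  };
}
```

This is the shape that would cover the "update the target document plus the global/metastore documents" case raised above: either every document in the batch advances a version, or none do.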
-
Hi, just to keep you informed: I successfully configured your dynamodb-leveldown inside a pouchdb adapter. Kudos for that compatibility! Unfortunately it broke at some point when I tried to pass the full pouchdb test suite with this setup. I couldn't make it work with dynamo-s3 either (it won't create the bucket and keeps posting the attachments into dynamo). You are definitely right about transactions vs. optimistic concurrency; in my opinion, transactions are the responsibility of pouchdb, not the underlying store. I won't go any further down this road for now, since it is really cheap and simple to bootstrap a couchdb inside AWS (single instance, or clustered with fargate, ECS, beanstalk, or whatever...). Regards
-
@jpbourgeon it's possibly even cheaper if you add some auto-scaling rules and use fargate spot instances.
-
Could you elaborate a little on that? I may not understand very well how to do it, and you seem to be knowledgeable on AWS architecture. Here is my (naïve?) personal analysis and the minimal costs associated with it. Do you validate it, or did I miss something? Is there a better way to go in your opinion?

As I see it for couchdb, the compute resource cannot be detached from the storage: couchdb has to run in an instance/container attached to a dedicated storage (let's call this a couchdb instance). This makes it unfit for a "serverless" setup and autoscaling, since compute and storage are tied together and there is a risk of data loss or corruption in case of concurrent writes into the same data storage.

A single couchdb instance of this kind (instance/container + attached volume) is perfect for prototyping, a regional deployment, and/or a starting project. Price starts roughly around 4$/mo (smallest EC2 - T4g, or container, for a month, plus a tiny 10GB EBS storage).

In a multi-instance setup, each independent couchdb instance can answer behind a load balancer (ELB with sticky sessions) and then replicate to its siblings in the background, bringing eventual consistency to the cluster, plus the advantages of load balancing of course (fault tolerance, zero-downtime updates, backup, etc.). Price starts roughly around 28$/mo for a cluster of 2 couchdb instances (20$/mo for the ELB, and 2x4$/mo for the compute and storage of the two nano instances, as seen before).

On top of that, minimal other costs are due: route53 for the domain if any (0.50$/mo), and the inbound data transfer (0.09$ per GB, which should stay minimal at the beginning). The SSL/TLS certificate is free. Let's round these additional costs to 1$/mo for the dev instance and 2$/mo for the production cluster.

This brings the development instance to 5$/mo and the production cluster to 30$/mo. The costs of the computing resources can be optimized with spot instances (up to 90% off) or reserved instances (up to 72% off), saving some credits in each scenario. However, spot instances may not be interesting in a production scenario, since they can be terminated at any time, thus incurring the risk of data loss in a mono-instance setup, or in a cluster that hasn't reached eventual consistency before the termination of an instance that holds unsynchronized data.
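As a quick sanity check of the arithmetic above (all figures are the thread's own rough estimates, not quoted AWS prices):

```javascript
// Monthly cost estimate in USD, using the assumptions from the analysis:
// 4 $/mo per instance (smallest EC2/container + 10GB EBS), 20 $/mo for the
// ELB, ~1 $/mo (dev) and ~2 $/mo (prod) for route53 + inbound transfer.
const instance = 4;
const elb = 20;
const extrasDev = 1;
const extrasProd = 2;

const devTotal = instance + extrasDev;              // single-instance setup
const prodTotal = elb + 2 * instance + extrasProd;  // 2-instance cluster behind ELB

console.log(devTotal, prodTotal); // 5 30
```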
-
conversation started by email
I thought of an AWS way of mitigating the risk of global-object corruption in the database due to concurrent write accesses. We could use an AWS SQS FIFO queue to build a stream of sequential writes to dynamodb, thus preventing any database corruption. This would make your component a viable storage for pouchdb, don't you agree?
In pseudo-code, the dynamodb-leveldown instance would:
- generate a unique job number and publish it to the queue
- subscribe to the queue, waiting for its turn to come
- when its turn comes, perform the write action
- once finished (whatever the result), delete its job number from the queue, freeing the db
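The publish step above could look something like this hedged sketch, which only builds the SQS `SendMessage` parameters (the queue URL and group id are hypothetical). On a FIFO queue, messages sharing one `MessageGroupId` are delivered strictly in order, which is what would serialize the writes.

```javascript
// Build the SQS FIFO SendMessage input for one write job.
// A single MessageGroupId means all jobs are processed one at a time, in order.
function buildWriteJob(queueUrl, jobNumber, payload) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({ jobNumber, payload }),
    MessageGroupId: 'dynamodb-writes',          // one group = strict ordering
    MessageDeduplicationId: String(jobNumber),  // FIFO queues require a dedup id
  };
}
```

Note that this single-group setup caps throughput at one in-flight write at a time, which is part of the cost/complexity trade-off discussed in the replies.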