Restoring Cassandra from snapshot getting an Error, and discrepancy in the node count after restoring the data #4724

Open
rg2609 opened this issue Nov 11, 2024 · 0 comments

We are running JanusGraph in Docker together with a two-node Cassandra cluster and a two-node Elasticsearch cluster. The Compose file is as follows:

version: "3"

services:
  janusgraph:
    image: local-janusgraph:latest
    container_name: jce-janusgraphdb
    environment:
      JANUS_PROPS_TEMPLATE: cql-es
      janusgraph.storage.hostname: jce-cassandra-1,jce-cassandra-2
      janusgraph.index.search.hostname: jce-elastic-1,jce-elastic-2
    ports:
      - "8182:8182"
    networks:
      - jce-network
    volumes:
      - janusgraph-data:/var/lib/janusgraph  # Mounts a volume to JanusGraph

  cassandra-1:
    image: cassandra:3
    container_name: jce-cassandra-1
    environment:
      CASSANDRA_SEEDS: "jce-cassandra-1,jce-cassandra-2"
      CASSANDRA_CLUSTER_NAME: "janusgraph-cluster"
    networks:
      - jce-network
    ports:
      - "9042:9042"
      - "9160:9160"
    volumes:
      - cassandra1-data:/var/lib/cassandra  # Mounts a volume to Cassandra

  cassandra-2:
    image: cassandra:3
    container_name: jce-cassandra-2
    environment:
      CASSANDRA_SEEDS: "jce-cassandra-1,jce-cassandra-2"
      CASSANDRA_CLUSTER_NAME: "janusgraph-cluster"
    networks:
      - jce-network
    volumes:
      - cassandra2-data:/var/lib/cassandra  # Mounts a volume to Cassandra

  elasticsearch-1:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.0
    container_name: jce-elastic-1
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "network.host=0.0.0.0"
      - "discovery.zen.ping.unicast.hosts=jce-elastic-1,jce-elastic-2"
    ports:
      - "9200:9200"
    networks:
      - jce-network
    volumes:
      - esdata1:/usr/share/elasticsearch/data  # Mounts a volume to Elasticsearch

  elasticsearch-2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.0
    container_name: jce-elastic-2
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "network.host=0.0.0.0"
      - "discovery.zen.ping.unicast.hosts=jce-elastic-1,jce-elastic-2"
    networks:
      - jce-network
    volumes:
      - esdata2:/usr/share/elasticsearch/data  # Mounts a volume to Elasticsearch

networks:
  jce-network:

volumes:
  janusgraph-data:
  cassandra1-data:
  cassandra2-data:
  esdata1:
  esdata2:

The Dockerfile is as follows:

FROM docker.io/janusgraph/janusgraph:latest

WORKDIR /opt/janusgraph
USER root

# Copy configuration files
COPY janusgraph-server.yaml /opt/janusgraph/conf/janusgraph-server.yaml
COPY empty-sample.groovy /opt/janusgraph/scripts/empty-sample.groovy
COPY janusgraph-keyspace-one.properties /opt/janusgraph/conf/janusgraph-keyspace-one.properties
COPY janusgraph-keyspace-two.properties /opt/janusgraph/conf/janusgraph-keyspace-two.properties

# Set ownership for the entire conf directory in one step
# RUN chown -R janusgraph:janusgraph /opt/janusgraph/conf

USER janusgraph

WORKDIR /opt/janusgraph

We started with fresh graph data, took a snapshot of keyspace_one, and then dropped the keyspace with the following command:

DROP KEYSPACE keyspace_two;
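
The snapshot step itself is not shown above. With replication_factor 1 on the two Cassandra nodes from the Compose file, each node stores only part of the keyspace, so a snapshot taken on one node covers only that node's share of the rows. Whether that applies here is not established; the following is only a minimal sketch of taking the same snapshot on every node. The function name, the `DOCKER` override, and the tag `before_drop` are hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Sketch: take a snapshot with the same tag on every Cassandra container.
# DOCKER may be overridden (e.g. DOCKER=echo) to print the commands
# instead of running them.
snapshot_all_nodes() {
  tag=$1
  keyspace=$2
  shift 2
  for node in "$@"; do
    # nodetool snapshot only covers the SSTables held by the node it runs on.
    ${DOCKER:-docker} exec "$node" nodetool snapshot -t "$tag" "$keyspace" || return 1
  done
}

# Example (container names from the Compose file, tag is hypothetical):
# snapshot_all_nodes before_drop keyspace_two jce-cassandra-1 jce-cassandra-2
```

Running it with `DOCKER=echo` first shows the exact `docker exec` commands without touching the cluster.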

We then recreated the keyspace by executing the following command.

CREATE KEYSPACE IF NOT EXISTS keyspace_one WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

We then ran the following commands to copy the snapshot files back into each table directory, recreate the schema, and load the restored data for keyspace_one:

  1. Restore
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd edgestore-926ec1209d9311efa9ba31a44f2d5d77 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd edgestore_lock_-92aa91a09d9311ef8a912791ce557ac0 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd graphindex-93097a809d9311efa9ba31a44f2d5d77/ && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd graphindex_lock_-9343eb709d9311ef8a912791ce557ac0 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd janusgraph_ids-9213cfe09d9311ef8a912791ce557ac0 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd systemlog-93d52ef09d9311efa9ba31a44f2d5d77 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd system_properties-91c6c1509d9311efa9ba31a44f2d5d77/ && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd system_properties_lock_-942cebe09d9311efa9ba31a44f2d5d77/ && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd txlog-93989b209d9311efa9ba31a44f2d5d77 && cp ./snapshots/1731070358151/* . && cd -

  2. Schema creation
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f edgestore-926ec1209d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f edgestore_lock_-92aa91a09d9311ef8a912791ce557ac0/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f graphindex-93097a809d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f graphindex_lock_-9343eb709d9311ef8a912791ce557ac0/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f janusgraph_ids-9213cfe09d9311ef8a912791ce557ac0/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f systemlog-93d52ef09d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f system_properties-91c6c1509d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f system_properties_lock_-942cebe09d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f txlog-93989b209d9311efa9ba31a44f2d5d77/schema.cql >> null

  3. Nodetool refresh
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two edgestore
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two edgestore_lock_
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two graphindex
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two graphindex_lock_
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two janusgraph_ids
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two systemlog
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two system_properties
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two system_properties_lock_
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two txlog
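
The per-table commands above could be collapsed into one loop, which makes it harder to skip a table by accident. This is a minimal sketch that mirrors the same order (copy, schema, refresh) for each table. The function names and the `CQLSH` / `NODETOOL` overrides are hypothetical, and it would have to be run inside each Cassandra container that holds a snapshot.

```shell
#!/bin/sh
# Sketch: restore every table of one keyspace from a snapshot tag.
# CQLSH and NODETOOL may be overridden (e.g. with `echo`) for a dry run.

# "edgestore_lock_-92aa91a0..." -> "edgestore_lock_"
# (strips the trailing "-<table id>" from a table directory name).
table_name() {
  printf '%s\n' "${1%-*}"
}

restore_keyspace() {
  data_dir=$1   # e.g. /var/lib/cassandra/data/keyspace_two
  keyspace=$2   # e.g. keyspace_two
  tag=$3        # snapshot directory name, e.g. 1731070358151
  for snap in "$data_dir"/*/snapshots/"$tag"; do
    [ -d "$snap" ] || continue            # table without this snapshot
    table_dir=${snap%/snapshots/*}
    table=$(table_name "$(basename "$table_dir")")
    # 1. Copy the snapshot files into the live table directory.
    cp "$snap"/* "$table_dir"/ || return 1
    # 2. Recreate the table from the schema.cql stored with the snapshot.
    ${CQLSH:-cqlsh} -f "$table_dir/schema.cql" || return 1
    # 3. Ask Cassandra to load the newly placed SSTables.
    ${NODETOOL:-nodetool} refresh -- "$keyspace" "$table" || return 1
  done
}

# Example:
# restore_keyspace /var/lib/cassandra/data/keyspace_two keyspace_two 1731070358151
```

Unlike the transcript, the sketch stops at the first failing command instead of redirecting output away, so a failed `cqlsh` or `nodetool refresh` is visible immediately.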

After the restore, the node (vertex) count no longer matches the count taken before the snapshot.

Node count before the snapshot:

gremlin> g1.V().count();
==>3354

Node count after restoring the snapshot:

gremlin> g1.V().count();
==>1592

We attempted to fetch some data from the restored keyspace, but we encountered the following error:

gremlin> g1.V().hasLabel("People").values("title");
Could not find type for id: 60941
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.NullPointerException: Could not find type for id: 60941
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:994)
	at org.janusgraph.graphdb.types.vertices.JanusGraphSchemaVertex.name(JanusGraphSchemaVertex.java:73)
	at org.janusgraph.graphdb.vertices.AbstractVertex.label(AbstractVertex.java:122)
	at org.janusgraph.graphdb.types.system.ImplicitKey.computeProperty(ImplicitKey.java:94)
	at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.executeImplicitKeyQuery(BasicVertexCentricQueryBuilder.java:236)
	at org.janusgraph.graphdb.query.vertex.VertexCentricQueryBuilder.properties(VertexCentricQueryBuilder.java:119)
	at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:48)
	at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:72)
	at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
	at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:157)
	at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:138)
	at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:51)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
	at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
	at org.janusgraph.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:45)
	at org.janusgraph.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:68)
	at org.janusgraph.graphdb.query.graph.GraphCentricQueryBuilder.lambda$iterables$1(GraphCentricQueryBuilder.java:240)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.lambda$executeGraphCentricQuery$2(JanusGraphStep.java:203)
	at org.janusgraph.graphdb.util.ProfiledIterator.<init>(ProfiledIterator.java:36)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.executeGraphCentricQuery(JanusGraphStep.java:203)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.lambda$null$0(JanusGraphStep.java:106)
	at java.base/java.lang.Iterable.forEach(Unknown Source)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.lambda$new$1(JanusGraphStep.java:106)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:158)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphMultiQueryStep.processNextStart(JanusGraphMultiQueryStep.java:111)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.hasNext(ExpandableStepIterator.java:47)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.NoOpBarrierStep.processAllStarts(NoOpBarrierStep.java:67)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.NoOpBarrierStep.processNextStart(NoOpBarrierStep.java:56)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:48)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:192)
	at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:98)
	at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$6(AbstractEvalOpProcessor.java:267)
	at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:283)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
gremlin> 

I suspect we are missing the backup and restore of index data in Elasticsearch.
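
If the Elasticsearch indices do need to be backed up alongside the Cassandra snapshot, the Elasticsearch snapshot REST API is one way to do it. The following is only a sketch under assumptions: the repository name, snapshot name, and location are hypothetical, `path.repo` must already allow that location on both Elasticsearch nodes, and the location must be a directory shared by both nodes. None of this is configured in the Compose file above.

```shell
#!/bin/sh
# Sketch: snapshot and restore Elasticsearch indices via the snapshot REST API.
# CURL may be overridden (e.g. CURL=echo) to print the requests instead of
# sending them. Port 9200 is the one published by elasticsearch-1.
ES_URL=${ES_URL:-http://localhost:9200}

# es_register_repo <repo> <location>: register a shared-filesystem repository.
es_register_repo() {
  ${CURL:-curl} -s -X PUT "$ES_URL/_snapshot/$1" \
    -H 'Content-Type: application/json' \
    -d "{\"type\":\"fs\",\"settings\":{\"location\":\"$2\"}}"
}

# es_snapshot <repo> <snapshot>: snapshot all indices and wait until it is done.
es_snapshot() {
  ${CURL:-curl} -s -X PUT "$ES_URL/_snapshot/$1/$2?wait_for_completion=true"
}

# es_restore <repo> <snapshot>: restore the snapshot. Indices with the same
# name must be closed or deleted first, otherwise the restore is rejected.
es_restore() {
  ${CURL:-curl} -s -X POST "$ES_URL/_snapshot/$1/$2/_restore"
}

# Example (all names hypothetical):
# es_register_repo jg_backup /usr/share/elasticsearch/backup
# es_snapshot jg_backup before_drop
# es_restore jg_backup before_drop
```

Taking the Elasticsearch snapshot at the same point as the Cassandra snapshot, with no writes in between, keeps the index and the graph data consistent with each other.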
