
GCP support for stateless, HA installations #2768

Closed
joshdurbin opened this issue Jun 11, 2019 · 20 comments
Labels
  • feature-request: Used for new features in Teleport, improvements to current should be #enhancements
  • security-review: The last stage of checking a product feature.


@joshdurbin
Contributor

joshdurbin commented Jun 11, 2019

Howdy, I'm looking to add Google Cloud Storage (GCS) support to Teleport for session recording storage. I've been asked to build this out for a proposed Teleport rollout across several Google projects/environments, and based on scrollback in the Gravitational Community Slack, it seems a few others have asked for similar functionality. I have reviewed the S3 implementation, which is straightforward enough. Essentially:

  • add lib/events/gcssessions
  • add lib/events/gcssessions/gcshandler.go, mirroring the functionality in lib/events/s3sessions/s3handler.go
  • add SchemeGCS to constants.go
  • add a case for the aforementioned scheme to the switch in initUploadHandler in lib/service/service.go
  • update TestParseSessionsURI in lib/utils/utils_test.go
  • update the relevant documentation
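To make the initUploadHandler step more concrete, here is a minimal, hypothetical sketch of dispatching on the sessions URI scheme. It assumes the GCS scheme is `gs` (matching the `gs://` URIs used later in this thread). The `uploadHandlerKind` function and its return values are illustrative only, not Teleport's actual code.

```go
// Hypothetical sketch: choose an upload handler from the sessions URI scheme.
// Function name and return values are illustrative, not Teleport's code.
package main

import (
	"fmt"
	"net/url"
)

const (
	SchemeS3   = "s3"
	SchemeGCS  = "gs" // assumed; matches the gs:// URIs later in this thread
	SchemeFile = "file"
)

// uploadHandlerKind reports which kind of upload handler a sessions URI
// selects, plus the bucket (object stores) or directory (file handler).
func uploadHandlerKind(sessionsURI string) (handler, location string, err error) {
	u, err := url.Parse(sessionsURI)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", sessionsURI, err)
	}
	switch u.Scheme {
	case SchemeS3:
		return "s3", u.Host, nil
	case SchemeGCS:
		return "gcs", u.Host, nil
	case SchemeFile:
		return "file", u.Path, nil
	default:
		return "", "", fmt.Errorf("unsupported sessions URI scheme %q", u.Scheme)
	}
}

func main() {
	for _, uri := range []string{"s3://recordings", "gs://teleport-session-storage-2", "ftp://example"} {
		handler, location, err := uploadHandlerKind(uri)
		fmt.Printf("%s handler=%q location=%q err=%v\n", uri, handler, location, err)
	}
}
```

Keeping the scheme constants next to the switch means adding GCS is a single new case, which is why the change list above is so short.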

Adding GCS as a storage backend would require an additional dependency, which is currently at version 0.40.0:

```toml
[[constraint]]
  name = "cloud.google.com/go"
  version = "0.40.0"
```
@joshdurbin changed the title "Add GCS storage support for recordings" to "GCS support for session recordings" Jun 12, 2019
@joshdurbin
Contributor Author

joshdurbin commented Jun 19, 2019

Something like this. I've built and tested this locally against Terraform-managed GCS buckets. I'd prefer not to have any code in there relating to the provisioning of GCS buckets, but I'm following the pattern set in the S3 code.

@benarent
Contributor

Thanks for the interest @joshdurbin and for the work to add a Google Cloud Storage backend. For us to productise this, it would be good to have a complete HA GCP story:

  • GCS for storage
  • Firebase / Bigtable as a backend alternative to DynamoDB
  • Best practices for load balancer and firewall setup

I'm going to leave this open to gather more feedback from the community and team.

@benarent added the feature-request label Jun 20, 2019
@joshdurbin
Contributor Author

joshdurbin commented Jun 20, 2019

@benarent Agreed. We're looking to keep all of Teleport's data in GCP-native resources so the installation stays stateless. In fact, I'm presently working to get event and non-event storage off the ground in Firestore and should have something workable by the end of the week or very early next. Should I update this issue to reflect a broader GCP story?

@benarent
Contributor

Yes @joshdurbin, that would be great. Let me know how you get on.

@klizhentas
Contributor

I wasn't familiar with Firestore specifically, but looking at it, it seems to support server-side encryption by default, which is very nice, and it supports events.

@benarent can you put this on the agenda for the next product planning meeting so we can discuss it with @kontsevoy?

@klizhentas
Contributor

@joshdurbin questions for you:

Who is going to use this feature in production (is this a hack project, a production company policy, etc.)?

@joshdurbin changed the title "GCS support for session recordings" to "GCP support for stateless HA installations via GCS and Firestore" Jun 26, 2019
@joshdurbin
Contributor Author

joshdurbin commented Jun 26, 2019

@klizhentas I'm in the process of building several clusters in our environment(s) at BC. We'd prefer to keep our data in Google's systems/services, hence this effort, and we're taking care not to maintain other state storage systems (etcd, NFS, or even NFS via GCS/Fuse). Ideally this work would be brought into core Teleport; until then, though, we're running custom builds.

@benarent Current state of things: I should have cluster state working tomorrow; I'm a bit behind schedule. Currently implemented:

  • GCS storage
  • Firestore-backed events (minus programmatic composite index creation; a stab at it is in place but commented out)

That work is all in the BC fork.

@joshdurbin changed the title "GCP support for stateless HA installations via GCS and Firestore" to "GCP support for stateless, HA installations" Jun 26, 2019
@joshdurbin
Contributor Author

joshdurbin commented Jul 1, 2019

@klizhentas @benarent Alright, this work is close to done. I've been testing the changes locally against GCS and Firestore for the last few days.

The data in the screenshots is junk data for a local cluster, no secrets.

Indexes are still created manually; I'll add support for ensuring their creation in the next day or two.

[Screenshot taken 2019-07-01 at 1:25:09 PM]

Events are stored in Firestore with a document ID composed of the session ID and event type.

[Screenshot taken 2019-07-01 at 1:26:59 PM]

Data store documents can't use their keys as document IDs, because the key values violate Firestore's document ID requirements (they contain forward slashes, periods, etc.). Instead, each document ID is the SHA1 hash of its key, which keeps fetches fast: a document can be read directly from a known ID rather than through a query.
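A minimal sketch of that key-to-ID mapping, assuming the SHA1 digest is hex-encoded (the encoding isn't stated here). `docIDFromKey` and the sample key are hypothetical.

```go
// Hypothetical sketch: derive a Firestore-safe document ID from a backend key.
// Hex encoding of the SHA-1 digest is an assumption.
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// docIDFromKey maps a backend key to a document ID. Hex output contains only
// [0-9a-f], so it never includes '/' or '.', and the mapping is deterministic:
// a reader can recompute the ID from the key and fetch the document directly.
func docIDFromKey(key string) string {
	sum := sha1.Sum([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	key := "/example/users/alice" // hypothetical key containing slashes
	id := docIDFromKey(key)
	fmt.Println(id, len(id), strings.ContainsAny(id, "/."))
}
```

The trade-off is that hashed IDs are opaque in the console, and prefix scans over keys can't be done on document IDs alone.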

[Screenshot taken 2019-07-01 at 2:31:15 PM]

Both the Firestore events "handler" and the Firestore backend have ticker-based removal of expired entries. The events "handler" evaluates the timestamp on each record, while the backend evaluates the "expires" property, if set.

The Firestore backend consumes the collection's document changes through a document snapshot query stream, allowing the watchers on every auth server to receive updates.
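Below is one hypothetical way a single process could fan a change stream out to several local watchers. The `change` type and `fanout` helper are assumptions, and a plain channel stands in for the Firestore snapshot stream. This is not the Firestore client API.

```go
// Hypothetical sketch: fan one upstream change stream out to many watchers.
// A channel stands in for the Firestore snapshot query stream.
package main

import (
	"fmt"
	"sync"
)

// change is a stand-in for one document change read from the stream.
type change struct {
	Kind string // e.g. "put" or "delete"
	Key  string
}

// fanout delivers every upstream change to all registered watchers.
type fanout struct {
	mu       sync.Mutex
	watchers []chan change
}

// newWatcher registers a watcher with the given channel buffer size.
func (f *fanout) newWatcher(buffer int) <-chan change {
	ch := make(chan change, buffer)
	f.mu.Lock()
	f.watchers = append(f.watchers, ch)
	f.mu.Unlock()
	return ch
}

// run forwards changes until upstream is closed, then closes every watcher.
func (f *fanout) run(upstream <-chan change) {
	for c := range upstream {
		f.mu.Lock()
		for _, w := range f.watchers {
			w <- c // blocks if this watcher's buffer is full
		}
		f.mu.Unlock()
	}
	f.mu.Lock()
	for _, w := range f.watchers {
		close(w)
	}
	f.mu.Unlock()
}

func main() {
	f := &fanout{}
	a, b := f.newWatcher(4), f.newWatcher(4)
	upstream := make(chan change)
	go f.run(upstream)
	upstream <- change{Kind: "put", Key: "cluster-data/example"}
	close(upstream)
	for c := range a {
		fmt.Println("watcher a:", c.Kind, c.Key)
	}
	for c := range b {
		fmt.Println("watcher b:", c.Kind, c.Key)
	}
}
```

A production version would need a policy for slow watchers, for example dropping or resetting them instead of blocking the stream.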

I'll open a PR once I drop in the programmatic index creation and clean up some error handling.

@klizhentas
Contributor

@kontsevoy has approved the feature for the product, so I'll be helping you along the way. Once you're ready for review, let's do a Zoom review kickoff session.

@klizhentas self-assigned this Jul 2, 2019
@klizhentas
Copy link
Contributor

@joshdurbin can you please post the Teleport configuration with the new changes? We would like to review it as well.

@joshdurbin
Contributor Author

joshdurbin commented Jul 2, 2019

Woot. What I'm using right now, outside GCP's environment, on my laptop is:

```yaml
  storage:
    type: firestore
    collection_name: cluster-data
    credentials_path: /var/lib/teleport/gcs_creds
    project_id: bc-jdurbin
    audit_events_uri: 'firestore://events?projectID=bc-jdurbin&credentialsPath=/var/lib/teleport/gcs_creds'
    audit_sessions_uri: 'gs://teleport-session-storage-2?credentialsPath=/var/lib/teleport/gcs_creds&projectID=bc-jdurbin'
```

In GCP you'd likely use attached compute service accounts and forgo credentials files.
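A sketch of reading the settings carried in those URIs, assuming only the `projectID` and `credentialsPath` query parameters shown above. `parseGCPURI` and `credentialSource` are hypothetical helpers. An empty `credentialsPath` is treated as "fall back to Application Default Credentials", which on GCE resolve to the attached service account.

```go
// Hypothetical sketch: extract GCP settings from audit_events_uri /
// audit_sessions_uri values; parameter names come from the config above.
package main

import (
	"fmt"
	"net/url"
)

type gcpURIConfig struct {
	Scheme          string // "firestore" or "gs"
	Target          string // collection (firestore://) or bucket (gs://)
	ProjectID       string
	CredentialsPath string // empty: rely on the attached service account
}

// parseGCPURI splits a URI into its scheme, host part and query parameters.
func parseGCPURI(raw string) (gcpURIConfig, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return gcpURIConfig{}, err
	}
	q := u.Query()
	return gcpURIConfig{
		Scheme:          u.Scheme,
		Target:          u.Host,
		ProjectID:       q.Get("projectID"),
		CredentialsPath: q.Get("credentialsPath"),
	}, nil
}

// credentialSource describes which credentials a client would be built from.
func (c gcpURIConfig) credentialSource() string {
	if c.CredentialsPath == "" {
		return "attached service account (application default credentials)"
	}
	return "credentials file " + c.CredentialsPath
}

func main() {
	for _, raw := range []string{
		"firestore://events?projectID=bc-jdurbin&credentialsPath=/var/lib/teleport/gcs_creds",
		"gs://teleport-session-storage-2?projectID=bc-jdurbin",
	} {
		c, err := parseGCPURI(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s %s project=%s via %s\n", c.Scheme, c.Target, c.ProjectID, c.credentialSource())
	}
}
```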

@joshdurbin
Contributor Author

Full config being used at the moment is:

```yaml
#
# Sample Teleport configuration file.
#
teleport:
  nodename: C02WG09CHTDH
  data_dir: /var/lib/teleport
  pid_file: /var/run/teleport.pid
  auth_token: cluster-join-token
  auth_servers:
  - 0.0.0.0:3025
  connection_limits:
    max_connections: 15000
    max_users: 250
  log:
    output: stderr
    severity: DEBUG
  ca_pin: ""
  storage:
    type: firestore
    collection_name: cluster-data
    credentials_path: /var/lib/teleport/gcs_creds
    project_id: bc-jdurbin
    audit_events_uri: 'firestore://events?projectID=bc-jdurbin&credentialsPath=/var/lib/teleport/gcs_creds'
    audit_sessions_uri: 'gs://teleport-session-storage-2?credentialsPath=/var/lib/teleport/gcs_creds&projectID=bc-jdurbin'
auth_service:
  enabled: "yes"
  listen_addr: 0.0.0.0:3025
  tokens:
  - proxy,node:cluster-join-token
  session_recording: ""
  client_idle_timeout: 0s
  disconnect_expired_cert: false
  keep_alive_count_max: 0
ssh_service:
  enabled: "yes"
proxy_service:
  enabled: "yes"
  listen_addr: 0.0.0.0:3023
  web_listen_addr: 0.0.0.0:3080
  tunnel_listen_addr: 0.0.0.0:3024
  https_key_file: /var/lib/teleport/webproxy_key.pem
  https_cert_file: /var/lib/teleport/webproxy_cert.pem
```

@joshdurbin
Contributor Author

Updates from the PR review are in. Keys are now human-readable.

@joshdurbin
Contributor Author

Tests for the Firestore backend and the Firestore events subsystem are configured by default to use the gcloud emulator, but can easily be modified to hit live Firestore via credentials or GCE service account bindings. The GCS tests use FakeGCS, essentially an in-code emulator.
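The Go Firestore client library connects to an emulator when the FIRESTORE_EMULATOR_HOST environment variable is set. A hypothetical guard like the one below would keep tests on the emulator by default. Pointing them at live Firestore would mean removing the guard and supplying credentials instead. The helper name and messages are illustrative.

```go
// Hypothetical test guard: run Firestore tests only against an emulator.
package main

import (
	"fmt"
	"os"
)

// emulatorHost returns the address in FIRESTORE_EMULATOR_HOST, the variable
// the Go Firestore client reads to redirect traffic to an emulator. getenv is
// injected so the guard can be exercised without touching the real environment.
func emulatorHost(getenv func(string) string) (string, bool) {
	host := getenv("FIRESTORE_EMULATOR_HOST")
	return host, host != ""
}

func main() {
	if host, ok := emulatorHost(os.Getenv); ok {
		fmt.Println("running Firestore tests against emulator at", host)
	} else {
		// In a real *testing.T test this branch would call t.Skip.
		fmt.Println("FIRESTORE_EMULATOR_HOST not set; skipping Firestore tests")
	}
}
```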

@benarent
Contributor

I'm just reviewing this open issue, since it looks like this will make 4.2, and I wanted to get ahead on some documentation. Will we cut a GCP-specific release? I see it's not enabled by default (d346f2b#diff-beec5651c04d7af5273733679b64c00c), so would it be built with ADDFLAGS='-tags firestore' make teleport?

@benarent added this to the 4.2 "Alameda" milestone Oct 29, 2019
@webvictim
Contributor

Just to indicate an early preference, I would prefer us not to need separate artefacts for this. Our downloads page is already getting crowded.

@joshdurbin
Contributor Author

To be honest, I followed what was there for S3/DynamoDB; however, I never had to supply the flags to get either S3/DynamoDB or GCS/Firestore support. I never looked into why that was the case and simply duplicated the docs from S3/DynamoDB. I'm not sure I follow with regard to downloads: do you currently split things out on your downloads page? It doesn't look like it (https://gravitational.com/teleport/download/). The same goes for the Enterprise build; nothing is broken out for S3/DynamoDB support at present.

@klizhentas
Contributor

@webvictim @benarent I think having Teleport support both GCP and AWS out of the box is OK, as it doesn't affect the binary much. FIPS is a bit of a different case, because it has to be recompiled with BoringCrypto.

@benarent added the security-review label Dec 26, 2019
@benarent
Contributor

I've added a security-review label, and I'm going to assign this to RJ so he can do a final security review once he's back from vacation.

@webvictim
Contributor

I think most of this work is already done?

Projects
None yet
Development

No branches or pull requests

5 participants