Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

merge pubsub-hp into master #1593

Merged
merged 59 commits into from
Feb 7, 2017
Merged

merge pubsub-hp into master #1593

merged 59 commits into from
Feb 7, 2017

Conversation

pongad
Copy link
Contributor

@pongad pongad commented Feb 7, 2017

No description provided.

davidtorres and others added 30 commits December 2, 2016 12:16
* merge from master

* make pubsub high perf compile

* make AckDeadlineRenewerTest pass
* use fewer guava classes in public surface

- SubscriberImpl used to expose AbstractService.
  We just make this package-private.
- SubscriberStats defines its own Stats type.
* make AutoValue work in PubSub
Use error_prone_annotations so we don't clobber classpath with Java8 classes.
Migrate SubscriberStats and PublisherStats to AutoValue
…ss to use containment in favor of inheritance
* Update

* Fixing a race condition on shutdown
Some of our tests seem to still be flaking with timeout,
eg https://travis-ci.org/GoogleCloudPlatform/google-cloud-java/jobs/185655553

Making the test not quiet should give us an idea of what's timing out.
* delete pull-related methods from PubSub

Instead, provide a way to create the Subscriber object.

* also delete modifyAckDeadline

* google-java-format changed lines
This commit tries to deflake failures reported in https://ci.appveyor.com/project/garrettjonesgoogle/google-cloud-java/build/57 .

The failure seems to come from `FakeSubscriberServiceImpl::sendError`. When calling `getAndAdvanceCurrentStream`, we waited for there to be a subscription by waiting on the `subscriptionInitialized` variable.

The variable is notified before `addOpenedStream` is called though, and so it's possible that `getAndAdvanceCurrentStream` will see a list without any opened stream. The fix is to wait for at least one stream to be registered first.
* delete publish-related methods from PubSub
* make pubsub work with new gax

update documentation links too
* Delete the handwritten Message type
* delete handwritten PushConfig

* update code samples
* Delete handwritten PubSub layer

* make Publisher/Subscriber builders public

* also delete PubSubRpc
* fix doc annotations

* clean up Publisher and Subscriber surface

- PUBSUB_API_ADDRESS and PUBSUB_API_SCOPE are removed.
  We now use {Publisher,Subscriber}Settings instead.
- Default settings are moved into the Builder class,
  so that they have package-private visibility.
  We can make them public later if we want to.
- Subscriber doc is fixed so that it is not chopped off mid-sentence.
- Subscriber doc mistakenly referred to the Publisher. This is now
  fixed.
pongad added 14 commits January 24, 2017 09:52
* hide Publisher and Subscriber's getStats()

The methods are not yet implemented,
so there is no reason to expose it.
Another way out is to delete it for the time being.

* delete the methods
* fix improper use of iteration

The previous implementation modifies a HashMap while iterating through
its key set, causeing unpredictable iteration behavior.
This might be the reason our tests intermittently deadlock.

The new implementation uses a PriorityQueue.
The time complexity is O(M * log N),
where M is the number of expirations before the cut over time
and N is the total number of expirations.
I am not sure how this compares to O(N) intended in the previous
implementation.
If required, O(N) is also possible using an ArrayList.

Unfortunately, a new failure has emerged.
Instead of deadlocking, testModifyAckDeadline intermittently fails.
Maybe
- I have fixed the old bug and created a new one,
- I have fixed the old bug that was masking another one,
- The deadlock wasn't caused by the iteration. Now the tests just fail
  before they could deadlock,
or some combination thereof.
The incorrect iteration should be fixed regardless.

* bug fix

don't process the same job twice

record next expiration after the loop
This is a follow up to
#1552 .

The linked PR has a race condition:
If the ExtensionJobs is added to the queue after the alarm is set up,
it is possible that the alarm would fire before the jobs are added.
Fixing this is easy, just set up the alarm after.
However, doing this consistently deadlocks the tests. Why?

The tests uses a fake executor.
In tests, time does not flow naturally, but is forced to increment,
usually by the executor's `advanceTime` method.
There is a race between the test thread advancing the time and
the mock server thread inserting more tasks to the fake executor.
If the tasks get inserted first, all is well.
Otherwise, `advanceTime` doesn't see the tasks,
and they do not get executed.
The fix is to check the "current time" every time a task is inserted.
If the task is inserted "in the past", we run the task immediately.
Doing this still deadlocks the tests. Why?

The fake executor needs to lock the task queue when registering a task.
If the task is inserted in the past, it also needs to run the task.
Running the task might involve sending a requst to the mock server.
A GRPC thread on the mock server might handle the request by adding more
tasks to the executor.
The executor's queue is locked by the first thread,
resulting in a deadlock.
The fix is to lock the queue just long enough to retrieve a task,
then execute the task without the lock.
* fix race in TestReceiver

If TestReceiver is set so that messages must be explicitly acked,
test code acks a message by calling replyNextOutstandingMessage.
There is a race between calling the function and the message being
delivered.

Previously, if the function gets called before the message is delivered,
the test fails since polling an empty deque returns a null pointer.
This commit makes us wait until a message becomes available instead.
* fix race in mock publisher

FakePublisherServiceImpl::publish has a race.
If the call to publish happen before a response can be placed,
the server will try to respond with a null object.

The fix is to either
- always set the response before calling
- make publish wait for the response
For propriety, this commit does both.

This fix reveals another flake.
Publisher uses exponential backoff with jitter.
The jitter randomly picks a number between 0 and a maximum.
If we pick low values too many times,
it will retry too often and the server will run out of canned
transient errors to respond back with.
The test still passed since it expected any Throwable.

This commit fixed the test to expect FakeException,
and use a fake "random" number generator to ensure we don't
run out of return values.

Retrying can still causes random test failures,
independently of above changes.
If a request fails due to DEADLINE_EXCEEDED,
the future is completed with a corresponding error.
However, the last RPC might not have been successfully cancelled.
When a new test starts, it gives canned response to the server.
The server might use some of these responses to respond to
RPCs of previous tests.
Consequently, a misbehaving test can fail every test that comes
after it.
This commit changes the test setup code so that it
creates a new fake server for every test to avoid this problem.
* return Name objects instead of plain String
* use ChannelProvider instead of ChannelBuilder

* document that ChannelProvider should normally create new channels
* fix deadlock in testStreamAckDeadlineUpdate

Fixes #1577. The detailed description of the bug is available in the
issue.

This commit fixes the bug by ensuring that the Future received by
MessageReceiver cannot be completed before a callback is added to it.
This bug manifests as testBundleAcks deadlocking.

The test
1. publishes a series of messages to a mock server,
2. waits to receive them back,
3. and acknowledges them.

For performance reasons, the client does not immediately
send the acknowledgement request to the server.
Instead, it sends acks in batch every 100ms.
Using a fake clock, the test advances the time by 100ms,
sending the ack-batch, then verify that the mock server
receives the acks.

The bug is in step 2. The test thread waits by waiting on a
CountDownLatch, which is counted down by GRPC thread calling
receiveMessage().
However, the method decrements the latch before acking the message.

On a bad day, the test thread can wake up, advance the clock,
and send the ack-batch before the GRPC thread could add the
message to the batch.
The test then waits for the server to receive an ack it never sent,
deadlocking the test.

The fix is for receiveMessage() to ack the message before
decrementing the counter.
* Update version to 0.8.1-SNAPSHOT (#1467)

Also, update versions in README to 0.8.0

* Add link to Maven Central for maven-central badge. (#1468)

Used to link to the image, which wasn't super useful.

* fix more races in pubsub tests

Previously BlockingProcessStreamReader has a terminate() method,
used to tell the Reader to stop reading from the emulator process.

This causes an inter-process race.
If the Reader stops before reading emulator's output,
the emulator process will hang as it tries to write to stdout/stderr
as there's no one to read from the other side of the pipe.

Since there is no way to safely stop the Reader,
this commit deletes the method and its associated test.

Additionally, the timeout for LocalSystemTest is increased to 3 minutes,
since the emulator, somehow, consistently takes just longer than a
minute to shut down.

* Regenerating SPI layer (#1501)

* Converting Error Reporting and Monitoring to use resource name types
* Removing formatX/parseX methods from pubsub, converting usage of the
  same to resource name types
* New methods in Logging and PubSub

* Updating grpc dependency to 1.0.3 (#1504)

* Release 0.8.1 (#1512)

* Fix code snippet (wrong method name) in README.md

Original code snippet in _"Querying data"_ section: 

`..while (!queryResponse.jobComplete()) {..`

This results in a compile error: 

_"Cannot resolve method jobComplete()"_

The correct method is `jobCompleted()`

* Updating version in README files. [ci skip]

* Update version to 0.8.2-alpha-SNAPSHOT

* Allow path in URIs passed to newFileSystem (#1470)

* This makes it easier for users who start with a URI describing a full
path to get a FileSystem that can work with that path, since they no
longer have to needlessly remove the path from the URI.

Note that Oracle's description of newFileSystem [1] puts no restriction
on the passed URI.

[1]
https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystems.html#newFileSystem(java.net.URI,%20java.util.Map)

* Preventing logging re-entrance at FINE level (#1523)

* Preventing logging re-entrance at FINE level

Also:
* Reducing the scope of synchronized blocks
* Removing logger exclusions except for Http2FrameLogger

* Add a PathMatcher for CloudStorageFileSystem (#1469)

Add a test, as well.
We reuse the default PathMatcher since it does a fine job of globbing files.

* Set timestamp from LogRecord (#1533)

* Initialize the default MonitoredResource from a GAE environment (#1535)

* BigQuery: Add support to FormatOptions for AVRO

#1441

Added new constant in FormatOptions, and a corresponding factory method. Updated test cases. Confirmed that AVRO does not require special treatment (like CSV does), so no additional changes are required.

* Reverting changed commited by mistake before review.

* BigQuery: Add support to FormatOptions for AVRO

#1441

Added new constant in FormatOptions and a corresponding factory method. Updated test cases. Confirmed that AVRO does not require special treatment (like CSV does), so no additional changes are required.

* use RpcFuture and remove old BundlingSettings (#1572)

* use RpcFuture and remove old BundlingSettings

* Release 0.8.2

Note: This version was accidentally released to Sonatype because of
experimenting with deployment commands, combined with the fact that
autoReleaseAfterClose is set to true. Since releases can't be taken
back, we might as well own up to the release and push the code
forward.

* Updating version in README files. [ci skip]

* Fixing javadoc error in GaeFlexLoggingEnhancer (#1582)

* get tests to compile and pass
This is the same race as
#1495
but in sendStreamResponse rather than sendError.
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 7, 2017
@pongad pongad changed the title Merge merge pubsub-hp into master Feb 7, 2017
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 83.452% when pulling 374210e on pongad:merge into 8f4828f on GoogleCloudPlatform:master.

@garrettjonesgoogle
Copy link
Member

Take care of any items I mentioned that are trivial to fix, and file issues for ones that are not trivial & fix them after merging.

@@ -1,5 +1,5 @@
/*
* Copyright 2016 Google Inc. All Rights Reserved.

This comment was marked as spam.

This comment was marked as spam.

@@ -54,6 +54,36 @@
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>20.0</version>

This comment was marked as spam.

This comment was marked as spam.

<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.9.4</version>

This comment was marked as spam.

This comment was marked as spam.

<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>

This comment was marked as spam.

This comment was marked as spam.

<artifactId>mockito-all</artifactId>
<version>1.9.5</version>
<scope>test</scope>
</dependency>

This comment was marked as spam.

This comment was marked as spam.

<artifactId>os-maven-plugin</artifactId>
<version>1.4.0.Final</version>
</extension>
</extensions>

This comment was marked as spam.

This comment was marked as spam.

/**
* A barrier kind of object that helps to keep track and synchronously wait on pending messages.
*/
class MessagesWaiter {

This comment was marked as spam.

@pongad
Copy link
Contributor Author

pongad commented Feb 7, 2017

@garrettjonesgoogle PTAL

@garrettjonesgoogle
Copy link
Member

LGTM

@pongad pongad merged commit c230c0f into googleapis:master Feb 7, 2017
@pongad pongad deleted the merge branch February 7, 2017 23:47
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 83.461% when pulling 8cf7a2b on pongad:merge into 8f4828f on GoogleCloudPlatform:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 83.461% when pulling 8cf7a2b on pongad:merge into 8f4828f on GoogleCloudPlatform:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 83.451% when pulling 8cf7a2b on pongad:merge into 8f4828f on GoogleCloudPlatform:master.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants