reduce search results size #438

Closed
schabe77 opened this issue Jun 3, 2021 · 10 comments
Labels: enhancement (New feature or request), P3 (Prospects for future development or feature requests that would be nice to have. Typos or minor bugs)

Comments

@schabe77

schabe77 commented Jun 3, 2021

Describe the problem you are trying to solve
I am migrating from the old AdWords API to the new Google Ads API and started by replacing the reports with SearchStream requests. I noticed that loading some of the reports takes longer than before. At first I thought it was a server issue, but then I found out that the responses are much larger. An example: when running the request

SELECT campaign.status, ad_group.status, ad_group_criterion.status, ad_group_criterion.type
 FROM ad_group_criterion
 WHERE ad_group_criterion.negative = false
 AND campaign.status != REMOVED
 AND ad_group.status != REMOVED
 AND ad_group_criterion.status != REMOVED

on a specific account, the response stream contains 280MB of data. When I run the same request with the old AdWords Reporting API (using GZIPPED_CSV as the download format), the download size is only 200KB. The uncompressed report is 50MB in size.

Describe the solution you would like
It would be great if the response size of the Google Ads API were similar to the report size of the AdWords API instead of differing by a factor of 1400.

There are two reasons why the response size differs so massively:

  • the new Google Ads API not only returns the requested status fields, but also the resource names of the campaign, ad group and ad group criterion. This increases the "line" size by a factor of about 5.7 and explains why the report was a 50MB file and now contains 280MB of data.
  • the old API returned a compressed CSV file, while the response of the new API seems to be uncompressed

Describe alternatives you've considered
There is not much I can do on my side, but I hope the library could

  • omit the resource names in the answer. I didn't ask for them, I don't need them and they inflate the response size.
  • use compressed requests/responses. As far as I could see gRPC supports gzip compression: io.grpc.stub.AbstractStub.withCompression("gzip")
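To give a rough feel for why gzip helps so much on this kind of data, here is a minimal sketch that compresses rows repeating a long resource name. It only uses the JDK's `GZIPOutputStream` locally and does not call the Google Ads API or gRPC; the class name, the row layout and the IDs are made up for illustration, so real ratios will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipRowsSketch {

    // Builds CSV-like lines that each repeat a long, made-up resource name.
    static byte[] buildRows(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("customers/1234567890/adGroupCriteria/987654321~")
              .append(i)
              .append(",ENABLED,ENABLED,ENABLED,KEYWORD\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses a byte array in memory with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = buildRows(100_000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, packed.length);
    }
}
```

Because every row shares the same long prefix, the compressed output should be a small fraction of the raw size, which matches the gap between the 50MB report and the 200KB GZIPPED_CSV download described above.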

Additional context
It would be great if the current behaviour could be improved, because larger downloads not only take longer but also waste bandwidth and therefore cost money.

@schabe77 schabe77 added the enhancement New feature or request label Jun 3, 2021
@nwbirnie
Contributor

nwbirnie commented Jun 3, 2021

So on the compression side, yes, but not now. We looked into adding support recently, however to do so we are blocked on a complicated internal (server-side) migration. Once this happens, we can absolutely enable compression from the client side.

Dropping resource names is an interesting idea, but not something we can easily do without breaking folks who are using the current behavior. Realistically I think that compression is a better option to reduce payload size in the longer-term.

@schabe77
Author

schabe77 commented Jun 3, 2021

Hi Nick,

thanks for the fast reply. I appreciate every improvement, so I would be happy if activating the compression were possible in the near future. Depending on how well the compression works on streaming data, it could reduce the response size of my example to 10MB. That's great news.

Regarding the unrequested occurrences of resource names: I understand the problem that users may rely on them. For me as a newbie and long-term AdWords API user (and client of an ISP with data volume capping) this is unexpected and unwanted behaviour. If someone needs the field, it's possible to request it. If there is a chance of changing it, I really would appreciate it. I don't know how your server software works, but if it's possible to change it in v8 in 3 months, the change could be announced in the migration guide.

As I said, I'd be extremely happy if you could enable the compression; the removal of the resource names would be the icing on the cake.

How do we proceed here? Should the bug stay open here and you keep it in mind / an eye on it? Or should I close it now and re-open it in 1 to 2 months?

@nwbirnie nwbirnie added the P3 Prospects for future development or feature requests that would be nice to have. Typos or minor bugs label Jun 4, 2021
@nwbirnie
Contributor

nwbirnie commented Jun 4, 2021

Let's keep this bug open to track the request.

No promises that we'll get the compression implemented, like I say the blocker is really complicated. Will keep pushing on it though.

@nwbirnie
Contributor

nwbirnie commented Jun 4, 2021

And yes, let's check back in a couple months.

@schabe77
Author

Hi,

5 months later I wanted to ask if there is any news regarding result compression and/or removal of unrequested resource_names from the result.

And just to explain why I would like to have both options: I want to move the software to the cloud. Unfortunately this comes with a price: a price for the traffic. I wouldn't complain about the result size if I had only one account. But my software maintains more than 3000 accounts, so the reduction of transferred bytes really matters. Paying for gigabytes is much better than paying for terabytes.

@nwbirnie
Contributor

Let me double check the status on this and get back to you. It looks like we're still considering this, but the infrastructure probably isn't ready yet.

I did a quick check on the possible gains from applying compression and got a range of results, from a 30% speedup to a 90% speedup. The biggest gains were for the most complicated query, with the largest result set. So if you run lots of small queries, it probably won't bring the 10x improvements you mentioned earlier. But if your traffic is dominated by large queries, then with a bit of luck this should make a significant dent. Do you possibly have a representative CID and query?

@schabe77
Author

Regarding the queries (with unnecessary resource names in the result): it affects most reports where different resources are "joined". We create human-readable keyword performance reports containing account name, campaign name, ad group name, keyword details and performance data from keyword_view. A report that is sure to deliver results twice as large as requested/needed could be:

SELECT ad_group.name, ad_group_criterion.keyword.match_type, ad_group_criterion.keyword.text, campaign.name, customer.descriptive_name, metrics.clicks, metrics.conversions, metrics.cost_micros, metrics.impressions
  FROM keyword_view
 WHERE segments.date='2021-11-23'

There you have a lot of redundant data that is transferred without being requested:

  • customer.resource_name
  • ad_group.resource_name
  • ad_group_criterion.resource_name
  • keyword_view.resource_name

These resource names are in fact often longer than the value I'm actually interested in, and they are definitely longer than the resources' numerical ids. If I were interested in resource names, I would request them explicitly and use either keyword_view.resource_name or ad_group_criterion.resource_name, because these already contain the data that is provided by customer.resource_name and ad_group.resource_name.

If I were interested in the ids, I personally would request the id fields instead, because they are not only much shorter but already contain the numerical value, which saves me from parsing first the resource name and then the id string to get the number.
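The parsing step mentioned above can be sketched as follows. This is a hand-rolled illustration that assumes the ad group criterion resource name has the form `customers/{customer_id}/adGroupCriteria/{ad_group_id}~{criterion_id}`; the class name and the IDs are hypothetical, and this is not the client library's own utility.

```java
final class CriterionResourceNameSketch {
    final long customerId;
    final long adGroupId;
    final long criterionId;

    private CriterionResourceNameSketch(long customerId, long adGroupId, long criterionId) {
        this.customerId = customerId;
        this.adGroupId = adGroupId;
        this.criterionId = criterionId;
    }

    // Splits the path segments first, then the composite "adGroupId~criterionId" part.
    static CriterionResourceNameSketch parse(String resourceName) {
        String[] parts = resourceName.split("/");
        if (parts.length != 4
                || !parts[0].equals("customers")
                || !parts[2].equals("adGroupCriteria")) {
            throw new IllegalArgumentException("Unexpected resource name: " + resourceName);
        }
        String[] ids = parts[3].split("~");
        if (ids.length != 2) {
            throw new IllegalArgumentException("Unexpected composite id: " + parts[3]);
        }
        return new CriterionResourceNameSketch(
                Long.parseLong(parts[1]), Long.parseLong(ids[0]), Long.parseLong(ids[1]));
    }

    public static void main(String[] args) {
        CriterionResourceNameSketch name = parse("customers/1234567890/adGroupCriteria/111~222");
        System.out.println(name.customerId + " " + name.adGroupId + " " + name.criterionId);
    }
}
```

Requesting the id fields directly avoids both the extra bytes and this two-stage string handling.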

@jradcliff
Member

Version v10 of the Google Ads API has a new omit_unselected_resource_names feature that we added for this issue. Check out the updated Query Structure guide and GetKeywords example for details.

As mentioned in the guide, if you set omit_unselected_resource_names=true in your queries, please remember to explicitly request any resource_name fields that you require. This is particularly important if you are taking the results of your searchStream or search calls and then sending those objects in update operations.
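As a rough illustration, the parameter is added as a trailing `PARAMETERS` clause in the query text, and setting it to `true` is what omits the unselected resource names. The helper below is a hypothetical sketch that only builds the query string; the class and method names are made up, and it does not send anything to the API.

```java
final class GaqlParametersSketch {

    // GAQL clause asking the server to leave out resource names that were not selected.
    static final String OMIT_CLAUSE = "PARAMETERS omit_unselected_resource_names = true";

    // Hypothetical helper: appends the clause unless the query already mentions PARAMETERS.
    static String withOmittedResourceNames(String query) {
        String trimmed = query.trim();
        if (trimmed.matches("(?is).*\\bPARAMETERS\\b.*")) {
            return trimmed;
        }
        return trimmed + " " + OMIT_CLAUSE;
    }

    public static void main(String[] args) {
        // The one resource name that is still needed is selected explicitly.
        String query = "SELECT ad_group_criterion.resource_name, ad_group.name, metrics.clicks "
                + "FROM keyword_view WHERE segments.date = '2021-11-23'";
        System.out.println(withOmittedResourceNames(query));
    }
}
```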

@jradcliff jradcliff self-assigned this Feb 10, 2022
@schabe77
Author

Thank you for this fix, I will definitely use this parameter.

Do you have an update on the compression topic?

I recently ran a test with the shopping performance view. The report was 2.8GB in size. Without the resource names it was "only" 2.1GB. After all, that is a saving of 25%, which is great. But then I compressed the results (using gzip) and the reports were 178MB (with resource names) and 173MB (without resource names). So the compression would have the much greater impact.
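The comparison above can be checked with a little arithmetic. This sketch assumes decimal units (1GB = 1000MB) and takes the report sizes quoted in this comment; the class name is made up for illustration.

```java
final class SavingsSketch {

    // Percentage by which the size shrinks when going from beforeMb to afterMb.
    static double reductionPercent(double beforeMb, double afterMb) {
        return (beforeMb - afterMb) / beforeMb * 100.0;
    }

    public static void main(String[] args) {
        // Dropping resource names only: 2.8GB down to 2.1GB.
        System.out.printf("omit resource names: %.1f%%%n", reductionPercent(2800, 2100));
        // gzip only, resource names kept: 2.8GB down to 178MB.
        System.out.printf("gzip only: %.1f%%%n", reductionPercent(2800, 178));
        // Both combined: 2.8GB down to 173MB.
        System.out.printf("both: %.1f%%%n", reductionPercent(2800, 173));
    }
}
```

Under these assumptions gzip alone removes well over 90% of the bytes, while dropping resource names on top of that changes the compressed size only slightly.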

@schabe77
Author

I just opened #564 because it seems that using the parameter breaks some reports.
