Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Print::print(const __FlashStringHelper *) is very inefficient #6524

Closed
nomis opened this issue Sep 15, 2019 · 12 comments · Fixed by #6893
Closed

Print::print(const __FlashStringHelper *) is very inefficient #6524

nomis opened this issue Sep 15, 2019 · 12 comments · Fixed by #6893
Assignees
Milestone

Comments

@nomis
Copy link
Contributor

nomis commented Sep 15, 2019

The implementation calls pgm_read_byte() to read every character. This is inefficient because it will read every block of 4 bytes multiple times.

It calls calls write(c) for every character. This is very inefficient at the network level if TCP_NODELAY is turned on because suddenly all flash strings are sent out as one byte per packet.

It should use strncpy_P() with a small buffer instead.

@earlephilhower
Copy link
Collaborator

A PR for this would be quite welcome. Just make sure the buffer isn't too large, as the stack space is pretty limited on this device. I think 64 bytes or so is what is used elsewhere.

@TD-er
Copy link
Contributor

TD-er commented Sep 29, 2019

I was looking into the code to see if this suggested change could also benefit my own code.

But what I don't understand is how it could be faster to use strncpy_P(), since that code is also using pgm_read_byte()

char* strncpy_P(char* dest, PGM_P src, size_t size) {
bool size_known = (size != SIZE_IRRELEVANT);
const char* read = src;
char* write = dest;
char ch = '.';
while (size > 0 && ch != '\0')
{
ch = pgm_read_byte(read++);
*write++ = ch;
size--;
}
if (size_known)
{
while (size > 0)
{
*write++ = 0;
size--;
}
}
return dest;
}

Not sure if the code has changed, as it appears to be some external code now?

@nomis
Copy link
Contributor Author

nomis commented Sep 29, 2019

I didn't realise none of the implementations were efficient :(

It would still be faster to use strncpy_P() if the output is a socket with TCP_NODELAY, otherwise it creates a lot of 1 byte packets.

@d-a-v
Copy link
Collaborator

d-a-v commented Sep 29, 2019

Try with nagle / without NODELAY, and with client.flush() after your prints.

@earlephilhower
Copy link
Collaborator

@TD-er , that's a very old version. The newlib-xtensa ones are significantly optimized and pretty close to RAM ones when possible: https://github.com/earlephilhower/newlib-xtensa/blob/xtensa-2_2_0-lock-arduino/newlib/libc/sys/xtensa/string_pgmspace.c.

In any case, larger chunks would be faster due to lower overhead. PRs always appreciated. :)

@nomis
Copy link
Contributor Author

nomis commented Sep 29, 2019

Try with nagle / without NODELAY, and with client.flush() after your prints.

flush() is part of Stream (input) not Print (output). You do not want to call it on a Client because it will discard all currently available input.

@d-a-v
Copy link
Collaborator

d-a-v commented Sep 29, 2019

You do want do call it because it forces data out, per arduino API.
https://www.arduino.cc/en/Reference.StreamFlush

And this is the way ::flush() is implemented in this repo's TCP (check WiFiClient, ClientContext).

And Stream:: inherits from Print::. So while Print:: is output only, Stream:: is I/O.

@nomis
Copy link
Contributor Author

nomis commented Sep 30, 2019

flush() is implemented here as "wait for the send buffer to empty" not "flush the send buffer into packets".

Look at the ESP32 implementation if you don't believe that it gets implemented as flushing input.

Print is output only, so it should have the flush() function. It's not present so it's effectively an input function.

@d-a-v
Copy link
Collaborator

d-a-v commented Sep 30, 2019

Quoting Serial.flush() (ref)

Waits for the transmission of outgoing serial data to complete. (Prior to Arduino 1.0, this instead removed any buffered incoming serial data.)

Quoting Stream::flush() (ref)

flush() clears the buffer once all outgoing characters have been sent.

While it is unclear what buffer it is about, Serial.flush() documentation explains it well.

In official arduino avr implementation, ::flush() is declared in Print:: like in this core.
Now concerning implementation in this core, WiFiClient::flush() follows Serial's one.

About esp32 implementation, I don't agree with it, and comments state that we have the good implementation. @me-no-dev might have more insights about that. I think this is an inheritance from Arduino 1.0.

Print is output only, so it should have the flush() function. It's not present so it's effectively an input function.

Well then. We can speak for hours, or you can just try whether WiFiClient.flush() fits your needs, or PR/fix like you proposed in OP.

@devyte
Copy link
Collaborator

devyte commented Nov 9, 2019

@earlephilhower @d-a-v what is the status here? is there anything to improve, or is the current code already reasonable?

@nomis
Copy link
Contributor Author

nomis commented Nov 10, 2019

@devyte The function still needs to be modified to write in blocks of 64 bytes.

It's not something I'm likely to get around to doing because I've implemented a general outgoing buffer in my application so that separate print calls can be merged too.

@earlephilhower
Copy link
Collaborator

@devyte current code is functionally correct, but does only 1 byte-sized writes when it could potentially be doing larger chunks. That's an optimization, yes, but not critical. If there's downtime it's worth doing, but it's not a high priority IMO.

@devyte devyte added this to the 2.7.0 milestone Nov 20, 2019
earlephilhower added a commit to earlephilhower/Arduino that referenced this issue Dec 9, 2019
Fixes esp8266#6524

Should help with speed of output when printing large flash strings to
things like a file or a TCP connection.

Use a 128 byte chunk in a temp buffer to send data using write(),
reducing the # of write calls by ~128x.
devyte pushed a commit that referenced this issue Dec 10, 2019
…r) (#6893)

Fixes #6524

Should help with speed of output when printing large flash strings to
things like a file or a TCP connection.

Use a 128 byte chunk in a temp buffer to send data using write(),
reducing the # of write calls by ~128x.
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants