
[SUGGESTION] Support for a "main v2" signature #262


Closed
aslilac opened this issue Mar 1, 2023 · 31 comments

Comments

@aslilac

aslilac commented Mar 1, 2023

The usual argc/argv signature requires familiarity with some C idioms that don't really have broad influence on C++. An alternative signature would spare new learners from getting bogged down in the dubious char** and \0 termination, and would instead give them a clearer, more familiar data type that is less prone to errors and abuse. Ideally it would take only a single argument, args, which would be a standard list container holding the arguments. Perhaps main : (args: std::vector<std::string_view>) -> int.

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

I've seen some wild forms of argument parsing out there that abuse argc. The percentage would definitely be small, but I'm sure some people misuse argc badly enough to have let memory safety bugs slip in.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

It's largely agreed that passing a length and a C array is bad practice when writing your own functions, but you don't get to dictate the signature of main. Requiring main to follow an idiom that is otherwise frowned upon (and in a place every C++ programmer is bound to come across!) raises a lot of questions about why things are this way, and why it's a bad idea to use a signature like main's elsewhere in your code. If main took something a little more idiomatic, those conversations could be put off until much later, instead of having to happen while someone is trying to write "Hello, world!"

Some references:

  • Jason Turner recently did a whole video that basically ended with the advice "you should probably be constructing a std::vector<std::string_view> immediately, and then only using that."
  • The SerenityOS project has a header file which can be included in any individual program that lets you write a serenity_main function with a nicer type signature, instead of writing the usual main function.
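A minimal C++17 sketch of the "convert once, then use only the container" idiom referenced above. The names `make_arg_list` and `app_main` are hypothetical and used only for illustration; this is not cppfront's generated code or SerenityOS's actual serenity_main API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: wrap each argv entry in a std::string_view.
// Only the views are allocated; the characters stay in argv's storage.
std::vector<std::string_view> make_arg_list(int argc, char** argv) {
    assert(argc >= 0 && argv != nullptr);  // guaranteed for a real main
    std::vector<std::string_view> args;
    args.reserve(static_cast<std::size_t>(argc));
    for (int i = 0; i < argc; ++i) {
        args.emplace_back(argv[i]);
    }
    return args;
}

// Hypothetical "nicer" entry point that user code would write instead of main.
// Placeholder body: report how many arguments follow the program name.
int app_main(const std::vector<std::string_view>& args) {
    return args.empty() ? 0 : static_cast<int>(args.size() - 1);
}
```

A real program would then reduce main to a single forwarding line, e.g. `int main(int argc, char** argv) { return app_main(make_arg_list(argc, argv)); }`, similar in spirit to the wrapper-header approach described above.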

Describe alternatives you've considered.
Some potential types that could be used for args:

  • std::array<_>
  • std::vector<_>
  • _<std::string>
  • _<std::string_view>
@aslilac aslilac changed the title [SUGGESTION] Support for a "main v2" signature [SUGGESTION] Support for a "main v2" signature Mar 1, 2023
@hsutter hsutter closed this as completed in b34f23a Mar 1, 2023
@hsutter
Owner

hsutter commented Mar 1, 2023

Thanks, and thanks again to everyone else who has made this suggestion in past issues. This has been on my mind as well.

Yes, there's a safety aspect to this. Just as importantly, there's a simplicity aspect: Simplifying C++ includes addressing the things, even little things, that are constant annoyances and sources of friction and just distractions because we keep having to talk about them (and in many but not all cases have to teach them). That this keeps coming up shows that this is one of those things.

I've already started implementing this a couple of times, and backed it out only because string_view isn't guaranteed to be null-terminated. But I don't think that should hold this back anymore... after all, these string_views will be null-terminated because of how we got them.

With the above checkin, the following works:

main: (args) = {
    for args do :(arg) =
        std::cout << arg << "\n";
}

There can now be a single-parameter form of global main that defaults to deduced type (you can write : _ but you don't have to), and the type will be deduced to vector<string_view>. You can name the parameter whatever you want, but I like args... :)

If the programmer tries to write something like main: (inout x) or main: (x: int), they'll get this error:

error: when 'main' has a single parameter, that parameter should be declared as just:  main: (x)  - the type 'std::vector<std::string_view>' will be deduced

Thanks again, everyone!
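On the null-termination point in the comment above: a std::string_view built directly from an argv entry spans exactly strlen(argv[i]) characters, so the byte just past its end is argv's own '\0'. The C++17 sketch below (hypothetical helper `arg_to_int`) relies on that to pass `data()` to a C API. The same call would be wrong for a view that does not end where the original C string ends, such as one produced by `substr`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string_view>

// Hypothetical helper. Precondition: the view was constructed from a whole
// argv entry, so data()[size()] is that entry's terminating '\0'.
// For "42", arg.substr(0, 1) views "4", but data() still points at "42",
// so std::atoi would silently read past the end of the view.
int arg_to_int(std::string_view arg_from_argv) {
    assert(arg_from_argv.data()[arg_from_argv.size()] == '\0');
    return std::atoi(arg_from_argv.data());
}
```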

@aslilac
Author

aslilac commented Mar 1, 2023

Sorry for the duplicate! I tried searching to see whether someone had already suggested this and must've missed it! But I'm very happy to see this change. :) Thanks Herb!

@filipsajdak
Contributor

filipsajdak commented Mar 8, 2023

@hsutter, one question regarding the new main() syntax: there are libraries that expect to process argc and argv, e.g., command-line option parsers. In my project I am using cxxopts, and previously the following code worked:

main: (argc:int, argv:**char) -> int = {

    options : cxxopts::Options = ("execspec", "Executable specification - create docs that test your code");

    options.add_options()
        ("i,input",  "Input markdown",   cxxopts::value<std::string>())
        ("o,output", "Save output to",   cxxopts::value<std::string>())
        ("d,debug",  "Enable debugging", cxxopts::value<bool>()*.default_value("false"))
        ("html",     "Generate html",    cxxopts::value<bool>()*.default_value("false"))
        ("h,help",   "Print usage");

    result := options.parse(argc, argv); // <----- here the lib uses argc & argv... and has no other methods available

    if result.count("help") {
      std::cout << options.help() << std::endl;
      exit(0);
    }
// ...
}

Maybe we should provide a way to get argc & argv for some cases? I think it should not be the default, but for the sake of "backward source compatibility always available", I think we should provide it.

I am asking here to clarify whether I should post a bug report or whether this is a feature that I need to learn to live with.

@filipsajdak
Contributor

I have realized that I can rewrite it to:

main: (args) -> int = {
// ...
    result := options.parse(argc, argv); // <---- argc & argv are available on the cpp1 side as implementation detail
// ...
}

It will generate the following cpp1 code:

[[nodiscard]] auto main(int argc, char **argv) -> int{
    auto args = cpp2::args(argc, argv); 
// ...
    auto result {CPP2_UFCS(parse, options, argc, argv)}; 
// ...
}

The question is whether relying on implementation details is the solution we want to support.

@hsutter
Owner

hsutter commented Mar 8, 2023

That's a good point...

The named parameter args gives you easy access to the value of argc by asking args.ssize(), but not to argv if you really really want an array of raw pointers.

But because under the covers cppfront generates the standard signature, you actually do have access to both argc and argv directly with those names. So this works right now:

main: (args) =
    std::cout << "argc is (argc)$, and argv[0] is (argv[0])$\n";

That said, I didn't actually realize that until you asked the question, so thanks! And I like it... it encourages the right thing (the visible named parameter is safe and is the thing you naturally use), but writing a named parameter also makes argc and argv implicitly available, and that strikes me as a useful convenience (i.e., a feature, not a bug). Bonus: they are guaranteed to have those familiar names, whereas today those two parameters can be given any names the programmer wants; argc and argv are just a convention.

Speaking of which... one thing that I haven't done, but have sometimes considered, is requiring the name of the single parameter to actually be args (right now you can name it anything). Requiring it to be the name args removes a degree of freedom, but one that doesn't matter, so it arguably could be a "constraints are liberating" benefit in the sense of directing creativity toward the directions that matter, and it would guarantee the visual consistency (args, argc, argv).

@hsutter
Owner

hsutter commented Mar 8, 2023

Ah, racing replies -- you saw it too. Yes, I think it's important to guarantee those names' availability because you gave a great motivating example showing that it's part of full compatibility with all existing C and C++ libraries, which rely on argc and argv.

What do you think of requiring the single-main-parameter name to be args?

@filipsajdak
Contributor

Most of us were using argc and argv by convention. My first choice, nevertheless, would be args. So, from that point of view, requiring the name to be exactly args should do no harm and would be aligned with what most of us will do anyway. It will reduce complexity by reducing name variation.

@JohelEGP
Contributor

JohelEGP commented Mar 8, 2023

I like the idea of making argc and argv available.

Instead of a std::vector of std::string_views for args, have you considered using a std::ranges::view, like lefticus/cpp_weekly#255? Perhaps even a new type modeling std::ranges::view, with accessors for the values of argc and argv. Then those names wouldn't need to be injected, which avoids making main more special. Accessors like size and an array-for-argv accessor could be more readable than 3 similarly named variables.

It looks like the code at #262 (comment) would generate an unused args warning. If argc and argv are available, it'd make sense to make args [[maybe_unused]].

@filipsajdak
Contributor

Last quick thought before going to sleep. Maybe we can do something like this: https://godbolt.org/z/s7bq33MT7

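#include <cstddef>
#include <string_view>
#include <vector>

// A vector of argument views that also remembers the original argc/argv.
struct Args : std::vector<std::string_view> {
    // Size the vector to c (empty) views; args() below fills them in.
    Args(int c, char **v) : vector(static_cast<std::size_t>(c)), argc{c}, argv{v} {}
    int argc = 0;
    char** argv = nullptr;
};

inline auto args(int argc, char **argv) -> Args {
    auto ret = Args{argc, argv};
    for (auto i = 0; i < argc; ++i) {
        ret[i] = std::string_view{argv[i]};  // overwrite the i-th empty view
    }
    return ret;
}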

args will behave like before but will give access to argc & argv without hiding them in implementation details.

@filipsajdak
Contributor

std::array requires its size to be known at compile time, which is not the case for the arguments to main().

@AbhinavK00

Sorry for my replies, realised they weren't correct so deleted both of them.

@hsutter
Owner

hsutter commented Mar 8, 2023

@filipsajdak I like your implementation strategy, and I'll pursue the direction of providing args.argc and args.argv, and deliberately uglifying the actual main parameter names so that they shouldn't be used directly. Thanks!

(Fun fact: Generating the main parameter name __argc causes MSVC to generate an executable that crashes when you try to use that name to access the int. That's what can happen when you stray into the implementer's namespace... so I'll use something like argc_ and argv_ as the uglified names instead.)

@JohelEGP
Contributor

JohelEGP commented Mar 8, 2023

argc_ and argv_ would still be accessible implementation details. If it's worth it, and at the expense of changing the value of __func__, you could wrap the statements generated from Cpp2's main into a non-capturing IILE.

Or maybe you could ban those names in Cpp2's main.

Perhaps I'm suggesting complications of unproven value.
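A sketch of what the IILE suggestion above could look like in generated Cpp1 code, assuming a vector<string_view> for args. `generated_entry` is a hypothetical stand-in for the generated main so the sketch can be called directly; the layout is illustrative, not what cppfront emits. Because the lambda captures nothing, naming argc_ or argv_ inside it fails to compile, which keeps the raw parameters out of reach of user code.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

int generated_entry(int argc_, char** argv_) {
    // Build args from the uglified parameters, outside the user's code.
    std::vector<std::string_view> args(argv_, argv_ + argc_);

    // Non-capturing IILE holding the statements from Cpp2 main.
    // __func__ here no longer names main, which is the cost mentioned above.
    return [](std::vector<std::string_view> const& args) -> int {
        // return argc_;   // would not compile: argc_ is not captured
        return static_cast<int>(args.size());  // placeholder user body
    }(args);
}
```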

@gregmarr
Contributor

gregmarr commented Mar 8, 2023

I'm wondering: if the signature must be exactly the magic main: (args) -> int, could it also be the magic main: () -> int, with args automatically provided in the rewritten version? I'm not sure it's a great idea. It seems a bit weird to access args in the body when it's never declared anywhere, but I wanted to at least discuss it. It could also mean that a program that doesn't use command line arguments would pay for the setup if there's no way to disable it. We could maybe rely on the optimizer to eliminate the call. It does look like the code is pretty inexpensive to begin with.

@hsutter
Owner

hsutter commented Mar 8, 2023

@gregmarr Yes, I had the same thought, but then I thought it would be inconsistent with removing magic... I'm deliberately making this explicit so it's not invisible magic, and for the same reason I think args should be explicit too, and then I also don't need to emit a [[maybe_unused]] on it in case I guessed wrong. So I like the current path of allowing both main: () and main: (args)... the programmer can express their intent conveniently about whether they want to look at the arguments or not.

@gregmarr
Contributor

gregmarr commented Mar 8, 2023

Yes, I had the same thought, but then I thought it would be inconsistent with removing magic.

I think that's mainly why I wasn't sure it was a great idea. I'm glad for the discussion to bring out why it's not a great idea.

There is still a little bit of magic if you require a specific variable name and that you must not specify its type, but main has always had magic on its parameter types.

I agree with everything you wrote about why having the two forms is a good idea.

I'm still a little unsure about the exact parameter name though, as that does seem like more magic than we had before.

@filipsajdak
Contributor

@hsutter one more thing - there are other main() signatures that we might consider for portability reasons (https://learn.microsoft.com/en-us/cpp/cpp/main-function-command-line-args?view=msvc-170#the-envp-command-line-argument):

int main(int argc, char* argv[], char* envp[]);
int wmain(int argc, wchar_t* argv[], wchar_t* envp[]);

I like the idea of requiring the name of the parameter to be exactly args. Following the same reasoning, we could allow an additional parameter, e.g. envs, that would add char* envp[] to the generated main. Going further, we could have separate names for wchar_t, or we could use the same parameter names and adjust the Cpp1 code based on the name of the function:

  • main -> char,
  • wmain -> wchar_t

So, the cpp2 code can look like the following:

main: (args, envs) = {
//...
}

can generate:

auto main(int argc_, char* argv_[], char* envp_[]) -> int {
  [[maybe_unused]] auto args = Args{argc_, argv_};
  [[maybe_unused]] auto envs = Envs{envp_}; 
// ...
}

and for wchar_t:

wmain: (args, envs) = {
//...
}

can generate:

auto wmain(int argc_, wchar_t* argv_[], wchar_t* envp_[]) -> int {
  [[maybe_unused]] auto args = Args{argc_, argv_};
  [[maybe_unused]] auto envs = Envs{envp_}; 
// ...
}

I have created a prototype for Args & Envs (https://godbolt.org/z/3zd93edd3) that looks like the following:

#include <cstddef>
#include <string_view>
#include <vector>

template <typename CharT>
struct Args : std::vector<std::basic_string_view<CharT>> {
    Args(int c, CharT **v) : argc{c}, argv{v} {
        // Reserve rather than size the vector: constructing it with c empty
        // views and then calling emplace_back would yield 2*c elements.
        this->reserve(static_cast<std::size_t>(c));
        for (auto i = 0; i < argc; ++i) {
            this->emplace_back(argv[i]);
        }
    }
    int const argc;
    CharT** const argv;
};

template <typename CharT>
struct Envs : std::vector<std::basic_string_view<CharT>> {
    // envp is a null-terminated array of "NAME=value" strings.
    Envs(CharT **v) : envp{v} {
        for (auto it = envp; *it; ++it) {
            this->emplace_back(*it);
        }
    }
    CharT** const envp;
};

@filipsajdak
Contributor

@hsutter if you like the idea, I can prepare a pull request. Just let me know.

@hsutter
Owner

hsutter commented Mar 8, 2023

Interesting ideas, thanks... Hmm. I worry about portability: I think wmain is Windows-only, and envp has wider support as your Godbolt example shows but is still not supported everywhere.

Here's my hot take:

Re wmain: I think this is the harder one that's less portable today, and would require invention (that might step on SG16's toes, and I'm not a Unicode expert). So I'm reluctant to try to invent something here, unless we see urgent requests from people adopting cppfront that not having it is an adoption obstacle (someday in the future when cppfront is actually ready to try adopting).

Re envp: The good news is that the real standard/portable solution for environment access is std::getenv, and that should work already. I just tried it to be sure:

main: () = std::cout << std::getenv("PATH");
    // ok: prints the expected big long string

Is that sufficient, or is there a need for a thin STL-style wrapper to allow convenient and safer usage like envs["PATH"] that yields a std::string_view? I might be interested in that. It seems like it would be a library that could go entirely in cpp2util.h and not require any actual cppfront compiler support.


The only WG21 proposal in this area I know of was P1275, which was encouraged, but the author didn't come back with a revision to progress it. What's in cppfront now for args covers the first half of P1275, though, and std::getenv already provides an interface for the environment side.

@filipsajdak
Contributor

I think that is sufficient. std::getenv certainly works. envs["PATH"] is also a good idea, as it provides easy-to-use syntax; my only doubt is that there might be no PATH environment variable, and operator[] usually expects the requested element to exist (that is why we have at() methods). That might confuse programmers, but nevertheless I like its simplicity.

I posted this to make sure we have covered all possible cases regarding main().

@jcanizales

and operator[] usually expects to have requested element (that is why we have at() methods).

It's the other way around, so it should be OK 🙂 at() in the STL containers throws if the key isn't present. operator[]() instead gives you a default-constructed value (and inserts it into the map), which in the case of reading environment variables seems like the right choice.
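For reference, this is how the two member functions behave on std::map, used here only as a stand-in since the envs wrapper discussed above is just an idea. `at_throws_for_missing` is a hypothetical helper. Note that operator[] inserts the default-constructed value, so it needs a non-const map and changes its size.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Returns true if m.at(key) throws std::out_of_range (i.e. key is absent).
// at() works on a const map and never modifies it.
bool at_throws_for_missing(const std::map<std::string, std::string>& m,
                           const std::string& key) {
    try {
        (void)m.at(key);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Because operator[] yields an empty string for a missing key, it cannot by itself tell "not set" apart from "set to empty", which is the concern raised in the next reply.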

hsutter added a commit that referenced this issue Mar 9, 2023
This is an omnibus commit of the last few evenings' changes. Primarily it was to start laying the groundwork for constructors, but it includes other fixes and closes a few issues.

Details:

- Begin infrastructure preparation for constructors

- Started creating navigation APIs to replace the low-level graph node chasing; this makes cppfront's own code cleaner and the tree easier to change if needed, but it's also a step toward a reflection API

- Extended `main:(args)` to require the name "args" for the single-parameter `main`, and to support `args.argc` and `args.argv`, further on #262 (see comment thread)

- Changed default return type for unnamed functions to be `-> void`, same as named functions, closes #257

- Disallow single-expression function body to be just `return`, closes #267

- Make `make_args` inline, closes #268

- Bug fix: emit prolog also for single-expression function body. Specifically, this enables main:(args)=expression; to work correctly. Generally, this also corrects the code gen for examples that were trying (but failing) to inject prologs in single-expression functions... in the regression tests this corrected the code gen for `pure2-forward-return.cpp2` which was missing the contract check before.
@gregmarr
Contributor

gregmarr commented Mar 9, 2023

which in the case of reading environmental variables seems like the right choice.

Only if the dictionary's value type is something like an optional string view, since it needs to preserve the difference between null and an empty string.

std::getenv:

Character string identifying the value of the environmental variable or null pointer if such variable is not found.
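The distinction can be kept with a thin std::getenv wrapper that returns std::optional. `env_lookup` is a hypothetical name, sketched in C++17; it is not the envs interface discussed above, which was never specified.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string_view>

// Hypothetical wrapper: std::nullopt means "not set"; an engaged but empty
// string_view means "set to the empty string".
std::optional<std::string_view> env_lookup(const char* name) {
    if (const char* value = std::getenv(name)) {
        return std::string_view{value};
    }
    return std::nullopt;  // std::getenv returned a null pointer
}
```

The returned view points into the environment storage that std::getenv exposes, so it may be invalidated by later modifications of the environment.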

@jcanizales

it needs to preserve the difference between null and an empty string

Is that a thing people do with env variables? 😩

@gregmarr
Contributor

gregmarr commented Mar 9, 2023

Is that a thing people do with env variables?

Yes, sometimes it is enough to have a variable set, without any particular content.

@Sarcasm

Sarcasm commented Mar 10, 2023

I wonder if allocation is really needed for args. Maybe the vector could be replaced by a similarly styled API that doesn't require an owning container; std::span<std::string_view> could work, right?

argv is mutable, I think. It's not uncommon to see frameworks take argv as a parameter, extract the framework options, and return an updated argv. For example, constructing a Qt QApplication(int &argc, char **argv) extracts some general Qt parameters like -style Fusion and updates argc and argv accordingly, so the application can parse the remaining arguments as it pleases: https://doc.qt.io/qt-6/qapplication.html#QApplication
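A sketch of the kind of in-place rewriting described above, for illustration only: `strip_flag_with_value` is a hypothetical helper, not Qt's implementation. It removes every `flag value` pair, compacts the remaining pointers, and updates argc through a reference. It assumes the argv pointer array is writable, as frameworks like Qt do.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: drop each occurrence of `flag` plus the argument that
// follows it, keep argv[0], and leave argv[argc] == nullptr afterwards.
void strip_flag_with_value(int& argc, char** argv, std::string_view flag) {
    if (argc <= 1) return;  // nothing after the program name
    int out = 1;
    for (int in = 1; in < argc; ++in) {
        if (std::string_view{argv[in]} == flag && in + 1 < argc) {
            ++in;           // also skip the flag's value
            continue;
        }
        argv[out++] = argv[in];  // compact the kept arguments
    }
    argv[out] = nullptr;
    argc = out;
}
```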

A useful property of char **environ that getenv/setenv do not provide is the ability to enumerate all the environment variables. This is sometimes useful, e.g. when spawning subprocesses.

@gregmarr
Contributor

I wonder if allocation is really needed for args, maybe vector could be replaced by some similarly styled API that do not require an owning container, std::span<std::string_view> could work right?

No, you need something to own the std::string_view objects that are created.

@Sarcasm

Sarcasm commented Mar 11, 2023

Right! 👍
I knew I was missing something...

@filipsajdak
Contributor

The allocation could be avoided using ranges. Once ranges are available on all platforms, we can work on that further.

@filipsajdak
Contributor

Because my apple-clang-14 has just provided support for ranges, I prepared a PR that uses ranges to avoid allocating std::vector - see: #380

@hsutter
Owner

hsutter commented Apr 19, 2023

Because my apple-clang-14 has just provided support for ranges

Thanks! I worry about that newness though -- is it worth discarding compatibility with last month's Apple Clang to save a single allocation?

(will repeat this comment on the PR)

@filipsajdak
Contributor

Thanks. Sorry for the trouble.

zaucy pushed a commit to zaucy/cppfront that referenced this issue Dec 5, 2023

8 participants