PnnnnR0
std::arguments

Draft Proposal,

This version:
https://jeremy-rifkin.github.io/proposals/drafts/cpp/arguments-draft-1.html
Author:
Jeremy Rifkin
Audience:
SG17, LEWG
Project:
ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
Source:
https://github.com/jeremy-rifkin/proposals/blob/main/cpp/arguments.bs

Abstract

This paper proposes an encoding-friendly and modern interface for accessing command line arguments throughout a program.

1. Credits

std::arguments was initially proposed by Izzy Muerte in [P1275]. Corentin Jabot and Aaron Ballman also proposed an interface for accessing command line arguments outside main to WG14 in [N2948]. This paper borrows wording, design elements, and good ideas from both.

2. Introduction

This paper aims to solve three problems: Encoding and portability problems with command line arguments, an interface for accessing arguments outside of main, and a modern interface for accessing arguments. It does so by introducing a global std::arguments object with a modern and encoding-friendly interface.

Encoding: The only standard way to access command-line arguments in C++ is via int main(int argc, char** argv). This is a staple of C and C++, however, it’s not well-suited for portable applications. The encoding of argv varies system to system [[What is the encoding of argv?]]. On Windows, the native encoding is UTF-16 and it’s recommended to use wmain instead of main for portable code. In order to facilitate argv, UTF-16 arguments must be converted using legacy windows code pages. The only correct ways to handle command line arguments on Windows are platform-specific functions, WinMain, or wmain. Even on Unix-based systems, the encoding of char** argv is not always clear. Tackling this problem more or less necessitates an interface for accessing command line arguments independent of main as adding a new signature to main has been rejected by the committee.

Access outside main: It’s often desirable to be able to access command line arguments outside of main and even do so before main. Some examples could include including diagnostic information in a crash handler, some designs for a command line argument parser, and cases where main is out of your control. A common case of this is testing frameworks. Currently command line arguments are only available inside of main, requiring a programmer to manually pass arguments throughout the program or create their own global storage for arguments. This can add clutter and introduce unecessary complexity, especially if argument handling doesn’t happen "close" to main. There is precedent from other languages for this sort of capability, notably languages such as Python, Go, Rust, Swift, Ruby, C#, Haskell, Ada, and many others provide an interface for accessing arguments from anywhere in a program. Additionally, many C++ frameworks make arguments available outside main, such as QT with QCoreApplication::arguments.

Modernity: Passing arrays via a pointer and length argument is a very antiquated pattern rendered obsolete by modern solutions such as std::span. main is the one case where, if the programmer wants to utilize command line arguments, separate pointer and length arguments are still a requirement. A modern signature for main along the lines of int main(std::span<char*> argv), int main(std::span<std::string_view> argv), or int main(std::argument_list argv) was previously rejected by the committee due to concerns surrounding complexity, overhead, and encoding issues [P0781]. On top of new functionality and increased portability, a facility such as std::arguments provides a modern C++ solution for accessing arguments. An important benefit to this interface is teachability: Currently main, if command line arguments are desired, requires introduction to pointers relatively early on in education as well as subjection to footguns and confusion about the difference between C strings and C++ strings. This adds steepness to an already hazardously steep learning curve.

3. Previous Straw Polls and Discussion

Early polling surrounding an alternative to argc/argv and a means of accessing arguments outside of main occurred during discussion of [P0781]:

POLL: A trivial library solution for iterating parameters?
SF F N A SA
2 12 14 2 1
POLL: A non-main-based way of fetching command line arguments?
SF F N A SA
7 9 9 1 2

Polls on [P1275] by LEWGI:

POLL: We should promise more committee time to the std::arguments part.
Unanimous consent
Attendance: 11

POLL: std::arguments should be available before main
SF F N A SA
6 0 3 1 0
Attendance: 11

Polls on [P1275] by SG16:

POLL: std::environments and std::arguments should follow the precedent set by std::filesystem::path.
SF F N A SA
4 6 1 0 2
Attendance: 14
POLL: std::environment and std::arguments should return a bag-o-bytes and conversion is up to the user.
SF F N A SA
3 4 2 1 2
Attendance: 14

Key concerns discussed included mutability of arguments, overhead of initializing data structures before main, and how to handle different encodings.

4. Design

This paper introduces a global std::arguments object of type std::arguments_view and a header <arguments>.

std::arguments_view in many ways mirrors the interface of a constant std::span. More specifically, excluding the subview interface, modifiers, size_bytes, and data. This class is not copyable or movable and is intended to only be constructed by the implementation.

std::arguments_view has a value_type of std::argument, which follows the precedent of std::filesystem::path in providing observers that can convert between encodings. SG16 indicated a desire to follow the path precedent and there are a lot of similarities between the two cases: Both can be encoded arbitrarily or even have no encoding - paths could be any sequence of bytes and command line arguments can be too. std::argument is, itself, just a view and requires no extra allocation or overhead. Implementations may choose to, for example, cache the result of strlen but this can be done lazily.

While it is not uncommon practice to modify the contents of char** argv, std::arguments is entirely read-only in order to not introduce dangers surrounding global mutable state. Whether changes made to argv are reflected in std::arguments is left to the implementation.

4.1. Future Interface Expansion

Author’s note: While most large applications should probably use a library for argument parsing, it is my hope that in the case of more ad-hoc argument parsing it would be possible to portably write a check such as std::arguments.at(1) == "--help" or std::arguments.at(1).native() == "--help". Another helpful operation would be .starts_with("--"). Unfortunately, encoding makes it challenging to do operations such as this portably.

Because encoding will vary between systems and native() is implementation-defined, currently the only way to do this would involve the overhead of creating a string for a given encoding or an ugly macro to create a platform-dependent string literal:

// The overhead here is unfortunate but OK for 99% of uses
if(std::arguments.at(1).string() == "--help") {
  // ...
}

// or:

#ifdef _WIN32
#define ARG(str) L##str
#else
#define ARG(str) str
#endif
if(std::arguments.at(1).native() == ARG("--help")) {
  // ...
}

A UDL could also be considered, however, this is a more general problem that, in the author’s opinion, should be addressed directly rather than through a bespoke solution. The problem of operations between strings of different encodings would best be tackled in another paper.

5. Implementability

On Windows, std::arguments could be implemented with GetCommandLineW or __wargv and __argc. On mac, _NSGetArgv and _NSGetArgc could be used. Implementation on Linux and other Unixes is more challenging as there is currently no means at all to access argc and argv outside of or before main. Implementation here would probably require a modification to libc to make argc and argv available similar to where __environ is set in __libc_start_main. [N2948] offered a reference implementation for this mechanism in a private glibc fork. Alternatively, if a change to libc imposes a substantial burden, an implementation could save argc and argv in the program entry point.

6. Proposed Wording

Wording is relative to [N4950] and borrows extensively from existing wording.

Insert into [headers] table 24:

<arguments>

Insert a new section [arguments]:

Header <arguments> synopsis [arguments.syn]

namespace std {
  class arguments_view;
  class argument;

  // [arguments.access] arguments access
  const arguments_view& arguments();
}

Arguments access [arguments.access]

const arguments_view& arguments();

Returns: A reference to an arguments_view object.

Throws: May throw bad_alloc.

Class arguments_view [arguments.view]

An arguments_view provides a random access interface for accessing arguments passed to the program.

All member functions of arguments_view have constant time complexity.

namespace std {
  class arguments_view {
  public:
    using value_type = argument;
    using size_type = size_t;
    using difference_type = ptrdiff_t;
    using reference = value_type;
    using const_reference = value_type;
    using const_iterator = /* implementation-defined */; // see [arguments.view.iterators]
    using iterator = const_iterator;
    using const_reverse_iterator = std::reverse_iterator<const_iterator>;
    using reverse_iterator = const_reverse_iterator;

    arguments_view(const arguments_view&) = delete;
    arguments_view& operator=(const arguments_view&) = delete;

    // [arguments.view.access], access
    reference operator[](size_type index) const noexcept;
    reference at(size_type index) const;

    // [arguments.view.obs], observers
    size_type size() const noexcept;
    bool empty() const noexcept;

    // [arguments.view.iterators], iterators
    const_iterator begin() const noexcept;
    const_iterator end() const noexcept;

    const_iterator cbegin() const noexcept;
    const_iterator cend() const noexcept;

    const_reverse_iterator rbegin() const noexcept;
    const_reverse_iterator rend() const noexcept;

    const_reverse_iterator crbegin() const noexcept;
    const_reverse_iterator crend() const noexcept;
  };
}

Access [arguments.view.access]

value_type operator[](size_type index) const noexcept;

Preconditions: index < size() is true.

Returns: The argument at index index passed into the program from the environment. It is implementation-defined whether, in a main function with signature main(int argc, char** argv), any modifications to argv are reflected by arguments_view::operator[].

Throws: Nothing.

value_type at(size_type index) const;

Effects: Equivalent to: return operator[](index); if index >= size() is true.

Throws: out_of_range if index >= size() is true.

Observers [arguments.view.obs]

size_type size() const noexcept;

Returns: The number of program argument.

size_type empty() const noexcept;

Effects: Equivalent to: return size() == 0;.

Iterators [arguments.view.iterators]

using const_iterator = /* implementation-defined */;

The type models a constant random_access_iterator ([iterator.concept.random.access]). Its value type is value_type and its reference type is reference.

All requirements on container iterators ([container.reqmts]) apply to arguments_view::iterator as well.

const_iterator begin() const noexcept;
const_iterator cbegin() const noexcept;

Returns: An iterator referring to the first program argument. If empty() is true, then it returns the same value as end().

const_iterator end() const noexcept;
const_iterator cend() const noexcept;

Returns: An iterator which is the past-the-end value.

const_iterator rbegin() const noexcept;
const_iterator crbegin() const noexcept;

Effects: Equivalent to: return reverse_iterator(end());.

const_iterator rend() const noexcept;
const_iterator crend() const noexcept;

Effects: Equivalent to: return reverse_iterator(begin());.

Class argument [arguments.argument]

An object of class argument is a view of a character string argument passed to the program in an operating system-dependent format.

It is implementation-defined whether, in a main function with signature main(int argc, char** argv), any modifications to argv are reflected by an argument.

namespace std {
  class argument {
  public:
    using value_type  = /* see below */;
    using string_type = basic_string<value_type>;
    using string_view_type = basic_string_view<value_type>;

    argument(const argument&) noexcept = default;
    argument& operator=(const argument&) noexcept = default;

    // [arguments.argument.native], native observers
    const string_view_type native() const noexcept;
    const string_type      native_string() const;
    const value_type*      c_str() const noexcept;
    explicit operator string_type() const;
    explicit operator string_view_type() const noexcept;

    // [arguments.argument.obs], converting observers
    template<class EcharT, class traits = char_traits<EcharT>,
              class Allocator = allocator<EcharT>>
      basic_string<EcharT, traits, Allocator>
        string(const Allocator& a = Allocator()) const;
    std::string    string() const;
    std::wstring   wstring() const;
    std::u8string  u8string() const;
    std::u16string u16string() const;
    std::u32string u32string() const;

    // [arguments.argument.compare], comparison
    friend bool operator==(const argument& lhs, const argument& rhs) noexcept;
    friend strong_ordering operator<=>(const argument& lhs, const argument& rhs) noexcept;

    // [arguments.argument.ins], inserter
    template<class charT, class traits>
      friend basic_ostream<charT, traits>&
        operator<<(basic_ostream<charT, traits>& os, const argument& a);
  };

  // [arguments.argument.fmt], formatter
  template<typename charT>
    struct formatter<argument, charT>
      : formatter<argument::string_view_type, charT> {
        template<class FormatContext>
          typename FormatContext::iterator
            format(const argument& argument, FormatContext& ctx) const;
    };
}

Conversion [arguments.argument.cvt]

The native encoding of an ordinary character string is the operating system dependent current encoding for arguments. The native encoding for wide character strings is the implementation-defined execution wide-character set encoding ([character.seq]).

For member functions returning strings, value type and encoding conversion is performed if the value type of the argument or return value differs from argument::value_type. For the return value, the method of conversion and the encoding to be converted to is determined by its value type:

If the encoding being converted to has no representation for source characters, the resulting converted characters, if any, are unspecified.

Native Observers [arguments.argument.native]

The string returned by all native observers is in the native default argument encoding ([arguments.argument.cvt]).

const string_view_type native() const noexcept;

Returns: A string_view_type representing the argument.

const string_type native_string() const;

Returns: A string_type representing the argument.

const value_type* c_str() const noexcept;

Returns: A pointer to a null-terminated array of value_type representing the argument.

operator string_type() const;

Returns: A string_view_type representing the argument.

operator string_view_type() const noexcept;

Returns: A string_type representing the argument.

Converting Observers [arguments.argument.obs]

template<class EcharT, class traits = char_traits<EcharT>,
          class Allocator = allocator<EcharT>>
  basic_string<EcharT, traits, Allocator>
    string(const Allocator& a = Allocator()) const;

Returns: A string representing the argument.

Remarks: All memory allocation, including for the return value, shall be performed by a. Conversion, if any, is specified by [arguments.argument.cvt].

std::string string() const;
std::wstring wstring() const;
std::u8string u8string() const;
std::u16string u16string() const;
std::u32string u32string() const;

Returns: A string representing the argument.

Remarks: Conversion, if any, is specified by [arguments.argument.cvt].

Comparison [arguments.view.compare]

friend bool operator==(const argument& lhs, const argument& rhs) noexcept;

Effects: Equivalent to: return lhs.native() == rhs.native();.

friend strong_ordering operator<=>(const argument& lhs, const argument& rhs) noexcept;

Effects: Equivalent to: return lhs.native() <=> rhs.native();.

Inserter [arguments.argument.ins]

template<class charT, class traits>
  friend basic_ostream<charT, traits>&
    operator<<(basic_ostream<charT, traits>& os, const argument& a);

Effects: Equivalent to: return os << a.string<charT, traits>();.

Formatter [arguments.argument.fmt]

template<class FormatContext>
  typename FormatContext::iterator
    format(const argument& argument, FormatContext& ctx) const;

Effects: Equivalent to: return std::formatter<argument::string_view_type>::format(argument.string<charT, char_traits<charT>>(), ctx);.

References

Normative References

[N4950]
Thomas Köppe. Working Draft, Standard for Programming Language C++. 10 May 2023. URL: https://wg21.link/n4950

Informative References

[N2948]
Accessing the command line arguments outside of main(). URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2948.pdf
[P0781]
A Modern C++ Signature for main. URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0781r0.html
[P1275]
Desert Sessions: Improving hostile environment interactions. URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1275r0.html
[What is the encoding of argv?]
What is the encoding of argv?. URL: https://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv