1. Credits
std::arguments was initially proposed by Izzy Muerte in [P1275]. Corentin Jabot and Aaron Ballman also proposed
an interface for accessing command line arguments outside of main to WG14 in [N2948]. This paper borrows wording,
design elements, and good ideas from both.
2. Introduction
This paper aims to solve three problems: encoding and portability problems with command line arguments, the lack of an
interface for accessing arguments outside of main, and the lack of a modern interface for accessing arguments. It does
so by introducing a global object with a modern and encoding-friendly interface.
Encoding: The only standard way to access command-line arguments in C++ is via the argc and argv parameters of main.
This is a staple of C and C++; however, it’s not well-suited for portable applications. The encoding of argv varies
from system to system [What is the encoding of argv?].
On Windows, the native encoding is UTF-16 and it’s recommended to use wmain instead of main for portable code. In
order to facilitate main, UTF-16 arguments must be converted using legacy Windows code pages. The only correct ways to
handle command line arguments on Windows are platform-specific, such as GetCommandLineW or wmain. Even on Unix-based
systems, the encoding of argv is not always clear. Tackling this problem more or less necessitates an interface
for accessing command line arguments independent of main, as adding a new signature to main has been rejected by the
committee.
Access outside main: It’s often desirable to be able to access command line arguments outside of main, and even to do
so before main. Examples include adding diagnostic information in a crash handler, some designs for a command line
argument parser, and cases where main is out of your control, a common case of which is testing frameworks. Currently
command line arguments are only available inside of main, requiring a programmer to manually pass arguments throughout
the program or create their own global storage for arguments. This can add clutter and introduce unnecessary
complexity, especially if argument handling doesn’t happen "close" to main. There is precedent from other languages for
this sort of capability: Python, Go, Rust, Swift, Ruby, C#, Haskell, Ada, and many others provide an interface for
accessing arguments from anywhere in a program. Additionally, many C++ frameworks make arguments available outside
main, such as Qt with QCoreApplication::arguments().
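The "own global storage" workaround mentioned above can be sketched roughly as follows. This is only an illustration of the status quo, not part of the proposal; the app namespace and the save_arguments, saved_arguments, and has_flag names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helpers for illustration only: hand-rolled global storage that
// main has to fill in before any other code can see the arguments.
namespace app {

inline std::vector<std::string_view>& saved_arguments() {
    static std::vector<std::string_view> args;
    return args;
}

// Must be called at the top of main; nothing enforces that, and code that
// runs before main (e.g. static initializers) sees an empty list.
inline void save_arguments(int argc, char** argv) {
    saved_arguments().assign(argv, argv + argc);
}

// Code far away from main can now query the arguments.
// Index 0 (the program name) is skipped.
inline bool has_flag(std::string_view flag) {
    const auto& args = saved_arguments();
    for (std::size_t i = 1; i < args.size(); ++i) {
        if (args[i] == flag) return true;
    }
    return false;
}

} // namespace app
```

Every program that wants this has to repeat the pattern, and it still does nothing for the encoding problem, since the stored views are whatever bytes argv happened to contain.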
Modernity: Passing arrays via a pointer and length argument is a very antiquated pattern rendered obsolete by modern
solutions such as std::span.
main is the one case where, if the programmer wants to utilize command line arguments,
separate pointer and length arguments are still a requirement. A modern signature for main that takes the arguments as
a single range of strings was previously rejected by the committee due to concerns surrounding complexity, overhead,
and encoding issues [P0781]. On top of new functionality and increased portability, a facility such as
std::arguments provides a modern C++ solution for accessing arguments. An important benefit of this interface is
teachability: currently main, if command line arguments are desired, requires an introduction to pointers relatively
early in education, as well as exposure to footguns and confusion about the difference between C strings and C++
strings. This adds steepness to an already hazardously steep learning curve.
3. Previous Straw Polls and Discussion
Early polling surrounding an alternative to argc/argv and a means of accessing arguments outside of main occurred
during discussion of [P0781]:
POLL: A trivial library solution for iterating parameters?
SF  F   N   A   SA
2   12  14  2   1
POLL: A non-main-based way of fetching command line arguments?
SF  F   N   A   SA
7   9   9   1   2
Polls on [P1275] by LEWGI:
POLL: We should promise more committee time to the std::arguments part.
Unanimous consent
Attendance: 11
POLL: std::arguments should be available before main.
Attendance: 11
SF  F   N   A   SA
6   0   3   1   0
Polls on [P1275] by SG16:
POLL: std::environments and std::arguments should follow the precedent set by std::filesystem::path.
Attendance: 14
SF  F   N   A   SA
4   6   1   0   2
POLL: std::environment and std::arguments should return a bag-o-bytes and conversion is up to the user.
Attendance: 14
SF  F   N   A   SA
3   4   2   1   2
Key concerns discussed included mutability of arguments, overhead of initializing data structures before main, and how
to handle different encodings.
4. Design
This paper introduces a global std::arguments object of type std::arguments_view and a header <arguments>.
in many ways mirrors the interface of a constant
. More specifically, excluding the
subview interface, modifiers,
, and
. This class is not copyable or movable and is intended to only be
constructed by the implementation.
std::arguments_view has a value_type of std::argument, which follows the precedent of std::filesystem::path in
providing observers that can convert between encodings. SG16 indicated a desire to follow the path precedent and there
are a lot of similarities between the two cases: Both can be encoded arbitrarily or even have no encoding - paths could
be any sequence of bytes and command line arguments can be too.
std::argument is, itself, just a view and requires no
extra allocation or overhead. Implementations may choose to, for example, cache the result of
but this can be
done lazily.
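As a non-normative illustration of the view-like, allocation-free design described above, consider the following minimal sketch. The demo::argument class, its constructor, and the choice of char as value_type are assumptions made for this example only; the proposal leaves value_type implementation-defined and does not specify how an argument is constructed.

```cpp
#include <string>
#include <string_view>

namespace demo {

// Hypothetical stand-in for the proposed class, assuming a platform where the
// native argument encoding uses char. It only stores a pointer, so copying
// it is cheap and no allocation happens unless a string is requested.
class argument {
public:
    using value_type = char;  // assumption: implementation-defined in the paper
    using string_type = std::basic_string<value_type>;
    using string_view_type = std::basic_string_view<value_type>;

    explicit argument(const value_type* s) noexcept : data_(s) {}

    // Non-owning view. Constructing the view measures the string length on
    // every call; an implementation could cache that result lazily.
    string_view_type native() const noexcept { return string_view_type(data_); }

    // Owning copy; this is the only observer here that allocates.
    string_type native_string() const { return string_type(data_); }

    const value_type* c_str() const noexcept { return data_; }

    explicit operator string_type() const { return native_string(); }
    explicit operator string_view_type() const noexcept { return native(); }

private:
    const value_type* data_;  // points into storage owned by the environment
};

} // namespace demo
```

The converting observers (string(), u8string(), and so on) are deliberately left out of this sketch because they depend on the native encoding.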
While it is not uncommon practice to modify the contents of argv, std::arguments is entirely read-only in
order to not introduce dangers surrounding global mutable state. Whether changes made to argv are reflected in
std::arguments is left to the implementation.
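The read-only, non-copyable shape of the view can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: demo::arguments_view and its public constructor are hypothetical (the paper intends construction by the implementation only), std::string_view stands in for std::argument, and the iterator interface is omitted.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

namespace demo {

// Hypothetical read-only view over an existing argv array.
class arguments_view {
public:
    using value_type = std::string_view;  // stand-in for std::argument
    using size_type = std::size_t;
    using reference = value_type;         // elements are returned by value

    arguments_view(int argc, char* const* argv) noexcept
        : argv_(argv), size_(argc > 0 ? static_cast<size_type>(argc) : 0) {}

    // Not copyable, mirroring the proposed class.
    arguments_view(const arguments_view&) = delete;
    arguments_view& operator=(const arguments_view&) = delete;

    // Unchecked access: the caller must ensure index < size().
    reference operator[](size_type index) const noexcept { return argv_[index]; }

    // Checked access: throws std::out_of_range for an invalid index.
    reference at(size_type index) const {
        if (index >= size_) throw std::out_of_range("arguments_view::at");
        return (*this)[index];
    }

    size_type size() const noexcept { return size_; }
    bool empty() const noexcept { return size_ == 0; }

private:
    char* const* argv_;  // the view never writes through this pointer
    size_type size_;
};

} // namespace demo
```

Because no member function is a modifier and elements are handed out by value, there is no way to mutate the arguments through the view, which is the property the design aims for.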
4.1. Future Interface Expansion
Author’s note: While most large applications should probably use a library for argument parsing, it is my hope that in
the case of more ad-hoc argument parsing it would be possible to portably write a check such as
or
. Another helpful operation would be
. Unfortunately, encoding makes it challenging to do operations such as this portably.
Because encoding will vary between systems and std::argument::value_type is implementation-defined, currently the only way to do this
would involve the overhead of creating a string for a given encoding or an ugly macro to create a platform-dependent
string literal:
    // The overhead here is unfortunate but OK for 99% of uses
    if (std::arguments.at(1).string() == "--help") {
        // ...
    }

    // or:
    #ifdef _WIN32
    #define ARG(str) L##str
    #else
    #define ARG(str) str
    #endif
    if (std::arguments.at(1).native() == ARG("--help")) {
        // ...
    }
A UDL could also be considered, however, this is a more general problem that, in the author’s opinion, should be addressed directly rather than through a bespoke solution. The problem of operations between strings of different encodings would best be tackled in another paper.
5. Implementability
On Windows, std::arguments could be implemented with GetCommandLineW or __wargv and __argc. On mac, _NSGetArgv and
_NSGetArgc
could be used. Implementation on Linux and other Unixes is more challenging as there is currently no
means at all to access argc and argv outside of or before main. Implementation here would probably require a
modification to libc to make argc and argv available similar to where
is set in
. [N2948] offered a reference implementation for this mechanism in a private glibc fork. Alternatively, if a change to libc imposes a substantial burden, an
implementation could save argc and argv in the program entry point.
6. Proposed Wording
Wording is relative to [N4950] and borrows extensively from existing wording.
Insert into [headers] table 24:
<arguments>
Insert a new section [arguments]:
Header <arguments> synopsis [arguments.syn]
    namespace std {
      class arguments_view;
      class argument;

      // [arguments.access] arguments access
      const arguments_view& arguments();
    }
Arguments access [arguments.access]
const arguments_view& arguments();
Returns: A reference to an arguments_view object.
Throws: May throw
.
Class arguments_view [arguments.view]
An arguments_view provides a random access interface for accessing arguments passed to the program.
All member functions of arguments_view have constant time complexity.
    namespace std {
      class arguments_view {
      public:
        using value_type = argument;
        using size_type = size_t;
        using difference_type = ptrdiff_t;
        using reference = value_type;
        using const_reference = value_type;
        using const_iterator = /* implementation-defined */; // see [arguments.view.iterators]
        using iterator = const_iterator;
        using const_reverse_iterator = std::reverse_iterator<const_iterator>;
        using reverse_iterator = const_reverse_iterator;

        arguments_view(const arguments_view&) = delete;
        arguments_view& operator=(const arguments_view&) = delete;

        // [arguments.view.access], access
        reference operator[](size_type index) const noexcept;
        reference at(size_type index) const;

        // [arguments.view.obs], observers
        size_type size() const noexcept;
        bool empty() const noexcept;

        // [arguments.view.iterators], iterators
        const_iterator begin() const noexcept;
        const_iterator end() const noexcept;
        const_iterator cbegin() const noexcept;
        const_iterator cend() const noexcept;
        const_reverse_iterator rbegin() const noexcept;
        const_reverse_iterator rend() const noexcept;
        const_reverse_iterator crbegin() const noexcept;
        const_reverse_iterator crend() const noexcept;
      };
    }
Access [arguments.view.access]
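reference operator[](size_type index) const noexcept;
Preconditions: index < size() is true.
Returns: The argument at index index passed into the program from the environment. It is implementation-defined
whether, in a main function with signature int main(int argc, char* argv[]), any modifications to argv are reflected
by arguments_view.
Throws: Nothing.
reference at(size_type index) const;
Effects: Equivalent to: return (*this)[index]; if index < size() is true.
Throws: out_of_range if index >= size() is true.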
Observers [arguments.view.obs]
size_type size() const noexcept;
Returns: The number of program arguments.
bool empty() const noexcept;
Effects: Equivalent to: return size() == 0;
Iterators [arguments.view.iterators]
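using const_iterator = /* implementation-defined */;
The type models a constant random_access_iterator ([iterator.concept.random.access]). Its value type is value_type
and its reference type is reference.
All requirements on container iterators ([container.reqmts]) apply to arguments_view::iterator as well.
const_iterator begin() const noexcept;
const_iterator cbegin() const noexcept;
Returns: An iterator referring to the first program argument. If empty() is true, then it returns the same value
as end().
const_iterator end() const noexcept;
const_iterator cend() const noexcept;
Returns: An iterator which is the past-the-end value.
const_reverse_iterator rbegin() const noexcept;
const_reverse_iterator crbegin() const noexcept;
Effects: Equivalent to: return const_reverse_iterator(end());
const_reverse_iterator rend() const noexcept;
const_reverse_iterator crend() const noexcept;
Effects: Equivalent to: return const_reverse_iterator(begin());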
Class argument [arguments.argument]
An object of class argument is a view of a character string argument passed to the program in an operating
system-dependent format.
It is implementation-defined whether, in a main function with signature int main(int argc, char* argv[]), any
modifications to argv are reflected by an argument.
    namespace std {
      class argument {
      public:
        using value_type = /* see below */;
        using string_type = basic_string<value_type>;
        using string_view_type = basic_string_view<value_type>;

        argument(const argument&) noexcept = default;
        argument& operator=(const argument&) noexcept = default;

        // [arguments.argument.native], native observers
        const string_view_type native() const noexcept;
        const string_type native_string() const;
        const value_type* c_str() const noexcept;
        explicit operator string_type() const;
        explicit operator string_view_type() const noexcept;

        // [arguments.argument.obs], converting observers
        template<class EcharT, class traits = char_traits<EcharT>,
                 class Allocator = allocator<EcharT>>
          basic_string<EcharT, traits, Allocator>
            string(const Allocator& a = Allocator()) const;
        std::string string() const;
        std::wstring wstring() const;
        std::u8string u8string() const;
        std::u16string u16string() const;
        std::u32string u32string() const;

        // [arguments.argument.compare], comparison
        friend bool operator==(const argument& lhs, const argument& rhs) noexcept;
        friend strong_ordering operator<=>(const argument& lhs, const argument& rhs) noexcept;

        // [arguments.argument.ins], inserter
        template<class charT, class traits>
          friend basic_ostream<charT, traits>&
            operator<<(basic_ostream<charT, traits>& os, const argument& a);
      };

      // [arguments.argument.fmt], formatter
      template<typename charT>
      struct formatter<argument, charT> : formatter<argument::string_view_type, charT> {
        template<class FormatContext>
          typename FormatContext::iterator
            format(const argument& argument, FormatContext& ctx) const;
      };
    }
Conversion [arguments.argument.cvt]
The native encoding of an ordinary character string is the operating system dependent current encoding for arguments. The native encoding for wide character strings is the implementation-defined execution wide-character set encoding ([character.seq]).
For member functions returning strings, value type and encoding conversion is performed if the value type of the
argument or return value differs from argument::value_type. For the return value, the method of conversion and the
encoding to be converted to is determined by its value type:
- char: The encoding is the native ordinary encoding. The method of conversion, if any, is operating system dependent.
- wchar_t: The encoding is the native wide encoding. The method of conversion is unspecified.
- char8_t: The encoding is UTF-8. The method of conversion is unspecified.
- char16_t: The encoding is UTF-16. The method of conversion is unspecified.
- char32_t: The encoding is UTF-32. The method of conversion is unspecified.
If the encoding being converted to has no representation for source characters, the resulting converted characters, if any, are unspecified.
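As a non-normative illustration of the kind of conversion a converting observer such as u32string() might perform, the sketch below decodes a UTF-8 byte sequence into UTF-32. It assumes the native encoding is UTF-8, which the wording does not guarantee, and it substitutes U+FFFD for malformed input, which is only one possible choice since the wording leaves the method of conversion unspecified. The utf8_to_utf32 function is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Each malformed byte is replaced by U+FFFD (an assumption of this sketch).
std::u32string utf8_to_utf32(std::string_view in) {
    // Smallest code point that legitimately needs a sequence of each length;
    // anything below it is an overlong encoding.
    static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)              { len = 1; cp = lead; }
        else if ((lead >> 5) == 0x06) { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; }

        bool ok = len != 0 && i + len <= in.size();
        // Fold in the continuation bytes, each of the form 10xxxxxx.
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (ok && (cp < min_for_len[len] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF))) {
            ok = false;
        }

        if (ok) {
            out.push_back(cp);
            i += len;
        } else {
            out.push_back(U'\uFFFD');
            i += 1;  // resynchronize on the next byte
        }
    }
    return out;
}
```

On platforms where the native encoding is not UTF-8, the decoding step would differ, which is why the wording ties the char case to the operating system.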
Native Observers [arguments.argument.native]
The string returned by all native observers is in the native default argument encoding ([arguments.argument.cvt]).
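const string_view_type native() const noexcept;
Returns: A string_view_type representing the argument.
const string_type native_string() const;
Returns: A string_type representing the argument.
const value_type* c_str() const noexcept;
Returns: A pointer to a null-terminated array of value_type representing the argument.
explicit operator string_type() const;
Returns: A string_type representing the argument.
explicit operator string_view_type() const noexcept;
Returns: A string_view_type representing the argument.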
Converting Observers [arguments.argument.obs]
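template<class EcharT, class traits = char_traits<EcharT>, class Allocator = allocator<EcharT>>
  basic_string<EcharT, traits, Allocator> string(const Allocator& a = Allocator()) const;
Returns: A string representing the argument.
Remarks: All memory allocation, including for the return value, shall be performed by the allocator a. Conversion, if any, is specified by [arguments.argument.cvt].
std::string string() const;
std::wstring wstring() const;
std::u8string u8string() const;
std::u16string u16string() const;
std::u32string u32string() const;
Returns: A string representing the argument.
Remarks: Conversion, if any, is specified by [arguments.argument.cvt].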
Comparison [arguments.argument.compare]
Effects: Equivalent to:
.
Effects: Equivalent to:
.
Inserter [arguments.argument.ins]
Effects: Equivalent to:
.
Formatter [arguments.argument.fmt]
Effects: Equivalent to:
.