1. Credits
std::arguments was initially proposed by Izzy Muerte in [P1275]. Corentin Jabot and Aaron Ballman also proposed
an interface for accessing command line arguments outside main
to WG14 in [N2948]. This paper borrows wording,
design elements, and good ideas from both.
2. Revision History
2.1. R1
Incorporated feedback from LEWGI regarding interface, wording, bags of bytes, bags of code points, etc.
- Added path accessor
- Added a means to construct from a user-specified list of arguments (useful for testing, etc.)
- Added feature test macro
- Added reply-to email
- Fixed a handful of problems with paper consistency and wording
3. Introduction
This paper aims to solve two main problems:
- Encoding and portability problems with command line arguments
- int argc, char** argv isn’t a modern-C++ way of representing this information and is not friendly for novices
Encoding: The only standard means for accessing command-line arguments in C++ is via main(int argc, char** argv). This is a staple of C and C++; however, it’s not well-suited for portable applications because the encoding of argv varies from system to system [What is the encoding of argv?].
On Windows, the native encoding is UTF-16 and it’s recommended to use the wide-character entry points instead of main for portable code. In order to support the standard main, UTF-16 arguments must be converted using legacy Windows code pages. Even on Unix-based systems the encoding of argv is not always clear.
Modernity: Passing arrays via separate pointer and length arguments is an antiquated pattern rendered obsolete by modern C++ facilities. main is one of the last places in C++ where separate pointer and length arguments are still needed. A modern signature for main was previously rejected by the committee due to concerns surrounding complexity, overhead, and encoding issues [P0781]. An important benefit of a modern interface is teachability: currently, command line arguments require introducing pointers relatively early in education and expose learners to pointer-related footguns and to confusion about the difference between C strings and C++ strings. This adds pitch to an already hazardously steep learning curve.
Tackling both of these problems requires a solution independent of main, as changes to main’s signature have previously been rejected. As such, this paper proposes a std::arguments function which returns a span of objects that can be used for accessing command line arguments.
While it’s not a primary goal of this paper, this interface also provides some additional helpful functionality:
Access outside main: In some cases it may be desirable to access command line arguments outside of main, and even to do so before main. Some examples could include:
- Logging diagnostic information in a crash handler
- Some designs for a command line argument parser
- Diagnostic libraries
- Access to arguments from pre-loadable libraries
Currently, command line arguments are only available inside of main, which requires a programmer to manually pass this information throughout the program or create their own global storage for arguments. This can add clutter and introduce unnecessary complexity, especially if argument handling doesn’t happen "close" to main. There is precedent in other languages for global access: Python, Go, Rust, Swift, Ruby, C#, Haskell, Ada, and many others provide an interface for accessing arguments from anywhere in a program. Additionally, many C++ frameworks, such as Qt, make arguments available outside main.
4. Previous Straw Polls and Discussion
Early polling surrounding an alternative to argc/argv and a means of accessing arguments outside of main occurred during discussion of [P0781]:
POLL: A trivial library solution for iterating parameters?
SF: 2, F: 12, N: 14, A: 2, SA: 1
POLL: A non-main-based way of fetching command line arguments?
SF: 7, F: 9, N: 9, A: 1, SA: 2
Polls on [P1275] by LEWGI:
POLL: We should promise more committee time to the std::arguments part.
Unanimous consent
Attendance: 11
POLL: std::arguments should be available before main.
Attendance: 11
SF: 6, F: 0, N: 3, A: 1, SA: 0
Polls on [P1275] by SG16:
POLL: std::arguments and std::environment should follow the precedent set by std::filesystem::path.
Attendance: 14
SF: 4, F: 6, N: 1, A: 0, SA: 2
POLL: std::arguments and std::environment should return a bag-o-bytes and conversion is up to the user.
Attendance: 14
SF: 3, F: 4, N: 2, A: 1, SA: 2
Key concerns discussed included mutability of arguments, overhead of initializing data structures before main, and how to handle different encodings.
LEWGI discussion on P3474R0:
- Continued support for bag of bytes with conversion up to user
- Easily accessible conversions are very important; it’s easy to make errors
- Support for user-provided arguments through this interface
- Suggestion for path accessors
- Some discussion of low- and zero-overhead approaches for this
5. Implementability
On Windows, command line arguments can be accessed through a Win32 API function that returns the entire command line as a single string, which must then be tokenized. The Windows CRT calls it during startup to populate argv for main. The Windows CRT also provides narrow and wide global argument variables but only populates one of them, depending on the program’s entry point.
Additionally, neither might be populated if command line parsing is disabled via options tailored to applications trying to minimize startup time.
On macOS, two accessor functions provided by the system can be used to access argc and argv outside of main. Both are trivial functions that don’t allocate.
Implementation on other Unix-based systems is more challenging. There are five options:
1. Modify libc to store argc and argv globally, e.g. __argc and __argv, similar to __environ (reference implementation for this from [N2948]).
2. Alternatively, store argc and argv from the program’s entry point. This would only require compiler support instead of a libc change.
3. Use __dl_argv, which exists in glibc. Unfortunately, absent a glibc change, looping through __dl_argv would be needed to determine argc, as __dl_argc is hidden.
4. Read from and tokenize /proc/self/cmdline (this has length limitations).
5. Use the argc and argv that glibc passes to entries in the .init_array.
Approaches 2-4 are undesirable for various reasons. Approach 5 works on glibc but not necessarily on other libcs, and there are also implications with shared libraries. Approach 1 is the most comprehensive but requires a paper.
6. Proposed Design
This paper proposes a function std::arguments, a class std::argument, and a header <arguments>.
std::arguments returns a span of std::argument objects corresponding to the program’s command line arguments.
std::argument mirrors the design of std::filesystem::path by providing observers that can convert to desired encodings. SG16 previously indicated a desire to follow the precedent of std::filesystem::path. Both paths and arguments can be encoded arbitrarily or even have no encoding; paths could be any sequence of bytes and command line arguments can be too.
std::argument may be a view of a string or may own an allocation.
While it is not uncommon practice to modify the contents of argv, std::arguments returns a read-only span in order to not introduce dangers surrounding global mutable state. Whether changes made to argv in main are reflected in std::arguments is implementation-defined.
6.1. Design Considerations
The main design considerations come down to allocation, when potential tokenization or other argument preprocessing happens, and whether modifications to argv in main are reflected in std::arguments.
Reflecting argv modifications from main: It is desirable for std::arguments to contain the same values throughout the lifetime of a program and to not reflect changes to argv in main. Unfortunately, this would require allocation and copying on some systems. On Unix-based systems all means to access argv will reflect changes to argv in main. Discussion on [P1275] and [P0781] made clear that any overhead before main in the case of programs that don’t use std::arguments is unacceptable. Unfortunately, capturing the arguments with a static initializer isn’t an option because shared libraries are not necessarily loaded before main. Additionally, this would translate to overhead before main that is not pay-for-what-you-use. Due to implementation challenges, this paper leaves behavior implementation-defined in the case of argv being modified in main.
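The tradeoff can be illustrated with two hypothetical representations, with names invented for this sketch: a live view over argv, which costs nothing up front but observes changes made in main, and a snapshot that copies every argument into owned strings, which stays stable but allocates. A real implementation would have to take such a snapshot before main runs, which is the pre-main overhead described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical live view: stores only argc and argv, so every access re-reads
// argv and observes any modification made after the view was created.
struct argv_view {
    int argc;
    char** argv;

    std::string_view at(int i) const {
        assert(i >= 0 && i < argc);
        return argv[i];
    }
};

// Hypothetical snapshot: copies every argument into owned strings. Later
// modifications are not observed, but the copy costs allocations (the vector,
// plus typically any argument too long for the small-string buffer).
std::vector<std::string> snapshot(int argc, char** argv) {
    std::vector<std::string> copy;
    copy.reserve(static_cast<std::size_t>(argc));
    for (int i = 0; i < argc; ++i) {
        copy.emplace_back(argv[i]);
    }
    return copy;
}
```

If main overwrites characters of argv[1] in place, argv_view::at(1) returns the modified text while the snapshot still holds the original.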
Saving strlen: On Unix-based systems, producing string views for arguments will involve a strlen. It may be desirable to save the result of this computation; however, the issue of modification mostly rules this out. While the storage for the arguments from the system will always be there, the pointers in argv could be modified, and detecting this would be complicated, involve overhead, or in general may be impossible. Because of this, every access of an argument string view will require a strlen unless the implementation makes copies of argv’s string entries. It would likely be undesirable to make it undefined behavior to use std::arguments after modifications in main, so this paper leaves the possibility of a strlen cost open.
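A hypothetical sketch of why the length cannot simply be cached: cached_argument computes the length once, while lazy_argument refers to a slot in argv and recomputes the length on every access. Both names are invented for this illustration. If main stores a different pointer into argv[1], only the lazy form follows the change.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical: computes the length once, at construction, and caches it.
struct cached_argument {
    std::string_view view;
    explicit cached_argument(const char* s) : view(s) {}
    std::string_view native() const noexcept { return view; }
};

// Hypothetical: refers to a slot in argv and recomputes the length (strlen)
// on every access, so it follows any pointer main later stores in that slot.
struct lazy_argument {
    char* const* slot;
    std::string_view native() const noexcept {
        assert(*slot != nullptr);
        return *slot;
    }
};
```

The lazy form is what keeps modified arguments visible without copying, at the price of a strlen per access.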
Preprocessing: On Windows, the command line is obtained as a single string which needs to be split into individual arguments. It may be desirable in some use-cases to only split this string lazily, with an input-iterator interface for arguments. This paper does not suggest any design constrained to input iteration, though, as most uses will want more general access and iteration abilities and will require having tokenized all arguments anyway, whether by looping through all the arguments or even just by looking at the argument count.
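For illustration, here is a simplified sketch of eager tokenization following the core rules documented for the Microsoft C runtime: arguments are separated by spaces or tabs, double quotes group text, 2n backslashes before a quote produce n backslashes and toggle quoting, 2n+1 backslashes before a quote produce n backslashes and a literal quote, and other backslashes are literal. It uses narrow strings so it runs anywhere and deliberately omits the special rules for the program name and for "" inside a quoted region; split_command_line is a hypothetical name, not a Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: a simplified version of the Microsoft C runtime's
// documented splitting rules. Omits the special rules for the program name
// and for "" inside a quoted region. Narrow strings are used for portability.
std::vector<std::string> split_command_line(std::string_view cmd) {
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;
    bool have_arg = false;  // distinguishes an empty "" argument from no argument
    std::size_t i = 0;
    while (i < cmd.size()) {
        const char c = cmd[i];
        if (c == '\\') {
            std::size_t n = 0;
            while (i < cmd.size() && cmd[i] == '\\') {
                ++n;
                ++i;
            }
            if (i < cmd.size() && cmd[i] == '"') {
                current.append(n / 2, '\\');  // 2n or 2n+1 backslashes -> n
                if (n % 2 == 1) {
                    current.push_back('"');  // odd count: literal quote
                    ++i;
                }  // even count: the quote toggles quoting on the next iteration
            } else {
                current.append(n, '\\');  // not followed by a quote: literal
            }
            have_arg = true;
        } else if (c == '"') {
            in_quotes = !in_quotes;
            have_arg = true;
            ++i;
        } else if ((c == ' ' || c == '\t') && !in_quotes) {
            if (have_arg) {
                args.push_back(std::move(current));
                current.clear();
                have_arg = false;
            }
            ++i;
        } else {
            current.push_back(c);
            have_arg = true;
            ++i;
        }
    }
    if (have_arg) {
        args.push_back(std::move(current));
    }
    return args;
}
```

For example, the command line a\\\"b c d splits into a\"b, c, and d. Even computing the argument count requires a full pass like this one, which is why a lazy, input-iterator-only design buys little.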
Backing storage for std::argument objects: On Unix-based systems it would be simple for std::arguments to not involve any allocation and simply provide iterators over argv that dereference to ephemeral std::argument objects. Unfortunately, this would prevent the iterator from satisfying the Cpp17RandomAccessIterator requirements (Cpp17ForwardIterator already requires dereferencing to yield a true reference rather than a temporary) and the container requirements, and it may be error-prone when trying to store a reference to a std::argument. The proposed interface, which returns a span, therefore requires backing storage.
Global singleton, a function returning a span, or construction: TODO
std::arguments could be implemented as a global singleton, a function returning a reference to a singleton, or an object that the user constructs. While an object the user constructs potentially results in allocation at multiple points in a program, as well as possibly seeing different values if argv is modified in main, it’s also desirable to allow the allocation to be cleaned up. As such, this paper proposes a class which may perform allocation and various preprocessing at construction.
Globs and the first argument: On Unix-based systems glob expansion is done by the shell. On Windows it is done neither by the shell nor, by default, by the Windows CRT. This paper proposes that std::arguments should correspond directly to argv in main without any additional glob expansion. This paper also does not propose any special handling for the first entry of argv.
Comparison with other performance-oriented languages: Rust’s std::env::args() function creates an object whose construction involves creating a vector of strings in the OS native encoding, copying from argv on Unix-based systems and tokenizing on Windows. Rust accesses argc and argv on most Unix-based systems by placing an initializer in the .init_array. Rust doesn’t have to worry about modification of argv in main.
Because the design of this library feature involves a lot of tradeoffs, it is the goal of this paper to offer as much implementation flexibility as possible.
7. Ergonomics
While most large applications should probably use a library for argument parsing, it is my hope that in the case of more ad-hoc argument parsing it would be possible to portably write a simple check, such as comparing an argument against "--help". Unfortunately, argument encoding makes these operations challenging to do portably.
Because encoding will vary between systems and std::argument::value_type is implementation-defined, currently the only way to do this would involve the overhead of creating a string for a given encoding or an ugly macro to create a platform-dependent string literal:
// The overhead here is unfortunate but OK for 99% of uses
if (std::arguments().at(1).string() == "--help") {
    // ...
}
Or, avoiding the overhead with a platform-dependent macro:
#ifdef _WIN32
#define ARG(str) L##str
#else
#define ARG(str) str
#endif

if (std::arguments().at(1).native() == ARG("--help")) {
    // ...
}
A UDL could also be considered; however, this is a more general problem that, in the author’s opinion, should be addressed directly rather than through a bespoke solution. The problem of operations between strings of different encodings would best be tackled in another paper.
Alternatively, since this paper targets C++29, perhaps the transcoding facilities of [P2728] will solve these problems:
if ((std::arguments().at(1) | std::uc::to_utf8) == u8"--help") {
    // ...
}
If transcoding is seen as desirable here, std::argument should provide some helper to do the system-encoding-to-UTF-N conversion.
8. Bikeshedding
This paper uses the std::arguments naming from [P1275]; however, the name is subject to bikeshedding. One point brought up on the mailing list was that std::arguments is a very generic name and it might be desirable to reserve it for future use. Some names that could be considered instead include:
- std::program_arguments
- std::command_line
- std::command_line::arguments
- std::program_options
- std::argv
- std::process::arguments
Naming in other notable languages:
- Python: sys.argv
- Go: os.Args
- Rust: std::env::args()
- Swift: CommandLine.arguments
- Ruby: ARGV
- C#: Environment.GetCommandLineArgs()
- Haskell: getArgs
- Ada: Ada.Command_Line.Argument
In a very informal approval-voting-style poll on the Together C & C++ Discord server (participants were asked to vote for all options they found appealing), members showed a strong preference for two of the candidate names, which received eight and 17 votes. Other options had no more than two votes. N.b.: The last option, std::process::arguments, came up after the poll was started and thus wasn’t captured in the poll.
9. Reference Implementation
TODO: Update for R1
A reference implementation / proof of concept is at https://github.com/jeremy-rifkin/arguments.
10. Proposed Wording
Wording is relative to [N4950] and borrows extensively from existing wording.
Insert into [headers] table 24:
<arguments>
Insert into [version.syn]:
#define __cpp_lib_arguments 20????L // freestanding, also in <arguments>
Insert a new section [arguments]:
Header <arguments> synopsis [arguments.syn]
namespace std {
  class argument;

  span<const argument> arguments();
  template<class Allocator = allocator<argument>>
    span<const argument> arguments(const Allocator&);
}
Function arguments
[arguments.arguments]
The function and function template std::arguments return read-only spans of std::argument objects corresponding to arguments passed to the program.
span<const argument> arguments();
Effects: Returns a span of argument objects representing the program’s arguments.
Throws: May throw if allocation throws.
template<class Allocator = allocator<argument>>
  span<const argument> arguments(const Allocator& a);
Effects: Returns a span of argument objects representing the program’s arguments.
Throws: May throw if the allocator throws.
Class argument
[arguments.argument]
An object of class argument is a view of a character string argument passed to the program in an operating-system-dependent format.
It is implementation-defined whether, in a function with signature int main(int argc, char* argv[]), any modifications to argv are reflected by an argument.
namespace std {
  class argument {
  public:
    using value_type = /* see below */;
    using string_type = basic_string<value_type>;
    using string_view_type = basic_string_view<value_type>;

    // [arguments.argument.native], native observers
    const string_view_type native() const noexcept;
    const string_type native_string() const;
    const value_type* c_str() const noexcept;
    explicit operator string_type() const;
    explicit operator string_view_type() const noexcept;

    // [arguments.argument.obs], converting observers
    template<class EcharT, class traits = char_traits<EcharT>,
             class Allocator = allocator<EcharT>>
      basic_string<EcharT, traits, Allocator>
        string(const Allocator& a = Allocator()) const;
    std::string string() const;
    std::wstring wstring() const;
    std::u8string u8string() const;
    std::u16string u16string() const;
    std::u32string u32string() const;
    filesystem::path path() const;

    // [arguments.argument.compare], comparison
    friend bool operator==(const argument& lhs, const argument& rhs) noexcept;
    friend strong_ordering operator<=>(const argument& lhs, const argument& rhs) noexcept;

    // [arguments.argument.ins], inserter
    template<class charT, class traits>
      friend basic_ostream<charT, traits>&
        operator<<(basic_ostream<charT, traits>& os, const argument& a);
  };

  // [arguments.argument.fmt], formatter
  template<typename charT>
  struct formatter<argument, charT> : formatter<argument::string_view_type, charT> {
    template<class FormatContext>
      typename FormatContext::iterator
        format(const argument& arg, FormatContext& ctx) const;
  };
}
Conversion [arguments.argument.cvt]
The native encoding of an ordinary character string is the operating system dependent current encoding for arguments. The native encoding for wide character strings is the implementation-defined execution wide-character set encoding ([character.seq]).
For member functions returning strings, value type and encoding conversion is performed if the value type of the argument or return value differs from argument::value_type. For the return value, the method of conversion and the encoding to be converted to are determined by its value type:
- char: The encoding is the native ordinary encoding. The method of conversion, if any, is operating system dependent.
- wchar_t: The encoding is the native wide encoding. The method of conversion is unspecified.
- char8_t: The encoding is UTF-8. The method of conversion is unspecified.
- char16_t: The encoding is UTF-16. The method of conversion is unspecified.
- char32_t: The encoding is UTF-32. The method of conversion is unspecified.
If the encoding being converted to has no representation for source characters, the resulting converted characters, if any, are unspecified.
Native Observers [arguments.argument.native]
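The string returned by all native observers is in the native default argument encoding ([arguments.argument.cvt]).
const string_view_type native() const noexcept;
Returns: A string_view_type representing the argument.
const string_type native_string() const;
Returns: A string_type representing the argument.
const value_type* c_str() const noexcept;
Returns: A pointer to a null-terminated array of value_type representing the argument.
explicit operator string_type() const;
Returns: A string_type representing the argument.
explicit operator string_view_type() const noexcept;
Returns: A string_view_type representing the argument.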
Converting Observers [arguments.argument.obs]
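template<class EcharT, class traits = char_traits<EcharT>, class Allocator = allocator<EcharT>>
  basic_string<EcharT, traits, Allocator> string(const Allocator& a = Allocator()) const;
Returns: A string representing the argument.
Remarks: All memory allocation, including for the return value, shall be performed by a. Conversion, if any, is specified by [arguments.argument.cvt].
std::string string() const;
std::wstring wstring() const;
std::u8string u8string() const;
std::u16string u16string() const;
std::u32string u32string() const;
Returns: A string representing the argument.
Remarks: Conversion, if any, is specified by [arguments.argument.cvt].
filesystem::path path() const;
Returns: A filesystem::path corresponding to the argument.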
Comparison [arguments.argument.compare]
Effects: Equivalent to:
.
Effects: Equivalent to:
.
Inserter [arguments.argument.ins]
Effects: Equivalent to:
.
Formatter [arguments.argument.fmt]
Effects: Equivalent to:
.