Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make each .cpp/.h file self-sufficient in #includes #72

Merged
merged 9 commits into from
Jun 30, 2024

Conversation

generalmimon
Copy link
Member

Until now, some parts of the code used symbols from the standard library without including the corresponding standard library header using #include to ensure that the symbol is defined. That is, they silently assumed that the header has been already included by "someone else".

These assumptions were sometimes met in our code due to transitive inclusions. For example, the missing #include <stdexcept> in tests/unittest.cpp would not turn into a compile error, because tests/unittest.cpp included kaitai/exceptions.h, which contains #include <stdexcept>. However, it is discouraged to rely on transitive inclusions, see https://google.github.io/styleguide/cppguide.html#Include_What_You_Use:

Do not rely on transitive inclusions. This allows people to remove no-longer-needed #include statements from their headers without breaking clients. This also applies to related headers - foo.cc should include bar.h if it uses a symbol from it even if foo.h includes bar.h.

Instances of this problem in our code aren't limited to "maintainability issues" - they can also reduce portability. For instance, @roelschroeven apparently ran into a strtoll not found error in our kaitai/kaitaistream.cpp source file that he had to patch like this (see roelschroeven@db668fa):

diff --git a/kaitai/kaitaistream.cpp b/kaitai/kaitaistream.cpp
index a3810ab..4d5ded2 100644
--- a/kaitai/kaitaistream.cpp
+++ b/kaitai/kaitaistream.cpp
@@ -6,6 +6,7 @@
 #include <iostream>
 #include <vector>
 #include <stdexcept>
+#include <cstdlib>

 // ========================================================================
 // Integer from raw data with correct endianness
@@ -585,7 +586,7 @@ int64_t kaitai::kstream::string_to_int(const std::string& str, int base) {
     char *str_end;

     errno = 0;
-    int64_t res = strtoll(str.c_str(), &str_end, base);
+    int64_t res = std::strtoll(str.c_str(), &str_end, base);

     // Check for successful conversion and throw an exception if the entire string was not converted
     if (str_end != str.c_str() + str.size()) {

This makes sense according to https://en.cppreference.com/w/cpp/string/byte/strtol - we shouldn't assume that std::strtoll exists when we don't have #include <cstdlib> anywhere in our code. It worked fine without it in CI and on my local gcc 12.3.0 only because some standard library header that we include happened to use <cstdlib> as part of its implementation. For curiosity, I used this command locally to dump a tree that shows why each header was #included:

c++ -DKS_STR_ENCODING_ICONV -DKS_ZLIB -I. -std=c++11 -H -MM ./kaitai/kaitaistream.cpp 2> kaitaistream-deps.log

So for example, we can see that on the gcc 12.3, the <algorithm> header includes <cstdlib> for some reason, so that's one of the reasons there's no compile error if we forget #include <cstdlib> in kaitaistream.cpp:

. /usr/include/c++/12/algorithm
.. /usr/include/c++/12/bits/stl_algo.h
... /usr/include/c++/12/cstdlib

This is completely implementation-specific, of course. There's nothing in the C++ standard that requires <algorithm> to include <cstdlib>, and in other implementations this may not be the case at all.

See also https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf10-avoid-dependencies-on-implicitly-included-names

@GreyCat
Copy link
Member

GreyCat commented Apr 30, 2024

Very generally, I believe you're actually touching 2 connected, but separate problems here:

  • Unoptimized / incomplete inclusion of headers
  • Usage of C functions (strtoll) in C++ code, where C++ equivalents exist (std::strtoll)

I don't mind either of these changes, but what I ideally want us to have is reproduction of any kind of problem in CI. I wonder if @roelschroeven can share details of their particular compilation which runs into problems, so we can add it here?

generalmimon added a commit that referenced this pull request Apr 30, 2024
See #72 (comment)

The standard library header `<type_traits>` is only available since
C++11 (see https://en.cppreference.com/w/cpp/header/type_traits), so we
must not try to include it in C++98 mode.
@generalmimon
Copy link
Member Author

generalmimon commented Apr 30, 2024

@GreyCat:

I don't mind either of these changes, but what I ideally want us to have is reproduction of any kind of problem in CI.

I agree. I don't think we can realistically get our hands on a set of compiler/platform combinations in CI that would be able to reliably detect whether our set of #includes is sufficient (what I mean by this is that adding @roelschroeven's environment helps a bit, but is not a general solution for missing #includes), but we can get https://github.com/include-what-you-use/include-what-you-use running in CI. This seems to be just the tool we need.

After installing it using sudo apt install iwyu (see https://packages.ubuntu.com/jammy/iwyu), it didn't quite work yet (I was encountering include-what-you-use/include-what-you-use#679), but after resolving this issue with sudo apt install clang-13, this is the output I'm getting locally at 69cb210 (i.e. after this PR's fixes):

pp@DESKTOP-89OPGF3:/mnt/c/temp/kaitai_struct/runtime/cpp_stl$ include-what-you-use -Xiwyu --verbose=3 -I. -DKS_STR_ENCODING_ICONV -DKS_ZLIB -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Werror ./kaitai/kaitaistream.cpp

kaitai/kaitaistream.h should add these lines:

kaitai/kaitaistream.h should remove these lines:
- #include <istream>  // lines 15-15
- #include <sstream>  // lines 17-17

The full include-list for kaitai/kaitaistream.h:
#include <stdint.h>     // for uint64_t, uint8_t, int64_t, int16_t, int32_t, uint16_t, uint32_t, int8_t
#include <ios>          // for istream, istringstream, streamsize
#include <limits>       // for numeric_limits
#include <string>       // for string
#include <type_traits>  // for enable_if, is_integral
---
kaitai/kaitaistream.cpp:553:42: warning: Bytef is defined in <zconf.h>, which isn't directly #included.

kaitai/kaitaistream.cpp should add these lines:
#include <zconf.h>              // for Bytef

kaitai/kaitaistream.cpp should remove these lines:
- #include <cstddef>  // lines 39-39
- #include <sstream>  // lines 43-43

The full include-list for kaitai/kaitaistream.cpp:
#include <kaitai/kaitaistream.h>
#include <byteswap.h>           // for bswap_32, bswap_64, bswap_16
#include <endian.h>             // for __BYTE_ORDER, __BIG_ENDIAN, __LITTLE_ENDIAN
#include <iconv.h>              // for iconv, iconv_close, iconv_open, iconv_t
#include <kaitai/exceptions.h>  // for bytes_to_str_error, illegal_seq_in_encoding, unknown_encoding
#include <stdint.h>             // for uint64_t, uint8_t, int64_t, uint32_t, int16_t, int32_t, uint16_t, int8_t
#include <zconf.h>              // for Bytef
#include <zlib.h>               // for z_stream, Z_NULL, Z_OK, inflate, inflateEnd, Z_STREAM_END, inflateInit
#include <algorithm>            // for reverse
#include <cerrno>               // for errno, EINVAL, E2BIG, EILSEQ, ERANGE
#include <cstdlib>              // for size_t, strtoll
#include <ios>                  // for istream, operator|, ostringstream, basic_ios::clear, fpos, istringstream, streamsize, stringstream
#include <istream>              // for basic_istream::read, operator<<, basic_istream::seekg, basic_istream::tellg, basic_istream::get, basic_istream<>::pos_type, basic_ostream, basic_ostream::operator<<, basic_istream::unget, basic_ostream<>::__ostream_type
#include <stdexcept>            // for runtime_error, invalid_argument, out_of_range
#include <string>               // for string, basic_string, getline, operator!=, basic_string<>::const_iterator, char_traits, char_traits<>::pos_type
#include <vector>               // for vector
---
  1. As you can see, the only #include that it considers missing is #include <zconf.h>, which I think is somewhat unimportant because zlib is a specific library that has only one implementation (so the argument about various implementations of standard library headers doesn't apply) and <zlib.h> that we use includes "zconf.h". And BTW, tests/examples in the official zlib repository also refer to Bytef without including zconf.h, e.g. https://github.com/madler/zlib/blob/0f51fb4933fc9ce18199cb2554dacea8033e7fd3/test/example.c#L71.

    It probably doesn't hurt to include it anyway, it just doesn't bring much benefit in my view, but of course this tool can't know that.

  2. It seems to have a problem recognizing that <sstream> is needed - it claims that it can be removed from both kaitaistream.cpp and kaitaistream.h:

    /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.h should remove these lines:
    ...
    - #include <sstream>  // lines 17-17
    
    kaitai/kaitaistream.cpp should remove these lines:
    ...
    - #include <sstream>  // lines 43-43
    

    However, this is not true - if you remove these #include <sstream> lines, you'll get these compile errors, proving that they are in fact necessary:

    pp@DESKTOP-89OPGF3:/mnt/c/temp/kaitai_struct/runtime/cpp_stl$ .build/build -DCMAKE_CXX_STANDARD=11 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DCMAKE_CXX_EXTENSIONS=OFF > /dev/null
    In file included from /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.cpp:1:
    /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.h:342:24: error: field ‘m_io_str’ has incomplete type ‘std::istringstream’ {aka ‘std::__cxx11::basic_istringstream<char>’}
      342 |     std::istringstream m_io_str;
          |                        ^~~~~~~~
    In file included from /usr/include/c++/12/ios:38,
                     from /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.h:14:
    /usr/include/c++/12/iosfwd:100:11: note: declaration of ‘std::istringstream’ {aka ‘class std::__cxx11::basic_istringstream<char>’}
      100 |     class basic_istringstream;
          |           ^~~~~~~~~~~~~~~~~~~
    /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.cpp: In static member function ‘static std::string kaitai::kstream::process_zlib(std::string)’:
    /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.cpp:534:23: error: aggregate ‘std::stringstream dst_strm’ has incomplete type and cannot be defined
      534 |     std::stringstream dst_strm;
          |                       ^~~~~~~~
    /mnt/c/temp/kaitai_struct/runtime/cpp_stl/kaitai/kaitaistream.cpp:563:28: error: aggregate ‘std::ostringstream exc_msg’ has incomplete type and cannot be defined
      563 |         std::ostringstream exc_msg;
          |                            ^~~~~~~
    gmake[2]: *** [CMakeFiles/kaitai_struct_cpp_stl_runtime.dir/build.make:76: CMakeFiles/kaitai_struct_cpp_stl_runtime.dir/kaitai/kaitaistream.cpp.o] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:100: CMakeFiles/kaitai_struct_cpp_stl_runtime.dir/all] Error 2
    gmake: *** [Makefile:146: all] Error 2

    Apparently, it is confused because it knows that <ios> includes <iosfwd> (see https://en.cppreference.com/w/cpp/header/ios#Includes), which contains forward declarations for a lot of stuff including std::stringstream and std::ostringstream, but it doesn't know that for these usages forward declarations are not enough and a full type definition is required.

    So it's just a tool and can fail - we should still verify its suggestions. But it's possible that they have already fixed this bug, include-what-you-use --version shows that I'm using version 0.17 and the latest version is 0.22.

  3. It correctly points out that #include <cstddef> is redundant when we already include <cstdlib> which also defines size_t:

    kaitai/kaitaistream.cpp should remove these lines:
    - #include <cstddef>  // lines 39-39
    ...
    
    The full include-list for kaitai/kaitaistream.cpp:
    ...
    #include <cstdlib>              // for size_t, strtoll
  4. It is technically correct that we don't necessarily have to include <istream>:

    kaitai/kaitaistream.h should remove these lines:
    - #include <istream>  // lines 15-15

    Since kaitaistream.h only refers to std::istream behind a pointer:

    kstream(std::istream* io);

    ... we don't really need the full definition of std::istream - a forward declaration is enough, and #include <ios> (that we already need to get std::streamsize) provides it, since it includes <iosfwd>.

@generalmimon
Copy link
Member Author

@GreyCat Please take another look. I added include-what-you-use to CI and fixed the "violations".

I only took the easy way to install it - https://packages.ubuntu.com/jammy/iwyu. That means we're using the version 0.17, which is somewhat dated (the latest version is 0.22 at the time of writing), but it's the latest version we can get from the Ubuntu package registry for our distribution, which is Ubuntu 22.04 "jammy". Installing a later version would be more complicated - I guess we'd have to build it from source. On top of that, since Clang 15 is the latest version we can get on "jammy", and 0.19 is the latest IWYU compatible with Clang 15, it wouldn't be much of an update (I mean it's better than 0.17, but still a few versions below 0.22) unless we figured out some way to install a newer Clang as well (it seems https://apt.llvm.org/ could help with that, but it obviously still requires some effort to make it all work).

include-what-you-use is supposedly also able to work on Windows (I saw some mentions of that in issues), so in theory we could also run it in the windows CI job, but I don't know of a better way to install it than to build it from source, so I'm not super eager to do that right now. It likely wouldn't be that useful anyway, except for Windows-specific includes, but the only one we have is <windows.h>.

@generalmimon
Copy link
Member Author

generalmimon commented May 3, 2024

I should note that IWYU suggests some changes in unittest.cpp even after the fixes in cd14531:

/home/runner/work/kaitai_struct_cpp_stl_runtime/kaitai_struct_cpp_stl_runtime/tests/unittest.cpp should add these lines:
#include <gtest/gtest-message.h>    // for Message
#include <gtest/gtest-test-part.h>  // for TestPartResult
#include "gtest/gtest_pred_impl.h"  // for Test, SuiteApiResolver, EXPECT_EQ, TEST, TestFactoryImpl, CmpHelperFloatingPointEQ, FAIL, InitGoogleTest, RUN_ALL_TESTS, EXPECT_DOUBLE_EQ, EXPECT_FLOAT_EQ

/home/runner/work/kaitai_struct_cpp_stl_runtime/kaitai_struct_cpp_stl_runtime/tests/unittest.cpp should remove these lines:
- #include <gtest/gtest.h>  // lines 4-4

These changes are incorrect - according to GoogleTest docs, #include <gtest/gtest.h> is the only #include needed. The ones suggested by IWYU are actually private headers not intended for public use. In GoogleTest 1.12, special comments like // IWYU pragma: private, include "gtest/gtest.h" were added to these internal headers to make this clear to IWYU (see google/googletest@100f6fb), but unfortunately, we're using GoogleTest 1.11 because we're installing the libgtest-dev Ubuntu package:

sudo apt-get install -y libgtest-dev

... and the latest libgtest-dev available for "jammy" is 1.11.0-3.

So let me also update to GoogleTest 1.14 to get rid of these incorrect suggestions. It requires building GoogleTest from source, but fortunately that's quite easy to do.

@generalmimon generalmimon requested a review from GreyCat May 3, 2024 16:11
@generalmimon generalmimon requested review from GreyCat and removed request for GreyCat May 24, 2024 08:27
generalmimon added a commit that referenced this pull request Jun 19, 2024
See #72 (comment)

The standard library header `<type_traits>` is only available since
C++11 (see https://en.cppreference.com/w/cpp/header/type_traits), so we
must not try to include it in C++98 mode.
@generalmimon generalmimon force-pushed the self-sufficient-includes branch from ec91637 to 00015f8 Compare June 19, 2024 15:42
Until now, some parts of the code used symbols from the standard library
without including the corresponding standard library header using
`#include` to ensure that the symbol is defined. That is, they silently
assumed that the header has been already included by "someone else".

These assumptions were sometimes met in our code due to transitive
inclusions. For example, the missing `#include <stdexcept>` in
`tests/unittest.cpp` would not turn into a compile error, because
`tests/unittest.cpp` included `kaitai/exceptions.h`, which contains
`#include <stdexcept>`. However, it is discouraged to rely on transitive
inclusions, see
https://google.github.io/styleguide/cppguide.html#Include_What_You_Use:

> Do not rely on transitive inclusions. This allows people to remove
> no-longer-needed `#include` statements from their headers without
> breaking clients. This also applies to related headers - `foo.cc`
> should include `bar.h` if it uses a symbol from it even if `foo.h`
> includes `bar.h`.

Instances of this problem in our code aren't limited to "maintainability
issues" - they can also reduce portability. For instance, @roelschroeven
apparently ran into a `strtoll not found` error in our
`kaitai/kaitaistream.cpp` source file that he had to patch like this
(see
roelschroeven@db668fa):

```diff
diff --git a/kaitai/kaitaistream.cpp b/kaitai/kaitaistream.cpp
index a3810ab..4d5ded2 100644
--- a/kaitai/kaitaistream.cpp
+++ b/kaitai/kaitaistream.cpp
@@ -6,6 +6,7 @@
 #include <iostream>
 #include <vector>
 #include <stdexcept>
+#include <cstdlib>

 // ========================================================================
 // Integer from raw data with correct endianness
@@ -585,7 +586,7 @@ int64_t kaitai::kstream::string_to_int(const std::string& str, int base) {
     char *str_end;

     errno = 0;
-    int64_t res = strtoll(str.c_str(), &str_end, base);
+    int64_t res = std::strtoll(str.c_str(), &str_end, base);

     // Check for successful conversion and throw an exception if the entire string was not converted
     if (str_end != str.c_str() + str.size()) {
```

This makes sense according to
https://en.cppreference.com/w/cpp/string/byte/strtol - we shouldn't
assume that `std::strtoll` exists when we don't have
`#include <cstdlib>` anywhere in our code. It worked fine without it in
CI and on my local gcc 12.3.0 only because some standard library header
that we include happened to use `<cstdlib>` as part of its
implementation. For curiosity, I used this command locally to dump a
tree that shows why each header was `#include`d:

```
c++ -DKS_STR_ENCODING_ICONV -DKS_ZLIB -I. -std=c++11 -H -MM ./kaitai/kaitaistream.cpp 2> kaitaistream-deps.log
```

So for example, we can see that on the gcc 12.3, the `<algorithm>`
header includes `<cstdlib>` for some reason, so that's one of the
reasons there's no compile error if we forget `#include <cstdlib>` in
`kaitaistream.cpp`:

```
. /usr/include/c++/12/algorithm
.. /usr/include/c++/12/bits/stl_algo.h
... /usr/include/c++/12/cstdlib
```

This is completely implementation-specific, of course. There's nothing
in the C++ standard that requires `<algorithm>` to include `<cstdlib>`,
and in other implementations this may not be the case at all.

See also
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf10-avoid-dependencies-on-implicitly-included-names
We use `std::size_t` (and not the global `size_t`) because it is the
only variant that `<cstddef>` is guaranteed to provide, see
<https://developers.redhat.com/blog/2016/02/29/why-cstdlib-is-more-complicated-than-you-might-think>:

> You must not assume that `<cxxx>` adds any names to the global
> namespace, and you must not assume that `<xxx.h>` adds any names to
> namespace `std`.

So technically we could do it the other way around (`#include
<stddef.h>` and use `size_t` everywhere), but the `<xxx.h>` headers are
deprecated in C++ in favour of `<cxxx>`. Plus I think it's somewhat
beneficial to use the `std::` prefix, because it reminds us that it's a
type from the standard library, so it needs an `#include`.
See #72 (comment)

The standard library header `<type_traits>` is only available since
C++11 (see https://en.cppreference.com/w/cpp/header/type_traits), so we
must not try to include it in C++98 mode.
IWYU is actually right that `sys/param.h` is not used on Linux. However,
we need it for the BSD detection, so let's tell IWYU that we want to
keep this "unnecessary" inclusion.
@generalmimon generalmimon force-pushed the self-sufficient-includes branch from 00015f8 to 0fd74d4 Compare June 22, 2024 21:56
@generalmimon
Copy link
Member Author

@GreyCat This is now ready to merge - let me know if you have any objections.

@generalmimon generalmimon merged commit 675899a into master Jun 30, 2024
7 checks passed
@generalmimon generalmimon deleted the self-sufficient-includes branch June 30, 2024 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants