Issues with the Stable ABI #4
Here's where I got that number: https://blog.trailofbits.com/2022/11/15/python-wheels-abi-abi3audit/
Not sure how much the "downloaded in the last 21 days" affects things. From a single pass skim over the gist just now I recognised five packages. Probably could twist abi3audit into telling you specifically how many packages would break from removing some macros. |
cc @woodruffw |
Thanks for the ping! Yeah, I got those numbers from the BigQuery dataset for the last 21 days (which was a purely arbitrary limitation that should be lifted for a full analysis). Because of how weak the controls around the
Yeah, this should be doable -- you can use the big JSON blob in that blog post to see which symbols are actually in use, and I'm open to further improvements to |
How would that be able to tell you that a given binary is or isn't using |
One of the big projects using the stable ABI is cryptography, here is their pypi download page. I think they would not be happy if they had to make wheels for all the versions of CPython they support on all the platforms they support. This is the flip-side of the faster (yearly) release cycle. If I recall correctly the stable ABI was one of the carrots offered to package maintainers worried about continually chasing new CPython versions.
It makes sense, I don't think the intent of the stable ABI was to freeze the ABI forever. Could there be an |
It only works for symbols and not macros, unfortunately, so
I'm biased as a package maintainer, but I'm personally a fan of something like this: the problem with the Stable ABI isn't the idea itself, but the restrictions that it's accumulated over the years. A similar idea was floated on the forums a few months ago: https://discuss.python.org/t/lets-get-rid-of-the-stable-abi-but-keep-the-limited-api/18458 |
Could we stick to facts, please?
AFAIK, This is the biggest issue in the stable ABI (which is 14 years old now). Your post suggests there are other big issues, are there? (I know about many of issues, of course, but nothing this big...)
Replacing them with functions would mean an |
In this repo we're trying to just do an inventory of problems, without immediately jumping to solutions (unless the problem is small and the solution is similarly contained).

With apologies for the provocative title, it does seem that the remaining struct fields exposed by the Stable ABI, even if they are just

Looking at the discourse thread Let's get rid of the stable ABI, but keep the limited API I actually see a spectrum of opinions on how to address the problem (and some confusion between Stable ABI and Limited API), but no additional problems, so let's just say that struct fields (including the all-important

Another problem is that the Stable ABI doesn't provide the full functionality needed by some projects, e.g. PyObjC. (This may explain why relatively few projects use it.)

That thread also confirms the rough count of abi3-using projects (around 320).

Anyway, I suggest that we stick to this for the inventory for now, unless additional problems related to the Stable ABI are uncovered. (@encukou Feel free to add the other problems you know of, maybe we'll eventually discern a pattern leading to an innovative solution.) |
Most other problems of the stable ABI (and limited API) are shared with the C API in general. Let me give some thoughts, hoping that they're useful even if they are a bit off-topic. The C API is vast, and my approach to improving it was to find a manageable subset, and then improve its quality and make it complete enough for general usage. Limited API (defined before my time, in PEP-384) was a reasonable starting point, and the stability guarantees make a good “carrot” for people to try it out (and find its shortcomings).
Yup. It's not big. But it's now possible to use the limited API in real-world projects, although core dev involvement (

As for “only two of those have a name anyone recognizes” -- that might not change much: for a popular leaf project, releasing yearly in the 2-month RC window is not that big of a problem. I expect Stable ABI to be much more useful for the long tail of smaller extensions.

Anyway, for the inventory, I see several issues identified here. I'm not sure how you want to organize them:
|
👋, one of the cryptography maintainers here. Happy to answer any questions about our setup, or what's important to us. |
If we’re ever going to propose to deprecate the stable ABI I promise there will be a debate where you will be heard. But that’s not on the table now. We’re just trying to inventory problems with the C API here. So if anything about the C API bugs you, please create an issue for it! (Come to think of it, specific praise for the C API might also be useful.) |
(Closed by mistake, clumsy fingers. :-) |
I think that there would be far more stable ABI packages if it would be straightforward to compile C++ projects into stable ABI wheels. But it isn't. For one thing, the limited API has so far been too limited for it to be usable by various general C++ binding tools. That is changing now with 3.12, which has a number of relevant API calls exposed in the limited API. Drawing a conclusion from the PyPI data point may be missing part of the big picture. |
Where? As issues? Here are some quotes from my language binding survey from a while ago: David Hewitt (PyO3) said:
Wenzel Jakob (pybind11/nanobind) said:
Karl Nelson (JPype) said:
|
Those are great! Did Karl's issues get addressed? Are the pain points from the Google Doc you linked to entered as issues here yet? (How old is that doc, and how widely did you distribute it?) |
Also, one point about the motivating number:
For this to be a meaningful ratio, the denominator should be the number of PyPI projects with binary extensions and not the total number of packages (of which most are pure Python code). |
Yup, which is why I asked @hauntsaninja to verify my numbers. I'm sure there's someone here who can do some more queries and report back. (Maybe @woodruffw ?) |
Some of Karl's issues are hopefully addressed in 3.12, others are reported here, a few I left with “PRs welcome” (which is rather cruel of me, as Karl can't sign Python's CLA). |
It's unfortunately difficult to get accurate numbers here, in part for the reasons identified in the post: wheels can be mis-tagged on PyPI and might just coincidentally be working for the overwhelming majority of users. That being said, I can try and get the following numbers/ratios:
If that sounds okay/topical, I can try and get those numbers tonight. |
That sounds great -- let's hear it for more data! |
Okay, here are some queries and their numbers, as of today:

Total number of wheels:

SELECT COUNT(*) AS num_wheels
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE packagetype = "bdist_wheel"

Produces:

Total number of binary wheel distributions (calculated by searching for non-binary wheels and subtracting from the total):

SELECT COUNT(*) AS num_pure_wheels
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE packagetype = "bdist_wheel"
AND filename LIKE "%none-any%"

Produces:

SELECT COUNT(*) AS num_abi3_wheels
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE packagetype = "bdist_wheel"
AND filename LIKE "%-abi3-%"

Produces:

Total list of packages that contain

SELECT DISTINCT name
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE packagetype = "bdist_wheel"
AND filename LIKE "%-abi3-%"

Produces: https://drive.google.com/file/d/1D_zFlsxVJmmqXJ4UMZrCOm0jzdC0NNSa/view?usp=sharing

(That last link should be a CSV dump of packages that contain I can do some more concrete popularity statistics later (since the numbers above only indicate raw package counts, not how popular each individual package is) 🙂 |
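The filename filters in the queries above can be mirrored offline. The following is only a minimal sketch, assuming wheel filenames follow the usual `name-version[-build]-python-abi-platform.whl` layout; the helper names and the sample filenames are hypothetical and not taken from the BigQuery dataset. Unlike the SQL `LIKE` patterns, it inspects the tag fields rather than matching substrings, and it uses binary wheels as the denominator, as suggested earlier in the thread.

```python
from collections import Counter


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, optional build tag, then the three compatibility tags
    if len(parts) < 5:
        raise ValueError(f"too few fields in wheel filename: {filename!r}")
    return tuple(parts[-3:])


def classify(filename):
    """Classify a wheel as 'abi3', 'pure' (none-any), or other 'binary'."""
    _, abi, platform = wheel_tags(filename)
    if "abi3" in abi.split("."):
        return "abi3"
    if abi == "none" and platform == "any":
        return "pure"
    return "binary"


def abi3_share_of_binary(filenames):
    """Fraction of non-pure wheels that carry the abi3 tag."""
    counts = Counter(classify(name) for name in filenames)
    binary_total = counts["abi3"] + counts["binary"]
    return counts["abi3"] / binary_total if binary_total else 0.0


# Hypothetical sample data, for illustration only.
SAMPLE = [
    "demo_pure-1.0-py3-none-any.whl",
    "demo_ext-1.0-cp311-cp311-manylinux_2_17_x86_64.whl",
    "demo_ext-1.0-cp312-cp312-win_amd64.whl",
    "demo_stable-2.0-cp37-abi3-macosx_10_12_x86_64.whl",
]
```

As noted in the thread, a tag only records what the wheel claims, so a count like this says nothing about mis-tagged wheels.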
I've renamed the issue. As you said, it's provocative, and I don't think stirring up emotions is good for the discussion. Let's keep accusations of causing harm out of this forum. I'd be happy to talk privately, if you're interested (you might be, given your other Summit talk), or if you want to keep the title. Anyway: My interpretation is that a stable ABI is a very useful feature. It's just our current implementation -- and marketing of it -- that is lacking. Also: Don't only look at PyPI. Python is bigger than that. |
In Python 3.11, I prepared the C API to convert Py_REFCNT(), Py_TYPE() and Py_SIZE() to opaque function calls, rather than macros or static inline functions that access structure members directly. https://peps.python.org/pep-0670/ and https://peps.python.org/pep-0674/ were designed for that. One API issue was that these 3 macros were used to modify an object's reference count, type or size, and so it was not possible to convert them to opaque function calls. Converting them to opaque function calls can have a performance cost, which deserves to be measured. Also, the benefit of such a change was unclear to most people, so I didn't do it. Maybe the immortal objects and nogil projects make the issues more obvious? Note: this issue's scope is too broad; I suggest opening more specific issues. |
More up-to-date numbers based on https://py-code.org/ and a ClickHouse version of the associated database: The current count of projects shipping abi3 wheels appears to be 648. |
Petr asked to leave a comment here, based on the discussion on Discourse: His question was
My answer:

The main difference is that you cannot change the APIs in the limited API at all, without breaking the ABI. Without this limitation, the APIs could be changed subject to the normal deprecation procedures and participate in the evolution of the APIs. And because the stable ABI doesn’t even specify an expected lifetime in years or number of releases, it means that no changes are possible until we move to 4.x.

Lifting the requirement to be stable across all 3.x versions would help with this problem, of course, but then I don’t think we’re that far off from the regular deprecation process, which also supports compatibility for at least 3 releases.

I'll see where the discussion goes on Discourse and then update this post accordingly |
Which kind of change are you thinking about? I looked at the Limited C API since Python 3.2 and I found these changes, mostly API removals:

When a limited C API function is removed, the function remains available at the ABI level: libpython still provides the symbol.

Recently, I proposed deprecating passing NULL as the value in PyObject_SetAttr(), since currently it does remove the attribute, and this behavior can be a bug when the caller tried to create an object but the creation failed. The question was how to deal with this issue in the stable ABI. See issue: python/cpython#106572

I proposed to keep the same API but, depending on the selected Py_LIMITED_API, to select between the old behavior (accept a NULL value) and the new behavior (a NULL value raises an exception) by calling a different function at the ABI level. But other participants were not really convinced that it's an important issue to solve, and so I gave up on my attempt to address this issue.

The API can evolve without losing support for existing stable ABI binaries; there are technical solutions for that. |
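The version-dependent selection described in that comment can be modelled in a few lines. This is only a rough sketch, not what CPython does: it assumes Py_LIMITED_API is given as a PY_VERSION_HEX-style integer, and both the cut-over version and the second symbol name are hypothetical placeholders invented for illustration.

```python
def decode_version_hex(value):
    """Split a PY_VERSION_HEX-style integer into (major, minor)."""
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF


# Hypothetical: first limited API version that would reject a NULL value.
NEW_BEHAVIOR_SINCE = (3, 14)

# Hypothetical name for the ABI-level function with the new behavior.
STRICT_SYMBOL = "PyObject_SetAttr_strict_hypothetical"


def select_abi_symbol(py_limited_api):
    """Pick which ABI-level symbol the API name PyObject_SetAttr would bind to."""
    if decode_version_hex(py_limited_api) >= NEW_BEHAVIOR_SINCE:
        return STRICT_SYMBOL
    # Binaries targeting an older limited API keep the old symbol and behavior.
    return "PyObject_SetAttr"
```

The point of the model is that existing binaries never see the new symbol, so they keep working, while newly built extensions opt in by raising their Py_LIMITED_API value.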
It would be nice to consider designing an abi4 to clean up the dust: remove functions which are already removed at the API level. The abi3 has 61 symbols which are "ABI only": they have been removed from the limited C API. Well, there are also private symbols which are used by limited C API macros/functions which still exist. For example, the ABI-only variable These removals are related to different Python changes. Examples:
I see that ABI-only symbols (functions and variables) of the Python 3.13 stable ABI:
|
@vstinner Instead of killing |
What does it mean for an inline function to be in the Stable ABI? |
Edit: I think I meant to say "Limited API". Which I interpret as: this will compile to something that is supported by the stable ABI. (As is happening right now, by calling |
What do you mean by "killing" it? The
Static inline functions are part of the limited C API, but not part of the stable ABI: the code is copied into the built binary, it's not shipped by libpython. It's similar to macros. Simplified implementation of

static inline Py_ALWAYS_INLINE void Py_INCREF(PyObject *op)
{
    _Py_IncRef(op);
}

The compiler copies In the past, C extensions were linked to libpython. That's no longer needed. The C extension just looks for symbols in the process which loads it. It just works :-) It solves issues depending on if Python is built with or without The previous limited C API implementation of When |
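The idea that reference counting can go through an exported function, resolved from the running process, can be tried from Python itself. This is a small sketch assuming CPython with ctypes available; it uses the public Py_IncRef and Py_DecRef functions rather than the private _Py_IncRef named in the comment, and the helper name is made up for illustration.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API symbols of the running interpreter.
# Py_IncRef / Py_DecRef are the exported function forms of Py_INCREF / Py_DECREF.
_py_incref = ctypes.pythonapi.Py_IncRef
_py_incref.argtypes = [ctypes.py_object]
_py_incref.restype = None

_py_decref = ctypes.pythonapi.Py_DecRef
_py_decref.argtypes = [ctypes.py_object]
_py_decref.restype = None


def refcount_deltas():
    """Return (delta after incref, delta after matching decref) for a fresh object."""
    obj = object()
    before = sys.getrefcount(obj)
    _py_incref(obj)   # opaque call: no struct field is touched from this side
    after = sys.getrefcount(obj)
    _py_decref(obj)   # restore the count so the object is not leaked
    restored = sys.getrefcount(obj)
    return after - before, restored - before
```

The caller never sees the layout of the object header here; only the symbol name and its signature are part of the contract.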
No, it is not. The limited API is defined by an explicit list, not just by what happens to be visible to the compiler. This is a very important feature: we avoid #34 for the limited subset of the API. The limited API is defined in
Please avoid macros/inline functions in the limited API (except as optimizations). Such functions can only be used by C/C++, not even by |
@vstinner -- I seem to have misunderstood your message a few posts up. Weren't you proposing an |
I don't need abi4. I'm not aware of anyone actively pushing for this. PyObject_New() is not going away. |
That's very confusing for me. For me, the limited C API is what is accessible by If PyObject_New() is not part of the limited C API, it should be removed when the macro is defined, no? What is the point of having _PyObject_New() in the stable ABI if PyObject_New() is not part of the limited C API? |
Phew. Thank you for clarifying!
Wait, what? That is exactly the thing I asked not to do. ;) |
Well, Python has a long history, and C was the main target of the C API. Right, we can slowly convert macros and static inline functions to regular functions, but it's slow, incremental work since it can affect performance and may introduce surprising issues (like how macros can be badly abused in C, see PEP 674). I got bitten by converting the PyType_HasFeature() macro to an opaque function, which made Python slower on macOS because Python wasn't built with LTO there (on Linux, there was no impact). I wrote some articles on the topic:
Well, and https://peps.python.org/pep-0670/ obviously. So far, what worked the best is to have an opaque function call in the limited C API, and override the name with a macro or static inline function in the non-limited C API. |
That would be good to have.
That, or it should be added to the limited API :) |
Discussion for solving this: https://discuss.python.org/t/making-pyobject-opaque-in-the-limited-api/77206 |
During PyCon, @hauntsaninja mentioned some surprising data: out of 400,000 PyPI projects, only about 300 are using the Stable ABI, and only two of those have a name anyone recognizes. (Shantanu could you link to those results and verify the numbers?) [UPDATE: There are better numbers further down in the thread.]
So I'd like to add the existence of the Stable ABI to the list of problems. It makes evolving certain APIs hard -- in particular, the ob_refcnt field is exposed through macros in the Stable ABI which means that for immortal objects in 3.12, we had to go through contortions to keep supporting wheels built with the 3.11 or older versions of Py_INCREF and friends.

Maybe the Stable ABI could at least be revised to not contain any macros that access object fields directly (replacing them with equivalent functions)?
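To make the concern concrete, here is a toy model of the difference between logic baked into an extension at build time and logic that lives behind a function in the runtime. It is only an illustration under loud assumptions: ToyObject, the IMMORTAL sentinel and the tolerance rule are all hypothetical and are not CPython's actual implementation.

```python
# Hypothetical sentinel marking an object as immortal in this toy runtime.
IMMORTAL = 1 << 30


class ToyObject:
    """Stand-in for an object header with an exposed refcount field."""

    def __init__(self, refcnt=1):
        self.ob_refcnt = refcnt


def is_immortal(obj):
    # The toy runtime tolerates drift caused by old inlined code by
    # treating any very large count as immortal.
    return obj.ob_refcnt >= IMMORTAL // 2


def inlined_incref_old(obj):
    """Models a macro compiled into an old extension: it touches the field directly."""
    obj.ob_refcnt += 1


def runtime_incref(obj):
    """Models an opaque function: the runtime can change its logic later."""
    if is_immortal(obj):
        return
    obj.ob_refcnt += 1
```

In this model the old inlined code still mutates an immortal object's count, so the runtime has to be written defensively, whereas a call through runtime_incref picks up the new rule without rebuilding the extension.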