File Descriptors in IronPython
The conceptual picture of file descriptor (FD) usage on Windows, for the most interesting case of FileStream
:
graph LR;
FileIO --> StreamBox --> FileStream --> Handle(Handle) --> OSFile[OS File];
FD(FD) <--> StreamBox;
Conceptually, the relationship between FD
(a number) and StreamBox
(a class) is bidirectional because PythonFileManager
(a global singleton) maintains the association between the two, so obtaining one from the other is essentially free. The FD is not the same as the handle, which is created by the OS. The FD is an emulated (fake) file descriptor, assigned by the PythonFileManager
, for the purpose of supporting the Python API. Descriptors are allocated lazily, i.e. only when user code makes an API call that needs one. Once assigned, the descriptor does not change. The FD number is released once the FD is closed (or once the associated FileIO
is closed, provided it was created with closefd
set to true).
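The bookkeeping described above can be pictured with a toy model. This is only a sketch written for illustration, not IronPython's actual code: the class `ToyFileManager`, its method names, the starting number 3, and the lowest-free-number policy are all assumptions.

```python
class StreamBox:
    """Illustrative stand-in for a StreamBox: it only wraps a stream object."""

    def __init__(self, stream):
        self.stream = stream
        self.fd = None  # no descriptor until some API call asks for one


class ToyFileManager:
    """Hypothetical registry that keeps the FD <-> StreamBox association."""

    FIRST_FD = 3  # assumption: 0-2 are reserved for the standard streams

    def __init__(self):
        self._boxes = {}  # fd -> StreamBox

    def get_fd(self, box):
        # Lazy allocation: a number is assigned on first request only.
        if box.fd is None:
            fd = self.FIRST_FD
            while fd in self._boxes:  # assumption: lowest free number wins
                fd += 1
            box.fd = fd
            self._boxes[fd] = box
        return box.fd  # once assigned, the number does not change

    def get_box(self, fd):
        # The reverse direction is a plain dictionary lookup.
        return self._boxes[fd]

    def close(self, fd):
        # Closing releases the number so it can be handed out again.
        box = self._boxes.pop(fd)
        box.fd = None
        return box
```

In this model a box has no number until `get_fd` is first called, repeated calls return the same number, and a closed number becomes available for reuse.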
It is possible to have the structure above without FileIO
; for instance when an OS file is opened with one of the low-level functions in os
, or when an existing FD is duplicated. It is also possible to associate one FD with several FileIO
objects. In such cases it is the responsibility of the user code to ensure that the FD is closed at the right time.
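From the Python side, both situations can be reproduced with the standard `os` and `io` modules. The sketch below opens a file with `os.open` (an FD with no file object yet) and then wraps the same FD in two `io.FileIO` objects created with `closefd=False`; the function name and file name are made up for the example.

```python
import io
import os
import tempfile


def share_fd_between_fileio():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.bin")
        # Low-level open: an FD exists, but there is no FileIO object yet.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # closefd=False: closing these objects leaves the FD open.
            a = io.FileIO(fd, "r+", closefd=False)
            b = io.FileIO(fd, "r+", closefd=False)
            a.write(b"abc")
            b.write(b"def")  # same FD, same position: continues after "abc"
            a.close()
            b.close()
            # The FD is still valid after both file objects are closed.
            os.lseek(fd, 0, os.SEEK_SET)
            return os.read(fd, 16)
        finally:
            # The user code owns the FD and has to close it exactly once.
            os.close(fd)


result = share_fd_between_fileio()
```

Because neither `FileIO` owns the descriptor, forgetting the final `os.close(fd)` would leak it, and closing it while a file object is still in use would break that object.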
When an FD is duplicated (using dup
or dup2
), the associated StreamBox
is duplicated too (there is always a 1-to-1 relationship between FD and StreamBox
), but the underlying FileStream
object remains the same, and so does the underlying OS handle. The new FD may be used to create a FileIO
(or several, just as for the original FD). All read/seek/write operations on both descriptors go through the same FileStream
object and the same OS handle.
graph LR;
FD1(FD1) <--> StreamBox --> FileStream --> Handle(Handle) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> FileStream;
The descriptors can be closed independently, and the underlying FileStream
is closed when the last StreamBox
using it is closed.
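The user-visible effect can be checked with a few calls from the `os` module. This is a minimal sketch; the function name and file name are chosen for the example only.

```python
import os
import tempfile


def dup_shares_position():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dup.bin")
        fd1 = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        fd2 = os.dup(fd1)
        os.write(fd1, b"hello")
        # The position moved by the write on fd1 is visible through fd2.
        pos_seen_by_fd2 = os.lseek(fd2, 0, os.SEEK_CUR)
        # Closing one descriptor leaves the other fully usable.
        os.close(fd1)
        os.write(fd2, b" world")
        os.lseek(fd2, 0, os.SEEK_SET)
        data = os.read(fd2, 32)
        os.close(fd2)  # last descriptor closed: the file itself is released
    return pos_seen_by_fd2, data


position, contents = dup_shares_position()
```

The second write lands after the first one because both descriptors share a single read/write position.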
On Unix-like systems (Linux, macOS), FileStream
uses the actual file descriptor as the handle. In the past, IronPython ignored this and still issued its own fake file descriptors, as it does on Windows. Now, however, the genuine FD is extracted from the handle and used as the FD at the PythonFileManager
level, ensuring that clients of the Python API obtain the genuine FD.
graph LR;
FileIO --> StreamBox --> FileStream --> FDH(FD) --> OSFile[OS File];
FD(FD) <--> StreamBox;
When a file descriptor FD is duplicated, the actual OS call is made to create the duplicate FD2. In order to use FD2 directly, a new Stream
object has to be created around it.
The straightforward solution is to create another FileStream
using the constructor that accepts an already opened file descriptor.
graph LR;
FD1(FD1) <--> StreamBox --> FileStream --> FDH1(FD1) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> FileStream2[FileStream] --> FDH2(FD2) --> OSFile;
In this way, the file descriptor on the PythonFileManager
level is the same as the file descriptor used by FileStream
.
Unfortunately, on .NET, two FileStream
instances using the same file descriptor maintain two independent read/write positions. This is not how duplicated file descriptors should work: both descriptors should point to the same open file description and share the read/seek/write position. In practice, on .NET, writing through the second file object overwrites data already written through the first file object. In regular Unix applications (incl. CPython), subsequent writes append data regardless of which file object is used. The same principle should apply to reads.
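The expected behaviour can be stated as a small check that CPython on a POSIX system passes. The sketch wraps a duplicated descriptor in a second unbuffered file object and uses both objects in turn; the function name is made up for the example.

```python
import io
import os
import tempfile


def two_file_objects_one_position():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "two.bin")
        # Two unbuffered file objects over a descriptor and its duplicate.
        with open(path, "wb", buffering=0) as f1:
            with io.FileIO(os.dup(f1.fileno()), "w") as f2:
                f1.write(b"first,")
                f2.write(b"second")  # continues where f1 stopped
        # The same sharing applies to reads.
        with open(path, "rb", buffering=0) as r1:
            with io.FileIO(os.dup(r1.fileno()), "r") as r2:
                head = r1.read(6)
                tail = r2.read()
    return head, tail


head, tail = two_file_objects_one_position()
```

With independent positions the second write would start at offset 0 and the file would contain only `second`; with a shared position the two writes are laid out one after the other.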
Also unfortunately, on Mono, the FileStream
constructor accepts only descriptors opened by another call to a FileStream
constructor[1]. So descriptors obtained from direct OS calls, like open
, creat
, dup
, dup2
are rejected.
On .NET, the FileStream
that backed an open FileIO
or an open FD from a direct call to os.open
has been replaced by PosixFileStream
. This class operates directly on the given file descriptor, providing unbuffered file access and replicating CPython's behaviour. A duplicated file descriptor therefore looks as shown in the following diagram:
graph LR;
FD1(FD1) <--> StreamBox --> PosixFileStream --> FDH1(FD1) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> PosixFileStream2[PosixFileStream] --> FDH2(FD2) --> OSFile;
The solution on .NET 6 is the same as on .NET 8: PosixFileStream
is used instead of FileStream
. However, an issue arises when an mmap
object is requested for a given FD. The mmap
implementation is backed by MemoryMappedFile
from the .NET library. On .NET 8, a MemoryMappedFile
instance can be created from a given FD. .NET 6 lacks this constructor and only accepts FileStream
(for maps that are backed by a regular file). Therefore, for the purpose of supporting MemoryMappedFile
, a dedicated FileStream
is created around the given FD. This instance of FileStream
is not registered with PythonFileManager
but managed directly by MmapDefault
, which implements mmap
.
graph LR;
FD(FD) <--> StreamBox --> PosixFileStream --> FDH(FD) --> OSFile[OS File];
MmapDefault --> FileStream2[FileStream] --> FDH;
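For reference, this is the kind of Python code that requests an mmap object for a given FD. It uses only the standard `mmap` and `os` modules and says nothing about how IronPython services the request internally; the function name and file contents are illustrative.

```python
import mmap
import os
import tempfile


def map_file_by_fd():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "map.bin")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, b"0123456789")
            # Length 0 maps the whole file as it currently is.
            with mmap.mmap(fd, 0) as m:
                m[0:4] = b"ABCD"  # modify the file through the mapping
                m.flush()
            # The change is visible through the original descriptor.
            os.lseek(fd, 0, os.SEEK_SET)
            return os.read(fd, 16)
        finally:
            os.close(fd)


mapped = map_file_by_fd()
```

The mapping is created from the descriptor alone, which is why the runtime needs some way to obtain a mappable file object from a bare FD.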
To use system-opened file descriptors on Mono, UnixStream
could be used instead of FileStream
.
graph LR;
FD1(FD1) <--> StreamBox --> FileStream --> FDH1(FD1) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> UnixStream --> FDH2(FD2) --> OSFile;
Since FileIO
works with various types of the underlying Stream
, using UnixStream
should be OK.
Although UnixStream
is available in .NET through package Mono.Posix
, this solution still does not work around the desynchronized read/write position, which the FileStream
using the original FD1 must somehow maintain independently.
Another problem with using UnixStream
is that this class cannot be used to create a MemoryMappedFile
, which on Mono (as on .NET before 8.0) must be created from a FileStream
(for file-backed mmaps). Therefore, on Mono, FileStream
is used as the backing for FileIO
and for a bare FD, just as on Windows. The difference from Windows, however, is that PythonFileManager
uses actual FDs when managing files, not emulated ones. When those actual descriptors are duplicated, the code first tries to use FileStream
to access the duplicated descriptor. This leads to the situation described in the "Straightforward Mechanism" section, with all the caveats listed there. If using FileStream
fails, UnixStream
is employed, as presented in the diagram above.
As mentioned before, using UnixStream
may lead to problems when such an FD is used to create an mmap
, but an mmap
created on a file opened regularly (not duplicated) will work.
In Python, a file can be opened with mode "ab+". The file is opened for appending to the end (and created if it does not exist), and the +
means that it is also opened for updating, i.e. reading and writing. The file pointer is initially set at the end of the file (ready to append) but can be moved around to read existing data. However, each write appends data to the end and resets the read/write pointer to the end again.
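These semantics can be observed directly in CPython on a POSIX system. The sketch below uses an unbuffered file so that every call maps onto one OS-level operation; the function name and file contents are illustrative.

```python
import os
import tempfile


def append_update_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.bin")
        with open(path, "wb") as f:
            f.write(b"hello")
        with open(path, "ab+", buffering=0) as f:
            start = f.tell()  # the pointer starts at the end of the file
            f.seek(0)
            head = f.read(2)  # existing data can still be read
            f.write(b"!!")    # appended at the end, not at offset 2
            end = f.tell()    # the pointer is at the end again
        with open(path, "rb") as f:
            return start, head, end, f.read()


start, head_bytes, end, final = append_update_demo()
```

Even though the pointer was at offset 2 before the write, the new bytes end up after the existing content.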
This opening mode is not supported by FileStream
. On platforms that don't rely on FileStream
(.NET 6.0+/POSIX), this is not an issue as PosixFileStream
handles it the same way as CPython. On other platforms (Windows on all frameworks, and Mono) mode "ab+" is simulated by using two file streams, one for reading and one for writing. Both are maintained in a single StreamBox
but will have different file handles (Mono: file descriptors).
graph LR;
FileIO --> StreamBox --> FileStreamR["FileStream (R)"] --> HandleR("Handle (R)") --> OSFile[OS File];
StreamBox --> FileStreamW["FileStream (W)"] --> HandleW("Handle (W)") --> OSFile;
FD(FD) <--> StreamBox;
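The two-stream arrangement can be imitated in plain Python to show why it reproduces "ab+" behaviour. This is a toy model under stated assumptions, not IronPython's implementation: the class `AppendUpdateBox` and all of its methods are hypothetical, and the choice to report the write stream's descriptor from `fileno` is made for the example.

```python
import os
import tempfile


class AppendUpdateBox:
    """Hypothetical model of "ab+" built from a read and a write stream."""

    def __init__(self, path):
        # The append stream is opened first so that the file gets created.
        self._w = open(path, "ab", buffering=0)
        self._r = open(path, "rb", buffering=0)
        self._r.seek(0, os.SEEK_END)  # "ab+" starts at the end of the file

    def read(self, size=-1):
        return self._r.read(size)

    def seek(self, offset, whence=os.SEEK_SET):
        return self._r.seek(offset, whence)

    def tell(self):
        return self._r.tell()

    def write(self, data):
        written = self._w.write(data)  # always lands at the end of the file
        self._r.seek(0, os.SEEK_END)   # each write resets the pointer to the end
        return written

    def fileno(self):
        return self._w.fileno()  # assumption: report the write stream's FD

    def close(self):
        self._r.close()
        self._w.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        box = AppendUpdateBox(os.path.join(tmp, "combo.bin"))
        try:
            box.write(b"abc")
            after_first = box.tell()
            box.seek(0)
            head = box.read(2)
            box.write(b"def")
            after_second = box.tell()
            box.seek(0)
            return after_first, head, after_second, box.read()
        finally:
            box.close()


outcome = demo()
```

The two streams hold separate OS-level handles, so a single descriptor number can only ever stand for one of them.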
On Windows, since a file descriptor is emulated, this does not create problems. The question might arise which FileStream
should be used as backing for MemoryMappedFile
, but it is not relevant, since a file opened in mode "a" is not suitable for use with mmap
anyway.
On Mono, the file descriptor reported by such a combination is the genuine descriptor of the write stream. When the descriptor is duplicated, it is the write stream's descriptor that gets duplicated, with the exception that if the target FD (using dup2
) is 0 (stdin
), the read-stream's descriptor gets duplicated.