
File Descriptors in IronPython

Pavel Koneski edited this page Jan 19, 2025 · 2 revisions

Windows

The conceptual picture of file descriptor (FD) usage on Windows, for the most interesting case of FileStream, is as follows:

graph LR;

FileIO --> StreamBox --> FileStream --> Handle(Handle) --> OSFile[OS File];
FD(FD) <--> StreamBox;

Conceptually, the relationship between FD (a number) and StreamBox (a class) is bidirectional: PythonFileManager (a global singleton) maintains the association between the two, so either one can be obtained from the other at negligible cost. FD is not the same as the handle, which is created by the OS. FD is an emulated (fake) file descriptor, assigned by PythonFileManager for the purpose of supporting the Python API. Descriptors are allocated lazily, i.e. only when user code makes an API call that accesses one. Once assigned, a descriptor does not change. The FD number is released once the FD is closed (or once the associated FileIO is closed, if it was created with closefd set to True).
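The bookkeeping described above (a bidirectional FD/StreamBox map, lazy allocation, stable numbers, release on close) can be illustrated with a small Python model. This is a hypothetical sketch, not IronPython's C# PythonFileManager: the names `FileManager`, `get_or_assign_fd` and `close_fd`, the starting number 3, and the "lowest free number" reuse policy are all assumptions made for illustration.

```python
import itertools


class StreamBox:
    """Hypothetical stand-in for IronPython's StreamBox: just wraps a stream."""

    def __init__(self, stream):
        self.stream = stream


class FileManager:
    """Hypothetical model of an emulated-FD table (not IronPython's code)."""

    def __init__(self, first_fd=3):
        self._first_fd = first_fd  # assumption: 0, 1, 2 reserved for std streams
        self._by_fd = {}           # FD number -> StreamBox
        self._by_box = {}          # StreamBox -> FD number (identity-hashed)

    def get_or_assign_fd(self, box):
        """Return the FD of `box`, assigning one lazily on first access."""
        fd = self._by_box.get(box)
        if fd is None:
            # Assumed policy: pick the lowest number not currently in use.
            fd = next(n for n in itertools.count(self._first_fd) if n not in self._by_fd)
            self._by_fd[fd] = box
            self._by_box[box] = fd
        return fd  # once assigned, the number never changes

    def get_box(self, fd):
        return self._by_fd[fd]  # raises KeyError for an unknown FD

    def close_fd(self, fd):
        """Release the FD number so that it can be reused."""
        box = self._by_fd.pop(fd)
        del self._by_box[box]


manager = FileManager()
box_a, box_b = StreamBox("stream A"), StreamBox("stream B")
fd_a = manager.get_or_assign_fd(box_a)  # first access assigns 3
fd_b = manager.get_or_assign_fd(box_b)  # next free number: 4
manager.close_fd(fd_a)                  # 3 becomes free again
fd_c = manager.get_or_assign_fd(StreamBox("stream C"))  # reuses 3
```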

It is possible to have the structure above without FileIO, for instance when an OS file is opened with one of the low-level functions in os, or when an existing FD is duplicated. It is also possible to associate one FD with several FileIO objects. In such cases, it is the responsibility of the user code to ensure that the FD is closed at the right time.
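At the Python API level, attaching several FileIO objects to one FD looks like the snippet below, written against CPython's `os` and `io` modules. With `closefd=False` none of the FileIO objects owns the descriptor, so the user code has to close it explicitly; the file contents and variable names are illustrative only.

```python
import io
import os
import tempfile

# Create a small scratch file to work with.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)  # low-level open: no FileIO involved yet

# Two FileIO objects share the same FD; neither one owns it (closefd=False).
first = io.FileIO(fd, "rb", closefd=False)
second = io.FileIO(fd, "rb", closefd=False)

head = first.read(4)   # b"0123"; the position of the shared FD advances
tail = second.read(4)  # continues from the same position: b"4567"

first.close()
second.close()
still_open = os.fstat(fd).st_size == 10  # FD is still valid after both closes
os.close(fd)  # the user code must close it exactly once
os.remove(path)
```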

When an FD is duplicated (using dup or dup2), the associated StreamBox is duplicated too (there is always a 1-to-1 relationship between FD and StreamBox), but the underlying FileStream object remains the same, and so does the underlying OS handle. The new FD may be used to create a FileIO (or several, just as for the original FD). All read/seek/write operations on both descriptors go through the same FileStream object and the same OS handle.

graph LR;

FD1(FD1) <--> StreamBox --> FileStream --> Handle(Handle) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> FileStream;

The descriptors can be closed independently, and the underlying FileStream is closed when the last StreamBox using it is closed.
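The behaviour that this design emulates is the one CPython exposes through `os.dup`: both descriptors share one read/write position, and closing one leaves the other usable. The snippet below is a minimal demonstration using only standard `os` calls; the file contents are arbitrary.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdef")
    path = tmp.name

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.dup(fd1)  # new descriptor number, same open file and position

first = os.read(fd1, 2)                      # b"ab"
pos_via_fd2 = os.lseek(fd2, 0, os.SEEK_CUR)  # 2: the position is shared
os.close(fd1)             # closing one descriptor...
rest = os.read(fd2, 10)   # ...does not affect the other: b"cdef"
os.close(fd2)             # the file is released with the last descriptor
os.remove(path)
```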

POSIX

On Unix-like systems (Linux, macOS), FileStream uses the actual file descriptor as the handle. In the past, IronPython ignored this and still issued its own fake file descriptors, as it does on Windows. Now, however, the genuine FD is extracted from the handle and used as the FD at the PythonFileManager level, ensuring that clients of the Python API obtain the genuine FD.

graph LR;

FileIO --> StreamBox --> FileStream --> FDH(FD) --> OSFile[OS File];
FD(FD) <--> StreamBox;

When a file descriptor FD1 is duplicated, an actual OS call is made to create the duplicate FD2. To use FD2 directly, a new Stream object has to be created around it.

Straightforward Mechanism

The straightforward solution is to create another FileStream using the constructor that accepts an already opened file descriptor.

graph LR;

FD1(FD1) <--> StreamBox --> FileStream --> FDH1(FD1) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> FileStream2[FileStream] --> FDH2(FD2) --> OSFile;

In this way, the file descriptor on the PythonFileManager level is the same as the file descriptor used by FileStream.

Unfortunately, on .NET, two FileStream instances over the same file (here FD1 and its duplicate FD2) maintain two independent read/write positions. This is not how duplicated file descriptors should work: both descriptors should refer to the same open file description and share the read/seek/write position. In practice, on .NET, writing through the second file object overwrites data already written through the first file object. In regular Unix applications (incl. CPython), subsequent writes continue after the data already written, regardless of which file object is used. The same principle should apply to reads.
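For reference, this is the CPython behaviour that IronPython aims to replicate: each duplicated descriptor is wrapped in its own file object (the Python analogue of one Stream per FD), and the second write lands after the first instead of overwriting it. The snippet uses only `os`, `io` and `tempfile`; the written bytes are arbitrary.

```python
import io
import os
import tempfile

fd_tmp, path = tempfile.mkstemp()
os.close(fd_tmp)

fd1 = os.open(path, os.O_WRONLY | os.O_TRUNC)
fd2 = os.dup(fd1)

# One file object per descriptor; each owns its FD (closefd=True by default).
f1 = io.FileIO(fd1, "wb")
f2 = io.FileIO(fd2, "wb")

f1.write(b"abc")
f2.write(b"def")  # continues after "abc" because the OS position is shared
f1.close()
f2.close()

with open(path, "rb") as f:
    content = f.read()  # b"abcdef", not b"def"
os.remove(path)
```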

Also unfortunately, on Mono, the FileStream constructor accepts only descriptors opened by another call to a FileStream constructor[1]. Therefore, descriptors obtained from direct OS calls, such as open, creat, dup, or dup2, are rejected.

Solution on .NET 8+

On .NET, the FileStream that used to back an open FileIO, or an open FD obtained from a direct call to os.open, has been replaced by PosixFileStream. This class operates directly on the given file descriptor, providing unbuffered file access and replicating CPython's behaviour. As a result, a duplicated file descriptor looks like the following diagram:

graph LR;

FD1(FD1) <--> StreamBox --> PosixFileStream --> FDH1(FD1) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> PosixFileStream2[PosixFileStream] --> FDH2(FD2) --> OSFile;
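The key property of PosixFileStream is that it caches no position of its own: every read, write and seek goes straight to the OS, so duplicated descriptors stay in sync. The Python sketch below models that idea with `io.RawIOBase`. It is a hypothetical illustration, not IronPython's C# class; the name `RawFdStream` and its constructor parameters are made up for this example.

```python
import io
import os
import tempfile


class RawFdStream(io.RawIOBase):
    """Hypothetical unbuffered stream over a raw FD (not IronPython's code).

    No position is cached in the object; the OS file offset is the only
    source of truth, so duplicated FDs observe each other's reads and writes.
    """

    def __init__(self, fd, readable=True, writable=True):
        self._fd = fd
        self._readable = readable
        self._writable = writable

    def fileno(self):
        return self._fd

    def readable(self):
        return self._readable

    def writable(self):
        return self._writable

    def seekable(self):
        return True

    def readinto(self, buffer):
        data = os.read(self._fd, len(buffer))  # reads at the OS file offset
        buffer[:len(data)] = data
        return len(data)

    def write(self, data):
        return os.write(self._fd, data)  # writes at the OS file offset

    def seek(self, offset, whence=io.SEEK_SET):
        return os.lseek(self._fd, offset, whence)

    def tell(self):
        return os.lseek(self._fd, 0, os.SEEK_CUR)

    def close(self):
        if not self.closed:
            os.close(self._fd)
        super().close()


fd, path = tempfile.mkstemp()
s1 = RawFdStream(fd)
s2 = RawFdStream(os.dup(fd))  # a duplicate gets its own stream object
s1.write(b"hello ")
s2.write(b"world")  # continues after "hello " (shared OS position)
s1.seek(0)          # moving s1 also moves s2
text = s2.read()    # b"hello world"
s1.close()
s2.close()
os.remove(path)
```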

Workaround on .NET 6

The solution on .NET 6 is the same as on .NET 8: PosixFileStream is used instead of FileStream. However, an issue arises when an mmap object is requested for a given FD. The mmap implementation is backed by MemoryMappedFile from the .NET library. On .NET 8, a MemoryMappedFile instance can be created from a given FD. .NET 6 lacks this constructor and only accepts a FileStream (for maps backed by a regular file). Therefore, to support MemoryMappedFile, a dedicated FileStream is created around the given FD. This FileStream instance is not registered with PythonFileManager but is managed directly by MmapDefault, which implements mmap.

graph LR;

FD(FD) <--> StreamBox --> PosixFileStream --> FDH(FD) --> OSFile[OS File];
MmapDefault --> FileStream2[FileStream] --> FDH;
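The Python-level request that triggers this path is simply `mmap.mmap` called with a file descriptor, including a duplicated one. The CPython snippet below shows the calls MmapDefault has to support; how IronPython maps them onto MemoryMappedFile is as described above and is not modelled here.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"mapped data")

# Map the whole file (length 0) directly from the descriptor, read-only.
with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
    head = m[:6]  # b"mapped"

# A duplicated descriptor refers to the same file, so it can be mapped too.
fd2 = os.dup(fd)
with mmap.mmap(fd2, 0, access=mmap.ACCESS_READ) as m2:
    size = len(m2)  # 11

os.close(fd2)
os.close(fd)
os.remove(path)
```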

Mono Workaround

To use system-opened file descriptors on Mono, UnixStream could be used instead of FileStream.

graph LR;

FD1(FD1) <--> StreamBox --> FileStream --> FDH1(FD1) --> OSFile[OS File];
FD2(FD2) <--> StreamBox2[StreamBox] --> UnixStream --> FDH2(FD2) --> OSFile;

Since FileIO works with various types of underlying Stream, using UnixStream should be acceptable.

Although UnixStream is available on .NET through the Mono.Posix package, this solution still does not solve the problem of the desynchronized read/write position, which the FileStream using the original FD1 maintains independently.

Another problem with using UnixStream is that this class cannot be used to create a MemoryMappedFile, which on Mono (as on .NET before 8.0) has to be created from a FileStream (for file-backed mmaps). Therefore, on Mono, FileStream is used as the backing for FileIO and for a naked FD, just as on Windows. The difference from Windows, however, is that PythonFileManager uses actual FDs when managing files, not emulated ones. When those actual descriptors are duplicated, the code first tries to use FileStream to access the duplicated descriptor. This leads to the situation described in the "Straightforward Mechanism" section, with all the caveats listed there. If using FileStream fails, UnixStream is employed, as presented in the diagram above.

As mentioned before, using UnixStream may lead to problems when such an FD is used to create an mmap, but an mmap created on a regularly opened (not duplicated) file will work.
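The fallback order used on Mono (try FileStream on the duplicated descriptor, and use UnixStream only if that fails) amounts to a small dispatch step. The Python sketch below models it under stated assumptions: `wrap_duplicate`, `reject` and `accept` are hypothetical names, `primary` stands in for FileStream and `fallback` for UnixStream, and both are assumed to signal rejection with OSError or ValueError.

```python
import os
import tempfile


def wrap_duplicate(fd, primary, fallback):
    """Hypothetical helper: duplicate `fd`, wrap the copy with `primary`,
    and fall back to `fallback` if `primary` rejects the descriptor."""
    new_fd = os.dup(fd)  # real OS-level duplicate, as on POSIX
    try:
        return primary(new_fd)
    except (OSError, ValueError):
        try:
            return fallback(new_fd)
        except Exception:
            os.close(new_fd)  # avoid leaking the duplicate if both wrappers fail
            raise


def reject(fd):
    # Stand-in for a wrapper that only accepts descriptors it opened itself.
    raise ValueError(f"descriptor {fd} was not opened by this wrapper")


def accept(fd):
    # Stand-in for a wrapper that works with any valid descriptor.
    return ("fallback", fd)


src_fd, path = tempfile.mkstemp()
kind, dup_fd = wrap_duplicate(src_fd, reject, accept)
os.close(dup_fd)
os.close(src_fd)
os.remove(path)
```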

Special Case: Double Stream

In Python, a file can be opened with mode "ab+". The file is opened for appending to the end (and created if it does not exist), and the + means that it is also opened for updating, i.e. reading and writing. The file pointer is initially set at the end of the file (ready to append) but can be moved around to read already existing data. However, each write appends data to the end and leaves the read/write pointer at the end again.
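The "ab+" semantics described above can be observed directly in CPython. The snippet opens the file unbuffered (`buffering=0`, which yields a raw FileIO) so that `tell()` reports the OS-level position without buffering effects; the file contents are arbitrary.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

with open(path, "ab+", buffering=0) as f:
    start = f.tell()        # 5: the pointer starts at the end of the file
    f.seek(0)
    head = f.read(2)        # b"he": existing data can be read
    f.write(b"!")           # appended at the end, not written at offset 2
    after_write = f.tell()  # 6: the pointer is at the end again
    f.seek(0)
    content = f.read()      # b"hello!"

os.remove(path)
```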

This opening mode is not supported by FileStream. On platforms that do not rely on FileStream (.NET 6.0+ on POSIX), this is not an issue, as PosixFileStream handles it the same way as CPython does. On other platforms (Windows with any framework, and Mono), mode "ab+" is simulated by using two file streams, one for reading and one for writing. Both are maintained in a single StreamBox but have different file handles (on Mono: file descriptors).

graph LR;

FileIO --> StreamBox --> FileStreamR["FileStream (R)"] --> HandleR("Handle (R)") --> OSFile[OS File];
StreamBox --> FileStreamW["FileStream (W)"] --> HandleW("Handle (W)") --> OSFile;
FD(FD) <--> StreamBox;

On Windows, since the file descriptor is emulated, this does not create problems. One might ask which FileStream should back a MemoryMappedFile, but this is not relevant, since a file opened in mode "a" is not suitable for mmap anyway.

On Mono, the file descriptor reported by such a combination is the genuine descriptor of the write-stream. When the descriptor is duplicated, it is the write-stream's descriptor that gets duplicated, except when the target FD of dup2 is 0 (stdin), in which case the read-stream's descriptor is duplicated.
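The two-stream simulation of "ab+" and the descriptor selection rule for duplication can be modelled in Python with two raw descriptors: one read-only, one opened with O_APPEND. This is a hypothetical sketch of the idea, not IronPython's C# code; the class `DoubleStream`, its method `fd_for_dup2`, and the choice to move the read pointer to the end after each write (mirroring the "ab+" rule above) are assumptions for illustration.

```python
import os
import tempfile


class DoubleStream:
    """Hypothetical model of "ab+" built from a read FD and an append FD."""

    def __init__(self, path):
        binary = getattr(os, "O_BINARY", 0)  # avoid newline translation on Windows
        self.write_fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT | binary)
        self.read_fd = os.open(path, os.O_RDONLY | binary)
        os.lseek(self.read_fd, 0, os.SEEK_END)  # "ab+" starts at the end

    def fileno(self):
        return self.write_fd  # the reported FD is the write-stream's one

    def fd_for_dup2(self, target_fd):
        # dup2 onto stdin (0) duplicates the read-stream, otherwise the write-stream.
        return self.read_fd if target_fd == 0 else self.write_fd

    def read(self, n=-1):
        if n >= 0:
            return os.read(self.read_fd, n)
        chunks = []
        while chunk := os.read(self.read_fd, 65536):
            chunks.append(chunk)
        return b"".join(chunks)

    def seek(self, offset, whence=os.SEEK_SET):
        return os.lseek(self.read_fd, offset, whence)

    def tell(self):
        return os.lseek(self.read_fd, 0, os.SEEK_CUR)

    def write(self, data):
        written = os.write(self.write_fd, data)  # O_APPEND: always at the end
        os.lseek(self.read_fd, 0, os.SEEK_END)   # leave the pointer at the end
        return written

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)


fd, path = tempfile.mkstemp()
os.write(fd, b"abc")
os.close(fd)

s = DoubleStream(path)
start = s.tell()  # 3
s.seek(0)
head = s.read(1)  # b"a"
s.write(b"d")
pos = s.tell()    # 4
s.seek(0)
content = s.read()  # b"abcd"
reported_is_write = s.fileno() == s.write_fd
stdin_uses_read = s.fd_for_dup2(0) == s.read_fd
s.close()
os.remove(path)
```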