Linked list #68
C++ has std::list for this. (I added Petaca to your Examples above.) I would mention that I personally have never had a need for a But since there are at least 6 different people who reimplemented this already in Fortran and given that C++ has it too in their standard library, I would say that this would be a good candidate to include in stdlib, so that if people want to use it, they can. So +1 from me. |
Me too, but how do you do it? I thought that appending to an array always re-allocates on the heap, e.g.:
    integer :: i
    integer, allocatable :: a(:)
    a = [integer ::]
    do i = 1, 100000000
      a = [a, i]  ! re-allocates a on every append
    end do
It's okay for small-to-moderate arrays, but for very large ones, isn't it crippling? |
The canonical way is to pre-allocate the array and then append to it, like this:
    integer :: i
    integer, allocatable :: a(:)
    allocate(a(100000000))
    do i = 1, 1000
      a(i) = i
    end do
Then you use your actual application to figure out what the maximum size of the array is (100000000 in this example), and then you can either keep But as I said, it's good to have a linked list in stdlib, if people prefer that, so that they do not need to reimplement it. |
First I think we need to define which types of linked list we need. I prefer a circular doubly linked list as the basic type, since it's the type I use most in FEM codes etc. I also think we would need a singly linked list to implement stacks and queues. Also, do we need some form of reference counting? As to @milancurcic's question of whether current Fortran supports lists that can contain both intrinsic and user-defined types: yes, it can. I've implemented both a circular list class and a singly linked class using unlimited polymorphic variables. They work but are not pretty and will probably have poor performance when compared to a type-specific list generated by pre-processing/templating methods à la the |
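For readers less familiar with the circular doubly linked list mentioned above, here is a minimal sketch for default integers. The sentinel node is an assumption of this sketch, not something described in the comment, and the names `int_clist_m`, `clist`, `dnode`, `init`, `push_back`, `remove`, and `to_array` are hypothetical. It shows why the structure is attractive: given a pointer to a node, unlinking it is O(1) because both neighbors are reachable directly.

```fortran
! Minimal sketch of a circular doubly linked list of default integers.
! Module, type, and procedure names are hypothetical, not a stdlib API.
module int_clist_m
  implicit none
  private
  public :: dnode, clist

  type :: dnode
    integer :: value = 0
    type(dnode), pointer :: prev => null()
    type(dnode), pointer :: next => null()
  end type dnode

  type :: clist
    ! Sentinel node: its value is unused; head%next is the first element
    ! and head%prev is the last, so the ring never has a null end.
    type(dnode), pointer :: head => null()
  contains
    procedure :: init
    procedure :: push_back
    procedure :: remove
    procedure :: to_array
  end type clist

contains

  subroutine init(self)
    class(clist), intent(inout) :: self
    allocate(self%head)
    self%head%next => self%head   ! empty ring: the sentinel links to itself
    self%head%prev => self%head
  end subroutine init

  ! O(1): link a new node between the current last node and the sentinel
  subroutine push_back(self, value)
    class(clist), intent(inout) :: self
    integer, intent(in) :: value
    type(dnode), pointer :: new
    allocate(new)
    new%value = value
    new%next => self%head
    new%prev => self%head%prev
    self%head%prev%next => new
    self%head%prev => new
  end subroutine push_back

  ! O(1): unlink and free any node, given only a pointer to it
  subroutine remove(self, node)
    class(clist), intent(inout) :: self
    type(dnode), pointer, intent(inout) :: node
    if (associated(node, self%head)) return   ! never remove the sentinel
    node%prev%next => node%next
    node%next%prev => node%prev
    deallocate(node)
  end subroutine remove

  ! Copy the values into an array, walking forward from the sentinel
  function to_array(self) result(values)
    class(clist), intent(in) :: self
    integer, allocatable :: values(:)
    type(dnode), pointer :: cur
    integer :: n, i
    n = 0
    cur => self%head%next
    do while (.not. associated(cur, self%head))
      n = n + 1
      cur => cur%next
    end do
    allocate(values(n))
    cur => self%head%next
    do i = 1, n
      values(i) = cur%value
      cur => cur%next
    end do
  end function to_array

end module int_clist_m
```

Because the ring has no null ends, `push_back` and `remove` need no special cases for an empty list or for the first and last elements; the price is one extra allocated node per list.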
Generic linked lists, or really any generic data structures, are really cumbersome with the current Fortran capabilities. They work, but you end up having to use a |
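A rough sketch of the unlimited polymorphic approach described in the comments above, assuming default integer and real kinds. The names `any_list_m`, `any_node`, `any_list`, `append`, and `sum_numeric` are hypothetical. Storing values is easy with `allocate(..., source=...)`; the cumbersome part is that every consumer has to recover the type with `select type`.

```fortran
! Minimal sketch of a heterogeneous singly linked list built on class(*).
! Names are hypothetical; this only illustrates the select type burden.
module any_list_m
  implicit none
  private
  public :: any_node, any_list, sum_numeric

  type :: any_node
    class(*), allocatable :: item             ! may hold any intrinsic or derived type
    type(any_node), pointer :: next => null()
  end type any_node

  type :: any_list
    type(any_node), pointer :: head => null()
    type(any_node), pointer :: tail => null()
  contains
    procedure :: append
  end type any_list

contains

  ! O(1) append; the tail pointer avoids walking the whole chain
  subroutine append(self, item)
    class(any_list), intent(inout) :: self
    class(*), intent(in) :: item
    type(any_node), pointer :: new
    allocate(new)
    allocate(new%item, source=item)           ! copies the value and its dynamic type
    if (associated(self%tail)) then
      self%tail%next => new
    else
      self%head => new
    end if
    self%tail => new
  end subroutine append

  ! Every consumer has to recover the type with select type
  real function sum_numeric(list) result(total)
    type(any_list), intent(in) :: list
    type(any_node), pointer :: cur
    total = 0.0
    cur => list%head
    do while (associated(cur))
      select type (x => cur%item)
      type is (integer)
        total = total + real(x)
      type is (real)
        total = total + x
      class default
        ! Types this consumer does not know about are skipped
      end select
      cur => cur%next
    end do
  end function sum_numeric

end module any_list_m
```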
I think the supported data types should be wrapped in containers in order to be extensible. I think FPL (https://github.com/victorsndvg/FPL) contains a smart implementation strategy for supporting native data types while allowing extension to other user-defined data types. It contains lists, hash tables, etc. All of them depend on containers (aka wrappers) in order to manage different data types. I agree that with these kinds of data types you don't get performance, but you do get amazing flexibility. These kinds of data types are (usually) not meant for computation. Edit:
|
I think many of the projects in the list of popular projects contain linked list implementations. Perhaps it would be good to do a grep over all of those repositories to get a feeling for linked list usage in production codes (e.g. whether they use generic lists supporting multiple kinds or only specific ones for the intrinsic kinds and potentially derived types). |
I agree with @everythingfunctional on this issue. There's a ton of up-front labor in implementing fully polymorphic containers, and I'm not convinced that they're that much more useful than having generic (but homogeneous) containers. That is, I don't think it's worthwhile to support, say, linked lists where each element is of arbitrary type. The more common use case I find is to need a linked list of Letting users make containers of derived types is trickier. The common solution is to provide an abstract base class that users need to extend in order to have containers of derived types. I think that solution kind of sucks, but I have an alternate idea... Just ship source code templates that implement each container for I confess I have not thought through if there is some great pitfall to this approach besides being slightly "icky" from a distribution p.o.v. |
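The abstract-base-class approach mentioned above can be sketched roughly as follows. This is a minimal illustration under assumptions: the names `payload_list_m`, `payload`, `weight`, `payload_list`, `push_front`, and `total_weight` are hypothetical, and a single deferred binding stands in for whatever operations a real container would need. Users extend `payload`, and the container can then call the deferred binding without any `select type`.

```fortran
! Minimal sketch: a list whose elements must extend an abstract base type.
! All names are hypothetical, not a proposed stdlib API.
module payload_list_m
  implicit none
  private
  public :: payload, payload_node, payload_list

  ! Users extend this abstract type to store their own derived types
  type, abstract :: payload
  contains
    procedure(weight_i), deferred :: weight
  end type payload

  abstract interface
    pure real function weight_i(self)
      import :: payload
      class(payload), intent(in) :: self
    end function weight_i
  end interface

  type :: payload_node
    class(payload), allocatable :: item
    type(payload_node), pointer :: next => null()
  end type payload_node

  type :: payload_list
    type(payload_node), pointer :: head => null()
  contains
    procedure :: push_front
    procedure :: total_weight
  end type payload_list

contains

  subroutine push_front(self, item)
    class(payload_list), intent(inout) :: self
    class(payload), intent(in) :: item
    type(payload_node), pointer :: new
    allocate(new)
    allocate(new%item, source=item)   ! keeps the user's extended type
    new%next => self%head
    self%head => new
  end subroutine push_front

  ! The container calls the deferred binding directly: no select type needed
  real function total_weight(self) result(w)
    class(payload_list), intent(in) :: self
    type(payload_node), pointer :: cur
    w = 0.0
    cur => self%head
    do while (associated(cur))
      w = w + cur%item%weight()
      cur => cur%next
    end do
  end function total_weight

end module payload_list_m
```

The drawback is the one raised in the comment: every stored type has to extend `payload` and implement its deferred bindings, so an existing derived type cannot go into the list without a wrapper.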
I'm in agreement with @nshaffer here. I've done the linked-list-of Someone else seemed to suggest that perhaps performance shouldn't be a concern here. I think it would be a big mistake to ignore performance. Linked lists come with their intrinsic performance overhead that most would be aware of, but any implementation that significantly added to that I would find unacceptable to include in a standard library. I think the best solution beyond intrinsic types, which could all have very performant implementations, would be, as @nshaffer suggested, to provide a literal template that a user could adapt for their particular case. In fact that's more or less what I do myself. |
A note on performance:
I think there is merit to providing classic data structures and algorithms. I would add hash tables to this list as well as binary-trees, octrees, K-D trees, and a number of others. Obviously they are not useful to all users and applications but having a decent implementation is worthwhile. I agree that right now the |
@zbeekman Generic programming will not make it to the next standard revision -- simply because there is no proposal that is ready. I think the latest most developed idea is pursued at j3-fortran/fortran_proposals#125, and we need everybody's help to help transform the idea into a solid proposal. Once we have a proposal that is community backed, I'll be happy to bring it to the committee and try to get it into the next standard. |
I know @rouson is working with Magne who leads the Bergen Language Design Lab and also @tclune on generics. They have something here but I don't know how up to date it is with their current efforts. Hopefully they can combine efforts and we can get something in, we'll see. |
Yes, the issue j3-fortran/fortran_proposals#125 is the latest based on our discussion with Magne at the last meeting. Anyway, let's move the discussion about this there, I just wanted to point this out, that we need help. |
In the meantime I have a project https://github.com/Goddard-Fortran-Ecosystem/gFTL which provides (by far less elegant means) a generic container system. Currently it supports Vector and Map (à la C++ STL), but also has Set, which is used under the hood. gFTL uses the C preprocessor and requires explicit instantiation, but is still a real game changer for doing some common operations within Fortran. I have a separate project gFTL-shared that provides common instantiations. But I do look forward to the day that this could be done much more elegantly through a proper generic facility. (And yes, I realize that other preprocessors could do what I have done more elegantly than the C preprocessor, but ... cpp is already integrated into the build systems for the other projects I work with.)
I agree here with @zbeekman that linked lists are essential, and I think the approach of pre-allocating an array is very ineffective (because then you have to check for overflow and re-allocate it, etc.). I also sadly agree that this is undoable in current Fortran. Gotta wait for generics (or, hopefully, intrinsic highly optimized types for lists and dicts). |
Thanks for mentioning my little example (it should have been updated a long
time ago). I do agree that `select type` is a big drawback, and I use the
parametric version (using cpp macros) wherever possible. I think a linked
list is a useful structure in many areas, and so are binary trees and
others, especially when doing more CS stuff, as opposed to just scientific
computation.
And many thanks for linking the current work on a proposal. A long time ago
I looked at Java interfaces as a possible alternative to multiple inheritance
in normal dynamic-dispatch polymorphism. I did not realize how close they are
to Haskell type classes, and I did not realize they could be useful for
compile-time parallelism. I will have to take more time to study it. I am
still worried whether it can be optimized to be as efficient as C++
templates.
BTW, Ondřej @certik I happen to be a member of the MFF XC skiing club you
used to be in some years ago :) I know your fortran90.org and LFortran
projects but I did not know you were in J3.
In reply to Dominik Gronkiewicz (4 Jan 2020).
|
@LadaF nice to meet you! Small world. You should put your name and photo at your GitHub profile if you can. |
The FTL (Fortran Template Library, https://github.com/SCM-NV/ftl) is a substantial library that is worth looking at. |
Linked lists are a prime example of an anti-pattern, i.e., "a common response to a recurring problem that is usually ineffective and risks being highly counterproductive." In order of preference, I recommend
If 1 is untenable, start with 2 to reduce development time and maintenance hassles. |
Every data structure (linked list, array, binary tree, heap, priority queue, k-d tree, etc.) has a set of operations, each with its own computational complexity (Cormen et al., Introduction to Algorithms). Each has its place in larger algorithms (e.g. ray queries, computational geometry, etc.) and programs. Many of them have a place in larger patterns. Having said that, there are some that others are built on (e.g. linked list, heap, priority queue). On top of that, some are so simple that if you can't easily implement them in a language, then the language has serious problems and you shouldn't even bother trying the more sophisticated ones. I think the linked list is a good, if not natural, place to start. |
Great advice @rouson! It would fit well into a Fortran book similar to the Effective C++ series by Scott Meyers. I never realized one can make a forward list using allocatable components. Recently, I was writing some code for operating on polygons. I decided to implement a polygon as a linked list of edges, half-knowing I might come to regret it, but still thinking it would be a fun challenge. Two weeks later I already regret it. The pointers make it difficult to write getter functions that are pure, the copy behavior is convoluted, and so is finalization. Instead I might try to implement the polygon with recursive allocatable components:
(I am aware that this could also be solved with a simple array of vertex coordinates, and the solution above duplicates some information.) Btw, your link in the 4th bullet point is broken. |
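A minimal sketch of a forward list with recursive allocatable components, as discussed above, assuming a compiler that supports this Fortran 2008 feature. The names `alloc_list_m`, `int_node`, `push_front`, and `total` are hypothetical, and integers stand in for polygon edges. Because no pointers are involved, traversal can be `pure`, intrinsic assignment makes a deep copy, and deallocating the head frees the whole chain, which targets the getter, copy, and finalization issues described in the comment.

```fortran
! Minimal sketch of a pointer-free forward list (Fortran 2008
! recursive allocatable components). Names are hypothetical.
module alloc_list_m
  implicit none
  private
  public :: int_node, push_front, total

  type :: int_node
    integer :: value = 0
    type(int_node), allocatable :: next   ! a component of the type's own type
  end type int_node

contains

  ! Prepend without pointers: move the old chain into the new node's tail
  subroutine push_front(head, value)
    type(int_node), allocatable, intent(inout) :: head
    integer, intent(in) :: value
    type(int_node), allocatable :: new
    allocate(new)
    new%value = value
    call move_alloc(from=head, to=new%next)   ! old chain becomes the tail
    call move_alloc(from=new, to=head)        ! new node becomes the head
  end subroutine push_front

  ! With no pointers involved, traversal can be a pure (recursive) function
  pure recursive integer function total(node) result(s)
    type(int_node), intent(in) :: node
    s = node%value
    if (allocated(node%next)) s = s + total(node%next)
  end function total

end module alloc_list_m
```

The trade-off is that appending at the tail or splicing into the middle is awkward without pointers, so this shape suits stack-like or build-once structures best.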
@ivan-pi thanks! I just fixed the link in my 4th bullet point. |
Well, I agree with the critique on the seemingly obvious implementation of
linked lists. But the API together with indirect addressing might be worth
investigating: what is the performance difference between a linked list
based on recursive derived types with pointers/allocatables and linked
lists based on arrays with indirect addressing. Plus the difficulty of
correct implementation and "proving" the correctness. It seems a nice
(little?) fun project to me :). And an efficient and effective
implementation could be used as a building block for more complex structures.
In reply to Damian Rouson (3 Mar 2021).
|
Hello everyone, the mentors assigned for this project in GSoC are @arjenmarkus and @milancurcic. So, could one of you give a clear picture of what kind of linked list is required? A homogeneous generic linked list (a linked list that contains only a single type of data, decided by the user; this is similar to the list in the C++ standard library) [1,2,3,4,5,20], or a heterogeneous generic linked list (a linked list that can contain any kind of data at its nodes, be it an integer, character array, double, or any other data type; this is similar to the list used in Python) [1,'Hello',3.14,20]?
Welcome to the project :). The questions you raise are actually part of the project. The design questions need to be elaborated:
|
Ok.
|
It is certainly possible to have a linked list or an array of container types that contains a |
I know that it is possible for one to make such a linked list but how can we expect our user to remember or to know what type there might be inside the node. User will have to maintain different data that describes the type carried by that node. |
No, you do not have to remember what data is in *that node*. You just have
to know what kind of data you expect to encounter in your list. That is
usually a reasonable assumption.
In reply to ChetanKarwa (21 Mar 2021).
|
Not sure about "compiler can optimize". The algorithm is baked into the implementation of STL vectors. The vector interface does not really give the compiler any hint of how the container will be used. Also note that there are other mechanisms for growing a vector. One can specify a minimum size even after the container is partially filled. This might be useful if you know you are memory constrained and don't want to risk the default doubling. |
If |
I have been programming in Fortran since 1981, so I guess I know a thing or two about Fortran. I do a lot of high performance computing. I use Fortran for computations and write my GUIs in VB, C#, and Java, and had used Pascal and QBasic in their heyday for the front end. I use a lot of graphs and data structures in these languages. Lists are more commonly used than arrays. One can keep adding to a list without predefining its length. This is important for system programmers. I have never missed lists in my computational engines, except when trying to use stacks. I have my own implementation of a stack where I create a new array and copy the old one when the stack size increases beyond the allocated size. I know all about the inefficiency. But then try replacing a stack with an array in any algorithm. Should Fortran have lists? The answer is YES. Should they be super efficient? The answer is NO. Let end users like me make the choice. Not all programmers will be writing weather forecasting routines. Those who do will never use a list. The list is a basic feature of modern languages. If you don't want to implement it, then you are essentially limiting Fortran to a special use, namely computations. I am okay with this. But then don't cry that the new generation is not using Fortran. Add modern features like lists or die. The arguments above all seem to miss the point that we stopped teaching Fortran formally at my university and elsewhere more than 20 years ago. We teach Python to freshmen, and they love it. An avid programmer friend from NIST advised me to shift to Python since Python has so many libraries. And here you are, arguing about implementing a simple list. So funny. Anyway, thanks for trying to build the stdlib, even if it is 30 years too late. |
@KoldMen I agree with your post. One nuance is that above we are arguing about the API of a list data structure in stdlib built using Fortran. Notice that in Python the list is part of the language itself; it is not implemented in Python. In LFortran we have actually recently added List as a first-class feature in the intermediate representation, and it will get a very efficient implementation in the backends. We have not exposed it in the frontend (syntax) yet, but one option is to simply recognize a list from stdlib and transform it into the List node in the intermediate representation (instead of using the Fortran implementation from stdlib), and thus it would get excellent performance with LFortran. Other compilers could either do the same, or they could just use the actual implementation from stdlib (slower, but it would work). We also have dictionaries, tuples and sets in LFortran. So once stdlib has an implementation of those, we can hook them into the frontend. |
Now that we have hash maps in stdlib, it would be nice to also have lists for arbitrary types. This would allow IO libraries to directly export the data they are reading in a stdlib compatible format. |
and
Sorry for going slightly off topic (although I see I'm not the only one pushing for arrays instead of lists for the case of homogeneous collections), but I read (admittedly quickly) the whole thread and surprisingly did not find a mention of the F2003 intrinsic I actually did not know about automatic allocations with array constructors, but have implemented in the past a compressed-sparse-row grower using Aside from all of this, I also agree that the user should have a choice, and adding linked lists to stdlib may be beneficial for many people; I just wanted to go a little off-track and point out this alternative to readers (like me 🙃). But maybe I would prefer an inhomogeneous one, following @awvwgk's idea of simplifying IO operations (very much like Matlab does with its cell arrays, which are returned whenever an IO intrinsic operates on inhomogeneous files). The idea would then be quite "classic": homogeneous collections go into (dynamic) arrays, inhomogeneous ones into lists.
Dear Ondrej,
In LFortran we have actually recently added List as a first class feature
into the intermediate representation, and it will get very efficient
implementation in the backends. We have not exposed it in the frontend
(syntax) yet, but one option is to simply recognize a list from stdlib, and
transform it into the List node in the intermediate representation (instead
of using the Fortran implementation from stdlib), and thus it would get
excellent performance with LFortran. Other compilers could either do the
same, or they could just use the actual implementation from stdlib (slower,
but it would work).
We also have dictionaries, tuples and sets in LFortran. So once stdlib has
an implementation of those, we can hook them in in the frontend.
Simply wow. Hats off. This is the kind of change that this language needs.
|
Thank you @gronki! The LPython compiler (which shares the middle end and backends with LFortran) has lists and on our initial preliminary benchmarks it seems faster than Clang/GCC's The same with dictionaries against So LFortran already has this capability; all that is needed is to expose it via some syntax. Our experience with LPython is to use regular Python syntax for our fast features and provide a CPython implementation. That seems to work really well. In the same way, as indicated above, we can expose these nice LFortran features via regular (existing) Fortran syntax and provide a Fortran implementation. LFortran would recognize it and use List. That still leaves the door open to also create an extension of the Fortran language, we can still do that. @milancurcic, @everythingfunctional if you want to move this forward, let's get a usable List implementation into stdlib, and then we can teach LFortran about it. Everything is ready from my side, we just need the "syntax" in stdlib. I recommend not to use linked list, but rather store the length, capacity and the data, just like Python, or |
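As a rough illustration of the length + capacity + data layout recommended above, here is a minimal sketch for default integers. The names `int_vector_m`, `int_vector`, `push_back`, `get`, and `capacity` are hypothetical, as are the initial capacity of 4 and the growth factor of 2; this is not the LFortran or stdlib API. Doubling the storage whenever it is full makes appends amortized O(1), in contrast to `a = [a, i]`, which copies the whole array on every append.

```fortran
! Minimal sketch of a growable array: length + capacity + data.
! Names, initial capacity, and growth factor are assumptions.
module int_vector_m
  implicit none
  private
  public :: int_vector

  type :: int_vector
    integer, allocatable :: data(:)   ! storage; size(data) is the capacity
    integer :: length = 0             ! number of elements actually in use
  contains
    procedure :: push_back
    procedure :: get
    procedure :: capacity
  end type int_vector

contains

  subroutine push_back(self, value)
    class(int_vector), intent(inout) :: self
    integer, intent(in) :: value
    integer, allocatable :: tmp(:)
    if (.not. allocated(self%data)) allocate(self%data(4))
    if (self%length == size(self%data)) then
      ! Full: double the capacity and move the old contents over
      allocate(tmp(2 * size(self%data)))
      tmp(1:self%length) = self%data(1:self%length)
      call move_alloc(from=tmp, to=self%data)
    end if
    self%length = self%length + 1
    self%data(self%length) = value
  end subroutine push_back

  ! O(1) indexing; this sketch does not check i against self%length
  pure integer function get(self, i) result(value)
    class(int_vector), intent(in) :: self
    integer, intent(in) :: i
    value = self%data(i)
  end function get

  pure integer function capacity(self)
    class(int_vector), intent(in) :: self
    capacity = 0
    if (allocated(self%data)) capacity = size(self%data)
  end function capacity

end module int_vector_m
```

Inserting in the middle still requires shifting the trailing elements, which is the trade-off against a linked list discussed elsewhere in this thread.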
That's a neat way to do it. I can implement a list for stdlib. It shouldn't be hard. However, what should the API look like? I'd lean towards something like
|
There's an implementation in #491. It needs some work (mainly docs and tests) to get it through the finish line. I suggest discussing and adjusting (if needed) the API there. |
That is, I suppose, a place to start the conversation, but I think it has issues, such as:
- It's a linked list, and the interface seems (to me) to imply that. And @certik indicated we probably shouldn't do that.
- It's unlimited polymorphic. My suspicion is that we would rather have lists of a specific type to avoid the overhead and clunky usage of class(*) things.
@certik, what is the API for the list that LFortran uses? |
During the GSoC project we have seen that the implementation is actually
quite fast. But of course, there may be better ways to achieve the
functionality.
In reply to Brad Richardson (18 Oct 2022).
|
I think there are a few different conversations here (lists vs. vectors and API vs. implementation). Some comments:
|
Let's keep the discussion here to have it in one place, it's all related.
|
OK, your terminology is confusing to me, but we can pick any that most people here agree on. For this conversation, we can just call them "linked list" and "list", as you suggest. Based on your description, as I understand it, your list is the same as a dynamic array. All elements of the same type, cheap indexing and prepending/appending, slow inserting. The difference is not just an implementation detail. They're different data structures. I initially opened this thread to ask whether there's a desire for a structure with fast insertion (a linked list). I was motivated by the fact that there are many home-cooked implementations on the internet of Fortran linked lists, so it's possible that the community would benefit from a stdlib implementation. For me personally, your list is a more useful structure. Considering the option of having both structures (linked list and list), do you think it would be a good design if they had the same (to the extent possible) API? |
There is also a need for a list holding
    type :: wrapped_class
      class(*), allocatable :: raw
    end type wrapped_class
And deal with the class dispatch themselves. In this case stdlib could directly provide such a wrapper class. |
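A minimal sketch of how client code might use the `wrapped_class` type shown above, doing its own dispatch. The module name `wrapped_m` and the function `describe` are hypothetical. The wrapper matters because all elements of a `class(*)` array share a single dynamic type, whereas each `wrapped_class` element can hold a different one.

```fortran
! Minimal sketch: client-side dispatch over wrapped class(*) values.
! wrapped_m and describe are hypothetical names.
module wrapped_m
  implicit none
  private
  public :: wrapped_class, describe

  type :: wrapped_class
    class(*), allocatable :: raw
  end type wrapped_class

contains

  ! The client decides what to do with each dynamic type it expects
  function describe(w) result(s)
    type(wrapped_class), intent(in) :: w
    character(len=:), allocatable :: s
    select type (x => w%raw)
    type is (integer)
      s = "integer"
    type is (real)
      s = "real"
    type is (character(len=*))
      s = "character: " // x
    class default
      s = "unknown"
    end select
  end function describe

end module wrapped_m
```

A client would fill an array of wrappers element by element, e.g. `allocate(items(1)%raw, source=1)` and `allocate(items(3)%raw, source="Hello")`.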
Related note: In the initial release of generics in Fortran, one will not be able to write And you cannot even write an "outer" template to provide the polymorphic case, because you cannot even define the wrapper type within the template. Again - there is no way to write And no, I'm not happy about this, as well over half of my uses of containers are polymorphic. The majority of the subgroup prefers to wait for the initial set of features to be implemented so that we can more accurately plan subsequent extensions. I.e., so that we worry about the things that matter most after experience is gained. |
We can discuss the terminology at our next Fortran meeting. :) @tclune, we have our former GSoC student (@ansharlubis) working on the generics in LFortran. If you know about anybody else who can help, please let me know. While we are at this: there does not seem to be many good examples of the syntax. Here is our issue for this: lfortran/lfortran#929 where I linked all examples that I was able to find so far. If you create more examples, that would be a huge help. |
@certik The only examples that properly use the syntax as it appears in the formal syntax paper are: Until now, the syntax has been a (slowly) moving target, and maintaining the examples takes time. If/when the syntax paper is approved, we can start producing "final" examples and deleting the ones with obsolete syntax. |
I see, thanks @tclune! |
About terminology
It may be worth noting that Wikipedia explicitly states that a list can be implemented either as a linked list or as a dynamic array (plus a generic statement about the former being more frequent in Lisp dialects, where the terms "list" and "linked list" are often conflated). https://en.m.wikipedia.org/wiki/List_(abstract_data_type) To be fair, the first picture you see is a depiction of a linked list, so I totally get that it can be misleading. I was also taught during my bachelor's that a "list" is a "linked list". In general, I wouldn't want to give too much importance to Wikipedia per se, but I like that it is more or less language-agnostic. We do not necessarily want to follow the conventions of the C++, Python, or any other community; rather, we should use terms that are familiar to people with different backgrounds and, even more importantly, that are not confusing after googling them. The C++ case gives an important lesson in my opinion, as
Relevant quote from stack overflow:
|
I think this is exactly what you'll want to do; it would make it easy for client code to switch from one to the other according to different needs, requirement changes, or benchmarks.
I may just comment that in the codebases I'm working on during my PhD there is currently a homebrew implementation of a dynamic array (for sparse rows...). Older colleagues told me that it was initially implemented as a linked list and used in many performance-critical parts of the code. When they eventually scaled up the requirements, crafted a massively parallel algorithm for the iterative solver, and started profiling the MPI performance, they immediately spotted these linked lists as the main bottleneck and switched to dynamic arrays, which give the very same "list functionality" but with acceptable performance. |
As for They would be inherently inferior in performance, but provide a whole new level of flexibility... this should suffice to qualify them as a totally separate thing, targeting very different use cases. Matlab calls such things[1] "cells". How do you feel about such a name? Alternatives?
Footnotes
|
Since the terminology varies between communities, I suggest being explicit and descriptive, such that the name reflects the properties of the structure. Perhaps those would be:
|
@milancurcic, is the difference that I know a few algorithms which require a |
For what is worth, the first time I heard the term dynamic array is from Milan above, and I wouldn't know what it means without this thread, since Fortran arrays are also "dynamic" (in my mind). But I don't have a better term for it right now (besides just List), so we can use DynamicArray, to mean the "size+capacity+pointer" implementation (like Thanks @bellomia for the comments! Yes, in LFortran's ASR (intermediate representation) we use the term List, and indeed it is independent of how it is actually implemented. So in order to be able to represent both linked list and size+capacity+pointer implementation, we could just add a flag to the type, such as "implementation=LinkedList | DynamicArray". So the API could indeed be exactly the same for both. We can use "linked_list" and "dynamic_array" for implementations and then somehow alias "list" to mean "dynamic_array", as I think "list" is easier for people to understand what it means. Just "list" and "array". |
Great, yes, I'm not so concerned about the final terminology in stdlib because we can arrive there through consensus in issues and PRs; but more that we are on the same page in this thread when we discuss these structures so that we know what each name refers to. |
@ivan-pi As I understand it, the time complexity (fast insertion vs. fast indexing) and the type of elements are orthogonal. A linked list has fast insertion and growing and shrinking but slow indexing. A list/vector has fast indexing and fast (amortized) growing and shrinking but slow insertion. A linked list with But a regular list/vector can, I think, also be implemented with And either structure with fixed (intrinsic) types is easy to consume elements from (no So, I think that choosing the specific structures to implement requires deciding along both dimensions, time complexity of operations, and type-flexibility vs. ease of use. |
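To make the insertion-cost side of this trade-off concrete, here is a hypothetical `insert_at` for a plain allocatable integer array (the module and procedure names are illustrative only). This version rebuilds the array with an array constructor, so it copies all n elements; even a dynamic array with spare capacity would still have to shift every element after position k, which is O(n) work, whereas a linked list splices in a node in O(1) once the predecessor is known.

```fortran
! Minimal sketch of inserting into the middle of an array.
! array_insert_m and insert_at are hypothetical names.
module array_insert_m
  implicit none
  private
  public :: insert_at

contains

  ! Make value element k of a, for 1 <= k <= size(a) + 1.
  ! The array constructor builds a new array, copying every old element,
  ! and assignment to the allocatable a reallocates it to the new size.
  pure subroutine insert_at(a, k, value)
    integer, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: k, value
    a = [a(:k-1), value, a(k:)]
  end subroutine insert_at

end module array_insert_m
```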
Problem
A linked list is one of the essential data structures besides the array. It allows you to add, insert, or remove elements in constant time, without re-allocating the whole structure.
Fortran doesn't have a linked list. There are 3rd party libraries, but no obvious go-to solution. Fortran stdlib should have a linked list. I would use it.
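To illustrate the constant-time claim above, here is a minimal pointer-based singly linked list of default integers. The names (`int_slist_m`, `node`, `push_front`, `insert_after`, `remove_after`, `list_size`, `free_list`) are hypothetical, not a proposed stdlib API. Note that insertion and removal are O(1) only when you already hold a pointer to the preceding node; finding that node is still an O(n) walk.

```fortran
! Minimal sketch of a pointer-based singly linked list of integers.
! All names are hypothetical.
module int_slist_m
  implicit none
  private
  public :: node, push_front, insert_after, remove_after, list_size, free_list

  type :: node
    integer :: value = 0
    type(node), pointer :: next => null()
  end type node

contains

  ! O(1): put a new node in front of head
  subroutine push_front(head, value)
    type(node), pointer, intent(inout) :: head
    integer, intent(in) :: value
    type(node), pointer :: new
    allocate(new)
    new%value = value
    new%next => head
    head => new
  end subroutine push_front

  ! O(1): splice a new node in right after an existing node
  subroutine insert_after(prev, value)
    type(node), pointer, intent(in) :: prev
    integer, intent(in) :: value
    type(node), pointer :: new
    allocate(new)
    new%value = value
    new%next => prev%next
    prev%next => new
  end subroutine insert_after

  ! O(1): unlink and free the node that follows prev, if there is one
  subroutine remove_after(prev)
    type(node), pointer, intent(in) :: prev
    type(node), pointer :: victim
    victim => prev%next
    if (.not. associated(victim)) return
    prev%next => victim%next
    deallocate(victim)
  end subroutine remove_after

  ! O(n): walk the chain to count nodes
  integer function list_size(head) result(n)
    type(node), pointer, intent(in) :: head
    type(node), pointer :: cur
    n = 0
    cur => head
    do while (associated(cur))
      n = n + 1
      cur => cur%next
    end do
  end function list_size

  ! Free every node; head is left disassociated
  subroutine free_list(head)
    type(node), pointer, intent(inout) :: head
    type(node), pointer :: tmp
    do while (associated(head))
      tmp => head%next
      deallocate(head)
      head => tmp
    end do
  end subroutine free_list

end module int_slist_m
```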
Examples
What kind of data can the linked list hold?
There are various levels of capability we could pursue:
API
I don't know, something like this?