Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind to source items by indicies #550

Open
atifaziz opened this issue Dec 20, 2018 · 6 comments
Open

Bind to source items by indicies #550

atifaziz opened this issue Dec 20, 2018 · 6 comments

Comments

@atifaziz
Copy link
Member

atifaziz commented Dec 20, 2018

I'd like to suggest adding an operator that will return items of a sequence at given sequence of zero-based positions or indicies. It will traverse the source items and an indicies sequence in order and return the items at the corresponding indicies. Duplicate indicies are allowed. For most optimal (unbuffered) operation, indicies would be given in increasing order.

The signature would be as follows:

public static IEnumerable<TResult>
    BindByIndex<T, TResult>(this IEnumerable<T> source, IEnumerable<int> indicies,
        int lookBackSize,
        Func<int, TResult> missingSelector,
        Func<T, int, TResult> resultSelector)

Simpler overloads could be added.

If indicies are in ascending order then lookBackSize is immaterial. If indicies are unordered then lookBackSize will be the size of an internal scrolling buffer of source items that enables looking back.

The missingSelector argument allows projection of a result item when an index is not found, such as when it is less than zero or greater or equal to the number of items in source.

Suppose the following:

var xs = new[] { "foo", "bar", "baz", "qux" };
var result = xs.BindByIndex(indicies, lookBackSize, i => "?", (s, _) => s);

The table below shows what result will contain for given inputs of indicies and lookBackSize:

# indicies lookBackSize result
1 [2, 3] 0 [baz, qux]
2 [1, 3] 0 [bar, qux]
3 [0, 2, 3] 0 [foo, baz, qux]
4 [0, 2, 3] 0 [foo, baz, qux]
5 [0, 1, 1] 0 [foo, bar, bar]
6 [3, 1, 2] 0 [qux, ?, ?]
7 [3, 1, 2] 4 [qux, bar, baz]
8 [3, 1, 2] 1 [qux, ?, baz]
9 [-1, 1, 2, 10] 0 [?, bar, baz, ?]

Things to note:

  • In example 5, indexing and returning duplicate items is supported.
  • In example 6, items for indicies 1 and 2 are missed because look-back is not possible after 3 (lookBackSize is 0) but example 7 fixes that.
  • In example 8, the item for index is missed since lookBackSize is 1 and so a look-back that far from index 3 is no longer possible.

Example

const string csv = @"
    # Generated using https://mockaroo.com/
    id,first_name,last_name,email,gender,ip_address
    1,Maggee,Hould,[email protected],Female,158.221.234.250
    2,Judas,Vedekhov,[email protected],Male,26.25.8.252
    3,Sharity,Desquesnes,[email protected],Female,27.224.140.230
    4,Della,Conant,[email protected],Female,229.74.161.94
    5,Sansone,Hardson,[email protected],Male,51.154.224.38
    6,Lloyd,Cromley,[email protected],Male,168.145.20.63
    7,Ty,Bamsey,[email protected],Male,129.204.46.174
    8,Hurlee,Dumphy,[email protected],Male,95.17.55.115
    9,Andy,Vickarman,[email protected],Male,10.159.118.60
    10,Jerad,Kerley,[email protected],Male,3.19.136.57
    ";
    
// Parse CSV into rows of fields with commented lines,
// those starting with pound or hash (#), removed.

var rows =
    from row in Regex.Split(csv.Trim(), "\r?\n")
    select row.Trim() into row
    where row.Length > 0 && row[0] != '#'
    select row.Trim().Split(',');

// Split header and data rows

var (header, data) =
    rows.Index()
        .Partition(e => e.Key == 0, 
                   (hr, dr) => (hr.Single().Value, from e in dr select e.Value));
                   
// Locate indicies of headers

var bindings = Enumerable.ToArray(
    from h in new[] { "id", "email", "last_name", "first_name", "foo" }
    select Array.FindIndex(header, sh => sh == h));
    
// Bind to data using 

var objects =
    from row in data
    select row.BindByIndex(bindings, bindings.Length, i => null, (f, _) => f)
              .Fold((id, email, ln, fn, foo) => new
              {
                  Id        = int.Parse(id),
                  FirstName = fn,
                  LastName  = ln,
                  Email     = new MailAddress(email),
                  Foo       = foo,
              });

foreach (var obj in objects)
    Console.WriteLine(obj.ToString());

Output

{ Id = 1, FirstName = Maggee, LastName = Hould, Email = [email protected], Foo =  }
{ Id = 2, FirstName = Judas, LastName = Vedekhov, Email = [email protected], Foo =  }
{ Id = 3, FirstName = Sharity, LastName = Desquesnes, Email = [email protected], Foo =  }
{ Id = 4, FirstName = Della, LastName = Conant, Email = [email protected], Foo =  }
{ Id = 5, FirstName = Sansone, LastName = Hardson, Email = [email protected], Foo =  }
{ Id = 6, FirstName = Lloyd, LastName = Cromley, Email = [email protected], Foo =  }
{ Id = 7, FirstName = Ty, LastName = Bamsey, Email = [email protected], Foo =  }
{ Id = 8, FirstName = Hurlee, LastName = Dumphy, Email = [email protected], Foo =  }
{ Id = 9, FirstName = Andy, LastName = Vickarman, Email = [email protected], Foo =  }
{ Id = 10, FirstName = Jerad, LastName = Kerley, Email = [email protected], Foo =  }
@atifaziz
Copy link
Member Author

atifaziz commented Dec 21, 2018

A prototype of the implementation is shared in PR #552 because it felt too large to share in a comment here.

@fsateler
Copy link
Member

For the ordered case, or when loopbackCount is large enough, this appears to be identical to:

var orderedIndices = indices.ToList(); // Or perhaps HashSet? It might be more efficient.
var result = source.Index().Where(pair => orderedIndices.Contains(pair.Key)).Select(pair => pair.Value);

Is this the same? If so, then this leaves only the case where loopbackCount is too small as the only new functionality. What would be a use case for this?

@atifaziz
Copy link
Member Author

What would be a use case for this?

Was the comprehensive example (binding to CSV headers) I posted in the opening comment not sufficient?

Is this the same?

Yes, except if you look at the implementation in #552, it's rather non-trivial. Also, it doesn't require hash look-ups in the ordered case.

@fsateler
Copy link
Member

What would be a use case for this?

Was the comprehensive example (binding to CSV headers) I posted in the opening comment not sufficient?

Well, I could use a Dictionary<string, int> to store the positions and then skip the new method:

// same code as in opening until the header and data are split

var headerPositions = (from key in new[] { "id", "email", "last_name", "first_name", "foo" }
                       let position = header.IndexOf(key) 
                       select new KeyValuePair<string, int>(key, position)
                      ).ToDictionary();

string GetValue(string[] row, int pos) {
    if (pos < 0) {
        return null;
    }
    return row[pos];
}

var objects =
    from row in data
    select new  {
      Id        = int.Parse(GetValue(row, headerPositions["id"])),
      FirstName = GetValue(row, headerPositions["first_name"]),
      LastName  = GetValue(row, headerPositions["last_name"]),
      Email     = new MailAddress(GetValue(row, headerPositions["email"])),
      Foo       = GetValue(row, headerPositions["foo"]),
    };  

I don't see the advantage of the new method over this approach.

Is this the same?

Yes, except if you look at the implementation in #552, it's rather non-trivial. Also, it doesn't require hash look-ups in the ordered case.

And why would requiring hash lookups be a problem?

@atifaziz
Copy link
Member Author

atifaziz commented Dec 22, 2018

So you want to repeat all those headers twice? Okay, I know you're going to say that we could put them in constant strings so it's not a huge deal, but still, you've got a lot of ceremony going on like GetValue that's embodied within this proposal.

And why would requiring hash lookups be a problem?

It wouldn't. The idea here was that the indicies as well as the source are themselves query comprehensions and neither may have been materialised in advance. I guess I should have known better than to post a naïve example. 😉 That said, I exactly expected (and thus appreciate) this kind of challenging and feedback so please keep charging on!

@viceroypenguin
Copy link
Contributor

Implemented and released in SuperLinq v4.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants