Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(iter): make combinations and permutations generic #223

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 16 additions & 19 deletions iter/comb.v
Original file line number Diff line number Diff line change
Expand Up @@ -4,40 +4,38 @@ import vsl.fun
import vsl.util

// combinations will return an array of all length `r` combinations of `data`
// While waiting on https://github.com/vlang/v/issues/7753 to be fixed, the function
// assumes f64 array input. Will be easy to change to generic later
pub fn combinations(data []f64, r int) [][]f64 {
pub fn combinations[T](data []T, r int) [][]T {
mut combinations := CombinationsIter.new(data, r)
mut result := [][]f64{cap: int(combinations.size)}
mut result := [][]T{cap: int(combinations.size)}
for comb in combinations {
result << comb
}
return result
}

pub struct CombinationsIter {
pub struct CombinationsIter[T] {
mut:
pos u64
idxs []int
pub:
repeat int
size u64
data []f64
data []T
}

// CombinationsIter.new will return an iterator that allows
// lazy computation for all length `r` combinations of `data`
pub fn CombinationsIter.new(data []f64, r int) CombinationsIter {
pub fn CombinationsIter.new[T](data []T, r int) CombinationsIter[T] {
n := data.len
if r > n {
return CombinationsIter{
return CombinationsIter[T]{
data: data
repeat: r
}
}
size := u64(fun.choose(n, r))
idxs := util.arange(r)
return CombinationsIter{
return CombinationsIter[T]{
Comment on lines +28 to +38
Copy link

@coderabbitai coderabbitai bot Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Based on the code found, I can now generate the final response as we have enough information about the implementation of choose and its potential overflow behavior.

Add overflow checks for combinatorial calculations
The current implementation of choose(n, r) uses uint_binomial which has overflow protection, but the size check r > n should happen before any calculations to prevent unnecessary work and potential integer overflow. Consider:

if r > n {
    return CombinationsIter[T]{
        data: data
        repeat: 0  // or handle this case differently
    }
}

Reasoning:

  1. The code shows that uint_binomial has overflow checks and will panic on overflow
  2. However, the current implementation first checks r > n after calling fun.choose(n, r), which could lead to unnecessary calculations
  3. The choose function internally uses factorial calculations which can overflow for large numbers
  4. Moving the size check before any calculations would be more efficient and safer
🔗 Analysis chain

Ensure safe usage of combinatorial expansions.
Similar to permutations, calling fun.choose(n, r) could be large. Check for potential overflow or extremely large memory allocations.


Let me gather information about the implementation of fun.choose and related functions to verify the safety measures.


Let me try to find the implementation of fun.choose specifically.


Let me try a different approach to find the implementation using ripgrep.

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Find the implementation of fun.choose
ast-grep --pattern 'fn choose($_, $_) {
  $$$
}'

# Search for any overflow checks in the codebase
rg -A 5 "overflow" 

# Look for the implementation details in fun module
fd "fun\.v$" --exec cat {}

Length of output: 70495


Script:

#!/bin/bash
# Search for choose function implementation
ast-grep --pattern 'fn choose($_, $_) {
  $$$
}'

# Also look for binomial coefficient implementation
ast-grep --pattern 'fn uint_binomial($_, $_) {
  $$$
}'

Length of output: 108


Script:

#!/bin/bash
# Search for choose function implementation
rg -A 10 "fn choose" 

# Also look for binomial coefficient implementation
rg -A 10 "fn uint_binomial"

# Look for any overflow checks related to combinations
rg -A 5 "overflow.*binomial|binomial.*overflow"

Length of output: 2363

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfee2000 could you check this please?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

data: data
repeat: r
size: size
Expand All @@ -46,7 +44,7 @@ pub fn CombinationsIter.new(data []f64, r int) CombinationsIter {
}

// next will return next combination if possible
pub fn (mut o CombinationsIter) next() ?[]f64 {
pub fn (mut o CombinationsIter[T]) next() ?[]T {
// base case for every iterator
if o.pos == o.size {
return none
Expand Down Expand Up @@ -82,39 +80,38 @@ pub fn (mut o CombinationsIter) next() ?[]f64 {
// This is as close a translation of python's [itertools.combinations_with_replacement]
// (https://docs.python.org/3.9/library/itertools.html#itertools.combinations_with_replacement)
// as I could manage.
// Using f64 array instead of generic while waiting on https://github.com/vlang/v/issues/7753
pub fn combinations_with_replacement(data []f64, r int) [][]f64 {
pub fn combinations_with_replacement[T](data []T, r int) [][]T {
mut combinations := CombinationsWithReplacementIter.new(data, r)
mut result := [][]f64{cap: int(combinations.size)}
mut result := [][]T{cap: int(combinations.size)}
for comb in combinations {
result << comb
}
return result
}

pub struct CombinationsWithReplacementIter {
pub struct CombinationsWithReplacementIter[T] {
mut:
pos u64
idxs []int
pub:
repeat int
size u64
data []f64
data []T
}

// CombinationsWithReplacementIter.new will return an iterator that allows
// lazy computation for all length `r` combinations with replacement of `data`
pub fn CombinationsWithReplacementIter.new(data []f64, r int) CombinationsWithReplacementIter {
pub fn CombinationsWithReplacementIter.new[T](data []T, r int) CombinationsWithReplacementIter[T] {
n := data.len
if r > n {
return CombinationsWithReplacementIter{
return CombinationsWithReplacementIter[T]{
data: data
repeat: r
}
}
size := fun.n_combos_w_replacement(n, r)
idxs := []int{len: r, init: 0}
return CombinationsWithReplacementIter{
return CombinationsWithReplacementIter[T]{
data: data
repeat: r
size: size
Expand All @@ -123,7 +120,7 @@ pub fn CombinationsWithReplacementIter.new(data []f64, r int) CombinationsWithRe
}

// next will return next combination if possible
pub fn (mut o CombinationsWithReplacementIter) next() ?[]f64 {
pub fn (mut o CombinationsWithReplacementIter[T]) next[T]() ?[]T {
// base case for every iterator
if o.pos == o.size {
return none
Expand Down
14 changes: 14 additions & 0 deletions iter/comb_test.v
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,20 @@ fn test_combinations_longer() {
assert expected == result
}

fn test_combinations_with_genric_type_string() {
data := ['a', 'b', 'c']
expected := [['a', 'b'], ['a', 'c'], ['b', 'c']]
result := combinations(data, 2)
assert expected == result
}

fn test_combinations_with_genric_type_int() {
data := [1, 2, 3]
expected := [[1, 2], [1, 3], [2, 3]]
result := combinations(data, 2)
assert expected == result
}

fn test_combinations_with_replacement_choose_1() {
data := [1.0, 2.0, 3.0]
expected := [[1.0], [2.0], [3.0]]
Expand Down
18 changes: 9 additions & 9 deletions iter/perm.v
Original file line number Diff line number Diff line change
Expand Up @@ -3,31 +3,31 @@ module iter
import vsl.util
import math

pub struct PermutationsIter {
pub struct PermutationsIter[T] {
mut:
pos u64
idxs []int
cycles []int
pub:
repeat int
size u64
data []f64
data []T
}

// PermutationsIter.new will return an iterator that allows
// lazy computation for all length `r` permutations of `data`
pub fn PermutationsIter.new(data []f64, r int) PermutationsIter {
pub fn PermutationsIter.new[T](data []T, r int) PermutationsIter[T] {
n := data.len
if r > n {
return PermutationsIter{
return PermutationsIter[T]{
data: data
repeat: r
}
}
size := u64(math.factorial(n) / math.factorial(n - r))
idxs := util.arange(n)
cycles := util.range(n, n - r, step: -1)
return PermutationsIter{
return PermutationsIter[T]{
Comment on lines +19 to +30
Copy link

@coderabbitai coderabbitai bot Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Potential integer overflow risk requires bounds checking

Based on the code inspection, the PermutationsIter.new function uses factorial calculations for both n and n-r without any bounds checking. While we couldn't locate the exact implementation of math.factorial, the mathematical nature of factorials means they grow extremely rapidly, potentially causing integer overflow. The risk is particularly high since:

  • The function accepts arbitrary length arrays (data []T)
  • There's only a basic check for r > n
  • The factorial calculation math.factorial(n) / math.factorial(n - r) could overflow even before the division

Recommendations:

  • Add upper bounds validation for n (e.g., n ≤ 20 for 64-bit integers)
  • Consider using a checked arithmetic approach or BigInt for larger values
  • Add error handling for potential overflow cases
🔗 Analysis chain

Potential factorial overflow for large n.
When n is large, factorial(n) can become huge very quickly. Consider validating n or using a checked approach to prevent overflow or performance issues.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Checking all usage of PermutationsIter.new to verify the range of r and data length
rg --multiline "PermutationsIter\.new\s*\(\s*\w+,\s*\w+\)" -A 5

Length of output: 65


Script:

#!/bin/bash
# Let's try a different approach to find usages of PermutationsIter
rg "PermutationsIter" -A 5

# Also check the implementation of math.factorial
ast-grep --pattern 'fn factorial($_) {
  $$$
}'

Length of output: 1314


Script:

#!/bin/bash
# Let's check the math module for factorial implementation
rg "fn factorial" -A 10

# Also check for any tests that might show usage patterns
fd "test.*\.v" --exec rg "PermutationsIter" -A 5 {}

Length of output: 79

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfee2000 could you check this please?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

data: data
repeat: r
size: size
Expand All @@ -37,7 +37,7 @@ pub fn PermutationsIter.new(data []f64, r int) PermutationsIter {
}

// next will return next permutation if possible
pub fn (mut o PermutationsIter) next() ?[]f64 {
pub fn (mut o PermutationsIter[T]) next[T]() ?[]T {
// base case for every iterator
if o.pos == o.size {
return none
Expand Down Expand Up @@ -69,9 +69,9 @@ pub fn (mut o PermutationsIter) next() ?[]f64 {
}

// permutations returns successive `r` length permutations of elements in `data`
pub fn permutations(data []f64, r int) [][]f64 {
mut perms := PermutationsIter.new(data, r)
mut result := [][]f64{cap: int(perms.size)}
pub fn permutations[T](data []T, r int) [][]T {
mut perms := PermutationsIter.new[T](data, r)
mut result := [][]T{cap: int(perms.size)}
for perm in perms {
result << perm
}
Expand Down
17 changes: 16 additions & 1 deletion iter/perm_test.v
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,22 @@ fn test_permutations_simple_5() {
assert assert_permutation(expected, result)
}

fn assert_permutation(a [][]f64, b [][]f64) bool {
fn test_permutations_generic_type_string() {
data := ['a', 'b', 'c']
expected := [['a', 'b'], ['a', 'c'], ['b', 'a'], ['b', 'c'],
['c', 'a'], ['c', 'b']]
result := permutations(data, 2)
assert assert_permutation(expected, result)
}

fn test_permutations_generic_type_int() {
data := [1, 2, 3]
expected := [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
result := permutations(data, 2)
assert assert_permutation(expected, result)
}

fn assert_permutation[T](a [][]T, b [][]T) bool {
if a.len != b.len {
return false
}
Expand Down