Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement StrList #110

Open
tushushu opened this issue Jan 27, 2022 · 2 comments
Open

Implement StrList #110

tushushu opened this issue Jan 27, 2022 · 2 comments
Labels
benchmark To improve or test the benchmark performance. design enhancement New feature or request

Comments

@tushushu
Copy link
Collaborator

tushushu commented Jan 27, 2022

See the discussion here. The Rc<str> may be the solution.
When there are many duplicated strings in the Vector, the &str type is much economic than String type, because the same strings can share one heap object.
Look at the example using String below:

fn main() {
    let v: Vec<String> = vec!["foo".to_string(), "foo".to_string(), "bar".to_string(),];
    let a = v[0].as_ptr() as * const() as usize;
    let b = v[1].as_ptr() as * const() as usize;
    let c:f32 = 1.8;
    let _d = &c as * const _ as usize;
    println!("v is {:?}", v);
    println!("a is {}, b is {}", a, b);
    println!("_d is {}", _d);
}

The output is:

v is ["foo", "foo", "bar"]
a is 94313169717792, b is 94313169717824
_d is 140721848803820

The addresses of a and b are different, which implies that they are two different objects. And the address of _d is much smaller than a or b, which implies that _d is using stack memory and a and b are using heap memory.

And look at the &str example below:

fn main() {
    let v: Vec<&str> = vec!["foo", "foo", "bar",];
    let a = v[0].as_ptr() as * const() as usize;
    let b = v[1].as_ptr() as * const() as usize;
    let c:f32 = 1.8;
    let _d = &c as * const _ as usize;
    println!("v is {:?}", v);
    println!("a is {}, b is {}", a, b);
    println!("_d is {}", _d);
}

And the output is as below:

v is ["foo", "foo", "bar"]
a is 94078134587479, b is 94078134587479
_d is 140729855478556

So they are actually pointing to the same object.

@tushushu tushushu added enhancement New feature or request benchmark To improve or test the benchmark performance. design labels Jan 27, 2022
@yingmanwumen
Copy link
Contributor

Does the implementation look like :

use std::{collections::HashSet, fmt::Display, rc::Rc};

struct StrList {
    value: Vec<Rc<str>>,
    _map: HashSet<Rc<str>>,
}

impl StrList {
    pub fn new() -> Self {
        Self {
            value: Vec::new(),
            _map: HashSet::new(),
        }
    }

    pub fn append(&mut self, item: String) {
        if let Some(e) = self._map.get(item.as_str()) {
            self.value.push(e.clone());
        } else {
            let rc: Rc<str> = Rc::from(item.as_str());
            self.value.push(rc.clone());
            self._map.insert(rc);
        }
    }

    pub fn display_mem(&self) {
        let v: Vec<_> = self.value.iter().map(|x| x.as_ptr() as *const _).collect();
        println!("{v:?}");
    }
}

impl Display for StrList {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        writeln!(f, "{:?}", self.value)
    }
}

fn std_read_line() -> String {
    let mut input = String::new();
    std::io::stdin().read_line(&mut input).unwrap();
    input.trim().into()
}

pub fn main() {
    let mut sl = StrList::new();
    sl.append(std_read_line());
    sl.append(std_read_line());
    sl.append(std_read_line());
    sl.display_mem();
    println!("{sl}");
}

and running:

$ ./main
Hello
is
Hello
[0x5646c831fac0, 0x5646c831fae0, 0x5646c831fac0]
["Hello", "is", "Hello"]

@tushushu
Copy link
Collaborator Author

@yingmanwumen Yes, that would be a better solution. My origin idea is something like this:

struct StrList {
    value: Vec<i32>,
    _map: HashMap<String, i32>,
}

So for ['foo', 'foo', 'bar', 'baz'], the vec is [0, 1, 1, 2] and the _map is {'foo': 0, 'bar': 1, 'baz': 2}. It seems that your solution is more elegant and needs less codes to maintain the StrList.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark To improve or test the benchmark performance. design enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants