Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡ The CLI is slow on Windows #2313

Open
Koncopd opened this issue Jan 2, 2025 · 18 comments
Open

⚡ The CLI is slow on Windows #2313

Koncopd opened this issue Jan 2, 2025 · 18 comments
Assignees

Comments

@Koncopd
Copy link
Member

Koncopd commented Jan 2, 2025

Add a description

Measuring commands runtimes.

before integrating lnschema-core:
Measure-Command { lamin info | Out-Default }
TotalSeconds : 4.2319446
Measure-Command { lamin disconnect | Out-Default }
TotalSeconds : 4.3096474
Measure-Command { lamin connect laminlabs/lamindata | Out-Default }
TotalSeconds : 7.4822514
Measure-Command { lamin connect laminlabs/cellxgene | Out-Default }
TotalSeconds : 8.4347432

after:
Measure-Command { lamin info | Out-Default }
TotalSeconds : 4.3240417
Measure-Command { lamin disconnect | Out-Default }
TotalSeconds : 6.2572492
Measure-Command { lamin connect laminlabs/lamindata | Out-Default }
TotalSeconds : 11.60474
Measure-Command { lamin connect laminlabs/cellxgene | Out-Default }
just hangs

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

@falexwolf there is probably a bug in prompting the migration:
Measure-Command { lamin connect laminlabs/cellxgene | Out-Default } just hangs, lamindb_setup.connect("laminlabs/cellxgene") ask for me input, that is probably not propagated in the CLI and this is why it hangs.

added:
ok, it was due to Measure-Command, just lamin connect laminlabs/cellxgene shows the prompt, although in a very confusing form:
image

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

@falexwolf as i was afraid, i see a significant degradation of performance. In my opinion this should be checked before merging a PR that is suspected of performance penalty and we should not merge it until we figure out how to improve.

Also could you check on mac please?

@Koncopd Koncopd changed the title lamin CLI is slow on windows lamin CLI is slow on windows, now even slower Jan 2, 2025
@falexwolf
Copy link
Member

Got it.

laminlabs/cellxgene wasn't migrated; so you shouldn't use this:

image

Testing the other ones. I didn't observe any degradation.

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

on linux (wsl2) before:
time lamin info
real 0m1.222s
time lamin disconnect
real 0m1.851s
time lamin connect laminlabs/lamindata
real 0m2.223s
time lamin connect laminlabs/cellxgene
real 0m2.325s

after:
time lamin connect laminlabs/lamindata
real 0m4.303s

So even here i see a slight degradation.

@falexwolf
Copy link
Member

So, for instance,

% time lamin connect laminlabs/lamindata
lamin connect laminlabs/lamindata  0.92s user 0.16s system 53% cpu 2.021 total

I'm almost certain this is faster than before. I can make a new environment with an old version of lamindb to check.

Are you sure that you've called this multiple times? The first time there is an additional request to check whether the database needs to be migrated.

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

Yes, multiple times.

@falexwolf
Copy link
Member

Why are your numbers so much slower than mine? 🤔

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

wsl2 is not real Linux, so maybe this also matters. On windows proper it might be caused by carbon black (Helmholtz monitoring software). Anyways the performance is significantly worse for me now.

And i really think this is important.

@falexwolf
Copy link
Member

falexwolf commented Jan 2, 2025

Ok, obviously I can't benchmark with lamindata anymore. I made an env with lamindb 0.77.3 and there it's much slower:

(ln0773) falexwolf@mbpalex repos % time lamin connect laminlabs/lamindata
! the database (0.78a1) is ahead of your installed lamindb package (0.77.3)
→ please update lamindb: pip install "lamindb>=0.78,<0.79"
! schema module 'ourprojects' is not installed → no access to its labels & registries (resolve via `pip install ourprojects`)
! schema module 'wetlab' is not installed → no access to its labels & registries (resolve via `pip install wetlab`)
! schema module 'bionty' is not installed → no access to its labels & registries (resolve via `pip install bionty`)
→ connected lamindb: laminlabs/lamindata
lamin connect laminlabs/lamindata  2.19s user 0.38s system 16% cpu 15.706 total

So, I'm taking another instance on us-east-1; one with only 2 schema modules:

(ln0773) falexwolf@mbpalex repos % time lamin connect account/instance
→ connected lamindb: account/instance
lamin connect account/instance  0.73s user 0.10s system 49% cpu 1.694 total
(ln0773) falexwolf@mbpalex repos % time lamin connect account/instance
→ connected lamindb: account/instance
lamin connect account/instance  0.72s user 0.10s system 48% cpu 1.705 total
(ln0773) falexwolf@mbpalex repos % time lamin connect account/instance
→ connected lamindb: account/instance
lamin connect account/instance  0.72s user 0.09s system 49% cpu 1.651 total

Here is a comparison with laminlabs/lamindata; one with 4 schema modules:

(py312) falexwolf@mbpalex repos % time lamin connect laminlabs/lamindata
lamin connect laminlabs/lamindata  0.88s user 0.16s system 54% cpu 1.890 total
(py312) falexwolf@mbpalex repos % time lamin connect laminlabs/lamindata
lamin connect laminlabs/lamindata  0.89s user 0.16s system 52% cpu 1.996 total
(py312) falexwolf@mbpalex repos % time lamin connect laminlabs/lamindata
lamin connect laminlabs/lamindata  0.89s user 0.16s system 63% cpu 1.649 total

It's just a tiny bit slower, but this is likely due to the additional schema module. 🤔

@falexwolf
Copy link
Member

The performance is very important; that's why I've benchmarked and optimized it many times. The last time I invested in it is although quite a while back -- maybe about a year; I can dig out the threads that document the data and the various things I did to get things to be OK-fast.

What you experience is abysmal. So, we need to invest into speeding this up again.

Just: the whole topic is independent of whether we maintain the schema in a separate package or not. Think about it the other way: you'll not create a new Python package to speed up your API. You need much better reasons to break out logic into another package. These reasons don't exist for lnschema-core.

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

Just: the whole topic is independent of whether we maintain the schema in a separate package or not.

I actually don't think so. Due to how python loads modules, it seems to me it just not possible to disentangle loading just models from everything else now. Even tricks with lazy loading won't help as we need to execute the monkey patching because we don't know anymore when we need the full functionality and when only basic modules.

@falexwolf
Copy link
Member

falexwolf commented Jan 2, 2025

I actually don't think so. Due to how python loads modules [...]

In any Python package, you can dictate which submodules a package loads during import. Hence, whether these submodules live in another Python package or not doesn't matter; does it?

it just not possible to disentangle loading just models from everything else now

And I think that's a pro, not a con: I think it shouldn't on the path to an integrated, cohesive piece of software. A lamindb Record is not a django Model and while it likely will always be inspired by it; I hope we ultimately eliminate Django in one form or the other (in a non-disruptive way).

execute the monkey patching

The monkey patching will be gone very soon and replaced with inheritance. The step after this would be to eliminate Django.

@falexwolf falexwolf changed the title lamin CLI is slow on windows, now even slower ⚡ The CLI is slow on Windows Jan 2, 2025
@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

In any Python package, you can dictate which submodules a package loads during import. Hence, whether these submodules live in another Python package or not doesn't matter; does it?

In principle yes, but now we can't really separate submodules properly due to monkey patching. Also i don't think it is possible to move models.py to base due to django conventions, right?

@falexwolf
Copy link
Member

falexwolf commented Jan 2, 2025

In principle yes, but now we can't really separate submodules properly due to monkey patching.

Yes, which is a big architectural flaw of the current software architecture. I'm looking forward to when this is gone.

Also i don't think it is possible to move models.py to base due to django conventions, right?

Right now that would be very hard, yes. But in the future we can probably make something like this happen.

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

A partial fix is here laminlabs/lamin-cli#101

@falexwolf
Copy link
Member

Thank you!

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

I think the degradation of lamin connect was entirely caused by the addition of the line to reload lamindb and not really from lnschema-core integration.

@Koncopd
Copy link
Member Author

Koncopd commented Jan 2, 2025

Now i am checking also if it is still possible to load the models without loading the rest of lamindb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants