Implements bulk_create for create_batch if available #925
base: master
@francoisfreitag bump |
Refs #107. The main reason this feature was not implemented seems to be the lack of motivated developers.
Django bulk_create has several documented limitations, but none seem to be deal breakers.
The opt-in option is probably the best way to achieve backward compatibility and incremental adoption. It also gives an escape hatch for bulk_create limitations.
I see no reason to reject the PR, provided the code gets to a mature state and we can get CI on it and good test coverage. I’m actually pretty excited about it.
The idea of collecting "leaf" models and working up the insert chain should work, though we need to be careful of post generation hooks. Perhaps they should be documented as incompatible with bulk_create(), at least as a first step.
I’m not eager to introduce more DB-specific code in factory_boy, but that seems unavoidable with this feature, and we can rely on Django attributes to abstract some of the difficulties.
This patch inspects Django models to describe the relationship between objects, where factory_boy is mostly declarative (SubFactory, RelatedFactory and co). I don’t foresee conflicts arising from that yet, and using Django as the source of truth is probably safe.
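As a rough illustration of the "collect leaf models and work up the insert chain" idea, here is a minimal sketch in which plain Python classes stand in for Django models. `group_by_insert_order` and the `depends_on` mapping are hypothetical names made up for this example; the PR's `dependency_insert_order` reads the relationships from Django model metadata instead, and its actual implementation may differ.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def group_by_insert_order(instances, depends_on):
    """Group instances by class, ordered so referenced classes come first.

    `depends_on` maps a class to the classes it points to through a
    foreign key (hypothetical stand-in for Django model metadata).
    """
    by_class = defaultdict(list)
    for obj in instances:
        by_class[type(obj)].append(obj)

    # TopologicalSorter takes {node: predecessors}; predecessors are emitted first.
    graph = {cls: set(depends_on.get(cls, ())) for cls in by_class}
    ordered = TopologicalSorter(graph).static_order()
    # Skip classes that appear only as dependencies and have nothing to insert.
    return [(cls, by_class[cls]) for cls in ordered if cls in by_class]


class Author: ...
class Book: ...
class Review: ...


instances = [Review(), Book(), Author(), Book()]
depends_on = {Book: [Author], Review: [Book]}
plan = group_by_insert_order(instances, depends_on)
print([(cls.__name__, len(objs)) for cls, objs in plan])
# [('Author', 1), ('Book', 2), ('Review', 1)]
```

Each `(cls, objs)` pair could then be handed to one bulk insert per model class, which is where a real implementation would call the model manager.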
@francoisfreitag What do you think of separating some of these files into a different PR to make it easier to review later?
Just to confirm, you’re suggesting to open a PR to run the factory boy Django tests against PostgreSQL?
@francoisfreitag PR #931 is ready for review. Thanks! |
Hi Javier,
Thanks for splitting it up. I won't be able to take a look at your work
before next week. I'll try to review it quickly then.
On May 24, 2022 23:22:32, Javier Buzzi wrote:
> @francoisfreitag PR #931 is ready for review. Thanks!
@francoisfreitag When you can, could you start looking at this? I'm slowly cleaning up the tests. My biggest hurdle is this right here -- I find it nasty / dirty. Let me know your thoughts.
Add #s -- lol i thought i was writing a commit message. The github code integration confused me :/ now i can't delete this
@thedrow Thank you for the review |
Haven’t had much time to look at this lately (and probably won’t until the end of summer), but here’s an idea to address the TODO you asked for feedback on (collecting the built instances). The idea is to have the owner of the list allocate it, and have the step builder cooperate and fill that list.
The suggestion also starts taking care of the incompatibility between post generation hooks and this new feature, which will probably be necessary to land it.
diff --git a/factory/builder.py b/factory/builder.py
index ed39ebc..464a680 100644
--- a/factory/builder.py
+++ b/factory/builder.py
@@ -208,14 +208,18 @@ class BuildStep:
             parent_chain = ()
         return (self.stub,) + parent_chain

-    def recurse(self, factory, declarations, force_sequence=None):
+    def recurse(self, factory, declarations, force_sequence=None, collect_instances=None):
         from . import base
         if not issubclass(factory, base.BaseFactory):
             raise errors.AssociatedClassError(
                 "%r: Attempting to recursing into a non-factory object %r"
                 % (self, factory))
         builder = self.builder.recurse(factory._meta, declarations)
-        return builder.build(parent_step=self, force_sequence=force_sequence)
+        return builder.build(
+            parent_step=self,
+            force_sequence=force_sequence,
+            collect_instances=collect_instances,
+        )

     def __repr__(self):
         return f"<BuildStep for {self.builder!r}>"
@@ -235,9 +239,8 @@ class StepBuilder:
         self.strategy = strategy
         self.extras = extras
         self.force_init_sequence = extras.pop('__sequence', None)
-        self.created_instances = []

-    def build(self, parent_step=None, force_sequence=None):
+    def build(self, parent_step=None, force_sequence=None, collect_instances=None):
         """Build a factory instance."""
         # TODO: Handle "batch build" natively
         pre, post = parse_declarations(
@@ -246,13 +249,6 @@ class StepBuilder:
             base_post=self.factory_meta.post_declarations,
         )

-        # TODO: come up with a better solution
-        if parent_step:
-            if hasattr(parent_step, 'builder'):
-                self.created_instances = parent_step.builder.created_instances
-            else:
-                self.created_instances = parent_step.created_instances
-
         if force_sequence is not None:
             sequence = force_sequence
         elif self.force_init_sequence is not None:
@@ -275,21 +271,22 @@ class StepBuilder:
             kwargs=kwargs,
         )

-        postgen_results = {}
-        for declaration_name in post.sorted():
-            declaration = post[declaration_name]
-            postgen_results[declaration_name] = declaration.declaration.evaluate_post(
+        if collect_instances is None:
+            postgen_results = {}
+            for declaration_name in post.sorted():
+                declaration = post[declaration_name]
+                postgen_results[declaration_name] = declaration.declaration.evaluate_post(
+                    instance=instance,
+                    step=step,
+                    overrides=declaration.context,
+                )
+            self.factory_meta.use_postgeneration_results(
                 instance=instance,
                 step=step,
-                overrides=declaration.context,
+                results=postgen_results,
             )
-        self.factory_meta.use_postgeneration_results(
-            instance=instance,
-            step=step,
-            results=postgen_results,
-        )
-
-        self.created_instances.append(instance)
+        else:
+            collect_instances.append(instance)
         return instance
diff --git a/factory/django.py b/factory/django.py
index 25d6966..541e19b 100644
--- a/factory/django.py
+++ b/factory/django.py
@@ -246,13 +246,12 @@ class DjangoModelFactory(base.Factory):
                 "is either not set or False." % dict(f=cls.__name__))

         models_to_return = []
-        instances_created = []
+        instances = []
         for _ in range(size):
             step = builder.StepBuilder(cls._meta, kwargs, enums.BUILD_STRATEGY)
-            models_to_return.append(step.build())
-            instances_created.extend(step.created_instances)
+            models_to_return.append(step.build(collect_instances=instances))

-        for model_cls, objs in dependency_insert_order(instances_created):
+        for model_cls, objs in dependency_insert_order(instances):
             manager = cls._get_manager(model_cls)
             cls._refresh_database_pks(model_cls, objs)
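The pattern the diff proposes (the owner allocates the list, every nested build fills it, and post-generation hooks are skipped on that path) can be sketched in isolation. This is a simplified stand-in that does not use factory_boy: the `build` and `create_batch` functions, the `(name, children)` spec tuples, and the `post_hook_ran` flag are all hypothetical names chosen for illustration.

```python
def build(spec, collect_instances=None):
    """Build `spec` and its children depth-first.

    `spec` is a (name, children) tuple, a hypothetical stand-in for a
    factory with nested sub-factory declarations.
    """
    name, children = spec
    # Thread the caller's list down the recursion so nested builds share it.
    built_children = [build(c, collect_instances=collect_instances) for c in children]
    instance = {"name": name, "children": built_children, "post_hook_ran": False}

    if collect_instances is None:
        # Regular path: post-generation hooks run right away.
        instance["post_hook_ran"] = True
    else:
        # Bulk path: hooks are skipped and the instance is handed to the caller.
        collect_instances.append(instance)
    return instance


def create_batch(spec, size):
    instances = []  # allocated by the owner, filled by every nested build
    roots = [build(spec, collect_instances=instances) for _ in range(size)]
    return roots, instances


roots, instances = create_batch(("book", [("author", [])]), size=2)
print([obj["name"] for obj in instances])
# ['author', 'book', 'author', 'book']
```

Because the list is passed as an argument rather than stored on the builder, no state has to be copied between parent and child steps, which is the part of the original approach marked with a TODO.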
@francoisfreitag Do you need any help QA'ing or testing this? |
Hi Tony, Feel free to share any bugs or weird behavior that you find. Because tests aren’t passing, there will certainly be changes to the implementation. When I get some free time, I’ll give it a deeper look. In the meantime, solving the test failures is a very helpful task, to both understand the code and help others understand it. |
Bikeshed: I won't count on it, but it's a long running PR and the commits have built up. It's sometimes easier to read if commits are squashed and it's rebased against master. I'm not sure if there's a policy on that, or if it's more harm than good, though. Either way, I backed up 3666f10 on my fork at pr-925-3666f10 in the event it's ever necessary to look back |
@tony everything went awry after 8ec88e9 , going from 4369b72 makes all the tests pass. I'm going to work on this a little today as i have a little time. @francoisfreitag .. what a mess. :/ |
@francoisfreitag the code you gave me to avoid collecting all the created objects instead of my "half-baked" instance level solution does not work. I also don't see it feasible to send the What else can we do? Ps a good test that demonstrates the issue is Stack of how far `created_instances` would need to go (with room to grow)
@francoisfreitag bump |
I’ll be busy until the end of the month / year. I’ll do my best to carve out some time for this issue, but the ask is not for a few minutes: it’ll take much longer to find a correct integration of the collected objects in the existing machinery.
Is there going to be a rebase / squash? |
@tony does it matter at the moment? Rather keep the history at this time in case I need to bring something back. I'll squash as this comes closer to finish |
@kingbuzzman No worries! Whatever works best for you. To minimize noise, I'll stay out of the way unless you / another maintainer requests a review (earlier mention). (Fine to minimize / hide this comment) |
@francoisfreitag any input? |
I don’t have much time to dedicate to open-source these days, I cannot spend time on this PR for now. The idea sure has merits, but it needs efforts to get the approach right. My understanding is that #925 (comment) needs solving, and without spending some time figuring out a better idea, I won’t be able to make suggestions. |
@kingbuzzman Thanks for keeping this branch alive! |
I honestly don't understand why it passed 3 |
What is the progress on this? I appreciate the introduction of the true bulk create. |
I am also keen for bulk create implementation to be introduced - glad to help |
It looks like something went wrong with the PR and there is a lot of unrelated stuff in it. A rebase would be nice, but at this point recreating it will probably be cleaner. (Merging the main branch into feature branches generally ends with results similar to this one.) I'd also recommend creating the PR from a feature branch rather than from the main branch of your fork; that gives cleaner results.
I'll rebase it, i dont want to lose the history. I need to fix the tests when i get a chance. I've been really busy with the day job 😇 |
What this PR is trying to accomplish:
(create_batch(10)) models that get created constantly throughout our code base.
bulk_create, we would increase performance and, most importantly, it would save a lot of time.
Factory._meta.use_bulk_create.
What's left to do:
Fix a weird error in examples. NM, all green 👍
Depends on: