Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language detection fails #48

Open
jakob-o opened this issue Nov 6, 2015 · 15 comments
Open

Language detection fails #48

jakob-o opened this issue Nov 6, 2015 · 15 comments
Assignees

Comments

@jakob-o
Copy link
Contributor

jakob-o commented Nov 6, 2015

Hi,

I am facing an issue when creating blog posts / publishing pages from the page listing in another language. The basic problem is, that get_language() returns the wrong value e.g. for this link
http://localhost:8000/en/admin/cms/page/13/de/publish/?redirect_language=en&redirect_page_id=5. The same applies for blog posts.

To clarify / how to reproduce:

  • setup the cms and with two languages
  • create a page in the primary language
  • create a translation via the page listing (clicking the circle button) and stay in the primary language
  • publish the translation via the page listing (hover over the circle button and click "publish" in the tooltip)

Expected:

  • an added entry in the index for the page in the secondary language

Actual:

  • an added entry in the primary language

I tried to workaround that by inspecting the instance in a custom implementation of the router, unfortunately there is not unified way to do so. The Title model for instance has a language attribute whilst an Article has a language_code. The only solution for now that comes to my mind would be to store all entries in one index and query with a language filter...

Any recommendations / help regarding this are greatly appreciated.

Regards

Jakob

@czpython
Copy link
Contributor

czpython commented Nov 6, 2015

Hello Jakob,

Just to make sure I understand, in your example above you expect the de title to be indexed but instead the en one is the one that gets indexed? Also this only happens only on actions like publish or save() meaning only through the haystack signal handlers and not the manual rebuilding of the index?

@jakob-o
Copy link
Contributor Author

jakob-o commented Nov 6, 2015

EDIT: I expect the de title to be indexed on the de haystack connection, instead it is written to the en connection. The contents are correct however.

EDIT 2: This only happens when publishing. I should read your post, before answering...

Hi,

this is exactly the case. The same applies for a blog post when going to the blog page in the primary langue, lets say en, you go to Add new article... via the menu, then switch to the secondary language tab, e.g. de and then create the blog post. The language router fails to detect that the blog post was created in German and not in English. I think there actually is no easy solution to this, at least none I could think of. Maybe one could leverage the get_language(object) method on the Index somehow, but I am not sure if the router can / should access the index classes...

Thanks for your quick reply

Jakob

czpython added a commit that referenced this issue Nov 6, 2015
@czpython
Copy link
Contributor

czpython commented Nov 6, 2015

So yes this is a problem I'm quite familiar with.

And yes the only solution to this is to leverage get_language() method because it's the only method that knows how to fetch the "active" language for an object.

In the past, I've defined the should_update on my index base class and let it return False to prevent "realtime indexing" as a workaround for this issue.

As far as a real solution comes, I've pushed a potential fix to the fixes/let-object-decide-alias branch.
It's something I've experimented with in the past but I won't push it to master today since I'd like to review it carefully and make sure it has no crazy side-effects :).

Please have a look at the branch and let me know if it works for you, I did a quick test locally and it worked.

@jakob-o
Copy link
Contributor Author

jakob-o commented Nov 6, 2015

Thank you, I will check it out!

@jakob-o
Copy link
Contributor Author

jakob-o commented Nov 6, 2015

I haven't tested the code, yet but from the looks this is what I imagined. However the line

unified_index = connections[alias].get_unified_index()

https://github.com/aldryn/aldryn-search/blob/fixes/let-object-decide-alias/aldryn_search/router.py#L32 looks suspicious to me, since we are trying to find the index of a potentially wrong connection (the alias might differ from the instance language, which is the root cause of this issue). I am no expert on haystack though and am not sure if this could lead to errors. Is there a case where one would to exclude an index in one language but not in another? Maybe a more failsafe way would be iterating over all available connections / aliases if not found with the current alias. I have no clue about the performance impact of such an implementation and whether caching the index would be a viable option or not. It's not pretty either...

What dou you think?

Regards

Jakob

@czpython
Copy link
Contributor

czpython commented Nov 6, 2015

Your suspicion is accurate, haystack allows you to exclude indexes from certain connections and so it is possible that we fail to get an index for a connection but in my opinion it's better than getting the language wrong every time. I think a better solution would require a change in the way haystack triggers the object indexing (signal). Iterating over all connections is not a bad idea but I'm not really a fan of brute forcing it but it might be the way to go until a proper fix can be integrated in Haystack.

I agree that the code is not very pretty because it exposes an internal logic in Haystack, but this is actually found everywhere there :)

Regarding a potential for performance impact, it really depends. Currently there's none.

Accessing a connection is just accessing a dictionary and never touching the search engine, after that we call the get_language() method which simply returns a language for the given instance.
This method is defined by both the TitleIndex and ArticleIndex, neither of the two make any extra queries or requests to get the language from an instance.
That said, if you add some queries in the get_language() method then yes it would result in a performance impact because it would be called once for the router and another one for the indexing.

@jakob-o
Copy link
Contributor Author

jakob-o commented Nov 30, 2015

FYI, works for my use cases so far.

Thank you very much

Jakob

@czpython czpython self-assigned this Nov 30, 2015
@jakob-o
Copy link
Contributor Author

jakob-o commented Jan 14, 2016

Hey @czpython,

what happened to the branch if I may politely inquire? Is there any way this will make it into a release? Realtime indexing is seriously broken without this fix...

Thank you very much

Jakob

@czpython
Copy link
Contributor

Hey @jakob-o,

I meant to post an update here.. I closed the pr #49 (comment) and posted an explanation as to why I decided it's not the way to go.

I have a fix for this issue on cms pages which I will push sometime this week.

As far as other addons like Aldryn Newsblog go, I'll have to think of another solution 😢

@jakob-o
Copy link
Contributor Author

jakob-o commented Jan 14, 2016

EDIT: OK forget that we could use the function that I mentioned, but would there be a possibility to detect the publishing language another way?

Hey @czpython,

how about a middleware that stores the result from get_language_from_request(request, check_path=True) in a thread local and use this in the language router if available?

Regards

Jakob

@jakob-o
Copy link
Contributor Author

jakob-o commented Jan 14, 2016

Another possibility would be a registry, that allows language lookup based on the instance. Something like:

def get_instance_language(instance):
    try:
        lookup = language_lookups[instance.__class__]
    except KeyError:
        return get_language()
    else:
        return lookup.get_language(instance)

@czpython
Copy link
Contributor

@jakob-o Thanks for the suggestions.
I've decided to use a more explicit approach for the addons by creating a base index in https://github.com/aldryn/aldryn-translation-tools that will then provide some functionality to index the correct language for a parler object.

I will be using the add_to_index signal and make sure to ignore haystack's default post_save signal handler as it's too generic.

@jakob-o
Copy link
Contributor Author

jakob-o commented Jan 18, 2016

EDIT: I just realized the code changes in this repo were only for the unpublishing issue... My question remains though.

EDIT2: Created a pull request #61

Hey @czpython,

I had a look at the updated code and am not sure how this would solve the language detection problem. I am fine with it for now however (using a custom router implementing the registry approach). Two small things though:

  1. I think the action argument should be added in the signal's providing_args
  2. Would you mind refactoring the action argument to cms_action or something more distinguishable?

The latter one would spare me a lot of trouble since I am using a custom implementation based on aldryn_search and https://github.com/django-haystack/celery-haystack which is adding an action argument too which causes clashes... sigh

Thank you very much!

Jakob

@czpython
Copy link
Contributor

Hello @jakob-o,

I've not gotten around to implement the language fix yet..

Regarding your points:

  1. I agree :)
  2. I like the name action but can rename to user_action, cms_action is too specific to cms. The idea behind this argument is to be leveraged by other indexes to conditionally update the index like it's being used by the TitleIndex.

@jakob-o
Copy link
Contributor Author

jakob-o commented Jan 18, 2016

How about index_action? I will happily update my PR, you decide the name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants