Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for LRU cache inconsistency crash #7836

Merged
merged 1 commit into from
Jul 10, 2024

Conversation

cafesilencio
Copy link
Contributor

@cafesilencio cafesilencio commented Jun 25, 2024

Description

In response to: https://mapbox.atlassian.net/browse/NAVAND-3208

Put call to cache route data on it's own thread so the synchronize blocks in the caching class will be effective. The caching call is in a method that is being called from a coroutine.

Screenshots or Gifs

@cafesilencio cafesilencio added the skip changelog Should not be added into version changelog. label Jun 25, 2024
@cafesilencio cafesilencio requested a review from a team as a code owner June 25, 2024 17:37
Copy link

Changelog

Features

Bug fixes and improvements

  • Fixed UI jank caused by on-device TextToSpeech player. [#7833](https://github.com/mapbox/mapbox-navigation-android/pull/7833)

Known issues ⚠️

Other changes

Android Auto Changelog

Features

Bug fixes and improvements

  • The app is now considered as the one in active navigation only when an active route is set to MapboxNavigation. Previously it was always considered active. [#7366](https://github.com/mapbox/mapbox-navigation-android/pull/7366)
  • When Android Auto host tells the app to stop active navigation because another app starts navigating, now SDK will enter FreeDrive mode instead of stopping trip session completely. [#7366](https://github.com/mapbox/mapbox-navigation-android/pull/7366)

Copy link

codecov bot commented Jun 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.18%. Comparing base (9b40acb) to head (286310c).

Impacted file tree graph

@@            Coverage Diff            @@
##               main    #7836   +/-   ##
=========================================
  Coverage     74.17%   74.18%           
  Complexity     6262     6262           
=========================================
  Files           856      856           
  Lines         33789    33793    +4     
  Branches       4023     4022    -1     
=========================================
+ Hits          25062    25068    +6     
+ Misses         7170     7169    -1     
+ Partials       1557     1556    -1     
Files Coverage Δ
...ion/base/internal/route/RouteCompatibilityCache.kt 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

@cafesilencio cafesilencio marked this pull request as draft June 25, 2024 19:10
@cafesilencio cafesilencio force-pushed the sb-NAVAND-3208-sync-lru-cache branch from 77025d3 to a89e1c4 Compare June 25, 2024 19:21
@cafesilencio cafesilencio marked this pull request as ready for review June 25, 2024 19:22
@cafesilencio cafesilencio changed the title Changed synchronization object based on documentation in LruCache class. Changed synchronization to semaphore. Jun 26, 2024
@cafesilencio cafesilencio requested a review from a team July 1, 2024 16:51
@@ -17,37 +18,51 @@ import com.mapbox.navigation.base.route.NavigationRoute
object RouteCompatibilityCache {
private val directionsSessionCache = mutableListOf<NavigationRoute>()
private val creationCache = LruCache<DirectionsRoute, NavigationRoute>(3)
private val lock = Any()
private val semaphore = Semaphore(1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cafesilencio, can you elaborate on how this is fixing the issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't recreate the issue but I've seen discussions about synchronized blocks not working well with coroutines. Based on the stack trace something isn't correct with using a generic object to lock on to. I thought using a Semaphore would be a stronger locking mechanism in this case. Do you have a suggestion for something different?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both mechanism are the same – they both will lock current thread until lock/semaphore is free. So looks like this fix will improve nothing, IMO.

Copy link
Contributor

@korshaknn korshaknn Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we need to sync it with coroutines, we can make the func suspend and use coroutine specific locks:
'ReentrantLock' / Mutex

it also can be a Semaphore but from kotlinx.coroutines.sync

we can't use classes from java.util.concurrent with coroutines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use classes from java.util.concurrent with coroutines

Thanks for the clarification. I knew there was some kind of incompatibility but I couldn't remember the details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use classes from java.util.concurrent with coroutines

There is no such limitation, where did you get that?

Copy link
Contributor Author

@cafesilencio cafesilencio Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@korshaknn might know the details better and can comment. I seem to remember that synchronize doesn't work with coroutines because they can operate on the same thread. If you look at the the stack trace in the reported error it seems evident the synchronize blocks aren't working. There is one call to this class originating in a coroutine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coroutines provide better alternatives, but synchronization still works

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean java.util.concurrent with coroutines will block the whole thread, that's why we should use coroutine specific sync classes
Mutex.lock() is a suspending function. It does not block a thread.

Copy link
Contributor

@Zayankovsky Zayankovsky Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't mean we can't use synchronization with coroutines or that it doesn't work

@cafesilencio
Copy link
Contributor Author

Some additional context:

RouteCompatibilityCache::setDirectionsSessionResult is called in MapboxDirectionsSession::setNavigationRoutesFinished which is executed in a coroutine in MapboxNavigation::internalSetNavigationRoutes line 1133. The other functions in RouteCompatibilityCache are not called in a coroutine as far as I can see.

@cafesilencio cafesilencio force-pushed the sb-NAVAND-3208-sync-lru-cache branch 2 times, most recently from 6bebfdd to 3643e05 Compare July 3, 2024 20:35
@cafesilencio cafesilencio changed the title Changed synchronization to semaphore. Fix for LRU cache inconsistency crash Jul 8, 2024
@cafesilencio cafesilencio force-pushed the sb-NAVAND-3208-sync-lru-cache branch 7 times, most recently from fd57277 to a840c74 Compare July 8, 2024 19:49
@cafesilencio cafesilencio requested a review from korshaknn July 8, 2024 20:15
Comment on lines 37 to 46
// A crash was reported due to an LruCache inconsistency.
// The caching implementation is using java.concurrent
// classes for synchronization. This method is getting called
// from a coroutine upstream. Putting this call on it's own thread
// should make the synchronization effective.
Thread {
run {
setDirectionsSessionResultInternal(routes)
}
}.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really doubt that changes anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that since synchronize is designed to work with threads rather than coroutines I could execute this code on a new thread and the synchronization would be enforced. If I change the class to use suspend methods and a kotlin mutex internally that will cause the upstream usage to change. The methods in this class are ultimately called in MapboxNavigation. If you have suggestions please share them. Based on the stack trace reported, the synchronization blocks seem not to be working correctly. The only explanation I could think of was that one of the methods in this class is originating from a coroutine upstream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any coroutine code is still run on some thread and synchronization works there as usual. The only thing you can't do is call suspend functions inside synchronized blocks, but that is checked during compilation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this looks like an issue with hashCode or equals, which causes the LruCache to run into issues

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option could be to use single thread for all the operations and remove synchronisation at all.
Like here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any coroutine code is still run on some thread and synchronization works there as usual. The only thing you can't do is call suspend functions inside synchronized blocks, but that is checked during compilation

You might be right but several coroutines can be associated with the same thread. So I thought if I put the operation on an actual new thread the synchronization would be more effective.

For me this looks like an issue with hashCode or equals, which causes the LruCache to run into issues

Here is the stack trace from the reported issue. I don't think this is a result of a haschCode/equals problem.

IllegalStateException
android.util.LruCache.sizeOf() is reporting inconsistent results!

android.util.LruCache in trimToSize at line 203
android.util.LruCache in evictAll at line 311
com.mapbox.navigation.base.internal.route.RouteCompatibilityCache in setDirectionsSessionResult at line 38
com.mapbox.navigation.core.directions.session.MapboxDirectionsSession in setNavigationRoutesFinished at line 49
com.mapbox.navigation.core.MapboxNavigation$internalSetNavigationRoutes$2 in invokeSuspend at line 1129
kotlin.coroutines.jvm.internal.BaseContinuationImpl in resumeWith at line 33
kotlinx.coroutines.internal.DispatchedContinuation in resumeUndispatchedWith$kotlinx_coroutines_core at line 256
kotlinx.coroutines.DispatchedTaskKt in resume at line 178
kotlinx.coroutines.DispatchedTaskKt in resumeUnconfined at line 191
kotlinx.coroutines.DispatchedTaskKt in dispatch at line 163
kotlinx.coroutines.CancellableContinuationImpl in dispatchResume at line 474
kotlinx.coroutines.CancellableContinuationImpl in resumeImpl at line 508
kotlinx.coroutines.CancellableContinuationImpl in resumeImpl$default at line 497
kotlinx.coroutines.CancellableContinuationImpl in resumeWith at line 368
com.mapbox.navigation.navigator.internal.MapboxNativeNavigatorImpl$refreshRoute$callback$1 in invoke$lambda-2 at line 267
com.mapbox.navigation.navigator.internal.MapboxNativeNavigatorImpl$refreshRoute$callback$1 in $r8$lambda$S3_XwtbgUvIeW_lYTzEbEd0UjcE
$1$$InternalSyntheticLambda$3$5ce679cfe3c67248c45a37495894744b0b268c2244e2f7d1b83487200930d94f$1 in invoke
com.mapbox.bindgen.Expected in fold at line 81
com.mapbox.navigation.navigator.internal.MapboxNativeNavigatorImpl$refreshRoute$callback$1 in invoke at line 248
com.mapbox.navigation.navigator.internal.MapboxNativeNavigatorImpl$refreshRoute$callback$1 in invoke at line 245
com.mapbox.navigation.navigator.internal.MapboxNativeNavigatorImpl$refreshRoute$2$1 in run at line 282
android.os.MessageQueue in nativePollOnce
android.os.MessageQueue in next at line 342
android.os.Looper in loopOnce at line 182
android.os.Looper in loop at line 357
android.app.ActivityThread in main at line 8194
java.lang.reflect.Method in invoke
com.android.internal.os.RuntimeInit$MethodAndArgsCaller in run at line 548
com.android.internal.os.ZygoteInit in main at line 957

Another option could be to use single thread for all the operations and remove synchronisation at all.

I can look at doing something like that but I can't change the methods in the caching class to suspend methods because upstream the caching class is used MapboxNavigation and would require changing some of those methods to suspend which would break SEMVER.

@cafesilencio cafesilencio force-pushed the sb-NAVAND-3208-sync-lru-cache branch from a840c74 to 286310c Compare July 9, 2024 20:03
@cafesilencio
Copy link
Contributor Author

I replaced the LRUCache with a ConcurrentHashMap and resize as necessary. Please take a look.

Copy link
Contributor

@Zayankovsky Zayankovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least it won't crash anymore, since ConcurrentHashMap doesn't do this kind of consistency checks

@cafesilencio cafesilencio merged commit ae64202 into main Jul 10, 2024
39 of 43 checks passed
@cafesilencio cafesilencio deleted the sb-NAVAND-3208-sync-lru-cache branch July 10, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip changelog Should not be added into version changelog.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants