Hey! Signal dev here. We've actually seen delayed notifications both before and after the switch to FCM. My guess is that newer Android versions are just becoming more and more aggressive with their battery optimizations.
We send high-priority FCM messages, but the device will still bundle them together in order to deliver them in batches. Even worse, there are times we are delivered an FCM message but network access will be invisibly restricted, further delaying our ability to retrieve the actual message content (for obvious reasons, we can't include the message content in the FCM message itself). We're continuing to try new things, but it'd be nice if there were some official guidelines on how to avoid these FCM pitfalls.
If you receive the "actual contents" separately from the GCM ping, you are probably being hit by Doze Mode. Doze is disabled for foreground Services, so your best bet is starting a foreground Service (via startForegroundService) _and_ taking a wake lock (from within the foreground Service, after your app is considered fully foreground).
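The ordering described above can be sketched in plain Java. On a device the roles would be played by `Service.startForeground`/`stopForeground` and `PowerManager.WakeLock`; here they sit behind hypothetical interfaces (`Foreground`, `WakeLockHandle`, `ContentFetcher`, `PushHandler`, none of which are Android APIs) so the sequence can be exercised off-device. The only point it illustrates is the order: go foreground, then take the wake lock, then touch the network, and undo both in `finally`. The 60-second timeout is an arbitrary illustrative value.

```java
// Hypothetical stand-ins so the ordering can run off-device. On Android these
// roles are played by Service.startForeground/stopForeground and
// PowerManager.WakeLock; these interfaces are NOT Android APIs.
interface WakeLockHandle {
    void acquire(long timeoutMs); // cf. PowerManager.WakeLock.acquire(long)
    void release();
}

interface Foreground {
    void enter(); // e.g. startForeground(id, notification) inside the service
    void exit();  // e.g. stopForeground(...) once the work is done
}

interface ContentFetcher {
    String fetch() throws Exception; // network request for the real message body
}

final class PushHandler {
    // Illustrative safety timeout so a stuck fetch cannot hold the CPU forever.
    static final long WAKE_LOCK_TIMEOUT_MS = 60_000;

    private final Foreground foreground;
    private final WakeLockHandle wakeLock;
    private final ContentFetcher fetcher;

    PushHandler(Foreground foreground, WakeLockHandle wakeLock, ContentFetcher fetcher) {
        this.foreground = foreground;
        this.wakeLock = wakeLock;
        this.fetcher = fetcher;
    }

    // Runs as soon as the content-free push arrives, not after a delay.
    String onPush() throws Exception {
        foreground.enter();                         // 1. go foreground first (exempts the app from Doze, per the advice above)
        try {
            wakeLock.acquire(WAKE_LOCK_TIMEOUT_MS); // 2. then keep the CPU awake
            try {
                return fetcher.fetch();             // 3. only now touch the network
            } finally {
                wakeLock.release();                 // always undone, even if the fetch fails
            }
        } finally {
            foreground.exit();
        }
    }
}
```

Newer Android versions restrict starting foreground services from the background (receiving a high-priority FCM message is one of the documented exemptions), so the real service code depends on the target API level.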
Historically, Android devices slept by entering a low-power CPU mode (sometimes complete with low-power radio and Wi-Fi modes). In that mode, all apps and the kernel are so heavily CPU-throttled that you can get network timeouts because the kernel's TCP stack can't send packets fast enough. This is what gets disabled when you take a wake lock.
Doze Mode throttles individual apps by moving them into a low-priority cgroup. In effect, the Linux kernel hardly ever schedules your process anymore. Doze Mode is not disabled by wake locks, only by starting a foreground Service.
Both Doze Mode and low-power CPU mode can coexist, leading to effectively 110% loss of CPU time by your process.
Interesting, so our current method is to do just what you said -- start a foreground service and acquire a wakelock. But we only do it if our network request takes longer than n seconds (to avoid always showing foreground notifications). We still hit the issue. Other Googlers have told me I may be running into a race condition between foreground services and Doze.
Thank you for the detailed info about how those modes work though -- it certainly de-mystifies the network problems!
> Interesting, so our current method is to do just what you said -- start a foreground service and acquire a wakelock. But we only do it if our network request takes longer than n seconds
This won't work. One of the less documented properties of Doze Mode is its ability to sever your network connections. It may already be in effect before you start downloading the message contents, and it can also kick in during the download. If you want reliable delivery, you have to take a wake lock and enter foreground mode immediately after getting the GCM push.
Look up what WakefulBroadcastReceiver is and why it used to be necessary. The class itself is deprecated (because implicit broadcasts are largely obsolete), but it shows how one can miss the opportunity to take a wake lock, causing the entire application to be caught in deep CPU sleep. Google promises that GCM will bring you out of Doze Mode, but I am not sure whether that also applies to wake locks. Your app may be sleepy because it failed to take a wake lock in time, causing it to miss the window when Doze is temporarily lifted by GCM.
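A rough plain-Java illustration of the handoff that `WakefulBroadcastReceiver` used to manage: its `startWakefulService` acquired a wake lock before starting the service, and `completeWakefulIntent` released it when the service was done. Here `CountingWakeLock` is a hypothetical stand-in for a reference-counted `PowerManager.WakeLock`, and an `ExecutorService` plays the part of the service; neither is the Android API itself. The point is that the lock is taken on the receiving side before the work is queued, so there is no window in which nothing holds it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted stand-in for a PowerManager.WakeLock
// (not the Android class); it just counts outstanding holders.
final class CountingWakeLock {
    private final AtomicInteger held = new AtomicInteger();

    void acquire() { held.incrementAndGet(); }
    void release() { held.decrementAndGet(); }
    int held() { return held.get(); }
}

// Sketch of the receiver-to-worker handoff that WakefulBroadcastReceiver
// wrapped: the lock is taken before the work is queued and released by the
// worker when it finishes.
final class WakefulHandoff {
    private final CountingWakeLock wakeLock;
    private final ExecutorService worker;

    WakefulHandoff(CountingWakeLock wakeLock, ExecutorService worker) {
        this.wakeLock = wakeLock;
        this.worker = worker;
    }

    // Receiver side: acquire synchronously, so nothing can fall asleep between
    // the push being delivered and the background work actually starting.
    Future<?> onReceive(Runnable work) {
        wakeLock.acquire(); // cf. startWakefulService(...)
        try {
            return worker.submit(() -> {
                try {
                    work.run();
                } finally {
                    wakeLock.release(); // cf. completeWakefulIntent(...)
                }
            });
        } catch (RuntimeException e) {
            wakeLock.release(); // queuing failed (e.g. executor shut down): don't leak the lock
            throw e;
        }
    }
}
```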
Thank you for your work on Signal. Despite my griping, Signal is still my primary messaging app and I love it.
>there are times we are delivered an FCM message but network access will be invisibly restricted, further delaying our ability to retrieve the actual message content
This is new... If this is happening regularly, maybe you should display a generic "message available" notification? I assume actually opening the app lifts the network restriction and everything works fine from there -- it'd be nice to know I have a message waiting in limbo.
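One way to express that suggestion is a deadline on the content fetch with a generic fallback. This is a sketch under assumptions: the `fetch` supplier stands in for whatever network call retrieves the message, and the class name, placeholder wording, and timeout are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class NotificationFallback {
    // Illustrative wording for the placeholder notification.
    static final String GENERIC = "New message available. Open the app to load it.";

    // Waits up to timeoutMs for the message preview. If the fetch is too slow
    // or fails (for example because network access is quietly restricted),
    // returns the generic text so the user still learns a message is waiting.
    static String notificationText(Supplier<String> fetch, long timeoutMs, Executor executor) {
        // The fetch is deliberately not cancelled on timeout, so it can still
        // finish later and the caller could then replace the placeholder.
        CompletableFuture<String> pending = CompletableFuture.supplyAsync(fetch, executor);
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return GENERIC;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return GENERIC;
        }
    }
}
```

Leaving the fetch running after the deadline is a design choice: the placeholder only covers the gap, and the real preview can still arrive once the restriction lifts.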