Mah! My Assumptions Were Invalid! Again!

Last Updated: 2023-09-08 08:00:00 -0500

I set an incredibly light and easy goal for myself going into the first month or two of fall 2023: fix the power management problem that is reducing battery life to about 10% of what my back-of-the-napkin math had me expecting it to be. The culprit seemed immediately obvious: clearly, we are writing to the display too often.

Problem Statement

Put simply, the problem was this: I expect my battery capacity to be a total of 2200 mAh. I also expect that my consumption is 0.30 mA. Quick maths? Something on the order of like 2 weeks battery time. Clearly, no actual math is done. The observed battery time? “I dunno, 8 hours or so.”

After grousing about this for a few minutes I pull out my multimeter, slap the probes across the + and - leads of the battery holder, and note a measurement for, roughly, 300 mA draw. Somehow in my head this registers as “about right” for the observed battery drain, and I formulate a plan: sometime in the next two weeks or so, go over the firmware looking for more efficient ways to use low power modes, because we don’t do any of that currently, apart from entering LPM to pause between game update ticks.

Keep an eye on how I said that measurement worked. It’ll be important later.

The Prime Suspect: The Sharp Memory LCD

I focused on the difference between my expected consumption and the observed consumption and tried to beeline to what I thought was a reasonable explanation. Clearly, the software I used to use to obtain that measurement was only reporting the observed consumption of the microcontroller itself, and the rest of the current draw was down to the LCD itself.

This actually tracks with later experiments to some degree (more on that later as well). The bulk of the device’s standby power consumption is dedicated to running the LCD. The Sharp Memory LCD that we’re using draws power in three ways:

The largest usage by far is running the display itself, which is relatively high load on standby and higher when being written to.
The display’s dedicated controller also draws power, all the time but ESPECIALLY when written to, and;
Traffic through the SPI bus is electrical current, so there is some small uptick through that SPI bus whenever we write. For practical purposes this is already accounted for in the “higher when being written to” component of the points above.

Sharp provides a primer on Programming Sharp Memory LCDs that covers this in detail, and somehow between a quick skim of their document and my own observations I convinced myself that the bulk of the issue was going to be found in two places in my own code:

A bug I have suspected for some time in DISPLAY_updatesOnly which is using PREVIOUS_FRAME somehow incorrectly and causing every line of the display to be refreshed on every call of the function, and;
The fact that we always and without fail write to the screen on EVERY game state update tick.
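For reference, the general idea behind an updates-only refresh can be sketched as below. This is a minimal illustration and not the actual body of DISPLAY_updatesOnly: the panel dimensions, the last_sent buffer, send_line, and update_changed_lines are all hypothetical names and values made up for the example, and the SPI write is replaced by a counter so the logic can be checked on a desktop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LCD_LINES      128u  /* assumed panel height, in lines */
#define LCD_LINE_BYTES 16u   /* assumed 128 pixels per line at 1 bit per pixel */

/* Hypothetical copy of what was last written to the panel. */
static uint8_t last_sent[LCD_LINES * LCD_LINE_BYTES];

/* Hypothetical stand-in for the SPI line write; here it only counts calls. */
static unsigned lines_sent;

static void send_line(size_t line, const uint8_t *pixels)
{
    (void)line;
    (void)pixels;
    lines_sent++;
}

/* Write only the lines that differ from the stored copy, then update the
 * stored copy for those lines. Returns how many lines were written. */
unsigned update_changed_lines(const uint8_t *frame)
{
    unsigned written = 0;
    for (size_t line = 0; line < LCD_LINES; line++) {
        const uint8_t *now = frame + line * LCD_LINE_BYTES;
        uint8_t *before = last_sent + line * LCD_LINE_BYTES;
        if (memcmp(now, before, LCD_LINE_BYTES) != 0) {
            send_line(line, now);
            memcpy(before, now, LCD_LINE_BYTES);
            written++;
        }
    }
    return written;
}
```

In a sketch like this, skipping the memcpy would make a line that changed once look changed on every later call. That is one way a “refreshes far more than it should” symptom can arise, though whether anything like it applies to the real function is pure guesswork.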

It was always the plan, just for somewhere further down the line, to introduce a mode where the display updates are skipped (and possibly even the display itself is powered down outright). Since everything I’ve said so far made it make sense to me that power management would be best improved with display code updates, I decided to call that change in early and make it part of the next firmware release.

Step 1: Establish a better baseline measurement

I wanted to attack this problem in tiny increments, in part because I am trying to get out of the habit with this project of introducing a major change all in one flash, and having to play whackabug with it for 3-5 times as long as it took to write the change, and in part because I wanted to be able to say I was confident that the change had the intended effect. My plans for the standby display modes were going to involve making changes to the core operating loop main() in main.c, which is a pretty big change to have to make just a few weeks after you’d listed the development kits for sale. I also don’t have a very accurate or precise way to measure the actual battery life of the device by experiment at the moment. Since we haven’t yet developed the parts of the firmware that allow state to persist through battery changes or power loss events, the best I can really do is slap some batteries in, note the time, and then check on the device throughout the day until I notice it’s dead.

However, it sounded reasonable to me that we’d see a change in current consumption under different regimes so I went ahead and threw my probes back across the battery terminals and got… 4 amps. After asking myself how it hadn’t caught fire yet - hey, more importantly, why isn’t holding these in place with my thumbs killing me - I came to the sudden remembrance of something I had learned in high school - you can’t take current measurements in parallel. The meter has to sit in series with the load; laid across the battery terminals, it is mostly just measuring what the battery can push through the meter itself.

In my defense, we’re talking about a topic I covered last maybe 15 years ago, in an area of knowledge I only started using with any kind of regularity about 12 years after that.

Armed with this sudden remembrance, I pull the REB off of the rest of the unit, use some leads from the expansion header and a bit of breadboard to hook up power connections where they’re supposed to go, socket in my multimeter, and take a fresh reading correctly. Result? 6.5 mA, with mild fluctuations.

Step 2: Challenge all previous assumptions.

Taking these measurements gave me my first and most immediate red flag, though it took a few repeats to see it. The very first scene the device enters is the Boot Splash scene. This scene does something really interesting - it is more or less a demo of what I was going to do for the lighter of two low-power screen modes:

1. Write the splash bitmap to the screen, then immediately write the version number on top of it.
2. Wait for 5 seconds with the gameplay loop running in the background, but make no actual changes to the display itself.

Since step 1 takes a second or two and then we have 5 seconds after that, what I should have expected to see during that scene was a curve in power consumption, where it’s high at first and then drops off. And to be fair… I did actually see that. But not nearly as dramatically as I would otherwise have expected. Writing to the screen or not makes a difference of no more than 0.1-0.3 mA. Not insignificant but… not the grand battery-saving difference I would have expected.

Immediately after noticing this, I had to do some thinking. In my test assembly, I use two 1100 mAh AA NiMH cells. These cells are part of my generic battery rotation for the entire lab. They live inside a device until they die, then they live in a box in my top desk drawer until at least 4 are dead or I’m out of unassigned “full” batteries, and then they get put in their cradle to charge in the wall. I apply no controls for making sure these batteries are regularly partially discharged, and once they’re charged they sit in another box for fresh batteries. Most of this story should have you cringing, as this isn’t really the “right” way to treat NiMH batteries. Storing them dead for indefinite periods of time is particularly damning. Worse: most of these cells are more than 10 years old. None of them are younger than three.

And here’s the real bastard of it: by even the laziest back of the napkin math, I should have been getting 100 hours or more of usage even if the LED and display were hot the whole time. Not 8.
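For the record, the napkin math in question is just capacity divided by draw. The sketch below assumes the 2200 mAh figure from the problem statement and treats the battery as ideal (no converter losses, no self-discharge, no tired old cells); the function names are made up for this example.

```c
/* Ideal runtime in hours: capacity (mAh) divided by average draw (mA).
 * Ignores converter losses, self-discharge, and cell ageing. */
double battery_life_hours(double capacity_mah, double draw_ma)
{
    if (draw_ma <= 0.0) {
        return 0.0; /* no meaningful answer for a non-positive draw */
    }
    return capacity_mah / draw_ma;
}

/* The reverse question: what average draw (mA) would empty the
 * battery in the given number of hours? */
double implied_draw_ma(double capacity_mah, double hours)
{
    if (hours <= 0.0) {
        return 0.0; /* no meaningful answer for a non-positive runtime */
    }
    return capacity_mah / hours;
}
```

Taking the measured 6.5 mA plus the 14 mA single-LED figure quoted later in this post, battery_life_hours(2200.0, 20.5) comes out at a little over 107 hours. Going the other way, implied_draw_ma(2200.0, 8.0) is 275 mA, which is nowhere near anything the corrected measurement showed.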

Step 3: Test New Conclusions

Since the observed power draw and expected battery capacity no longer lined up, I needed a way to test if the battery capacity I was getting was anywhere near the battery capacity I expected to get. In a way this was sort of a formality - the writing was very much on the wall. But, regardless, I figured it would be best to validate my new conclusions.

I took a couple of fresh Alkaline batteries and threw them into PETI, then left it to bake overnight. It happily passed that time, and from a quick measurement and comparison with some voltage curves I’d say we’re back on track for it to have taken over 100 hours to discharge the battery this way.

That figure, 100 hours, is actually pretty low still. Remember, I’m trying to build something not unlike the original generation or two of Tamagotchi toys. Those ran for absolute ages off of single CR2032 cells. I don’t think we’ll ever drive power consumption quite as low as a real Tamagotchi, but it can certainly go lower.

Step 4: Re-Build the Plan

I still want to lower power consumption. Yes, it’s no longer an emergency. But some of these power savings are very low hanging fruit.

I intend to implement the following in version 0.4.0 now:

Change the LED alert behaviour to be an intermittent blink with a relatively low duty cycle. Each of the two LED circuits as assembled would draw 14 mA when switched on. This 14 mA figure combined with the ~7 mA standby draw of the rest of the damn thing is what was causing the estimated life to be around 100 hours. If the LED is lit for 1 second out of every 2, the average draw drops to about 14 mA and battery life goes to roughly 150 hours (or more, since neither LED is expected to be active for the whole time the device is just sitting out). Increase the amount of time during an alert condition where the LED is off, and you just keep increasing battery life.
Detect a standby mode condition, and turn off the display. If PETI isn’t handled in some arbitrarily large interval of time - say, I don’t know, 15 minutes, an hour, etc - simply switch off the display and stop writing to it. Then, next time a button is pressed, exit that condition and redraw the screen. Yes, it’s not how Bandai did it, but to be extremely honest, Bandai had much less computer or display to deal with. PETI sitting around with its screen off could shoot battery life through the roof, and the player can still be alerted of the pet’s need for attention by blinking the LEDs and sounding the buzzer.
Implement game state saves, where the game state information is periodically passed into FRAM (or possibly lives in FRAM), and give the player the choice on a brown-out reset (BOR) or when first powering on to reload previous state. This makes the game resistant to battery changes which may or may not be a serious concern in the final version of the game.
Implement the battery low signal. The TPS module that is doing our battery power regulation supports an alarm condition when the input voltage hits some low value - I think I calibrated it in hardware to 1.2V based on what I read about the NiMH discharge curves. When it reaches that threshold it pulls a pin high, which we have wired up so that in theory, PETI could detect it. I want to add support to detect that condition and display an on-screen and LED alert to the player that the battery is getting down low.
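The duty cycle arithmetic behind the LED item can be sketched like this. It assumes a 2200 mAh ideal battery, the ~7 mA standby draw, and one 14 mA LED circuit blinking; the function names are hypothetical and nothing here is firmware code.

```c
/* Average draw in mA with one LED blinking at the given duty cycle,
 * where duty is the fraction of time the LED is lit (0.0 to 1.0). */
double average_draw_ma(double standby_ma, double led_ma, double duty)
{
    if (duty < 0.0) {
        duty = 0.0;
    }
    if (duty > 1.0) {
        duty = 1.0;
    }
    return standby_ma + led_ma * duty;
}

/* Ideal runtime in hours at that average draw. */
double hours_at_duty(double capacity_mah, double standby_ma,
                     double led_ma, double duty)
{
    double draw_ma = average_draw_ma(standby_ma, led_ma, duty);
    if (draw_ma <= 0.0) {
        return 0.0; /* nothing sensible to report without a draw */
    }
    return capacity_mah / draw_ma;
}
```

With those numbers, an always-on LED (duty 1.0) gives about 105 hours, a 50% duty cycle gives about 157 hours, and a 10% duty cycle gives about 262 hours. The gain tapers off because the standby draw is not reduced along with the LED, which is part of why the display standby mode is on the list as well.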

Since some Pet evolution paths have relatively long intended lifespans (on the order of a month) and at least one of the puzzle pathways unlocks a potentially infinite lifespan (until the RTC rolls over sometime in the fifth millennium, anyway), we need a way to do battery changes. And an update focused around improvements to power handling seems like the right time to do it.
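I don’t yet know how the saves will actually be implemented, but one common shape for this kind of thing is a small record with a marker value and a checksum, so that on power-up the firmware can tell a real save from whatever happened to be in memory. The sketch below is only that general idea: the record fields, SAVE_MAGIC, and the function names are all invented for the example, the checksum is a plain additive sum rather than anything robust, and how the record gets into FRAM is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker chosen for this sketch (the bytes spell "PETI"). */
#define SAVE_MAGIC 0x50455449u

/* Hypothetical save record; the real game state has its own fields. */
typedef struct {
    uint32_t magic;
    uint16_t age_days;
    uint8_t  hunger;
    uint8_t  mood;
    uint16_t checksum;
} save_record_t;

/* Simple additive checksum over every byte before the checksum field. */
static uint16_t save_checksum(const save_record_t *rec)
{
    const uint8_t *bytes = (const uint8_t *)rec;
    uint16_t sum = 0;
    for (size_t i = 0; i < offsetof(save_record_t, checksum); i++) {
        sum = (uint16_t)(sum + bytes[i]);
    }
    return sum;
}

/* Stamp the record as a valid save just before it is stored. */
void save_seal(save_record_t *rec)
{
    rec->magic = SAVE_MAGIC;
    rec->checksum = save_checksum(rec);
}

/* Nonzero only if the record carries the marker and a matching checksum. */
int save_is_valid(const save_record_t *rec)
{
    return rec->magic == SAVE_MAGIC && rec->checksum == save_checksum(rec);
}
```

On a reset, something like save_is_valid would decide whether offering “reload previous state” makes sense at all; a blank or half-written record fails the check and the game starts fresh.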

In an ideal world, I want to finish this all before Halloween. None of it is an especially heavy lift (not compared to things like the display refactor), and I am trying to make the labs more of a focus in my day-to-day life. Solving the game state saves problem also solves one of the final two “mysterious” features for me; the features where I know the thing I want is possible-in-principle but have no idea how to implement them.

The other one is getting support working for the expansion bus proper, which… is much more complex.

If you wanted to show your support financially for Arcana Labs projects like PETI, but don’t need a virtual pet development kit, your best avenue is via my GitHub Sponsors account or by making a one-time donation to Arcana Labs via Ko-Fi.com or through other avenues detailed here. Supporters also get access to a special patrons-only section of the Arcana Labs Discord Server, and new bonuses are soon to be introduced on the GitHub side!