Flash e-bike Part 4 (Battery Management System)
In part 1 of the series, I introduced the background of this particular exercise, and why we're embarking on it. In part 2, I disassembled the bike and identified the major pieces of its systems. In part 3 we looked at the mainboard, and figured out what its primary components were, and how they worked.
In this part, we're looking closely at the battery management system (hereafter referred to as "BMS"). I'm not a power system engineer, so we're not going to deep-dive into its electrical systems. Indeed, most of that doesn't matter for our purposes. I am interested in the digital protocol that we can use to communicate with it. There's a good chance that we're going to want to understand how this works so we can use its data.
Writing this post was a struggle because writing is a necessarily linear medium. I need a start point, intermediate steps, and a conclusion. But, the way I make sense of an undocumented stream of binary is definitely not linear. I'll loop back, try something, learn more, begin again, etc. To help present it fairly linearly, I'll use the OSI model as a framework. I realize it's a bit of an antiquated model, but it'll still be useful in this case.
Hardware analysis
The BMS connection to the mainboard is four conductors. The keenest readers among you may have noticed that I skipped a small section and connector in the mainboard post. The connector is on the left margin, hiding under the glare from my work light. It was easy to find the ground connection, identify that two of the four pins go to the UART chip, and that the fourth wire is connected to 3.3 volts. What I wasn't sure about was which direction the data and power flowed on each wire.
I decided the best way forward was to devise a test where we evaluate:
- Can the pack be charged
- Does the charger for my Rad Wagon work, and is it appropriate
- Can the pack hold a charge
- What are the physical layer qualities of the protocol between the BMS and the mainboard
To answer these questions, I connected all the components of the bike on my bench. The only omission is the LCD module, because I hadn't removed it from the frame yet. I also configured my oscilloscope to serve as a data logger, so I'd have access to trends over the course of several hours. I used one channel to monitor pack voltage, and another channel was connected to my DC-capable current clamp. You'll notice that the source and return wires for charging go through the clamp, and in opposite directions. This is to increase sensitivity and help reject common-mode noise.
The battery pack was deeply discharged, and I was pretty worried about safety while charging it. My plan was to charge it up to 3.6 volts (all voltages are per cell), and let it sit overnight. Then, check the voltage in the morning. The charger from the Rad Wagon only supplied 1.2 mA while the pack was less than 3.5 volts, and once it got past 3.5 it was charging at closer to 750 mA. At 3.6 volts, I unplugged the charger and let it sit overnight.
The next day, the pack had only lost 0.1 V per cell, so I was pretty happy with that. I re-connected the charger and let it get to 4.0 V. The next day, I went up to 4.2 V. This was a resounding success. The battery only "lost" a small amount of voltage after leaving it alone, and I suspect this is completely natural.
In my opinion, we've answered questions 1-3 in the affirmative.
During these tests, I connected my favorite hardware tool (Saleae Logic Pro 16) to the data pins between the mainboard and the BMS to capture some traces of the mainboard communicating with the BMS. These traces will provide us the data we need to answer question 4. It wasn't difficult to identify that UART serial at 9600 baud was used for the physical layer.
Datalink analysis
Now that we know the physical interface is UART, it should be straightforward to identify the framing of the data. But, for this I have to introduce my ✨favorite✨ software tool. Since I was trained as a computer scientist, you might be expecting me to say Python, or some fancy hex editor, or Wireshark or something. Nope. It's a spreadsheet. Take that CSV from Logic, and open it in Numbers...
I had already noticed that all the packets had the same size, so it was pretty easy to write a formula to convert the data from a single column of hex numbers into frames (one frame per row). Then you can highlight a column and look at the summary to see the range of values. Byte 1 of every frame is 0x3A, which is a printable ":" character, and the last two bytes are 0x0D and 0x0A, which are CR and LF respectively. Interestingly, the bytes between these ASCII characters are strictly binary (not ASCII printable, except for coincidences).
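The spreadsheet step above can also be sketched in a few lines of Python. This is only an illustration, not the formula used in the post: `split_frames` and its `frame_len` parameter are names made up here, and it assumes the capture has already been separated by direction so that every frame has the same length. The two example frames are mainboard requests taken from the command table later in this post.

```python
from typing import List

START = 0x3A               # ':' opens every frame
END = bytes([0x0D, 0x0A])  # CR LF closes every frame


def split_frames(stream: bytes, frame_len: int) -> List[bytes]:
    """Cut a byte stream into fixed-size frames and check the delimiters."""
    if len(stream) % frame_len != 0:
        raise ValueError("stream length is not a multiple of the frame length")
    frames = []
    for offset in range(0, len(stream), frame_len):
        frame = stream[offset:offset + frame_len]
        # The payload is binary, so only the first and last bytes are checked.
        if frame[0] != START or frame[-2:] != END:
            raise ValueError(f"bad framing at offset {offset}")
        frames.append(frame)
    return frames


# Two 9-byte request frames back to back, as one direction of a capture.
capture = bytes([58, 22, 8, 1, 11, 42, 0, 13, 10,
                 58, 22, 9, 1, 11, 43, 0, 13, 10])
frames = split_frames(capture, 9)
```

Because the payload bytes can coincidentally equal 0x3A, 0x0D, or 0x0A, this sketch relies on the fixed frame length rather than searching for delimiters.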
Network analysis
With the bottom two layers of the model figured out, it's time to figure out the network layer. It might seem silly to talk about the network layer with what appears to be point-to-point serial, but there are some hints that it might be possible to use these BMSs in a multi-drop scenario.
Before we get there, I should share some details I noticed once I disconnected the mainboard and started trying to work with the BMS over a USB UART adapter. (Yes, this is one of those non-linear parts of the story)
Let's compare the oscilloscope traces with the BMS connected to the mainboard:
and with it connected to a USB-UART adapter:
When the BMS wasn't connected to the mainboard, it started acting quite differently. Remember that I mentioned the extra wire that appeared to be connected to power. It turns out that the BMS has to be powered via this wire for the serial interface to work. That was simple enough, but if you look carefully at the top screen capture, you might notice that the signal coming from the BMS looks a little different than the one going to it. The leading edges look odd. Let's zoom in to see it closer:
The serial signal from the BMS has the look of an open-collector output. That is to say that there's a pull-up pulling the signal high, and a low-side transistor shorting that to ground. This is common when you want to parallel multiple devices on the same wire, and rely on the fact that the voltage will fall if any of the bus members pull it low. This is exactly how I2C works, for example.
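A toy model of that wired-AND behaviour, purely for illustration. The function name `bus_level` is made up here, and nothing about the BMS's actual circuit is assumed beyond the pull-up and low-side transistors described above.

```python
from typing import Iterable


def bus_level(pulling_low: Iterable[bool]) -> int:
    """Return the level of a shared open-collector line.

    Each entry says whether one device's low-side transistor is conducting.
    The pull-up keeps the line high (1) unless at least one device pulls it low (0).
    """
    return 0 if any(pulling_low) else 1


idle = bus_level([False, False, False])       # nobody is talking
one_talker = bus_level([False, True, False])  # a single device drives a 0
```

Any single device can force the line low without fighting the others, which is why the scheme works for several devices sharing one wire.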
I did a little more digging into the BMS board itself. The pieces fall into place when we notice that there are opto-isolators on each signal on the serial interface (visible in the top-right corner of the board in the photo at the top of the post). The opto-isolators would be vital if you ever needed to put 2 or more of these packs in series. I wonder how the ground management would work; I haven't tried to see if there's continuity between the negative terminal of the battery and the ground of the serial interface. This section is enlarged below.
I haven't tried to reverse-engineer this part of the board, because batteries and BMSs attached to batteries are impossible to de-energize, and I don't feel like breaking stuff today.
There was an interesting effect when I wasn't supplying power. The signal from the BMS has a low-amplitude copy of the signal being sent to it. I suspect what's happening here is that some power is leaking from the input signal and lifting the output. When the input goes low, that supply falls as well. The below image shows a zoomed-in portion of the second oscilloscope capture, when the BMS wasn't powered.
Given that I believe that the serial hardware supports multi-drop systems, and is opto-isolated, is there any support for this in the bytes of the serial protocol?
I think the answer is "Yes... Maybe?"
Let's look at the only commands that the mainboard ever sends to the BMS (these values are all in decimal):
byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 |
---|---|---|---|---|---|---|---|---|
58 | 22 | 08 | 01 | 11 | 42 | 00 | 13 | 10 |
58 | 22 | 09 | 01 | 11 | 43 | 00 | 13 | 10 |
58 | 22 | 10 | 01 | 11 | 44 | 00 | 13 | 10 |
58 | 22 | 13 | 01 | 11 | 47 | 00 | 13 | 10 |
58 | 22 | 15 | 01 | 11 | 49 | 00 | 13 | 10 |
58 | 22 | 16 | 01 | 11 | 50 | 00 | 13 | 10 |
58 | 22 | 22 | 01 | 11 | 56 | 00 | 13 | 10 |
58 | 22 | 23 | 01 | 11 | 57 | 00 | 13 | 10 |
These packets aren't in the order the mainboard sends them. I've de-duplicated the packets, and sorted them by byte 3. The mainboard sends the packet with 22 in byte 3 much more often than any other. But what is byte 2? If I were the designer of the protocol, I may very well have byte 2 indicate the address of the endpoint/BMS.
We might as well start keeping track of how far we are in understanding all these bytes. Let's have a scoreboard for the packets to and from the BMS. We'll update it as we work through the post. For now, I'm going to say that we maybe understand byte 2 and mark it with ⚠️. Also, let's assume that the framing bytes are good and mark them ✅ for "known".
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | N/A |
BMS | ✅ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
There is another kind of address present in the protocol, and that's for requestor/responder. If we compare the contents of a few request/response packets in succession, and focus on byte 4, we can see that the request always has 01, and the response has 02.
Direction (BMS) | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | Byte 9 | Byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
to | 58 | 22 | 13 | 01 | 11 | 47 | 00 | 13 | 10 | |
from | 58 | 22 | 13 | 02 | 00 | 00 | 37 | 00 | 13 | 10 |
to | 58 | 22 | 16 | 01 | 11 | 50 | 00 | 13 | 10 | |
from | 58 | 22 | 16 | 02 | 149 | 44 | 233 | 00 | 13 | 10 |
to | 58 | 22 | 22 | 01 | 11 | 56 | 00 | 13 | 10 | |
from | 58 | 22 | 22 | 02 | 208 | 10 | 08 | 01 | 13 | 10 |
I'm not certain, but I'm convinced enough to call byte 4 solved, and mark it on the scoreboard.
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | N/A |
BMS | ✅ | ⚠️ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
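What we know about the front of the packet so far can be written down as a small parser. This is a sketch under the post's working assumptions: byte 2 is treated as a possible address, byte 4 as the request/response marker, and byte 3 is carried along without interpretation. The names `Header`, `parse_header`, and `DIRECTIONS` are made up for this example.

```python
from typing import NamedTuple

# Byte 4 values seen in the capture; the labels are this post's interpretation.
DIRECTIONS = {0x01: "request", 0x02: "response"}


class Header(NamedTuple):
    address: int    # byte 2: maybe an endpoint address (always 22 so far)
    byte3: int      # byte 3: not interpreted yet at this point
    direction: str  # byte 4: "request", "response", or "unknown"


def parse_header(frame: bytes) -> Header:
    """Pull bytes 2 to 4 (counting from 1) out of a frame."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a header")
    return Header(frame[1], frame[2], DIRECTIONS.get(frame[3], "unknown"))


# One request/response pair from the table above.
request = bytes([58, 22, 13, 1, 11, 47, 0, 13, 10])
response = bytes([58, 22, 13, 2, 0, 0, 37, 0, 13, 10])
```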
I think, for the moment, we can move on from the Network layer of the OSI model. We'll have to assume byte 2 is some kind of address until we know more. If we have an example of a BMS with something other than 22 in byte 2, we'll know for sure.
Application analysis
Now we have to skip a few levels in the model. There are no segments or sessions to speak of, and we don't actually know enough to reverse engineer the data values. I propose that it makes the most sense to skip all the way to the top layer of the stack because byte 3 looks suspiciously like a register index. When the mainboard sends 13 to the BMS, the response also includes 13 in byte 3.
If you simply sort by time, then each byte position of the packets is chaotic:
But, if you sort by time, then byte 4, then byte 3, you'll see smooth value changes when graphed.
This is a subtle clue that there is something in common between packets that share the same value in byte 3.
That's enough for me to conclude that byte 3 is some kind of register index, and check it off in the scoreboard.
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | N/A |
BMS | ✅ | ⚠️ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
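The sorting trick can be sketched outside the spreadsheet too. This is a minimal illustration, assuming frames are already split and kept in capture order; `group_frames` is a name made up here. Grouping by byte 4 and then byte 3 gives one time series per direction and register, which is what makes the graphs smooth.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_frames(frames: List[bytes]) -> Dict[Tuple[int, int], List[bytes]]:
    """Group frames by (byte 4, byte 3), keeping capture order inside each group."""
    groups: Dict[Tuple[int, int], List[bytes]] = defaultdict(list)
    for frame in frames:
        groups[(frame[3], frame[2])].append(frame)
    return dict(groups)


# The request/response pairs from the byte 4 table earlier, in capture order.
capture = [
    bytes([58, 22, 13, 1, 11, 47, 0, 13, 10]),
    bytes([58, 22, 13, 2, 0, 0, 37, 0, 13, 10]),
    bytes([58, 22, 16, 1, 11, 50, 0, 13, 10]),
    bytes([58, 22, 16, 2, 149, 44, 233, 0, 13, 10]),
    bytes([58, 22, 22, 1, 11, 56, 0, 13, 10]),
    bytes([58, 22, 22, 2, 208, 10, 8, 1, 13, 10]),
]
groups = group_frames(capture)
```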
Presentation analysis
Now that we're pretty sure that we can select individual logical fields that the BMS provides to the user, it's time to understand how that data is presented. In this case, the data we captured contained a lucky event. We had a value grow to the point that it crossed an important threshold: it overflowed a byte.
It's possible to see it in the second graph if you look closely, but for convenience, I've selected and copied just bytes 5 through 8 of when register 9 overflows a byte value.
Byte 5 | Byte 6 | Byte 7 | Byte 8 | Total |
---|---|---|---|---|
254 | 134 | 165 | 01 | 34558 |
255 | 134 | 166 | 01 | 34559 |
00 | 135 | 168 | 00 | 34560 |
254 | 134 | 165 | 01 | 34558 |
04 | 135 | 172 | 00 | 34564 |
03 | 135 | 171 | 00 | 34563 |
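Now we have all the information we need to conclude that bytes 5 and 6 are related, and to tell whether they're little endian or big endian! When byte 5 wraps around from 255 to 0, byte 6 steps from 134 to 135, so byte 5 is the low byte and byte 6 is the high byte: the pair is little endian. So, if we combine them (byte 5 + 256 × byte 6) we get the integer value that I've included in the "Total" column. Those look an awful lot like a battery pack voltage (in millivolts, naturally)! 🎉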
Let's update the scoreboard!!
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | N/A |
BMS | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
Skipping ahead, I'll point out here that I discovered later that bytes 5 and 6 are not just a 16-bit integer, but they're also a two's-complement signed number.
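Here is a small sketch of both readings of bytes 5 and 6, using Python's built-in `int.from_bytes`. The helper names `decode_u16` and `decode_i16` are made up for this example. One caveat worth flagging: the totals in the table above are larger than 32767, so they only come out as shown with the unsigned reading. Which registers really need the signed reading is not established here; presumably it matters for values that can go negative.

```python
def decode_u16(byte5: int, byte6: int) -> int:
    """Combine bytes 5 and 6 as an unsigned little-endian 16-bit value."""
    return int.from_bytes(bytes([byte5, byte6]), "little", signed=False)


def decode_i16(byte5: int, byte6: int) -> int:
    """Combine bytes 5 and 6 as a two's-complement little-endian 16-bit value."""
    return int.from_bytes(bytes([byte5, byte6]), "little", signed=True)


before_rollover = decode_u16(255, 134)  # 34559, matches the Total column
after_rollover = decode_u16(0, 135)     # 34560, matches the Total column
minus_one = decode_i16(255, 255)        # -1 in two's complement
```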
Wrapping up the packet format
Ok, we're getting pretty close to cracking this thing wide open. But, what are bytes 7/8 from the BMS and 6/7 from the mainboard? And, why are they misaligned like that? Because we're kinda "working from the back forward" of the packet now, I'm going to move the "N/A" from byte 10 and put it in byte 5 of the mainboard to BMS scorecard.
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ✅ | ✅ | N/A | ❌ | ❌ | ❌ | ✅ | ✅ |
BMS | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
When considering byte 7 (in the newly unified numbering), there's some interesting symmetry relative to byte 5. Let's zoom way into the sorted graph, and see how they're related.
It's immediately apparent that these two bytes seem to be growing and changing at exactly the same rates; like they're completely redundant. And, indeed, they are. It's my theory that byte 7 is a check byte, checksum, or simplistic error detection mechanism. The trick, though, is figuring out exactly how it's computed. Luckily, the first thing I tried worked.
Essentially, the idea is to figure out which of the bytes that make up the packet are included in the computation of the check byte. My initial expectation is that the check byte is simply the sum of several bytes in the packet modulo 256 (truncated to a single byte):
\( C = \left( \sum_{n=1}^{6} B(n) \right) \text{ mod } 256 \)
\(C\) is the value of the check byte, \(B(n)\) is the byte at position \(n\), and as a first guess we sum byte 1 through byte 6. The only way to describe this process is really "play around with it for a while until the math works". With that complete, we know that the start byte is left out of the sum, and byte 7 is exactly equal to:
\( C = \left( \sum_{n=2}^{6} B(n) \right) \text{ mod } 256 \)
Now, we know how the checksum is computed, and we can start writing our own commands to the BMS!
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ✅ | ✅ | N/A | ❌ | ✅ | ❌ | ✅ | ✅ |
BMS | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
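The check byte formula is easy to test against the response packets shown earlier in this post. A minimal sketch, with `response_checksum` as a made-up name; it implements the second formula (bytes 2 through 6, modulo 256) and nothing more.

```python
def response_checksum(frame: bytes) -> int:
    """Sum bytes 2..6 (counting from 1) of a BMS response, truncated to one byte."""
    return sum(frame[1:6]) % 256


# Responses copied from the request/response table earlier in the post.
responses = [
    bytes([58, 22, 13, 2, 0, 0, 37, 0, 13, 10]),
    bytes([58, 22, 16, 2, 149, 44, 233, 0, 13, 10]),
    bytes([58, 22, 22, 2, 208, 10, 8, 1, 13, 10]),
]

# Byte 7 (index 6) should equal the computed checksum for every frame.
all_match = all(response_checksum(f) == f[6] for f in responses)
```

The third frame is the interesting one: its bytes sum to 264, and 264 modulo 256 is the 8 that appears in byte 7.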
But, wait, can we? What's up with byte 5 from the mainboard (byte 6 in the scoreboard) and byte 8 in the scoreboard?
If we interrogate the captured data from the mainboard to the BMS, byte 5 is always 11, and byte 8 in the scoreboard (byte 7 on the wire) is always 0. So, I guess we can just shrug and call them "good enough" for now. I'm going to mark them with the ⚠️ symbol and move on.
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ✅ | ✅ | N/A | ⚠️ | ✅ | ⚠️ | ✅ | ✅ |
BMS | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
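With those pieces, a request packet can be assembled from a register number. This is a sketch, not the author's program: `build_request` is a made-up name, the constant 11 and the trailing 0 are copied blindly from the capture because their meaning is unknown, and the check byte is assumed to be the sum of the four bytes before it modulo 256, which matches every request in the command table.

```python
def build_request(register: int, address: int = 22) -> bytes:
    """Build a 9-byte mainboard-style request for one register."""
    body = [address, register, 0x01, 11]  # address?, register, "request", unknown constant
    check = sum(body) % 256               # assumed checksum over the four bytes above
    return bytes([0x3A, *body, check, 0x00, 0x0D, 0x0A])


# Check bytes observed in the capture, keyed by register number.
observed = {8: 42, 9: 43, 10: 44, 13: 47, 15: 49, 16: 50, 22: 56, 23: 57}
reproduces_capture = all(
    build_request(reg) == bytes([58, 22, reg, 1, 11, chk, 0, 13, 10])
    for reg, chk in observed.items()
)
```

None of the captured requests has a sum above 255, so how the real mainboard would fill the byte after the checksum in that case is unknown.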
The only thing left that's bugging me is byte 8 of the packet from the BMS to the mainboard. It's not always 0, like it is in the other direction. It's set to 1 about 15% of the time. I looked at the data for quite some time to identify the conditions for which it's set to 1, and the best I came up with is:
"Byte 8 is set to 1 exactly when the value of byte 5 is greater than the value of byte 7"
It's pretty bizarre, and I've never seen anything like it, but it works for 100% of the packets I've seen. Good enough for me; let's check it off the list! (addendum: After writing a program to implement this protocol, I did end up seeing examples where the byte 7 checksum didn't match this theory. I haven't investigated further.)
source | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7 | byte 8 | byte 9 | byte 10 |
---|---|---|---|---|---|---|---|---|---|---|
mainboard | ✅ | ⚠️ | ✅ | ✅ | N/A | ⚠️ | ✅ | ⚠️ | ✅ | ✅ |
BMS | ✅ | ⚠️ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
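The byte 8 rule can be checked mechanically against the response packets quoted in this post. The sketch below also tries one alternative reading that is only a guess and is not confirmed anywhere in the capture: that byte 8 is the carry out of the byte 7 sum, which would make bytes 7 and 8 together a 16-bit little-endian checksum. Both function names are made up, and the two register 9 frames are reconstructed from the overflow table by assuming byte 2 is 22 and byte 4 is 2.

```python
RESPONSES = [
    bytes([58, 22, 13, 2, 0, 0, 37, 0, 13, 10]),
    bytes([58, 22, 16, 2, 149, 44, 233, 0, 13, 10]),
    bytes([58, 22, 22, 2, 208, 10, 8, 1, 13, 10]),
    bytes([58, 22, 9, 2, 254, 134, 165, 1, 13, 10]),  # reconstructed frame
    bytes([58, 22, 9, 2, 0, 135, 168, 0, 13, 10]),    # reconstructed frame
]


def byte8_by_comparison(frame: bytes) -> int:
    """The rule from the post: 1 exactly when byte 5 is greater than byte 7."""
    return 1 if frame[4] > frame[6] else 0


def byte8_by_carry(frame: bytes) -> int:
    """Alternative guess: the high byte of the sum of bytes 2..6."""
    return (sum(frame[1:6]) >> 8) & 0xFF


comparison_ok = all(byte8_by_comparison(f) == f[7] for f in RESPONSES)
carry_ok = all(byte8_by_carry(f) == f[7] for f in RESPONSES)
```

Both readings agree with every frame listed here, so these samples cannot tell them apart. They would disagree on a frame where the sum overflows but byte 5 is small, which might be worth looking for in a longer capture.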
Application analysis
Now that we understand the packet format well enough that we can author our own request packets, it's time to start understanding what these registers mean. We have a list of candidates for important registers, because we can assume that the mainboard is requesting relevant values. But what do they mean? And, are there more that might have interesting data?
To help answer the first question, we return again to the log we captured while the mainboard was connected, and the battery was charging. If we chart different values together, we might be able to infer the meaning of some of the registers.
The graphs above are the most interesting that I've found from the capture. At this point, we have no idea what any of them refer to, but we can make some guesses. Because register 9 looks the most like pack voltage, I just assume that's what it is. Similarly, register 10 appears to be the charging current register. Later on, I checked the value of register 10 and it appeared to have gone negative, which could imply that it's pack current (both positive and negative).
It's clearly time to write a quick program to log this data directly, rather than trying to capture it via the mainboard communications. This is also the only way we're going to explore the full complement of registers the BMS will even respond to. So, I threw something together quickly in Rust, and uploaded it to GitLab.
Using the bms logger, I captured some data as a charge was wrapping up. I've selected the few interesting registers and provided the snippet of the end of charge below:
unk. 1 | Pack V | Pack I | SoC? | SoC? | unk. 2 | unk. 2 binary |
---|---|---|---|---|---|---|
2.991 | 42.277 | 1.979 | 92 | 91 | 128 | 0000000010000000 |
2.991 | 42.292 | 1.979 | 92 | 91 | 128 | 0000000010000000 |
2.991 | 42.303 | 1.98 | 93 | 98 | 16608 | 0100000011100000 |
2.989 | 41.878 | 0 | 100 | 98 | 16608 | 0100000011100000 |
2.989 | 41.834 | 0 | 100 | 98 | 16608 | 0100000011100000 |
2.991 | 41.801 | 0 | 100 | 98 | 16608 | 0100000011100000 |
There are two registers that have variation, but for which we don't have a theory about their meaning. In particular, unk. 1 seems to always hover around 2990, or 29.9, or 2.99, etc. It could be 29℃, or 2.9 volts, or something else??? Unclear.
On the other hand, unk. 2 appears to be some kind of enum or bit field. Before the battery is done charging it's 128, which is a hint that it might be a bit field with bit 7 set. After the charge completes, it becomes 16608, which keeps bit 7 but adds bits 5, 6, and 14. Much later on, when the battery has discharged itself some, this register went to 192, which is 0000000011000000. Bits 5 and 14 are cleared, but 6 and 7 remain set. Very interesting.
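Reading the bit positions out of those values by eye is error-prone, so here is a tiny helper that does it. `set_bits` is a name made up for this example, bit 0 is the least significant bit, and nothing is assumed about what any of the bits mean.

```python
from typing import List


def set_bits(value: int, width: int = 16) -> List[int]:
    """List the positions of the bits that are 1, counting from bit 0 (LSB)."""
    return [bit for bit in range(width) if (value >> bit) & 1]


charging = set_bits(128)        # [7]
charge_done = set_bits(16608)   # [5, 6, 7, 14]
later = set_bits(192)           # [6, 7]
```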
Our hypotheses about registers 9 and 10 seem to be further reinforced. Once the pack voltage crosses 42 volts (4.2 volts per cell) charge current goes to zero. Also, another unknown register suddenly snaps to 100. That's likely state of charge! There is another register that closely tracks this state of charge register, but it's always a little (1 or 2 points) lower.
There's another thing we can do now that we have a program that can interact with the BMS protocol. We can scan the possibilities of register addresses, and see what it responds to. I ended up scanning to 50 (arbitrarily), and it stopped responding at 27. There's a possibility that there are more valid registers above 50, but I haven't checked.
Of the 26 that had a response (zero wasn't responded to), only a handful had any values that changed. We already knew about the vast majority of those, because the mainboard was querying them. There were a few static registers that had interesting contents, however. Examples of those are:
- register 19 always returns 3375; a safe per-cell low-voltage threshold (in millivolts)
- register 21 always returns 4200; full-charge per-cell voltage (in millivolts)
- register 25 always returns 3600; nominal cell voltage for Li-Ion (in millivolts)
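The register scan described above can be sketched as a loop over register numbers. This is not the Rust logger from the post. The serial port is replaced by a `transact` callable so the example runs without hardware, and `fake_bms` is a stand-in invented here that only answers the three static registers listed above. All names are hypothetical.

```python
from typing import Callable, Dict, Optional

Transact = Callable[[bytes], Optional[bytes]]  # send a request, get a reply or None


def build_request(register: int, address: int = 22) -> bytes:
    """Request frame as seen in the capture (11 and the trailing 0 are unexplained)."""
    body = [address, register, 0x01, 11]
    return bytes([0x3A, *body, sum(body) % 256, 0x00, 0x0D, 0x0A])


def scan_registers(transact: Transact, first: int = 0, last: int = 50) -> Dict[int, int]:
    """Query each register once and keep the raw unsigned value of each reply."""
    found = {}
    for register in range(first, last + 1):
        reply = transact(build_request(register))
        if reply is None or len(reply) != 10 or reply[2] != register:
            continue  # timeout, or not a reply to this request
        found[register] = int.from_bytes(reply[4:6], "little")
    return found


def fake_bms(request: bytes) -> Optional[bytes]:
    """Test double: answers registers 19, 21, and 25 with the values quoted above."""
    values = {19: 3375, 21: 4200, 25: 3600}
    register = request[2]
    if register not in values:
        return None
    lo, hi = values[register].to_bytes(2, "little")
    body = [request[1], register, 0x02, lo, hi]
    check = sum(body) % 256
    flag = 1 if lo > check else 0  # byte 8, following the rule from the post
    return bytes([0x3A, *body, check, flag, 0x0D, 0x0A])


result = scan_registers(fake_bms)
```

With real hardware, `transact` would write the request to the UART and read ten bytes back with a timeout, returning `None` when nothing arrives.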
I've uploaded a CSV of the raw data in case you want to dig around.
Conclusion
Thanks for sticking around on this journey. It was a lot. I wanted to break this post into separate parts, but I just didn't think it would work right. This was a super fun exploration, and it had a rewarding ending. If you know anything about "Yoku" batteries, their "Smart BMS", or recognize this protocol, please reach out! My email is on the landing page, or you can use the comments below.
Next steps
The next job ahead of us is understanding the ESC, how it communicates, and whether we'll be able to keep it.