tinyman392
So it's been stated that the Civic Type R badge numbers seem to be assigned to cars randomly at the factory. This post isn't meant to challenge that; the data gathered so far supports it. However, there is also a claim floating around that the badge numbers have no correlation with production number or year. That is the part I felt was completely false. The data below shows that badge numbers do correlate with production year, and that it is certainly possible to predict production year from badge number. So while it is impossible to predict the next badge number off the line (they are chosen at random), or to predict a badge number from a production number, it is possible to correlate badge number with year and even predict the year with a good amount of accuracy.
The above paragraph is basically your tl;dr for this incredibly long post.
Background
I joined the CTR forums around March of 2018, and noticed a few things as people started posting badge numbers. The first is that all the badge numbers posted up to then (mine included) seemed to be mainly < 10k, with a few higher than 10k. Eventually, we began seeing numbers higher than 20k, and those started flooding in shortly after the first one appeared. Then eventually 30k, and again, they came in full force. We're starting to hit 40k now, and those seem to be trickling in pretty quickly as well. Also of note is that when we started seeing 20k numbers pop up, numbers < 20k kind of stopped showing up (or slowly began rolling off). The same thing happened when we broke the 30k barrier.
This sort of pattern kind of screams correlation to my eyes. But every post about the badge numbers seems to state that they are completely random and have zero correlation. That isn't the pattern I saw above. At that time it was still 2018, so data was scarce: I could only see some 2017 data and a little 2018 data. Well, 2019 eventually came around, and now we have some data for that as well, plus a good amount of 2018 data too!
Finding Correlation
I took all of the badge numbers from fk8registry.com and copied them into a file. fk8registry offered the data as a 3-column table containing badge number, color, and year. There are a total of 1023, 1215, and 398 entries for years 2017, 2018, and 2019, respectively (at time of writing). I went ahead and computed the following for each year:
- The average badge number
- The standard deviation amongst the badge numbers
- The 95% confidence interval for the badge numbers
- The minimum badge number
- The maximum badge number
The plot above shows the results of the analysis pretty clearly. The larger, colored dots represent the average badge number for each production year of CTR. The grey line is the standard deviation of the badge numbers. The smaller grey dots represent the minimum and maximum badge numbers for each year. It's pretty clear that the average badge number increases with production year, and thus there is a correlation. The same trend is visible in the maximum badge numbers. The 95% confidence intervals are not shown on the plot because they are too narrow to see (they range between 130 and 339); suffice it to say, they are that tight.
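For anyone who wants to reproduce the per-year numbers, here is a minimal sketch in Python. The rows are made up for illustration (they are not registry data), `summarize` is a hypothetical helper name, and the 95% interval uses a normal approximation (mean ± 1.96 standard errors), which is an assumption since the method used for the intervals isn't stated here.

```python
import math
import statistics


def summarize(badges):
    """Per-year summary: count, mean, standard deviation, 95% CI, min, max."""
    n = len(badges)
    mean = statistics.mean(badges)
    stdev = statistics.stdev(badges)  # sample standard deviation
    # Assumption: normal-approximation interval for the mean.
    half_width = 1.96 * stdev / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "stdev": stdev,
        "ci95": (mean - half_width, mean + half_width),
        "min": min(badges),
        "max": max(badges),
    }


# Made-up (badge number, year) rows standing in for the registry table.
rows = [
    (512, 2017), (7301, 2017), (3950, 2017),
    (15220, 2018), (22104, 2018), (9875, 2018),
    (31877, 2019), (40012, 2019), (27450, 2019),
]

# Group badge numbers by year, then summarize each group.
by_year = {}
for badge, year in rows:
    by_year.setdefault(year, []).append(badge)

summary = {year: summarize(badges) for year, badges in sorted(by_year.items())}
for year, stats in summary.items():
    print(year, stats)
```

With the real table you would load the three columns from the file and drop the color column before grouping.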
Predicting Model Year from Badge Number
The dataset from fk8registry contains 2636 FK8 badge numbers at time of writing. When this many datapoints are available, it may be possible to build a model to predict stuff. In this case, stuff = production year. More specifically, I took the badge numbers for each individual production year and split them at a 1:4 ratio into testing and training sets, so 80% of the data would reside in the training set and 20% in the testing set. I could then use scikit-learn to build a decision tree model (default parameters, didn't need to tune a thing) to predict production year given only the badge number.
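The setup described above can be sketched roughly like this with scikit-learn. Big caveat: the badge numbers below are synthetic. Only the per-year row counts come from the post; the badge ranges per year (`assumed_ranges`) are invented so the example runs on its own, so the accuracy it prints says nothing about the real data.

```python
import random

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the registry data (NOT the real badge numbers).
# Row counts per year follow the post; the badge ranges are invented.
rng = random.Random(0)
assumed_ranges = {2017: (1, 12000), 2018: (8000, 30000), 2019: (26000, 42000)}
counts = {2017: 1023, 2018: 1215, 2019: 398}

X, y = [], []
for year, (low, high) in assumed_ranges.items():
    for _ in range(counts[year]):
        X.append([rng.randint(low, high)])  # one feature: the badge number
        y.append(year)

# 80/20 split, stratified so every year is split at the same ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Decision tree with default parameters (random_state only fixes tie-breaking).
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of exact-year hits
print(f"test rows: {len(X_test)}, exact-year accuracy: {accuracy:.2f}")
```

`stratify=y` is one way to get the per-year 1:4 split in a single call; splitting each year separately and concatenating would work just as well.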
Using the testing set (a blind set not used during training), I looked at how successful the model was at predicting the exact production year (2017, 2018, or 2019) of the vehicle with a given badge number, and at predicting it to within 1 year (2017/2018 or 2018/2019). It was 95% successful at predicting the exact production year (always guessing the most prevalent year would be 46%) and 100% successful at predicting within a year (always guessing the most prevalent 2 years would be 84%). This result shows that it is possible to predict production year given the badge number.
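The two guessing baselines quoted above follow directly from the per-year entry counts. Here is a quick sketch of that arithmetic; the counts are the ones from the registry given earlier in the post, while the variable names are just mine. Note the counts cover the whole dataset, so a baseline computed on the 20% test set alone could differ slightly.

```python
# Entry counts per year, as listed earlier in the post.
counts = {2017: 1023, 2018: 1215, 2019: 398}
total = sum(counts.values())  # 2636 entries overall

# Baseline 1: always guess the single most prevalent year.
top_year = max(counts, key=counts.get)
baseline_exact = counts[top_year] / total

# Baseline 2: always guess the two most prevalent years.
top_two = sorted(counts, key=counts.get, reverse=True)[:2]
baseline_two_years = sum(counts[year] for year in top_two) / total

print(f"always guess {top_year}: {baseline_exact:.1%}")
print(f"always guess {sorted(top_two)}: {baseline_two_years:.1%}")
```

These come out at roughly 46% and just under 85%, which is the bar the model has to clear.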
Simulating Badge Numbers
So, how can badge numbers drawn from a parts bin produce a strong pattern if they are selected at random? Well, given a random selection of stuff under a set of rules, patterns can, but don't always, appear. Numberphile on YouTube has an excellent video on this subject entitled Chaos Game. I've embedded it below; it's worth a watch if you've got a few minutes to blow (it is very straightforward, easy to understand, and may blow your mind).
I went ahead and tried to simulate how badge numbers would be placed on CTRs. To simulate badge numbers being selected for artificial cars, I assumed the following were true:
- A just-in-time manufacturing system is in place: badges are produced and placed into a parts bin, and when parts in the bin run low, more are produced
- Badge numbers are drawn randomly
- Batches of 8000 badges are created
- When the parts bin has 800 (10%) of badges remaining, 8000 more are produced
- Badges are produced sequentially (badge 233 is made before 234, etc.)
The X-axis shows the production number of a simulated vehicle while the Y-axis shows its badge number. Note how the badge numbers for the first 8000 vehicles are indeed uniformly random. However, after the parts bin is refilled, there is a relatively uniform random selection from 8000-16000, with a small number of selections still coming from the first 0-8000 batch of badges. With each new batch produced, badges from previous batches are selected at a much lower rate; the older the batch, the lower its chance of being chosen.
The Pearson correlation coefficient was computed between the badge numbers and their production numbers, giving PCC = 0.94 with a 2-tailed p-value = 0.0. This basically shows that in a badging system similar to what the CTR most likely goes through, badge numbers are indeed heavily correlated with production number despite being chosen at random from a parts bin. Additionally, the simulation supports the idea that as the model years increase, the badge numbers are expected to as well. Although it is impossible to predict the exact production number given a badge number, it's very possible to get a general idea of production year from badge number.
Granted, the rules above may not be exactly what CTR production does (batch sizes can vary, and the point at which the bin is replenished may be different), but you'll still get the very dense "rectangles" as each new batch of badges is made, and a Pearson correlation coefficient > 0.5 (showing correlation). The earlier the parts bin is replenished, the lower the PCC.
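The rules listed above are simple enough to sketch in a few lines of Python. This is my own minimal reconstruction under those stated rules, not the original script: the function names, the number of simulated cars (40000), the seed, and the "early refill" threshold of 6000 are all assumptions picked for illustration.

```python
import math
import random


def simulate(num_cars=40000, batch_size=8000, refill_at=800, seed=0):
    """Return badge numbers in production order under the assumed bin rules."""
    rng = random.Random(seed)
    parts_bin = []
    next_badge = 1  # badges are produced sequentially
    badges = []
    for _ in range(num_cars):
        # Refill: when the bin is down to `refill_at` badges, add a new batch.
        if len(parts_bin) <= refill_at:
            parts_bin.extend(range(next_badge, next_badge + batch_size))
            next_badge += batch_size
        # Draw one badge uniformly at random (swap with the end, then pop).
        i = rng.randrange(len(parts_bin))
        parts_bin[i], parts_bin[-1] = parts_bin[-1], parts_bin[i]
        badges.append(parts_bin.pop())
    return badges


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


production = list(range(1, 40001))  # production number of each simulated car

# Refill at 800 left (the rule above) versus a much earlier refill at 6000 left.
r_late_refill = pearson(production, simulate(refill_at=800))
r_early_refill = pearson(production, simulate(refill_at=6000))
print(f"refill at 800 left:  PCC = {r_late_refill:.2f}")
print(f"refill at 6000 left: PCC = {r_early_refill:.2f}")
```

Plotting `production` against the output of `simulate()` should give the stacked "rectangles" described above, and comparing the two printed values is a quick check on how the refill point affects the PCC.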