Fluoxetine,
I think that you having your own ratings is really cool, and it brings something to the boards.
That said, the first rule of programming is still GIGO: garbage in, garbage out (not directed at your rankings). So in evaluating any of the programs, the quality of the algorithm is a legitimate topic of conversation.
It may be math, but whose math?
I am not interested in rating teams relative to their efficiency ratings. I am interested in wins and losses. There are teams that lose even when efficient and others that win despite being inefficient. That interests me.
There are teams that play down to the level of their competition; some still always win, some don't. The latter should be downgraded, not the former. Etc.
So there's lots to disagree with regarding rankings, algorithms, and the computers, well beyond bias per se. It depends on what you -- and the program -- value as important.
Similarly, which of those factors are used to assess SOS (record, home/neutral/away, efficiency, margin) may matter. The broader the level of consensual validation, the more likely the result is accurate, but only within the range of the algorithms contributing.
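For instance, here's a toy illustration of consensual validation -- averaging a team's rank across several systems. The rankings below are invented, and a simple mean is only one possible way to combine them:

```python
# Toy "consensus" ranking: average each team's rank across systems.
# All numbers are invented for illustration.

rankings = {
    "system_1": {"TeamA": 5, "TeamB": 12, "TeamC": 30},
    "system_2": {"TeamA": 7, "TeamB": 10, "TeamC": 41},
    "system_3": {"TeamA": 4, "TeamB": 15, "TeamC": 33},
}

consensus = {
    team: sum(r[team] for r in rankings.values()) / len(rankings)
    for team in next(iter(rankings.values()))
}
print(consensus)  # {'TeamA': 5.33, 'TeamB': 12.33, 'TeamC': 34.67}
```

But again, the consensus is only as good as the algorithms feeding it.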
Wonder if @SkilletHead2 has thoughts.
Loyal
Thanks for the invite to comment, Loyal!
Way too long an answer below. Short version: it depends strongly on what you are trying to predict (W/Ls or margins), whether you want to rely heavily on last year's results (which are highly reliable but not super valid), and how you want to treat playing weak teams and injuries/illnesses. I have to say that I sometimes look at Kenpom and Sagarin and have a real WTF moment!
Longer version (and happy to answer questions):
I love analytics and probability, and do a fair amount of it in my day job. Whether a team is going to win a game is not actually dissimilar to whether a test-taker is going to get the next item on a test right or wrong in a computer-based system. And I work a lot in that arena.
The folks who make the computer models are faced with a number of choices, assumptions, and prior information in setting up their models. They also have to decide just what it is they are trying to predict (win or loss, margin of victory -- you can bet on either). They also decide whether they are going to start with estimates of the ability of the teams, and how much they are going to let this season's wins and losses (and margins) influence changes in those prior estimates (called "priors" in Bayesian stats). An important point here is that there aren't so many rights and wrongs as there are choices and goals: do you want to pick the winning team as often as possible, or be within the spread as often as possible?
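To make the priors idea concrete, here's a minimal sketch of one way to blend a preseason prior with in-season results. The function, the "equivalent games" weighting, and every number in it are made up for illustration -- no actual system that I know of works exactly like this:

```python
# Blend a preseason prior rating with observed in-season performance,
# weighting each by how much you trust it. Numbers are invented.

def updated_rating(prior_rating: float, prior_weight: float,
                   observed_rating: float, games_played: int) -> float:
    """Precision-weighted average of a preseason prior and in-season results.

    prior_weight is in 'equivalent games': a weight of 10 means the prior
    counts as much as 10 games of observed data.
    """
    total = prior_weight + games_played
    return (prior_weight * prior_rating + games_played * observed_rating) / total

# Early in the season the prior dominates; by game 30 the data takes over.
print(updated_rating(85.0, prior_weight=10, observed_rating=95.0, games_played=3))   # ~87.3
print(updated_rating(85.0, prior_weight=10, observed_rating=95.0, games_played=30))  # 92.5
```

The whole argument between systems is really about that prior_weight knob.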
Another factor is recency versus primacy. That means: are you more interested in what they've done in the past five games, or in where they started the season? And then, and this is what kind of kills everything, what about injuries, a player getting COVID, etc.? Do you factor that in or just go with the data from the scoreboards?
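One simple way to turn that recency knob, again just a hypothetical sketch, is to exponentially discount older games:

```python
# Weighted average point margin with exponential decay on older games.
# decay=1.0 weights all games equally; smaller values favor recent games.
# The margins below are invented.

def recency_weighted_margin(margins: list[float], decay: float = 0.9) -> float:
    """Weighted average margin, most recent game weighted highest."""
    n = len(margins)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # oldest gets decay^(n-1)
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)

margins = [12, -3, 8, -15, 20]  # oldest first; the team just won big
print(recency_weighted_margin(margins, decay=1.0))  # 4.4 (all games equal)
print(recency_weighted_margin(margins, decay=0.5))  # ~7.7 (leans on the +20)
```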
So, here are some of my thoughts. If you play 12 teams who are middling tough and go 6-6, and another team plays 6 patsies and 6 teams in the top 10 and also goes 6-6, how do you compare those two? In the testing model I use, you look at the current estimate of your ability and the difficulty of your opponent (or the test item), combine them in a reasonably straightforward mathematical model, and then come up with a probability of getting the next item right. The estimate of ability that I use depends on just two pieces of info: how many you got right (total wins) and the ability of each team you played (or how tough the items were). Whenever we see an anomaly, we look at what happened, because highly unusual events should not occur often (a player was sick, an examinee made a lucky guess). If we see too many of these, we get really suspicious (that is to say, cheating).
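That "reasonably straightforward mathematical model" is basically a logistic curve -- the Rasch model from testing. A minimal sketch, with invented ratings on an arbitrary logit scale:

```python
# Rasch-style win probability from the gap between a team's ability
# and its opponent's "difficulty." Ratings below are invented.

import math

def win_probability(ability: float, opponent_difficulty: float) -> float:
    """Logistic (Rasch) model: P(win) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - opponent_difficulty)))

# Evenly matched: a coin flip. A full point of rating: ~73% favorite.
print(win_probability(0.0, 0.0))   # 0.5
print(win_probability(1.0, 0.0))   # ~0.731
print(win_probability(-0.5, 1.0))  # ~0.182 -- underdog against a top team
```

Under this model, the two 6-6 teams above end up with very different ability estimates, because the difficulty of who you played is baked in.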
For me, I think the fairest way to do this is to start out with everybody equal, or maybe with just three or so levels of prior estimates of ability. Then, each week, you look at the data, adjust teams' ability estimates, and continue to re-run the data until it "settles" (see the sketch at the end of this post). This is doable, but I'm not sure how many programs do it.

I have a buddy who is very good at this; he did it for the NFL for a couple of years and had a really nice set of estimates going. I'm not a good enough computer programmer to do that, nor do I want to invest the time in it. Skillethead Jr. is a much better programmer than I am, but he's got a job and a family. But hey, I've got a nephew who just invented a VR game called Gorilla Tag that has over 2 million downloads. Maybe I could talk him into it!
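Here's roughly what I mean by the "settle" loop -- a toy sketch only, with an invented schedule; real systems are much fancier. Start everyone equal, then nudge each rating by gradient steps on the logistic model above until nothing moves:

```python
# Fit one rating per team by gradient ascent on the logistic (Rasch) model.
# Toy data; ratings are identified only up to an additive constant.

import math

def fit_ratings(games: list[tuple[str, str, int]],
                lr: float = 0.1, max_iter: int = 10000,
                tol: float = 1e-6) -> dict[str, float]:
    """games: (team_a, team_b, 1 if team_a won else 0) -> rating per team."""
    teams = {t for g in games for t in g[:2]}
    ratings = {t: 0.0 for t in teams}          # start everybody equal
    for _ in range(max_iter):
        grads = {t: 0.0 for t in teams}
        for a, b, a_won in games:
            p = 1.0 / (1.0 + math.exp(-(ratings[a] - ratings[b])))
            grads[a] += a_won - p              # an upset pulls the winner up
            grads[b] -= a_won - p
        for t in teams:
            ratings[t] += lr * grads[t]
        if max(abs(g) for g in grads.values()) < tol:
            break                              # the estimates have "settled"
    return ratings

games = [("Duke", "UNC", 1), ("Duke", "UNC", 1),
         ("UNC", "State", 1), ("State", "Duke", 1)]
print(fit_ratings(games))  # Duke on top, State mid (credit for the upset), UNC low
```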