In this episode of Stop Requested, hosts Levi McCollum and Christian Londono speak with Dr. Xuehao Chu, former CUTR Senior Research Associate, about how he revolutionized National Transit Database reporting and potentially saved transit agencies millions.
Dr. Chu explains how he discovered critical flaws in FTA’s universal sampling plans, which forced many agencies to over-sample while leaving others statistically non-compliant. Learn about his development of the customized sampling manual now used nationwide and how his research transformed federal policy.
The conversation covers the evolution from manual sampling to APC certification, clarifying when qualified statisticians are needed and debunking common misconceptions. Dr. Chu discusses data cleansing challenges, the importance of tracking operated versus scheduled trips, and practical advice for agencies navigating NTD compliance. Discover how one researcher’s statistical expertise transformed an entire industry’s approach to data collection.
Stop Requested. Welcome to Stop Requested, the podcast where we discuss everything transit. I’m your co-host, Levi McCollum, Director of Operations at ETA Transit. And I’m your co-host, Christian Londono,
Senior Customer Success Manager at ETA Transit. Welcome back to Stop Requested. Christian, how are you today?
Doing excellent, Levi. How about yourself? Doing very well. It’s Friday, and it’s been a good week, a long week.
I’m really happy that we’re having this conversation today with our guest, Dr. Xuehao Chu, who is a former CUTR Senior Research Associate, an independent consultant, a statistician, and overall an expert in the National Transit Database.
Dr. Chu, how are you doing? I’m doing very well. Thank you, Levi. Well, this is a real pleasure to have you on. I’m super excited to have this conversation because, as you may know, Dr. Chu, Christian and I are both NTD nerds, and we love poring through the NTD data on the weekend. That’s kind of our thing. It’s our jam. But for those who may not be as immersed in the National Transit Database, can you give a primer on what it is and why agencies should care? And how did you get to the point where you’re working with it? Well, the National Transit Database is basically the central information point for the Federal Transit Administration, where recipients of certain federal funds report their transit service data to FTA. Particularly relevant for this conversation is what they call service consumed data, which includes boardings, or unlinked passenger trips, and passenger miles traveled, which represent the cumulative distance traveled by all boarding passengers.
Now, why are they important? The number of boardings is typically called ridership, and it’s typically used to measure how much a transit service is being used.
Passenger miles traveled, in this country at least, is used by FTA in allocating some federal funds to individual transit agencies or urbanized areas.
So that’s important to transit agencies. But locally, some states even allocate their state transit funds using the number of boardings in their allocation formulas. So even though boardings are not used for funding at the federal level, they still matter for funding in some states.
Thank you. Well, that’s a really good explanation, and I’m glad you were able to give us a lay of the land before we get into the details of the conversation.
Before we do, I want to ask about something I know is a sticking point in the industry. Are boardings really the right metric for measuring transit success or for doling out funding? What is your take on that? Well, it depends. I once did a small project looking at whether you should use boardings or passenger miles, and whether they should enter the funding formula linearly or squared.
Depending on the circumstances of individual agencies, different agencies benefit most from different ways of entering these metrics into funding formulas. So there’s no single answer; it depends on who you’re talking about.
For example, transit services that provide a lot of long trips, where passengers travel long distances, prefer that passenger miles traveled be used in the formula.
Right? Because then they would capture a larger share of the federal funding. So there’s no simple answer to your question. I see. Yeah, it does depend on a lot of different factors, I’m sure. Yes. And geography plays a big part in that. Yes. I can certainly see that. And I failed to mention in your intro, Dr. Chu, that CUTR is the Center for Urban Transportation Research.
That’s in Tampa, if I’m not mistaken, on the campus of the University of South Florida. Yes. So, Dr. Chu, could you tell us a little more about your experience working with CUTR, and particularly about your journey with NTD: when you started working with NTD data, and how you got to the point where you were able to propose sampling plans? Could you tell us- Sure. … a little bit about that journey?
Okay. Yes. My initial exposure to NTD and NTD data was around 1990, when I was still a graduate student at the University of California, Irvine, helping a professor there who was doing transit performance measurement research using NTD data.
But my research on the statistical nature of, and the techniques for, collecting and reporting boardings and passenger miles for NTD didn’t start until the early 2000s, when CUTR had a research center with joint funding from FDOT, the Florida Department of Transportation. I had a series of research projects, small ones, that they provided funding for. So I started looking into the so-called circular sampling plans, which FTA at that time considered approved sampling plans.
And Dr. Chu- Yes? … let me interrupt you for a second. At this time, going back in this timeline- Yes. … I would imagine that APC sensors either didn’t exist yet or were just not prevalent in transit, so most agencies would have been sampling and using these statistical methods for reporting, right? Could you tell me a little bit about what that landscape looked like?
Oh, yes. Back then, certainly. APCs were actually already available at some agencies or being tested, and I think some agencies had already studied using APCs in the mid-1980s. Mm-hmm. But those were rare cases. And by current standards, APC counts in those days would probably be considered terrible. I never examined this, but I imagine the technology was not good enough then. In any case, very few agencies were using APCs.
So the majority of transit agencies relied on statistical sampling, collecting data on trips selected randomly throughout the year.
And they had to make sure the amount of data collected was large enough to meet FTA’s statistical requirements, which means 10% precision at the 95% confidence level.
Now, at that time, some agencies needed to sample to estimate not just passenger miles but even boardings. And not just for fixed-route buses and rail, but even for demand response services.
In those days, they all relied on sampling, because FTA also had a separate circular with sampling plans for transit agencies to use, which were also approved by FTA. So yes, during those days everybody was sampling, and mostly they relied on those universal sampling plans offered by FTA.
And those plans had some flaws, right? For some agencies, maybe they didn’t work out very well, or they were too cumbersome. With some of these sampling plans, I would imagine agencies were sampling several trips a week- Yeah. … throughout the year and then accumulating all this data to aggregate it and come up with statistical estimates for the entire system and operation.
Yes. So could you tell me a little bit about that? Sure. Yes. That was the outcome of a research project I did looking into the statistical nature of the circular sampling plans and how they were developed, and I found quite a few shortcomings. For a lot of transit agencies, particularly for fixed-route bus services, using those sampling plans would mean vastly over-sampling. That means they were collecting much more data than they needed to meet FTA requirements.
But at the same time, some other agencies did not sample enough, simply because those sampling plans were not customized: they did not reflect the varying underlying conditions across different reporters.
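Why one plan cannot fit every agency can be seen from the textbook sample-size formula for estimating a mean to a given relative precision, n = (z · CV / e)², with z = 1.96 for 95% confidence and e = 0.10 for 10% precision. The sketch below assumes simple random sampling of trips and uses the coefficient of variation (CV) of a per-trip measure such as passenger miles as each agency’s "local condition"; the function name and CV values are hypothetical, and this is not the procedure in FTA’s sampling manual.

```python
import math
from typing import Optional


def min_trips(cv: float, precision: float = 0.10, z: float = 1.96,
              trips_operated: Optional[int] = None) -> int:
    """Minimum sampled trips to estimate a per-trip mean within +/- `precision`
    (relative) at the confidence level implied by `z` (1.96 -> about 95%).

    Assumes simple random sampling; `cv` is the coefficient of variation of the
    per-trip measure, e.g. estimated from an agency's earlier sample data.
    """
    n = (z * cv / precision) ** 2
    if trips_operated is not None:
        # Finite population correction: matters only when n is a large share of trips.
        n = n / (1 + n / trips_operated)
    return math.ceil(n)


# Two hypothetical agencies held to the same 10% / 95% standard:
print(min_trips(cv=0.6))  # -> 139 (less variable service)
print(min_trips(cv=1.2))  # -> 554 (more variable service)
```

A universal plan has to fix one sample size: set between these two values, it over-samples the first agency and leaves the second short of the precision target.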
That was contradictory, because FTA said, “What you estimate in the report must meet our 10% precision at the 95% confidence level.” But at the same time, they were providing universal sampling plans that would not meet those statistical requirements for a lot of agencies, while overly burdening some other agencies. Yeah. And I would imagine that if the sample size didn’t fit a lot of agencies, it would make the statistical analysis futile, right? It just makes no sense. You cannot have confidence in the data if your sampling plan doesn’t align with the required confidence level and accuracy.
Yeah. The plan that requires the most sampling is the one for buses that requires sampling two trips every day. That really means you need to sample every day, which is cumbersome: you have to run the sampling process every single day. And if you offer your service every day, you’re talking about more than 700 trips a year.
So that’s- That’s a lot of trips. That’s- Oh, yeah. It’s a lot of trips. Yeah. Especially for those agencies that also report 100% counts of boardings. Yes.
It’s a full-time position, right? Oh, yeah. I know. In those days, agencies all had their own counters, not just for NTD but also for counts for their internal planning purposes. But a lot of their effort was devoted to NTD data collection. Yeah. So at what point did you realize that those sampling plans were not statistically sound, especially for some agencies? And then what happened? How did you go about pointing that out, and how did FTA ultimately change the approved sampling plans to what we have today? Yeah. Later, after the initial research and findings, I did a couple more research projects related to sampling and NTD reporting.
In one of those later projects, I developed a customized method for transit agencies to develop sampling plans that were similar to those offered in the circular, but customized to local conditions. As a result, agencies that used that customized approach could reduce their sampling requirement quite significantly.
Um, so that was when, uh, when the stage of, uh, the, um, the process that, uh, eventually led to the, uh, NTD sampling, um, plan. But later, um, I realized that, uh, these, uh, these, uh, circular type of sampling plans also have problems, which is, um…
I, I call those plans as, uh, interval-based, ’cause it’s a sample every day, every second day, every third day, every fourth day kind of thing.
One problem is that because you are sampling on a lot of days, the sample size on each sampling day is small. So when you use your own previous sample data to develop a customized sampling plan, there’s a lot of rounding in the daily sample size. For example, if your analysis tells you the minimum sample size is 1.05 trips every second day, then to be practical and still meet the requirements, you need to sample two trips every other day instead of 1.05. As a result, there’s a lot of rounding up, which ends up with a much higher annual sample size than you need, even with the customized approach. So in another research effort later, I realized that we needed to sample over a longer interval.
I call those period-based sampling plans: weekly, which still isn’t ideal, or monthly, or even quarterly. Those period-based sampling plans offer agencies much more advantage when they use the customized approach to reduce their sampling burden.
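The rounding problem can be checked with a little arithmetic. The sketch below takes Dr. Chu’s example of a 1.05-trip minimum on every second service day, assumes 365 service days, and compares what an interval-based plan collects with what a period-based plan collects when the same annual requirement is spread over weeks, months, or quarters. The function names and the 365-day assumption are illustrative only, not taken from FTA’s manual.

```python
import math


def interval_plan(per_day_min: float, every_n_days: int, service_days: int = 365):
    """Interval-based plan: sample on every Nth service day, whole trips only."""
    sampling_days = math.ceil(service_days / every_n_days)
    required = per_day_min * sampling_days              # what the analysis calls for
    collected = math.ceil(per_day_min) * sampling_days  # rounded up on every sampling day
    return required, collected


def period_plan(annual_required: float, periods_per_year: int) -> int:
    """Period-based plan: round up once per period (week, month, quarter)."""
    return math.ceil(annual_required / periods_per_year) * periods_per_year


required, collected = interval_plan(1.05, every_n_days=2)
print(f"required about {required:.0f}, interval plan collects {collected}")
for label, periods in [("weekly", 52), ("monthly", 12), ("quarterly", 4)]:
    print(label, period_plan(required, periods))
```

Under these assumptions the interval plan collects 366 trips against a requirement of about 192, while the weekly, monthly, and quarterly versions need 208, 204, and 196: the fewer the periods, the less the per-period rounding adds.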
And around that time, through what channel I don’t know, CUTR, including me and some other colleagues, went to FTA’s NTD office and did two or three different presentations on the research we’d been doing related to NTD, including my sampling and statistics work.
And at that time, there was no- This was in 2002, right? Yeah. Go ahead. This is when you went with CUTR to the- Well, the initial- … Yeah. The initial study was in 2002. I forget when the presentations were, but within a year or two of that.
Yes. And- So you go see them in, I’m picturing this, Washington, where the FTA office is? Yes. And you’re coming to them and doing this presentation, pretty much showing that there are some statistical flaws, right? With the plans- Yes. … and the impact- Yes. … that it of course has for all the purposes they’re using this data for- Yeah. … and also that some of the data is not statistically valid, so it’s kind of bad data.
So what was their response when they saw this presentation? Well, to tell you the truth, I don’t remember anymore. But if I remember anything, I don’t think there was an obvious reaction, at least during the presentation. Maybe they were expecting it. They already knew that those universal sampling plans were not statistically sound for all agencies. But once the federal government adopts something as a policy, it doesn’t change easily. So my guess is they knew about the problems, but it wouldn’t be that easy for them to move from the circular methods to something new, which they did adopt later. So what happened then? Did you only present the flaws with the sampling? Or was it more like, “This is what we learned, this is what we want to point out as deficient, and this is what we believe could be a better way of sampling”? Is that how it came to be?
Or was it more like they asked you to provide a proposal for what would be- Oh, no. … a better sampling plan? No.
Because at that time I had already done my later research offering solutions, including customization and period-based sampling plans. I also evaluated different sampling techniques and how much the sample size might be reduced- Right. … with each of them. So those were mostly what I discussed and offered. They didn’t directly invite us or offer research funding, like, “Go ahead and use this money to develop a sampling manual for us.” No, they didn’t do that. That came later: I got another research project through the joint research center with FDOT, which was to develop the NTD sampling manual.
Oh, wow. Okay. So then- Yes. … that’s how it was created for the entire country, right? Well, it was funded by FDOT, but eventually, yes- Spread throughout the country. … FTA’s NTD office got involved. I sent them the draft to review. While I was developing it, they even suggested I write the manual in a nontraditional way rather than the usual academic fashion, so we ended up writing the sampling manual in a Q&A format.
And once they had reviewed it and felt comfortable with it, it was sent back to them in early 2009, I think.
Mm-hmm. But it sat there for a couple of years. They started adopting the manual in 2011, on a trial basis I think, and adopted it formally in 2013, I think. Oh, wow. And what is it like for you, seeing your work adopted by FTA and widely used across the country by different agencies? That new manual reduced the amount of sampling and effort for some agencies while at the same time increasing the accuracy and reliability of the data. It has to be very fulfilling, right? Something to be proud of. So what is that like for you? Oh, yes. I always enjoyed working with transit agencies to help them find the best sampling plan for them, which means minimizing their sampling burden while still meeting FTA’s statistical requirements. I certainly find a lot of joy in doing that, including developing this sampling manual, which can potentially help more agencies than my consulting work, which is one-on-one.
The manual can be used by many agencies at the same time. Mm-hmm. Yeah, and you really have a lot of influence over the industry in that way. Your work has spanned several decades now.
I’m really curious, going back to how things were before and thinking back to the NTD when you first started: I imagine it was probably a pain to get some of the NTD data. I think agencies were reporting on paper at that point and would have to mail in their reports. Is that correct? Well, the history, of course, is quite long. Actually, I don’t remember the exact mechanism agencies used to submit, but certainly the web-based approach was not available. At one point it was probably Excel-based submission. Even earlier, I don’t know exactly, but it was probably paper-based reporting, so it was a messy process. It also probably involved more manual data entry, which means errors were more likely in the NTD data that were eventually submitted. Nowadays everything’s so simple. And not only that, but once the data are submitted, FTA has made them so easily available and accessible, especially the monthly data. The annual data are also available online: toward the end of the following year, I guess, they make the previous year’s data available online too.
So yes, things are much simpler, not only for individual agencies submitting the data but also for users of the NTD data in terms of access. Yeah. And as a researcher, I have to imagine your experience is quite a bit different from my experience or Christian’s experience submitting data to the National Transit Database.
What was that experience like previously? How did you come to find the NTD data? Where did you get it? What did you do with it then? What were you able to do? I’m sure what was actually submitted has changed. I’m sure boardings and alightings, some of that data, may have been on there. But I’m curious what it looked like then. Well, if you’re talking about sampling and how the sample data were used to get to the estimated annual total boardings and passenger miles: when people used the circular method, the circulars had printed tables. Every time you collected data, say every second day, you entered the data into the table and accumulated them until the year was over, and then you had the annual total of the sample data. So it was all done manually in a large paper format. Of course, later agencies actually did this in Excel, but early on it was another software package, not Excel.
So yes, it was very cumbersome to work with the sample data to arrive at what they needed to report to NTD. We’re going to talk about APCs later. I think FTA has been making its best efforts to simplify and reduce the reporting burden for agencies wherever they see it’s possible.
And you mentioned APCs, or automatic passenger counters. We’ve covered them quite a lot in our discussions here on the podcast. But for listeners who may not be entirely familiar with automatic passenger counters, they’re devices, previously infrared and I think to some degree still infrared, now moving more to camera or optical sensors, that count people as they board or alight the transit vehicle.
How have APCs changed the way agencies sample? I would think that once you get your APCs certified, there’s really no more sampling, at least as far as passenger miles traveled go, right? Or do I have that wrong? Well, that depends on what the agency chooses. If you read FTA’s new policy on using APC data for NTD reporting, FTA actually allows you to use APC data in different ways, depending partly on the agency’s conditions but also on its preferences.
For example, if agencies want, they can still sample. They sample trips, use the APC data for those trips, and then do the estimation from the sampled trips. There are considerable statistical shortcomings to doing that, but FTA allows that sampling approach. The difference is that the data collection is done through the APCs rather than through manual ride checks or onboard cameras.
At the other extreme of using APC data is reporting 100% boardings or passenger miles using APCs, which in most cases requires 100% APC coverage of the fleet. More importantly, agencies need more than 98% of their operated trips to have valid APC data available for reporting, so that they can simply scale the 98-plus percent of trips with valid data up to represent 100%.
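When coverage is that high, the scaling step is a simple ratio. The sketch below expands the total from trips with valid APC data by the ratio of operated trips to valid trips, and refuses to do so unless more than 98% of operated trips are covered, the threshold cited in the conversation. The function name is hypothetical; FTA’s APC policy, not this sketch, governs the actual rule.

```python
def scale_to_operated(valid_total: float, valid_trips: int, operated_trips: int,
                      min_coverage: float = 0.98) -> float:
    """Scale a boardings or passenger-miles total from trips with valid APC data
    up to all operated trips. Hypothetical helper, not an FTA-specified routine."""
    if operated_trips <= 0 or valid_trips <= 0:
        raise ValueError("trip counts must be positive")
    coverage = valid_trips / operated_trips
    if coverage <= min_coverage:
        raise ValueError(f"valid-data coverage {coverage:.1%} is not above {min_coverage:.0%}; "
                         "a certified expansion method is needed instead")
    return valid_total * operated_trips / valid_trips


# 9,900 of 10,000 operated trips have valid data and record 99,000 boardings.
print(scale_to_operated(99_000, valid_trips=9_900, operated_trips=10_000))  # -> 100000.0
```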
And they can report that as 100% counts; NTD Form D10 has a 100% method for both boardings and passenger miles. Now, in between, which represents the vast majority of cases, the reporter takes all available valid APC data, which may cover, let’s say, 70% of trips, or 95%, or even less than 50%. In that case, FTA requires the agency to have a qualified statistician certify a method the agency can use to fill in the gap left by the trips without valid APC data, so that it can expand the 70% or 50% of trips with valid APC data to represent systemwide ridership or passenger miles.
So there’s a range of methods available to transit agencies, and they can choose based on their preference or their circumstances. And then you’re able to fill in or estimate the rest, right? If you only have- Yes. … 70% and you tossed out 30%- Yes. … you still need to make up for the part you threw out. Sure. So you’re using the data you have to inform what the other 30% would possibly look like- Yes. … or as close as possible. Sure. And there may be different ways of doing this filling in, expansion, or scaling up of the available valid APC data to represent systemwide ridership.
Certainly you don’t want to simply add up the boardings and passenger miles from the 70% of trips for which you have valid APC data. If you do that, you’ll end up significantly underestimating your ridership and passenger miles. Agencies certainly don’t want that, and FTA doesn’t want that either.
So you need to expand. In most cases, the typical method of expansion using all available valid APC data is to stratify, that is, to divide your service into granular cells, and then calculate averages within each of those cells.
Calculate the average boardings per trip and average passenger miles per trip within each cell, multiply those averages by the total number of trips you actually operated in the same cell, and then sum the expanded boardings and passenger miles across all the defined cells to get your annual total boardings and passenger miles.
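The cell-based expansion just described can be sketched in a few lines. Here cells are keyed by route and time period, which is only one possible stratification and an assumption of this sketch, not a prescribed scheme. A cell with operated trips but no valid APC trips raises an error, because how to handle such cells is exactly the kind of decision a certified method has to settle. All names are hypothetical.

```python
from collections import defaultdict


def expand_by_cell(valid_trips, operated_by_cell):
    """Expand per-trip averages within each cell to that cell's operated trips.

    valid_trips: iterable of (cell, boardings, passenger_miles) for trips with valid APC data.
    operated_by_cell: dict mapping cell -> number of trips actually operated in that cell.
    Returns (annual_boardings, annual_passenger_miles).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # cell -> [boardings, passenger miles, trips]
    for cell, boardings, pmt in valid_trips:
        totals = sums[cell]
        totals[0] += boardings
        totals[1] += pmt
        totals[2] += 1

    total_boardings = total_pmt = 0.0
    for cell, operated in operated_by_cell.items():
        if cell not in sums:
            raise ValueError(f"no valid APC trips in cell {cell!r}")
        boardings, pmt, n = sums[cell]
        total_boardings += boardings / n * operated  # average per trip x trips operated
        total_pmt += pmt / n * operated
    return total_boardings, total_pmt


valid = [(("Route 1", "peak"), 40, 200), (("Route 1", "peak"), 60, 300),
         (("Route 1", "off-peak"), 10, 30)]
operated = {("Route 1", "peak"): 10, ("Route 1", "off-peak"): 4}
print(expand_by_cell(valid, operated))  # -> (540.0, 2620.0)
```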
Now, why does FTA require the involvement of a qualified statistician in certifying such a method?
FTA’s main concern is the following: nobody, including the agency, really knows in advance the pattern of trips that will end up with no valid APC data.
The trips with no APC data may be concentrated on certain routes or in certain time periods. They may not be randomly or representatively distributed across your service. Let me give you a simple example.
The typical perception is that when buses are crowded, or at stops where a lot of boardings occur, APC counts are typically less accurate than in other cases. Intuitively, that’s probably true. But if it actually is true, and you’re not careful in how you use the available APC data, for example if you calculate a systemwide average for all the 70% of trips with valid APC data and expand that average by the systemwide total number of trips operated, you’ll get annual totals for boardings and passenger miles.
The problem is that the systemwide average is likely to be biased, because the pattern of missing APC data is not randomly or representatively distributed across the agency’s service. In the example I mentioned, crowded trips are more likely to have bad data, so during the cleansing of the raw APC data those trips are more likely to be thrown out. As a result, crowded trips are underrepresented in the valid data you use to calculate the average, so you may end up with a lower average than the actual one. In that case, the expanded annual totals of boardings and passenger miles will be too low as well.
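A small constructed example makes the bias concrete. The numbers are invented for illustration: 100 crowded trips averaging 80 boardings, of which only 50 survive cleansing, and 100 lightly used trips averaging 20 boardings, of which 95 survive. Because the true total is known by construction (10,000 boardings), the shortfall of the single systemwide average is visible, while averaging within each group recovers the total. The function names are hypothetical.

```python
def systemwide_estimate(valid, operated_total):
    """Single systemwide average of valid trips, expanded to all operated trips."""
    boardings = [b for _, b in valid]
    return sum(boardings) / len(boardings) * operated_total


def per_group_estimate(valid, operated_by_group):
    """Average within each group, then expand by that group's operated trips."""
    total = 0.0
    for group, operated in operated_by_group.items():
        group_boardings = [b for g, b in valid if g == group]
        total += sum(group_boardings) / len(group_boardings) * operated
    return total


# Constructed data: crowded trips are more likely to be dropped during cleansing.
valid = [("crowded", 80)] * 50 + [("light", 20)] * 95
operated = {"crowded": 100, "light": 100}
true_total = 80 * 100 + 20 * 100  # 10,000 by construction

print(round(systemwide_estimate(valid, sum(operated.values()))))  # -> 8138, biased low
print(per_group_estimate(valid, operated), true_total)            # -> 10000.0 10000
```

Stratifying only removes the bias if missingness is roughly random within each group, which is why the pattern of missing data has to be examined agency by agency.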
So FTA wants a qualified statistician to look into each transit agency’s pattern of missing APC data and come up with a method that is most likely to mitigate any potential bias in the expanded annual totals due to such a non-random pattern. So you’re speaking there to APCs and the sampling plan. Christian and I have been doing some work recently with some agencies where we’ve had to get their APCs certified- Yes. … and a lot of folks, I think, have the misunderstanding that a qualified statistician is needed for APC certification. Can you explain and maybe demystify- Oh, yeah. Sure. … some of that for us?
Sure. Just to mention briefly: the older FTA policy on using APC data for NTD reporting did require a random sampling plan for collecting manual data across the whole report year. Because random sampling was involved, a qualified statistician was needed at that time to come up with a sampling plan for the agency to select the trips for getting its APC system certified by FTA.
But with the new policy, formally adopted in 2019, random trip selection and annual coverage are not required anymore.
So as a result, agencies can select the sample of trips for getting their APC system certified by FTA.
That does not require certification or involvement by a qualified statistician. The method for expanding all valid APC data for ongoing or annual reporting, to generate annual total boardings and passenger miles, is different: that method does need to be certified by a qualified statistician. So yes, there is a significant difference. Now let me just mention one thing, though.
When agencies do their certification process, collecting manual data and comparing it with the matching APC data, they need to prepare what you could call a certification report and submit it to the NTD validation analyst for FTA approval.
A portion of that certification report needs to describe, in a brief format, the expansion method they are going to use to generate annual total boardings and passenger miles. So in that regard, the certification process does touch a little on what the qualified statistician eventually produces and certifies for them in terms of the expansion method. But otherwise, no, a qualified statistician is not required for getting the APC system certified.
That’s good to know. And it’s been evolving. More agencies are pursuing that certification. It’s my opinion, or my observation from what I’m seeing, that more agencies are getting to that level of certification and, of course, doing the recertification when it’s needed, like this year. This is the year agencies around the US are completing their recertification.
Yes. So when it comes to NTD reporting, whether for APC certification or for the sampling methods, what would you say are some of the common mistakes agencies make, or that you observe? And what are some best practices and recommendations we can give to transit agencies and transit professionals who deal with NTD reporting? Well, that’s quite an involved question.
Well, if you mention sampling, you’re talking about traditional sampling with manual data collection, yes.
One issue I still continue to see in transit agencies that are still using the traditional sampling method, typically for estimating passenger miles, is that they are still way oversampling. Just a year ago, one agency I was dealing with was still sampling two trips every day for their bus service.
I was really, truly surprised when I learned that. That was unbelievable. The problem is that the staff at individual agencies who are involved in these NTD issues are typically new to the process, okay? And they just take what the previous person was doing, okay?
And because they are new to the process, there’s no way they would be aware of a newer, less burdensome process, things like that.
So I can understand why that’s the case. Now, in terms of APCs, the most difficult part I find is really the cleansing.
FTA has no requirements on how agencies cleanse their raw APC data, okay?
Right. And it all varies. If the APC vendors are involved in cleansing the APC data, they have their own procedures; if reporting software is involved in the cleansing, it has its own processes.
They all vary. And I try to avoid getting involved in such things because it’s just too messy.
Too messy, yeah. There are no standards for that out there just yet, huh? Yeah. No. For how the vendors handle the reporting and the processing of the APC data that ultimately powers and generates the reports agencies use to submit their numbers, right? Like if the agency just trusts the number that comes out of the system after certification and submits it.
Yes. What you just said also relates to how the agency or reporter implements a certified expansion method for generating annual totals from all valid APC data.
It depends on the size of the agency and whether or not it has technical staff available. At a minimum, they can use an Excel-based implementation of the expansion method: download all valid APC data for the year, on a monthly basis or whatever, into an Excel file, then calculate the averages and expand them according to the certified expansion method. That’s the basic approach. But it’s not ideal, okay? Because you need to do it every year and all that.
I’ve seen some bigger agencies that have technical staff program the certified expansion method into scripts. Then, when they need to do reporting and expand the APC data, they just run the scripts, which requires more technical staff. Right. The third option, which I think would be ideal, is reporting software that implements the expansion method as part of the software.
Then it’s more like a push-button approach for the agency, right? “Give me the monthly ridership for July 2025.” “Give me annual total boardings and passenger miles for fiscal year 2024.” So anyway, you have a range of possible ways of implementing the expansion method, and the exact approach certainly depends on the individual agency. Yeah. In general, we see the need for more standardization for a lot of these things, just to help us all use the same methodology, right?
Mm-hmm. Because we even talk about the funding formula, the importance of accurate NTD reporting, and the importance of reporting for agencies when it comes to federal funding.
So it’s important that we start using more standardized methodologies, because it almost tells me that different methodologies might be more, or less, advantageous to agencies when it comes to reporting numbers that would give them slightly higher funding.
So it’s definitely important for agencies to take a look at their processes, where they’ve been, where they are, and any updates or opportunities to make them better.
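For the scripted option Dr. Chu described, a minimal Python sketch might look like the following. Everything here is an assumption for illustration: the record layout, the cell names, the `expand_monthly` function, and the choice to average within each (month, cell) and expand by that month’s operated trips. A real script should implement whatever the qualified statistician actually certified, including how to handle a cell with no valid APC data (here it simply raises an error).

```python
from collections import defaultdict

# Hypothetical valid-APC records: (month, cell, boardings, passenger_miles).
VALID_TRIPS = [
    ("2024-07", "weekday", 30, 90.0),
    ("2024-07", "weekday", 34, 110.0),
    ("2024-07", "weekend", 12, 30.0),
    ("2024-08", "weekday", 28, 84.0),
    ("2024-08", "weekend", 10, 26.0),
]
# Trips actually operated (not scheduled) for each (month, cell).
OPERATED = {
    ("2024-07", "weekday"): 1_000,
    ("2024-07", "weekend"): 300,
    ("2024-08", "weekday"): 1_100,
    ("2024-08", "weekend"): 320,
}


def expand_monthly(valid_trips, operated):
    """Return {month: (boardings, passenger_miles)} from cell averages."""
    # Accumulate trip count, boardings, and passenger miles per (month, cell).
    sums = defaultdict(lambda: [0, 0.0, 0.0])
    for month, cell, boardings, pmt in valid_trips:
        s = sums[(month, cell)]
        s[0] += 1
        s[1] += boardings
        s[2] += pmt

    monthly = defaultdict(lambda: [0.0, 0.0])
    for (month, cell), n_operated in operated.items():
        if (month, cell) not in sums:
            raise ValueError(f"no valid APC data for {month}/{cell}")
        n, boardings, pmt = sums[(month, cell)]
        # Cell average per valid trip, scaled up to all operated trips.
        monthly[month][0] += boardings / n * n_operated
        monthly[month][1] += pmt / n * n_operated
    return {m: (b, p) for m, (b, p) in sorted(monthly.items())}


if __name__ == "__main__":
    totals = expand_monthly(VALID_TRIPS, OPERATED)
    for month, (b, p) in totals.items():
        print(f"{month}: {b:,.0f} boardings, {p:,.0f} passenger miles")
    print(f"total: {sum(b for b, _ in totals.values()):,.0f} boardings")
```

Summing the monthly results gives the total for the period covered; with these made-up inputs July comes to 35,600 boardings and August to 34,000.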
I was going to ask you: speaking with different agencies, I’ve noticed that some agencies in growing communities are getting to the point where they have to start reporting as full reporters.
So, as a last question, what would you tell a small agency that is just starting with NTD reporting? You mean they never reported before, or they just changed from a reduced reporter to a full reporter? Yes. And maybe related to sampling and the related requirements, which they might see as something complex. What would you tell them? Well, yeah. Certainly that’s probably seen as a big, difficult technical task in such a case, especially for agencies that are new, or new to NTD reporting.
But those are rare, right? Entirely new agencies are probably rare, but it’s possible that an agency currently has only, let’s say, demand response service and later, due to local needs, adds bus service. That kind of new, in terms of a bus service, is certainly possible. Now, in terms of the difficulties and issues they would face: certainly one thing they would face is the choice between following traditional sampling approaches with manual data collection, or video-based data collection and estimation, versus something more technology-based like APCs.
So that’s something they need to choose, probably initially. But whatever they choose for sampling, they need to start with a sampling plan that’s approved, or meets FTA requirements. Really new reporters of a service can go to the sampling manual and use the appropriate ready-to-use sampling plan in the manual, for bus, for rail, or for demand response, for one year. Then, once you have collected manual data for one year, for future report years you can use your previous sample data to develop a customized sampling plan using the NTD sampling template. That’s one approach.
So that’s something they need to face: picking a sampling plan, certainly for new reporters, if they’re using the sampling plan approach.
Now, for APCs, it’s more involved. Let me just mention one thing. I don’t know if you guys agree or not, but I’ve seen, and actually certified, expansion methods for a few agencies that are reduced reporters.
Reduced reporters only need to report annual total boardings, okay? No monthly boardings, no passenger miles whatsoever, not even annual daily averages for annual reporting. Only annual total boardings.
For me, if NTD reporting is the only purpose of an APC system, I’m not sure it makes sense to use APCs for reporting only annual total boardings, simply because you need to do triennial certification of the APC system, okay? You need to do the data collection every third year, write the certification report, and get FTA approval every third year, and then you need an expansion method certified by a qualified statistician. That’s not every third year, but at least initially.
If you just use your fareboxes, assuming they are good and you can get good boarding data from them, there’s no certification requirement, no periodic recertification, and no expansion method; you just use the farebox counts throughout the year to accumulate annual totals. Much simpler. But of course, if the agency needs APC data for its own purposes and wants to make sure the data from its system are good enough for its own planning, then it may want to go through the FTA requirements for APC certification.
That serves both NTD reporting purposes and local planning purposes. But if it’s just for NTD reporting, I don’t see it being worthwhile for reduced reporters.
I would agree with that. If you’re a reduced reporter, I believe that means you’ve got 30 or fewer vehicles operating in maximum service.
Correct me if I’m wrong there. System-wide, yeah. System-wide, yes. Mm-hmm. Yes. So that’s actually VOMS, Vehicles Operated in Maximum Service, of 30 or fewer. Yes. Then, yeah, that doesn’t really make a lot of sense, right? Because you’re adding a lot of extra burden on your agency staff, who are probably already pretty limited anyway. Exactly. A lot. At those small agencies, one staff person already serves multiple roles.
So, Dr. Chu, I worked at an agency, and actually, I believe this is when I first met you, when I was over in Southwest Florida.
Yes. We had APCs, and we used them for planning amenities and such, you know, internal purposes, but we never got them certified.
I think that was partly because people said they were just inaccurate, right? They were infrareds, and you couldn’t really trust them. That’s what was going around at that time.
Yes. What are the drawbacks of doing it that way?
I can imagine some, but I’m curious to hear what you think the drawbacks are of using that data in any capacity without having it certified.
Oh, using it anyway, you mean, for local purposes? Yeah, for internal purposes. Of course, you wouldn’t be able to report it, or you shouldn’t
report it, to the National Transit Database, on the S-10 or anything like that. Yeah. Well, yeah, that’s…
Well, first let me say that it’s not really uncommon for an APC system never to get good enough for FTA certification.
I believe Sarasota is another case. They have had APCs for many years, I think.
I don’t know about the last couple of years, but they were sampling for many, many years while they had APCs.
Now, in terms of having an APC system that cannot get certified for NTD reporting purposes but still using its data for local purposes, that really depends. It also depends on whether the agency can do some validation of its own, with its own procedures, not necessarily FTA’s process. For example, to get the system certified by FTA, you need the difference between the manual and the APC data to be less than 5%, mostly for boardings, for a sample of a certain size.
Now, the agency may say, “Well, I cannot meet that requirement, but for my local purposes, maybe a 10% difference, for a large enough sample, is good enough.”
Well, they can do that. But if the difference in a reasonable sample is 25%, then certainly you probably don’t want to use the data for any purpose.
But if it’s not within the boundary of FTA’s requirement, yet not far off and good enough for local agency purposes, then I think that’s fine.
But certainly, I agree with you to some extent: for agencies that have not been able to get their system certified by FTA but want to use the APC system data for their own purposes, at least they want to know, through some procedure, the level of accuracy of what they are getting.
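A local accuracy check of the kind Dr. Chu describes could be as simple as the sketch below: compare matched manual and APC boardings over a sample of trips, compute the aggregate percent difference, and test it against a tolerance. The 5% and 10% tolerances echo the figures mentioned in the conversation; the function names and sample counts are made up, and this is not FTA’s certification procedure, whose sample-size and statistical requirements are not reproduced here.

```python
def percent_difference(manual, apc):
    """Aggregate (APC - manual) / manual boardings over matched trips."""
    if not manual or len(manual) != len(apc):
        raise ValueError("need non-empty, equal-length matched samples")
    total_manual = sum(manual)
    if total_manual == 0:
        raise ValueError("manual total is zero")
    return (sum(apc) - total_manual) / total_manual


def within_tolerance(manual, apc, tolerance):
    """True if the absolute aggregate difference is at most `tolerance`."""
    return abs(percent_difference(manual, apc)) <= tolerance


if __name__ == "__main__":
    # Hypothetical matched sample: manual and APC counts for the same trips.
    manual = [22, 35, 18, 40, 27, 31]
    apc = [20, 32, 17, 36, 25, 28]
    print(f"difference: {percent_difference(manual, apc):+.1%}")
    print("within 5%:", within_tolerance(manual, apc, 0.05))
    print("within 10%:", within_tolerance(manual, apc, 0.10))
```

With these numbers the APC undercounts by about 8.7%: outside a 5% tolerance but inside a 10% one, the gray zone where a local judgment call is needed. An aggregate figure can also hide trip-level errors that cancel out, so an agency may want to look at per-trip differences as well.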
Then they can feel more comfortable using the data, if they know something about that. Yeah. I appreciate that answer and the additional context, Dr. Chu, and this has really been a fantastic conversation. I love talking about this topic. I’m definitely not nearly as much of an expert as you are, but it’s always so enlightening to have a conversation with you about the National Transit Database, because every time
I learn something new. So I wrote down a few takeaways here. Just add on to these if you feel that I didn’t capture it all, but
I’ve got: APC certification does not require a qualified statistician, but if you’re expanding the data due to invalid trips, let’s say,
then that would require a qualified statistician if you’re using some plan that’s not directly from the FTA manual, right? If there’s some change to the procedure, then you’re going to need a qualified statistician.
Be careful of oversampling, because you’ve seen that in your work. NTD has no specification for cleansing the APC data.
And if you’re a new reporter, you’ve got to pick, right? If you’re a new full reporter, that is, you need to pick a sampling plan, or go with APCs, a more technologically advanced way of capturing the information.
Anything else you want to add to that? Well, instead of calling it a takeaway, let me just mention one thing I think is very important in using APC data for NTD reporting, which consultants or reporters themselves don’t seem to pay enough attention to. As I mentioned earlier, to expand the averages you calculate from all valid APC data, you need to expand them by all the trips you actually operate, okay?
Depending on how you stratify your service, you need the count of all operated trips for each of the cells you define for expanding your APC data.
Typically, in my experience over the last few years, since 2019, most agencies don’t really have a good handle on tracking and recording all trips operated.
Sometimes they don’t even have an answer for the total number of trips operated. And if you ask them for individual trip information for all trips operated, they don’t really have a clue.
But that information on all trips operated is really necessary, and it’s required to expand all valid APC data. Sometimes agencies may just use the scheduled trips, but that’s a no-no, because you’re going to overstate the expanded annual boardings and passenger miles.
So that’s a very critical shortcoming, I think, in the current process of using APC data for NTD reporting. I’ve seen it in many reporters I have dealt with.
Yeah. And that’s an excellent point: you need to be able to differentiate what’s scheduled from what is actually operated. Yes. Oftentimes there’s a delta between those two things.
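A small sketch of that delta, assuming a hypothetical trip log that records scheduled trips, scheduled trips that were missed or cancelled, and unscheduled trips that were added. The `TripLog` class, its field names, and the numbers are illustrative, and a real expansion would do this per cell rather than system-wide. When missed trips outnumber added ones, as in this example, expanding by scheduled trips overstates the total, which is the problem Dr. Chu describes.

```python
from dataclasses import dataclass


@dataclass
class TripLog:
    """Hypothetical annual trip counts for one expansion cell."""
    scheduled: int  # trips in the published schedule
    missed: int     # scheduled trips not operated (cancellations, missed pull-outs)
    added: int      # unscheduled trips that were operated (e.g. extra service)

    @property
    def operated(self) -> int:
        return self.scheduled - self.missed + self.added


def expand(avg_boardings_per_trip: float, trips: int) -> float:
    """Expand a per-trip average from valid APC data to a trip total."""
    return avg_boardings_per_trip * trips


if __name__ == "__main__":
    log = TripLog(scheduled=52_000, missed=1_560, added=200)
    avg = 24.5  # hypothetical average boardings per valid APC trip
    by_scheduled = expand(avg, log.scheduled)
    by_operated = expand(avg, log.operated)
    print(f"operated trips: {log.operated:,}")
    print(f"expanded by scheduled: {by_scheduled:,.0f}")
    print(f"expanded by operated:  {by_operated:,.0f}")
    print(f"overstatement: {by_scheduled / by_operated - 1:.1%}")
```

Here the gap is about 2.7%, or 33,320 boardings, coming entirely from trips that appear in the schedule but never ran.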
Well, Dr. Chu, this has been a great conversation. Where can people learn more about you and your work, your long history working on NTD-related research, and also your consultancy? Well, yeah. I
left my full-time job at CUTR at the University of South Florida about seven years ago.
Since then, I’ve been doing some consulting work related to the topics we are talking about: sampling plans, APC expansion methods, and serving as a qualified statistician for agencies, for a few years now. And I’m still doing that work when people are interested and believe I can help them.
I don’t really have a website. If people are interested, sometimes they just get my contact from other agencies’ reporters, or they search my name and some things may come up.
So, since I don’t have a website, maybe email is easiest. It’s ntd.certification@gmail.com.
Excellent. So, uh, they can reach you at ntd.certification@gmail.com? Yes.
Okay. Excellent. Well, Dr. Chu, thanks again. Really appreciate your time today. Thanks for giving us this great background and also some helpful advice for transit agencies that are interested in those sampling plans. Well, thank you, Levi and Christian, for the opportunity.
Thank you, Dr. Chu. Thank you. And to our listeners, we’ll be back next Monday. Thank you again for listening. Thank you.