White paper · Editorial methodology
How TubeLens evaluates videos
There are dozens of metrics for ranking YouTube videos, and almost all of them reward popularity. TubeLens uses none of them. Every video receives three independent classifications (the TLR system): the Lupometer (editorial quality in 4 states), the Suggested Age tier (5 levels of suitability derived from 12 content signals), and the Editorial Seal (3 seals on disclosure, sourcing and opinion/fact separation). This page details what the AI looks at, how it weighs evidence, and why it lands on a verdict. No engagement signal (views, likes, subscribers) enters the evaluation. Everything is public and auditable: no black box.
Methodology last updated: May 2026.
What does NOT enter the score
To be a real alternative to YouTube's algorithm, we deliberately decided what stays out of the score.
Engagement signals excluded
- Views — popularity isn't quality.
- Likes and dislikes — manipulated metrics (bots, brigading; public dislike count removed by YouTube in 2021).
- Subscriber count — channel authority does not imply individual video quality.
- Comments — engagement, not content.
- Thumbnail and title — can mislead; what matters is what the video delivers.
- Watch time — algorithm metric, optimized for retention, not quality.
Why exclude all of this?
These metrics reward what goes viral. TubeLens's thesis is precisely that going viral isn't the same as being good — sensationalism, clickbait, and rage bait score high on all of them. Deep educational content from small channels scores low. Incorporating any of these into the score would turn us into a mirror of the algorithm — we'd lose our reason to exist.
Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. If likes became part of the score, creators would optimize for likes — which YouTube already incentivizes. It would add nothing.
So what does TubeLens analyze?
Only the content — the transcript of what was said, read critically across the 4 dimensions described in section 2. Nothing else.
1. Analysis pipeline
Every analysis goes through the same 4 stages in a fixed order. The same video processed today and a month from now produces the same output structure; the methodology itself changes only through updates published on this page.
Stage 1: Transcript extraction
The URL is normalized and the video is resolved by its 11-character ID. The transcript is fetched through dedicated infrastructure that extracts captions from any YouTube video with CC enabled. When captions exist in multiple languages we prefer pt → en → es. We do not process audio or run our own speech recognition; we use only the caption text already published with the video.
Stage 2: Prompt engineering and output contract
The transcript is sent to the model inside a strict prompt that defines the criteria, the 28 possible labels, the score anchors (0/5/10), and the requirement to cite evidence from the transcript for every label assigned. The output is constrained by a structured schema that rejects anything malformed: no free prose, no missing fields.
Stage 3: Model analysis
A state-of-the-art generative AI model processes the transcript with controlled output: low temperature to reduce variability across runs and a strict schema to enforce uniform structure. The model receives title, channel, transcript language, and the full text, truncated at 30,000 characters when needed (the beginning of the transcript is kept and the tail is dropped).
Stage 4: Post-processing and storage
The composite score is computed in code (we don't trust the model to add numbers), the seal is assigned by fixed bands, and everything is written in a single database transaction. Analyses are globally cached by video_id, so videos analyzed before don't burn fresh tokens.
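The input handling in stages 1 and 3 can be sketched as follows. This is a minimal illustration, not TubeLens's actual code: the function names, the set of URL shapes handled, and the fallback when none of the preferred languages exists are all assumptions. Only the 11-character ID, the pt → en → es order, and the 30,000-character limit come from the text above.

```python
import re
from typing import Optional, Sequence
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
LANGUAGE_PREFERENCE = ("pt", "en", "es")  # order stated in stage 1
MAX_TRANSCRIPT_CHARS = 30_000             # limit stated in stage 3


def extract_video_id(url: str) -> Optional[str]:
    """Return the 11-character video ID for a few common URL shapes, else None.

    Hypothetical helper: URLs without a scheme are not handled here.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None


def pick_caption_language(available: Sequence[str]) -> Optional[str]:
    """Pick a caption track following pt -> en -> es, matching on the primary subtag."""
    by_primary = {}
    for code in available:
        by_primary.setdefault(code.split("-")[0].lower(), code)
    for lang in LANGUAGE_PREFERENCE:
        if lang in by_primary:
            return by_primary[lang]
    # Assumption: fall back to the first available track; the source does not
    # say what happens when none of the three preferred languages exists.
    return available[0] if available else None


def truncate_transcript(text: str) -> str:
    """Keep the beginning of the transcript and drop anything past the limit."""
    return text[:MAX_TRANSCRIPT_CHARS]
```

Normalizing to the video ID first matters for the cache described in stage 4: different URL shapes for the same video then resolve to one video_id key.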
2. The 4 evaluation criteria
Every video is scored 0 to 10 across four dimensions. The anchors are fixed and public: 0 = absent, 5 = average YouTube content, 10 = exceptional. The weights determine how they combine into the final score.
Information density
Weight: 30%
How much useful content per minute. Penalizes repetition, excessive recap, long monologue intros, stretched outros, and any device that inflates watch time. A 30-minute video that fit in 8 will be marked down even if the core content is solid.
Anchors
- 0: Almost no useful information; mostly filler
- 5: Some filler, watchable at 1.25x
- 10: Every minute carries new information; recaps are short
Clarity
Weight: 30%
Structure, didactics, organization of ideas. Looks at whether there is a clear thread, examples when needed, terms defined before use, and logical progression. Technically correct but poorly explained content loses here.
Anchors
- 0: Chaotic, no structure, scattered ideas
- 5: Watchable with effort; structure is implicit
- 10: Clear structure, well-timed examples, explicit definitions
Credibility
Weight: 30%
Sources, verifiable claims, absence of sensationalism. Checks whether the author cites papers, links, real data with provenance, distinguishes opinion from fact, and qualifies claims. Catastrophizing, absolute certainty on contested topics, and charlatanism kill the score here.
Anchors
- 0: Unsourced claims, sensationalism, charlatanism
- 5: Mixes fact and opinion without clear separation
- 10: Well-sourced, qualifies claims, transparent about limits
Originality
Weight: 10%
Original analysis vs consensus rehash. Penalizes videos that simply repackage what is already circulating without adding analysis, primary data, or a fresh angle. Recognizes when the author brings primary research or an uncommon perspective.
Anchors
- 0: Repeats consensus with no angle of its own
- 5: Recombines known information with a personal touch
- 10: Primary analysis, uncommon angle, original research
3. Composite score and seals
The weighted average of the 4 dimensions produces a number between 0 and 10. That number maps to one of 5 seals. The bands are fixed; there is no editorial override.
Formula
score = density × 0.30
      + clarity × 0.30
      + credibility × 0.30
      + originality × 0.10
Bands
| Score | Seal | Meaning |
|---|---|---|
| 9.0 – 10.0 | Exceptional | Reference on the topic |
| 7.5 – 8.9 | Recommended | Worth your time |
| 6.0 – 7.4 | Acceptable | Useful, but better exists |
| 4.0 – 5.9 | Weak | Likely a waste of time |
| 0.0 – 3.9 | Avoid | Misinformation or filler |
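The weighted average and band lookup above can be sketched in a few lines. The weights and band boundaries are taken from this section. The function names, the input shape (a dict keyed by dimension), and rounding to one decimal place before the lookup are assumptions for illustration; the one-decimal rounding is inferred only from the way the bands are written.

```python
# Weights and lower band bounds as published in this section.
WEIGHTS = {
    "density": 0.30,
    "clarity": 0.30,
    "credibility": 0.30,
    "originality": 0.10,
}
BANDS = [
    (9.0, "Exceptional"),
    (7.5, "Recommended"),
    (6.0, "Acceptable"),
    (4.0, "Weak"),
    (0.0, "Avoid"),
]


def composite_score(scores: dict) -> float:
    """Weighted average of the four dimension scores, each expected in 0..10."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    # Assumption: round to one decimal so the value lands inside a published band.
    return round(total, 1)


def seal_for(score: float) -> str:
    """Map a composite score to its seal using the fixed lower bounds."""
    for lower_bound, seal in BANDS:
        if score >= lower_bound:
            return seal
    raise ValueError("score must be between 0 and 10")


example = {"density": 8, "clarity": 7, "credibility": 9, "originality": 6}
score = composite_score(example)
print(score, seal_for(score))  # prints: 7.8 Recommended
```

In the example, 8 × 0.30 + 7 × 0.30 + 9 × 0.30 + 6 × 0.10 = 2.4 + 2.1 + 2.7 + 0.6 = 7.8, which falls in the 7.5 – 8.9 band.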
4. The 28 detected signals
Independent of the score, the model looks for 28 patterns in the content. Each detected signal comes with an intensity from 1 to 5 and a justification citing a transcript excerpt as evidence. Signals not detected are omitted; there is no "default answer".
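One way to enforce the rules in the paragraph above is a small validator that runs on each signal the model returns. This is a sketch under stated assumptions: the field names, the two example labels, and the check that the quoted excerpt literally appears in the transcript (after whitespace and case normalization) are illustrative choices, not the published schema.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class DetectedSignal:
    label: str          # one of the 28 labels (names below are hypothetical)
    intensity: int      # 1 to 5, as stated in the text
    justification: str  # why the signal was assigned
    evidence: str       # transcript excerpt quoted as evidence


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so quoting differences don't matter."""
    return " ".join(text.split()).lower()


def validate_signal(signal: DetectedSignal, allowed_labels: Set[str],
                    transcript: str) -> List[str]:
    """Return a list of problems; an empty list means the signal is acceptable."""
    problems = []
    if signal.label not in allowed_labels:
        problems.append(f"unknown label: {signal.label}")
    if not isinstance(signal.intensity, int) or not 1 <= signal.intensity <= 5:
        problems.append("intensity must be an integer from 1 to 5")
    if not signal.justification.strip():
        problems.append("justification is empty")
    excerpt = _normalize(signal.evidence)
    # Assumption: evidence must be a verbatim excerpt of the transcript.
    if not excerpt or excerpt not in _normalize(transcript):
        problems.append("evidence excerpt not found in transcript")
    return problems


# Hypothetical labels and transcript, for illustration only.
LABELS = {"sensationalism", "cites_sources"}
TRANSCRIPT = "Studies show the effect is small. You won't BELIEVE what happened next."

signal = DetectedSignal("sensationalism", 4, "Teaser phrasing",
                        "you won't believe what happened next")
print(validate_signal(signal, LABELS, TRANSCRIPT))  # prints: []
```

A signal that fails any check would be dropped or sent back for regeneration instead of being stored; which of the two the pipeline does is not stated in the source.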
Negative signals (red flags)
Neutral / descriptive signals
Positive signals (green flags)
5. Primary categories
Each video is categorized into up to 3 primary categories with confidence 1-5, plus a free-form subcategory. These feed the ranking and channel-page filters.
6. Channel ranking — Bayesian average
The channel ranking does not use a simple average. A channel with 2 videos at score 10 should not beat a channel with 20 videos at score 9.2 — that would be statistically unfair. We use Bayesian smoothing with the global mean of the period as prior.
Formula
channel_score = (C × M + n × x) / (C + n)
Parameters
- M = global average of all scores in the period/category
- n = number of videos for the channel in the period
- x = simple average of the channel in the period
- C = prior weight (5)
5.0 threshold rule
Channels with a Bayesian score above 5 can enter only the "best" list; channels below 5 can enter only the "worst" list. A score of exactly 5 stays out of both.
Minimum videos
Channels with fewer than 3 analyzed videos do not enter the ranking — sample too small for any statistical claim.
Shorts are excluded
YouTube Shorts (videos up to 60 seconds) do not enter rankings or channel aggregates. The 4-dimension rubric — density, clarity, credibility, originality — doesn't fit content under a minute. Individual Shorts analyses remain available on the analysis page; exclusion applies to rankings only.
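The ranking rules in this section combine into a short sketch. The prior weight C = 5, the 5.0 threshold, the 3-video minimum, and the Shorts exclusion come from the text. The function names, the (score, is_short) input shape, the "unranked" label, and the global mean used in the example are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

PRIOR_WEIGHT = 5   # C, as published
MIN_VIDEOS = 3     # channels below this are not ranked
THRESHOLD = 5.0    # above -> "best" list, below -> "worst" list


def bayesian_score(channel_scores: Sequence[float], global_mean: float,
                   prior_weight: float = PRIOR_WEIGHT) -> float:
    """(C*M + n*x) / (C + n); n*x is simply the sum of the channel's scores."""
    n = len(channel_scores)
    return (prior_weight * global_mean + sum(channel_scores)) / (prior_weight + n)


def classify_channel(videos: List[Tuple[float, bool]], global_mean: float) -> str:
    """Return "best", "worst", or "unranked" for a list of (score, is_short)."""
    scores = [score for score, is_short in videos if not is_short]  # Shorts excluded
    if len(scores) < MIN_VIDEOS:
        return "unranked"
    value = bayesian_score(scores, global_mean)
    if value > THRESHOLD:
        return "best"
    if value < THRESHOLD:
        return "worst"
    return "unranked"  # exactly 5 stays out of both lists


# Hypothetical global mean M = 6.0 for the period.
small = bayesian_score([10, 10], 6.0)     # 2 videos at score 10
large = bayesian_score([9.2] * 20, 6.0)   # 20 videos at score 9.2
print(round(small, 2), round(large, 2))   # prints: 7.14 8.56
```

With M = 6.0, the two-video channel scores (5 × 6 + 20) / 7 = 50/7 ≈ 7.14, while the twenty-video channel scores (5 × 6 + 184) / 25 = 8.56. The prior pulls small samples toward the global mean, which produces the ordering the section argues for.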
7. Known limitations
The AI is not infallible and we do not hide that. The main limitations are:
- Analysis is based exclusively on the text transcript. We do not see images, charts, slides, or body language.
- Subtle satire without disclosure can be misclassified as sensationalist or misinformation.
- We do not run live fact-checks against external sources. Credibility is judged by internal consistency, claim qualification, and the author's own source citations.
- Transcript quality affects the result. Videos with low-quality auto-generated captions tend to receive more conservative scores.
- The model may have residual bias in label weighting — we audit periodically and publish updates on this page.
Appeals and review process
TubeLens analyses are editorial opinions grounded in a public methodology. Channel owners may contest any seal, score, or detected signal.
Who can contest
Anyone can report a factual error. Channel owners — verifiable through their YouTube account — have priority in the process and the right to a personalized response.
What can be contested
- Final seal (Exceptional/Recommended/Acceptable/Weak/Avoid).
- Composite score (0–10) or individual scores across the 4 dimensions.
- A specific detected signal (e.g. classified as sensationalist when it isn't).
- A justification that cites a transcript excerpt (interpretation error).
- Assigned primary category.
How to contest
Send an email to support@inosx.com including:
- Link to the contested analysis.
- Specific item in dispute (seal, score, signal, citation).
- Argument and — if possible — transcript excerpt supporting your position.
- Channel identification, if you're the owner.
Timeline and process
We respond within 5 business days. Review is done by a human, not the original AI. Possible outcomes:
- Re-analysis: the video is reprocessed and the result may change (up or down).
- Public annotation: we keep the analysis but add a note explaining the contestation and outcome.
- Removal: rare, reserved for serious factual error or content removed from YouTube. We keep an internal audit of what was removed.
- Maintenance with justification: if the methodology was applied correctly, we keep the analysis and respond with detailed justification.
Process principles
- Transparency: every outcome is public (on the analysis itself when applicable).
- No retaliation: contestation does not lower the channel, remove it from rankings, or change future analyses other than through the content of new videos.
- Good faith: we assume good faith from the contester. Repeated requests on the same point without new arguments are archived after initial response.
- Process separate from analysis: the AI does not review contestations — it is always human review, precisely to avoid reinforcing the model's biases.
TLR · TubeLens Editorial Rating
Public inspirations, independent classification
TLR — our three-axis system (quality, suggested age tier, editorial standards) — was distilled from established public principles: the International Age Rating Coalition (IARC) questionnaire for age suitability, and the standards of the U.S. Federal Communications Commission (FCC §73.1212) and Federal Trade Commission (FTC Endorsement Guides) for sponsorship disclosure and advertising truthfulness. TubeLens is not affiliated with, endorsed by, or certified by any of these organizations. All classification is editorial, derived, and independent.
Readings
The methodology above is ours, but we did not invent the criteria. They echo an editorial and academic tradition — Goodhart, Kahneman, Pariser, Bellingcat. We document the readings that support every methodological decision.
See editorial bibliography →