How TubeLens evaluates videos
Every analysis follows a deterministic pipeline with public criteria and auditable formulas. This page explains exactly what the AI looks at, how it weighs evidence, and why it lands on a verdict. Nothing here is a black box.
Methodology last updated: May 2026.
1. Analysis pipeline
Every analysis goes through 4 deterministic stages. The same video processed today and a month from now produces the same output structure — the only thing that can change is what we publish here.
Stage 1: Transcript extraction
The URL is normalized and the video is resolved by its 11-character ID. The transcript is fetched through dedicated infrastructure that extracts captions from any YouTube video with CC enabled. When captions exist in multiple languages we prefer pt → en → es. We do not use audio or speech recognition — only the text the channel already published as CC.
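As a rough illustration of the ID resolution and caption-language preference, a minimal sketch is below. The helper names and the exact URL patterns handled are assumptions, not the production extractor.

```typescript
// Sketch only: illustrative helpers, not TubeLens's production extraction service.
const VIDEO_ID = /(?:v=|youtu\.be\/|\/shorts\/)([A-Za-z0-9_-]{11})/;

function resolveVideoId(url: string): string | null {
  const match = url.match(VIDEO_ID);
  return match ? match[1] : null;
}

// Caption-track preference: pt → en → es, else whatever the channel published.
function pickCaptionLanguage(available: string[]): string | undefined {
  const preferred = ["pt", "en", "es"];
  return preferred.find((lang) => available.includes(lang)) ?? available[0];
}
```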
Stage 2: Prompt engineering and output contract
The transcript is sent to the model inside a strict prompt that defines the criteria, the 28 possible labels, the score anchors (0/5/10), and the requirement to cite evidence from the transcript for every label assigned. The output is constrained by a structured schema that rejects anything malformed — no free prose, no missing fields.
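As an illustration, the shape of that contract could look roughly like the type below. The field names are assumptions made for readability, not the actual schema TubeLens enforces.

```typescript
// Illustrative shape of the structured output; field names are assumptions.
interface AnalysisOutput {
  scores: {
    density: number;     // 0–10, anchored at 0 / 5 / 10
    clarity: number;
    credibility: number;
    originality: number;
  };
  signals: Array<{
    label: string;                  // one of the 28 known labels
    intensity: 1 | 2 | 3 | 4 | 5;
    evidence: string;               // transcript excerpt backing the label
  }>;
  categories: Array<{
    name: string;
    confidence: 1 | 2 | 3 | 4 | 5;
  }>;
  subcategory: string;              // free-form
}
```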
Stage 3: Model analysis
A state-of-the-art generative AI model processes the transcript with controlled output: low temperature to reduce variability across runs and a strict schema to enforce uniform structure. The model receives title, channel, transcript language, and the full text, truncated at 30,000 characters when needed (preserving the start of the delivery).
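A minimal sketch of the truncation rule, assuming a plain character cut: the constant matches the 30,000-character limit above, while the function name is illustrative.

```typescript
// Keep the start of the delivery; drop the tail only when the transcript
// exceeds the 30,000-character budget.
const MAX_TRANSCRIPT_CHARS = 30_000;

function truncateTranscript(text: string): string {
  return text.length > MAX_TRANSCRIPT_CHARS
    ? text.slice(0, MAX_TRANSCRIPT_CHARS)
    : text;
}
```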
Stage 4: Post-processing and storage
The composite score is computed in code (we don't trust the model to add numbers), the seal is assigned by fixed bands, and everything is written in a single database transaction. Analyses are globally cached by video_id — videos analyzed before don't burn fresh tokens.
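The caching behaviour can be pictured like this; the store interface and function names are placeholders, not the actual persistence layer.

```typescript
// Sketch of the global cache keyed by video_id; names are placeholders.
interface Analysis { score: number; seal: string }

interface Store {
  findAnalysisByVideoId(id: string): Promise<Analysis | null>;
  saveAnalysis(id: string, analysis: Analysis): Promise<void>;
}

async function getOrAnalyze(
  db: Store,
  runPipeline: (videoId: string) => Promise<Analysis>,
  videoId: string
): Promise<Analysis> {
  const cached = await db.findAnalysisByVideoId(videoId);
  if (cached) return cached; // previously analyzed videos never burn new tokens

  const analysis = await runPipeline(videoId); // stages 1–3 above
  await db.saveAnalysis(videoId, analysis);    // single transaction in practice
  return analysis;
}
```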
2. The 4 evaluation criteria
Every video is scored 0 to 10 across four dimensions. The anchors are fixed and public: 0 = absent, 5 = average YouTube content, 10 = exceptional. The weights determine how they combine into the final score.
Information density (weight 30%)
How much useful content per minute. Penalizes repetition, excessive recap, long monologue intros, stretched outros, and any device that inflates watch time. A 30-minute video that would fit in 8 is marked down even if the core content is solid.
Anchors
- 0: Almost no useful information; mostly filler
- 5: Some filler, watchable at 1.25x
- 10: Every minute carries new information; recaps are short
Clarity (weight 25%)
Structure, didactics, organization of ideas. Looks at whether there is a clear thread, examples when needed, terms defined before use, and logical progression. Technically correct but poorly explained content loses here.
Anchors
- 0: Chaotic, no structure, scattered ideas
- 5: Watchable with effort; structure is implicit
- 10: Clear structure, well-timed examples, explicit definitions
Credibility (weight 30%)
Sources, verifiable claims, absence of sensationalism. Checks whether the author cites papers, links, real data with provenance, distinguishes opinion from fact, and qualifies claims. Catastrophizing, absolute certainty on contested topics, and charlatanism kill the score here.
Anchors
- 0: Unsourced claims, sensationalism, charlatanism
- 5: Mixes fact and opinion without clear separation
- 10: Well-sourced, qualifies claims, transparent about limits
Originality (weight 15%)
Original analysis vs consensus rehash. Penalizes videos that simply repackage what is already circulating without adding analysis, primary data, or a fresh angle. Recognizes when the author brings primary research or an uncommon perspective.
Anchors
- 0: Repeats consensus with no angle of its own
- 5: Recombines known information with a personal touch
- 10: Primary analysis, uncommon angle, original research
3. Composite score and seals
The weighted average of the 4 dimensions produces a number between 0 and 10. That number maps to one of 5 seals. The bands are fixed; there is no editorial override.
Formula
score = density × 0.30 + clarity × 0.25 + credibility × 0.30 + originality × 0.15
Bands
| Score | Seal | Meaning |
|---|---|---|
| 9.0 – 10.0 | exceptional | Exceptional — reference on the topic |
| 7.5 – 8.9 | recommended | Recommended — worth your time |
| 6.0 – 7.4 | acceptable | Acceptable — useful, but better exists |
| 4.0 – 5.9 | weak | Weak — likely a waste of time |
| 0.0 – 3.9 | avoid | Avoid — misinformation or filler |
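Under the published weights and bands, the post-processing step from stage 4 reduces to something like the sketch below. The function names are illustrative; the numbers are exactly the ones above.

```typescript
// Fixed-weight composite and band lookup, as published on this page.
interface Scores {
  density: number;
  clarity: number;
  credibility: number;
  originality: number;
}

function compositeScore(s: Scores): number {
  return (
    s.density * 0.3 + s.clarity * 0.25 + s.credibility * 0.3 + s.originality * 0.15
  );
}

function sealFor(score: number): string {
  if (score >= 9.0) return "exceptional";
  if (score >= 7.5) return "recommended";
  if (score >= 6.0) return "acceptable";
  if (score >= 4.0) return "weak";
  return "avoid";
}
```

For example, a video scoring 8 / 7 / 9 / 6 on density, clarity, credibility, and originality works out to 8 × 0.30 + 7 × 0.25 + 9 × 0.30 + 6 × 0.15 = 7.75, which lands in the "recommended" band.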
4. The 28 detected signals
Independent of the score, the model looks for 28 patterns in the content. Each detected signal comes with an intensity 1 to 5 and a justification citing a transcript excerpt as evidence. Signals not detected are omitted — there is no "default answer".
The signals fall into three groups:
- Negative signals (red flags)
- Neutral / descriptive signals
- Positive signals (green flags)
5. Primary categories
Each video is categorized into up to 3 primary categories with confidence 1-5, plus a free-form subcategory. These feed the ranking and channel-page filters.
6. Channel ranking — Bayesian average
The channel ranking does not use a simple average. A channel with 2 videos at score 10 should not beat a channel with 20 videos at score 9.2 — that would be statistically unfair. We use Bayesian smoothing with the global mean of the period as prior.
Formula
score_channel = (C × M + n × x) / (C + n)
Parameters
- M = global average of all scores in the period/category
- n = number of videos for the channel in the period
- x = simple average of the channel in the period
- C = prior weight (5)
5.0 threshold rule
Channels with a Bayesian score above 5 are eligible only for the "best" list; below 5, only for the "worst" list. A score of exactly 5 enters neither.
Minimum videos
Channels with fewer than 3 analyzed videos do not enter the ranking — sample too small for any statistical claim.
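A compact sketch of the ranking rules, using the published values for C, the 5.0 threshold, and the 3-video minimum; everything else, including the function names, is illustrative.

```typescript
// Bayesian smoothing toward the period's global mean, plus the list rules.
const PRIOR_WEIGHT = 5; // C
const MIN_VIDEOS = 3;   // channels below this never enter the ranking

function bayesianScore(globalMean: number, channelMean: number, n: number): number {
  // score_channel = (C × M + n × x) / (C + n)
  return (PRIOR_WEIGHT * globalMean + n * channelMean) / (PRIOR_WEIGHT + n);
}

function rankingBucket(score: number, n: number): "best" | "worst" | null {
  if (n < MIN_VIDEOS) return null; // sample too small for any statistical claim
  if (score > 5) return "best";    // eligible only for the "best" list
  if (score < 5) return "worst";   // eligible only for the "worst" list
  return null;                     // exactly 5 appears in neither
}
```

With a hypothetical global mean M of 6.5, the 2-video channel averaging 10 smooths to (5 × 6.5 + 2 × 10) / 7 = 7.5, while the 20-video channel averaging 9.2 smooths to (5 × 6.5 + 20 × 9.2) / 25 ≈ 8.7 and ranks higher, exactly as intended.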
7. Known limitations
The AI is not infallible and we do not hide that. The main limitations are:
- Analysis is based exclusively on the text transcript. We do not see images, charts, slides, or body language.
- Subtle satire without disclosure can be misclassified as sensationalism or misinformation.
- We do not run live fact-checks against external sources. Credibility is judged by internal consistency, claim qualification, and the author's own source citations.
- Transcript quality affects the result. Videos with low-quality auto-generated captions tend to receive more conservative scores.
- The model may have residual bias in label weighting — we audit periodically and publish updates on this page.