How TubeLens evaluates videos
Every analysis follows a deterministic pipeline with public criteria and auditable formulas. This page explains exactly what the AI looks at, how it weighs evidence, and why it lands on a verdict. Nothing here is a black box.
Methodology last updated: May 2026.
1. Analysis pipeline
Every analysis goes through 4 deterministic stages. The same video processed today and a month from now produces the same output structure — the only thing that can change is what we publish here.
Stage 1: Transcript extraction
The URL is normalized and the video is resolved by its 11-character ID. The transcript is fetched through dedicated infrastructure that extracts captions from any YouTube video with CC enabled. When captions exist in multiple languages we prefer pt → en → es. We do not use audio or speech recognition — only the text the channel already published as CC.
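As a rough illustration of the ID resolution and caption-language preference, a minimal sketch is below. The helper names and the exact URL patterns handled are assumptions, not the production extractor.

```typescript
// Sketch only: illustrative helpers, not TubeLens's production extraction service.
const VIDEO_ID = /(?:v=|youtu\.be\/|\/shorts\/)([A-Za-z0-9_-]{11})/;

function resolveVideoId(url: string): string | null {
  const match = url.match(VIDEO_ID);
  return match ? match[1] : null;
}

// Caption-track preference: pt → en → es, else whatever the channel published.
function pickCaptionLanguage(available: string[]): string | undefined {
  const preferred = ["pt", "en", "es"];
  return preferred.find((lang) => available.includes(lang)) ?? available[0];
}
```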
Stage 2: Prompt engineering and output contract
The transcript is sent to the model inside a strict prompt that defines the criteria, the 28 possible labels, the score anchors (0/5/10), and the requirement to cite evidence from the transcript for every label assigned. The output is constrained by a structured schema that rejects anything malformed — no free prose, no missing fields.
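As an illustration, the shape of that contract could look roughly like the type below. The field names are assumptions made for readability, not the actual schema TubeLens enforces.

```typescript
// Illustrative shape of the structured output; field names are assumptions.
interface AnalysisOutput {
  scores: {
    density: number;     // 0–10, anchored at 0 / 5 / 10
    clarity: number;
    credibility: number;
    originality: number;
  };
  signals: Array<{
    label: string;                  // one of the 28 known labels
    intensity: 1 | 2 | 3 | 4 | 5;
    evidence: string;               // transcript excerpt backing the label
  }>;
  categories: Array<{
    name: string;
    confidence: 1 | 2 | 3 | 4 | 5;
  }>;
  subcategory: string;              // free-form
}
```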
Stage 3: Model analysis
A state-of-the-art generative AI model processes the transcript with controlled output: low temperature to reduce variability across runs and a strict schema to enforce uniform structure. The model receives title, channel, transcript language, and the full text, truncated at 30,000 characters when needed (preserving the start of the delivery).
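A minimal sketch of the truncation rule, assuming a plain character cut: the constant matches the 30,000-character limit above, while the function name is illustrative.

```typescript
// Keep the start of the delivery; drop the tail only when the transcript
// exceeds the 30,000-character budget.
const MAX_TRANSCRIPT_CHARS = 30_000;

function truncateTranscript(text: string): string {
  return text.length > MAX_TRANSCRIPT_CHARS
    ? text.slice(0, MAX_TRANSCRIPT_CHARS)
    : text;
}
```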
Stage 4: Post-processing and storage
The composite score is computed in code (we don't trust the model to add numbers), the seal is assigned by fixed bands, and everything is written in a single database transaction. Analyses are globally cached by video_id — videos analyzed before don't burn fresh tokens.
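The caching behaviour can be pictured like this; the store interface and function names are placeholders, not the actual persistence layer.

```typescript
// Sketch of the global cache keyed by video_id; names are placeholders.
interface Analysis { score: number; seal: string }

interface Store {
  findAnalysisByVideoId(id: string): Promise<Analysis | null>;
  saveAnalysis(id: string, analysis: Analysis): Promise<void>;
}

async function getOrAnalyze(
  db: Store,
  runPipeline: (videoId: string) => Promise<Analysis>,
  videoId: string
): Promise<Analysis> {
  const cached = await db.findAnalysisByVideoId(videoId);
  if (cached) return cached; // previously analyzed videos never burn new tokens

  const analysis = await runPipeline(videoId); // stages 1–3 above
  await db.saveAnalysis(videoId, analysis);    // single transaction in practice
  return analysis;
}
```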
2. The 4 evaluation criteria
Every video is scored 0 to 10 across four dimensions. The anchors are fixed and public: 0 = absent, 5 = average YouTube content, 10 = exceptional. The weights determine how they combine into the final score.
Information density (weight 30%)
How much useful content per minute. Penalizes repetition, excessive recap, long monologue intros, stretched outros, and any device that inflates watch time. A 30-minute video that would fit in 8 is marked down even if the core content is solid.
Anchors
- 0: Almost no useful information; mostly filler
- 5: Some filler, watchable at 1.25x
- 10: Every minute carries new information; recaps are short
Clarity (weight 25%)
Structure, didactics, organization of ideas. Looks at whether there is a clear thread, examples when needed, terms defined before use, and logical progression. Technically correct but poorly explained content loses here.
Anchors
- 0: Chaotic, no structure, scattered ideas
- 5: Watchable with effort; structure is implicit
- 10: Clear structure, well-timed examples, explicit definitions
Credibility (weight 30%)
Sources, verifiable claims, absence of sensationalism. Checks whether the author cites papers, links, real data with provenance, distinguishes opinion from fact, and qualifies claims. Catastrophizing, absolute certainty on contested topics, and charlatanism kill the score here.
Anchors
- 0: Unsourced claims, sensationalism, charlatanism
- 5: Mixes fact and opinion without clear separation
- 10: Well-sourced, qualifies claims, transparent about limits
Originality (weight 15%)
Original analysis vs consensus rehash. Penalizes videos that simply repackage what is already circulating without adding analysis, primary data, or a fresh angle. Recognizes when the author brings primary research or an uncommon perspective.
Anchors
- 0: Repeats consensus with no angle of its own
- 5: Recombines known information with a personal touch
- 10: Primary analysis, uncommon angle, original research
3. Composite score and seals
The weighted average of the 4 dimensions produces a number between 0 and 10. That number maps to one of 5 seals. The bands are fixed; there is no editorial override.
Formula
score = density × 0.30 + clarity × 0.25 + credibility × 0.30 + originality × 0.15
Bands
| Score | Seal | Meaning |
|---|---|---|
| 9.0 – 10.0 | exceptional | Exceptional — reference on the topic |
| 7.5 – 8.9 | recommended | Recommended — worth your time |
| 6.0 – 7.4 | acceptable | Acceptable — useful, but better exists |
| 4.0 – 5.9 | weak | Weak — likely a waste of time |
| 0.0 – 3.9 | avoid | Avoid — misinformation or filler |
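Under the published weights and bands, the post-processing step from stage 4 reduces to something like the sketch below. The function names are illustrative; the numbers are exactly the ones above.

```typescript
// Fixed-weight composite and band lookup, as published on this page.
interface Scores {
  density: number;
  clarity: number;
  credibility: number;
  originality: number;
}

function compositeScore(s: Scores): number {
  return (
    s.density * 0.3 + s.clarity * 0.25 + s.credibility * 0.3 + s.originality * 0.15
  );
}

function sealFor(score: number): string {
  if (score >= 9.0) return "exceptional";
  if (score >= 7.5) return "recommended";
  if (score >= 6.0) return "acceptable";
  if (score >= 4.0) return "weak";
  return "avoid";
}
```

For example, a video scoring 8 / 7 / 9 / 6 on density, clarity, credibility, and originality works out to 8 × 0.30 + 7 × 0.25 + 9 × 0.30 + 6 × 0.15 = 7.75, which lands in the "recommended" band.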
4. The 28 detected signals
Independent of the score, the model looks for 28 patterns in the content. Each detected signal comes with an intensity 1 to 5 and a justification citing a transcript excerpt as evidence. Signals not detected are omitted — there is no "default answer".
The signals fall into three groups:
- Negative signals (red flags)
- Neutral / descriptive signals
- Positive signals (green flags)
5. Primary categories
Each video is categorized into up to 3 primary categories with confidence 1-5, plus a free-form subcategory. These feed the ranking and channel-page filters.
6. Channel ranking — Bayesian average
The channel ranking does not use a simple average. A channel with 2 videos at score 10 should not beat a channel with 20 videos at score 9.2 — that would be statistically unfair. We use Bayesian smoothing with the global mean of the period as prior.
Formula
score_channel = (C × M + n × x) / (C + n)
Parameters
- M = global average of all scores in the period/category
- n = number of videos for the channel in the period
- x = simple average of the channel in the period
- C = prior weight (5)
5.0 threshold rule
Channels with a Bayesian score above 5 are eligible only for the "best" list; below 5, only for the "worst" list. A score of exactly 5 enters neither.
Minimum videos
Channels with fewer than 3 analyzed videos do not enter the ranking — sample too small for any statistical claim.
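A compact sketch of the ranking rules, using the published values for C, the 5.0 threshold, and the 3-video minimum; everything else, including the function names, is illustrative.

```typescript
// Bayesian smoothing toward the period's global mean, plus the list rules.
const PRIOR_WEIGHT = 5; // C
const MIN_VIDEOS = 3;   // channels below this never enter the ranking

function bayesianScore(globalMean: number, channelMean: number, n: number): number {
  // score_channel = (C × M + n × x) / (C + n)
  return (PRIOR_WEIGHT * globalMean + n * channelMean) / (PRIOR_WEIGHT + n);
}

function rankingBucket(score: number, n: number): "best" | "worst" | null {
  if (n < MIN_VIDEOS) return null; // sample too small for any statistical claim
  if (score > 5) return "best";    // eligible only for the "best" list
  if (score < 5) return "worst";   // eligible only for the "worst" list
  return null;                     // exactly 5 appears in neither
}
```

With a hypothetical global mean M of 6.5, the 2-video channel averaging 10 smooths to (5 × 6.5 + 2 × 10) / 7 = 7.5, while the 20-video channel averaging 9.2 smooths to (5 × 6.5 + 20 × 9.2) / 25 ≈ 8.7 and ranks higher, exactly as intended.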
7. Known limitations
The AI is not infallible and we do not hide that. The main limitations are:
- Analysis is based exclusively on the text transcript. We do not see images, charts, slides, or body language.
- Subtle satire without disclosure can be misclassified as sensationalism or misinformation.
- We do not run live fact-checks against external sources. Credibility is judged by internal consistency, claim qualification, and the author's own source citations.
- Transcript quality affects the result. Videos with low-quality auto-generated captions tend to receive more conservative scores.
- The model may have residual bias in label weighting — we audit periodically and publish updates on this page.