How I Built a Fully-Automated Agent to Promote My Content on 𝕏 and TikTok

A deep dive into the architecture and tech stack of my content promotion agent

Apr 3, 2025

In 2025, everyone's talking about agents. But not a lot of people are shipping them.

This is the first post in a two-part series where I document a real-world agent I've built and now use to help promote my blog posts. This post is for engineers. It covers the architecture, tech stack, prompting strategies, and lessons I've learned. A future post will walk through the “why” and how it works in practice for content and distribution.

The way it works is simple: It reads a URL, writes a Twitter thread, makes a video, and posts both to 𝕏 and TikTok — automatically.

For example, to promote my last blog post I need to give it this:

{
  url: "https://brennanmceachran.com/blog/gpt4o-image-generation",
  reasonForSharing: "I wrote this article after playing with the new image generation capabilities of GPT-4o. I wanted to share the ideas and images with my followers and generate interest.",
  authorContext: "..." // omitted for brevity (it's a copy/paste of brennanmceachran.com/about)
}

In order to get this:

How this Agent works (and a Table of Contents)

Flow chart of the agent's architecture

When I trigger the agent (via text message or a web UI), I give it a URL and a reason I want to share it. From there, it:

  1. Reads the content on the page
  2. Creates a Twitter thread
  3. Creates a TikTok video
  4. Posts the thread and video

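Under the hood, each of those steps is its own Trigger.dev task chained by a parent task. Here's a rough sketch of that orchestration; the task names and payload shapes are my own illustration, not the exact code:

import { schemaTask } from '@trigger.dev/sdk/v3';
import { z } from 'zod';
// These subtasks are assumed to exist elsewhere in the project; the names are illustrative.
import { taskUnderstandWebpage, taskWriteThread, taskCreateVideo, taskPostEverywhere } from './tasks';

export const taskPromoteUrl = schemaTask({
  id: 'promote-url',
  schema: z.object({
    url: z.string().url(),
    reasonForSharing: z.string(),
    authorContext: z.string(),
  }),
  run: async ({ url, reasonForSharing, authorContext }) => {
    // 1. Read (and enrich) the content on the page
    const understanding = await taskUnderstandWebpage.triggerAndWait({ url, focusArea: reasonForSharing });
    if (!understanding.ok) throw new Error('Failed to understand the webpage');

    // 2. Create the Twitter thread
    const thread = await taskWriteThread.triggerAndWait({
      webpage: understanding.output,
      reasonForSharing,
      authorContext,
    });
    if (!thread.ok) throw new Error('Failed to write the thread');

    // 3. Create the TikTok video, using the thread as an outline
    const video = await taskCreateVideo.triggerAndWait({
      webpage: understanding.output,
      thread: thread.output,
    });
    if (!video.ok) throw new Error('Failed to create the video');

    // 4. Post the thread and video
    await taskPostEverywhere.triggerAndWait({ thread: thread.output, videoUrl: video.output.renderUrl });
  },
});
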
What it really does? It gives me an excuse to play with a bunch of tools I'd been itching to use. This was just a way to smash them all together into something real and useful.

  • Runtime: Vercel/Next.js + Trigger.dev (for long-running workflows)
  • AI APIs: OpenAI (gpt-4o, gpt-4.5-preview, o3-mini) and Google Gemini (gemini-2.0-flash) using ai-sdk
  • AI clone generation: Heygen
  • Captions / Speech-to-text: OpenAI Whisper
  • Video rendering: Remotion.dev
  • Posting to Social: Blotato (for easy direct posting to 𝕏 and TikTok)
  • Knowledge base: My personal project "Brain" — a RAG system storing notes, blog posts, and entities
  • Database: Supabase

Step 1: Understanding the Webpage

Flow chart of the content agent

Our first task is to understand what the user is asking us to talk about. Our goal here is to extract the most relevant information and condense it, since we don't have unlimited context length in the models we'll use later.

Parsing HTML into an article object

AI makes this insanely simple. Gemini 2's massive context window means you can hand it any website's raw HTML and get structured JSON back in one API call.

// Example task, simplified for brevity
import { google } from '@ai-sdk/google';
import { generateObject } from 'ai';
import { schemaTask } from '@trigger.dev/sdk/v3';
import { taskPayloadFetchWebpage, taskOutputFetchWebpage } from 'types/task-fetch-webpage';

const TASK_ID = 'fetch-webpage';

export const taskFetchWebpage = schemaTask({
  id: TASK_ID,
  schema: taskPayloadFetchWebpage,
  run: async ({ url, focusArea = 'the main content' }, { ctx }) => {
    const html = await fetch(url).then((res) => res.text());

    const { object } = await generateObject({
      model: google('gemini-2.0-flash-001', { structuredOutputs: false }),
      schemaName: TASK_ID,
      schemaDescription: 'Convert raw HTML into structured JSON representing an article',
      schema: taskOutputFetchWebpage,
      system: '...', // Moved to another file for brevity
      prompt: [
        `Raw HTML from the URL: ${url}`,
        '<RAWDOCUMENT>',
        html,
        '</RAWDOCUMENT>',
        '',
        `When extracting content to the format, please focus on **${focusArea || 'the main content'}**.`,
        '',
        'Document it in the JSON format following the schema definition. Capture the text content, images, etc.',
      ].join('\n'),
      experimental_telemetry: { isEnabled: true, functionId: TASK_ID },
    });

    return object;
  },
});

Note: If you intended to scale this, you'd need a service like firecrawl.dev or Browserbase. Direct scraping like this is against Trigger.dev's TOS.

While that works, the first challenge is limited contextual understanding. The resulting data doesn't include any understanding of the images it found, and it's missing key information about the people and companies mentioned in the post. It's obvious to us, but proper @mentions and the right visuals are key to creating engaging social content.

We solve this with two parallel tasks:

  1. Run any found images through GPT-4o with some context to get rich descriptions
  2. Extract people/companies and enrich with Clearbit

Describing Images and Entities found

Here's some simplified code showing how I implement both in a new task whose purpose is to gain a full understanding of the webpage.

// Example task, simplified for brevity

const TASK_ID = 'understand-webpage';

export const taskUnderstandWebpage = schemaTask({
  id: TASK_ID,
  schema: taskPayloadUnderstandWebpage,
  run: async ({ url, focusArea }) => {
    // subtask: parse-webpage
    // ... see the example above ...
    // results:
    const { meta, footerMeta, sections } = resultFetchWebpage.output;

    // subtask: tasks/task-describe-entities (other tab)
    const taskDescribeEntitiesInWebpageResult = await taskDescribeEntities.triggerAndWait({
      focusArea: '',
      content: convertWebpageObjectToText({ webpage: resultFetchWebpage.output }), // helper function to convert the webpage object to text
    });

    // results: we now have a list of entities
    const { entities } = taskDescribeEntitiesInWebpageResult.output;

    // Now check if we have images; if none, return early
    const hasImages = hasImagesInWebpageUnderstanding({ webpage: resultFetchWebpage.output }); // helper function to check if there are images in the webpage sections

    if (!hasImages) {
      // no images to describe, return early
      return {
        webpage: { meta, footerMeta, sections },
        entities: entities,
      };
    }

    // So we have images, we need to describe them

    // subtasks: tasks/task-describe-image (other tab)
    const batchTaskDescribeImageResult = await taskDescribeImage.batchTriggerAndWait(
      [
        {
          payload: {
            url: meta.image,
            context: [
              'This image was found in the meta tags of the webpage.',
              'The title of the article is "' + meta.title + '" and the description is "' + meta.description + '".',
            ].join('\n'),
            focusArea: 'what this is and how it relates to the context',
          },
        },
        sections
          .map((section, sI) => {
            const media = section.innerContent.filter(
              (content) => content.type === 'media' && content.mediaType === 'image',
            );

            return media.map((content, mI) => {
              return {
                payload: {
                  url: content.url,
                  context: [
                    'This image was found in an article titled "' + meta.title + '" which is about "' + meta.description + '".',
                    '',
                    'Specifically, this image was found in the section titled "' + section.heading + '" which is the ' + (sI + 1) + ordinal(sI + 1) + ' section of the article (out of ' + sections.length + ' sections).',
                  ].join('\n'),
                  focusArea: 'what this is and how it relates to the context',
                },
              };
            });
          })
          .flat(),
      ].flat(),
    );

    for (const run of batchTaskDescribeImageResult.runs) {
      if (run.ok) {
        // ...
        // merge the descriptions into the original image object
        // ...
      }
    }

    const result = {
      webpage: { meta, footerMeta, sections },
      entities: entities,
    };
    return result;
  },
});
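
The describe-image subtask referenced above lives in its own file. Here's a minimal sketch of what it might look like using the ai-sdk's multi-modal messages with GPT-4o (the schema and prompt wording are simplified guesses, not the exact code):

import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
import { schemaTask } from '@trigger.dev/sdk/v3';
import { z } from 'zod';

export const taskDescribeImage = schemaTask({
  id: 'describe-image',
  schema: z.object({
    url: z.string().url(),
    context: z.string(),
    focusArea: z.string(),
  }),
  run: async ({ url, context, focusArea }) => {
    const { text } = await generateText({
      model: openai('gpt-4o'),
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text: `Context for where this image appears:\n${context}\n\nDescribe ${focusArea}.`,
            },
            { type: 'image', image: new URL(url) },
          ],
        },
      ],
    });

    return { url, description: text };
  },
});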

Merge the data back to a single JSON object

All of this gets re-merged into a resulting understandWebpageOutput object, which becomes the shared context for both the thread and video generation agents.

Below is how I visualize this object in the Next.js app. You'll see in the task log that the AI has found and enriched two entities (OpenAI and Google) and a lot of images. Beside that, I show a render of the webpage content with image descriptions overlaid on top. This is a simple way to visualize the data and make sure the AI is doing a good job.

Webpage understanding object

We're finally in a good spot to start creating content about this page.

Right before we move to the next step, I integrate my "Brain" knowledge base to ensure my opinions are carried into any content created.
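
Brain is its own project, but the lookup is a standard RAG pattern: embed a summary of the article, pull the most similar notes from the vector store, and pass them along as extra context. Roughly, and assuming a Supabase/pgvector store with a custom match_documents function (names are placeholders for whatever your own store exposes):

import { createClient } from '@supabase/supabase-js';
import { openai } from '@ai-sdk/openai';
import { embed } from 'ai';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// Pull my own notes/opinions related to the article so the writers can use them.
export async function fetchBrainContext(articleSummary: string) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: articleSummary,
  });

  // "match_documents" is a custom pgvector similarity RPC in my database;
  // swap in whatever lookup your own knowledge base provides.
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_count: 8,
  });
  if (error) throw error;

  return (data ?? []).map((doc: { content: string }) => doc.content).join('\n\n');
}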


Step 2: Creating the Twitter Thread

Flow chart of the thread agent

In the prior step we were "using AI", but this is the first "agentic" step. We're going to be implementing an evaluator–optimizer loop. This is where two agents are pitted against each other until one of them wins.

We're going to be pitting OpenAI's best creative/writing model gpt-4.5 against OpenAI's smartest/most logical model o3-mini. GPT-4.5 will write our thread and o3-mini will evaluate it. If the thread becomes good enough, it will be posted to X. If not, the thread will be sent back to the writer for more iterations.

Because the concept of the task is simple (it's a loop), I figure the more interesting part is the prompts.

You are an expert social media assistant tasked with writing a thread for a User.
To do this you'll need to transform a structured JSON article (with metadata and sections) into a highly engaging Twitter thread. You've been on twitter since 2009 and have been connected to the zeitgeist of tech twitter ever since. You're particularly apt at creating engaging threads that are optimized for the X.com SimCluster of tech twitter. Your goal is to create a thread that maximizes engagement by combining compelling visuals, clear insights, strategic mentions, and a strong call-to-action.
You follow 2025 best practices, as you know the majority of social media "best practices" are outdated, wrong, and make you look like a boomer.

## Task Instructions

1. Read and understand the article metadata and sections.
2. Document your brainstorm, approach, and strategy for creating an engaging Twitter thread.

   - How you determined the ordering of tweets.
   - Why you chose to include specific media (images, embeds) in particular tweets.
   - How your choices (e.g., including a strong hook image in tweet 1, reserving external links for the third-to-last tweet, including shoutouts in the second-to-last tweet, and ending with a CTA) will work well for the X.com SimCluster algorithm to maximize engagement.
   - Any assumptions you made based on current best practices for Twitter threads (e.g., visual consistency, strategic mentions, clear calls-to-action, and combining text for clarity).
   - Note: this is important as your explanation will be shared with the User to help them understand your approach.

3. Generate a Twitter thread based on the article content.

   - Follow the guidelines provided below to structure your thread.
   - Ensure each tweet is engaging, informative, and optimized for the X.com SimCluster.
   - Include images or embeds where appropriate to enhance engagement.
   - Use the metadata and sections to craft a compelling narrative that will resonate with the X.com SimCluster audience.

4. Respond first with your reasoning/brainstorm/strategy/approach and then the thread. You should follow the format described below.

## Thread Guidelines

Threads must follow this structure:

**Tweet 1 (Hook):**

- Introduce the topic with a compelling hook using the meta information. Attach a strong visual (If nothing else fits use the og image from metadata).

**Intermediate Tweets:**

- Break down key insights from the article’s sections. Where possible, combine multiple "md" text blocks into a single tweet (using "\n\n" to separate paragraphs) and include supporting images when possible as the algorithm seems to like them.

**Third-to-Last Tweet:**

- Provide the article link (from metadata.url) with a brief call-to-action to click and read more.

**Second-to-Last Tweet:**

- Shoutout or mention any relevant X.com users that are directly included in the article content (ie: if the author embedded/linked to a tweet then we should shoutout those people by @mentioning them). This tweet is meant to spur engagement.

**Final Tweet:**

- End with a clear call-to-action (e.g., “Like & RT if you enjoyed this thread” or a link to your profile for a follow) that encourages engagement. Research shows that ending with a CTA can improve interaction.

**Media and Engagement:**

- For image blocks (mediaType: "image"), attach the image to the corresponding tweet.
- For embed blocks (mediaType: "embed"), use the "embedType" to decide how to reference them. For Twitter embeds, mention the account (e.g., @username) using available embed data.
- Consider engagement stats (likeCount, replyCount, viewCount, etc.) as cues for which content has already resonated with the community -- and use this to inform your strategy/language.

## Input

You will be provided with the article metadata and sections. The article sections are structured as an array of objects, each with a heading and innerContent. They should be logically grouped, but feel free to use your judgment to determine the best way to structure the thread.
All images have been described in detail so you can incorporate them into your thread.
You may be provided with additional context or instructions from the User/Author. Because you are writing it _as if you were the User_, you should incorporate this context into your thread.

- For example, if the user is not the author of the article, it should be presented differently than if they are.

## Output

Your response is expected to follow this template:
<response-template>
Here's what I'm thinking:

- I'll ... (briefly explain your strategy, approach, and reasoning)
- (Expectation is there will be around 5-10 bullets here)
  ...

And here's the Twitter thread:

<thread>
  <tweet>
    <text>Tweet 1 content</text>
    <img src="..." alt="..." /> // first tweet should have a strong visual
  </tweet>
  <tweet>
    <text>Tweet 2 content</text>
    <img src="..." alt="..." /> // optional on other tweets
  </tweet>
  ...etc
</thread>
</response-template>

Here's how it works:

The writer

The writer uses gpt-4.5-preview, which is more of a creative writing model. Most notably, given its release date, it is not a reasoning model, meaning it doesn't think before responding. So we use the now-old trick of getting it to document a chain of thought prior to responding, which gives it an ounce more intelligence.

Here's an example of the writer's first attempt.

Here's what I'm thinking:

- I'll start with a strong hook tweet highlighting GPT-4o's exciting new multi-modal capabilities, paired with the visually engaging collage from the article's OG image.
- I'll structure the thread to showcase key highlights: the impressive anime transformation, the model's deep understanding of visual storytelling, playful transformations into popular characters, and finally, the limitations discovered.
- I'll strategically use images from the article to visually support each tweet, as visuals significantly boost engagement on X.com.
- I'll reserve the external link to the original article for the third-to-last tweet, encouraging readers to dive deeper.
- I'll include shoutouts to relevant accounts (@OpenAI, @Google) in the second-to-last tweet to leverage their large followings and spur engagement.
- I'll end with a clear call-to-action, encouraging likes, retweets, and follows to maximize interaction and algorithmic reach.

And here's the Twitter thread:

<thread>
  <tweet>
    <text>GPT-4o just dropped its multi-modal capabilities, and it's a game changer! 🚀

    From anime frames to emotional storytelling, here's my deep dive into what's possible, what's impressive, and what's still tricky. 👇 #GPT4o #AI</text>
    <img
      src="https://brennanmceachran.com/blog/gpt4o-image-generation/opengraph-image/fb?8f60923a5f5749ff"
      alt="Collage showcasing GPT-4o's multi-modal capabilities"
    />
  </tweet>
  <tweet>
    <text>First up: Anime transformations 🎨

    I gave GPT-4o a simple prompt: "Make this an anime frame." The result? Stunning accuracy, detailed backgrounds, and even nailed my son's quirky toy perfectly. Truly impressive!</text>
    <img
      src="https://brennanmceachran.com/2025/03/4oimage-before-anime-frame.png"
      alt="Original photo before anime transformation"
    />
  </tweet>
  <tweet>
    <text>
      Here's the anime-style result. Notice the consistency in background details, the toy's
      accuracy, and the emotional familiarity. This isn't just tech—it's art. 🤯
    </text>
    <img
      src="https://brennanmceachran.com/2025/03/4oimage-after-anime-frame.png"
      alt="Anime-style transformation by GPT-4o"
    />
  </tweet>
  <tweet>
    <text>
      GPT-4o doesn't just generate images—it understands visual storytelling. I tested it by
      asking for future frames of a fictional story. The AI maintained emotional coherence and
      narrative consistency effortlessly.
    </text>
    <img
      src="https://brennanmceachran.com/2025/03/4oimage-anime-2.png"
      alt="Storyboarding example with GPT-4o"
    />
  </tweet>
  <tweet>
    <text>
      For fun, I transformed my kids into their favorite characters—Minecraft and LOL dolls.
      GPT-4o nailed the playful vibe perfectly. 🎮✨
    </text>
    <img
      src="https://brennanmceachran.com/2025/03/4oimage-after-minecraft-lol-doll.png"
      alt="Kids transformed into Minecraft and LOL doll characters"
    />
  </tweet>
  <tweet>
    <text>
      But it's not perfect yet. GPT-4o struggled hilariously with generating a watch showing
      6:25pm. No matter how specific I got, it just couldn't get the time right. Big Watch
      conspiracy? ⌚️🤔
    </text>
    <img
      src="https://brennanmceachran.com/2025/03/4oimage-watch-img-to-img.png"
      alt="GPT-4o failing to generate correct watch time"
    />
  </tweet>
  <tweet>
    <text>
      Want to see more examples, insights, and limitations of GPT-4o's new image generation
      capabilities? Check out my full exploration here:
      https://brennanmceachran.com/blog/gpt4o-image-generation
    </text>
  </tweet>
  <tweet>
    <text>
      Big shoutout to the teams at @OpenAI and @Google for pushing the boundaries of what's
      possible with AI image generation. Excited to see what's next! 🚀
    </text>
  </tweet>
  <tweet>
    <text>If you enjoyed this thread, please Like & RT! Follow me @i_am_brennan for more deep dives into AI, tech, and product-led growth. 🙌 #AI #TechTwitter</text>
  </tweet>
</thread>

The Evaluator

The evaluator uses o3-mini, which is a reasoning model. These models are quite good at logical reasoning, instruction adherence, and double-checking intuitions. It does a great job of holding GPT-4.5's more creative mind to the instructions. As a baseless aside, I think it's very important that the evaluator is a different model from the writer. Models tend to like their own work, so the more different the models, the better.

The evaluator always writes positive and negative feedback followed by a numeric score. If that score is high enough we continue; if not, we loop back to the writer with the feedback added to the conversation log and keep working.

Here's an example of the evaluator's output:

#### Turn 1

```json
{
  "positiveFeedback": "✔ The thread has a clear, engaging structure with a strong hook and visual support on every tweet, which makes it easy to follow the narrative. The progression from showcasing the impressive anime transformation, to deeper exploration of visual storytelling and fun transformations, and finally discussing limitations creates a compelling story arc. \n\n✔ The inclusion of images is well executed and directly linked to each point, which effectively reinforces the takeaways. The thread also culminates nicely with a call-to-action and a shoutout to key players, adding both credibility and an interactive element.\n\n✔ The variety in content (from technical insights to fun, playful examples) makes the thread appeal to both tech enthusiasts and casual readers looking for engaging visual content.",
  "negativeFeedback": "✖ The thread uses hashtags (e.g., #GPT4o, #AI, #TechTwitter) which the guidelines advise against due to their outdated and overused nature. Removing these hashtags could help maintain a more modern and genuine tone. \n\n✖ Similarly, the initial hook tweet uses multiple emojis and an overemphasis on exclamation points. Consider using a single, well-placed emoji in the hook to set the stage for the thread, in line with best practices. \n\n✖ Lastly, while the shoutouts to @OpenAI and @Google are strategically smart for engagement, the thread could benefit from a more nuanced mention or a personal commentary rather than a generic shoutout, which might feel more reciprocal and community-driven.\n\nActionable Suggestions:\n1. Remove all hashtags and rely on natural language to convey key points.\n2. Limit the use of emojis to one or two well-placed ones (like in the initial hook) to maintain a clean, professional look.\n3. Refine the shoutout section to add a bit more commentary on what you appreciate about these teams, to foster genuine engagement.\n4. Ensure that each tweet has a clear connection to the core narrative, avoiding potential marketing clichés to keep the thread authentic and insightful.",
  "finalScore": 0.2
}
```

#### Turn 2

```json
{
  "positiveFeedback": "• The revised thread has a clear and logical flow, starting with a strong hook that instantly communicates the significance of GPT-4o’s capabilities. Each tweet builds on the narrative in a natural way, progressing from showcasing the anime transformation, to demonstrating storytelling, playful character transformation, and finally addressing limitations. \n• The removal of hashtags and reduced emoji usage helps maintain an authentic, professional tone that aligns with modern Twitter standards. \n• The use of curated visuals in every tweet is spot on: each image directly supports the content, aiding clarity and engagement. \n• The refined shoutout tweet is a major improvement, with genuine commentary that adds personality and acknowledges the contributions of both the OpenAI and Google teams in a meaningful, community-focused manner.\n• The call-to-action at the end is inviting without being overbearing, encouraging interaction while reinforcing your credibility and interest in sharing tech insights.",
  "negativeFeedback": "• Although the thread is much more streamlined now, a minor point to consider is that the one emoji in the hook (👇) could be replaced with plain text if you want to go even cleaner, although it doesn’t detract much. \n• The narrative flow is well structured, but ensure that the language in each tweet remains as specific and insightful as possible without drifting into slight generalizations. Specific metrics or additional context on how the limitations affect practical use might further enhance the depth for tech enthusiasts. \n• Lastly, while the language is friendly and professional, ensure consistency in tone throughout; a final pass for any potential minor typos or phrasing tweaks could make the thread even more polished.",
  "finalScore": 0.55
}
```

#### Turn 3

```json
{
  "positiveFeedback": "✅ **Clear Narrative Flow**: The revised thread now presents a well-structured narrative from the hook to detailed tests and limitations. Each tweet builds on the previous one, creating a progressive deep dive into GPT-4o's capabilities.\n\n✅ **Enhanced Specificity and Depth**: You've included concrete examples (like the precise rendering of home details and the repeated failure with the watch prompt) that provide valuable technical insights without resorting to generalities.\n\n✅ **Visual Support**: The accompanying images are thoughtfully paired with each tweet, significantly strengthening the narrative and making the thread visually engaging.\n\n✅ **Refined Shoutout**: The shoutout tweet now offers genuine commentary that is both specific and community-focused, making it more engaging and authentic.\n\n✅ **Professional Tone**: Removing unnecessary emojis and hashtags helps maintain a professional appearance that is more appealing to a tech-savvy audience.",
  "negativeFeedback": "🔧 **Potential for Increased Technical Detail**: While the thread provides concrete examples, incorporating one or two additional metrics or brief performance notes (e.g., response times or notable improvements over previous models) could further elevate the technical depth for engaged readers.\n\n🔧 **Minor Tone Consistency**: Ensure that language remains equally tight and analytical across all tweets; for instance, the playful transformation tweet could include a quick note on how the model's consistency in style contributes to the overall capability evaluation.\n\n🔧 **Final Proofreading**: A once-over for any minor phrasing tweaks could further polish the final version (though it's already strong overall).",
  "finalScore": 0.8
}
```

You'll see how the score steadily improved from 0.2 to 0.8 over three iterations. If the score is below 0.8, I grab the JSON and add it back to the conversation history AS IF the user wrote it, urging the writer assistant to make the changes for me.

In total this took 6 minutes and 52 seconds, but I'm not fussed about the time. Once complete, the thread is posted to 𝕏 (see Step 4). But it's also handed to Step 3, so the compute isn't wasted.
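
Stripped of logging and persistence, the loop itself is only a few lines. Here's a sketch of the shape (the prompt constants and the 0.8 threshold are as described above; everything else is illustrative):

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import type { CoreMessage } from 'ai';
import { z } from 'zod';

const evaluationSchema = z.object({
  positiveFeedback: z.string(),
  negativeFeedback: z.string(),
  finalScore: z.number().min(0).max(1),
});

export async function writeThread(initialUserMessage: string, maxTurns = 5) {
  const messages: CoreMessage[] = [
    { role: 'system', content: WRITER_SYSTEM_PROMPT }, // the writer prompt shown earlier
    { role: 'user', content: initialUserMessage },
  ];
  let latestDraft = '';

  for (let turn = 0; turn < maxTurns; turn++) {
    // The writer: a creative (non-reasoning) model that "thinks out loud" before the thread.
    const { text: draft } = await generateText({ model: openai('gpt-4.5-preview'), messages });
    latestDraft = draft;
    messages.push({ role: 'assistant', content: draft });

    // The evaluator: a reasoning model scoring the draft against the guidelines.
    const { object: evaluation } = await generateObject({
      model: openai('o3-mini'),
      schema: evaluationSchema,
      system: EVALUATOR_SYSTEM_PROMPT, // held in another file
      prompt: draft,
    });

    if (evaluation.finalScore >= 0.8) break;

    // Feed the critique back into the conversation as if the user wrote it.
    messages.push({
      role: 'user',
      content: `Here's my feedback:\n\n${evaluation.positiveFeedback}\n\n${evaluation.negativeFeedback}\n\nPlease revise the thread.`,
    });
  }

  return latestDraft;
}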


Step 3: Creating the Video

Flow chart of the video agent

Now it's time to create a video. We'll use the thread as a basic outline for the video, because we need to ask the AI to do a lot in this next step. Not only do we want it to write a script, we need it to create a video composition object that we can use with Remotion to generate a video.

We're going to use another evaluator–optimizer loop. Again, the loop part is easy, so here are the system prompts and the script template that's reused between the writer and evaluator. That shared prompt contains the specific instructions we need to follow if we want the video to work once the loop is complete -- in a way, they describe the options available in the Remotion project.

## Script Best Practices
Our goal is to maximize virality and viewer retention, keep these key points in mind:

- **Start with a Strong Hook:**

  - **Capture attention within the first 2–3 seconds.**
  - Use a brief, eye-catching statement or a surprising visual.

- **Keep It Snappy:**

  - Overall video length should ideally be under 60 seconds.
  - Aim for a **fast-paced** delivery to maintain viewer interest. More short segments > Less long segments.
  - Each segment should be concise; think in terms of 2–10 seconds per segment.

- **Visual & Text Overlays:**

  - Heavy use of visuals and text that pops.
  - Use **headline** style for major announcements and **annotation** for extra details.
  - **Note:** For mobile viewing, it’s best to show one media at a time. If multiple visuals are required, they may be shown side by side only if it serves the narrative.

- **Segment Order & Flexibility:**

  - While a typical order might be: hook, intro, body, highlight, conclusion, outro, the order is flexible.
  - Optional segments (such as intro, conclusion, or outro) can be omitted or rearranged.
  - You can include multiple segments of the same type (e.g., several highlights) if it benefits viewer retention.

- **Clear Calls-to-Action:**

  - End with a strong, clear call-to-action (e.g., “Subscribe now!” or “Comment below!”) during the final 3–5 seconds.

- **Coherence & Flow:**

  - Ensure the text of the script smoothly transitions between segments.
  - Use a consistent tone and style throughout the video.
  - Remember to use word choice that is not too complex or technical, as the script is meant to be read aloud and for a general audience on TikTok.
  - **Tip:** Read the script out loud to check for natural flow.

- **Ultimate Goal:**
  - The primary objective is to maximize viewer retention. Design your script and visuals to keep the audience engaged throughout the video.

---

## Script XML Guide

```script-template.xml
<VideoScript>
  <Version>1.0</Version>
  <Metadata>
    <Title><!-- Insert a compelling title tailored for the creator --></Title>
    <Hashtags>
      <Tag><!-- e.g., #viral --></Tag>
      <!-- Add more tags if needed -->
    </Hashtags>
  </Metadata>
  <Style>
    <FontFamily>[e.g., Arial]</FontFamily>
    <PrimaryColor>[e.g., #000000]</PrimaryColor>
  </Style>
  <Segments>
    <!-- ...Add Segments... -->
    <!-- Example -->
    <Segment>
      <SegmentType>body</SegmentType> <!-- or hook, intro, highlight, conclusion, outro -->
      <EstimatedDurationSeconds>12</EstimatedDurationSeconds>
      <Script><!-- Insert the script text here --></Script>
      <Media>
        <!-- Add media if needed -->
        <MediaDetail>
          <URL><!-- Insert the image/video URL --></URL>
          <Layer>background</Layer> <!-- or foreground to cover the whole screen -->
          <TimingPercentage>
            <Start>0</Start> <!-- Start time in percentage -->
            <End>100</End> <!-- End time in percentage -->
          </TimingPercentage>
          <Focus> <!-- Optional / ONLY use if script is specifically talking about a subsection of the image. Otherwise omit. -->
            <Scale>100</Scale> <!-- options: '100', '125', '150', '175', '200' -->
            <Position>center center</Position> <!-- options: center center, center left, center right-->
          </Focus>
        </MediaDetail>
        <!-- Add more MediaDetails if needed -->
      </Media>
      <OverlayText> <!-- Add overlay text if needed -->
        <Text><!-- Insert the overlay text --></Text>
        <StyleOption>annotation</StyleOption> <!-- or headline, subtitle, call_to_action -->
        <TimingPercentage>
          <Start>10</Start>
          <End>50</End>
        </TimingPercentage>
      </OverlayText>
    </Segment>
    <!-- etc -->
  </Segments>
</VideoScript>
```

---

## Detailed Guidelines for Creating Each Segment

### Hook (First Segment)

- **Goal:**
  - Grab attention immediately within the first 2–3 seconds.
- **Tips:**
  - Ask a catchy question or state a surprising fact.
  - Use dynamic visuals or fast cuts.
  - **Leverage insights from the Creator’s Thread and Enriched Entity Data.**

### Intro

- **Goal:**
  - Introduce the topic briefly.
- **Tips:**
  - Use the creator’s voice to offer a friendly greeting.
  - Keep it short (about 3–5 seconds) to maintain momentum.
  - **Integrate the Creator Information and key points from the Creator’s Thread.**

### Body

- **Goal:**
  - Deliver the main message of the video.
- **Tips:**
  - Break complex ideas into simple, clear points.
  - Use everyday language and a friendly tone.
  - Add images and overlay text to highlight important details.
  - **Incorporate Webpage Notes and Enriched Entity Data for extra context.**
  - Aim for a total duration of around 10–20 seconds.

### Highlight

- **Goal:**
  - Emphasize the most exciting or important part.
- **Tips:**
  - Use energetic language and bold visuals.
  - Use **headline** style for major impact.
  - Keep it concise (about 3–5 seconds).
  - **You may include multiple highlight segments if needed.**

### Conclusion

- **Goal:**
  - Summarize the video and reinforce the main message.
- **Tips:**
  - Recap key points briefly.
  - Use a clear, confident tone.
  - **Restate important ideas from the Creator’s Thread in about 3–5 seconds.**

### Outro

- **Goal:**
  - Wrap up the video and tell viewers what to do next.
- **Tips:**
  - Thank viewers for watching.
  - Include a strong call-to-action (e.g., “Subscribe now!” or “Comment below!”) in **call_to_action** style.
  - Keep it short (3–5 seconds).
  - **Include any final key points to drive engagement.**

---

## Detailed Options/Enum Guidance

### 1. **SegmentTypeEnum** (for "<SegmentType />")

- **hook:**
  - _What:_ A short, attention-grabbing segment to spark curiosity.
  - _When to use:_ First—within the first 2–3 seconds.
- **intro:**
  - _What:_ A brief segment that introduces the creator and topic.
  - _When to use:_ Immediately after the hook (3–5 seconds).
- **body:**
  - _What:_ The main segment where you deliver the core message and details.
  - _When to use:_ For the bulk of the explanation (10–20 seconds total).
- **highlight:**
  - _What:_ A brief section emphasizing the most exciting or important moment.
  - _When to use:_ When you want to underscore a key moment (3–5 seconds). Multiple highlights are allowed.
- **conclusion:**
  - _What:_ Wraps up the content by summarizing the key points.
  - _When to use:_ Just before the final wrap-up (3–5 seconds).
- **outro:**
  - _What:_ The final segment with closing remarks and a call-to-action.
  - _When to use:_ At the very end to drive viewer engagement (3–5 seconds).

### 2. **StyleOptionEnum** (for "<StyleOption />" in overlay text)

- **headline:**
  - _What:_ Big, bold text for major announcements or key messages.
  - _When to use:_ For important statements that need to grab attention immediately.
- **subtitle:**
  - _What:_ Secondary text that supports the headline.
  - _When to use:_ For extra context that is less dominant.
- **annotation:**
  - _What:_ Smaller text that offers additional details or clarifications.
  - _When to use:_ When you need to add extra information without distracting from the main message.
- **call_to_action:**
  - _What:_ Text that prompts the viewer to take a specific action.
  - _When to use:_ When you want to drive an immediate response.

### 3. **MediaLayerEnum** (for "<Layer />" in MediaDetail)

- **background:**
  - _What:_ The media is placed behind the creator.
  - _When to use:_ For setting the scene or adding a subtle backdrop.
- **foreground:**
  - _What:_ The media is placed in front of the creator.
  - _When to use:_ For media that need to be the main focus.

### 4. **TimingPercentageSchema** (for "<TimingPercentage />")

- **Start & End:**
  - _What:_ They control when an element (media or text) appears and disappears (as a percentage of the segment’s duration).
  - _When to use:_ Use these numbers (0–100) to sync visuals with the spoken script.

### 5. **FocusSchema** (for "<Focus />" in media)

- **Scale:**
  - _What:_ Adjusts the size of the visual.
  - _When to use:_ Use this to highlight specific details or make an impact.
- **Position:**
  - _What:_ Determines where the visual appears on the screen.
  - _When to use:_ Use this to guide viewer attention or create visual interest.

To kick off the loop we'll send in information on the webpage understanding, the thread, and any brain data.
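
What that kickoff message looks like is mostly string assembly. A sketch (the section labels mirror the ones the prompt above refers to; the exact wording in my project differs):

// Build the first user message for the video-script loop.
function buildVideoKickoffMessage(input: {
  creatorInfo: string; // who the creator is (my about page)
  thread: string; // the approved Twitter thread from Step 2
  webpage: unknown; // the webpage understanding object from Step 1
  entities: unknown[]; // enriched people/companies
  brainContext: string; // related notes pulled from Brain
}) {
  return [
    '## Creator Information',
    input.creatorInfo,
    '',
    "## Creator's Thread",
    input.thread,
    '',
    '## Webpage Notes',
    JSON.stringify(input.webpage, null, 2),
    '',
    '## Enriched Entity Data',
    JSON.stringify(input.entities, null, 2),
    '',
    "## Creator's Related Notes (from Brain)",
    input.brainContext,
  ].join('\n');
}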

The Video Script Writer

Again, we'll use gpt-4.5 as the writer. It's a lot for this model to digest, but it's basically the only model that can do it while remaining creative.

There are definite improvements to be made here by breaking this into a bunch of smaller steps. For example, we could iterate on an outline. Then once we pass that, we could iterate on a video composition object. I'd put a lot of money on getting significantly better results if we did that. But I didn't get there, yet. Done is better than perfect, and I wanted to see the whole thing working first before any optimizing.

Instead, I just asked it to break out its chain of thought in hopes of giving it a bit more test-time compute.

Here's an example of what the script writer can do on its first try.

## Chain-of-Thought Process

### Planning the Video Flow:

I structured the video to immediately grab attention with a visually striking hook, followed by a concise intro to set context. The body segments highlight GPT-4o's impressive capabilities through engaging examples, while a dedicated highlight segment emphasizes its playful and creative potential. To maintain viewer interest, I included a humorous segment on GPT-4o's limitations, specifically the watch example, before concluding with a strong call-to-action.

### Creative Angles Considered:

- **Technical Deep Dive:** Detailed exploration of GPT-4o's technical capabilities.
- **Personal Storytelling:** Brennan's personal experiences and reactions to GPT-4o.
- **Humorous Limitations:** Highlighting amusing failures to humanize the technology.

I chose a blend of personal storytelling and humorous limitations to maximize relatability and engagement, ensuring the content appeals broadly to TikTok's audience.

### Segment Choices and Order:

- **Hook:** Immediate visual transformation (anime-style) to captivate viewers.
- **Intro:** Quick context-setting to establish credibility.
- **Body:** Showcasing GPT-4o's strengths through relatable examples (anime transformation, storytelling, playful character transformations).
- **Highlight:** Emphasizing the fun and creative potential with Minecraft and LOL doll transformations.
- **Body (Limitations):** Humorous segment on GPT-4o's inability to depict precise watch times, adding levity and authenticity.
- **Outro:** Strong call-to-action encouraging engagement and following Brennan.

### Adapting for Creator’s Voice:

Leveraged Brennan's enthusiastic, knowledgeable, and approachable tone. Included personal anecdotes and genuine reactions (\"wowed me,\" \"nailed it\") to maintain authenticity.

### Timing, Tone, and Visuals:

- Fast-paced segments (3-10 seconds each) to maintain viewer retention.
- Energetic, conversational tone throughout.
- Heavy use of visuals from Brennan's original content to illustrate points vividly.

---

## Final Script in XML

```xml
<VideoScript>
  <Version>1.0</Version>
  <Metadata>
    <Title>GPT-4o's Image Generation is Mind-Blowing 🤯</Title>
    <Hashtags>
      <Tag>#GPT4o</Tag>
      <Tag>#AIArt</Tag>
      <Tag>#TechNews</Tag>
      <Tag>#OpenAI</Tag>
    </Hashtags>
  </Metadata>
  <Style>
    <FontFamily>Arial</FontFamily>
    <PrimaryColor>#00A67E</PrimaryColor>
  </Style>
  <Segments>
    <Segment>
      <SegmentType>hook</SegmentType>
      <EstimatedDurationSeconds>3</EstimatedDurationSeconds>
      <Script>Watch GPT-4o turn this photo into anime instantly!</Script>
      <Media>
        <MediaDetail>
          <URL>https://brennanmceachran.com/2025/03/4oimage-before-anime-frame.png</URL>
          <Layer>foreground</Layer>
          <TimingPercentage><Start>0</Start><End>50</End></TimingPercentage>
        </MediaDetail>
        <MediaDetail>
          <URL>https://brennanmceachran.com/2025/03/4oimage-after-anime-frame.png</URL>
          <Layer>foreground</Layer>
          <TimingPercentage><Start>50</Start><End>100</End></TimingPercentage>
        </MediaDetail>
      </Media>
      <OverlayText>
        <Text>Instant Anime!</Text>
        <StyleOption>headline</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>

    <Segment>
      <SegmentType>intro</SegmentType>
      <EstimatedDurationSeconds>4</EstimatedDurationSeconds>
      <Script>GPT-4o just dropped native image generation—here's why it's a game changer.</Script>
      <OverlayText>
        <Text>GPT-4o is here!</Text>
        <StyleOption>subtitle</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>

    <Segment>
      <SegmentType>body</SegmentType>
      <EstimatedDurationSeconds>8</EstimatedDurationSeconds>
      <Script>Unlike previous models, GPT-4o nails the details—like my son's quirky dollar-store toy and our unique living room blinds.</Script>
      <Media>
        <MediaDetail>
          <URL>https://brennanmceachran.com/2025/03/4oimage-after-anime-frame.png</URL>
          <Layer>foreground</Layer>
          <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
        </MediaDetail>
      </Media>
      <OverlayText>
        <Text>Details Matter!</Text>
        <StyleOption>annotation</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>

    <Segment>
      <SegmentType>highlight</SegmentType>
      <EstimatedDurationSeconds>6</EstimatedDurationSeconds>
      <Script>It even transformed my kids into their favorite Minecraft and LOL doll characters!</Script>
      <Media>
        <MediaDetail>
          <URL>https://brennanmceachran.com/2025/03/4oimage-after-minecraft-lol-doll.png</URL>
          <Layer>foreground</Layer>
          <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
        </MediaDetail>
      </Media>
      <OverlayText>
        <Text>Creative Fun!</Text>
        <StyleOption>headline</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>

    <Segment>
      <SegmentType>body</SegmentType>
      <EstimatedDurationSeconds>8</EstimatedDurationSeconds>
      <Script>But GPT-4o isn't perfect—it hilariously failed to show a watch at exactly 6:25pm, no matter how hard I tried!</Script>
      <Media>
        <MediaDetail>
          <URL>https://brennanmceachran.com/2025/03/4oimage-watch-img-to-img.png</URL>
          <Layer>foreground</Layer>
          <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
        </MediaDetail>
      </Media>
      <OverlayText>
        <Text>AI Struggles 😂</Text>
        <StyleOption>annotation</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>

    <Segment>
      <SegmentType>conclusion</SegmentType>
      <EstimatedDurationSeconds>4</EstimatedDurationSeconds>
      <Script>Overall, GPT-4o is a huge leap forward in AI creativity and realism.</Script>
      <OverlayText>
        <Text>AI Creativity Unlocked!</Text>
        <StyleOption>subtitle</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>

    <Segment>
      <SegmentType>outro</SegmentType>
      <EstimatedDurationSeconds>4</EstimatedDurationSeconds>
      <Script>Follow me @i_am_brennan for more AI insights and tech updates!</Script>
      <OverlayText>
        <Text>👉 Follow for more!</Text>
        <StyleOption>call_to_action</StyleOption>
        <TimingPercentage><Start>0</Start><End>100</End></TimingPercentage>
      </OverlayText>
    </Segment>
  </Segments>
</VideoScript>
```

The Video Script Evaluator

I won't belabour the point here, as we follow the same idea as before: o3-mini acts as the evaluator, giving the models the ability to iterate toward something good, accurate, and complete.

What I'll note is that I've tried to increase the harshness of the evaluator, and further raised the threshold for the final score to 0.9, as I've noticed the evaluator gets increasingly lenient as the volume of data increases. You can see we start at 0.8.

#### Turn 1

```json
{
  "positiveFeedback": "The script nails the structure with a strong hook, engaging visuals, and clear, snappy segments. It effectively uses overlay texts and media to emphasize key points, and the tone is both informative and fun. The creator's personal touch shines through, and the call-to-action is clear.",
  "negativeFeedback": "Some transitions feel a bit abrupt and could benefit from smoother connections between segments. Additionally, while the pacing is fast, a minor tweak in timing could ensure even tighter integration of visuals with the spoken script.",
  "finalScore": 0.8
}
```

#### Turn 2

```json
{
  "positiveFeedback": "The revised script shows strong improvements with smooth transitions and a natural, authentic tone that fits Brennan's style perfectly. It utilizes visuals effectively and maintains a clear, engaging narrative throughout with well-timed overlay texts and calls-to-action.",
  "negativeFeedback": "While it's a significant improvement, some segments could be tightened further to keep the fast-paced TikTok energy. A slight trim in a few spots might help maintain an even more dynamic flow.",
  "finalScore": 0.91
}
```

In total, this took 8 minutes and 30 seconds to run. I then parse the resulting script into JSON to move on to rendering. We do this by taking that last XML message and using GPT-4o to parse it into a JSON object. The JSON schema for this exactly matches the video schema of our Remotion composition.
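
That parsing step is one more generateObject call, this time constrained by the composition's own zod schema. A sketch (the import path is illustrative, and pulling the "script" portion off compositionSchemaThreadVideo is my assumption about how the schema is shaped):

import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { compositionSchemaThreadVideo } from '@/videos/rss-feed-finds/schema'; // illustrative path

export async function parseScriptXmlToJson(scriptXml: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: compositionSchemaThreadVideo.shape.script, // the "script" portion of the comp's input props
    system: 'Convert the provided VideoScript XML into JSON matching the schema exactly. Do not invent or drop content.',
    prompt: scriptXml,
  });
  return object;
}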

Parsing to JSON and Rendering

With the final script in JSON, we can simply loop over the segments, grab the script text, and send it to Heygen to make "me" speak it. In theory we could do this in one clip, and perhaps I'd save money in API credits and get a more authentic look for the "avatar", but I think that'd complicate the process of joining the video back together.

Below you can see how I visualize the clips, composition, and the final render.

Web ui showing the steps of the video generation process

There are a few things I do in the code that might be helpful. I caption the videos as they come in, which conveniently gets me the duration of each clip. When I set up the composition in Remotion, the duration of the composition is set to the sum of all the clips.
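
Captioning is a straight Whisper call; asking for verbose_json gets you segment timestamps for captions and, handily, the clip's duration. A sketch using the OpenAI Node SDK (error handling and strict typing omitted):

import OpenAI from 'openai';
import fs from 'node:fs';

const client = new OpenAI();

export async function captionClip(localPath: string) {
  const transcription = await client.audio.transcriptions.create({
    file: fs.createReadStream(localPath),
    model: 'whisper-1',
    response_format: 'verbose_json',
  });

  return {
    duration: transcription.duration, // seconds; used to size the Remotion sequence
    segments: transcription.segments, // [{ start, end, text }, ...] used for on-screen captions
  };
}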

And once it's all combined with the code below, I render it all on AWS Lambda. This is mainly because Remotion made it easy.
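
The render call itself is a single function from @remotion/lambda. Something like this (region, function name, serve URL, and composition id are placeholders for your own Remotion Lambda setup):

import { renderMediaOnLambda } from '@remotion/lambda/client';

export async function renderFinalVideo(inputProps: Record<string, unknown>) {
  const { renderId, bucketName } = await renderMediaOnLambda({
    region: 'us-east-1',
    functionName: process.env.REMOTION_LAMBDA_FUNCTION!,
    serveUrl: process.env.REMOTION_SERVE_URL!,
    composition: 'UrlVideo', // the composition id registered in the Remotion project
    inputProps,
    codec: 'h264',
  });

  // Poll getRenderProgress(...) with these ids to find out when the mp4 lands in S3.
  return { renderId, bucketName };
}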

Below is the basic code of the composition. The code is gnarly but it's sort of a proof-of-concept that worked well enough to get me a video.

import { z } from 'zod';
import { TransitionSeries } from '@remotion/transitions';
import { VideoSegment } from '@/videos/rss-feed-finds/video-segment';
// compositionSchemaThreadVideo and VIDEO_FPS come from the project's shared video config (imports omitted for brevity)

export const UrlVideo = ({
  data: { url, script, videos },
}: {
  data: z.infer<typeof compositionSchemaThreadVideo>;
}) => {
  if (!url || !script || !videos) return null;

  let frameCounter = 0;

  const sequences = script.segments.map((segment, segmentIndex) => {
    const video = videos.find((video) => video.video_key === segmentIndex.toString());
    const captions = video?.captions;
    const durationInFrames =
      captions && typeof captions === 'object' && 'duration' in captions
        ? Number(captions.duration) * VIDEO_FPS
        : 10 * VIDEO_FPS;

    const startingFrom = frameCounter;
    frameCounter += durationInFrames;

    return { id: segmentIndex, durationInFrames, video, captions, segment, startingFrom };
  });

  return (
    <div className="absolute left-0 top-0 h-full w-full bg-black bg-contain bg-center bg-no-repeat">
      <TransitionSeries key="AI Sequence Loop" name="AI Sequence Loop">
        {sequences.map((sequence) => {
          if (!sequence.video) return null;

          return (
            <TransitionSeries.Sequence
              key={sequence.segment.segment_type}
              name={sequence.segment.segment_type}
              durationInFrames={sequence.durationInFrames}
            >
              <VideoSegment
                url={url}
                script={script}
                segment={sequence.segment}
                video={sequence.video}
                durationInFrames={sequence.durationInFrames}
                startingFrom={sequence.startingFrom}
              />
            </TransitionSeries.Sequence>
          );
        })}
      </TransitionSeries>
    </div>
  );
};

export default UrlVideo;

Step 4: Rendering & Posting the Video

Flow chart of the posting process

Finally, we have the render. It's time to post. I'm using Blotato to post to TikTok and 𝕏. It has a lot of features that I've never really checked out, but for me it was mainly a way to get API access to direct posting. It's sort of a workaround for the limitations the platforms set to stop projects like this one.


Learnings

  • Evaluator-Optimizer agents can be really effective at increasing quality of work and adherence to instructions.
  • The more I've played with these ideas, the more I've learned to just let the AI do its thing. At first I tried to overly constrain the video composition to make sure it "would work". But I found just creating a basic format that could be used and reused let the AI do what I ultimately wanted, and for less work on my end.
  • Remotion is the real hero here.
  • AI avatars and voice cloning aren't here yet. It's close, but it didn't fool my 6-year-old. ByteDance's OmniHuman-1 feels like one of the missing pieces. I'm sure it's less than 6 months before there is API access to that.
  • AI has really limited ability to understand media outside of its domain. LLMs work great for text, but crossing the barrier into video/image is tough -- they're mostly flying blind. Things are changing, but it'll feel like a long time before we get visual logic.
  • This is a really digestible way to consume content. I've found myself testing it on articles I want to read later, but I know I won't get around to. It's a much more fun way to consume content.
  • It's hard to describe but the zeitgeist of the medium a viewer is using matters just as much as the content. The models are 100% unaware of the state of today's tiktok/𝕏. They lack understanding of the memes of today, the trends that are charting, and the nuance/meta that we're all actually interested in.
  • To improve: I'd want to really burn compute on a) coming up with an angle for the video, b) generating a more robust script/composition, and c) once the videos come back, adjusting the timing of the composition to match the pace of the speaker. These are all very simple things to do.
  • It's a future of media. Probably not the future.

Results?

It works! But does it generate views? Likes? Shares? Stay tuned for part two.