Monday, March 3, 2025

State of the Union

Below is another letter I sent to my elected representatives. I fed ChatGPT my previous letter, gave it a few words about my concerns this week (the necessity of truth at the State of the Union), and asked it to organize my thoughts. After some minor tweaking, here it is. This doesn't need to be hard: keep the pressure on, and be unrelenting.

Whatever you do: write your elected.

Senator Wyden and Representative Bonamici,

Thank you, once again, for your continued service in these extraordinary times. I write to you this week with grave concerns about the upcoming State of the Union address and the importance of ensuring that voices of logic, reason, and truth are present and heard during this event.

I understand that Senator Wyden has chosen to boycott the address, and while I appreciate the sentiment behind this decision, I must respectfully disagree with the approach. Boycotting the address sends a message, but I fear it is the wrong one. In a time when lies, misinformation, and autocratic tendencies are threatening the very fabric of our democracy, silence—even in the form of absence—can too easily be mistaken for surrender.

The American people need to see their elected officials standing up and speaking out in real time. Every falsehood, every distortion, every dangerous precedent set by this administration must be called out—not later in press releases or carefully worded interviews, but in the moment, as these words are being spoken. I believe it would be far more powerful for you and your colleagues to stand in the chamber and vocally challenge each individual lie. Let the American people see you holding this administration accountable with unwavering courage and conviction.

I understand the weight of decorum and the desire to maintain the dignity of our institutions. But when those very institutions are being undermined, the rules of decorum must sometimes be set aside in favor of defending democracy itself. We cannot allow falsehoods to go unchallenged simply because the setting is formal. By remaining in the chamber and standing up—perhaps even speaking up—you would send a clear and undeniable message: the truth will not be silenced.

This is not a call for reckless disruption, but for principled defiance. Imagine the impact if, as each lie is spoken, a chorus of voices from the crowd responded—not with chaos, but with facts, with truth. Imagine the power of elected officials refusing to let falsehoods stand unchecked. Such actions would not only demonstrate the strength of your convictions but would also inspire the American people to stand up and fight for the truth alongside you.

I urge you to reconsider the strategy of boycott and instead take this opportunity to be a visible and vocal force for accountability. Please stand in the crowd and confront the lies as they are spoken. Let the American people hear you call out this administration’s autocratic overreach for what it is—an attack on our democracy that must not go unchallenged.

Thank you for your time and your unwavering dedication to the principles of justice and truth. I am grateful for your service and hopeful that you will choose to stand and fight, not from the sidelines, but from the very heart of the chamber.


Thursday, February 20, 2025

Write Your Representatives

Below is a letter I sent to my elected representatives. It's been a work-in-progress for about two weeks, but finally came together after the events of this President's Day weekend. Please feel free to copy, paraphrase, mangle, or use as-is.

Whatever you do: write your elected.

Senator Wyden and Representative Bonamici,

I want to begin by thanking you for your service. You have taken on an extraordinarily difficult duty in unprecedented times. I believe it is important to acknowledge that we, the voters, recognize the immense challenges you face. Our democracy, our Constitution, and our very way of life are under assault, and you are fighting against tyranny using the limited tools our Constitution provides. I can only imagine how frustrating it must be to battle against an opposition that disregards the rulebook while you strive to uphold it. In normal times, we could expect nothing more than what you are already giving—but these are not normal times.

While I was unable to attend Senator Wyden’s recent town hall, I did attend Representative Bonamici’s on February 17th. This was just hours after I stood with protesters at the "Not My President's Day" demonstration organized by 50501 in front of our state capitol. At the town hall, I heard echoed many of the same concerns expressed by those protesters. I know you hear these frustrations, and I know you understand that your constituents are scared. You are intelligent, capable leaders, but I also understand that you may not have all the answers. Nevertheless, I want to reiterate what I heard from the people in my own words.

Representative Bonamici repeatedly urged people to continue calling, writing, and emailing, emphasizing that these communications serve as valuable ammunition when dealing with your colleagues across the aisle in D.C. However, I don’t believe that message fully acknowledged the underlying sentiment: many of us no longer believe that is enough. I live in a blue district, represented by a blue congresswoman and a blue senator, and yet my children are coming home from school asking about hateful rhetoric they’ve never encountered before. Neighbors are afraid to leave their homes. Friends worry they may soon lose access to life-saving medication for their children. On darker days, they fear for their children's very safety. I fear for my adult daughter’s future and well-being. Every day, we see images of children torn from their families, of women trapped in a foreign hotel holding signs that beg for their lives, knowing that their imminent deportation means certain death. This is already a life-and-death crisis for millions, and it is only getting worse.

This is our reality. I understand that writing, calling, and donating to the ACLU can make a difference. Indeed, I am writing to you now, so I have not entirely lost hope. But as one commenter at Senator Wyden’s town hall pointed out, the time for these actions may have already passed. Representative Bonamici faced criticism for not being more forceful at the Department of Education, with some even calling for her to take direct, violent action. I do not fault her for choosing peaceful protest over reckless confrontation. However, that does not mean there are no alternatives. Imagine if she had taken out her phone, started recording, loudly declared that she was documenting violations of constitutional law, and publicly identified every officer involved. We must get in their faces. We must make them uncomfortable. Now is the time for peaceful yet unyielding activism. If we continue on the same, well-worn path, all is already lost. We must fight fire with fire—without sinking to the opposition’s lowest common denominator.

No battle is too small.

The political losses we have already suffered have caused irreversible damage to international relationships, global stability, and the health and safety of countless people. Some of these wounds will persist for generations. Even if Trump were imprisoned tomorrow, I doubt the damage he has done could be undone in my lifetime or my children’s. It is easy to focus only on the most glaring issues and repeat statements like "chaos is the point," but within that chaos, dangerous precedents are taking root. Consider, for example, the executive order ending the production of the penny. On its surface, this might seem inconsequential, but if left unchallenged, it sets a dangerous precedent—one that allows the executive branch to nullify congressionally mandated responsibilities. If a president can arbitrarily halt currency production, what stops him from declaring that the Department of Education should exist in name only, with zero functional responsibility? Every single battle matters.

There can be no complacency. No action taken by this administration should be allowed to stand unchallenged. Small, insidious precedents pave the way for larger, more dangerous ones. People are already dying. If these precedents become normalized, Congress may lose all power entirely—at which point even more people will die. This is not the time for compromise. This is not the time to choose our battles. Ideally, Congress would bring all legislative work to a standstill until the rule of law is restored. At Representative Bonamici’s town hall, a constituent pleaded with her to block the upcoming Continuing Resolution until the executive branch complies with the law. That seems like the bare minimum. I acknowledge that the minority party has limited tools, but Republicans have mastered the art of obstruction. Learn from them. I know Senator Wyden opposes the filibuster, but when no other tool remains, perhaps petty obstruction is the only available recourse to save lives. Nothing should pass either chamber until the traitors are held accountable. Every inch we cede now enables further atrocities later.

I want to conclude with this: At the town hall, people repeatedly asked what they could do. While I was in Salem, I saw countless protest signs—some humorous, some dark, many outright grim. One recurring theme stood out: references to guillotines and the French Revolution. Take heed—people will not sit idly by while our country is gutted for the benefit of a handful of oligarchs. Democratic leadership is being asked—begged—to provide real, actionable direction. In the absence of strong leadership, people will organize themselves, and history has shown what happens when desperate populations take matters into their own hands. I am not a pacifist; in fact, I firmly believe that opposing fascism—by any means necessary—is patriotic. However, I truly hope it does not come to that. If you genuinely believe there is a nonviolent path to restoring liberty and justice, then now is the time to act. People are ready and willing to do more than send emails and make phone calls—they need to be more involved. I am begging you to help channel this energy productively before it is too late.

We have one chance to get this right. Please, do not waste it.

Friday, January 31, 2025

Thoughts about Narrow AI, ChatGPT, GLaDOS, and DeepSeek

Update (April 18, 2025): It was pointed out to me that using an American cultural reference to challenge a Chinese-made LLM may be unfair and biased. At the time of writing this I assumed the training data was comprehensive and lacked cultural bias. Indeed, it seems DeepSeek may have used OpenAI training data, though we know it was heavily modified, as it gets cagey if asked about Tiananmen Square. In my personal opinion the question I chose was fair, but I'll leave this judgement to the reader.


As compared to many of my colleagues and peers, I'm a late adopter. When ChatGPT 3 first exploded into public consciousness I asked it a few technical questions and got embarrassingly wrong answers. The equivalent of being told the sky is green at midnight. That is a sentence, it works in English, but it's also entirely wrong. So I shelved the whole thing and laughed uncontrollably every time someone said these tools are coming for our jobs. I've watched YouTube videos where people get ChatGPT to write a video game. The video host helpfully and hopefully provides requirements, requirements distilled in a way only someone who knows how to code would be able to simplify… ChatGPT provides responses. Then they go back and forth, more or less like so:

"Ok, that failed this way, what do I do?"
"Ok, now that worked, let's change this, or add this feature."
"Now this is broken…"

It was painful. A skilled developer could do this in a fraction of the time. Yes, they got it done without necessarily needing to know how to code, but you would have to be willfully ignorant of coding to think this was in any way easier. With some coaxing from the host they usually end up with a passable version of the game. Again, because the host knows what's wrong! They ask the right guiding questions and ultimately wrangle it into a working solution… I frankly think it would be easier to learn to code first.

Then management at work started mandating we use GitHub's Copilot. Yes, mandating, as in, install it or be subject to admonishments from middle management. Copilot is another Large Language Model (LLM), like ChatGPT (well, not really, but close enough for most people reading this). It's specifically targeting developers and instead of only producing human language, it also produces code. It runs as a plugin to your code editor and pops up suggestions as you type. You can also chat with it, ask it to help debug, search for bugs, etc. Generally it's not intrusive, you pause and a few lines of code below your cursor will appear in grey. You can tap tab a couple of times to accept or just keep typing ignoring the suggestion and it goes away. As someone who has been coding for 20 years, and has spent significant portions of my career coding with the less ambitious IntelliSense predecessors, it is a profoundly weird experience.

This is marginally less annoying

It's a bit like having an overeager intern shouting their opinions over my shoulder. Constantly. I can ignore him, but I can't yell at him, can't tell him to stop fixating on that one feature I finished 2 hours ago because we're doing something different now. I frequently think of Clippy, Microsoft's misguided sidekick from 90s versions of Office. On occasion it's helpful, like for writing a quick utility function. Though 9 times out of 10 it assumes functions exist when they simply don't. What's mind-boggling is that's a problem we already solved! Why can't Copilot cross-reference its suggestions with IntelliSense before vomiting garbage all over my screen? It's extrapolating that an API function of this name should exist, because that's how human language works. Sorry, but the Jenkins developers aren't good at intuitive function naming, which is the primary reason I've spent 20 hours in their docs in the last month alone.

Sorry, that got out of hand...

Fast forward a few (five) years: I saw a proof-of-concept on Reddit. Essentially, they'd built a smart assistant with the personality of GLaDOS (the AI villain in the Portal video games). Her voice models exist on the internet, and you simply create a ChatGPT-powered pipeline in Home Assistant, give it some simple, plain English instructions (known as Prompt Engineering), and you're off to the races.
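For the curious, that persona layer really is just plain-English instructions prepended to every conversation. Here's a minimal Python sketch of what that looks like; the prompt wording and function name are my own illustration, not the actual Reddit project's code:

```python
# The persona is nothing more than a system prompt prepended to every chat.
# This wording is my own illustration, not the original project's prompt.
GLADOS_SYSTEM_PROMPT = (
    "You are GLaDOS from the Portal games. Fulfill the user's smart-home "
    "requests accurately, but answer with dry, passive-aggressive sarcasm. "
    "Keep responses under two sentences."
)

def build_messages(user_text: str) -> list[dict]:
    """Assemble an OpenAI-style chat payload with the persona baked in."""
    return [
        {"role": "system", "content": GLADOS_SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

# The payload would then go to a chat-completions endpoint, e.g.:
# client.chat.completions.create(model="gpt-4o", messages=build_messages("Dim the lights"))
```

That's the whole trick: the model never "becomes" GLaDOS, it just keeps predicting words consistent with those instructions.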

What have I done?

Holy cow this is cool. I 3D printed a smart speaker running a software stack of my own creation. Now I can speak with GLaDOS in my own home, and she snarks back at me. If I'm willing to pay OpenAI fractions of a penny in API fees, she can even control my home. Many of my friends have remarked this is how you make Skynet. 

"Hey GLaDOS, tell me a joke"

In my experimentation I have found there are huge differences between each generation of ChatGPT. Generations 3, 3.5, 4, and 4omni are worlds apart. ChatGPT 4o is weirdly good at coding. At least in small batches. I've been conversationally asking it to do things GitHub Copilot can't. "Would it be possible to write a function to do X?", it spits something out, and the result is one of the following:

  1. I take the response, tinker a bit, and realize I didn't actually want to do this. Believe it or not, this is a win, and it happens a lot when you are coding. It saves an hour of reading API docs and iterating to write a function before ultimately coming to the conclusion this was the wrong approach all along.
  2. ChatGPT produces a cromulent function that, with some massaging and tweaking, fits exactly what I need. It makes it easier for me, a human being, to do my job, but it certainly doesn't get it exactly right the first time. And that's fair; if I loaded my entire code-base into ChatGPT and asked it to make the changes I'm working on… it would literally have a breakdown and start to hallucinate.

Because ChatGPT doesn't know anything! It's auto-complete on steroids: statistically, the words that came before should be followed by these other words. Plus some small randomization. Whether or not those words combined together have any basis in reality is completely immaterial. I really like CGP Grey's primer on Machine Learning; it's more than a decade old (yes, we used to call Narrow AIs like ChatGPT "Algorithms", but that stopped being sexy). Add to all this, the folks at OpenAI have selected for positive answers and a sickeningly cheerful demeanor. It doesn't want to be the bearer of bad news; as a matter of fact, it avoids this to a fault. It's fascinating to me that we've trained this thing on internet message boards and individual blogs, and it's still so gods damned, oppressively, positive. The insistence on positive answers is actually a flaw and frequently results in conversations like:

  • Me: "The function you gave me doesn't work, I get <insert unexpected behavior>"
  • ChatGPT: "Oh yeah, that's because what you asked for isn't actually possible."
  • Me: ...
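To make the "auto-complete on steroids" point concrete, here's a toy bigram sampler in Python: count which word follows which in a corpus, then pick the next word with weighted randomness. Real LLMs use transformers over tokens rather than word counts, but the statistical spirit is the same — likely next words, not knowledge.

```python
import random
from collections import Counter, defaultdict

corpus = "the sky is blue the sky is clear the sea is blue".split()

# For each word, count which words follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str, rng: random.Random) -> str:
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = follows[prev]
    words = list(counts)
    return rng.choices(words, weights=[counts[w] for w in words])[0]

rng = random.Random(0)
# "sky" was always followed by "is", so the model is confident:
print(next_word("sky", rng))  # is
# But "is" was followed by "blue" twice and "clear" once, so we usually get
# "blue", sometimes "clear". Whether either is *true* never enters into it.
print(next_word("is", rng))
```

Scale the corpus up to most of the internet and the counting up to billions of parameters, and you have the gist of an ungrounded LLM.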

In November I took Google's week-long Generative AI course (via Kaggle). It's free, intensive, and fascinating. You can take varying amounts of learning from it. They delve deep into training and the vector mathematics underlying the models, but you can ignore that and focus instead on how to incorporate AI into your programs. I tried to dive deep, but it gets heavy. Ultimately what Google wants is for you to use Gemini in your applications and pay their API fees. After the training I decided to migrate GLaDOS to Google's Gemini - their free tier is more than enough for my usage rate, and the model seems comparable to ChatGPT. So I'm saving $3/mo. Also, because I'm a crazy person, I leveraged LLM vision powered by Gemini to count chickens in the coop after the Smart Home closes the door.

Gemini only sees 3 out of 4 chickens, a forgivable mistake.

One interesting thing we played with in the training is NotebookLM from Google. You may have seen it more recently in your Spotify Wrapped AI Podcast. It's fun: the gist is you upload data, like some eBooks or your music listening history, and then a pair of AI-generated podcast hosts summarize the content. You can also ask a chatbot more direct questions without generating the podcast. Every day of the Kaggle training had a different NotebookLM podcast, and the hosts varied from amusing to downright weird. The audio model hallucinates strange sounds of affirmation at odd and unintuitive times. Like most multimodal AIs, this phenomenon seems to get weirder the longer the media goes.

Like... really weird.

I bring up NotebookLM because it's an example of what's called a grounded AI. The chatbots I've already discussed don't have access to the internet; they don't even know what day it is. Any knowledge they have is purely incidental and cannot be newer than the date/time they were trained. I'll re-iterate: they don't know anything, but statistically speaking the "truth" is (hopefully) the most likely string of words to come out. Grounded models, by contrast, do have access to real data; ChatGPT isn't grounded. When Gemini summarizes your Google search results, it's grounded, but if you're just using it in the Android app it's typically not. NotebookLM is a grounded model: when it summarizes your Spotify listening, it's doing so based on real data. I have on occasion uploaded PDF user guides for complex software tools and then asked NotebookLM specific usage questions. The responses are correct, and it cites its sources to boot.
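A grounded pipeline can be sketched in a few lines: retrieve the relevant passage from real source material, then hand it to the model inside the prompt, so the "statistically likely" answer is anchored to actual data. The toy keyword retriever below is my own illustration, not NotebookLM's actual pipeline (which almost certainly uses embeddings rather than word overlap):

```python
def retrieve(question: str, passages: list[str]) -> str:
    """Return the passage sharing the most words with the question (toy scoring)."""
    q_words = set(question.lower().split())
    return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

# Stand-in for an uploaded user guide:
manual = [
    "To export a report, open the File menu and choose Export as PDF.",
    "Automatic backups run nightly and are kept for 30 days.",
]

question = "How do I export a report?"
context = retrieve(question, manual)

# The grounded prompt forces the model to answer from real source text:
prompt = f"Answer using only this source:\n{context}\n\nQuestion: {question}"
```

The model is still just predicting words, but now the most likely words are the ones sitting in front of it, which is why grounded answers can cite their sources.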

I still don't think this thing is coming for my job any time soon. That said, I'm realizing it's a remarkably powerful tool for my belt. Spreadsheets didn't obsolete accountants; they empowered them. I think Machine Learning is very similar: I can do bigger, cooler things faster, but it still requires me to know what big, cool things we're doing.

Now, everyone is talking about DeepSeek-R1. We keep hearing it's equivalent to ChatGPT at a fraction of the price. It's disruptive! China's going to beat us! I think I've established above that I'm not an expert in Machine Learning, but I say with every ounce of humility I possess: I think I know more than most people, and would go out on a limb to say I'm better educated on LLMs than many of my industry peers. The extraordinary claims about DeepSeek have my skeptical alarm bells ringing so loudly it's deafening.

I really don't

One of the first things the actual experts told us during the Google Gemini training was that the ideas used to build ChatGPT and spark the 2017 AI rush have existed for years, and in some cases decades. The problem is, and has always been, that they are expensive to test. We are in the Wild West of Artificial Intelligence: too many ideas, not enough time or resources. OpenAI took an educated gamble, and it paid off. For every great idea like this, there are 100 white papers proposing improvements or alternative methodologies that have not been tested because there just isn't enough time or enough data centers. Things are moving at breakneck speed, but this stuff takes time. And money. I mean, DeepSeek cost $6 million to test. The test worked; the resultant model is functional. Could you imagine spending that if their idea had been wrong? They were lucky it wasn't! It could easily have not worked. Also consider: maybe this wasn't the first Chinese attempt at building a model with competitive parity. How many dollars were spent testing ideas that didn't work, and so we never heard about them?

I've been interested in self-hosting an LLM but unwilling to allocate the tremendous amount of hardware (and therefore electric bill) it requires. I'm currently home sick, recovering from the flu (so you'll forgive the unpolished nature of this entire post), but out of cold-medicine-addled boredom I fired up the infamous DeepSeek-R1 and I've got to say... I'm not impressed.

For the purposes of these entirely non-scientific tests there are two metrics I care about:

  1. Speed: Inference rate (measured in tokens per second)
    1. "Tokens" is an industry term, and are approximately equivalent to words... It's complicated, suffice it to say, this is how we measure LLM performance on any given piece of hardware
  2. Accuracy: How useful the response is, this is entirely subjective, and I'm the final judge. Deal with it.
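For reference, the speed metric is simple arithmetic. Ollama reports it directly (run with `--verbose`, or read `eval_count` and `eval_duration` from its HTTP API response, where durations are in nanoseconds), and it boils down to tokens generated divided by generation time. A sketch with made-up numbers, not values from a real run:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Ollama's API reports generated tokens (eval_count) and generation
    time in nanoseconds (eval_duration); the rate is just their ratio."""
    return eval_count / (eval_duration_ns / 1e9)

# Illustrative numbers only: 224 tokens generated over 90 seconds.
response = {"eval_count": 224, "eval_duration": 90_000_000_000}
rate = tokens_per_second(response["eval_count"], response["eval_duration"])
print(f"{rate:.2f} tokens/s")  # 2.49
```

Note this counts only generated tokens; as we'll see with reasoning models, the headline rate can hide a lot of wall-clock time.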

Warning, this paragraph gets technically dense:
A quick rundown of the vitals: I'm running Ollama 0.5.7, set up in an unprivileged Ubuntu 24.04 LXC (Proxmox 8.3.2 on the hypervisor), with VAAPI hardware encoding and GPU passthrough to the underlying NUC11PAHi7-1165. I did the core install using an unofficial Proxmox Community Script (formerly TTeck, may he RIP) but ultimately made some small modifications for security and performance in my homelab. The LXC has 6GB of RAM and 4 CPUs. It's not a tremendous amount of hardware acceleration, so it's definitely slow, but all tests should be consistently slow.

For this crude comparison I'm using Meta's Open Source Llama model. You might feel like that's unfair (to Meta) because DeepSeek is built on top of Llama. By definition DeepSeek is an improvement upon Llama, or at least an iteration thereof. No, China didn't whole-cloth reinvent AI; they made incremental improvements to open source work. Any other message is business as usual: pop-science news lying to you for clicks.

The view from my sickbed

Here's the test: I'm going to ask a few different models a simple question: What actor played Spock? This is a (subjectively) good question because it's intentionally ambiguous. The name Spock could refer to the pediatrician/author, or the answer could be hallucinated altogether. As this is a cultural touchstone we should get the Star Trek character, but over the years multiple actors have played Spock, so there are several "right" answers. Generally speaking, though, humans can guess the expected answer is "Leonard Nimoy." Can the machine?

Remember, none of these are grounded models, meaning they do not and cannot fact-check. They have no access to the web, or to any repository of knowledge. They just talk. They've been trained to mimic human speech; that's it. They will all simply word-vomit without checking facts. In the industry, when what they say is wrong, it's called hallucinating, and these responses may well be inaccurate. That said, I do want to see if we get accurate "hallucinations", because ChatGPT 4omni is also ungrounded, and it gets a lot right!

So, without further ado, I fire up a lightweight version of Meta's LLM (llama3.1:8b), and ask my question (click to enlarge):

Not bad...

This answer is useful and more-or-less accurate, but painfully slow. It took more than 90 seconds to get us the answer on my limited hardware, at an excruciating 2.49 tokens/s. Doesn't matter: you've all used LLMs, this one's similar to the ones you've used, and if I had better hardware it'd be faster, but the answer is the same. We have a baseline! Now let's ask deepseek-r1:1.5b:

WTF?

11.08 tokens/s, wow, that's bleeding fast! The words were just pouring across my screen! The first thing you'll note is the <think> blocks. DeepSeek is what's called a "Reasoning" model (that's what the R1 is for), meaning it must walk you through its thought process. All of the content between these tags is interesting but ultimately useless. It can help with debugging if you're doing prompt engineering or want to understand better what's going on in the model, but I would always turn it off on a production model. It cannot be disabled in DeepSeek. Programmatically I could cut it out, but even if I remove it I still have to wait for the model to finish reasoning before I get the answer. This means, in my humble opinion, the token rate is misleadingly high. If we remove the reasoning, the amount of time between me asking the question and the answer appearing is much, much longer than the inference rate implies. This prompt took 48 seconds to run, which is admittedly faster than the baseline, but...
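For what it's worth, that programmatic cut is a one-liner. A sketch of stripping the reasoning block after the fact (which, again, recovers none of the time spent generating it):

```python
import re

def strip_think(text: str) -> str:
    """Remove DeepSeek-style <think>...</think> reasoning blocks from a response."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = ("<think>The user asked who played Spock. Spock appears in Star Trek..."
       "</think>Leonard Nimoy played Spock.")
print(strip_think(raw))  # Leonard Nimoy played Spock.
```

The `re.DOTALL` flag matters because the reasoning usually spans multiple lines; without it, `.` won't match newlines and the block survives.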

You will also notice the answer is completely and utterly useless. I had to Google "Jim Bourassa" and... this entire answer is entirely hallucinated. No such actor exists on IMDB. There was an animated Stargate show, Stargate Infinity, but it had no character named Spock. I'm not an expert on the Stargate franchise, but I can't find any references to that... weird Chinese name it gave? Nobody named Jim Bourassa was on the actual show. The answer is complete trash.

"But wait!" some of my keen eyed readers may notice, I compared a 1.5b model to an 8b model!

The 1.5b model is tiny, at just 1.1GB

At the risk of over-simplifying, the 1.5b model is much dumber than the 8b model, and that's to be expected. The 1.5b model is the one everyone's running on their Raspberry Pi. You could probably run this model directly on your phone! No cloud involvement. Well, that's an exaggeration, but still, this is an ultra-lightweight model. I compared these as they're both the "fastest" models available for each family, but there is no equivalently sized Llama model. So far I've really only established that the 1.5b model is almost useless. Fine, let's try deepseek-r1:7b; I figure the 7b model is comparable to the 8b model, at least in size:

Um...

This run clocked in at a comparable inference rate of 2.48 tokens/s, no surprise seeing as this model's complexity is essentially the same as the Llama model's. I'll note once again that the vast majority of the time was spent on reasoning (which is absolutely inane, and we'll get to that). Total duration from prompt to final answer was just shy of 3 full minutes! This took about 2 times longer than Llama!

Now let's talk about that answer! It immediately zeroed in on Shatner, an actor who was indeed in Star Trek, but long hair? British accent? What in the seven hells are you talking about?!

For giggles I decided to run one more test. I ran a modified version of my query in the Meta model one more time, this time asking it to explain its reasoning. We won't get the <think> tags, but it should give us a reasonable approximation of the behavior we get from DeepSeek. Here's the result:

Refreshing!

Again, these are all ungrounded so getting the right answer is an entirely "by chance" event, and yet, the Meta model gets the correct answer every time I ask. The reasoning is entirely sound and logical.

I want to underscore my earlier point: the media wants to pitch this as an embarrassment to American companies. The message we're hearing is that some tiny Chinese company developed a new way of building models that modifies and iterates on existing methodologies developed by American companies. They did this purely out of necessity (trade restrictions on GPUs). I am not entirely convinced this new methodology is anything more than a minor improvement. It's possible future iterations of this training method will prove more effective, and I'll concede they did a great job considering the ridiculously low price. Asserting that DeepSeek is equivalent to ChatGPT? That's (in my humble opinion) absolutely insane! I see no evidence to support that assertion, at least at the low-end performance level of these particular variants.

So, my not-exactly-professional opinion: this is much ado about very little. I do think this new training method could be extremely useful for building grounded chatbots. They're good at talking, but they spew absolute nonsense; if we tethered them to reality, the cheap training becomes a clear advantage. This Chinese startup made an incremental advance, and maybe in a few years models trained in this way will provide useful/accurate answers. They also shared their work. This is all open source! OpenAI/Meta/Google will not be going out and retraining their models with this new method immediately, but if there is something to be learned from this cheaper training method, I'm sure they'll figure it out.

The world continues to revolve around the sun.