updating my priors

An Open Letter to ICE Regarding My Potential Disappearance


Dear ICE official(s),

I noticed you recently detained your first Iranian foreign national. As a first-generation Iranian American, I’ve been conditioned to assume this is a testament to our great Persian culture. At least I’m sure that’s what my dad will say. Is he from Shiraz?

I just had a few questions ahead of any potential deportation and/or the disappearance of myself or my family members.

I know a lot of people might criticize your lack of a formal DEI initiative, but I want to commend the diversity of the first batch of students you’re detaining. It’s like the cast of The Sex Lives of College Girls, if you swapped the white girls for a South Korean and a Palestinian.

Speaking of college, are you deporting only Western, South, and Eastern Asians (and Muslim Africans) with impressive educations? I noticed that, thus far, most of them are pursuing their PhDs. If it helps, despite pleas from my dad, I have no interest in being an MD or PhD.

I also noticed the students are in fields like mechanical engineering or have elite credentials like being a Fulbright Scholar. Again, my dad really wanted me to pursue that path. “Just get a business degree,” he told me. “What about being a lawyer?” he asked. I think he would have settled for a minor in business. But I just got an English degree, and my GPA was not great. Does that help my case?

Also, are you deporting only these legal residents with ties to college campuses, or do you plan on expanding to other places, like cultural centers or the Halal Guys?

I guess what I’m wondering is, are you super committed to universities, and what would you consider a “tie” to a university? If I go to a college campus once a week to take my kids to piano lessons, will you abduct me there, or is this really more to instill fear in all of the “good” immigrants that come to the United States to share their talents here instead of staying at home? Like, is this just an attack on the brain drain that benefits America?

I want to emphasize again that I only have an English degree. Yes, I ended up getting a master’s, but that was in social work, so again, really nothing to see here.

Speaking of preparation, I noticed you’re starting to use plainclothes agents. Bold move. Way to instill fear. Is there any sort of uniform hoodie we should be aware of, or just generally be afraid of any hooded white guys? I actually was already afraid of them, but honestly, I usually felt okay in broad daylight in a public place. Thank you for reminding me that I was never really safe.

Hypothetically speaking, if my Iranian dad had a relationship with a blonde American from the Midwest, thereby resulting in my conception, will my 23andMe DNA results be taken into account when choosing what country to deport me to? Or do you detain/deport/disappear based on the highest percentage?

I deleted my 23andMe account out of concern for my data privacy, but on the off chance it will help me make my case, let me go ahead and disclose I’m actually only 49.7 percent Iranian. (I’m 0.5 percent Ashkenazi Jewish, so I physically can’t be antisemitic.) I’m 40.1 percent British and Irish.

If it’s possible to make a request, I wouldn’t mind Ireland. I know it’s not perfect, but they have access to abortion. Also, they speak out against genocide. Of course, I guess that’s what made me a target in the first place.

Thanks for your help,
Saba Khonsari


A Modern Approach to Hit Points and Communicating Damage


The relationship between game designers and hit points is a complex one. Whether in digital or pen-and-paper games, hit points tend to be the structural unit that defines a player character's or obstacle's resilience, and they have become how we conceptualise and communicate damage and interactive effects between designer and player. And frankly, I find that a little uninspiring, and sometimes, a little predatory.

This article describes a slightly orthogonal approach to hit points, and ends with a design challenge that attempts to remove hit points entirely.

Hit Points in Use

There are three main uses for Hit Points within a game:

  1. Increasing time-to-kill

  2. Granularity

  3. Comparative abstraction

Increasing time-to-kill is about putting a step between the player interacting with a dangerous position and ending a play session. In Call of Duty (Infinity Ward, 2003), a player steps out of cover, takes a few rounds, and returns fire. The game has effectively communicated risk and danger, and "punished" the player without ending play. This use of hit points to extend a play session was part of Dave Arneson's original design intent when drafting the first edition of Dungeons and Dragons (TSR, 1974): "a chance to live longer and do more." Hit Points provide a way for a "violent-state" world to act negatively upon a character without removing the character from play. In the same way, increased hit points let enemies stay around longer, requiring more interactions from the player to change the playstate.

Granularity is about giving a data set more "steps" to pass through in order to differentiate states. For example, a character with 100 hit points theoretically has 101 states to pass through (including 0 hit points). This gives game designers a dial to tweak: an attack that does 70 damage produces a fundamentally different play experience than an attack that does 30 damage. This is why the old design adage warns us away from 1s and 2s: they remove granularity of playstate. They also interact with the next usage:

Comparative abstraction describes the use of hit points as a way to express how things are diegetically or narratively different from each other. A paladin is "tougher" than a wizard because a paladin has 100 hit points, and a wizard has 20 hit points. A dragon is a "more powerful" baddie because the dragon does 6d6 damage to a player's hit points, while a goblin does 1d6. By having a spread of numbers, we can describe things as "calculably different" or "different in scale".

Granularity and Changing States

You may notice that only the first use in that list is a functional difference. The other two are structural differences. To show you what I mean, let's imagine a game where one character has 2 hit points, and another character has 4 hit points, twice as tough. We meet both conditions for Granularity and Comparative Abstraction: we have dials, and those dials "say something" about the diegesis. However, these structural differences don't mean anything if all enemy attacks do 100 damage. In all cases, the time-to-kill is immediate, and the characters are functionally identical. Hit points, though they may be coded into the game, are not a part of the functional player experience.

Let's extend this one step further. Consider a game with two weapons: the starting minipistol, and the upgraded hand cannon that we'll call the MEGAPISTOL! For ease, these weapons are identical except for their damage stat. If enemies have 100hp, and the minipistol does 50hp of damage, then the megapistol HAS to do 100hp of damage. This is because players don't experience enemy hp as a number of hit points, they experience it as a number of interactions required to change the game state. If the minipistol (doing 50hp of damage) and the megapistol (doing 70hp of damage) both take 2 shots to kill an enemy, then they are functionally the same to the player. I acknowledge there are other dials to turn, like number of bullets or reload time, but I'm keeping this discussion along a single axis to discuss the use of hit points.

The takeaway lesson, then, is that hit point granularity (and thus, the comparative abstraction between different weapons or enemies) doesn't mean anything on its own. Comparative abstraction doesn't come from granularity of numbers, but from granularity of game states. Players experience these game states by presence or absence, which means the addition of a measurable unit adds two game states: presence and absence. In the case of hit points, Alive or Dead. But a designer can define stages along that measurable unit to add additional states, and because of presence and absence, that leaves us with this note: for every game state defined by a measurable unit, there is also a state defined by its absence.
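To make that concrete, here is a minimal Python sketch (the names and thresholds are mine, not from any shipped game) of collapsing a measured hit point value into the states players actually experience, including the state defined purely by absence:

from enum import Enum, auto

class DamageState(Enum):
    UNHURT = auto()    # the "presence" state: full hit points
    DAMAGED = auto()   # its complement, defined purely by the absence of full health
    DEAD = auto()      # the bottom-out state

def state_from_hp(hp: int, max_hp: int) -> DamageState:
    # Collapse a granular hit point value into the states players experience.
    if hp <= 0:
        return DamageState.DEAD
    if hp == max_hp:
        return DamageState.UNHURT
    return DamageState.DAMAGED

# A 2 hp character and a 4 hp character are only functionally different
# if attacks can actually leave them in different states.
assert state_from_hp(4, 4) is DamageState.UNHURT
assert state_from_hp(1, 4) is DamageState.DAMAGED
assert state_from_hp(0, 2) is DamageState.DEAD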

The options provided by that definition of game state are a magical opportunity. I'm so frustrated by our obsession with a binary of "active vs inactive" game states. In Call of Duty I am as effective after taking one round as I am beforehand. In Dungeons and Dragons the dragon and the wizard are both dealing full damage until one side loses their final hit point. Now, for a game like Call of Duty, with a short time-to-kill (usually within a quarter of a second), this lack of granularity is perfect. Players cannot consider the change of game states within a firefight. Spray, as they say, and pray. This is not a tactical approach. However in Chess (yeah, I'm not putting a year here), the game state is usually measured with much finer comparative granularity. Material (how many pieces), position (how activated the pieces are), time (in timed games), and endgame availability are all the "hit points" of Chess. While a knight and a bishop are both valued at "3 points", any player of experience would rate one higher at different states of play. The pace of the game allows that considered approach to comparative abstractions.

So why, in tactical games that have a more considered flow, like XCOM (2013), Wildermyth (2020), and yeah, Dungeons and Dragons, Fifth Edition (2014), do we not support the player in developing other interesting game states?

Additional States Generate Additional Play Experiences

Pew, pew. I shoot a laser beam from my sword, giving me and Link a ranged attack against these more dangerous enemies. Across the series (but starting at the start) The Legend of Zelda (1986) has included a game state where an Undamaged (full hit points) Link can shoot a beam from the sword. Placing a game state at the top end of hit points rewards mastery, and gives a low-risk bonus to players that are able to get through a level without taking damage.

I love Games Done Quick and the work they put in, and there's an interesting state change in Pokemon Red (1996) that is only utilised in the speedrun. When a pokemon is on critically low health ("Red Bar"), the game prioritises the two-tone health warning music over pokemon cries and level-up jingles. This creates a "faster state" where the following have to be true:

  1. The player has to take enough damage to be put into “red bar”

  2. The player cannot take enough damage to make their pokemon faint

  3. The player must maintain this state throughout subsequent fights.

Placing the beneficial game state at the lower end of health has created a high-risk, high-reward position that players will need expertise to juggle.

Opportunities to Consider

Doom (2016) and Doom Eternal (2020) meet Pokemon's Red Bar

Doom lives and dies (pardon the pun) off a health system that drives the player forward. In Doom, players regain health by performing melee kills against weakened enemies.

AND
IT
IS
AWESOME!

I can’t speak highly enough of this Glory Kill System, but now I want to ask, what happens if we give Doom Guy a few states to pass through? Given the dynamic up-and-down bounce of health, I think it’s appropriate to utilise a single additional state, at the bottom end of the hit point pool, maybe the last 25%. When players are in this critically reduced health state, their damage is increased by 100% through a “berserk” feature.
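As a rough Python sketch of that proposal (the 25% threshold and the doubled damage come from the paragraph above; everything else is assumed, and none of this reflects id Software's actual implementation):

BERSERK_THRESHOLD = 0.25    # the bottom quarter of the hit point pool
BERSERK_MULTIPLIER = 2.0    # +100% damage while in the critical state

def outgoing_damage(base_damage: float, hp: int, max_hp: int) -> float:
    # Boost the player's damage while they are in the low-health "berserk" state.
    if 0 < hp <= max_hp * BERSERK_THRESHOLD:
        return base_damage * BERSERK_MULTIPLIER
    return base_damage

# At 20/100 hp the player hits twice as hard, rewarding risky Glory Kill play.
assert outgoing_damage(10, 20, 100) == 20
assert outgoing_damage(10, 80, 100) == 10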

The granularity dials of hit points interact with this too. Players on low difficulty, where monsters do less damage, will find it easier to remain in this "Red Bar" state and take advantage of this benefit, giving players who are "Too Young To Die" the opportunity to feel powerful and have "clutch comebacks" against dangerous monsters. It does not, however, engage with comparative abstraction, as only one character (the player character) is engaging with this mechanic.

However, because we're placing it at the back end of the hit point pool, players on harder difficulties will find it more difficult to be safely put into the state, and more difficult to maintain it without dying; but where they can maintain it, they will be able to take great advantage of the bonus it provides.

The major risk is that, by making a change to health an incentive, players may not be as willing to engage in Glory Kills and maintain the momentum the game holds so dear. Given that we don't see players preserving health by hiding away from combat while at full, I think it is an unlikely outcome, but one to look out for nonetheless.

Dungeons and regaining dynamic states of Dragon health

The fourth edition of Dungeons and Dragons (2008) had a monster state called "bloodied". All monsters entered bloodied at the same point, defined as having half of their hit points remaining. This state gave descriptive granularity to the GM (players would want a fictional bark that described the character as wounded), and comparative abstraction, as some monsters became more dangerous when bloodied and some became less dangerous.

Given the exceptionally low time-to-kill in Dungeons and Dragons, I suggest that baddies may even have more states. Rather than using Bloodied as a binary on/off switch for abilities and recharges, there is the option for a state that is "Wounded". Remember that when measuring, you can create a state by presence and absence. This could be a state where the monster is not at full hit points (i.e. where the "full health" state is absent); when the monster has taken some damage, but not enough to be considered something as critical as "Bloodied". This provides an option for turning on or off early-round threats, or showing a monster ablating under the withering attacks of the player characters. A gorgon (the magical armoured cow version) may start with an Armour Class (AC) of 20, quite high. But after that first big hit, that ablates to 16 when Scathed. The extra granularity of states gives us an early bark, and gives the gorgon an interesting early advantage to show its metallic resilience without dragging out a fight.
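A quick Python sketch of that gorgon (the AC values are the ones above; treating "Unhurt" as exactly full hit points is my own assumption about where the state boundary sits):

def gorgon_armour_class(hp: int, max_hp: int) -> int:
    # The gorgon's Armour Class ablates once it is no longer Unhurt.
    if hp == max_hp:
        return 20    # Unhurt: the metallic hide is still intact
    return 16        # Scathed: that first big hit has cracked the plating

assert gorgon_armour_class(60, 60) == 20
assert gorgon_armour_class(45, 60) == 16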

Four Damage States to Consider in Future Designs

Unhurt

A stage designed for alpha strike threats and ablative skills. Unhurt is defined as "not notably damaged"; this includes having taken no damage at all, but also having received only little scratches. A dragon with a sword through a scale may remain "Unhurt" for extra comparative abstraction ("look how tough it is!").

Unhurt is excellent for when the designer wants to provide a high-interest threat that grabs attention immediately and makes a splash, without creating an overpowering threat.

Alternatively, Unhurt provides an option for a slow, creeping threat. Something that doesn’t reveal its full hand of cards until the fight has already started.

Usage:
“The Ferret Armoured Scout Car has extra speed while Unhurt”.
“The Shapeshifter does not need to make checks to maintain its form while Unhurt”.

Bloodied/Scathed/Damaged

This is "noticeably damaged". "Shit, that hurt" but "I've had worse". It's a wound, it's pain, but it's mostly cosmetic damage. A secret agent with a round through the bicep. It's Daredevil taking Elektra's sai through his shoulder in the 2003 film and then fighting Bullseye minutes later with no noticeable consequence. I like shifting the term "Bloodied" earlier in the damage track to increase the impact of combat. Bloodied, to me, is the villain with a split lip, tasting their own blood from their finger (or dramatically snapping their fingers). They're not down, not even half; in fact, they're just getting started.

Bloodied is an ideal state for showing the “adrenaline surge” of combat. You’re not going to wait until you’ve taken real damage to get that blood pumping, are you?

Usage:
“A Bloodied Barbarian adds 1d4 damage to their attacks as their humours get up.”
“A Bloodied player's Hunger bar decreases more slowly.”

Wounded

Wounded is wrecked. Ruined. Lost limbs. For some characters this dips down on the power curve, for others it spikes up (comparative abstraction). Wounded should feel big, and depending on game tone, messy. Wounded is a great place for barks and VFX to be reactive to the situation a character has gone through: an orc ravaged by Legolas' arrows should LOOK a different Wounded to one hit by Gimli's axe.

Wounded is a state for showing things are nearing the end of their life, so don’t hold back on your design choices here. It’s a good place for sentient characters to break and run, or surrender like in Griftlands (Klei Entertainment, 2021). It’s also a great place for those big bloodlusty threats to dig in and fight with all of their remaining vigor.

Usage:
“The Predator gets -4 defence when Wounded - ‘If it bleeds, we can kill it.’”
“The ogre cannibal does double damage when Wounded.”

Dead

is dead. It’s an absence. It’s the punitive state or the win condition. It’s a bottom-out, rather than a functional state in itself.

Usage:
”You died” - Dark Souls (FromSoftware, 2011)

Conclusion

Players think in game states, and they talk to each other in game states. Let's let them play in game states. Numbers are a wonderful tool, and I will never begrudge their use in games, but let's not make them our primary communication method.

There is an argument to be made that this approach recreates HP, with my 4 states acting as a 4 hp system. I can see what that argument is saying, but it's really a result of hp having been conceptualised as the thing that changes game state for so long. We've become conditioned to seeing the little white numbers pop up above that boss and considering it progress.

What I’m suggesting is not so much a change in structural approach, but a change in how we communicate these outcomes to players. Even if numbers remain in the game as the structural building blocks that make damage and health happen, I’m asking us to conceptualise damage as a changing condition of the player’s play experience, not as a changing condition of an abacus.

Design Challenge

If we have four functional states, do we need hit points at all? Let's, as an exercise, take this to the furthest conclusion of replacing hit points with states entirely. Consider a game that uses hit points as it stands, and think about whether it could deliver the same experience with these states instead.

The answer won’t be yes for everyone, but as a design challenge, this will flex your understanding of how players functionally experience the dynamic movement of health for both goodies and baddies.

When you find a game that could do this, draft up some paper-prototype rules for how you would implement this, and review them using the first three elements we discussed:

  1. Time-to-kill - Does this dramatically change the flow of combat in this game? Does the new flow meet design intent?

  2. Granularity - Are there enough states to use as a dial to respond to player choices? Maybe there are too many and you don't need both Bloodied and Wounded? What else could you tune to make weapons feel different now that you can't just add a "+4 damage" sticker and colour it blue?

  3. Comparative Abstraction - How do the states make enemies feel different to each other? Are Bloodied enemies functionally different for players than Unhurt enemies?

Sample thoughts:

XCOM could easily use Unhurt, Bloodied, Wounded, and Dead, maybe even maintaining its Bleeding out/Dead split that is rolled when a friendly operative reaches 0 hit points.

Some draft rules would include (sketched in code after this list):
Phalanx characters (with shields) ignore changes of state from the front 180 degree arc.
A sniper rifle critical sets the target's state to Dead.
All other crits double state movement.
Light weapons (pistols) do one state of damage.
Assault weapons (rifles, SMGs) do two states of damage.
Heavy weapons do three states of damage.
A flanked enemy takes one additional state of damage.
A Faceless (big gooey tough enemy) requires two consecutive hits in a round to move from Unhurt.
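Here is a Python sketch of those rules, assuming the four states above (the Phalanx and Faceless special cases are left out to keep it short; this is a paper prototype, not a claim about how XCOM actually works):

from enum import IntEnum

class State(IntEnum):
    UNHURT = 0
    BLOODIED = 1
    WOUNDED = 2
    DEAD = 3

# States of damage per weapon class, per the draft rules above.
WEAPON_STATES = {"light": 1, "assault": 2, "heavy": 3}

def resolve_hit(current: State, weapon: str, crit: bool = False,
                flanked: bool = False, sniper_crit: bool = False) -> State:
    # Apply one hit as movement through damage states rather than hit points.
    if sniper_crit:
        return State.DEAD          # a sniper rifle critical sets the target to Dead
    steps = WEAPON_STATES[weapon]
    if crit:
        steps *= 2                 # all other crits double state movement
    if flanked:
        steps += 1                 # a flanked enemy takes one extra state of damage
    return State(min(int(State.DEAD), int(current) + steps))

# An assault rifle normally takes two hits to kill an Unhurt enemy...
assert resolve_hit(State.UNHURT, "assault") is State.WOUNDED
# ...but a flanked critical drops them in one.
assert resolve_hit(State.UNHURT, "assault", crit=True, flanked=True) is State.DEAD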

Maybe there's a fun additional mechanic called "cued shot", where operators line their shots up with each other to break through defensive enemies? We could also tie this in with XCOM 2's Bonds system that joins pairs of soldiers together as bffs.

Against our criteria:

  1. Time-to-kill remains roughly the same. An assault rifle with an appropriate tech level will move default enemies two states, which means two shots to kill, unless you crit. That’s about where it is. There’s some tweaking around an assault rifle crit doing 3 or 4 “states”, meaning one or two hits to kill, but I’d be happy to take that to playtesting.

  2. Granularity is, I think, maybe a little weak. XCOM is a GREAT test for this, because one of its most fun elements was enemy variety. However, that's also a good teaching point for this kind of health system. A 4 hit point "Thin Man" (acid-spitting guy in a suit) is a functionally different enemy to a 4 hit point "Floater" (jetpack cyborgs who can zoom behind your cover to flank). They even use the same weapon (the light plasma rifle), but given the other dials left to turn, they still feel totally different to play against. For this reason I want to keep an eye on granularity during playtesting.

  3. Comparative Abstraction is a tough one to decide on my own because it's so much about feeling. I can make guesses, but I might not be able to accurately predict a player's understanding of the abstraction at play. I think one of the hardest abstractions to communicate will be increasing damage by tech level. In base XCOM this is simple to communicate: a Floater has 4 health, a Heavy Floater has 14 health. Each still takes either one shot from a heavy weapon or two shots from light weapons (as player technology advances at the same rate as alien reveals), but the player can easily see that change just by the number of bars at the top. This would probably be won and lost in the missions between when a player upgrades their equipment and when they first encounter Heavy Floaters (or, for the unfortunate souls, vice versa). That emotional impact still lands in base XCOM, and the advantage here is that it would be communicated with barks and VFX rather than some white numbers floating above the enemy's head.


How Many People Live Paycheck to Paycheck?


Bernie Sanders is a big fan of citing troubling economic statistics. One of the figures that he features in his rotation is that 60 percent of Americans live “paycheck to paycheck.” This number consistently irritates certain wonks and so I’ve decided to do a deep dive into the controversy to see what I can make of it. In short, I’ve found that the phrase “paycheck to paycheck” is not consistently defined and that efforts to debunk the claim rely upon data that don’t convincingly do so.

Paycheck-to-Paycheck Surveys

LendingClub (60%)

The figure Bernie cites appears to come from the Paycheck-to-Paycheck report, which was a series of monthly reports put out by LendingClub between June 2021 and December 2023. The methods of this report are opaque. LendingClub claims to have surveyed around 2,500 to 3,000 consumers for each report, but the reports do not make clear whether they are simply asking people if they live paycheck to paycheck or deducing this in some way using personal financial information. Nonetheless, the LendingClub report found that 52 percent to 64 percent of consumers lived paycheck to paycheck during the months they surveyed.

LendingClub defines someone as living paycheck to paycheck if they have “no money left over after spending their earnings.” Put differently, for LendingClub, someone is living paycheck to paycheck if they currently have a low savings rate. Thus, as they explain in their June 2021 report, “one can have a good chunk of money in the bank as well as a good salary and still struggle to make ends meet.”

BankRate (34%)

Another estimate comes from a YouGov survey commissioned by BankRate. I could not find the precise question that was asked in that survey, but, unlike the LendingClub report, the BankRate write-up of the survey makes it clear that they explicitly asked people whether they were living paycheck-to-paycheck and 34 percent of workers answered that they were.

The BankRate write-up defines “paycheck to paycheck” this way:

The expression, “living paycheck to paycheck,” generally refers to having little or no money for savings left over from your paycheck after covering your regular expenses. You might be unable to pay your bills if you suddenly become unemployed or don’t receive the next paycheck.

The first sentence of this definition refers to currently having a low savings rate, but the elaboration in the second sentence refers to someone who has a low level of emergency savings to draw upon in the case of a negative income shock. These are not the same thing, though I suppose it is literally true that the former “might” lead to the latter. So, as best as I can tell, BankRate, like LendingClub, is using a low savings rate definition for the term, though it is unclear whether the survey respondents were asked the question with this definition or just asked more generally whether they live paycheck to paycheck.

Bank of America (50% or 26%)

Yet another estimate comes from Bank of America. In a survey meant to be representative of the US population, Bank of America asked how strongly people agree or disagree with the statement “I am living paycheck to paycheck.” Just under 50 percent of people answered that they strongly or somewhat agreed. In this part of the report, the authors state that the phrase “living paycheck to paycheck” can refer to “individuals or households that regularly spend nearly all of their income, leaving little to nothing left over for savings.” This is the low savings rate definition.

In this same report, Bank of America analyzed bank account data for a sample of their customers in order to determine how many of them spend 95 percent or more of their household income on necessity spending, which they define as “childcare, external credit card payments, gasoline, general retail, grocery, housing (mortgage/rent), insurance, cable TV/broadband, public transportation, tax payments, vehicle costs and payments.” Using this method, they conclude that 26 percent of people live paycheck to paycheck so defined.

Bank of America does not present the 26 percent figure as a debunking of the 50 percent figure. Instead, they point out that their lower number reflects their “focus on necessity spending,” ostensibly meaning they recognize that their particular definition of paycheck-to-paycheck living is very narrow. Another limitation of the 26 percent figure is that it comes from a sample of Bank of America customers, which are not representative of the overall population, and, even among those customers, they only have access to the income and spending information that those customers run through their Bank of America accounts. For instance, customers that are also doing necessity spending on a Citi credit card that they carry a balance for will end up miscounted.

This necessity-spending approach is seemingly intended to provide an extreme measurement aimed at establishing an absolute lower bound of possible estimates. Thus, the authors of the report conclude that, despite the extremity of their definition, which cuts the self-reported paycheck-to-paycheck figure in half, their findings “are significant and do suggest a relatively large proportion of households are living paycheck to paycheck.”

Conceptual Issues

Already in the three reports above, we see one of the problems with this discourse: the phrase “living paycheck to paycheck” is ambiguous. The Bank of America report calls this out explicitly, stating that the phrase is “somewhat nebulous and is not always clearly defined.” Across the three reports, we see at least three different ways of understanding the term:

  1. Low savings rate
  2. High necessity spending rate
  3. Low emergency savings level

These three things are related to one another, but not exactly the same.

There are also other complications with defining the phrase. For example, someone who currently lives off labor income and does not have enough money to retire could reasonably describe themselves as living "paycheck to paycheck." How else are they living? Certainly not "dividend to dividend."

Should the phrase apply to retired people who, by definition, do not receive paychecks? Do students live paycheck to paycheck? Disabled people? Stay-at-home parents? Kids? On any given month, around 50 percent of the population does not work. This includes 40 percent of adults aged 18 and above as well as 25 percent of adults between the ages of 18 and 64. Do these people inherit the paycheck-to-paycheck status of their overall household or family unit? Or should we analyze them in a more individualized way?

I point this all out only to illustrate that, when we move away from looking at how people self-describe and start looking directly at personal financial data to determine how many people live paycheck to paycheck, there are lots of possible ways to do it, and many complications to work through.

Government Data

The most prominent critic of the idea that paycheck-to-paycheck living is very common is Matt Darling. His rebuttal typically consists of two figures from the Survey of Consumer Finances (SCF) and one survey question from the Survey of Household Economics and Decisionmaking (SHED). Both are government surveys conducted or sponsored by the Federal Reserve.

Survey of Consumer Finances

The first SCF figure Darling cites is median net worth, which in 2022, stood at $192,700. The problem with this figure is that it includes non-financial assets like one’s home and car and relatively illiquid financial assets like balances in retirement and education accounts. To address these problems, Darling provides a second SCF figure that more closely matches the financial resources people tend to think about as being available to them in the case of an emergency, i.e. their liquid financial assets. In 2022, median liquid assets stood at $7,850.

What’s interesting about this $7,850 figure is that Darling (and Ben Krauss) presents it as self-evidently debunking the paycheck-to-paycheck argument. But does it? The median income in the SCF data is $70,529. This means that median liquid savings is only 40 days of median income. If you would run out of liquid savings in 40 days, do you not live paycheck to paycheck? How few days does it need to be? 30 days? 20 days?

If we drop the elderly from the sample based on the observation that they are almost all retired and therefore not receiving any paychecks, the result is that median liquid savings fall to $6,700 and median income increases to $77,825. Thus, for the non-elderly who actually receive paychecks, liquid savings would only last 31 days at the median.
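For the arithmetic, a quick check of those figures in a small Python sketch (the inputs are the SCF medians quoted above):

def days_of_income(liquid_savings: float, annual_income: float) -> float:
    # How many days of income a household's liquid savings would replace.
    return liquid_savings * 365 / annual_income

# All households: $7,850 in liquid assets against $70,529 of income.
print(days_of_income(7_850, 70_529))    # ~40.6 days
# Non-elderly households: $6,700 against $77,825.
print(days_of_income(6_700, 77_825))    # ~31.4 days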

If we walk through each household and divide their liquid savings by their income to determine how many days of income they have in liquid savings, we can see what this looks like across the entire distribution (graph truncated at 80th percentile to keep the scale manageable).

Assuming the liquid-savings-to-income ratio is the right thing to look at, this is the best graph for assessing “paycheck to paycheck” status that there is. But where exactly do you draw the line for living paycheck to paycheck? At the 60th percentile, which is the figure Bernie likes to use, households have 57 days of savings. For non-elderly households, it’s 46 days of savings. The paycheck-to-paycheck claims just don’t seem that outlandish by this measure.

And this metric is using the SCF, which is a survey of household finances, not individual finances, and which defines households using a “primary economic unit” concept that effectively excludes any adult that is not the economically dominant individual or couple in their household. So, struggling young adults who live with mom and dad are not independently surveyed and are instead lumped in with their parents. Adjusting further for that would knock a few more days off all of these figures. So would weighting the results by individuals rather than households, as the former would end up counting minor children as among the population that lives paycheck to paycheck while the latter really does not.

Survey of Household Economics and Decisionmaking

The SHED figure that Darling likes to cite asks respondents the following question:

Have you set aside emergency or rainy day funds that would cover your expenses for 3 months in case of sickness, job loss, economic downturn, or other emergencies?

Around 54 percent of respondents answer yes while the other 46 percent answer no. From this, Darling concludes that at least 54 percent of Americans are not living paycheck to paycheck. But SHED asks these people other similar questions and the answers to those questions don’t line up well with the conclusion that these 54 percent of Americans can actually handle three months of expenses.

For instance, SHED asks the following:

Based on your current financial situation, what is the largest emergency expense that you could
handle right now using only your savings?

The below graph contains the distribution of answers to this question among the 54 percent who say they have three months of emergency expenses saved up:

So, 24 percent of the people who say they have three months of emergency savings also say they cannot afford an emergency expense of $2,000 or more. If we divide $2,000 by three months, we get $667 per month. In 2023, the poverty line for a single adult was $1,215 per month and not all of these people are single adults. Is it really the case that someone can afford three months of expenses if they don’t have enough savings to cover three months worth of deep poverty living? I am skeptical to say the least.

If we define someone as living paycheck to paycheck if they either say they do not have three months of emergency savings or say they cannot afford a $2,000 emergency expense, then SHED tells us 59 percent of American adults are living paycheck to paycheck, which is of course just 1 point shy of the Bernie-favored 60 percent figure.

If we exclude the retired from this calculation, since they don’t receive paychecks, or assign the paycheck-to-paycheck status of adults to their minor children (children are not counted in SHED), the number would go even higher than that.

Of course, this is all using the low emergency savings level definition of “paycheck to paycheck” living, which seems to be Darling’s preferred approach. The SHED also asks people whether, in the last month, they spent more than, less than, or about the same amount as their income, which aligns with the low savings rate definition of “paycheck to paycheck” living. Fifty-two percent of respondents say they spent more than or the same amount as their income.

Conclusion

Given the inherent ambiguities of the phrase, I am certainly not going to try to argue that there is an obvious “real” number out there for paycheck-to-paycheck living, nor am I going to vouch for self-reported answers to that or similar questions in surveys. But at the same time, the idea that it is most definitely not sixty percent, that Bernie is being obviously crazy in saying that, seems pretty silly, especially after you probe the SCF and SHED data that is meant to debunk the claim.

As always, it’s important to also keep in mind what exactly we are implying when we talk about living paycheck to paycheck. I am all for people having personal savings, but it’s also the case that, in a well-designed economic system, big financial shocks are smoothed over, not by one’s own personal assets, but through the welfare state. Large expenditures due to health problems should be handled by public health insurance. Income declines resulting from job loss or disability should be covered by unemployment and disability benefits. Economic security should not depend on an uninterrupted flow of paychecks and good health, but it also should not depend on building up large amounts of liquid assets.


Things we learned about LLMs in 2024


A lot has happened in the world of Large Language Models over the course of 2024. Here's a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments.

This is a sequel to my review of 2023.


The GPT-4 barrier was comprehensively broken

In my December 2023 review I wrote about how We don’t yet know how to build GPT-4 - OpenAI's best model was almost a year old at that point, yet no other AI lab had produced anything better. What did OpenAI know that the rest of us didn't?

I'm relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board) - 70 models in total.

[Screenshot: a slice of the Chatbot Arena leaderboard showing 12 models ranked 52-69, including GLM-4-0520, Llama-3-70B-Instruct and Gemini-1.5-Flash-8B-Exp-0827, with Arena scores between 1186 and 1207.]

The earliest of those was Google's Gemini 1.5 Pro, released in February. In addition to producing GPT-4 level outputs, it introduced several brand new capabilities to the field - most notably its 1 million (and then later 2 million) token input context length, and the ability to input video.

I wrote about this at the time in The killer app of Gemini Pro 1.5 is video, which earned me a short appearance as a talking head in the Google I/O opening keynote in May.

Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google's Gemini series accepts up to 2 million.

Longer inputs dramatically increase the scope of problems that can be solved with an LLM: you can now throw in an entire book and ask questions about its contents, but more importantly you can feed in a lot of example code to help the model correctly solve a coding problem. LLM use-cases that involve long inputs are far more interesting to me than short prompts that rely purely on the information already baked into the model weights. Many of my tools were built using this pattern.

Getting back to models that beat GPT-4: Anthropic's Claude 3 series launched in March, and Claude 3 Opus quickly became my new favourite daily-driver. They upped the ante even more in June with the launch of Claude 3.5 Sonnet - a model that is still my favourite six months later (though it got a significant upgrade on October 22, confusingly keeping the same 3.5 version number. Anthropic fans have since taken to calling it Claude 3.6).

Then there's the rest. If you browse the Chatbot Arena leaderboard today - still the most useful single place to get a vibes-based evaluation of models - you'll see that GPT-4-0314 has fallen to around 70th place. The 18 organizations with higher scoring models are Google, OpenAI, Alibaba, Anthropic, Meta, Reka AI, 01 AI, Amazon, Cohere, DeepSeek, Nvidia, Mistral, NexusFlow, Zhipu AI, xAI, AI21 Labs, Princeton and Tencent.

Training a GPT-4 beating model was a huge deal in 2023. In 2024 it's an achievement that isn't even particularly notable, though I personally still celebrate any time a new organization joins that list.

Some of those GPT-4 models run on my laptop

My personal laptop is a 64GB M2 MacBook Pro from 2023. It's a powerful machine, but it's also nearly two years old now - and crucially it's the same laptop I've been using ever since I first ran an LLM on my computer back in March 2023 (see Large language models are having their Stable Diffusion moment).

That same laptop that could just about run a GPT-3-class model in March last year has now run multiple GPT-4 class models! Some of my notes on that:

This remains astonishing to me. I thought a model with the capabilities and output quality of GPT-4 needed a datacenter class server with one or more $40,000+ GPUs.

These models take up enough of my 64GB of RAM that I don't run them often - they don't leave much room for anything else.

The fact that they run at all is a testament to the incredible training and inference performance gains that we've figured out over the past year. It turns out there was a lot of low-hanging fruit to be harvested in terms of model efficiency. I expect there's still more to come.

Meta's Llama 3.2 models deserve a special mention. They may not be GPT-4 class, but at 1B and 3B sizes they punch massively above their weight. I run Llama 3.2 3B on my iPhone using the free MLC Chat iOS app and it's a shockingly capable model for its tiny (<2GB) size. Try firing it up and asking it for "a plot outline of a Netflix Christmas movie where a data journalist falls in love with a local ceramacist". Here's what I got, at a respectable 20 tokens per second:

[Screenshot: MLC Chat running Llama 3.2 3B on the prompt above. The response begins: "Here's a plot outline for a Netflix Christmas movie: Title: 'Love in the Clay'. We meet our protagonist, JESSICA, a data journalist who has just returned to her hometown of Willow Creek, a small, charming town nestled in the snow-covered mountains. She's back to work on a story about the town's history and the effects of gentrification on the local community."]

Here's the rest of the transcript. It's bland and generic, but my phone can pitch bland and generic Christmas movies to Netflix now!

LLM prices crashed, thanks to competition and increased efficiency

The past twelve months have seen a dramatic collapse in the cost of running a prompt through the top tier hosted LLMs.

In December 2023 (here's the Internet Archive for the OpenAI pricing page) OpenAI were charging $30/million input tokens for GPT-4, $10/mTok for the then-new GPT-4 Turbo and $1/mTok for GPT-3.5 Turbo.

Today $30/mTok gets you OpenAI's most expensive model, o1. GPT-4o is $2.50 (12x cheaper than GPT-4) and GPT-4o mini is $0.15/mTok - nearly 7x cheaper than GPT-3.5 and massively more capable.

Other model providers charge even less. Anthropic's Claude 3 Haiku (from March, but still their cheapest model) is $0.25/mTok. Google's Gemini 1.5 Flash is $0.075/mTok and their Gemini 1.5 Flash 8B is $0.0375/mTok - that's 27x cheaper than GPT-3.5 Turbo last year.

I've been tracking these pricing changes under my llm-pricing tag.

These price drops are driven by two factors: increased competition and increased efficiency. The efficiency thing is really important for everyone who is concerned about the environmental impact of LLMs. These price drops tie directly to how much energy is being used for running prompts.

There's still plenty to worry about with respect to the environmental impact of the great AI datacenter buildout, but a lot of the concerns over the energy cost of individual prompts are no longer credible.

Here's a fun napkin calculation: how much would it cost to generate short descriptions of every one of the 68,000 photos in my personal photo library using Google's Gemini 1.5 Flash 8B (released in October), their cheapest model?

Each photo would need 260 input tokens and around 100 output tokens.

260 * 68,000 = 17,680,000 input tokens
17,680,000 * $0.0375/million = $0.66
100 * 68,000 = 6,800,000 output tokens
6,800,000 * $0.15/million = $1.02

That's a total cost of $1.68 to process 68,000 images. That's so absurdly cheap I had to run the numbers three times to confirm I got it right.

How good are those descriptions? Here's what I got from this command:

llm -m gemini-1.5-flash-8b-latest describe -a IMG_1825.jpeg

Against this photo of butterflies at the California Academy of Sciences:

[Photo: two butterflies feeding at a red tray.]

A shallow dish, likely a hummingbird or butterfly feeder, is red. Pieces of orange slices of fruit are visible inside the dish.

Two butterflies are positioned in the feeder, one is a dark brown/black butterfly with white/cream-colored markings. The other is a large, brown butterfly with patterns of lighter brown, beige, and black markings, including prominent eye spots. The larger brown butterfly appears to be feeding on the fruit.

260 input tokens, 92 output tokens. Cost approximately 0.0024 cents (that's less than a 400th of a cent).

This increase in efficiency and reduction in price is my single favourite trend from 2024. I want the utility of LLMs at a fraction of the energy cost and it looks like that's what we're getting.

Multimodal vision is common, audio and video are starting to emerge

My butterfly example above illustrates another key trend from 2024: the rise of multi-modal LLMs.

A year ago the single most notable example of these was GPT-4 Vision, released at OpenAI's DevDay in November 2023. Google's multi-modal Gemini 1.0 was announced on December 7th 2023 so it also (just) makes it into the 2023 window.

In 2024, almost every significant model vendor released multi-modal models. We saw the Claude 3 series from Anthropic in March, Gemini 1.5 Pro in April (images, audio and video), then September brought Qwen2-VL and Mistral's Pixtral 12B and Meta's Llama 3.2 11B and 90B vision models. We got audio input and output from OpenAI in October, then November saw SmolVLM from Hugging Face and December saw image and video models from Amazon Nova.

In October I upgraded my LLM CLI tool to support multi-modal models via attachments. It now has plugins for a whole collection of different vision models.

I think people who complain that LLM improvement has slowed are often missing the enormous advances in these multi-modal models. Being able to run prompts against images (and audio and video) is a fascinating new way to apply these models.

Voice and live camera mode are science fiction come to life

The audio and live video modes that have started to emerge deserve a special mention.

The ability to talk to ChatGPT first arrived in September 2023, but it was mostly an illusion: OpenAI used their excellent Whisper speech-to-text model and a new text-to-speech model (creatively named tts-1) to enable conversations with the ChatGPT mobile apps, but the actual model just saw text.

The May 13th announcement of GPT-4o included a demo of a brand new voice mode, where the true multi-modal GPT-4o (the o is for "omni") model could accept audio input and output incredibly realistic sounding speech without needing separate TTS or STT models.

The demo also sounded conspicuously similar to Scarlett Johansson... and after she complained, the voice from the demo, Sky, never made it to a production product.

The delay in releasing the new voice mode after the initial demo caused quite a lot of confusion. I wrote about that in ChatGPT in “4o” mode is not running the new features yet.

When ChatGPT Advanced Voice mode finally did roll out (a slow roll from August through September) it was spectacular. I've been using it extensively on walks with my dog and it's amazing how much the improvement in intonation elevates the material. I've also had a lot of fun experimenting with the OpenAI audio APIs.

Even more fun: Advanced Voice mode can do accents! Here's what happened when I told it I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me exclusively in Spanish.

OpenAI aren't the only group with a multi-modal audio model. Google's Gemini also accepts audio input, and the Google Gemini apps can speak in a similar way to ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that's meant to roll out in Q1 of 2025.

Google's NotebookLM, released in September, took audio output to a new level by producing spookily realistic conversations between two "podcast hosts" about anything you fed into their tool. They later added custom instructions, so naturally I turned them into pelicans:

The most recent twist, again from December (December was a lot) is live video. ChatGPT voice mode now provides the option to share your camera feed with the model and talk about what you can see in real time. Google Gemini have a preview of the same feature, which they managed to ship the day before ChatGPT did.

These abilities are just a few weeks old at this point, and I don't think their impact has been fully felt yet. If you haven't tried them out yet you really should.

Both Gemini and OpenAI offer API access to these features as well. OpenAI started with a WebSocket API that was quite challenging to use, but in December they announced a new WebRTC API which is much easier to get started with. Building a web app that a user can talk to via voice is easy now!

Prompt driven app generation is a commodity already

This was possible with GPT-4 in 2023, but the value it provides became evident in 2024.

We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms) - often in a single prompt.

Anthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described halfway through their announcement of the incredible Claude 3.5 Sonnet.

With Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.

Here's my Extract URLs app, entirely generated by Claude:

[Screenshot: the Extract URLs tool, with pasted content and the list of URLs it extracted.]

I've found myself using this a lot. I noticed how much I was relying on it in October and wrote Everything I built with Claude Artifacts this week, describing 14 little tools I had put together in a seven day period.

Since then, a whole bunch of other teams have built similar systems. GitHub announced their version of this - GitHub Spark - in October. Mistral Chat added it as a feature called Canvas in November.

Steve Krouse from Val Town built a version of it against Cerebras, showcasing how a 2,000 token/second LLM can iterate on an application with changes visible in less than a second.

Then in December, the Chatbot Arena team introduced a whole new leaderboard for this feature, driven by users building the same interactive app twice with two different models and voting on the answer. Hard to come up with a more convincing argument that this feature is now a commodity that can be effectively implemented against all of the leading models.

I've been tinkering with a version of this myself for my Datasette project, with the goal of letting users use prompts to build and iterate on custom widgets and data visualizations against their own data. I also figured out a similar pattern for writing one-shot Python programs, enabled by uv.
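If you haven't seen that uv pattern, the trick is inline script metadata (PEP 723): the script declares its own dependencies at the top and uv builds a throwaway environment for it when you run it. A toy example (not one of the scripts from that post):

# /// script
# requires-python = ">=3.12"
# dependencies = ["httpx"]
# ///
# Run with: uv run fetch_title.py https://example.com
import re
import sys

import httpx

html = httpx.get(sys.argv[1], follow_redirects=True).text
match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
print(match.group(1).strip() if match else "(no title found)")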

This prompt-driven custom interface feature is so powerful and easy to build (once you've figured out the gnarly details of browser sandboxing) that I expect it to show up as a feature in a wide range of products in 2025.

Universal access to the best models lasted for just a few short months

For a few short months this year all three of the best available models - GPT-4o, Claude 3.5 Sonnet and Gemini 1.5 Pro - were freely available to most of the world.

OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely available from its launch in June. This was a momentous change, because for the previous year free users had mostly been restricted to GPT-3.5 level models, meaning new users got a very inaccurate mental model of what a capable LLM could actually do.

That era appears to have ended, likely permanently, with OpenAI's launch of ChatGPT Pro. This $200/month subscription service is the only way to access their most capable model, o1 Pro.

Since the trick behind the o1 series (and the future models it will undoubtedly inspire) is to expend more compute time to get better results, I don't think those days of free access to the best available models are likely to return.

"Agents" still haven't really happened yet

I find the term "agents" extremely frustrating. It lacks a single, clear and widely understood meaning... but the people who use the term never seem to acknowledge that.

If you tell me that you are building "agents", you've conveyed almost no information to me at all. Without reading your mind I have no way of telling which of the dozens of possible definitions you are talking about.

The two main categories I see are people who think AI agents are obviously things that go and act on your behalf - the travel agent model - and people who think in terms of LLMs that have been given access to tools which they can run in a loop as part of solving a problem. The term "autonomy" is often thrown into the mix too, again without including a clear definition.

(I also collected 211 definitions on Twitter a few months ago - here they are in Datasette Lite - and had gemini-exp-1206 attempt to summarize them.)

Whatever the term may mean, agents still have that feeling of perpetually "coming soon".

Terminology aside, I remain skeptical as to their utility based, once again, on the challenge of gullibility. LLMs believe anything you tell them. Any system that attempts to make meaningful decisions on your behalf will run into the same roadblock: how good is a travel agent, or a digital assistant, or even a research tool if it can't distinguish truth from fiction?

Just the other day Google Search was caught serving up an entirely fake description of the non-existent movie "Encanto 2". It turned out to be summarizing an imagined movie listing from a fan fiction wiki.

Prompt injection is a natural consequence of this gullibility. I've seen precious little progress on tackling that problem in 2024, and we've been talking about it since September 2022.

I'm beginning to see the most popular idea of "agents" as dependent on AGI itself. A model that's robust against gullibility is a very tall order indeed.

Evals really matter

Anthropic's Amanda Askell (responsible for much of the work behind Claude's Character):

The boring yet crucial secret behind good system prompts is test-driven development. You don't write down a system prompt and find ways to test it. You write down tests and find a system prompt that passes them.

It's become abundantly clear over the course of 2024 that writing good automated evals for LLM-powered systems is the skill that's most needed to build useful applications on top of these models. If you have a strong eval suite you can adopt new models faster, iterate better and build more reliable and useful product features than your competition.

Vercel's Malte Ubl:

When @v0 first came out we were paranoid about protecting the prompt with all kinds of pre and post processing complexity.

We completely pivoted to let it rip. A prompt without the evals, models, and especially UX is like getting a broken ASML machine without a manual

I'm still trying to figure out the best patterns for doing this for my own work. Everyone knows that evals are important, but there remains a lack of great guidance for how to best implement them - I'm tracking this under my evals tag. My SVG pelican riding a bicycle benchmark is a pale imitation of what a real eval suite should look like.
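The basic shape is small, though: write the cases down first, then iterate on the system prompt until they pass. A minimal Python sketch (run_prompt is a stand-in for whatever model call you use, and the cases are invented for illustration):

EVAL_CASES = [
    {"input": "Refund policy for digital goods?", "must_contain": "30 days"},
    {"input": "Do you ship to Canada?", "must_contain": "yes"},
]

def run_prompt(system_prompt: str, user_input: str) -> str:
    # Stand-in for a real model call (an API client, or shelling out to the llm CLI).
    raise NotImplementedError

def score(system_prompt: str) -> float:
    # Fraction of eval cases whose required string appears in the output.
    passed = sum(
        case["must_contain"].lower() in run_prompt(system_prompt, case["input"]).lower()
        for case in EVAL_CASES
    )
    return passed / len(EVAL_CASES)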

Apple Intelligence is bad, Apple's MLX library is excellent

As a Mac user I've been feeling a lot better about my choice of platform this year.

Last year it felt like my lack of a Linux/Windows machine with an NVIDIA GPU was a huge disadvantage in terms of trying out new models.

On paper, a 64GB Mac should be a great machine for running models due to the way the CPU and GPU can share the same memory. In practice, many models are released as model weights and libraries that reward NVIDIA's CUDA over other platforms.

The llama.cpp ecosystem helped a lot here, but the real breakthrough has been Apple's MLX library, "an array framework for Apple Silicon". It's fantastic.

Apple's mlx-lm Python library supports running a wide range of MLX-compatible models on my Mac, with excellent performance. mlx-community on Hugging Face offers more than 1,000 models that have been converted to the necessary format.
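To give a sense of how low the friction is now, this is roughly what running one of those converted models looks like with mlx-lm - a sketch only, since the model name below is just one example from the mlx-community org and the API details may shift between releases.

```python
# Sketch of running an MLX-converted model locally with Apple's mlx-lm.
# pip install mlx-lm   (Apple Silicon only)
# The model name is one example from the mlx-community org on Hugging Face.
from mlx_lm import load, generate

# Downloads and caches the converted weights on first run.
model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")

print(generate(
    model,
    tokenizer,
    prompt="Write a haiku about a pelican riding a bicycle.",
    max_tokens=100,
))
```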

Prince Canuma's excellent, fast moving mlx-vlm project brings vision LLMs to Apple Silicon as well. I used that recently to run Qwen's QvQ.

While MLX is a game changer, Apple's own "Apple Intelligence" features have mostly been a disappointment. I wrote about their initial announcement in June, and I was optimistic that Apple had focused hard on the subset of LLM applications that preserve user privacy and minimize the chance of users getting misled by confusing features.

Now that those features are rolling out they're pretty weak. As an LLM power-user I know what these models are capable of, and Apple's LLM features offer a pale imitation of what a frontier LLM can do. Instead we're getting notification summaries that misrepresent news headlines and writing assistant tools that I've not found useful at all. Genmoji are kind of fun though.

The rise of inference-scaling "reasoning" models

The most interesting development in the final quarter of 2024 was the introduction of a new shape of LLM, exemplified by OpenAI's o1 models - initially released as o1-preview and o1-mini on September 12th.

One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.

This is that trick where, if you get a model to talk out loud about a problem it's solving, you often get a result which the model would not have achieved otherwise.
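In its simplest form the trick is nothing more than a change to the prompt. Here's a sketch (again using the llm library, with an illustrative model name) where the second prompt invites the model to show its working before committing to an answer:

```python
# Chain-of-thought prompting in its simplest form: ask the model to reason
# out loud before answering. Sketch only; the model name is illustrative.
import llm

model = llm.get_model("gpt-4o-mini")
question = "A train leaves at 3:40pm and the journey takes 2 hours 35 minutes. When does it arrive?"

# Direct answer, no reasoning requested.
direct = model.prompt(question + " Reply with only the arrival time.")

# Chain-of-thought: explicitly ask for step-by-step reasoning first.
cot = model.prompt(question + " Think through the calculation step by step, then give the arrival time.")

print("Direct:", direct.text())
print("Chain of thought:", cot.text())
```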

o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend "reasoning tokens" thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then output a final result.

The biggest innovation here is that it opens up a new way to scale a model: instead of improving model performance purely through additional compute at training time, models can now take on harder problems by spending more compute on inference.

The sequel to o1, o3 (they skipped "o2" for European trademark reasons) was announced on 20th December with an impressive result against the ARC-AGI benchmark, albeit one that likely involved more than $1,000,000 of compute time expense!

o3 is expected to ship in January. I doubt many people have real-world problems that would benefit from that level of compute expenditure - I certainly don't! - but it appears to be a genuine next step in LLM architecture for taking on much harder problems.

OpenAI are not the only game in town here. Google released their first entrant in the category, gemini-2.0-flash-thinking-exp, on December 19th.

Alibaba's Qwen team released their QwQ model on November 28th - under an Apache 2.0 license, and that one I could run on my own machine. They followed that up with a vision reasoning model called QvQ on December 24th, which I also ran locally.

DeepSeek made their DeepSeek-R1-Lite-Preview model available to try out through their chat interface on November 20th.

To understand more about inference scaling I recommend Is AI progress slowing down? by Arvind Narayanan and Sayash Kapoor.

Nothing yet from Anthropic or Meta but I would be very surprised if they don't have their own inference-scaling models in the works. Meta published a relevant paper Training Large Language Models to Reason in a Continuous Latent Space in December.

Was the best currently available LLM trained in China for less than $6m?

Not quite, but almost! It does make for a great attention-grabbing headline.

The big news to end the year was the release of DeepSeek v3 - dropped on Hugging Face on Christmas Day without so much as a README file, then followed by documentation and a paper the day after that.

DeepSeek v3 is a huge 685B parameter model - one of the largest openly licensed models currently available, significantly bigger than the largest of Meta's Llama series, Llama 3.1 405B.

Benchmarks put it up there with Claude 3.5 Sonnet. Vibe benchmarks (aka the Chatbot Arena) currently rank it 7th, just behind the Gemini 2.0 and OpenAI 4o/o1 models. This is by far the highest ranking openly licensed model.

The really impressive thing about DeepSeek v3 is the training cost. The model was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama 3.1 405B was trained for 30,840,000 GPU hours - 11x that used by DeepSeek v3 - for a model that benchmarks slightly worse.
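Those numbers are easy to sanity-check. Assuming the roughly $2/hour H800 rental rate implied by DeepSeek's own cost estimate:

```python
# Back-of-the-envelope check of the DeepSeek v3 vs Llama 3.1 405B figures.
deepseek_hours = 2_788_000    # reported H800 GPU hours for DeepSeek v3
cost_per_hour = 2.0           # USD/hour implied by the $5,576,000 estimate
llama_hours = 30_840_000      # reported GPU hours for Llama 3.1 405B

print(f"DeepSeek v3: ${deepseek_hours * cost_per_hour:,.0f}")        # $5,576,000
print(f"Ratio: {llama_hours / deepseek_hours:.1f}x the GPU hours")   # ~11.1x
```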

Those US export regulations on GPUs to China seem to have inspired some very effective training optimizations!

The environmental impact got better

A welcome result of the increased efficiency of the models - both the hosted ones and the ones I can run locally - is that the energy usage and environmental impact of running a prompt has dropped enormously over the past couple of years.

OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days. I have it on good authority that neither Google Gemini nor Amazon Nova (two of the least expensive model providers) is running prompts at a loss.

I think this means that, as individual users, we don't need to feel any guilt at all for the energy consumed by the vast majority of our prompts. The impact is likely negligible compared to driving a car down the street or maybe even watching a video on YouTube.

Likewise, training. DeepSeek v3 training for less than $6m is a fantastic sign that training costs can and should continue to drop.

For less efficient models I find it useful to compare their energy usage to commercial flights. The largest Llama 3 model cost about the same as a single digit number of fully loaded passenger flights from New York to London. That's certainly not nothing, but once trained that model can be used by millions of people at no extra training cost.

The environmental impact got much, much worse

The much bigger problem here is the enormous competitive buildout of the infrastructure that is imagined to be necessary for these models in the future.

Companies like Google, Meta, Microsoft and Amazon are all spending billions of dollars rolling out new datacenters, with a very material impact on the electricity grid and the environment. There's even talk of spinning up new nuclear power stations, but those can take decades.

Is this infrastructure necessary? DeepSeek v3's $6m training cost and the continued crash in LLM prices might hint that it's not. But would you want to be the big tech executive that argued NOT to build out this infrastructure only to be proven wrong in a few years' time?

An interesting point of comparison here could be the way railways rolled out around the world in the 1800s. Constructing these required enormous investments and had a massive environmental impact, and many of the lines that were built turned out to be unnecessary - sometimes multiple lines from different companies serving the exact same routes!

The resulting bubbles contributed to several financial crashes, see Wikipedia for Panic of 1873, Panic of 1893, Panic of 1901 and the UK's Railway Mania. They left us with a lot of useful infrastructure and a great deal of bankruptcies and environmental damage.

The year of slop

2024 was the year that the word "slop" became a term of art. I wrote about this in May, expanding on this tweet by @deepfates:

Watching in real time as “slop” becomes a term of art. the way that “spam” became the term for unwanted emails, “slop” is going in the dictionary as the term for unwanted AI generated content

I expanded that definition a tiny bit to this:

Slop describes AI-generated content that is both unrequested and unreviewed.

I ended up getting quoted talking about slop in both the Guardian and the NY Times. Here's what I said in the NY Times:

Society needs concise ways to talk about modern A.I. — both the positives and the negatives. ‘Ignore that email, it’s spam,’ and ‘Ignore that article, it’s slop,’ are both useful lessons.

I love the term "slop" because it so succinctly captures one of the ways we should not be using generative AI!

Slop was even in the running for Oxford Word of the Year 2024, but it lost to brain rot.

Synthetic training data works great

An idea that surprisingly seems to have stuck in the public consciousness is that of "model collapse". This was first described in the paper The Curse of Recursion: Training on Generated Data Makes Models Forget in May 2023, and repeated in Nature in July 2024 with the more eye-catching headline AI models collapse when trained on recursively generated data.

The idea is seductive: as the internet floods with AI-generated slop the models themselves will degenerate, feeding on their own output in a way that leads to their inevitable demise!

That's clearly not happening. Instead, we are seeing AI labs increasingly train on synthetic content - deliberately creating artificial data to help steer their models in the right way.

One of the best descriptions I've seen of this comes from the Phi-4 technical report, which included this:

Synthetic data as a substantial component of pretraining is becoming increasingly common, and the Phi series of models has consistently emphasized the importance of synthetic data. Rather than serving as a cheap substitute for organic data, synthetic data has several direct advantages over organic data.

Structured and Gradual Learning. In organic datasets, the relationship between tokens is often complex and indirect. Many reasoning steps may be required to connect the current token to the next, making it challenging for the model to learn effectively from next-token prediction. By contrast, each token generated by a language model is by definition predicted by the preceding tokens, making it easier for a model to follow the resulting reasoning patterns.

Another common technique is to use larger models to help create training data for their smaller, cheaper alternatives - a trick used by an increasing number of labs. DeepSeek v3 used "reasoning" data created by DeepSeek-R1. Meta's Llama 3.3 70B fine-tuning used over 25M synthetically generated examples.
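Here's a hedged sketch of that larger-model-as-teacher pattern. The model name, topics and JSONL record shape are all assumptions for the example, not a description of how DeepSeek or Meta actually did it:

```python
# Illustrative sketch: use a stronger "teacher" model to generate synthetic
# instruction/response pairs, saved as JSONL for fine-tuning a smaller model.
# The model name, topics and record format are assumptions for this example.
import json
import llm

teacher = llm.get_model("gpt-4o")  # stand-in for whatever large model you use
topics = ["unit conversion", "regular expressions", "SQL joins"]

with open("synthetic_train.jsonl", "w") as f:
    for topic in topics:
        question = teacher.prompt(
            f"Write one realistic user question about {topic}."
        ).text().strip()
        answer = teacher.prompt(
            question, system="Answer concisely and correctly."
        ).text().strip()
        f.write(json.dumps({"prompt": question, "completion": answer}) + "\n")
```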

Careful design of the training data that goes into an LLM appears to be the entire game for creating these models. The days of just grabbing a full scrape of the web and indiscriminately dumping it into a training run are long gone.

LLMs somehow got even harder to use

A drum I've been banging for a while is that LLMs are power-user tools - they're chainsaws disguised as kitchen knives. They look deceptively simple to use - how hard can it be to type messages to a chatbot? - but in reality you need a huge depth of both understanding and experience to make the most of them and avoid their many pitfalls.

If anything, this problem got worse in 2024.

We've built computer systems you can talk to in human language, that will answer your questions and usually get them right! ... depending on the question, and how you ask it, and whether it's accurately reflected in the undocumented and secret training set.

The number of available systems has exploded. Different systems have different tools they can apply to your problems - like Python and JavaScript and web search and image generation and maybe even database lookups... so you'd better understand what those tools are, what they can do and how to tell if the LLM used them or not.

Did you know ChatGPT has two entirely different ways of running Python now?

Want to build a Claude Artifact that talks to an external API? You'd better understand CSP and CORS HTTP headers first.
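For anyone who hasn't run into this before: on the CORS side, the external API has to opt in by returning the right headers, or the browser hosting your artifact will refuse to make the call. A minimal standard-library illustration, allowing any origin purely for demonstration:

```python
# Minimal illustration of the CORS headers an API must return before
# browser-hosted code (like a Claude Artifact) can call it cross-origin.
# Allowing "*" is for demonstration only; real APIs should be stricter.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "GET, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):  # the browser's CORS preflight request
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_GET(self):
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"ok": true}')

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```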

The models may have got more capable, but most of the limitations remained the same. OpenAI's o1 may finally be able to (mostly) count the Rs in strawberry, but its abilities are still limited by its nature as an LLM and the constraints placed on it by the harness it's running in. o1 can't run web searches or use Code Interpreter, but GPT-4o can - both in that same ChatGPT UI. (o1 will pretend to do those things if you ask it to, a regression to the URL hallucinations bug from early 2023).

What are we doing about this? Not much. Most users are thrown in at the deep end. The default LLM chat UI is like taking brand new computer users, dropping them into a Linux terminal and expecting them to figure it all out.

Meanwhile, it's increasingly common for end users to develop wildly inaccurate mental models of how these things work and what they are capable of. I've seen so many examples of people trying to win an argument with a screenshot from ChatGPT - an inherently ludicrous proposition, given the inherent unreliability of these models crossed with the fact that you can get them to say anything if you prompt them right.

There's a flipside to this too: a lot of better informed people have sworn off LLMs entirely because they can't see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

There is so much space for helpful education content here, but we need to do a lot better than outsourcing it all to AI grifters with bombastic Twitter threads.

Knowledge is incredibly unevenly distributed

Most people have heard of ChatGPT by now. How many have heard of Claude?

The knowledge gap between the people who actively follow this stuff and the 99% of the population who do not is vast.

The pace of change doesn't help either. In just the past month we've seen general availability of live interfaces where you can point your phone's camera at something and talk about it with your voice... and optionally have it pretend to be Santa. Most self-certified nerds haven't even tried that yet.

Given the ongoing (and potential) impact on society that this technology has, I don't think the size of this gap is healthy. I'd like to see a lot more effort put into improving this.

LLMs need better criticism

A lot of people absolutely hate this stuff. In some of the spaces I hang out (Mastodon, Bluesky, Lobste.rs, even Hacker News on occasion) even suggesting that "LLMs are useful" can be enough to kick off a huge fight.

I get it. There are plenty of reasons to dislike this technology - the environmental impact, the (lack of) ethics of the training data, the lack of reliability, the negative applications, the potential impact on people's jobs.

LLMs absolutely warrant criticism. We need to be talking through these problems, finding ways to mitigate them and helping people learn how to use these tools responsibly in ways where the positive applications outweigh the negative.

I like people who are skeptical of this stuff. The hype has been deafening for more than two years now, and there are enormous quantities of snake oil and misinformation out there. A lot of very bad decisions are being made based on that hype. Being critical is a virtue.

If we want people with decision-making authority to make good decisions about how to apply these tools we first need to acknowledge that there ARE good applications, and then help explain how to put those into practice while avoiding the many unintuitive traps.

(If you still don't think there are any good applications at all I'm not sure why you made it to this point in the article!)

I think telling people that this whole field is environmentally catastrophic plagiarism machines that constantly make things up is doing those people a disservice, no matter how much truth that represents. There is genuine value to be had here, but getting to that value is unintuitive and needs guidance.

Those of us who understand this stuff have a duty to help everyone else figure it out.

Everything tagged "llms" on my blog in 2024

Because I undoubtedly missed a whole bunch of things, here's every long-form post I wrote in 2024 that I tagged with llms:

(This list generated using Django SQL Dashboard with a SQL query written for me by Claude.)

Tags: google, ai, openai, generative-ai, llms, anthropic, gemini, meta, inference-scaling

Read the whole story
jsled
9 days ago

It’s A Start: Musk And DOGE Take Their First Real Hit

1 Share

It is nearly impossible to keep up with all the litigation challenging all the terrible, if not also lawless and unconstitutional, things the Trump Administration is doing. But one group of cases is particularly interesting: the cases involving Musk and DOGE. In part because they are the lawless mercenaries Trump has been sending to do much of his unconstitutional bidding, but also because, by being lawless, they tempt their own personal liability that they may hopefully someday have to bear. Having their lawlessness established by the courts is also an important first step to stemming the tide, because even if it doesn’t lead to an injunction officially ordering it to stop, it can hopefully scare off anyone enabling it if they are forced to contemplate how much liability they may also have to face once Musk and DOGE’s illusory authority finishes falling away.

The earliest litigation challenging effects of their takeovers of agencies didn’t tend to directly name them as defendants (see for example American Foreign Service Association v. Trump). Those lawsuits tended to challenge what Musk and DOGE caused the agencies to do, generally in the form of complaining that “the agency leaders had no business taking these actions.” Then we started to see some litigation pursue more of a hybrid model where Musk and DOGE were also named, because their lawless interference was a factor in the illegal actions taken by the agency (see for example American Federation of Government Employees v. Office of Personnel Management). But a few lawsuits took a swing at Musk and DOGE’s lawlessness directly and how everything they were doing everywhere, or at least nearly everywhere, was unlawful. One of the earliest such cases pioneering this form of litigation was New Mexico v. Musk, which is now in the middle of some expedited discovery. Although a TRO was denied in that case, in dicta the judge there said it looked like Musk and DOGE had been acting lawlessly across government, interfering with all sorts of agencies and their actions.

But this week came the first official finding that Musk and DOGE appeared to be unlawfully messing with agencies. It came in one of the other earlier cases, Does 1-26 v. Musk and relates solely to their actions taken in furtherance of unlawfully shutting down USAID.

What follows is a scenic tour of this important decision, with some comments afterwards about its potential implications. Although it is a preliminary finding rather than a final determination, and specific to an injunction governing its actions only with respect to USAID, its reasoning is sure to reverberate, not just in this case but across all cases involving their behavior. It also walks through some important concepts that will be relevant in all cases, especially ones trying to seek some sort of injunctive relief to get what is going on to stop, so it is worth seeing how they were handled here. For the sake of readability, I’ll use something called “pin cites” to indicate where in the 68(!) page decision the relevant language can be found and save block quotes for particularly salient language, lest the post get too long and confusing. But it’s all important, and surely not the last word on any of this.

Factual background

The decision opens by setting out some factual background. It begins by describing the parties, first DOGE, [p.2-3], and then Elon Musk, [p.3-5], documenting in particular how Trump had continually declared him to be DOGE’s leader, despite declarations filed by the government asserting the contrary. It then described DOGE’s activities writ large across the government,[p.5-7], noting in particular that “DOGE has taken numerous actions without any apparent advanced approval by agency leadership.” [p.6]. The next several pages [p.7-13] summarized DOGE’s meddling with USAID in particular.

After that the background section describes the plaintiffs, who are all “current and former employees or personal services contractors (‘PSCs’) of USAID.” [p.14]. They have proceeded anonymously, although their complaint and declarations describe their different roles and the types of harm each has either experienced or observed as a result of those roles. For instance, one plaintiff is stationed in a high-risk area in Central America, yet in danger of losing access to cellphones, electricity, and internet—and with them the security infrastructure they depend on—as a result of DOGE’s interference with USAID’s payment system. [p.15]. Another in the Middle East is in a similarly insecure predicament. [p.15]. The decision described their concerns, which in addition to those security-related ones also include wrongful job loss, the inability to get benefits and owed payments processed, and the potential leakage of sensitive personal data, including family member information and information contained within security clearance files. [p.15-16]. The decision also discussed the reputational harms plaintiffs have experienced as a result of Musk besmirching the reputation of USAID and the tangible way it has already caused problems for them at home and abroad. [p.16].

The decision then launches into a discussion about the preliminary injunction the plaintiffs are seeking in order to obtain “narrow emergency relief.” [p.17]. To get an injunction the plaintiffs would need to make a “clear showing” that, “(1) They are likely to succeed on the merits; (2) they are likely to suffer irreparable harm in the absence of preliminary relief; (3) the balance of equities tips in their favor; and (4) an injunction is in the public interest.” [p.17]. Of note, these factors have been issues in essentially all of the cases seeking some form of injunctive relief, including TROs, although courts in general are even more hesitant to award them in the TRO context given that they are a remedy awarded even earlier in the litigation, before more facts, or even more notice, have been given to the other party. But because these are such important factors in all the cases seeking to get the madness to stop, it is worth looking at how the court found them satisfied enough here to order at least some relief.

Standing

As the court recites, standing requires that “(1) the plaintiff must have suffered an ‘injury in fact’; (2) the injury must be fairly traceable to the actions of the defendant; and (3) it must be ‘likely’ that the injury will be ‘redressed by a favorable decision.'” [p.17-18]. We’ve seen some flavor of this before, like in the Murthy v. Missouri case, where standing was not found.

Here, on “injury in fact,” the court found a variety of potential injuries, such as plaintiffs having their contracts ended or employment terminated, or other harms as discussed in the factual background. [p.18-19]. Some plaintiffs also raised concerns about DOGE having access to sensitive personal information, but the court found it unnecessary to predicate standing on that potential harm. [p.18]. It’s worth noting that not every court has credited either of these injuries, however. For instance, in AFSA, which addressed in part USAID firings, the judge there said that employees had no standing to bring claims for wrongful firings to the district court because there was an alternative form of employment adjudication prescribed by statute. On the other hand, that litigation was against the agency for using its own authority to fire people, whereas in this case the issue was someone with no authority effectively firing them, which raised a different issue than the employment statutes contemplated. And in terms of the access to sensitive information courts have somewhat diverged, although how they have done so is a subject for another day. Here the harm was credited, because if DOGE is behaving unlawfully, it’s still a harm that shouldn’t have been experienced.

As for traceability, “plaintiffs need not establish that the challenged action [by defendants] is the ‘proximate cause’ of the injury and instead need only show that it is ‘in part responsible for’ the asserted injury.” [p.19]. And here the court finds the plaintiffs have done that, alleging “sufficient facts to support the conclusion that the personnel and contract actions taken against Plaintiffs, as well as the failures to pay their expenses, occurred at least in part because of Defendants’ actions.” [p.19]. The court then listed a number of facts in the record, including Musk’s own public statements, such as, “we’re in the process of … shutting down USAID,” to support that conclusion, [p.19], along with the fact that DOGE had gained full access to USAID’s office and computer systems and “Musk even threatened to call the United States Marshals if they were not provided with such full access.” [p.20]. The government tried to argue that all these alleged harms were really “caused by independent actions authorized by USAID and its leadership wielding their own power,” but the court wasn’t buying it. [p.20]. The court recognized that whatever the agency heads may have done they hadn’t done in a vacuum:

Here, the record supports the conclusion that the USAID officials were not actually independent actors and that even if they were, they in fact would predictably sign off on the actions directed or taken by Defendants. President Trump publicly acknowledged that Musk and DOGE wield significant influence across federal agencies when he stated in an interview that Musk “take[s] an executive order that I’d signed, and he would have those people go to whatever agency it was” and then “some guy that maybe didn’t want to do it, all of a sudden, he’s signing.” [p.21]

Also not helping the government’s position was that people at USAID who tried to say no to DOGE got fired. [p.21]. And that the RIF notices that laid off thousands of agency employees bore metadata saying that the notices had been sent out by a member of DOGE, not an agency official. [p.21].

Then, on redressability, the court found that the requested relief of an injunction barring Defendants from “[i]ssuing, implementing, enforcing, or otherwise giving effect to terminations, suspensions, or stop-work orders,” as well as barring their access to agency computers and requiring them to reestablish the systems for use by the agencies “would at least contribute to relieving Plaintiffs of some of their injuries.” [p.22-23]. The government tried to plead a form of hypocritical helplessness, complaining essentially that Musk and DOGE lacked the authority to fix what they broke. [p.23]. But the court didn’t buy the contention that Musk and DOGE lacked such agency:

Plaintiffs have presented evidence that as a practical matter, Musk and DOGE Team Members acting at his direction have had the ability to cause personnel actions against employees and contractors, to stop payments, and to control any action that requires use of USAID’s computer systems. The record reflects that Musk has personally taken credit for shutting down USAID, and that he and another DOGE official overrode objections from USAID officials to gain access to the USAID classified computer systems and facilities for DOGE Team members and then caused dissenting USAID officials to be placed on administrative leave. It also reflects that DOGE Team Members have had complete control over the USAID computer systems and, on at least one occasion, blocked USAID-approved payments from being sent out. Indeed, at the hearing, Defendants effectively acknowledge that DOGE has total control over USAID systems when their counsel stated that thus far they have been unable to identify a USAID official unconnected to DOGE who would have the ability to take actions over the computer system to assist Plaintiffs with their immediate needs.” [p.23]

Likelihood of success on the merits

After finding that the plaintiffs had the standing to ask for the injunction, the next major issue in the decision’s analysis was whether they had a likelihood of success on the merits in their underlying complaint. In other words, are they likely to be correct in how they argued that Musk and DOGE’s acts were unlawful in how they harmed them.

One way the plaintiffs argued it was that Musk was acting in violation of the Appointments Clause of the Constitution. The issue is that it appears that he has been performing the role of an “Officer” without having been appointed to that role. To have acted as an Officer an individual must “(1) exercise[] significant authority pursuant to the laws of the United States”; and (2) “occupy a ‘continuing’ position established by law.” [p.25]

On the first point, “significant authority,” the government basically tried to argue that Musk was just advising, and all the things he advised were done because the USAID officials “ratified” the recommendations and turned them into their own. [p.26]. And while the court found the record unclear on some of the actions, for others the court found no evidence to suspect that USAID officials had ratified them. [p.27-28]. It also noted that Musk and DOGE, “despite their allegedly advisory roles, had taken other unilateral actions” at other agencies without any apparent authorization from those agencies’ officials. [p.28]. And it recognized that if the court bought the government’s arguments that Musk had not done what he was accused of because he had no formal legal authority to do any of it, it would “open the door to an end-run around the Appointments Clause.” [p.31].

If a President could escape Appointments Clause scrutiny by having advisors go beyond the traditional role of White House advisors who communicate the President’s priorities to agency heads and instead exercise significant authority throughout the federal government so as to bypass duly appointed Officers, the Appointments Clause would be reduced to nothing more than a technical formality. [p. 31].

On the second point, “continuing position,” the court, after some lengthy analysis, determined that the USDS Administrator qualified as a “continuing position.” [p.34]. Then there was the question of whether Musk actually is the USDS Administrator. [p.34]. The government tried to argue via declarations that he was not, but on the other side was all sorts of mounting evidence of Trump declaring that Musk actually was at least the de facto administrator, including, “[m]ost notably,” Trump having publicly stated that, “I signed an order creating the Department of Government Efficiency and put a man named Elon Musk in charge.” [p.35-36]. Ultimately to the court it looked like Musk has a “continuing position” for purposes of the Appointments Clause, and thus the claim that his presence in it was unlawful was likely to succeed. [p.36].

The other claim brought against Musk and DOGE was for violating the separation of powers, which arises when the authorities of the Executive Branch encroach upon those of the Legislative. [p.37]. The gist of the plaintiffs’ argument is that when Musk and DOGE acted to eliminate USAID, a federal agency created by statute that only Congress has the power to undo, they unlawfully “usurped Congress’s authority to create and abolish offices.” [p.37].

The decision spent a few pages discussing how Musk and DOGE went about effecting USAID’s shutdown. [p.37-40]. It then turned to the contours of such a separation of powers claim, guided largely by the earlier precedent from Youngstown Sheet & Tube Co. v. Sawyer:

To act within its authority, the President or the Executive Branch must act based on authority that “stem[s] either from an act of Congress or from the Constitution itself.” Courts apply a tripartite framework, originally set forth in Justice Robert Jackson’s concurrence in Youngstown, to assess whether an executive action runs afoul of the Separation of Powers. First, “[w]hen the President acts pursuant to an express or implied authorization of Congress, his authority is at its maximum, for it includes all that he possesses in his own right plus all that Congress can delegate.” Second, if Congress is silent and neither grants nor denies authority, the President must rely only on the President’s independent powers as established by the Constitution and possibly based on authority existing in “a zone of twilight” in which there may be “concurrent authority” with Congress. Finally, if the President “takes measures incompatible with the express or implied will of Congress,” then the President’s “power is at its lowest ebb” and the President may “rely only upon his own constitutional powers minus any constitutional powers of Congress over the matter.” [p.41]

The court then looked at whether Congress had expressly or impliedly authorized the dismantling and elimination of USAID and found none. [p.42-44].

Where Congress has consistently reserved for itself the power to create and abolish federal agencies, specifically established USAID as an agency by statute, and has not previously permitted actions taken toward a reorganization or elimination of the agency without first providing a detailed justification to Congress, Defendants’ actions taken to abolish or dismantle USAID are “incompatible with the express or implied will of Congress.” Accordingly, the third Youngstown category applies, and the President’s “power is at its lowest ebb.” [p.46-47].

After that it considered whether the President’s own Article II authority let him end the agency anyway. The government essentially argued that because he has foreign affairs powers, and USAID involved foreign policy, he could do what he wanted to the agency without being second-guessed by the courts. [p.47-48]. But the court wasn’t having it, noting how much the closure “relate[s] largely to the structure of and resources made available to a federal agency, not to the direct conduct of foreign policy or engagement with foreign governments.” [p.48]. If the government’s theory was correct then “the President would have unilateral control over all aspects of the State Department and could even abolish it as a matter of the foreign policy power.” [p.49].

As a backup argument the government also insisted that the President’s interest in “avoid[ing] waste, fraud, and abuse” was an expression of his Article II power to ensure that the laws are faithfully executed, but the court didn’t buy this argument either. While “it may justify the termination or placement on leave of certain employees […] when, however, the Executive Branch takes actions in support of the stated intent to abolish an agency, such as permanently closing the agency headquarters and engaging in mass terminations of personnel and contractors, those actions conflict with Congress’s constitutional authority to prescribe if and how an agency shall exist in form and function.” [p.49]. The decision then spent a few pages backing up that conclusion further. [p.50-53].

Irreparable harm

To show irreparable harm, the plaintiffs needed to show two things: that an award of money damages later wouldn’t adequately compensate for the harm, and that the claim that the plaintiff will suffer harm is “neither remote nor speculative, but actual and imminent.” [p.53]. But then an issue arises: While the denial of a constitutional right constitutes irreparable harm, that rule holds only in “cases involving individual rights and not the allocation of powers among the branches of government.” [p.54].

The court addressed this issue by returning to the plight of the individual plaintiffs, as discussed (both here and in the decision) in the background section, because it is in their lives that the effects of this unconstitutional exercise of power will be felt, including as to their physical security. [p.55-56]. While some of the other litigation challenging the recent harm to USAID has resulted in some mitigation of the harms Musk and DOGE created through their interference with the agency, the court here generally found the relief incomplete, with the likelihood of harm remaining. [p.56].

Notably the court also found another source of harm: Musk’s disparaging public statements about USAID. While generally a claim that an employee’s reputation would be damaged as a result of an adverse employment action does not establish irreparable harm for injunction purposes, “cases may arise in which the circumstances surrounding an employee’s discharge, together with the resultant effect on the employee, may so far depart from the normal situation that irreparable injury might be found.” [p.56-57]. And the court here found that this case presented such circumstances.

Defendants’ public statements regarding the reasons for the actions relating to USAID go far beyond the ordinary. On February 2, 2025, as USAID headquarters was being shut down, Musk stated on X that USAID is “evil” and in another post that has been viewed at least 33.2 million times, that “USAID is a criminal organization.” The next day, Musk also publicly stated in a lengthy discussion on X that USAID was not “an apple with a worm in it” but was instead “just a ball of worms” that is “hopeless” and “beyond repair” to the point that “you’ve got to basically get rid of the whole thing.” Where such a prominent member of the Executive Branch has publicly described Plaintiffs’ place of employment in these ways on such a large media platform, and in a way that effectively characterizes it not as an agency in which certain individuals have engaged in misconduct but as a criminal enterprise from top to bottom, the likely harm to the reputation of personnel who worked there is of a different order of magnitude, because these statements naturally cast doubt on the integrity of those who worked there. [p.57-58]

On top of this reputational harm, which the court also found to be non-speculative, [p.58-59], there was also the harm related to the disclosure of sensitive personal information, which can constitute irreparable harm. [p.59].

Here, there are specific reasons to be concerned about the potential public disclosure of personal, sensitive, or classified information. First, as described above, the DOGE Team Members took extreme measures to gain access to classified information, including in SCIFs, when there was no identified need to do so and, as confirmed by J. Doe 11, at least some of them lacked security clearances. These measures included threatening to call the U.S. Marshals and then placing security personnel on administrative leave for attempting to enforce restrictions relating to classified material. Relatedly, J. Doe 2, a USAID employee on administrative leave with responsibilities relating to cybersecurity and privacy, has reported that DOGE Team Members without security clearances used their root access to USAID’s systems to “grant themselves access to restricted areas requiring security clearance.” [p.60]

The court then noted that sensitive personal information seems to have already been leaked, and that “disclosure of personal information is of greater concern where some Plaintiffs, such as J. Doe 1, are or have previously been posted overseas in high-risk areas and have expressed concern about ‘highly sensitive personal information’ such as ‘foreign contacts’ and ‘a safety pass phrase’ being released from personnel and security clearance files.” [p.60-61]. Given that DOGE has already “displayed an extremely troubling lack of respect for security clearance requirements and agency rules relating to access to sensitive data,” that they’ve already leaked some unredacted, and the exigent sensitivity of other data now in their possession, the court found a likelihood of irreparable harm if DOGE was not enjoined. [p.61].

Balance of the equities and the public interest

The remaining requirements for a preliminary injunction are that the balance of equities tips in favor of the plaintiffs, and that the injunction is in the public interest, which, in cases involving the government, merge into one factor. [p.61]. Basically, because an injunction changes something, this factor addresses whether that change is in the public’s interest. Here, though, because no one is harmed by a preliminary injunction that enjoins activity likely to be unconstitutional, and it looks like what Musk and DOGE did violated both the Appointments Clause and Separation of Powers, the court found these factors tipped in favor of the plaintiffs. [p.61-62].

The court further found that this factor tipped in favor of the plaintiffs because “the public interest is specifically harmed by Defendants’ actions, which have usurped the authority of the public’s elected representatives in Congress to make decisions on whether, when, and how to eliminate a federal government agency,” and because these likely unconstitutional actions have already put plaintiffs in physical jeopardy. [p.62]. The injunction also wouldn’t stop USAID from operating as an agency—it wasn’t even a party to the suit able to be enjoined anyway. The only activities being enjoined were the ones that are likely unlawful. Even Musk and DOGE could still “conduct assigned work pursuant to the various executive orders that complies with the Constitution and federal law.” [p.62].

Remedy

Having decided to issue the injunction, the next question for the court was what it should say. It decided to award some but not all of the relief the plaintiffs had requested. [p.64]. First, it addressed DOGE’s IT-related activities, requiring it to reestablish all email, payment, and other systems to their functional states. [p.64]. While the injunction doesn’t preclude all DOGE members from accessing agency information, in part because it couldn’t police if individual DOGE staff might be government employees entitled to the access in some other way, it did enjoin them “from any disclosure outside the agency of PII or other personal information of USAID employees or PSCs,” including on the DOGE website. [p.64]. Where some legally required disclosures had to be made, only USAID personnel unaffiliated with Defendants could make them. [p.64].

And the court decided that this injunction applied to how DOGE’s activities interfaced with all USAID personnel, and not just the plaintiffs themselves.

“[W]here the parties have been unable to identify a means by which individualized relief could be provided without jeopardizing Plaintiffs’ anonymity, and the record already contains multiple examples of USAID personnel who were placed on administrative leave or otherwise sanctioned for objecting to Defendants’ actions, the Court finds that applying these requirements to all current USAID employees and PSCs, including those on administrative leave, is necessary to provide full relief to plaintiffs.” [p.64-65]

One thing the court did not do was order the revocation of all the “mass personnel and contract terminations [that] are part of the ongoing dismantling of USAID that likely violates” the Constitution. It abstained not because the terminations are legitimate, but because it is currently unclear whether they had been issued entirely under the auspices of DOGE, or if they had been effectively ratified by legitimate USAID leadership. While such ratification wouldn’t necessarily make the layoffs lawful—see the discussion earlier in the decision about how USAID cannot be dismantled without authorization from Congress (as well as other cases involving other agencies addressing how agencies generally cannot be closed without Congress allowing it)—USAID itself was not party to this litigation and thus couldn’t be bound by such an order. Also the record on this question, about who caused the layoffs, is currently unsettled. [p.65].

At this point, however, DOGE cannot do more to effect the shutdown of the agency, including with respect to further firings, building closures, or records deletion. [p.65-66]. DOGE also must, within 14 days of the order, reopen USAID headquarters. However, if in that time period someone with appropriate authority ratifies the decision to close it, this requirement will be stayed. [p.66-67]. But no other portion of the injunction was stayed pending appeal—it is in effect now. [p.67]

Some implications

Several things have already happened since this decision was issued just a few days ago. For one, the government has appealed. At the time of this writing no stay has been issued, but given how it has acted in other cases it seems likely that the government will seek one to delay the injunction while it appeals.

The government also tried to play musical chairs with DOGE and agency appointees in an apparent effort to circumvent the injunction. Because this lawsuit only challenged DOGE, and not the lawfulness of anything the agency did on its own accord, it left open the possibility that the agency might have done DOGE things under its own auspices, or yet still be able to, like with respect to the building closure. These actions—like terminating contracts or the bulk of its workforce—might still be illegal, but, as discussed above, it will be for another case (or at least later in this litigation) to decide. Per where things are with respect to this case, USAID personnel lawfully in their roles can still run the agency and are presumed to be running it lawfully, while, with this decision, DOGE personnel are not. So the government tried a “foil a judicial order with this one quick trick!” maneuver by converting a DOGE staffer into a USAID official, which would then seem to grant him the mantle of legitimacy to effectively ratify his team’s own illegitimate actions. But it didn’t work: the court subsequently reiterated that the injunction kept anyone connected with DOGE out of the agency’s business.

But it does seem clear that there is no meaningful separation between Musk/DOGE and any appointed officials running USAID, and that these appointees are essentially window-dressing there only to allow the unlawful force of Musk and DOGE to take over. These decisions by the agency officials to let Musk and DOGE control so much themselves should therefore be found unlawful, and courts are starting to get there, recognizing in other cases how the APA in particular finds such abdication of good judgment, to let DOGE loose, to be arbitrary and capricious. The USAID cases challenging the firings and contract terminations have so far had more trouble using this argument to get usable injunctive relief, although some of that difficulty may be due to them being some of the earliest challenges brought against any of what Musk and DOGE have wrought. The more DOGE has done, the more subsequent courts have noticed, and although courts are generally still being extremely conservative in how they have been ruling against the government, each time one does it does seem to make the next such decision against the government more likely. But the reasoning of each decision still matters, and how courts find standing, or likelihood of irreparable harm, is still very specific to the particulars of each case, including the agency or agencies involved, what actions are being challenged, with what claim(s), by whom, and against whom.

Still, we’ve not really had a chance yet to see too many decisions be made in the wake of this one. Having a court at last officially credit the arguments that any DOGE authority was falsely claimed seems an important point to be reached, and one that hopefully should affect all the other cases, including the ones requesting injunctive relief against what DOGE is doing. Right now all these cases are being litigated against the big black box of the government, and they have to swim upstream against a lot of doctrine that says the government is due a lot of deference in how it uses its own constitutionally-appointed power—and perhaps rightly so, if that constitutionally-appointed power is not to be unduly obstructed.

But with DOGE it is different, because what is finally starting to dawn is the judicial recognition that DOGE has no constitutionally-appointed governmental power. And these cases, rather than being about the government using its lawful power badly, as most constitutional challenges are, are really about an unlawful power being allowed to do anything. Injunctive relief should therefore be much more readily available, because it is not relief from government power itself that is being sought, but relief from a separate, unentitled, invading power running around and causing immense and exigent harm.

Read the whole story
jsled
10 days ago

Most Men Don't Want to Be Heroes (and That's Okay)

1 Share

We are continually being asked to feel sorry for men, to understand that there is some significant sense in which we men are being poorly served by a liberal society. Exactly how is usually left undefined. It's taken for granted that we’re being ignored, disrespected, or ‘left behind’. When a specific grievance is asserted, it's transparently false. In both cases, I think the grievance-merchants are relying on us to ‘connect the dots’ and pick up on something not quite being said. 

Take Chris Arnade’s recent article in the Free Press, arguing that men—all men—need to be heroic, or at least to be seen as heroes. Modern liberal society is apparently hostile to this and won’t give us these opportunities. In doing so, it deprives us of some innate drive, making us unhappy and unfulfilled. 

I personally find these ‘think of the poor men’ pieces condescending. They don’t understand my life and I don’t like their claim to speak for me. This one especially so, and that’s no accident. Chris Arnade’s entire project is a sort of voyeuristic ventriloquism: gawping at, and speaking for, people who he imagines can’t speak for themselves. His origin story is that, after a long career in finance, he got a buyout and was able to retire early. With his newfound freedom, he took up a hobby of photographing poor people (no, really) in ways many have criticised as exploitative and demeaning. Because our mainstream media are unfathomably stupid, he swiftly gained publication and recognition. His writing follows suit—reporting on the ‘forgotten’ people of America in the manner (and one suspects with the factual accuracy) of a Victorian anthropologist lecturing on a tribe of noble savages he encountered.

Naturally, Arnade is a big proponent of the poverty narrative—people vote Trump because of economic desperation and cultural disrespect. And this Free Press article seems to move towards the masculinity narrative—effete liberalism is pushing men right. I’ve said my piece on both of those. I want to put the political implications to one side, and focus on the core argument.

What to me seems wrong—and obviously wrong—are the two claims Arnade makes in his title and byline: That “all men” need to be heroes to be happy and fulfilled, and that the opportunity to do so is somehow being denied them.

To be a man

When I was much younger, I saved someone from drowning. They had (possibly while intoxicated) gone into a rough and choppy sea, at night, and were struggling to stay above water. Worse, the tide was pulling them out. I went in after them and, with some effort, brought them back to shore. As we got close, an older man I did not know also came in to help and, between the two of us, we dragged them out. Exhausted and freezing cold, but safe. 

It might surprise someone like Arnade to learn that this has not proved an especially important moment in my life. I’m glad I did it. I received profuse thanks from the person in question and general plaudits from my peers (which Arnade imagines all young men need). And then, well… life moves on. Other things happen to you. It’s not something that’s provided any great moral lesson for me. Nor is it important to my sense of identity—this is the first time I’ve mentioned it publicly, not out of humility; I honestly just don’t really think about it. 

I’ve also provided support to people in less dramatic, more long-term, more female-coded ways. For instance, assisting a loved one through a disability. Or being, with my family, a carer for a close relative with Alzheimer's. There is absolutely no doubt the latter have given more meaning to me, developed my character more, and have strengthened my relationships with others in a way more traditional ‘heroics’ couldn’t.

Providing long-term care for someone is an endless series of small decisions to prioritize the other person, most in themselves trivial and quickly forgotten. Rather than one moment in which you have to master yourself, you have to decide to continually live that value. And it improves you. It will teach you to be kind, and it will teach you how to care about someone in a way that taking a one-time risk won’t. You will feel frustration with people for things that are not their fault and have to move past that. You’ll then feel guilt—often quite profound guilt—for having felt that frustration. You will learn—and you will be forced to learn—how to forgive others and yourself. All of this will be mixed with moments of real joy and real connection. I can’t speak for everyone, but these have been among the most important parts of my life.

On a societal level, if there is a crisis of acts of service not being recognised it is of this latter, female-coded, kind. Despite Arnade’s claim that heroics are now (somehow) looked down on, whenever I’ve done something (even something quite minor) that fits this male-coded frame, I’ve received praise and recognition. In Arnade’s own story—which he takes as an exemplar of his thesis—the ‘hero’ (who retrieved a drunk from a locked bathroom) was bought drinks and made to feel good about his actions. (“he strutted around like the cat’s meow.”)  In contrast, looking after a relative in cognitive decline can be very isolating. Despite it being the much more common experience, many carers feel  profoundly alone. Finally, as societies age, more and more of us are going to need to fill this role.

There is not the same structural need for an army of men pulling people out of locked bathrooms or choppy seas. That’s not the point, Arnade might say—men need that, and without it we’ll be forlorn, miserable, useless mopes. But will we? For most of us, a true emergency rescue moment might happen once or twice in a lifetime. You want to meet the moment, but I think that is a challenging thing to build a stable identity around.

How could you? Take my case: I was happy enough to be given credit, but am I going to tell that story in every interaction for evermore? Am I going to sit, day after day, meditating in satisfaction on my ‘hero moment’? Can you imagine a more insufferable prat?

And most men who want this to be their personality don’t even have that. They live in anticipation of one.  Consider gun nuts who define themselves by making themselves ‘ready for the moment’. With a grim predictability, study after study shows they are far, far more likely to use their beloved firearms to end their own lives than to stop a ‘bad guy with a gun’.

Professional heroics 

To find real meaning and fulfillment in heroism, I think you’d have to do more of it. For most of us, this sort of thing might happen once or twice in a lifetime. You’ll make a—likely poorly informed and impulsive—decision. Hopefully everything works out. And then the world will move on. There are, however, plenty of vocations which involve ‘heroic’ acts.

The opportunity to pursue these (or, for that matter, for the rest of us to behave commendably in a rare emergency) is not something being ‘taken away’ from men. Indeed, it's difficult to even understand why Arnade thinks this. As mentioned, in his own specifically selected anecdote, the man is both allowed to be ‘heroic’ and praised for it. The only evidence he offers is this article, which he characterises as arguing “the ancient hero archetype is corrosive, bad, and unnecessary—an outdated concept of masculinity, which promotes imperialism.” 

The first issue here is that’s not what the linked article says. It’s an examination of how men like Jordan Peterson and Elon Musk love to cite ancient heroic poetry, but largely misunderstand it. However I can easily see how Arnade would read it as saying something else: I imagine he gave it a quick scan and his priors kicked in—academic liberal author, ‘front of the class kid’, bet he ‘sneers’ at real men. It’s saying something about the hero, about masculinity. Must be against it. Arrogantly so. 

His whole article is framed this way. The smug pseudo-knowledge of liberal intellectuals is contrasted with the real-world wisdom he gleaned from uncouth mouths in a trashy dive bar (it's even more explicit in the Substack post on which the article is based—“one of the divey-est dive bars in the US, with a collection of intoxicated, high, and strung out customers”). This is, in the words of social science, a dubious social epistemology. By its standards, Arnade should defer to my perspective: I’d bet money I’ve spent more time in working-class American dive bars than he has. And I was there as a customer, not an ex-banker on a poor person safari.

I suspect, however, that what Arnade is channeling isn’t a deeper meaning he’s deduced from proximity to the poor, but a narrative pushed by conservative writers he reads. Namely, that liberalism is a feminising ideology. That ‘back in the day’ men might go to war and be rewarded with social standing and an obedient woman. That now society has no use for men. We cannot prove ourselves this way, and must work meaningless feminised office jobs that suck the life out of us and quash our masculine urges.

As always with conservatism, it’s not immediately clear what day ‘back in the day’ was. But we can perhaps start with some of their models of masculinity: They love the image of the Spartan warrior—the iconic helmet, or even just the name ‘Spartan’, appears on memes, fitness routines, team logos, and ‘trad’ accounts. The European knight is likewise a common symbol of lost manhood. The thing is, both those figures sat atop rigid hereditary caste systems. Something like 90% of Sparta’s population were helots (slaves); only 2-3% were the famed warriors. One medieval knight required an economic base of around 300 tenants or serfs (semi-free agricultural workers) to support him. Even Republican Rome, which recruited much further down the social ladder (and won wars because of it), limited conscription to landowners. Non-property owners, the urban poor, and of course slaves were excluded.

Sparta, Rome, or the age of the Crusades were not a better time to be a man—even if your only criterion is ‘gives opportunities to be a hero in battle’. The overwhelming odds are you would be working to support a warrior aristocrat, not be one. Also, if our concern is men being disrespected, consider that workers supporting the aristocrats were usually defenceless against humiliation or abuse from them. You have to get into the modern age before mass conscription allows most men to ‘prove themselves’ in war. Even then, race and class discrimination might limit how you could participate. Finally, if I were to choose an era to prove my manhood in battle, I would choose one which had antibiotics and surgery with anesthetic.  

Some far-right commentators—for instance the Bronze Age Pervert—are quite open that this is not a problem for them: their project is about a few exceptional men, not the peons who support them (much less women). Those who might be tempted by this worldview should realize that the commentators pushing it do not expect you to be one of the masculine elect. You will be toiling so they can larp as heroic warriors. Arnade doesn’t go that far; indeed, that bullet-biting isn’t available to him: he claims “all men” need to be heroes. Applying that standard honestly, he should come to the conclusion that past societies were much, much worse for his ‘forgotten’ men.

Today, virtually anyone can become a soldier, or police officer, or firefighter (and be well compensated in pay and social status). Liberalism's critics are forever claiming it has ‘taken away’ things there has never been more universal access to. ‘Traditional marriage’ is not being taken from you. Anyone can still do that. You want to be a ‘trad’ with a stay-at-home wife and lots of kids? Millions of people do, plenty of women still want that role, and broad economic prosperity makes it easier, not harder. Want masculine hobbies? No one is stopping you. Just want to grill? Meat has never been more accessible for the common man!

Liberalism: good for men too!

The revealed preference, however, is that most men don’t want these vocations. The Army aggressively recruits; really, any young man at any point can join. The vast majority don't. Men fantasise about being in combat scenarios but, by and large, don’t seek them out. This is one reason why conservatives (and fellow travelers like Arnade) hate liberalism so much: free choice disproves their biological essentialism. Rather than abandon the narrative (all men gravitate towards certain roles), they abandon reality. Men aren’t choosing non-‘heroic’ roles; liberalism is (somehow) stopping them.

Most men don’t want to be heroes and that’s fine. Arnade doesn't know what’s in your head and, despite his ‘listening to the common man’ schtick, he doesn’t care. He has his narrative, informed by elite conservative writers, and goes into the world reading it into his interactions with poor people. Even when, as we have seen, those interactions flatly contradict it. He has his perception of liberalism, and that’s what he’ll read us as saying—regardless of what we actually do say.

Liberal freedom isn’t just about finding the life that best suits you; it’s a grand experiment in us all finding the best ways we can care for one another. Arnade characterizes modern liberalism as favouring “the individual over the community”. This is wrong; liberalism values both. Its canonical texts—On Liberty, for instance—are self-consciously about finding a balance between the two. In recent times, the great push for unconstrained individualism, the tearing down of structures of communal aid, and the bleak insistence that “there’s no such thing as society” have come from the political right. Liberalism has compromised too much with this vision of men reduced to want fulfillment, but it has never let go of the insight that freedom, choice, and pluralism are both better for individuals and better for society.

I think it’s better for men to be able to choose the ways we do things for others. I know several men my age who take on equal, or even primary, parenting responsibilities. They do so not because a feminised society has forced this on them, but because they enjoy doing it. They love their kids, these relationships give their lives meaning, and it makes them better people. Past societies might have discouraged, or even prohibited, men from taking on this role. Now they can. And I think they will be better at it precisely because they have chosen it. As J. S. Mill puts it:

In proportion to the development of his individuality, each person becomes more valuable to himself, and is therefore capable of being more valuable to others. There is a greater fulness of life about his own existence, and when there is more life in the units there is more in the mass which is composed of them.

Arnade imagines that this sort of freedom leaves men frightened and confused. That we are simple and stupid creatures who need “to play a stock character.” To again quote Mill, that we should fit ourselves into “the small number of moulds which society provides in order to save its members the trouble of forming their own character.” I can’t help but find this condescending. If you want to play a stereotype, fine, no one is stopping you. But note that Arnade only makes this claim about other men, never himself. Does he find himself unable to choose his life path, unable to make decisions, unable even to know how to present himself to others, without a stronger guiding hand from society? He never says, but one suspects not. 

So what’s this actually about?

With all that said, it must be noted that articles on male angst clearly have resonance. Why? For one thing, people in every historical period and type of society have felt angsty. It's just something people do. And articles like Arnade’s give an easy answer. To be clear, there’s nothing wrong with feeling insecure or sad. If there’s one thing the ‘think of the men’ articles get right, it's that men’s mental health is underdiscussed and stigmatized—though they rarely provide useful solutions.

If there is a significant societal change that these articles are responding to, it’s not a matter of men losing something—we demonstrably haven't—it's about women gaining something. They can now make their own way in the world. Women can be heroes too! We men can still earn respect in all sorts of ways, but are no longer granted it simply by virtue of being men. People like always having someone beneath them. That, I think, is what the ‘male malaise’ genre is, at its core, about.

When we’re asked to consider the poor, left-behind young men, we’re often reminded that girls are now exceeding boys in most aspects of education. This is true and, on the surface, a reasonable enough thing for public policy to think about (for instance, is this gap due to different socialisation, different learning styles, etc.?). Beneath the surface, I think something uglier is sometimes being said: that it is an unbearable indignity for boys to have girls ‘above’ them like this. I think what’s gone wrong for a lot of men in their lives is—though they might not admit it to themselves in these terms—they’re angry that their sisters went to college and they didn’t. Their lives have been fine, but their sisters or female school peers have done better, and that feels like an injustice. One they’ve not been able to let go of. Their fathers had to reconcile themselves to female peers in the workplace; most young men now will be managed by a woman at some point. When we hear about how ‘disrespected’ men feel, is it that feminists are stopping men being firefighters? Or is it that men in office jobs are being told what to do by a woman?

There’s an urge in liberalism to debate the highest version of the opposing argument. Steelman, not strawman. That’s valid and useful, but we shouldn't let it get in the way of plainly understanding what the manosphere is complaining about. These are not tortured souls, reading Homer in a world of hyper-feminists who no longer care for it. They’re useless miserable prats who wish they had the courage to call their female boss a b**** and live vicariously through Trump because they imagine he’d do that. 

Liberalism does not force us from the communal to the individual. I doubt helots had a great sense of community with the Spartan overlords who hunted them for sport. Nor does it force men from male-coded forms of community service to female-coded ones. It gives us the choice. We have never before had more ability to develop ourselves as fathers, as children caring for parents, or in who or how we date. We’ve also never before had more opportunity to be traditionally masculine—to make a vocation of the army, or pulling people from burning buildings, or competing in professional sports. Despite the silly and shallow self-pity of some, there has never been a better time to be a man.

The ‘cost’ for all this is we have to give up having women as automatic social inferiors. That should be an easy trade to make.  


Featured image is פסל אכילס הגוסס לאחר שנפגע בחץ בעקבו (“Statue of the dying Achilles after being struck by an arrow in his heel”) by י.ש., CC BY-SA 3.0
