Episode Transcript
This is a computer-generated transcript. While our team has reviewed it, there may be errors.
Morgan Sung: This is The Purge… Of government websites. Since President Trump’s inauguration, federal agency and military websites have been wiped. Some are gone completely, while others have been overhauled to remove any references to so-called woke terminology, all in this effort to comply with an executive order to end diversity, equity, and inclusion programs in the federal government.
And it seems like in the rush to remove all of these woke words, there were maybe some unintentional cuts, like when the Department of Defense took down a 1940s photo. It was a picture of a pilot, posing in front of one of the planes that dropped the atomic bombs over Hiroshima and Nagasaki. But what does this photo have to do with diversity, equity, or inclusion? Well, the plane was named the Enola Gay. It was named after the pilot’s mother, Enola Gay Tibbets. There are still other photos of the Enola Gay available on government websites, but in the past few months, countless pages with crucial information have been wiped from the internet.
Fortunately, for journalists, historians, and anyone who cares about keeping track of facts, there’s a tool that lets us go back and see exactly how those websites have changed. Unfortunately, that very tool is under threat.
This is Close All Tabs. I’m Morgan Sung, tech journalist and your chronically online friend, here to open as many browser tabs as it takes to help you understand how the digital world affects our real lives. Let’s get into it.
Close All Tabs Senior Editor Chris Egusa is gonna walk us through this magical tool called the Wayback Machine.
Chris Egusa: Hey, Morgan.
Morgan Sung: So I already have a tab open. It’s the current version of the State Department’s safety tips for queer people traveling abroad. And you have your own tab.
Chris Egusa: So it’s the same web page, same URL, but I have gone back in time, kind of. That’s where the Wayback Machine comes in. It’s part of this organization called the Internet Archive. And for the past 30 years, it’s basically scraped the internet page by page and archived it. So if you have a URL or link to a website, you can go back and see all the ways that that website has changed. I used the Wayback Machine to look at the page from January 5th, before this executive order. At the top of the page, it’s addressed to LGBTQI plus travelers.
Morgan Sung: Unlike the one I’m seeing, the one that’s currently live, which just says, L-G-B Travelers. What else does your version have?
Chris Egusa: So it has a lot of resources. It has instructions for changing your passport’s gender marker, warnings about conversion therapy practices in other countries, and also links to the National Center for Transgender Equality and other organizations.
Morgan Sung: Yeah, this current one I have on my screen doesn’t have any of that, just no warnings about conversion therapy and definitely none of those resources for trans people. There’s actually no mention of trans people at all. They kept a link to the Trevor Project, but they made a point to say that it’s an organization for LGB youth.
Chris Egusa: Which is wild, because the Trevor Project is very involved in advocating for trans youth and gender-affirming care.
Morgan Sung: Yeah, and again, this is just one page out of who knows how many that have been altered to take the T out of LGBT. How panicked should we be about the scale of erasure of public information?
Chris Egusa: Well, this has happened before. During Trump’s first term, pages about climate change and the environment were altered to soften the language, or they were just wiped entirely.
Morgan Sung: But that purge wasn’t nearly as expansive or haphazard as the one we’re currently living through, right?
Chris Egusa: No, but I think it has prepared a lot of us for a situation like this. This time around, a lot more people are relying on the Wayback Machine and the Internet Archive.
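For readers who want to try the comparison Chris describes, here is a minimal Python sketch of how a Wayback Machine address can be put together from a page URL and a date. It relies on the public web.archive.org/web/<timestamp>/<url> address pattern and the archive.org/wayback/available lookup endpoint. The page address below is a made-up placeholder, not the real State Department URL, and the year 2025 is an assumption, since the conversation only says "January 5th." The function names are hypothetical and the code only builds addresses; it does not contact the Archive.

```python
from datetime import date
from urllib.parse import urlencode

WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def snapshot_url(page_url: str, on: date) -> str:
    """Address that asks the Wayback Machine for the capture closest to `on`."""
    timestamp = on.strftime("%Y%m%d")  # the timestamp is written as YYYYMMDD
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query(page_url: str, on: date) -> str:
    """Address of the JSON lookup that reports the closest archived capture."""
    params = urlencode({"url": page_url, "timestamp": on.strftime("%Y%m%d")})
    return f"{AVAILABILITY_API}?{params}"


if __name__ == "__main__":
    # Hypothetical placeholder page; swap in the URL you want to compare.
    page = "https://example.gov/page"
    before_order = date(2025, 1, 5)  # assumed year, see the note above
    print(snapshot_url(page, before_order))
    print(availability_query(page, before_order))
```

Opening the first printed address in a browser should land on whichever capture is nearest to that date, if one exists, which can then be read side by side with the live page.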
Morgan Sung: Okay, and in all of this mess, the existence of the Internet Archive itself is under threat, which could spell trouble for the future of all online libraries. So to get a better understanding of it all, Chris, you went to the archive in person a few weeks ago. Let’s start there. Let’s make that our first tab. What is the Internet Archive? You know, when I think of the Internet Archive, I’m thinking of like the Matrix, Cyberchase, when they’re like running through this kind of like cloud and there are just binary numbers everywhere. But the Internet Archive is a real physical location.
Chris Egusa: Yeah, no, it’s a real place. It is not in the Matrix. It is in San Francisco in the Richmond district. It’s in this very grand building and out front it has these huge Greek columns that kind of line the entrance of it. They actually chose the building in part because it resembles the archive’s logo, which is the columns of the Library of Alexandria.
Morgan Sung: The Library of Alexandria. I mean, that is like the Greek idea of a universal library. That’s a pretty lofty idea to aspire to.
Chris Egusa: It is. And the organization isn’t shy about their ambitions. Their stated mission is to provide, quote, universal access to all knowledge. According to their website, the archive currently contains, and I’m just gonna reel off a bunch of numbers here, 835 billion webpages. I think last I checked, that number’s actually close to a trillion. 44 million books and texts, 15 million audio recordings, 10.6 million videos, and on and on.
Morgan Sung: Wow.
Chris Egusa: It’s a lot, like, I can’t even quite conceptualize what a trillion web pages even looks like.
Morgan Sung: Okay, Chris, tell me, what was the archive actually like?
Chris Egusa: It was actually really cool. Brewster Kahle is the founder of the archive and he was really excited to give me a tour and introduce me to all kinds of old media devices they’d collected over the years.
Brewster Kahle: Edison invented these cylinders in 1880. Mostly, you know, things that you’ve never heard.
Chris Egusa: So yeah, right when we got there, we go up some stairs and he shows me this very vintage, beautiful old gramophone.
Brewster Kahle: It’s a Victor Talking Machine 5 from 1927. So it’s an old 78 RPM player, no electricity, it’s a crank, has a horn, so spinning up.
Song: Mairzy doats and dozy doats and liddle lamzy divey / A kiddley divey too, wouldn’t you?
Chris Egusa: I don’t know if you recognize that song, but it immediately made me think of Twin Peaks.
Morgan Sung: Yes! Yes, it does. I am right in the middle of my rewatch right now, so…
Chris Egusa: I need to do a rewatch in David Lynch’s honor, for sure. Yeah, and then when he started playing this, he then started dancing around the room, which is just like…
Morgan Sung: Okay, Audrey Horne.
Chris Egusa: Yeah, yeah.
Song: A little bit jumbled and jivey
Chris Egusa: So after visiting this little museum area, Brewster takes me into this huge room and it has this beautiful domed ceiling. It used to be a Christian Science Church. The whole building did. There are all these lines of pews that are facing where the pulpit used to be. And in place of that pulpit, there’s this huge projector screen. And so they use this place for like movie screenings, for local community events and things. But there is one thing that draws your eye more than anything else in the room. That is the statues.
Morgan Sung: Statues?
Chris Egusa: Yeah, they are these hundreds of terracotta figures. They are about waist high, and each one of them has distinct features and clothing. And they’re all kind of facing forward, like they’re congregants in this hall of worship. Haunting. So it’s a bit of an eerie scene, but according to Brewster…
Brewster Kahle: If you work for the archive for three years, then we make a little statue. Basically tributes to the people that made the organization happen.
Morgan Sung: So not like framed photos, just a three-foot statue.
Chris Egusa: Yes, exactly. And they’re really detailed. Another employee was with us on the tour. His name is Chris Freeland.
Chris Freeland: It’s weird to be standing here in front of a terracotta statue of yourself, but here we are. It does look like me, and that’s also uncanny. Everyone says they got the beard right, which is making me sad, since it is covered in gray and no longer brown, or red-brown, like it was when I was, you know, 20 years younger, but here we are.
Chris Egusa: And all these statues are standing in the pews and around the outside of the room, facing the front, sort of at attention.
Morgan Sung: You know what this reminds me of? It’s that terracotta army that’s like protecting the tomb of like the first emperor of China. And they’re meant to like protect the emperor in the afterlife. And I guess it’s fitting because these statues look like they’re, I don’t know, protecting the internet ephemera long after it’s gone.
Chris Egusa: That’s actually very appropriate because behind all of these statues in the very back of the room is where the servers live. Um, he tells me that they hold 145 petabytes of data, which I don’t deal in petabytes, it’s the one after terabyte. Um, so it’s a lot. And yeah, these are the servers where all of the billions of web pages, videos, and audio, where all of it lives. There’s a cool moment actually, like when you stare at these devices, you see all of these twinkling blue lights flashing and flickering across them, um, like hundreds a second. And here is what Brewster told me about that.
Brewster Kahle: Every time a light blinks is somebody uploading or downloading something from the Internet Archive. I think that the technology reflects the people that make it, so let’s make it beautiful.
Morgan Sung: That sounds almost magical. I mean, I get how this could be like a religious experience.
Chris Egusa: Yes, and I almost compare it to the way that old cathedrals were meant to evoke that sense of awe. It was hard not to feel a little awestruck being in that place.
Brewster Kahle: These servers hold some non-trivial percentage of all of the published works of humankind.
Morgan Sung: Okay, so tell me about Brewster, the guy who founded this whole thing, started this statue army and this cathedral of servers.
Chris Egusa: So, as you can already tell, Brewster is pretty eccentric and he kind of comes from that old school vision of the internet where he thinks it should be free and open. He really feels that now power is way too concentrated in a few big companies.
Brewster Kahle: Companies often don’t sell anything anymore. They just license it. Now, if you use Netflix or Spotify, you don’t even have a video library like you used to with DVDs or you don’t have MP3s on your device or records in your collections. So there’s been this shift by the large-scale publishers towards ongoing control of materials and surveillance of what it is being viewed.
Chris Egusa: They believe that most things should be open and available to the public, and especially that old things should be preserved, even old webpages. And so that’s where you get the Wayback Machine.
Brewster Kahle: We have the World Wide Web on Archive.org, available back to 1996, so you can go and find your old webpages, your old GeoCities sites, or whatever it is that you’ve done in the past. But it also is relevant to people currently. Journalists are using it a lot to find, well, what did that person say? And they’re saying something kind of different. And they said they never said that. Well, no, we don’t know. We found that in the television news archive, and you can search and find this on television transcripts back to 2009.
Chris Egusa: And you can tell that Brewster is old school because he uses the phrase World Wide Web, which I love.
Morgan Sung: So vintage of him.
Chris Egusa: Yeah, it is kind of miraculous, I do feel, that you can go back and look at a website like that and see exactly what it looked like. And there’s no other place where that exists.
Morgan Sung: Okay, so clearly the Internet Archive is this incredible resource. Is it true that it might shut down?
Chris Egusa: Well, let’s talk about it.
Morgan Sung: Yeah, let’s open a new tab on that, but right after this break. Okay, new tab, Internet Archive Lawsuits. So let’s talk about these lawsuits that the Internet Archive is facing. They’re not about the Wayback Machine or the webpage archiving, right? They’re about a totally different part of the archive’s operations.
Chris Egusa: So the Internet Archive also has these huge operations where they preserve old physical media. In some cases, the stuff they’re preserving is very clearly public material that is for public access. Like they have this program called Democracy’s Library, where they go and digitize all the print records for all kinds of government agencies. But they also digitize things like books and music, and sometimes that includes copyrighted material. So there are two specific lawsuits at the center of this. The first was a case called Hachette v. Internet Archive, and that was brought against the Internet Archive by book publishers. They objected to the Archive’s practice of digitizing books and lending them out digitally, even though many of them were out of print.
Morgan Sung: Okay, but that sounds like a normal library thing.
Chris Egusa: Yeah, it kind of is, though there is a wrinkle to it because of this program they did in 2020 during the pandemic lockdowns. So before, the archive operated kind of how libraries normally do. They have a certain number of licenses, you can check each book out, but during this time it became unlimited access for anyone. Though I will say that Brewster and his team strongly dispute the idea that the case was about this pandemic era program at all. They say the lawsuit had been planned before that program ever started.
Either way, after a lengthy appeals process, the judge did rule against them, and the judgment required the Internet Archive to pay publishers an undisclosed amount. And even though the lawsuit was about like these specific 127 copyrighted works, the Archive ended up removing over 500,000 books from their digital collection, which free speech and pro-access people were very upset about because a lot of these books aren’t available anywhere else.
Morgan Sung: And that brings us to this next lawsuit. This was brought on by the music industry, two major record labels, Universal Music Group and Sony Music. What can you tell us about this case?
Chris Egusa: So this suit was brought against them in 2023 and it centers around another one of the Internet Archive’s programs. This one is called the Great 78 Project. And 78 stands for 78 RPM records. It’s a format that was super popular from the 1890s to like the late 1950s. And this program was this massive communal undertaking to digitize and preserve these very old 78s. They digitized and cataloged more than 400,000 of these recordings since starting the project in 2017. And they made those recordings available to the public to stream on their website. And the key here is how they thought about this project. They felt like they were undertaking the preservation of a defunct technology and the sound of American culture in a bygone era.
Brewster Kahle: Those older materials that were sort of foundational of what did America sound like are so obsolete that we went and we circulated in the industry conferences to say, “okay, there’s going to be this project, the Great 78 Project,” and libraries and archives, a hundred different ones came together to go inform this. The industry knew about it. They were all supportive that when we talked to them, it was all great.
Morgan Sung: But the record labels saw things differently.
Chris Egusa: They definitely did. In their lawsuit, the labels called the Great 78 Project quote, “wholesale theft of generations of music.” And they claim that by making the records available to stream for free, that the Internet Archive was displacing streams that generate royalties on platforms like Spotify or Apple Music, royalties that could have gone to either the platforms themselves or the copyright holders like the artists.
Morgan Sung: Well, do they have a point?
Chris Egusa: So obviously I’m not a legal expert, but here’s what the record labels say. In the suit, they’re focusing on 4,000 specific recordings that do have copyright protections. They are commercially available and many of them are still very popular, including Bing Crosby’s White Christmas, which is the best-selling single of all time. So I think it’s gonna be tough.
Morgan Sung: And we know what the actual amount that they’re suing for is, right?
Chris Egusa: So it’s $621 million. And just to put it in perspective, the Internet Archive’s operating budget is a tiny fraction of that, just around $30 million.
Brewster Kahle: If we’re found guilty of being a library and then that will cost us, yes, it would snuff the Internet Archive. And that may be the point.
Morgan Sung: That’s pretty bleak.
Chris Egusa: Yeah, it is. I will also add that regardless of where you land on, okay, was this copyright infringement or not, the details of the case strike me as kind of strange. First thing is there were a notably small number of streams per audio file in question. Um, like hundreds, or, you know, in some cases a few thousand. But not like hundreds of thousands or millions. And so, if you actually convert the number of streams to a dollar amount based on how much Spotify pays in royalties, you’re generally looking at a couple of dollars per audio file. A bit more in a few cases.
So clearly publishers are not experiencing, you know, dramatic monetary loss due to this relatively small number of streams. Right? But the record companies, they still decided to sue for the maximum amount under the law, which is $150,000 per record, even though they could have sued for less. They also never even asked the Internet Archive to take the records down. The Archive never received a request. They were just slapped with this lawsuit.
Brewster Kahle: And if we had gotten that list, we would have taken it down. And we did, once they sued it, you just give us the list and we would have taken them down.
Chris Egusa: But the other thing is, and I’m not saying that this argument will hold up in court, but like, I think about a platform like YouTube, right? YouTube gets copyrighted material uploaded to it constantly. And the way it works is that when an interested copyright holding party requests that they remove certain content, that content then gets taken down.
Brewster Kahle: I mean, that’s the way the internet basically works and those 78s are on YouTube. So it’s, so we basically have a, they’re after something else.
Chris Egusa: He thinks the publishing companies are going after the library system itself, the ability for people to access materials for free.
Brewster Kahle: The bigger picture that’s going on and the real contest is not about money, it’s actually about control. Can libraries own anything in the digital world? Is there digital ownership? That’s the central characteristic. And there’s a question, “is the United States going to have libraries have their traditional roles of buying, preserving, lending and interlibrary loan?”
Chris Egusa: But the case itself likely won’t move forward until later this year, so we’ll have to wait and see how that develops.
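The dollar figures quoted in this part of the conversation can be checked with some back-of-the-envelope arithmetic. The short Python sketch below uses only numbers said on the show ($621 million sought, $150,000 per recording, a roughly $30 million operating budget, and streams in the hundreds to a few thousand), plus one loudly flagged assumption: a per-stream royalty of $0.004, an illustrative round number that does not come from the episode or the court filings. The function names are hypothetical.

```python
# Figures quoted in the episode.
TOTAL_CLAIM = 621_000_000        # dollars sought by the labels
STATUTORY_MAX_PER_WORK = 150_000 # dollars per recording, as stated on the show
ANNUAL_BUDGET = 30_000_000       # the Archive's approximate operating budget

# ASSUMPTION: illustrative per-stream royalty, not a figure from the episode.
ASSUMED_ROYALTY_PER_STREAM = 0.004


def implied_recordings(total: int, per_work: int) -> int:
    """How many recordings the total implies at the per-recording maximum."""
    return total // per_work


def estimated_royalties(streams: int, rate: float = ASSUMED_ROYALTY_PER_STREAM) -> float:
    """Rough royalties, in dollars, for a given number of streams."""
    return round(streams * rate, 2)


if __name__ == "__main__":
    print(implied_recordings(TOTAL_CLAIM, STATUTORY_MAX_PER_WORK))  # 4140
    print(round(TOTAL_CLAIM / ANNUAL_BUDGET, 1))                    # 20.7
    for streams in (500, 3000):  # "hundreds" and "a few thousand"
        print(streams, estimated_royalties(streams))
```

Under that assumed rate, a recording streamed a few hundred times works out to about two dollars, in line with Chris's "couple of dollars per audio file," set against a claim of $150,000 for the same recording. The implied count of 4,140 recordings is also close to the roughly 4,000 recordings he says the suit focuses on.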
Morgan Sung: All right, changing gears a little bit. We are a tech show, so I feel like we’re almost contractually obligated to mention AI somehow in almost every episode that we make. But I don’t know, Chris, that feels like a new tab.
Chris Egusa: I think so.
Morgan Sung: Okay, new tab, Internet Archive and AI Legal Battles. AI companies have also been hit by big lawsuits from publishers, and you may not think of it at first, but AI companies like OpenAI, the maker of ChatGPT, and the Internet Archive have some similarities. They both use tools to scrape the web for data and text and other content. Of course, what they do is different. The Internet Archive stores and preserves it, while AI companies use it to train their models. What’s Brewster’s take on AI?
Chris Egusa: So he’s a big proponent actually.
Brewster Kahle: We’re using the AI technologies for a bunch of what may seem like mundane tasks, but are super helpful. Like putting metadata on all these government documents.
Chris Egusa: He says that one of the big problems with a site like the Internet Archive is that there’s just so much stuff on there. Organization can be a struggle and people visiting the site can get overwhelmed. AI can make all of that easier by tagging and categorizing the billions of pieces of media they have to make them more easily findable.
Brewster Kahle: I mean, if you go to archive.org anecdotally people say, “You know you kind of arrive and it’s just huge and it’s a mass and holy crow and I don’t know where to start!” And so if we could make that on ramp easier wouldn’t that be fantastic.
Chris Egusa: And as far as the lawsuits against AI companies, he thinks that the laws are tilted too far in favor of publishers and copyright holders and that they should be relaxed to allow AI companies to operate more easily.
Brewster Kahle: We don’t have regulatory clarity. So there are now 80 lawsuits around the AI world. So it’s going to be just who has more lawyers. And that’s going to end up with just a few gigantic players.
Morgan Sung: I mean, I’m actually so surprised that he’s pro-AI. We’ve talked a lot about how AI has ushered in this era where everything is essentially editable. So yeah, for somebody who’s so preoccupied with the preservation and accurate recording of history, I was surprised that he’d be so on board with technology that seems to be like the antithesis of that in some ways.
Chris Egusa: Yeah. One thing that is clear is that the outcomes of these AI lawsuits could impact the Internet Archive because they’re both about the enforcement of copyright law.
Morgan Sung: Right, it seems like there’s this trade-off where if you want a free and accessible internet where information is free and accessible, you also have to expect it to be scrapable.
Chris Egusa: 100%. And the Internet Archive uses scraping techniques similar to the ones AI companies use, like you said. Overall, it seems that he thinks that’s a trade-off worth making.
Morgan Sung: Okay, so we have the Internet Archive, this organization that provides all these public services that the internet has become dependent on. And we also have this massive lawsuit that threatens to shut the organization down. I mean, it feels like we’re in a moment where that possibility is more concerning than ever. We have political turbulence, disinformation, these new AI technologies that are making it harder and harder to get the truth.
Chris Egusa: Wait Morgan, Morgan…
Morgan Sung: Yeah?
Chris Egusa: Do you think that’s a new tab?
Morgan Sung: Okay, you know what? You’re so right, Chris. Do you wanna do the honors?
Chris Egusa: I would love to. Let’s open a new tab. What happens if the Internet Archive goes away?
Morgan Sung: We talked about this at the top of the episode, but the Internet Archive plays such a critical role in our information ecosystem. And like Brewster says, our ability to go back and check the record. I mean, that’s what we lose when we lose the Internet Archive.
Chris Egusa: It’s such an important issue, especially right now. Brewster says that after each presidential term, they go through and catalog all of the government websites, including the ones we talked about earlier.
Brewster Kahle: We have, since the year 2004, gone and done an end of term crawl to go and record all the federal websites that we possibly can to go and download and preserve what it looked like before the change and then right away after the change. And are there changes? Yes. Are there always changes? Yes. Are there changes that you agree with? Depends on how you voted, but the idea of a library is we’re there to preserve the record.
Morgan Sung: I think the logical question is, what if it does shut down? What then? Has Brewster even entertained this idea?
Chris Egusa: I think it’s hard for him to go there. Like, this is his life’s work. But he has definitely thought about the threat that looms if our ability to preserve our understanding of the past goes away. So he references George Orwell’s dystopian vision of the future from the book 1984.
Brewster Kahle: The image of the memory hole is just the idea that next to your desk is this hole that you can go and put the only copy of that newspaper in an incinerator and be able to change history is upon us. The average life of a web page is 100 days before it’s changed or deleted. If we do not actively collect them and preserve them and keep them accessible, we’re living in the memory hole universe of George Orwell.
Morgan Sung: Well, okay, is there any hope here? Is our only option to just give in and crawl into a memory hole and accept it?
Chris Egusa: I don’t think we have to accept the memory hole, and I certainly hope that we don’t. But I’ll wrap up with an observation about Brewster himself. So what struck me about him the most was he just has this unrelenting optimism. He seems to truly love what he does, and he believes in it so strongly, and he’s cultivated this team around him that really shares in that vision. So, even with the looming threat of this extinction-level lawsuit coming up from the music publishers, it’s like he can’t quite bring himself to imagine that the Internet Archive could really go away.
Brewster Kahle: Oh, I think we’re doing fine. I think that there might be pieces of the Internet Archive that are chiseled away by very powerful interests, but the idea of a library or even just the Internet Archive as an organization has got lots of support. So, can the Internet Archive go away? Yes. Would it be a bad thing? I would think so. But I think the real issues are going to be whether the legislatures and the judiciary go and side with people’s access to information in some way or another. We’ll see that play out over the next 25 years of the Internet Archive’s life.
Chris Egusa: For Brewster, it’s about the Internet Archive, of course, but it’s also so much more.
Brewster Kahle: I don’t know the words exactly, but in every librarian’s mind, those who control the past control the present, those who control the present control the future. The idea of a library is part of an ecosystem of how society remembers. That’s how it thinks of itself. If you were to erase the Internet Archive and the libraries, which is in many ways happening now, then we will live in a danger of having people be able to recast what happened. Oh, as a society that believes in universal education and the fulfillment of individual possibility, we just can’t let that happen.
Morgan Sung: So are you ready to close these tabs?
Chris Egusa: Let’s close these tabs.
Morgan Sung: Close All Tabs is a production of KQED Studios and is reported and hosted by me, Morgan Sung. Our producer is Maya Cueva. Chris Egusa is our Senior Editor. Jen Chien is KQED’s Director of Podcasts and helps edit the show. Original music and sound design by Chris Egusa. Additional music by APM. Mixing, mastering and additional sound design by Brendan Willard. Audience engagement support from Maha Sanad and Alana Walker. Katie Sprenger is our Podcast Operations Manager, and Holly Kernan is our Chief Content Officer. Support for this program comes from Birong Hu and supporters of the KQED Studios Fund.
Some members of the KQED podcast team are represented by the Screen Actors Guild-American Federation of Television and Radio Artists, San Francisco Northern California Local. Keyboard sounds were recorded on my purple and pink Dust Silver K84 wired mechanical keyboard with Gateron Red switches. If you have feedback or a topic you think we should cover, hit us up at CloseAllTabs@kqed.org. Follow us on Instagram at CloseAllTabsPod. And if you’re enjoying the show, give us a rating on Apple Podcasts or whatever platform you use. Thanks for listening!