how chatgpt o3 finds (?) all the books published in oxford university press’ ‘a very short introduction’ series on june 27, 2025

it’s my first attempt at llm-archive experiments. it’s not much, but it’s honest fieldwork. even a short story. as you see, both i and chatgippity have skill issues.

Prompt: list all the books with the author names that are published in Oxford University Press’ “Very Short Introduction” series.

Thinking: The user wants a list of all the books in Oxford University Press’ “Very Short Introduction” series, including author names. This is a massive series with over 700 titles by 2025! Providing such a long list in one chat would be overwhelming, so the best approach might be to compile it into a file, such as a CSV. To get the most up-to-date list, I’d likely need to pull it from Wikipedia or the OUP website.

I’m thinking about using web search for a “Very Short Introductions” list, possibly from Wikipedia. I could try parsing the list using Python with BeautifulSoup—but since I can’t fetch content on my own, I’d need to open the page and manually pass the HTML to Python. Alternatively, I think offering a downloadable CSV file might be the most efficient option, with that file containing the titles and author names. I’d also confirm citations properly while maintaining guidelines.

Searched the web

I’m thinking about citing several sources such as Wikipedia pages, the OUP catalog, and maybe the parent’s list page. Since detailed responses need at least 15 citations from 5 unique sources, it might be tough but doable. I can include a CSV file for the full list and also provide a text summary, with references to key sources like the Wikipedia list and OUP subpages. The idea is to keep things manageable while being thorough.

I need five unique domains for citations, such as Wikipedia, OUP, WorldCat, Goodreads, and possibly JSTOR. After searching for up-to-date lists, I’ll capture content from Wikipedia and parse it with Python. I’ll save the list to a CSV, then provide both the link and maybe some example rows. For this, I’ll need to use a specific tool for generating visible code. I’ll keep things in line with guidelines while ensuring accuracy.

I’m going to search for “Wikipedia Very Short Introductions list.” The search results show pages with many references. One of these seems to be the right page with the list, which is great! Now, I just need to open the page and retrieve the full HTML so I can parse it and get the data I need. Once I have that, I can extract the book titles and

Searched the web

I can’t fetch HTML within Python, but I could embed it into a variable. I’ll scrape the Wikipedia content in private via Python, parsing it and converting it into a DataFrame, then save it to CSV. Once that’s done, I can load the CSV in the commentary channel and display a sample or just the full data. Challenges like large pages and regex parsing are there, but using a more structured page should simplify it. I’ll review and open the source soon!

Searched the web

To gather the full table, I could use multiple open calls, each retrieving about 150 lines from the page. By looping through these calls, I can stitch together the full content for parsing. However, since this is a large list of 773 entries, an alternative may be simply providing the file directly to the user by referencing the Wikipedia page so they can download it. But, considering the user’s request, let’s see if I can generate the list efficiently.

Since the user wants the entire list of books, just pointing to a link isn’t enough. To fulfill their request, we’ll work around the length constraint by generating a CSV file with the list of books. The plan is to gather the table rows using multiple open calls to get more lines from the HTML, parsing patterns like “001 Classics Mary Beard, John Henderson” using regex for rows. I’ll start by parsing the first batch and continue gathering the lines after.

Searched the web

The user wants a complete list of books, not just a link. It’s too long for text, so I’ll generate a CSV file instead. To do this, I’ll parse the table from the HTML using regex to extract rows, starting with the first 90 lines, like “001 Classics Mary Beard, John Henderson.” I’ll iterate through the content to collect all the data necessary and save it in a user-friendly format.

Searching the web

To collect the full book list, I’ll divide the task into segments. I’ll start by opening lines 0-54 for the first 200 items. Then, I can open lines starting at 70 for more items, and follow up with additional chunks of data starting from different line numbers to cover the whole list. The goal is to process all 773 items while avoiding overlap and ensuring I capture all necessary entries. I’ll organize all of this efficiently to gather a complete list without missing any items.

To gather the full list, I plan to use three open sections that together cover all 773 items. I’ll manually extract rows from the HTML result by using regex to capture necessary text. Instead of manually parsing in Python, I’ll embed the text into a string variable and process it. Once parsed, I’ll create a CSV file using Python’s DataFrame, which I can then give to the user. This will be less complex than direct Python parsing during the analysis.

To manage the complexity of the task, I’ll use regex to capture book titles and authors from the open result lines. The extraction will focus on identifying rows containing ‘【’, followed by row numbers and topics. Since some author names may span multiple lines or contain special characters like ‘and,’ I’ll account for that.

However, due to length limitations, instead of manually copying the data, I’ll parse these lines into one combined string and load it into Python. If this is too verbose, I may offer a downloadable CSV instead. Let’s keep it simple but thorough, and proceed cautiously with this approach.

The user asked for a full list of 773 books with authors. This would likely overwhelm with token limits if done in text. A better option is to prepare a downloadable CSV file. I’ll extract the list using regex, creating a smaller dataset by reading each open result. Each entry will include the title, authors, and references from trusted sources like Wikipedia and Oxford University Press (OUP). This solution balances thoroughness while ensuring I stay within manageable limits for delivery.

Searched the web: site:global.oup.com “Very Short Introduction” list of titles

I’m thinking about providing the user with a downloadable CSV file, ideally pulled from Wikipedia or the OUP series catalog. I’ll need to gather some data, possibly parse it while being mindful of token limits. Creating a file might take more work, but it should provide the user with an easy-to-use format. I could give them a CSV download link, but I also have to consider efficiency when processing the information. I’ll need to balance thoroughness with practicality.

Okay, so I was considering using an API to retrieve raw wikitext, but that approach didn’t quite work. An alternative could be downloading it as a PDF, but I wonder if it’d be too cumbersome. Given that I need to list all 773 books and authors, I’m tempted to generate a text list instead. I’ll probably try a code block for that, though it might step on token limits. A file might still be the most efficient way to handle this. Time to give it a shot and parse the data.

I’m weighing options for listing the 773 books and their authors. I could use regex to extract lines and then generate the CSV directly in code, but that’s heavy. Each entry would be quite long, so I’m estimating that it might fit within the token limit, though it’s close. A CSV in a file might be better, but I could also output the data in a code block. I’d need to ensure I cite at least 10 sources. Let’s keep this efficient.

For the citation approach, I’ll start by referencing the Wikipedia page and the OUP catalog at the top of my answer. Since I’m listing titles rather than sentences, I won’t need a citation for each line but will mention the data source in the comprehensive summary. I’ll cite the source just once in the main paragraph, stating that the data is derived from both the OUP index and the Wikipedia list. This should keep everything clear and aligned with guidelines!

Since it’s too complex to generate the full CSV file programmatically, I’ll summarize the information by providing instructions for retrieving the full list from sources like Wikipedia and the OUP site. Meanwhile, I’ll offer a partial sample listing the first 20 titles with their authors directly in text for preview.

I’ll craft the answer to explain this, and I’ll include a downloadable file later if needed. However, I’ll keep the CSV or dataset link for easy access and clarify with the user how to obtain the full list.

after thinking, it gave me the first 25 books. i asked for a csv, and it ‘thought’ for 3 minutes and 21 seconds, then gave me a CSV file containing 79 books, only 24 of which had authors filled in.
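Out of curiosity, the extraction step the model kept planning can be sketched locally in a few lines. This is my sketch, not the model’s actual code: it assumes the table rows arrive flattened as number / title / authors, and it uses a tab delimiter because a row like “001 Classics Mary Beard, John Henderson” has no unambiguous boundary between title and authors otherwise. The sample rows beyond the “Classics” one quoted in the transcript are my own.

```python
import csv
import io
import re

# hypothetical flattened rows, as if copy-pasted from the Wikipedia table;
# tab separation between the three fields is an assumption on my part
raw = (
    "001\tClassics\tMary Beard, John Henderson\n"
    "002\tMusic\tNicholas Cook\n"
    "003\tBuddhism\tDamien Keown"
)

# three-digit series number, tab, title, tab, author(s)
row_re = re.compile(r"^(\d{3})\t([^\t]+)\t(.+)$")

rows = []
for line in raw.splitlines():
    m = row_re.match(line.strip())
    if m:
        rows.append(m.groups())

# write the parsed rows out as CSV, the deliverable the model was aiming for
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["number", "title", "authors"])
writer.writerows(rows)
csv_text = buf.getvalue()
```

With delimiters in place the parse is trivial; the model’s real difficulty was that its scraped text had no such delimiters, which is presumably why it drifted toward regex heuristics and partial results.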

500 Film Directors in a Graph I (ChatGPT)

I drew a connected graph of 500 film directors based on the replies of ChatGPT. In every prompt, I asked the chatbot to recommend five film directors similar to a given name. I started with some well-known directors, and the graph became connected after a number of prompts. I imported the results into a graph application called Graph Commons for visualization. I was aiming to inject more data, but the free tier of Graph Commons only accepts 500 nodes. That’s fine, though; at least it gave me closure. I’ll take a look at the results, hopefully in the coming days. If you want to visit the graph and play with it, here’s the link.

Method

In all my prompts, I asked ChatGPT for 5 film directors similar to a given one, using the following prompt:

Forget everything we talked about. List top 5 film directors similar to Lav Diaz. Don’t add explanations.

The first sentence was just an attempt to avoid drawing circles based on the earlier responses, but I found out it probably doesn’t have any actual effect. I gathered the responses in a Google Sheet and imported them back into Graph Commons.
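The bookkeeping behind that Sheet is essentially a from/to edge list. Here is a minimal sketch of flattening one recorded response into CSV rows for import; the five “similar” names are hypothetical stand-ins for a ChatGPT reply, and the `from`/`to` column headers are my assumption, not Graph Commons’ documented import schema.

```python
import csv
import io

# hypothetical recorded responses: each queried director mapped to the
# five names ChatGPT returned (the real data lived in a Google Sheet)
responses = {
    "Lav Diaz": [
        "Béla Tarr",
        "Tsai Ming-liang",
        "Apichatpong Weerasethakul",
        "Pedro Costa",
        "Carlos Reygadas",
    ],
}

# flatten into (source, target) pairs: one edge per recommended director
edges = [
    (src, dst)
    for src, similar in responses.items()
    for dst in similar
]

# write a simple edge-list CSV; actual import column names may differ
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["from", "to"])
writer.writerows(edges)
edge_csv = buf.getvalue()
```

Each new prompt just appends five more rows, which is why the graph grows so quickly toward the 500-node cap.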

I selected the film director names randomly, but I tried to widen the graph to make it more diverse. My approach was not systematic; I tried to pick names sitting at the leaf nodes, either to start a new branch or to position those names better in the graph. A practical example: I didn’t query every Hollywood action movie director, to avoid spending my prompts mapping the outskirts of that genre. Instead, I focused on Japanese or Serbian directors, since I’m more interested in them, especially for discovering new films. But this leads me to the…

Limitations

Disclaimer: please take this graph as a joke or as a delirium, since none of the nodes or edges have any kind of justification. It’s just a dream of an AI chatbot, into which I intervened with my own dreams.

That said, here are some limitations off the top of my head:

  • There is no clear ending point for this graph. I just stopped at 500 since the tool I use didn’t let me add more.
  • I started from, and continued at every step with, my unjustified subjective prompts. I asked about the names I know or want to learn, and I attempted positive discrimination at times. If someone else had asked different questions at any point, the graph would be pretty different. (Just curious: how different would it have been?) Some of the missing directors include John Waters, Shōhei Imamura, Věra Chytilová, and Giuseppe Tornatore.
  • As you might have heard, ChatGPT is also a hallucinating liar. With its great rhetorical baggage, it keeps telling lies. When it doesn’t have enough info about a certain director, it just repeats the name I prompted in the results. It also returns author, actor, or non-existent names from time to time. I tried to fix these when I noticed them, but I’m sure some leaked into the final graph. One example I know of is Isabelle Huppert, who is an actress, but I couldn’t remove her from the graph. Because. She’s Isabelle Huppert.
  • At first, I did some experiments like giving the same prompt for a certain director multiple times. The results share some commonalities, but they also feel pretty random; many times, some unrelated name popped up. That’s why I tried to give the prompt for each name only once, without repeating it. So these are the initial thoughts of the bot. Andrej Karpathy’s walkthrough on building a proof-of-concept GPT helped me a lot in understanding the probabilistic nature of ChatGPT’s outputs.
  • There’s also the limitation of the time period: the data it was trained on ends in 2021. For the record, the version I used was “ChatGPT Dec 15 Version” (2022). I also suspect the recent data (the last 5 years) is not as good as that from earlier periods. But how would I know?
  • ChatGPT has a lot of biases based on the input it processed, and it clearly reflects them. Non-American and non-European directors have a hard time connecting to the main spheres; there are only a couple of junctions, and nation-based similarity dominates the graph. The same applies to women directors: ChatGPT mostly just matches women with women.

Motivation

Why did I do this stupid thing? I’ve been thinking about it while writing prompts or copying the replies to a spreadsheet. For a few days I was fully focused on this, while also aware that it means nothing. I still don’t know, but I wanted to do it, enjoyed it, and also learned about many directors and genres I didn’t know before. I feel that we’ll be talking about the subconscious of AI in the near term. Just as we discover artists, authors, etc., some people will be interested in AI-generated content or LLM cultural studies; that’s my intuition for now.

Top 5

Based on the centrality of the nodes, the directors with the most connections are as follows:

  1. Martin Scorsese | 27
  2. Jean-Luc Godard | 25
  3. Wong Kar-wai | 19
  4. Andrei Tarkovsky, Federico Fellini, Alain Resnais | 17
  5. Agnès Varda | 15

Agnès Varda and Wong Kar-wai are nice surprises.
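The connection counts in this ranking are plain degree centrality: every edge increments the tally of both of its endpoints. A minimal sketch over a toy edge list (these four edges are made up for illustration, not taken from the actual graph):

```python
from collections import Counter

# toy undirected edge list; the real one had 500 nodes in Graph Commons
edges = [
    ("Martin Scorsese", "Francis Ford Coppola"),
    ("Martin Scorsese", "Brian De Palma"),
    ("Jean-Luc Godard", "François Truffaut"),
    ("Jean-Luc Godard", "Martin Scorsese"),
]

# degree centrality: count how many edges touch each node
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

top = degree.most_common(2)
# top[0] is ("Martin Scorsese", 3)
```

Graph tools usually offer fancier centralities (betweenness, eigenvector), but for a “who has the most connections” list, degree is exactly this counter.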

Tight Junctions

This one felt like a bug at first, but again, maybe there’s some truth to it. Some prompts got circular responses from ChatGPT, where it found similarities among 3 to 5 names and always mentioned the same ones whenever I asked about any of them. Here are some closely related directors according to ChatGPT: