AI Tools to Analyze Complex Documents

Generative AI is handy for condensing and simplifying vast amounts of info. Use cases for marketers and business execs include:

  • Explaining complicated documents, such as patents.
  • Summarizing long narratives into key points and definitions.
  • Extracting important sections of ebooks and whitepapers for blogs and promotional materials.

What follows are four gen AI tools to accomplish those tasks. To test, I instructed the tools to analyze a 2023 patent from Google for Search Generative Experience, with three prompts:

  • What is this document about?
  • How does SGE select external resources in its answers? Do the resources originate from organic search results?
  • Define essential terms from the document.

My goals were to evaluate the accuracy of the responses and check for “hallucinating” — inventing info.

ChatGPT

ChatGPT can now process uploaded documents, including PDF files. It provided a generic overview in response to the prompt, “What is this document about?”

But it was the only tool that answered the second question, “Do the resources originate from organic search results?”

According to the patent document, not all documents are selected solely from organic search results. The selection process involves different categories of search result documents:

Query-Responsive Search Result Documents: These are documents obtained in response to the original query submitted by the user.

Related-Query-Responsive Search Result Documents: These documents are derived from searches performed for queries related to the original query.

Recent-Search-Responsive Search Result Documents: These are documents that responded to recent queries, possibly related to the current query or user’s search history.

ChatGPT made up the names of the categories (the labels before each colon)! The patent does not use those terms.

So I prompted, “I don’t see these category names in the original document. Did you make them up?”

ChatGPT’s response:

You are correct, and I apologize for the confusion. The category names I used (“Query-Responsive Search Result Documents,” “Related-Query-Responsive Search Result Documents,” and “Recent-Search-Responsive Search Result Documents”) were indeed my own phrasing to summarize the concepts described in the patent. They are not directly quoted from the document.

The patent actually discusses the process of selecting documents based on their responsiveness to a given query, related queries, and recent searches. These selections can include documents that responded to the original query, as well as those responding to related or recent queries. My intent was to clarify this process, but I should have stated clearly that these category names were my own summarizations, not direct quotations from the document.

Thank you for pointing out this discrepancy.

ChatGPT’s response is an example of hallucinating, a weakness of many gen AI tools.
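A quick way to catch invented terminology like this is to check whether the terms a tool quotes appear verbatim in the source text. The following is a minimal Python sketch, not a definitive method. It assumes you have already extracted the document's text into a string. The function names and the sample passage are hypothetical, and the passage is a stand-in, not the patent's actual wording.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def find_unsupported_terms(terms: list[str], document_text: str) -> list[str]:
    """Return the terms that do not appear verbatim in the document text."""
    haystack = normalize(document_text)
    return [term for term in terms if normalize(term) not in haystack]


# Stand-in text for illustration only; it is NOT the patent's wording.
sample_text = """
Placeholder passage: documents responsive to the query, to related
queries, and to recent queries may be selected.
"""

# Terms an AI tool presented as if they came from the document.
claimed_terms = [
    "Query-Responsive Search Result Documents",
    "Related-Query-Responsive Search Result Documents",
    "related queries",
]

unsupported = find_unsupported_terms(claimed_terms, sample_text)
print(unsupported)
```

Any term in the printed list is missing from the text as written, so it deserves a manual look. An exact-match check like this only flags wording. It cannot tell whether a paraphrase is accurate, so it supplements reading the source rather than replacing it.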

Claude

Claude is an independent generative AI model from Anthropic. Unlike the other tools on this list, it does not rely on ChatGPT. In my testing, Claude was the most cautious in responding.

For example, the other three tools answered “Yes” to “Are the documents selected from the organic search results?” Only Claude offered a nuanced response:

The patent does not explicitly state if the documents are selected from organic search results.

However, there are a few contextual clues that suggest they are referring to organic search results:

    • The background describes providing query-based summaries of “search result documents” without specifying if these are organic or paid results. Typically “search results” refers to organic results returned by a search engine.
    • There is no mention of filtering out or handling paid/promoted result documents differently than organic documents when selecting documents to summarize.
    • The goal seems to be summarizing the most relevant documents to the search query to provide an overview for users. Organic search results are generally more optimized for relevancy than promotional or paid results.
    • Related queries are used to expand document selection. Related/expanded queries are typically used with organic document sets, not paid advertising results.

So while not definitively stated, the patent language strongly suggests the query-responsive documents being selected to generate summaries are organic search results rather than paid or promotional results.

PDF Pals

PDF Pals is a desktop application that offers 30 free questions for testing. It requires a ChatGPT API key for those tests.

Once installed, PDF Pals can process uploaded PDF documents.

The tool helpfully includes page numbers with its responses, making it easier to verify the info. Clicking any page number will take you to that section in the document.

In my testing, PDF Pals didn’t simplify the patent to my level of understanding. Its responses were too technical, even when I prompted it for simpler language. Still, the summaries were useful, albeit complicated.

AskYourPDF

AskYourPDF is a web app requiring no API key for testing. After scanning a document, AskYourPDF suggests optional follow-up questions. Like PDF Pals, it includes page numbers, although they are not clickable.

AskYourPDF’s responses were easier to understand than PDF Pals’ but, as a tradeoff, less comprehensive. And it didn’t extract definitions from the PDF patent, stating incorrectly that none were there.

Thus, in my testing, AskYourPDF was helpful for higher-level overviews but not for detailed analysis. A benefit of that approach, however, is likely fewer hallucinations.

Interestingly, all four tools analyzed the Google PDF patent slightly differently. Each provided unique explanations. The key is verifying the info. All of the tools made mistakes.

Ann Smarty