LLM Input/Output - Output Parsers
Chat Models and LLMs
LangChain does not build its own Large Language Models (LLMs); instead, it provides a single standard interface for interacting with existing models from providers such as OpenAI and Hugging Face.
Accessing Commercial LLMs like ChatGPT
from langchain_openai import ChatOpenAI

# instantiate the model
llm = ChatOpenAI(
    model='gpt-3.5-turbo',
    temperature=0
)
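ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable (it can also be passed directly to the constructor), so set it before instantiating the model; the value below is a placeholder, not a real key:

```python
import os

# set the key before creating the ChatOpenAI instance;
# "sk-..." is a placeholder, substitute your own key
os.environ["OPENAI_API_KEY"] = "sk-..."
```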
Output Parsers
Output parsers in LangChain are crucial for structuring responses from language models. Here are examples of LangChain's specific parser types:
- PydanticOutputParser: uses Pydantic models to ensure outputs match a specified schema, providing type checking and coercion similar to Python dataclasses.
- JsonOutputParser: ensures outputs adhere to an arbitrary JSON schema, with Pydantic models optionally used to declare the data structure.
- CommaSeparatedListOutputParser: extracts comma-separated values from model outputs, useful for lists of items.
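Whatever the parser type, the job is the same: take the raw text a model returns and turn it into a structured Python object. A minimal library-free sketch of that idea (the helper below is hypothetical, not part of LangChain):

```python
import json

# hypothetical helper illustrating what an output parser does: models often
# wrap JSON in markdown fences, so strip them before deserializing
def parse_json_output(raw: str) -> dict:
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        # drop the opening fence (with its optional "json" tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(cleaned)

raw = '```json\n{"pros": "1. Fast", "cons": "1. Costly"}\n```'
parsed = parse_json_output(raw)
```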
PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
# define the desired data structure
class QueryResponse(BaseModel):
    description: str = Field(description="A brief description of the topic asked by the user")
    pros: str = Field(description="three points showing the pros of the topic asked by the user")
    cons: str = Field(description="three points showing the cons of the topic asked by the user")
    conclusion: str = Field(description="summary of topic asked by the user")
# Set up a parser and add instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=QueryResponse)
print(parser)
PydanticOutputParser(pydantic_object=<class '__main__.QueryResponse'>)
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema
{"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of topic asked by the user", "type": "string"}}, "required": ["description", "pros", "cons", "conclusion"]}
# create final prompt with formatting instructions from the parser
prompt_txt = """
Answer the user query and generate the response based on the following formmatted instructions:
formatted instructions:
{format_instructions}
Query:
{query}
"""
prompt = PromptTemplate(
template=prompt_txt,
input_variables=["query"],
partial_variables={"format_instructions": parser.get_format_instructions()}
)
print(prompt)
PromptTemplate(input_variables=['query'], partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of topic asked by the user", "type": "string"}}, "required": ["description", "pros", "cons", "conclusion"]}\n```'}, template='\n Answer the user query and generate the response based on the following formmatted instructions:\n\n formatted instructions:\n {format_instructions}\n\n Query:\n {query}\n ')
chain = (prompt | llm | parser)
question = "Tell me about the carbon sequestration"
# invoke chain
response = chain.invoke({"query": question})
# get the response
print(response)
QueryResponse(description='Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.', pros='1. Helps reduce greenhouse gas emissions. 2. Can help restore degraded lands. 3. Provides economic opportunities in carbon offset markets.', cons='1. Requires significant investment and technology. 2. Long-term storage risks and uncertainties. 3. Potential for negative environmental impacts if not managed properly.', conclusion='Overall, carbon sequestration has the potential to play a significant role in addressing climate change, but careful planning and monitoring are essential to ensure its effectiveness and sustainability.')
print(response.description)
'Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.'
# printing as dictionary
response.dict()
{'description': 'Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.',
'pros': '1. Helps reduce greenhouse gas emissions. 2. Can help restore degraded lands. 3. Provides economic opportunities in carbon offset markets.',
'cons': '1. Requires significant investment and technology. 2. Long-term storage risks and uncertainties. 3. Potential for negative environmental impacts if not managed properly.',
'conclusion': 'Overall, carbon sequestration has the potential to play a significant role in addressing climate change, but careful planning and monitoring are essential to ensure its effectiveness and sustainability.'}
for key, value in response.dict().items():
    print(f"{key}:\n{value}\n")
description:
Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and reduce the impact of climate change.
pros:
1. Helps reduce greenhouse gas emissions. 2. Can help restore degraded lands. 3. Provides economic opportunities in carbon offset markets.
cons:
1. Requires significant investment and technology. 2. Long-term storage risks and uncertainties. 3. Potential for negative environmental impacts if not managed properly.
conclusion:
Overall, carbon sequestration has the potential to play a significant role in addressing climate change, but careful planning and monitoring are essential to ensure its effectiveness and sustainability.
JsonOutputParser
from typing import List
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
# define the data structure
class QueryResponse(BaseModel):
    description: str = Field(description="A brief description of the topic asked by the user")
    pros: str = Field(description="three points showing the pros of the topic asked by the user")
    cons: str = Field(description="three points showing the cons of the topic asked by the user")
    conclusion: str = Field(description="summary of topic asked by the user")
# set up parser
parser = JsonOutputParser(pydantic_object=QueryResponse)
print(parser)
JsonOutputParser(pydantic_object=<class '__main__.QueryResponse'>)
# create final prompt with formatting instructions from the parser
prompt_txt = """
Answer the user query and generate the response based on the following formmatted instructions:
formatted instructions:
{format_instructions}
Query:
{query}
"""
# create a template for a string prompt
prompt = PromptTemplate(
template=prompt_txt,
input_variables=["query"],
partial_variables={"format_instructions": parser.get_format_instructions()}
)
print(prompt)
PromptTemplate(input_variables=['query'], partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of topic asked by the user", "type": "string"}}, "required": ["description", "pros", "cons", "conclusion"]}\n```'}, template='\n Answer the user query and generate the response based on the following formmatted instructions:\n\n formatted instructions:\n {format_instructions}\n\n Query:\n {query}\n ')
# create a chain
chain = (prompt | llm | parser)
print(chain)
PromptTemplate(input_variables=['query'], partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"description": {"title": "Description", "description": "A brief description of the topic asked by the user", "type": "string"}, "pros": {"title": "Pros", "description": "three points showing the pros of the topic asked by the user", "type": "string"}, "cons": {"title": "Cons", "description": "three points showing the cons of the topic asked by the user", "type": "string"}, "conclusion": {"title": "Conclusion", "description": "summary of topic asked by the user", "type": "string"}}, "required": ["description", "pros", "cons", "conclusion"]}\n```'}, template='\n Answer the user query and generate the response based on the following formmatted instructions:\n\n formatted instructions:\n {format_instructions}\n\n Query:\n {query}\n ')
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7d2609ab6fe0>, async_client=<openai.resources.chat.completions.AsyncCompletions>, temperature=0.0)
| JsonOutputParser(pydantic_object=<class '__main__.QueryResponse'>)
queries = [
    "Tell me about the carbon sequestration",
    "Tell me about backpropagation algorithm in machine learning"
]
queries_formatted = [{"query": subject} for subject in queries]
print(queries_formatted)
[{'query': 'Tell me about the carbon sequestration'},
{'query': 'Tell me about backpropagation algorithm in machine learning'}]
# get the response
responses = chain.map().invoke(queries_formatted)
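`chain.map()` lifts a single-input chain into one that accepts a list of inputs (recent langchain-core releases expose `chain.batch(...)` for the same purpose). Conceptually it just applies the chain once per item, as in this library-free sketch with a stand-in chain:

```python
# stand-in for (prompt | llm | parser) applied to one input
def run_chain(inp: dict) -> str:
    return inp["query"].upper()

# conceptual equivalent of chain.map().invoke(inputs):
# run the single-input pipeline once per item and collect the results
def map_invoke(chain, inputs: list) -> list:
    return [chain(i) for i in inputs]

results = map_invoke(run_chain, [{"query": "a"}, {"query": "b"}])
```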
import pandas as pd
# convert response to DataFrame
data = pd.DataFrame(responses)
print(data)
|   | description | pros | cons | conclusion |
|---|---|---|---|---|
| 0 | Carbon sequestration is the process of capturi... | 1. Helps reduce greenhouse gas emissions. 2. C... | 1. Requires significant investment and technol... | Carbon sequestration has the potential to play... |
| 1 | Backpropagation is a key algorithm used in tra... | 1. Backpropagation allows neural networks to l... | 1. Backpropagation can suffer from the vanishi... | In conclusion, backpropagation is a powerful a... |
for response in responses:
    for key, val in response.items():
        print(f"{key}:\n{val}\n")
    print('-----------------------------------------------------------')
description:
Carbon sequestration is the process of capturing and storing carbon dioxide to mitigate its presence in the atmosphere and combat climate change.
pros:
1. Helps reduce greenhouse gas emissions. 2. Can help improve soil quality. 3. Provides potential economic opportunities in carbon trading.
cons:
1. Requires significant investment and technology. 2. Some methods may have limited effectiveness. 3. Long-term storage risks and uncertainties.
conclusion:
Carbon sequestration has the potential to play a significant role in addressing climate change, but it also comes with challenges and uncertainties that need to be carefully considered.
-----------------------------------------------------------
description:
Backpropagation is a key algorithm used in training artificial neural networks in machine learning. It involves calculating the gradient of a loss function with respect to the weights of the network, and then using this gradient to update the weights in order to minimize the loss.
pros:
1. Backpropagation allows neural networks to learn complex patterns and relationships in data. 2. It is an efficient way to optimize the weights of a neural network. 3. Backpropagation can be used in various types of neural networks, such as feedforward and recurrent networks.
cons:
1. Backpropagation can suffer from the vanishing gradient problem in deep neural networks. 2. It requires a large amount of labeled training data to perform well. 3. Backpropagation can be computationally expensive, especially for large networks.
conclusion:
In conclusion, backpropagation is a powerful algorithm that has been instrumental in the success of neural networks in machine learning, despite some limitations.
-----------------------------------------------------------
CommaSeparatedListOutputParser
from langchain_core.output_parsers import CommaSeparatedListOutputParser
# output parser
output_parser = CommaSeparatedListOutputParser()
# get formatted instructions
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'
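Parsing a comma-separated response is essentially a split-and-strip. A library-free sketch of that behaviour (a hypothetical function, not the parser's actual source):

```python
# split on commas and trim whitespace around each item, mirroring
# the kind of list CommaSeparatedListOutputParser produces
def parse_csv_list(raw: str) -> list:
    return [item.strip() for item in raw.split(",")]

items = parse_csv_list("foo, bar,baz")
```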
# create final prompt with formatting instructions from the parser
prompt_txt = """
List 5 real-world use cases where object detection can be used:
output format instructions:
{format_instructions}
"""
prompt = PromptTemplate.from_template(template=prompt_txt)
print(prompt)
PromptTemplate(input_variables=['format_instructions'], template='\n List 5 real-world use cases where object detection can be used:\n\n output format instructions:\n {format_instructions}\n\n ')
chain = (prompt | llm | output_parser)
response = chain.invoke({'format_instructions': format_instructions})
# loop through the response, as it is a list
for r in response:
    print(r)
1. Autonomous vehicles for detecting pedestrians cyclists and other vehicles on the road
2. Retail stores for tracking inventory levels and monitoring product placement
3. Security systems for identifying unauthorized individuals entering restricted areas
4. Healthcare for analyzing medical images and detecting abnormalities or diseases
5. Agriculture for monitoring crop health and identifying pests or diseases in plants