In the corporate world, we attend numerous meetings every day. Some run longer than expected, and we may forget to take notes on essential points. Keeping meeting minutes is vital, so what if it could be done automatically?
With so much data moving through the digital environment, there is a need for algorithms that can automatically condense lengthy texts so that the intended message is communicated clearly. With that objective, we brought a text summarization model into our working life by automating the summarization of meeting minutes.
The summarization involves the following steps:
- Speech-to-text conversion with the help of AWS Transcribe.
- Extractive and abstractive summarization of the transcript using models and algorithms.
Speech to Text conversion using AWS Transcribe:
Amazon Transcribe provides high-quality and accurate speech-to-text transcription for a wide range of use cases. AWS Transcribe can also operate on streaming audio, providing a stream of transcribed text in real-time.
To transcribe audio files with AWS Transcribe, the files must first be stored in Amazon S3, because a transcription job reads its input from S3. The maximum number of speakers to identify in the audio can be set with the max_speakers parameter.
The steps for transcribing speech to text using AWS Transcribe are as follows:
1. Create an AWS account
2. Go to the AWS Management Console page.
3. Click on your username on the top right, and choose “My Security Credentials.”
4. Choose “Access keys (access key ID and secret access key).”
5. Create new keys, and remember to save them!
6. Add the keys to the code and initialize the transcription job.
7. AWS Transcribe transcribes files from your S3 storage, so upload a sample audio/video file to S3.
8. Fill in the necessary details and run the code.
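As a minimal sketch of the "initialize the transcription job" step, assuming the boto3 SDK and AWS credentials configured in the environment, the request could be built and submitted as below. The helper names, bucket, key, and region are hypothetical. Mapping max_speakers to the `MaxSpeakerLabels` setting is an assumption about how the original code used that parameter.

```python
from typing import Optional


def build_transcription_request(job_name: str, bucket: str, key: str,
                                media_format: str = "mp3",
                                language_code: str = "en-US",
                                max_speakers: Optional[int] = None,
                                vocabulary_name: Optional[str] = None) -> dict:
    """Build keyword arguments for boto3's start_transcription_job (hypothetical helper)."""
    request = {
        "TranscriptionJobName": job_name,
        # The media file must already be in S3.
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }
    settings = {}
    if max_speakers is not None:
        # Speaker labels must be switched on for a speaker limit to apply.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name is not None:
        # Optional custom vocabulary for domain-specific words.
        settings["VocabularyName"] = vocabulary_name
    if settings:
        request["Settings"] = settings
    return request


def start_job(request: dict, region_name: str = "us-east-1") -> str:
    """Submit the job and return its initial status. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without the AWS SDK

    client = boto3.client("transcribe", region_name=region_name)
    response = client.start_transcription_job(**request)
    return response["TranscriptionJob"]["TranscriptionJobStatus"]
```

Keeping the request builder separate from the network call makes it easy to check the job settings before anything is sent to AWS.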
We can also create a custom vocabulary so that complex or domain-specific words in the audio file are recognized. AWS Transcribe outputs a JSON file, which is parsed into the required format shown below:
[0:00:02] spk_0: segments of invoices and time. So there are two things that were not Oneness, a Russian based on Mind Games and the other is version version Mind Mind Get project and the other is based on uh in 25 10 deep learning models that will automatically both tagged data and uh extract basic layouts of information like for example addresses and our own transport. So there are two versions of that we're working on on. So that's about the document. [0:00:49] spk_1: Yeah. Hi Ash.I'm working on the documentation portion. Actually, I'm just rewriting updating the FBI and points like documents like I'm just building a comprehensive documentation which the team requested.
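The parsing step that produces a transcript in this format can be sketched roughly as follows. This assumes the job was run with speaker labels enabled, so the JSON contains `results.speaker_labels` alongside `results.items`. The function name and the `spk_unknown` fallback are hypothetical, and the original parser may differ.

```python
from datetime import timedelta


def format_transcript(transcribe_output: dict) -> str:
    """Turn Transcribe's JSON into '[h:mm:ss] speaker: text' turns."""
    results = transcribe_output["results"]

    # Map each word's start time to the speaker that diarization assigned to it.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns = []  # each turn is [start_seconds, speaker, text]
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation carries no timestamp; attach it to the current turn.
            if turns:
                turns[-1][2] += content
            continue
        speaker = speaker_at.get(item["start_time"], "spk_unknown")
        if turns and turns[-1][1] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1][2] += " " + content
        else:
            # Speaker changed: open a new turn stamped with its start time.
            turns.append([float(item["start_time"]), speaker, content])

    return " ".join(
        f"[{timedelta(seconds=int(start))}] {speaker}: {text}"
        for start, speaker, text in turns
    )
```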
Summarization
There are two types of summarizations:
Extractive summarization identifies the important sections of the text and reproduces them verbatim, so the summary is a subset of the sentences in the original text.
Abstractive summarization interprets and examines the text using advanced natural language techniques, then expresses the important material in new wording.
Extractive Summarization
TextRank is an unsupervised, extractive text summarization technique. The flow of the TextRank algorithm is depicted in the diagram below.
Sample input
Vikatan. The trump administration has ordered the military to start withdrawing refused 7,000 troops from afghanistan in the coming months. To defence official statistics. Abrupt shift in the 17 year old war there and a decision that start afghan officials. Put the had not been on the planet. President trump made the decision to pull the troops. Khasra number united states has in afghanistan now. At the same time he decided to pull american forces out of syria. Man official site. Denouncement came hours after jim mattis the secretary of defence said that he would resign from his position at the end of february of the disagree with the president over his approach to policy in the middle east. The point of troop withdrawal and the resignation of mr matches leave a mark iii picture for what is next in the united states longest war. And the comments of gangster has been troubled by spasms of violin afflicting the capital kabul and other important area. The united states has also been conducting talks with the representatives of the taleban. In what officials have described as discussions that could lead to formal talks to end the conflict. Senior afghan officials and dustin diplomas in kabul woke up to the shock of the news on friday morning. And many of them based foki of ahead. Several afghan official often the loop on security planning and decision making. Said that they had received no indication in recent days that the americans would pull the troops out. The fear that mr trump might take a action however often leave the background of the questions with the united states they said. This on the abrupt decision as a further signed at voices from the ground was lacking in the debate over the war and that with mister mister mat is ignition afghanistan had lost one of the last influential voices in washington to channel that reality of the conflict into the white house deliberation. The president long campaigned on building troops home. 
But it 2017 at the request of mr machis ki begrudgingly flight an additional 4000 rupees to the afghan campaign to try to a sim. An end to the conflict. Do pentagon officials have said that the influx of forces coupled with a more aggressive their campaign was helping the war effort afghan forces continue to take nearly unsustainable level of casualties and lose ground to the taleban. The new american effort in 2017 was the first step in ensuring afghan forces could become more independent without a set timeline for withdrawal. But with plans to quickly reduce the number of american troops in the country it is unclear if the afghans can hold their own against increasingly aggressive taleban. Currently american air strike for reply was not seen the in the height of the war. When tens of thousands of american troops were spread throughout the country. The air support. Official se consists mostly of popping up. Afghan troops while they try to hold territory from a resurgent paliwal.
Sample output
President trump made the decision to pull the troops. Do pentagon officials have said that the influx of forces coupled with a more aggressive their campaign was helping the war effort afghan forces continue to take nearly unsustainable level of casualties and lose ground to the taleban. Abrupt shift in the 17 year old war there and a decision that start afghan officials. Afghan troops while they try to hold territory from a resurgent paliwal. Several afghan official often the loop on security planning and decision making. But with plans to quickly reduce the number of american troops in the country it is unclear if the afghans can hold their own against increasingly aggressive taleban. When tens of thousands of american troops were spread throughout the country. The fear that mr trump might take a action however often leave the background of the questions with the united states they said. The new american effort in 2017 was the first step in ensuring afghan forces could become more independent without a set timeline for withdrawal. Said that they had received no indication in recent days that the americans would pull the troops out
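The TextRank idea can be illustrated with a small, dependency-free sketch: split the text into sentences, connect sentences by a similarity weight, run a PageRank-style iteration over that graph, and keep the highest-scoring sentences. The regex sentence splitter and the word-overlap similarity used here are assumptions chosen for brevity, and the function names are hypothetical. The pipeline that produced the sample output above may have used a different similarity measure.

```python
import math
import re


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: set, b: set) -> float:
    """Word overlap normalized by the log of the sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, top_n: int = 3,
                     damping: float = 0.85, iterations: int = 50) -> list:
    sentences = split_sentences(text)
    words = [set(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    n = len(sentences)

    # Weighted, undirected sentence graph as an adjacency matrix.
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]

    # PageRank-style iteration: a sentence is important if it is
    # similar to other important sentences.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]

    # Return the top sentences, highest score first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]
```

The sketch returns sentences in score order. Re-sorting the selected sentences by their original position is a common variant when the summary should read chronologically.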
Abstractive summarization using a custom dataset
Abstractive summarization is performed using the Text-To-Text Transfer Transformer (T5) model. T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks, in which each task is converted into a text-to-text format.
The SAMSum dataset, which contains about 16k chat dialogues with hand-annotated summaries, was used to train the model. In addition to this, I added a few of the meeting transcripts from our daily status update calls.
https://metatext.io/datasets/samsum
We are using the T5 base model with simpleT5, which expects a data frame with two columns: “source_text” and “target_text”. The T5 model also expects a task-related prefix; since this is a summarization task, we add the prefix “summarize:”. The model was trained for 10 epochs with a batch size of 8, and it produces a fair summary.
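The data preparation and training setup described above can be sketched as follows. This assumes the simplet5 and pandas packages are installed and that the input records use SAMSum's `dialogue` and `summary` fields. The function names and the token-length limits are hypothetical choices, not values from the original training run.

```python
def to_simplet5_rows(records: list) -> list:
    """Rename SAMSum-style fields and add the task prefix T5 expects."""
    return [
        {
            "source_text": "summarize: " + r["dialogue"].strip(),
            "target_text": r["summary"].strip(),
        }
        for r in records
    ]


def train_summarizer(train_rows: list, eval_rows: list, use_gpu: bool = True):
    """Fine-tune t5-base with simpleT5. Needs pandas, simplet5 and ideally a GPU."""
    import pandas as pd
    from simplet5 import SimpleT5

    model = SimpleT5()
    model.from_pretrained(model_type="t5", model_name="t5-base")
    model.train(
        train_df=pd.DataFrame(train_rows),
        eval_df=pd.DataFrame(eval_rows),
        source_max_token_len=512,   # assumed limit for the dialogue
        target_max_token_len=128,   # assumed limit for the summary
        batch_size=8,               # batch size stated in the text
        max_epochs=10,              # epoch count stated in the text
        use_gpu=use_gpu,
    )
    return model
```

simpleT5 writes checkpoints to an output directory during training. A saved checkpoint can then be loaded with `load_model` and queried with `predict`, passing the transcript with the same “summarize: ” prefix.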
Sample input
[0:00:49] spk_1: Yeah. Hi Team. I'm working on the documentation portion. Actually, I'm just rewriting updating the FBI and points like I'm just building a comprehensive documentation which client requested.[0:01:07] spk_0: Yeah. Okay.[0:01:19] spk_2: So actually client contacted me and with further care part of POC that came along. So we are looking into that like how to pass it and has created Sprint and he has a call today evening. So before that I'm scanning through the dataset that we have and trying to understand the data and ask questions.[0:01:53] spk_0: Okay.Okay, guys, thank you. See you tomorrow. Just a second. Thanks. Excellent.
Sample output
"i'm just building a comprehensive documentation which mike requested," spk_1 says. spk_2: "i'm scanning through the dataset that we have and trying to understand the data and ask questions.
Reference links
Step 1: Create your first S3 bucket
https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html
COLAB LINK
https://colab.research.google.com/drive/1aq-XY2_Ghqmf8gJt3jFClE6UibYsN-Nh?usp=sharing