!pip install sentence-transformers
Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.8/dist-packages (2.2.2)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
# for embeddings
from utils_conflict_unsupervised import cal_cosine_bert_tf
from utils_conflict_unsupervised import cal_cosine_sim
from utils_conflict_unsupervised import cal_cosine_use
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-date!
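The `cal_cosine_*` helpers imported above live in `utils_conflict_unsupervised`, and their implementations are not shown in this notebook. As a rough illustration of the measure they are named after, the sketch below computes pairwise cosine similarity with NumPy. The function name `cosine_similarity_matrix` and the toy vectors are hypothetical stand-ins, not part of the original utilities.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity for a 2-D array of row vectors (hypothetical helper)."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Avoid division by zero: all-zero rows end up with similarity 0 to every row.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = vectors / safe_norms
    return unit @ unit.T


# Toy vectors standing in for requirement embeddings (made-up values).
toy_embeddings = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
sim = cosine_similarity_matrix(toy_embeddings)
print(sim)
```

Identical rows score 1 and orthogonal rows score 0, which is why a single similarity cutoff can be used to flag candidate pairs.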
# for finding the optimal cutoff from training set
from utils_conflict_unsupervised import find_optimal_cutoff_withroc
from utils_conflict_unsupervised import find_conflict_detect
from utils_conflict_unsupervised import plot_roc_curve
from utils_conflict_unsupervised import test_cutoff
req_df = pd.read_excel(
    '/content/drive/MyDrive/Req_Conflict_Data/world_vista_labeled.xlsx',
    usecols=['idx', 'requirement', 'conflict', 'label'],
)
req_df['requirement'] = req_df['requirement'].astype(str)
req_df.head()
| | idx | requirement | conflict | label |
|---|---|---|---|---|
| 0 | 1.0 | The system shall allow medication orders to be... | Yes | Yes(2) |
| 1 | 2.0 | The system shall allow medication orders to be... | Yes | Yes(1) |
| 2 | 3.0 | The system shall allow physician offices to us... | Yes | Yes(4) |
| 3 | 4.0 | The system shall allow physician offices to on... | Yes | Yes(3) |
| 4 | 5.0 | The system shall trigger registration reminder... | No | NaN |
# No: non-conflict, Yes: conflict
req_df['conflict'].value_counts()
No 84
Yes 56
Name: conflict, dtype: int64
# stratified 80/20 split so both sets keep the same conflict/non-conflict ratio
train, test = train_test_split(
    req_df, train_size=0.80, stratify=req_df['conflict'].values, random_state=45
)
print("Training instances :\n", train['conflict'].value_counts())
print("Testing instances :\n", test['conflict'].value_counts())
Training instances :
No 67
Yes 45
Name: conflict, dtype: int64
Testing instances :
No 17
Yes 11
Name: conflict, dtype: int64
# compute cosine similarities on the training set
# (embeddings=3 selects the Universal Sentence Encoder, as the output below confirms)
cos_dict = find_conflict_detect(train, embeddings=3)
Universal sentence encoder

# pick the cosine similarity cutoff from the ROC curve
cutoff = find_optimal_cutoff_withroc(cos_dict)
print(cutoff)
0.59
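The body of `find_optimal_cutoff_withroc` is not shown here, so the criterion it applies is unknown. One common way to choose a cutoff from an ROC curve is to maximise Youden's J (TPR minus FPR); the sketch below illustrates that idea with NumPy only. The function name `youden_cutoff`, the toy labels and scores, and the rule "predict conflict when similarity >= cutoff" are all assumptions made for illustration.

```python
import numpy as np


def youden_cutoff(labels, scores):
    """Return the score threshold that maximises TPR - FPR (Youden's J).

    labels: 1 for conflict, 0 for non-conflict (assumed encoding).
    scores: similarity scores; a pair is predicted as a conflict
            when score >= threshold (assumed decision rule).
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_threshold, best_j = None, -np.inf
    # Every observed score is a candidate threshold.
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        tpr = (predicted & labels).sum() / labels.sum()
        fpr = (predicted & ~labels).sum() / (~labels).sum()
        j = tpr - fpr
        if j > best_j:
            best_threshold, best_j = float(threshold), j
    return best_threshold


# Made-up labels and similarity scores, not taken from the dataset.
toy_labels = [0, 0, 0, 0, 1, 1, 1]
toy_scores = [0.20, 0.35, 0.62, 0.30, 0.55, 0.70, 0.81]
toy_cutoff = youden_cutoff(toy_labels, toy_scores)
print(toy_cutoff)
```

With only 112 training rows, a cutoff chosen this way can shift noticeably with the random split, so it is worth re-checking under a different `random_state`.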
# apply the training-set cutoff to the held-out test set
test_df, candidate_set = test_cutoff(test, cutoff, embeddings=3)
Universal sentence encoder
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| No | 0.650000 | 0.764706 | 0.702703 | 17 |
| Yes | 0.500000 | 0.363636 | 0.421053 | 11 |
| accuracy | | | 0.607143 | 28 |
| macro avg | 0.575000 | 0.564171 | 0.561878 | 28 |
| weighted avg | 0.591071 | 0.607143 | 0.592054 | 28 |
********** Confusion Matrix for this fold *************
[[13 4]
[ 7 4]]
The tpr for this fold is : 0.36363636363636365
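The reported TPR can be reproduced directly from the confusion matrix printed above. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predictions, ordered `[No, Yes]`); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the output above; rows = true labels, columns = predictions,
# ordered [No, Yes] (scikit-learn convention, assumed here).
cm = np.array([[13, 4],
               [7, 4]])
tn, fp, fn, tp = cm.ravel()

tpr = tp / (tp + fn)            # recall for the "Yes" class: 4 / 11
precision_yes = tp / (tp + fp)  # precision for the "Yes" class: 4 / 8
print(tpr, precision_yes)
```

Only 4 of the 11 true conflicts in the test set are recovered, which matches the `Yes` recall in the report.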
from utils_conflict_unsupervised import final_conflict
final_conflict(req_df,candidate_set,test_df)
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| No | 0.590909 | 0.764706 | 0.666667 | 17 |
| Yes | 0.333333 | 0.181818 | 0.235294 | 11 |
| accuracy | | | 0.535714 | 28 |
| macro avg | 0.462121 | 0.473262 | 0.450980 | 28 |
| weighted avg | 0.489719 | 0.535714 | 0.497199 | 28 |
********** Confusion Matrix for this fold *************
[[13 4]
[ 9 2]]