Hands-On - Evidently AI

Evidently AI is an open-source tool for evaluating, testing, and monitoring machine learning models and the data they run on. It helps data scientists and machine learning practitioners understand, visualize, and communicate the behavior of their models, for example through reports on data drift, data quality, and model performance. By offering insights into model performance and potential biases, Evidently AI contributes to building more trustworthy and understandable artificial intelligence applications.


Data Jam EP 6 - Rotary Position Embedding (RoPE)

Rotary Position Embedding, or RoPE, is a type of position embedding that encodes absolute positional information with a rotation matrix and naturally incorporates explicit relative position dependency into the self-attention formulation.
Notably, RoPE comes with valuable properties: the flexibility to extend to any sequence length, inter-token dependency that decays with increasing relative distance, and the ability to equip linear self-attention with relative position encoding.
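
To make the rotation idea concrete, here is a minimal sketch in plain Python. It assumes the commonly used frequency schedule with base 10000 and rotates consecutive pairs of vector components; the function names `rope` and `dot` and the example vectors are illustrative, not taken from any particular library.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by an angle that
    grows with the token position (assumed schedule: base ** (-i / d))."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    rotated = []
    for i in range(0, d, 2):
        # i is the even index 2k, so the frequency is base ** (-2k / d)
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated.append(x * cos_t - y * sin_t)
        rotated.append(x * sin_t + y * cos_t)
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors
q = [0.5, -1.0, 2.0, 0.3]
k = [1.5, 0.2, -0.7, 1.1]

# Same relative offset (2), different absolute positions
score_near_start = dot(rope(q, 3), rope(k, 1))
score_far_along = dot(rope(q, 10), rope(k, 8))
print(score_near_start, score_far_along)
```

The two printed scores agree up to floating-point error: each vector is rotated by its absolute position, yet the attention score depends only on the relative offset between the two positions.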


Data Jam EP 3 - Multimodal Deep Learning

Overview

Multimodal deep learning is an approach in machine learning that focuses on processing and understanding data from multiple modalities or sources, such as text, images, audio, and more. This approach aims to leverage the complementary information provided by these different data types to improve the accuracy and richness of machine learning models (by GPT).


Data Jam EP 2 - Anomaly Detection

Anomaly Types

  • Point or Global Anomalies → specific data points
    A point anomaly is where a single datapoint stands out from the expected pattern, range, or norm. In other words, the datapoint is unexpected.
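
As a rough illustration of a point anomaly, the sketch below flags values that sit far from the median, using a modified z-score based on the median absolute deviation (MAD). The detector choice, the 3.5 threshold, the function name `find_point_anomalies`, and the sample readings are all assumptions made for this example; the episode itself may use other methods.

```python
from statistics import median


def find_point_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread around the median, so this score is undefined; flag nothing
        return []
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Hypothetical sensor readings with one value far outside the usual range
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 9.7]
print(find_point_anomalies(readings))
```

Only the 25.0 reading is flagged here. A median-based score is used because a single extreme value in a short series inflates the mean and standard deviation enough to hide itself from an ordinary z-score.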

Data Jam EP 1 - Explore Large Language Model

LLM

  • A Very Gentle Introduction to Large Language Models without the Hype - by Mark Riedl - Medium
  • Awesome-LLM: a curated list of Large Language Model
  • Llama-cpp - 🦜️🔗 Langchain
  • Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4
  • Introduction - CS324
  • COS 597G (Fall 2022): Understanding Large Language Models
  • Large Language Model Text Generation Inference

TorchSenti - Sentiment Analysis Framework for Researcher with PyTorch

TorchSenti is a natural language processing library focused on sentiment analysis that aims to provide sentiment analysis datasets and pre-trained models. The library is built on top of PyTorch. We want to support the research community in expanding knowledge and to help contributors solve current problems. These features and resources help NLP researchers benchmark and evaluate their proposed methods. The library can also serve as a starting point for anyone who wants to learn sentiment analysis in depth. Find the details in the repository.


Experiments on Paraphrase Identification using Quora Question Pairs

We modeled the Quora Question Pairs dataset to identify whether two questions are similar. The dataset is provided by Quora, and the task is binary classification. We tried several methods and algorithms, taking a different approach from previous work. For feature extraction, we used Bag of Words representations, namely Count Vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) with unigrams, as input to XGBoost and CatBoost. We also experimented with a WordPiece tokenizer, which improved model performance significantly. We achieved up to 97 percent accuracy. Code and Dataset
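
To give a feel for unigram TF-IDF features on question pairs, here is a small hand-rolled sketch in plain Python. It is not the pipeline from the experiments: the whitespace tokenizer, the smoothed IDF formula, the cosine-similarity pair feature, and the three sample questions are assumptions for illustration. In the experiments, such features fed XGBoost and CatBoost classifiers.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive unigram tokenizer: lowercase and split on whitespace
    return text.lower().split()


def fit_idf(documents):
    """Smoothed inverse document frequency for every unigram seen."""
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; unseen terms are ignored."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical questions, not taken from the Quora dataset
questions = [
    "how do i learn python",
    "how can i learn python",
    "what is the capital of france",
]
idf = fit_idf(questions)
vectors = [tfidf(q, idf) for q in questions]

sim_paraphrase = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_paraphrase, sim_unrelated)
```

The paraphrased pair scores well above the unrelated pair, which shares no unigrams and scores zero. A similarity like this could be one input column for a tree-based classifier, alongside or instead of the raw vectors.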
