Anjani Sowmya

Hi, I'm Anjani Sowmya

Aspiring Data Analyst & Data Scientist

Turning Raw Data into Predictive, Visual, and Actionable Insights

Contact Me

About Me

My Introduction

Results-oriented Data Analyst and aspiring Data Scientist with a strong foundation in turning complex data into clear, actionable insights. Experienced in building end-to-end data solutions, from data collection and cleaning through advanced analytics, modeling, and visualization. Skilled in Power BI, SQL, Python, R, Excel, TensorFlow, and Scikit-learn, with expertise in data visualization, ETL pipelines, machine learning, deep learning, and statistical analysis. Adept at developing predictive models, building interactive dashboards, and automating reporting workflows to support informed decision-making. Passionate about using data to solve real-world challenges, uncover trends, and deliver strategic value across domains. Known for translating raw structured and unstructured data into business outcomes through a blend of analytical thinking and technical proficiency.

Skills

What I Bring to the Table

Languages & Programming

Python, R, SQL, TypeScript, Node.js

Tools & Platforms

Power BI, Tableau, Excel, Git, Jupyter Notebook, R Shiny, MySQL Workbench, MS Access

Libraries & Frameworks

Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch, OpenCV, Seaborn, Matplotlib, ggplot2, dplyr, Florence-2 (vision-language model)

Cloud & Databases

MySQL, MongoDB, AWS, Google Cloud, Azure

Analytics & Machine Learning

EDA, Feature Engineering, PCA, Regression, Classification, Clustering, Anomaly Detection, Time Series Analysis, Decision Trees, Random Forest, SVM, Naive Bayes, KNN, Deep Learning, Model Evaluation, Data Science

Data Workflow & Processing

ETL, Data Cleaning, Data Mining, Data Wrangling, Statistical Analysis, Databricks

Qualification

My Personal Journey
Education
Work

Master of Science in Data Science

University of Arizona, Tucson, AZ, USA
Aug 2023 - May 2025
Coursework
  • Data Mining and Discovery: Learned techniques for uncovering patterns from large datasets using classification, clustering, outlier detection, and association rule mining with R Programming.

  • Data Analysis and Visualization: Built advanced data visualization skills using design principles, interactivity, and tools like ggplot2, Quarto and R Shiny to communicate insights effectively.

  • Introduction to Machine Learning: Implemented supervised and unsupervised ML algorithms for pattern recognition, decision making, and feature selection, laying the foundation for more advanced ML systems using Python.

  • Artificial Intelligence: Studied AI methods including search, logic, probabilistic reasoning, and reinforcement learning. Applied these to solve problems involving uncertainty and decision-making using Python.

  • Data Warehousing and Analytics in the Cloud: Designed scalable cloud-based data warehousing solutions and performed large-scale analytics using MySQL and modern cloud tools like Microsoft Azure.

  • Database Development and Management: Built and managed relational databases using MS Access. Covered end-to-end development from schema design to queries, forms, and reports.

  • Healthcare Data Science: Developed predictive models and data pipelines tailored for healthcare datasets using SQL and R Shiny dashboards. Gained a deep understanding of the impact and applications of healthcare data, including how to access, interpret, and utilize it effectively.

  • Data Ethics: Gained a strong foundation in ethical issues surrounding data use, including bias, privacy, surveillance, and global policy. Explored real-world dilemmas across sectors like health, education, and social media.

  • Science Information and Its Presentation: Explored the societal role of scientific information and how it is accessed, interpreted, and shared across diverse contexts. Gained interdisciplinary insights into making science data findable and understandable for varied audiences.

Bachelor of Technology in Computer Science

Andhra University, Visakhapatnam, Andhra Pradesh, India
Aug 2019 - May 2023
Coursework
  • Computer Programming using C: Learned core concepts of procedural programming: data types, control flow, functions, arrays, pointers, file I/O, and basic debugging in C.

  • Probability, Statistics & Queuing Theory: Studied probability models, statistical inference, regression, and stochastic queuing systems.

  • Data Structures & Algorithms: Explored efficient data organization techniques and algorithms (linked lists, trees, sorting, searching), with complexity analysis and performance evaluation.

  • Digital Logic Design: Gained foundational knowledge of digital circuits, Boolean algebra, combinational/sequential logic, and hardware design principles.

  • Database Management Systems: Designed and managed relational databases; learned SQL, normalization, schema design, transactions, and query optimization.

  • Computer Networks: Covered network architecture, protocols (TCP/IP), routing, switching, and fundamentals of OSI layers and network communication.

  • Operating Systems: Explored processes, threading, memory management, scheduling, concurrency, and file system design.

  • Object-Oriented Software Engineering: Applied OO principles such as encapsulation and inheritance, along with design patterns and UML modeling, across the software development lifecycle.

  • Web Technologies: Learned HTML, CSS, JavaScript, web frameworks, client–server architecture, and basic full-stack development.

  • Image Processing: Studied digital image fundamentals, filtering, transformations, feature detection, and basic computer vision techniques using tools like OpenCV.

  • Data Warehousing & Data Mining: Learned to design data warehouses, ETL pipelines, OLAP, and apply mining techniques like clustering, classification, and association rule discovery.

  • Machine Learning: Implemented supervised/unsupervised learning algorithms (regression, classification, clustering) and evaluated models using metrics and feature selection.

Data Engineer | University of Arizona

Tucson, Arizona, United States
Jan 2025 - Present
Key Contributions

  • Engineered an emotion recognition system with Python, integrating deep learning for multimodal inputs, achieving 69% classification accuracy.

  • Preprocessed speech and visual data and ran validation checks to align the two streams using OpenCV and Whisper, reducing preprocessing time by 25%.

  • Fine-tuned Florence-2 for emotion classification, achieving a training loss of 1.67 while optimizing GPU usage and reducing training time by 30%.

Data Engineer Intern | Servify

Hyderabad, Telangana, India
May 2024 - Aug 2024
Key Contributions

  • Built a secure Audit Log System using Node.js, Prisma ORM, and RabbitMQ for asynchronous message processing, enabling low-latency tracking of 10K+ user actions and compliance reporting.

Data Scientist Intern | SmartKnower

Bengaluru, Karnataka, India
May 2022 - Aug 2022
Key Contributions

  • Built an ML pipeline using the UCI Adult dataset, applying EDA, data cleaning, and feature engineering for income classification.

  • Trained multiple classifiers (Decision Tree, Random Forest, Logistic Regression, SVM, KNN), achieving 85.5% accuracy with Decision Tree.

  • Presented model insights and performance metrics to non-technical stakeholders to guide business decisions.

Projects

Where Data Meets Impact

Invoice Dashboard

Oct 2025 - Dec 2025
  • Designed and implemented a full Databricks Medallion Architecture (Bronze → Silver → Gold) pipeline to extract, cleanse, and normalize unstructured invoice data using Spark, Delta Lake, and advanced regex-based parsing.

  • Engineered automated product categorization logic and built analytical models to compute revenue, VAT, quantity trends, and year-over-year metrics, improving data usability and business insight generation.

  • Developed an interactive Databricks SQL dashboard featuring category distribution, timeline trends, and gross/net comparisons, enabling clear visualization of key financial and operational insights.
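
To illustrate the regex-based parsing and the gross/net calculations described above, here is a minimal Python sketch. The invoice line layout, the `parse_invoice_line` helper, and the field names are all hypothetical, chosen only for illustration; the actual pipeline runs on Spark and Delta Lake in Databricks, which this plain-Python example does not attempt to reproduce.

```python
import re
from decimal import Decimal

# Hypothetical line layout: "<description> x<quantity> @ <unit price> VAT <rate>%"
LINE_PATTERN = re.compile(
    r"^(?P<description>.+?)\s+x(?P<quantity>\d+)"
    r"\s+@\s*(?P<unit_price>\d+(?:\.\d+)?)"
    r"\s+VAT\s+(?P<vat_rate>\d+(?:\.\d+)?)%$"
)


def parse_invoice_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    quantity = int(match["quantity"])
    unit_price = Decimal(match["unit_price"])
    vat_rate = Decimal(match["vat_rate"]) / Decimal(100)
    net = unit_price * quantity
    # Round the VAT amount to two decimal places (cents).
    vat = (net * vat_rate).quantize(Decimal("0.01"))
    return {
        "description": match["description"],
        "quantity": quantity,
        "net": net,
        "vat": vat,
        "gross": net + vat,
    }


# Made-up sample line, not taken from the project's data.
row = parse_invoice_line("USB-C cable x3 @ 4.50 VAT 20%")
```

Lines that do not fit the pattern return `None`, so a cleansing step could route them to a separate review table instead of failing the whole batch.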

View Dashboard

Multimodal Emotion Recognition System Using Florence-2

Jan 2025 - May 2025
  • Implemented an emotion recognition system integrating deep learning models for synchronized text and visual inputs.

  • Preprocessed speech and visual frames using OpenCV and Whisper, applying semantic and exact text-to-emotion mapping.

  • Fine-tuned Florence-2 for emotion classification, achieving 69% accuracy and a training loss of 1.67, while optimizing GPU utilization, batch processing, and memory management for efficient model training.

View Report View Poster

Determinants of Life Expectancy

Aug 2024 - Dec 2024
  • Conducted a global study on life expectancy disparities using WHO and UN data, analyzing key socioeconomic, healthcare, and environmental factors across countries.

  • Performed data cleaning, handled missing values, and conducted EDA with correlation heatmaps, box plots, and geographic visualizations.

  • Developed ML models (Logistic, Ridge, and Lasso Regression); the best model achieved 87% accuracy and identified income composition and child mortality as the top predictors.

  • Applied predictive modeling and healthcare analytics to a real-world public health challenge, strengthening expertise in R, data visualization, and regression techniques.

View Code View Report View Presentation

World Cup Cricket Dashboard

Mar 2024 - May 2024
  • Built a simulated live cricket analytics dashboard using historical match data and backend API-driven updates to mimic real-time performance.

  • Processed structured datasets for key metrics (scoring rate, partnerships, player stats) and applied data cleaning, transformation, and dynamic aggregation.

  • Visualized match dynamics through line charts (run rate trends), bar plots (top batsmen), partnership graphs, and team comparison donut charts using R, ggplot2, and Quarto.

  • Revealed a strong correlation between top partnerships and total scores, and surfaced visual insights into team dominance and scoring fluctuations, building skills in real-time sports analytics.

View Report

Interactive Tourism Dashboard

Jan 2024 - May 2024
  • Designed a Power BI dashboard to visualize visitor trends to India, integrating multiple Kaggle datasets covering foreign tourists, overseas Indians, and crew demographics.

  • Performed data cleaning, reshaping, and merging in R using JOIN operations; created conceptual, logical, and physical data models for efficient schema design.

  • Implemented DAX measures for dynamic aggregations (e.g., total visitors, year-over-year change, gender ratios) and built demographic filters for age group, gender, country, and mode of travel.

  • Developed interactive visuals including line graphs, tornado plots, treemaps, and stacked charts to analyze visitor trends and demographics, uncovering key patterns across age, country, and entry points.
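
The year-over-year change measure mentioned above was implemented in DAX; the same arithmetic can be sketched in Python as follows. The `year_over_year_change` function name and the visitor figures are placeholders for illustration, not values from the dashboard.

```python
def year_over_year_change(totals_by_year):
    """Map each year to its percent change from the previous year, when available."""
    changes = {}
    for year in sorted(totals_by_year):
        previous = totals_by_year.get(year - 1)
        if previous:  # skip years with a missing or zero baseline
            changes[year] = (totals_by_year[year] - previous) / previous * 100
    return changes


# Placeholder figures, not real visitor counts.
visitors = {2019: 1000, 2020: 250, 2021: 300}
changes = year_over_year_change(visitors)
```

The first year has no baseline, so it is left out of the result rather than reported as zero.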

View Report

Checkers Game

Jan 2024 - May 2024
  • Designed a playable Checkers game with human vs. human and human vs. AI modes, implementing core mechanics like move validation, piece capturing, and winner detection.

  • Built a Monte Carlo simulation–based AI in Python that evaluates random future game outcomes to select the most strategic move.

  • Overcame challenges related to game state modeling, randomness control, and AI integration through iterative debugging and testing.

  • Demonstrated strong AI performance in simulations, with future plans to integrate reinforcement learning for smarter opponent move anticipation.
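
The Monte Carlo idea described above, scoring each legal move by the outcome of random playouts, can be sketched in a few lines of Python. To stay short, this example swaps Checkers for a much simpler take-away game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins); the function names are hypothetical and this is not the project's code.

```python
import random


def legal_moves(stones):
    """A player may take 1, 2, or 3 stones, but not more than remain."""
    return [n for n in (1, 2, 3) if n <= stones]


def random_playout(stones, my_turn, rng):
    """Play random moves to the end; return True if 'we' take the last stone."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn


def monte_carlo_move(stones, playouts=200, seed=0):
    """Pick the move with the best estimated win rate over random playouts."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    best_move, best_rate = None, -1.0
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            return move  # taking the last stone wins immediately
        # The opponent moves next, so the playout starts with my_turn=False.
        wins = sum(random_playout(remaining, False, rng) for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

Seeding the random generator is one simple way to control randomness during debugging, since the same position then always yields the same choice.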

View Code View Report

Descriptive Analysis of Grade Outcomes

Aug 2023 - Dec 2023
  • Analyzed grade outcomes (DEW rates) across semesters pre-, mid-, and post-COVID to uncover patterns in student performance using DEW and course attribute datasets.

  • Applied EDA, anomaly detection, and time series analysis, revealing that fully online, asynchronous courses had notably higher DEW rates—especially during Fall 2020.

  • Built decision tree and regression models to identify key predictors such as course modality and session timing, though overall predictive accuracy remained limited.

  • Highlighted the evolving impact of online learning and proposed future enhancements through expanded data sources and refined modeling techniques.

View Report

SmartGPT

Nov 2022 - May 2023
  • Developed a custom GPT-like NLP system to generate responses from university datasets using cosine similarity, Excel-based input, and tools like Python, Scikit-learn, NLTK, and OpenPyXL.

  • Preprocessed institutional documents and fine-tuned the model for context-aware answers; evaluated accuracy through expert validation.

  • Built a user-friendly interface with HTML/CSS/JS and Node.js backend; modeled system behavior with UML diagrams (class, sequence, activity).

  • Conducted extensive testing (unit, integration, regression) and proposed future enhancements including sentiment analysis and cross-institution NLP applications.
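
Cosine-similarity matching of a question against stored entries, as mentioned above, can be sketched with the Python standard library alone. The project itself used Scikit-learn and NLTK; this simplified version uses raw term counts instead, and the question/answer pairs and function names are hypothetical examples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(question, qa_pairs):
    """Return the answer whose stored question is most similar to the query."""
    q_vec = Counter(tokenize(question))
    scored = [
        (cosine_similarity(q_vec, Counter(tokenize(stored_q))), answer)
        for stored_q, answer in qa_pairs
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical sample data standing in for rows read from a spreadsheet.
qa_pairs = [
    ("When does the library open?", "The library opens at 8 am."),
    ("How do I pay tuition fees?", "Fees can be paid through the student portal."),
]
answer = best_answer("How can I pay my fees?", qa_pairs)
```

A production version would likely weight terms (for example with TF-IDF) and apply a minimum similarity threshold before answering.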

View Code View Presentation

Winning Space Race with Data Science - IBM

Aug 2022 - Sep 2022
  • Applied the full data science lifecycle (data cleaning, EDA, modeling, and visualization) using IBM Watson Studio and Cloud Pak for Data to solve a real-world problem.

  • Performed feature engineering and built supervised (regression, classification) and unsupervised (K-Means, DBSCAN) models to uncover insights and make predictions.

  • Evaluated models using accuracy, precision-recall, AUC-ROC, and RMSE; used AutoAI to automate model selection and tuning for improved performance.

  • Communicated business insights through dashboards, Python-based visualizations, and data storytelling to guide data-driven decision-making.
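
For reference, several of the evaluation metrics named above can be computed directly from their definitions. This is a minimal standard-library sketch with hypothetical function names and made-up labels, not the project's evaluation code (which would more typically rely on a library such as Scikit-learn).

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/labelled positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up labels and predictions for illustration only.
metrics = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
error = rmse([2.0, 4.0], [0.0, 2.0])
```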

View Code View Presentation

Certifications

Upskilling & Specializations

Databricks Fundamentals Accreditation

View Certificate

Microsoft Certified: Power BI Data Analyst Associate

View Certificate

IBM Data Science Professional Certificate

View Certificate

HackerRank Skill Certification Test - SQL

View Certificate

HackerRank Skill Certification Test - Python

View Certificate

HackerRank Skill Certification Test - ReactJS

View Certificate

AWS Cloud Practitioner

View Certificate

Database Management Systems Certification

View Certificate

Cyber Security Analyst Job Simulation Certification

View Certificate

Best of Next Credential

View Certificate

Technical Series Credential

View Certificate

Contact Me

Get in Touch

Call Me

+1-520-534-7675

Location

Tucson, AZ, USA