AI Powered DORA Report #493
Conversation
Great work @samad-yar-khan and @jayantbh 🎉
const doraData = {
  lead_time,
  mean_time_to_recovery,
  change_failure_rate
} as any;
Yeah... no. 🤣
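A minimal sketch of one way to drop the `as any`, assuming a local `DoraData` type (the type name and the `number` field types are assumptions; only the field names come from the snippet above):

```ts
// Hypothetical type for the summarised DORA payload; the number fields
// are an assumption, not taken from the actual codebase.
interface DoraData {
  lead_time: number;
  mean_time_to_recovery: number;
  change_failure_rate: number;
}

// In the real component these values come from the surrounding scope.
declare const lead_time: number;
declare const mean_time_to_recovery: number;
declare const change_failure_rate: number;

const doraData: DoraData = {
  lead_time,
  mean_time_to_recovery,
  change_failure_rate
};
```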
Force-pushed from 6391560 to f07f4e1.
<FlexBox
  col
  gap1
  sx={{ ['[data-lastpass-icon-root]']: { display: 'none' } }}
what's this?
It's to stop the password manager from showing its autofill icon in the input field.
It's... unfortunately the recommended way to deal with it.
damn, this is interesting!
const selectedModel = useEasyState<Model>(Model.GPT4o);
const token = useEasyState<string>('');
const selectedTab = useEasyState<string>('0');
why string?
No good reason. Hacky code.
Will fix with the enum.
Oh wait, there was a reason. MUI Tabs need the value to be a string.
<Markdown>{data.change_failure_rate_trends_summary}</Markdown>
</TabPanel>
<TabPanel
  value="5"
all these values can be combined into an enum:

enum MyEnum {
  dora_trend_summary,
  change_failure_rate_trends_summary,
  mean_time_to_recovery_trends_summary
  // ...more...
}

this means that MyEnum.dora_trend_summary will be equal to 0
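Since MUI Tabs compare tab values as strings (per the note above), a string enum would sidestep the numeric-to-string conversion. A minimal sketch with illustrative member values (the real tab names and order may differ):

```ts
// Hypothetical string enum: each member's value is already the string
// that MUI's <Tabs value={...}> expects, so no String(...) is needed.
enum AnalysisTabs {
  dora_trend_summary = '0',
  change_failure_rate_trends_summary = '1',
  mean_time_to_recovery_trends_summary = '2'
  // ...more tabs...
}

// Usable directly, e.g. <TabPanel value={AnalysisTabs.dora_trend_summary}>
const firstTab: string = AnalysisTabs.dora_trend_summary; // '0'
```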
@@ -77,7 +79,9 @@ export const AIAnalysis = () => {
   const selectedModel = useEasyState<Model>(Model.GPT4o);
   const token = useEasyState<string>('');
-  const selectedTab = useEasyState<string>('0');
+  const selectedTab = useEasyState<string>(
could have just removed `string`:

const selectedTab = useEasyState<AnalysisTabs>(

no more type conversion would have been needed
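A sketch of the suggested change, assuming the string-valued `AnalysisTabs` enum above so MUI Tabs still receive a string (`useEasyState` is the project's own state hook):

```ts
// Typed with the enum directly: the state value is still a string at
// runtime, so it flows into <Tabs value={...}> without conversion.
const selectedTab = useEasyState<AnalysisTabs>(
  AnalysisTabs.dora_trend_summary
);
```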
Pull Request Contents
Quick Look at how it works 👀
Screen.Recording.2024-07-30.at.4.34.33.PM.mov
Acceptance Criteria fulfillment
Evaluation and Results: GPT4o vs LLAMA 3.1
We ran the DORA AI analysis for July on the following open-source repositories: facebook/react, middlewarehq/middleware, meta-llama/llama, and facebookresearch/dora.
Mathematical Accuracy
Actual DORA metrics score: 5/10
GPT4o with DORA score 5/10 (correct)
LLAMA 3.1 with DORA score 8/10 (incorrect)
GPT4o's score was closer to the actual DORA score than LLAMA 3.1's in 9/10 cases, so GPT4o was more accurate than LLAMA 3.1 in this scenario.
Data Analysis
The trend data for the four key DORA metrics, calculated by Middleware, was fed to the LLMs along with different experimental prompts to ensure concrete data analysis.
The trend data is usually a JSON object with date strings as keys, representing weeks' start dates mapped to the metric data.
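For illustration, the shape is roughly this (hypothetical dates, field name, and values; Middleware's actual payload may differ):

```ts
// Hypothetical weekly lead-time trend: week start dates mapped to a
// metric value in seconds, as described above.
const leadTimeTrends: Record<string, { lead_time: number }> = {
  '2024-07-01': { lead_time: 1_467_936 },
  '2024-07-08': { lead_time: 1_209_600 },
  '2024-07-15': { lead_time: 1_555_200 }
};
```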
Mapping Data: Both models were on par at extracting data from the JSON and interpreting it correctly. Example: both GPT and LLAMA mapped the correct data to the input weeks without errors or hallucinations.
Deployment Trends Summarised: GPT4o
Deployment Trends Summarised: LLAMA 3.1 405B
Extracting Inferences: Both models were able to derive solid inferences from the data.
LLAMA 3.1 identified the week with the maximum lead time, along with the reason for the high lead time.
This inference could be verified against the Middleware trend charts.
GPT4o was also able to extract the week with the maximum lead time, along with the reason: high first-response time.
Data Presentation: Data presentation has been hit or miss with LLMs. There are cases where GPT presents data better but lags behind LLAMA 3.1 in accuracy, and there have been cases, like the DORA score, where GPT did the math better.
LLAMA and GPT were both given the lead time value in seconds. LLAMA rounded the data closer to the actual value of 16.99 days, while GPT rounded it to 17 days 2 hours but presented it in a more detailed format.
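For reference, the conversion both models were asked to make (a quick sketch; 16.99 days corresponds to an input of roughly 1,467,936 seconds):

```ts
// Converts a lead time in seconds to decimal days plus a days-and-hours
// breakdown, mirroring the rounding task described above.
function formatLeadTime(seconds: number): string {
  const days = (seconds / 86_400).toFixed(2);     // 86,400 s per day
  const totalHours = Math.round(seconds / 3_600); // nearest whole hour
  return `${days} days (${Math.floor(totalHours / 24)}d ${totalHours % 24}h)`;
}

console.log(formatLeadTime(1_467_936)); // "16.99 days (17d 0h)"
```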
GPT4o
LLAMA 3.1 405B
Actionability
GPT4o
LLAMA 3.1 405B
Summarisation
To test the summarisation capabilities of the models, we asked them to summarise each metric trend individually and then fed the results for all the trends back into the LLMs to get an overall summary, or, in internet slang, a DORA TL;DR for the team. A sketch of this two-pass flow follows below.
The summarisation capability over large data is similar in both LLMs.
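The function names and prompts here are hypothetical, not Middleware's actual service API:

```ts
// Hypothetical two-pass summarisation: summarise each metric trend
// individually, then feed the per-metric summaries back for one TL;DR.
async function doraTldr(
  trends: Record<string, object>,
  llmComplete: (prompt: string) => Promise<string>
): Promise<string> {
  const perMetric: string[] = [];
  for (const [metric, trend] of Object.entries(trends)) {
    perMetric.push(
      await llmComplete(
        `Summarise this ${metric} trend data: ${JSON.stringify(trend)}`
      )
    );
  }
  // Second pass: compress all per-metric summaries into a team TL;DR.
  return llmComplete(`Write a DORA TL;DR from:\n${perMetric.join('\n')}`);
}
```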
LLAMA 3.1 405B
GPT4o
Conclusion
For a long time, LLAMA was trying to catch up with GPT in terms of data processing and analytical ability. Our earlier experimentation with older LLAMA models led us to believe that GPT was way ahead, but the recent LLAMA 3.1 405B model is on par with GPT4o.
If you value your customers' data privacy and want to try the open-source LLAMA 3.1 models instead of GPT4, go ahead! The difference in performance is negligible, and with self-hosted models you can ensure data privacy. Open-source LLMs have finally started to compete with their closed-source competitors.
Both LLAMA 3.1 and GPT4o are highly capable of deriving inferences from processed data and making Middleware's DORA metrics more actionable and digestible for engineering leaders, leading to more efficient teams.
Future Work
This was an experiment to build an AI-powered DORA solution. In the future, we will focus on adding broader support for self-hosted or locally running LLMs in Middleware. Enhanced support for AI-powered action plans throughout the product using self-hosted LLMs, while ensuring data privacy, will be our goal for the coming months.
Proposed changes (including videos or screenshots)
Added Services
AIAnalyticsService to allow summarising and inference of DORA data based on different models.
UI Changes
Screen.Recording.2024-07-30.at.4.34.33.PM.mov
Added APIs
Added a Compiled Summary API to take data from all of the above.
Fetch Models
curl --location 'http://localhost:9696/ai/models'
Further comments