AI Powered DORA Report #493

Merged: 21 commits, Aug 13, 2024

Conversation

@samad-yar-khan (Contributor) commented Jul 30, 2024

Pull Request Contents

  1. Acceptance Criteria fulfillment
  2. Evaluation and Results: GPT4o vs LLAMA 3.1
  3. Conclusion
  4. Future Work
  5. Proposed changes (including videos or screenshots)

Quick Look at how it works 👀

Screen.Recording.2024-07-30.at.4.34.33.PM.mov

Acceptance Criteria fulfillment

  • Added a service for extensible model support through Fireworks AI and OpenAI keys.
  • Added an API to generate a DORA score based on the four key metrics.
  • Added APIs for AI summaries of all four key trends.
  • Added an API to compile a summary of all the data.
  • Added UI for users to add a token, select a model, and generate an AI summary.
  • Enabled users to copy the summary.

Evaluation and Results: GPT4o vs LLAMA 3.1

We did the DORA AI analysis for July on the following open-source repositories: facebook/react, middlewarehq/middlware, meta-llama/llama and facebookresearch/dora.

Mathematical Accuracy

  • Middleware generated a DORA performance score for the team based on this guide by dora.dev.
  • To test the computational accuracy of each model, we provide it with the four key metrics, prompt the LLM to generate a DORA score, and compare the result with Middleware's score.
  • The four keys were provided as a JSON object of the following format:
    {
        "lead_time": 4000,
        "mean_time_to_recovery": 200000,
        "change_failure_rate": 20,
        "weekly_deployment_frequency": 2
    }
  • The actual DORA score for the repositories was around 5. While OpenAI's GPT4o was able to predict the score as 4-5 most of the time, LLAMA 3.1 405B was off by a margin.
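
As a rough illustration of how this check can be driven, here is a minimal Python sketch that embeds the four-key payload into a scoring prompt. The prompt wording and the build_dora_score_prompt helper are illustrative assumptions, not the exact prompt Middleware uses, and the actual LLM call is omitted.

import json

# Hypothetical four-key payload, matching the format shown above.
four_keys = {
    "lead_time": 4000,                    # seconds
    "mean_time_to_recovery": 200000,      # seconds
    "change_failure_rate": 20,            # percent
    "weekly_deployment_frequency": 2,
}

def build_dora_score_prompt(metrics: dict) -> str:
    """Builds a scoring prompt from the four keys (illustrative wording only)."""
    return (
        "You are given a team's DORA metrics as JSON:\n"
        f"{json.dumps(metrics, indent=2)}\n"
        "Using the dora.dev performance guide, rate the team's DORA "
        "performance on a scale of 1-10 and justify the rating."
    )

print(build_dora_score_prompt(four_keys))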

DORA Metrics score: 5/10 (screenshot)

GPT 4o with DORA score 5/10 (screenshot)

LLAMA 3.1 with DORA score 8/10, incorrect (screenshot)

GPT4o's DORA score was closer to the actual DORA score than LLAMA 3.1's in 9 out of 10 cases, so GPT4o was more accurate than LLAMA 3.1 in this scenario.

Data Analysis

  • The trend data for the four key DORA metrics, calculated by Middleware, was fed to the LLMs as input along with different experimental prompts to ensure a concrete data analysis.

  • The trend data is usually a JSON object whose keys are week start-date strings, each mapped to that week's metric data.

    {
        "2024-01-01": {
            ...
        },
        "2024-01-08": {
            ...
        }
    }
    
  • Mapping Data: Both models were on par at extracting data from the JSON and inferring it correctly. Example: both GPT and LLAMA were able to map the correct data to the input weeks without errors or hallucinations.

    Deployment Trends Summarised: GPT4o (screenshot)

    Deployment Trends Summarised: LLAMA 3.1 405B (screenshot)

  • Extracting Inferences: Both models were able to derive solid inferences from the data.

    • LLAMA 3.1 identified the week with the maximum lead time along with the reason for the high lead time (screenshot).

    • This inference could be verified against the Middleware trend charts (screenshot).

    • GPT4o was also able to extract the week with the maximum lead time and the reason, which was high first-response time (screenshot).

  • Data Presentation: Data presentation has been hit or miss with LLMs. There are cases where GPT performs better at data presentation but lags behind LLAMA 3.1 in accuracy, and there have been cases like the DORA score where GPT was able to do the math better.

    • LLAMA and GPT were both given the lead time value in seconds. LLAMA rounded the value closer to the actual 16.99 days, while GPT rounded it to 17 days 2 hours but presented the data in a more detailed format (see the conversion sketch after this list).

      GPT4o (screenshot)

      LLAMA 3.1 405B (screenshot)
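
To sanity-check this kind of rounding, the seconds-to-days conversion can be computed directly. A minimal sketch; the exact lead-time value in seconds is not given in the text, so roughly 1,468,000 seconds (about 16.99 days) is used as an illustrative input.

def humanize_seconds(seconds: int) -> str:
    """Convert a duration in seconds to 'X days Y hours' plus fractional days."""
    days, remainder = divmod(seconds, 86400)
    hours = remainder / 3600
    return f"{days} days {hours:.1f} hours ({seconds / 86400:.2f} days)"

# ~16.99 days expressed in seconds (illustrative value).
print(humanize_seconds(1_468_000))  # -> 16 days 23.8 hours (16.99 days)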

Actionability

  • The models output similar action items for improving the team's efficiency based on all the metrics.
  • Example: Both models identified first-response time as the reason for the high lead time and suggested that the team use an alerting tool to avoid delayed PR reviews. The models also suggested better planning to avoid rework in a week where rework was high.

GPT4o (screenshot)

LLAMA 3.1 405B (screenshot)

Summarisation

To test the summarisation capabilities of the models, we asked each model to summarise every metric trend individually and then fed the outputs for all the trends back into the LLM to get an overall summary, or, in internet slang, a DORA TL;DR for the team (a sketch of this two-stage flow follows the screenshots below).

The summarisation capability on large data is similar in both LLMs.

LLAMA 3.1 405B (screenshot)

GPT4o (screenshot)
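
A minimal Python sketch of the two-stage flow described above: each metric trend is summarised individually, then the per-metric summaries are combined into a single DORA TL;DR. The summarise helper is a hypothetical stand-in for the actual LLM call.

def summarise(prompt: str) -> str:
    # Hypothetical helper that sends the prompt to the chosen LLM and
    # returns its text response (implementation omitted here).
    raise NotImplementedError

def dora_tldr(trend_data: dict) -> str:
    """trend_data maps metric names (e.g. 'lead_time') to their weekly trend JSON."""
    per_metric = {
        metric: summarise(f"Summarise this {metric} trend data: {weeks}")
        for metric, weeks in trend_data.items()
    }
    combined = "\n\n".join(f"{name}: {summary}" for name, summary in per_metric.items())
    return summarise(f"Write a short DORA TL;DR for the team from these summaries:\n{combined}")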

Conclusion

For a long time LLAMA was trying to catch up with GPT in terms of data processing and analytical abilities. Our earlier experimentation with older LLAMA models led us to believe that GPT was way ahead, but the recent LLAMA 3.1 405B model is on par with GPT4o.

If you value your customers' data privacy and want to try the open-source LLAMA 3.1 models instead of GPT4, go ahead! The difference in performance is negligible, and you can ensure data privacy by using self-hosted models. Open-source LLMs have finally started to compete with their closed-source competitors.

Both LLAMA 3.1 and GPT4o are highly capable of deriving inferences from processed data and making Middleware's DORA metrics more actionable and digestible for engineering leaders, leading to more efficient teams.

Future Work

This was an experiment in building an AI-powered DORA solution. In the future we will focus on adding greater support for self-hosted or locally running LLMs from Middleware. Enhanced support for AI-powered action plans throughout the product using self-hosted LLMs, while ensuring data privacy, will be our goal for the coming months.

Proposed changes (including videos or screenshots)

Added Services

  • Added AIAnalyticsService to allow summarising and drawing inferences from DORA data using different models (a rough dispatch sketch follows this list).
    • We use Fireworks AI APIs for large language models.
    • We use OpenAI APIs for GPT models.
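
A minimal sketch of how such a dispatch could look, assuming Fireworks AI's OpenAI-compatible chat completions endpoint; this is not the actual AIAnalyticsService implementation, and the model-name prefix check and endpoint URLs are assumptions.

import requests

OPENAI_URL = "https://api.openai.com/v1/chat/completions"
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def get_completion(model: str, prompt: str, access_token: str) -> str:
    # Route GPT models to OpenAI and everything else (e.g. LLAMA) to Fireworks AI.
    url = OPENAI_URL if model.lower().startswith("gpt") else FIREWORKS_URL
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]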

UI Changes

  • Added UI for choosing a model and generating the summary.
  • Users can copy the generated summary as markdown.
Screen.Recording.2024-07-30.at.4.34.33.PM.mov

Added APIs

  • DORA Score
curl --location 'http://localhost:9696/ai/dora_score' \
--header 'Content-Type: application/json' \
--data '{
    "data": {
        "lead_time": 4000,
        "mean_time_to_recovery": 200000,
        "change_failure_rate": 20,
        "avg_daily_deployment_frequency": 2
    },
    "access_token": "",
    "model": ""
}'

  • Lead Time Trend Summary
curl --location 'http://localhost:9696/ai/lead_time_trends' \
--header 'Content-Type: application/json' \
--data '{
    "data": {
        "2024-01-01": {
            "first_commit_to_open": 25,
            "first_response_time": 25,
            "lead_time": 274,
            "merge_time": 224,
            "merge_to_deploy": 0,
            "pr_count": 1,
            "rework_time": 0
        },
        "2024-01-08": {
            "first_commit_to_open": 20.8,
            "first_response_time": 67173,
            "lead_time": 86981.5,
            "merge_time": 3759.2,
            "merge_to_deploy": 0,
            "pr_count": 10,
            "rework_time": 16028.5
        }
    },
    "access_token": "",
    "model": ""
}'
  • Deployment Trends Summary
curl --location 'http://localhost:9696/ai/deployment_frequency_trends' \
--header 'Content-Type: application/json' \
--data '{
    "data": {
        "2024-01-01": 2,
        "2024-01-08": 24,
        "2024-01-15": 15
    },
    "access_token": "",
    "model": ""
}'
  • CFR Summary
curl --location 'http://localhost:9696/ai/change_failure_rate_trends' \
--header 'Content-Type: application/json' \
--data '{
    "data": {
        "2024-01-01": {
            "change_failure_rate": 10,
            "failed_deployments": 10,
            "total_deployments": 100
        },
        "2024-01-08": {
            "change_failure_rate": 20,
            "failed_deployments": 30,
            "total_deployments": 150
        }
    },
    "access_token": "",
    "model": "LLAMA3p170B"
}'

  • MTTR Summary
curl --location 'http://localhost:9696/ai/mean_time_to_recovery_trends' \
--header 'Content-Type: application/json' \
--data '{
    "data": {
        "2024-01-01": {
            "incident_count": 25,
            "mean_time_to_recovery": 32400
        },
        "2024-01-08": {
            "incident_count": 32,
            "mean_time_to_recovery": 43200
        }
    },
    "access_token": "",
    "model": "LLAMA3p170B"
}'
  • DORA Trend Correlation and Summary
curl --location 'http://localhost:9696/ai/dora_trends' \
--header 'Content-Type: application/json' \
--data '{
    "data": {
        "2024-01-01": {
            "lead_time": 141220,
            "deployment_frequency": 32,
            "change_failure_rate": 15,
            "mean_time_to_recovery": 86400
        },
        "2024-01-08": {
            "lead_time": 203242,
            "deployment_frequency": 25,
            "change_failure_rate": 18,
            "mean_time_to_recovery": 105400
        }
    },
    "access_token": "",
    "model": "LLAMA3p170B"
}'

  • Added a Compiled Summary API that takes data from all the APIs above.

  • Fetch Models
    curl --location 'http://localhost:9696/ai/models'
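
For reference, the same APIs can be called from Python; a minimal sketch mirroring the dora_score curl example above, with the access token and model left as blank placeholders as in that example.

import requests

payload = {
    "data": {
        "lead_time": 4000,
        "mean_time_to_recovery": 200000,
        "change_failure_rate": 20,
        "avg_daily_deployment_frequency": 2,
    },
    "access_token": "",
    "model": "",
}

# POST to the locally running Middleware API, as in the curl examples above.
response = requests.post("http://localhost:9696/ai/dora_score", json=payload, timeout=60)
response.raise_for_status()
print(response.json())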

Further comments

@CLAassistant commented Jul 30, 2024

CLA assistant check
All committers have signed the CLA.

@samad-yar-khan changed the title from "AI Powered DORA Report" to "[RFC]AI Powered DORA Report" on Jul 30, 2024
@samad-yar-khan changed the title from "[RFC]AI Powered DORA Report" to "AI Powered DORA Report" on Jul 30, 2024
@dhruvagarwal (Member):

Great work @samad-yar-khan and @jayantbh 🎉

@dhruvagarwal added the enhancement (New feature or request) label on Jul 31, 2024
Comment on lines 84 to 88
const doraData = {
  lead_time,
  mean_time_to_recovery,
  change_failure_rate
} as any;
Contributor:

Yeah... no. 🤣

adnanhashmi09 previously approved these changes Aug 13, 2024
<FlexBox
  col
  gap1
  sx={{ ['[data-lastpass-icon-root]']: { display: 'none' } }}
Contributor:

what's this?

Contributor:

It's to remove the password manager from showing its autofill icon in the input field.
It's... unfortunately the recommended way to deal with it.

Contributor (Author):

damn, this is interesting!


const selectedModel = useEasyState<Model>(Model.GPT4o);
const token = useEasyState<string>('');
const selectedTab = useEasyState<string>('0');
Contributor:

why string?

Contributor:

No good reason. Hacky code.
Will fix with the enum.

Contributor:

Oh wait, there was a reason. Mui Tabs need the value to be a string.

<Markdown>{data.change_failure_rate_trends_summary}</Markdown>
</TabPanel>
<TabPanel
value="5"
Contributor:

all these values can be combined into an enum:

enum MyEnum {
  dora_trend_summary,
  change_failure_rate_trends_summary,
  mean_time_to_recovery_trends_summary,
  // ...more...
}

this means that:
MyEnum.dora_trend_summary will be equal to 0

@@ -77,7 +79,9 @@ export const AIAnalysis = () => {

const selectedModel = useEasyState<Model>(Model.GPT4o);
const token = useEasyState<string>('');
const selectedTab = useEasyState<string>('0');
const selectedTab = useEasyState<string>(
Contributor:

could have just removed string:

const selectedTab = useEasyState<AnalysisTabs>(

no more type conversion would have been needed

@jayantbh merged commit c43f472 into main on Aug 13, 2024
3 checks passed
@jayantbh deleted the ai-beta branch on August 13, 2024 at 11:22