
Sde203 build experience

Key Topics

  1. Container (Docker)
  2. Testing (Unit Test, Integration Test, E2E Test, Load Test)
  3. Runbook - https://www.pagerduty.com/resources/learn/what-is-a-runbook/

Model Bias and WAPE

Lane  OD pair     Predicted price  Actual price
1     LAX to JFK  $400             $420
2     ORD to SFO  $600             $550
3     DFW to LGA  $700             $750
\[\text{WAPE} = \left(\frac{\sum\limits_{i=1}^{n}|A_i-P_i|}{\sum\limits_{i=1}^{n}A_i}\right) \times 100 = \frac{|20|+|-50|+|50|}{420+550+750} \times 100 = \frac{120}{1720} \times 100 \approx 6.977\%\]
\[\text{Bias (\%)} = \left(\frac{\sum\limits_{i=1}^{n}(P_i-A_i)}{\sum\limits_{i=1}^{n}A_i}\right) \times 100 = \frac{-20+50-50}{420+550+750} \times 100 = \frac{-20}{1720} \times 100 \approx -1.163\%\]
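
The two formulas can be checked with a few lines of Python. This is a minimal sketch that uses only the three lanes from the table; the function names `wape` and `bias_pct` are made up for illustration and do not come from any library.

```python
def wape(actual, predicted):
    """Weighted absolute percentage error, in percent."""
    total_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_abs_error / sum(actual) * 100


def bias_pct(actual, predicted):
    """Signed bias in percent; positive means over-prediction on average."""
    total_error = sum(p - a for a, p in zip(actual, predicted))
    return total_error / sum(actual) * 100


# Lanes 1-3 from the table
predicted = [400, 600, 700]
actual = [420, 550, 750]

print(f"WAPE: {wape(actual, predicted):.3f}%")      # WAPE: 6.977%
print(f"Bias: {bias_pct(actual, predicted):.3f}%")  # Bias: -1.163%
```

The negative bias says the model under-predicts overall, even though lane 2 is over-predicted.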

Eve: Hey Adam, do you know how to evaluate the accuracy of a linear regression model?

Adam: Yeah, one way is to use metrics like WAPE and bias in percentage.

Eve: WAPE? What's that?

Adam: WAPE stands for Weighted Absolute Percentage Error. It's a metric used to measure the accuracy of a model's predictions in terms of percentage error.

Eve: Oh, I see. Can you explain how to calculate it?

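Adam: Sure. You take the absolute percentage error for each prediction and then take the weighted average of those errors using the actual values as the weights. The weights cancel the per-prediction denominators, so it works out to the sum of absolute errors divided by the sum of actual values, times 100.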

Eve: That makes sense. And what about bias in percentage?

Adam: Bias in percentage is another metric used to evaluate the accuracy of a model. It measures the average signed deviation of the predicted values from the actual values as a percentage of the average actual value, so over-predictions and under-predictions can cancel each other out.

Eve: Interesting. How do you calculate bias in percentage?

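Adam: You take the difference between the average predicted value and the average actual value, divide it by the average actual value, and then multiply by 100 to express it as a percentage. A positive result means the model over-predicts on average; a negative result means it under-predicts.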

Eve: Thanks for explaining that, Adam. I'll keep those metrics in mind when evaluating linear regression models in the future.
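
Adam's verbal definitions and the ratio-of-sums formulas describe the same numbers. As a quick check, here is a sketch (variable names are illustrative only) that computes WAPE as an actual-weighted average of per-lane percentage errors and bias from the two means, then compares each with the ratio-of-sums form, using the three lanes from the table.

```python
predicted = [400, 600, 700]
actual = [420, 550, 750]

# Per-lane absolute percentage errors
ape = [abs(a - p) / a * 100 for a, p in zip(actual, predicted)]

# WAPE as described in the dialogue: average of the errors, weighted by actual value
wape_weighted = sum(w * e for w, e in zip(actual, ape)) / sum(actual)

# WAPE as a ratio of sums
wape_ratio = sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual) * 100

# Bias from the two means, as described in the dialogue
mean_predicted = sum(predicted) / len(predicted)
mean_actual = sum(actual) / len(actual)
bias_from_means = (mean_predicted - mean_actual) / mean_actual * 100

# Bias as a ratio of sums
bias_ratio = sum(p - a for a, p in zip(actual, predicted)) / sum(actual) * 100

print(round(wape_weighted, 3), round(wape_ratio, 3))    # 6.977 6.977
print(round(bias_from_means, 3), round(bias_ratio, 3))  # -1.163 -1.163
```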

Python memory_profiler

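import os

from memory_profiler import profile


def profile_if_env(name, default=False):
    """Apply memory_profiler's @profile only when env var `name` is 'true'."""
    def decorator(func):
        # Evaluated once, at decoration time (when the class body is executed),
        # so the variable must be set before this module is imported.
        enabled = os.environ.get(name, str(default)).lower() == 'true'
        return profile(func) if enabled else func
    return decorator


class MyClass:
    @profile_if_env('ENABLE_MEMORY_PROFILING')
    def my_func(self):
        # Code to profile goes here; a placeholder allocation keeps the example runnable.
        data = [0] * 1_000_000
        return len(data)


obj = MyClass()
obj.my_func()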

Python pandas CSV/TSV

Read a large CSV from S3 using smart_open

import pandas as pd
import smart_open


def lambda_handler(event, context):
    # Stream the S3 object. smart_open uses boto3 (and its default credentials)
    # under the hood, so no explicit client is needed here.
    # smart_open.open replaces the deprecated smart_open.smart_open.
    with smart_open.open('s3://my-bucket/my-large-file.csv', 'rb') as file_obj:
        # Read CSV file using Pandas
        df = pd.read_csv(file_obj)

    # Do something with the DataFrame (e.g. filter, aggregate, etc.)
    ...

    # Return the results (e.g. as a JSON object)
    return {
        'result': df.to_json(orient='records')
    }
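
For files too large to hold in memory at once, `pd.read_csv` accepts a `chunksize` argument and then yields DataFrames piece by piece. The sketch below is one possible adaptation, not part of the handler itself: the helper `summarize_in_chunks`, the `price` column, and the in-memory sample are all hypothetical. In a Lambda, the object returned by `smart_open.open(...)` would take the place of `io.StringIO`.

```python
import io

import pandas as pd


def summarize_in_chunks(file_obj, chunksize=2):
    """Return (row count, sum of 'price') without loading the whole CSV."""
    rows = 0
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames.
    for chunk in pd.read_csv(file_obj, chunksize=chunksize):
        rows += len(chunk)
        total += int(chunk["price"].sum())
    return rows, total


# Hypothetical sample standing in for the S3 file
sample = io.StringIO("lane,price\n1,400\n2,600\n3,700\n")
print(summarize_in_chunks(sample))  # (3, 1700)
```

Only aggregates that can be accumulated chunk by chunk (counts, sums, min/max) fit this pattern directly; anything that needs the full dataset at once still requires enough memory for it.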

Read JSON from an environment variable and process it as a pandas DataFrame

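import os
from io import StringIO

import pandas as pd


def lambda_handler(event, context):
    # Get JSON string from environment variable
    json_str = os.environ.get('MY_JSON_STRING')
    if json_str is None:
        # Fail clearly instead of passing None to pandas
        return {
            'statusCode': 500,
            'body': 'MY_JSON_STRING is not set'
        }

    # Load JSON string into pandas DataFrame.
    # Wrapping it in StringIO avoids the literal-string deprecation in pandas >= 2.1.
    df = pd.read_json(StringIO(json_str))

    # Transform DataFrame as needed (e.g. filter rows, add columns, etc.)

    # Convert DataFrame back to JSON string
    json_output = df.to_json(orient="records")

    # Return JSON string (or any other output you need)
    return {
        'statusCode': 200,
        'body': json_output
    }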