Analysis

Routines for analyzing output data.

class panqec.analysis.Analysis(results: str | ~typing.List[str] = [], verbose: bool = False, overrides: str | dict | None = None, progress: ~typing.Callable | None = <function identity>)

Analysis on large collections of results files.

This is the preferred way to analyze results because it does not require reading the input files: the input parameters are saved to the results files anyway. It also does not create a Simulation object for every data point, which could be slow.

Parameters:
  • results (Union[str, List[str]]) – Path to directory or .zip containing .json.gz or .json results. Can also accept list of paths.

  • overrides (Optional[Union[str, dict]]) – Path to a json file that gives specifications on what to override, for instance when to truncate. Can also be a dictionary with the same contents as a json file. It may also contain replacement values for known analytical threshold values. See scripts/overrides.json for an example.

  • verbose (bool) – If True, logs will be printed at various steps of the analysis.

results

Results for each (code, error model, decoder) combination.

Type:

pd.DataFrame

thresholds

Thresholds for each (code family, error_model, decoder) combination.

Type:

pd.DataFrame

trunc_results

Truncated results that were used for threshold calculation.

Type:

pd.DataFrame

sectors

Dict containing thresholds and trunc_results DataFrames for each sector.

Type:

Dict[str, Any]

__init__(results: str | ~typing.List[str] = [], verbose: bool = False, overrides: str | dict | None = None, progress: ~typing.Callable | None = <function identity>)
aggregate()

Aggregate the raw data into results attribute.

apply_overrides()

Read manual overrides from .json file.

assign_labels()

Assign labels to each entry for filtering.

calculate_min_thresholds()

Take the minimum thresholds to populate the min_thresholds DataFrame attribute.

Go through all the sector thresholds in the sectors attribute and take the minimum for each parameter set.

calculate_sector_thresholds()

Calculate thresholds of each single-qubit logical error type.

When thresholds cannot be calculated, at least check whether we are above or below threshold by giving upper or lower bounds on the threshold.

calculate_single_qubit_error_rates()

Add single error rate estimates and uncertainties to results.

Adds ‘single_qubit_p_est’ and ‘single_qubit_p_se’ as array-valued columns to the results attribute of this class.

Each entry is an array of shape (k, 4), with the i-th row corresponding to the value for the i-th logical qubit; column 0 contains values for the total error rate, column 1 for X, column 2 for Y, and column 3 for Z errors.

Note that it is not checked whether or not the code is in the code space.
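
To make the layout concrete, here is a small illustrative sketch of how the (k, 4) array described above would be indexed; the numbers are invented for illustration, not real simulation output.

```python
# Hypothetical illustration of the (k, 4) layout: one row per logical
# qubit, columns ordered total, X, Y, Z. Values here are made up.
single_qubit_p_est = [
    [0.10, 0.04, 0.02, 0.04],  # logical qubit 0: total, X, Y, Z
    [0.20, 0.10, 0.05, 0.05],  # logical qubit 1: total, X, Y, Z
]

PAULI_COLUMN = {'total': 0, 'X': 1, 'Y': 2, 'Z': 3}

def rate(estimates, qubit, error_type='total'):
    """Look up the estimated rate for one logical qubit and error type."""
    return estimates[qubit][PAULI_COLUMN[error_type]]

print(rate(single_qubit_p_est, 1, 'X'))  # prints 0.1
```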

calculate_thresholds(ftol_est: float = 1e-05, ftol_std: float = 1e-05, maxfev: int = 2000, p_est: str = 'p_est', n_trials_label: str = 'n_trials', n_fail_label: str = 'n_fail', sector: str | None = None, autotruncate: bool = False)

Extract thresholds using heuristics or manual override limits.

Records thresholds in the sectors attribute for the given sector. If sector is None, then the total logical error rate is used to calculate the thresholds and the threshold information is saved to the sectors[‘total’] part.

Parameters:
  • results_df (pd.DataFrame) – The results for each (code, error_model, decoder). Should have at least the columns: ‘code’, ‘error_model’, ‘decoder’, ‘n’, ‘k’, ‘d’, ‘n_fail’, ‘n_trials’. If the logical_type keyword argument is given, then either ‘p_0_est’ or ‘p_est_word’ should be a column too.

  • ftol_est (float) – Tolerance for the best fit.

  • ftol_std (float) – Tolerance for the bootstrap fits.

  • maxfev (int) – Maximum number of iterations for the curve fitting.

  • logical_type (str) – Pick from ‘total’, ‘single’, or ‘word’, which will take p_est to be ‘p_est’, ‘p_0_est’, ‘p_est_word’ respectively. This is used to adjust which error rate is used as ‘the’ logical error rate for purposes of extracting thresholds with finite-size scaling.

  • n_fail_label (str) – The column that is ‘n_fail’.

  • sector (Optional[str]) – The Pauli sector to calculate the threshold for. The overall error rate is used if None is given.

  • autotruncate (bool) – Automatically truncate the p range so that only the regime where finite-size scaling is likely to work is used for the fitting. It uses a heuristic algorithm that may or may not be appropriate for the situation. By default it is set to False, so all data is used unless manually specified otherwise.

calculate_total_error_rates()

Calculate the total error rate.

Add it as a column to the results attribute of this class.

calculate_word_error_rates()

Calculate the word error rate using the formula.

The formula assumes a uniform error rate across all logical qubits.

count_files()

Count how many files were found.

export_latex_tables()

Print out LaTeX tables of thresholds.

find_files()

Find where the results files are.

get_fit_status(entry: Dict[str, Any]) str

Status of the fit. ‘success’ if successful, comment if not.

Parameters:

entry (Dict[str, Any]) –

Returns:

status – ‘success’ if the fit succeeded and is valid; otherwise, a reason why it is not a success is given.

Return type:

str

get_quality_metrics()

Table of quality metrics of data used for analysis.

Returns:

quality – Summary of data quality metrics, indexed by input family: in particular, the minimum number of trials over all data points in the family, and the number of error_rate values for that family that actually got used in the analysis.

Return type:

pd.DataFrame

load(path)

Load previous analysis from .json.gz file.

log(message: str)

Display a message, if the verbose attribute is True.

Parameters:

message (str) – Message to be displayed.

make_collapse_plots(pdf=None, sector=None)

Make all the collapse plots.

make_plots(plot_dir: str, include_date=True)

Make and display the plots while saving to directory.

make_threshold_vs_bias_plots(pdf=None)

Make and save threshold vs bias plots.

plot_sector_collapse(code_family: str, error_model: str, decoder: str, sector: str | None = None)

Plot the data collapse for a given parameter set for both sectors.

Parameters:
  • code_family (str) – The code family.

  • error_model (str) – The error model label.

  • decoder (str) – The decoder label.

plot_threshold_vs_bias(code: str, inf_bias_replacement: float = 1000.0, hashing: bool = True)

Plot the threshold vs bias plots.

Parameters:
  • code (str) – The code family to plot threshold vs bias for.

  • inf_bias_replacement (float) – The finite value of bias at which to plot the infinite-bias points.

  • hashing (bool) – Will also plot hashing bound if True.

plot_thresholds(sector=None, pdf=None, show=False, include_threshold_estimate=True, include_main_title=True, include_sector_title=True)

Make all the threshold plots.

read_files(progress=<function identity>)

Read raw data from the files that were found.

reorder_columns()

Reorder the columns of the results DataFrame for a more relevant display when printing it.

replace_threshold(replacement)

Format override replace specification for threshold df.

save(path)

Save analysis to a .json.gz file.

panqec.analysis.count_fails(effective_error: ndarray, codespace: ndarray, sector: str) int

Count the number of sector fails given effective errors as BSFs.

Parameters:
  • effective_error (np.ndarray) – Size (2*k, n_trials) array of ints where each row is a binary symplectic form representing the effective logical error on the code space of one trial.

  • codespace (np.ndarray) – Size (n_trials,) array of booleans, one for each trial, where True denotes that the decoding successfully returned the state to the code space and False denotes that the final state was not in the code space.

  • sector (str) – The sector whose errors are to be counted, either ‘X’ or ‘Z’.
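
A minimal pure-Python sketch of the counting logic, not panqec’s actual implementation: it assumes each row of effective_errors is the length-2*k binary symplectic form of one trial, with the first k entries the X part and the last k the Z part, and that a trial counts as a sector fail if the state ended outside the code space or if the effective error acts nontrivially in that sector.

```python
# Illustrative sketch only; conventions (BSF ordering, fail criterion)
# are assumptions, not taken from panqec's source.
def count_sector_fails(effective_errors, codespace, sector):
    k = len(effective_errors[0]) // 2
    # First k entries = X part, last k entries = Z part (assumed).
    part = slice(0, k) if sector == 'X' else slice(k, 2 * k)
    n_fails = 0
    for bsf, in_codespace in zip(effective_errors, codespace):
        # A trial fails in the sector if decoding left the code space,
        # or if the effective error is nontrivial in that sector.
        if not in_codespace or any(bsf[part]):
            n_fails += 1
    return n_fails

effective_errors = [
    [1, 0, 0, 0],  # X error on logical qubit 0
    [0, 0, 0, 1],  # Z error on logical qubit 1
    [0, 0, 0, 0],  # no logical error
]
codespace = [True, True, True]
print(count_sector_fails(effective_errors, codespace, 'X'))  # prints 1
```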

panqec.analysis.deduce_bias(error_model: dict, rtol: float = 0.1) str | float | int

Deduce the eta ratio from the noise model label.

Parameters:
  • error_model (dict) – The error model specification.

  • rtol (float) – Relative tolerance to consider rounding eta value to int.

Returns:

eta – The eta value. If it’s infinite then the string ‘inf’ is returned.

Return type:

Union[str, float, int]
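
As an illustration, here is a sketch of one plausible way such a bias could be deduced; the Z-sector convention and the rounding rule are assumptions for illustration, not taken from panqec’s source.

```python
# Hypothetical sketch: eta = r_z / (r_x + r_y), reported as the string
# 'inf' for pure-Z noise, and rounded to an int when within the
# relative tolerance rtol (mirroring the rtol parameter above).
def deduce_eta(r_x, r_y, r_z, rtol=0.1):
    if r_x + r_y == 0:
        return 'inf'  # infinite bias
    eta = r_z / (r_x + r_y)
    if eta != 0 and abs(eta - round(eta)) / eta < rtol:
        return int(round(eta))
    return eta

print(deduce_eta(0.05, 0.05, 0.9))  # prints 9
print(deduce_eta(0.0, 0.0, 1.0))    # prints inf
```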

panqec.analysis.draw_tick_symbol(plt, Line2D, log=False, axis='x', tick_height=0.03, tick_width=0.1, tick_location=2.5, axis_offset=0)

Draw a section cut tick symbol on the x axis.

panqec.analysis.fit_fss_params(df_filt: DataFrame, p_left_val: float, p_right_val: float, p_nearest: float | None = None, n_bs: int = 100, ftol_est: float = 1e-05, ftol_std: float = 1e-05, maxfev: int = 2000, p_est: str = 'p_est', n_trials_label: str = 'n_trials', n_fail_label: str = 'n_fail', resample_points: bool = True) Tuple[ndarray, ndarray, DataFrame]

Get optimized parameters and data table tweaked with heuristics.

Parameters:
  • df_filt (pd.DataFrame) – Results with columns: ‘error_rate’, ‘code’, p_est, n_trials_label, n_fail_label. The ‘error_rate’ column is the physical error rate p. The ‘code’ column is the code label. The p_est column is the logical error rate.

  • p_left_val (float) – The left value of ‘error_rate’ at which to truncate.

  • p_right_val (float) – The right value of ‘error_rate’ at which to truncate.

  • p_nearest (float) – The nearest value of ‘error_rate’ to what was previously roughly estimated to be the threshold.

  • n_bs (int) – The number of bootstrap samples to take.

  • ftol_est (float) – Tolerance for the best fit.

  • ftol_std (float) – Tolerance for the bootstrapped fits.

  • maxfev (int) – Maximum iterations for curve fitting optimizer.

  • p_est (str) – Label for the logical error rate to use.

  • n_trials_label (str) – Column label to use for the number of trials.

  • n_fail_label (str) – Column label to use for the number of logical fails.

Returns:

  • params_opt (np.ndarray) – Array of optimized parameters.

  • params_bs (np.ndarray) – Array with each row being arrays of optimized parameters for each bootstrap resample.

  • df_trunc (pd.DataFrame) – The truncated DataFrame used for performing the curve fitting.

panqec.analysis.fit_function(x_data, *params)

Quadratic fit function for finite-size scaling.
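
For intuition, here is an illustrative quadratic finite-size scaling ansatz of the kind commonly used for threshold fits; the parameter names and ordering here are assumptions for illustration, not panqec’s exact convention.

```python
# Sketch of a quadratic finite-size scaling ansatz: the rescaled
# variable x = (p - p_th) * d**(1/nu) collapses curves of different
# code distance d, and the logical error rate is fit as a quadratic
# in x near the threshold.
def fss_quadratic(p, d, p_th, nu, a, b, c):
    x = (p - p_th) * d ** (1 / nu)
    return a + b * x + c * x ** 2

# Exactly at p = p_th the rescaled variable vanishes for every d,
# so all curves meet at the constant term a.
print(fss_quadratic(0.1, 9, 0.1, 1.5, 0.3, 1.0, 2.0))  # prints 0.3
```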

panqec.analysis.fit_subthreshold_scaling_cubic(results_df, order=3, ansatz='poly')

Get fit parameters for subthreshold scaling ansatz.

panqec.analysis.get_bias_ratios(noise_direction)

Get the bias ratios in each direction given the noise direction.

Parameters:

noise_direction ((float, float, float)) – The (r_x, r_y, r_z) parameters of the Pauli channel.

Returns:

  • eta_x (float) – The X bias ratio.

  • eta_y (float) – The Y bias ratio.

  • eta_z (float) – The Z bias ratio.
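
A sketch of the usual bias-ratio convention, assumed here for illustration rather than verified against panqec’s source: each eta is one Pauli rate divided by the sum of the other two.

```python
# Illustrative sketch: bias ratios from a Pauli channel's noise
# direction (r_x, r_y, r_z). A zero denominator (infinite bias) would
# need special-casing, as deduce_bias's 'inf' return value suggests.
def bias_ratios(noise_direction):
    r_x, r_y, r_z = noise_direction
    eta_x = r_x / (r_y + r_z)
    eta_y = r_y / (r_x + r_z)
    eta_z = r_z / (r_x + r_y)
    return eta_x, eta_y, eta_z

# Depolarizing noise has all three ratios equal to 0.5.
print(bias_ratios((1 / 3, 1 / 3, 1 / 3)))
```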

panqec.analysis.get_code_df(results_df: DataFrame) DataFrame

DataFrame of all codes available.

Parameters:

results_df (pd.DataFrame) – Results table with columns ‘code’, ‘n’, ‘k’, ‘d’, plus possibly other columns.

Returns:

code_df – DataFrame with only [‘code’, ‘n’, ‘k’, ‘d’] as columns, with no duplicates.

Return type:

pd.DataFrame

panqec.analysis.get_deformation(error_model_label)

Get the deformation given the error model label.

Parameters:

error_model_label (str) – The label specifying how the error model object is initialized.

Returns:

deformation – The name of the deformation.

Return type:

str

Examples

>>> error_model_label = "PauliErrorModel()"
>>> get_deformation(error_model_label)
'undeformed'
>>> error_model_label = "PauliErrorModel(r_x=0.00495, r_y=0.00495, r_z=0.990099, deformation_name=None, deformation_kwargs={})"
>>> get_deformation(error_model_label)
'undeformed'
>>> error_model_label = "PauliErrorModel(r_x=0.333333, r_y=0.333333, r_z=0.333333, deformation_name='XZZX', deformation_kwargs={})"
>>> get_deformation(error_model_label)
'XZZX'
>>> error_model_label = "PauliErrorModel(r_x=0, r_y=1, r_z=0, deformation_name='XY')"
>>> get_deformation(error_model_label)
'XY'
panqec.analysis.get_fit_params(p_list: ndarray, d_list: ndarray, f_list: ndarray, params_0: ndarray | List | None = None, ftol: float = 1e-05, maxfev: int = 2000) ndarray

Get fitting params.

Parameters:
  • p_list (np.ndarray) – List of physical error rates.

  • d_list (np.ndarray) – List of code distances.

  • f_list (np.ndarray) – List of logical error rates.

  • params_0 (Optional[Union[np.ndarray, List]]) – Hint parameters for the optimizer about where to start minimizing cost function.

  • ftol (float) – Tolerance for the optimizer.

  • maxfev (int) – Maximum number of iterations for the optimizer.

Returns:

params_opt – The optimized parameters that fit the data best.

Return type:

np.ndarray

panqec.analysis.get_p_th_nearest(df_filt: DataFrame, p_est: str = 'p_est') float

Estimate which p in the results is nearest to the threshold.

This is a very rough heuristic: go along each value of the physical error rate p in the plots of p_est vs p for different code sizes and see where the ordering of the lines changes. The point where it starts to change is deemed p_th_nearest. Since this is so rough, a more refined method such as finite-size scaling is required to make it more precise and to put uncertainties around it.

Parameters:
  • df_filt (pd.DataFrame) – Results with columns: ‘error_rate’, ‘code’, p_est. The ‘error_rate’ column is the physical error rate p. The ‘code’ column is the code label. The p_est column is the logical error rate.

  • p_est (str) – The column in the df_filt DataFrame that is to be used as the logical error rate to estimate the threshold.

Returns:

p_th_nearest – The value of error_rate that is apparently the closest to the threshold.

Return type:

float
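
The reordering heuristic described above can be sketched in pure Python; this is a re-implementation for illustration only, with hypothetical helper and data names, not panqec’s actual code.

```python
# Below threshold, larger codes have lower logical error rates; above
# it the ordering flips. The first physical error rate where the flip
# occurs is taken as p_th_nearest.
def p_th_nearest(error_rates, p_est_by_distance):
    """p_est_by_distance maps code distance -> list of p_est values,
    one per entry of error_rates (assumed sorted ascending)."""
    d_small, d_large = min(p_est_by_distance), max(p_est_by_distance)
    for i, p in enumerate(error_rates):
        if p_est_by_distance[d_large][i] >= p_est_by_distance[d_small][i]:
            return p
    return error_rates[-1]  # no flip found: everything is subthreshold

error_rates = [0.01, 0.02, 0.03, 0.04]
p_est_by_distance = {
    3: [0.05, 0.10, 0.20, 0.30],
    5: [0.02, 0.08, 0.25, 0.45],  # overtakes d=3 at the third point
}
print(p_th_nearest(error_rates, p_est_by_distance))  # prints 0.03
```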

panqec.analysis.get_p_th_sd_interp(df_filt: DataFrame, p_nearest: float | None = None, p_est: str = 'p_est') Tuple[float, float, float]

Estimate threshold by where SD of p_est is local min.

This is a very coarse heuristic to estimate roughly where the crossover point is, if there is one, and the left and right limits beyond which the data should be truncated. It can be used as a starting point for something more precise, namely finite-size scaling. The rationale is that away from the crossing point, the lines of p_est vs p should spread wider and wider apart vertically; if they start getting narrower again, that is a sign that we are moving into a different regime where finite-size scaling is not likely to work, so those data points should be thrown away.

Parameters:
  • df_filt (pd.DataFrame) – Results with columns: ‘error_rate’, ‘code’, p_est. The ‘error_rate’ column is the physical error rate p. The ‘code’ column is the code label. The p_est column is the logical error rate.

  • p_nearest (Optional[float]) – A hint for the nearest ‘error_rate’ value that is close to the threshold, to be used as a starting point for searching.

  • p_est (str) – The column in the df_filt DataFrame that is to be used as the logical error rate to estimate the threshold.

Returns:

  • p_crossover (float) – The apparent crossover point where the plot of p_est vs p for each code becomes narrowest vertically.

  • p_left (float) – The left limit: the point where, moving left from p_crossover, the spread of p_est vs p over all the codes becomes widest. Data to the left of this point is recommended to be truncated out.

  • p_right (float) – The right limit: the point where, moving right from p_crossover, the spread of p_est vs p over all the codes becomes widest. Data to the right of this point is recommended to be truncated out.

panqec.analysis.get_single_qubit_error_rate(effective_error_list: List[List[int]] | ndarray, i: int = 0, error_type: str | None = None) Tuple[float, float]

Estimate single-qubit error rate of i-th qubit and its standard error.

This is the probability of getting an error on the i-th logical qubit, marginalized over the other logical qubits.

Parameters:
  • effective_error_list – List of many logical effective errors produced by simulation, each of which is given in bsf format.

  • i (int) – The index of the logical qubit on which estimation is to be done.

  • error_type – Type of Pauli error to calculate the error rate for, i.e. ‘X’, ‘Y’ or ‘Z’. If None is given, the rate for any error is estimated.

Returns:

  • p_i_est (float) – Single-qubit error rate estimator.

  • p_i_se (float) – Standard error for the estimator.

panqec.analysis.get_standard_error(estimator, n_samples)

Get the standard error of the mean estimator.

Parameters:
  • estimator (float) – Number of hits divided by number of samples.

  • n_samples (int) – Number of samples.

Returns:

standard_error – The standard error taken as the standard deviation of a Beta distribution.

Return type:

float
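
A sketch of one common form of this estimator; the exact n + 1 convention below is an assumption based on the Beta-distribution description above, differing from the textbook sqrt(p*(1-p)/n) only in the denominator.

```python
import math

# Illustrative binomial-proportion standard error (assumed form):
# the standard deviation of a Beta posterior over the hit rate.
def standard_error(estimator, n_samples):
    return math.sqrt(estimator * (1 - estimator) / (n_samples + 1))

print(standard_error(0.5, 99))  # prints 0.05
```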

panqec.analysis.get_word_error_rate(p_est, p_se, k) Tuple

Calculate the word error rate and its standard error.

Parameters:
  • p_est (Numerical) – Value or array of estimated logical error rate.

  • p_se (Numerical) – Value or array of standard error on logical error rate.

  • k (Numerical) – Number of logical qubits, as value or array.

Returns:

  • p_est_word (Numerical) – Value or array of estimated word error rate.

  • p_se_word (Numerical) – Value or array of standard error of word error rate.
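
As an illustration of one common convention (an assumption here, not verified against panqec’s source), the word error rate can be taken as the per-logical-qubit rate implied by the total rate under independent, uniform failures, with the standard error propagated through the derivative of that map.

```python
# Hypothetical sketch: p_word = 1 - (1 - p_est)**(1/k), assuming all
# k logical qubits fail independently at the same rate, with the
# standard error propagated via the delta method.
def word_error_rate(p_est, p_se, k):
    p_word = 1 - (1 - p_est) ** (1 / k)
    p_se_word = (1 / k) * (1 - p_est) ** (1 / k - 1) * p_se
    return p_word, p_se_word

# With k = 1 the word error rate reduces to the logical error rate.
print(word_error_rate(0.3, 0.01, 1))
```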

panqec.analysis.infer_error_model_family(label: str) str

Infer the error_model family from the error_model label.

Parameters:

label (str) – The error_model label.

Returns:

family – The error_model family. If the family cannot be inferred, the original label is returned.

Return type:

str

Examples

>>> infer_error_model_family('Deformed XZZX Pauli X0.0161Y0.0161Z0.9677')
'Deformed XZZX Pauli'
>>> infer_error_model_family('Pauli X0.0000Y0.0000Z1.0000')
'Pauli'
panqec.analysis.read_entry(data: List | Dict, results_file: str | None = None) List[Dict]

List of entries from data in list or dict format.

Returns an empty list if it is not a valid results dict.

Parameters:
  • data (Union[List, Dict]) – The data that is parsed raw from a results file.

  • results_file (Optional[str]) – The path to the results directory or .zip file.

Returns:

entries – The list of entries, each of which corresponds to a results file.

Return type:

List[Dict]

panqec.analysis.subthreshold_scaling(results_df, chosen_probabilities=None)

Do subthreshold scaling analysis.

This was a legacy method where many different fitting ansatzes were tried.