Sumy, Ukraine

FEATURES OF THE TRANSMISSION MECHANISM OF VIRAL HEPATITIS C IN UKRAINE

Introduction/objective. The large share of young people among hepatitis C (HC/HCV infection) cases, the large number of latent cases of this infection, and the lack of specific prevention may complicate the epidemic situation in Ukraine in the coming years. The authors developed a mathematical model of the HC epidemic process to determine the most significant factors of transmission of this infection in the country. Materials and methods. The study is based on correlation-regression analysis of the relationship between dependent (responding) and explanatory (factorial, or predictor) variables. In total, the analysis involved 3 dependent variables y1, y2, y3, corresponding to the annual number of acute HC cases, the annual number of chronic HC cases and the number of HC virus seropositive individuals, and 17 predictors x1-x17, including patients who received etiotropic treatment; patients with mental and behavioral disorders due to narcotics use, including opioids; patients with sexually transmitted infections; the number of visits to dentists; the number of patients who had dentures placed; the number of surgical operations, blood transfusions, endoscopic examinations, laboratory blood tests, hemodialysis, etc. The number of observations (n) of dependent and explanatory variables was 25, which corresponds to the number of administrative-territorial units in Ukraine (24 regions and Kyiv). The quality of regression models was evaluated using multiple correlation coefficients (R), coefficients of determination (R²) and regression coefficients (b0, b1, b2). Statistical significance of R² was determined by the F-statistic, and that of regression coefficients by standard errors (m), t-test, p-value, and 95% confidence intervals (CI). To compare the degree of influence of factor variables on dependent variables in the two-factor regression model, standardized regression coefficients were calculated.
The reliability of regression models was evaluated by the Durbin-Watson (DW), Breusch-Godfrey (BG) and White (W) test statistics. The relative risk (RR) of HC infection was retrospectively determined in individuals from behavioral and medical risk groups. Results. In the mathematical model of the epidemic process of acute HC, statistical significance was demonstrated for the effect of only one variable: the annual number of dentist visits. The obtained regression equation was as follows:


Corresponding author: galnatmed@gmail.com
Introduction. Viral hepatitis C (HC/HCV infection) is one of the most pressing problems in the world and in Ukraine because of the wide range of medical and socio-economic consequences of its morbidity: a severe course with chronic forms of the disease, cirrhosis and liver cancer, and the significant cost of diagnosis and treatment [1,2,3]. Combating this infection is complicated by the lack of specific prevention methods, the poor availability of etiotropic treatment, the high tendency of the infection to become chronic, and the large latent component of the epidemic process. Under these conditions, the predominance of young people in HC incidence among the population of Ukraine [4], as well as insufficient etiotropic treatment coverage of patients with HC [5], may complicate the epidemiological situation in the coming years.
The epidemiological process of any infectious disease in the human population develops involving certain social factors. Determination of certain elements of social life that specifically influence the infectious disease spread is a prerequisite for successful management of the course of its epidemic process. These ideas determined the objective of our study.
The objective was to define the relative risk of HC in population groups with certain behavioural and medical risks; to develop a mathematical model of HC epidemic process in Ukraine in order to find out its most important determinants.
Materials and methods. The authors of the study used data from the analytical report of the Public Health Center on the incidence of acute and chronic HC in Ukraine, on seroprevalence of HC virus, on the number of patients with HC who received etiotropic treatment [4], and the data from open database of the Medical Statistics Center of the Ministry of Health of Ukraine on performance of health care institutions in different regions of Ukraine [6] for 2013-2018.
The relative risk of infection with the HC pathogen was determined retrospectively for different population groups, using the results of serological studies conducted in Ukraine for prophylactic purposes. The relative risk of infection (RR) is the ratio of the proportion of HC virus seropositive individuals in an exposed group to the proportion of seropositive individuals in an unexposed group. A 95% confidence interval (CI) was calculated for RR. The relative risk of HC infection was determined in the behavioural risk groups (narcotics users and patients with sexually transmitted diseases), the medical risk groups (health workers and long-term inpatients), and the comparison groups (pregnant women and donors).
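The calculation of RR and its confidence interval can be sketched as follows. This is a minimal illustration, not the authors' procedure: it takes RR as the ratio of seropositive proportions and assumes the log-normal (Katz) approximation for the 95% CI, since the paper does not state which interval method was used. The function name and the counts are hypothetical.

```python
import math


def relative_risk(pos_exposed, n_exposed, pos_unexposed, n_unexposed, z=1.96):
    """Return (RR, lower, upper) with a CI from the log-normal (Katz) approximation.

    Assumption: the CI method is not specified in the paper; this is one
    common choice for cross-sectional seroprevalence counts.
    """
    risk_exposed = pos_exposed / n_exposed
    risk_unexposed = pos_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(
        1 / pos_exposed - 1 / n_exposed + 1 / pos_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts for illustration only (not data from the study).
rr, ci_low, ci_high = relative_risk(120, 1000, 40, 2000)
print(f"RR = {rr:.2f}, 95% CI [{ci_low:.2f}; {ci_high:.2f}]")
```

An interval that lies entirely above 1 indicates a higher risk in the exposed group; an interval entirely below 1 indicates a lower risk.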
To create a mathematical model of the HC epidemic process, the authors used correlation and regression analysis of logically related dependent (responding) and explanatory (factorial, or predictor) variables. The number of observations (n) of dependent and explanatory variables was 25, which corresponds to the number of administrative-territorial units in Ukraine (24 regions and Kyiv). All parameters were expressed in absolute numbers. The groups of parameters with names, designations, arithmetic mean values (M) and standard deviations (σ) are given in Table 1.
Statistical data were processed using MS Office Excel software package.
Study procedure: 1. The long-term average annual values for dependent and explanatory features (variables) were calculated for every administrative-territorial unit of Ukraine for 2013-2018.
2. The distribution of random variables (dependent and factorial features) was tested for normality. To do this, the coefficients of skewness (A) and kurtosis (E) were calculated using the functions SKEW(number1;[number2];...) and KURT(number1;[number2];...) of the MS Office Excel software package, and the variances of these coefficients, D(A) and D(E), were also calculated. If the absolute values of the coefficients of skewness and kurtosis did not exceed 3 and 5 square roots of their variances, respectively, i.e. the inequalities $|A| \le 3\sqrt{D(A)}$ and $|E| \le 5\sqrt{D(E)}$ were satisfied, then the distribution of the random variables was considered close to normal.
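The normality check in step 2 can be sketched in code. The skewness and kurtosis functions below follow the sample-adjusted formulas that Excel documents for SKEW and KURT. The variance formulas for D(A) and D(E) are an assumption: the paper does not print them, and the expressions used here are the ones commonly paired with this criterion. All function names are illustrative.

```python
import math
import statistics


def excel_skew(xs):
    """Sample skewness as computed by Excel's SKEW."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis as computed by Excel's KURT."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    fourth = sum(((x - mean) / s) ** 4 for x in xs)
    return (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )


def close_to_normal(xs):
    """Check |A| <= 3*sqrt(D(A)) and |E| <= 5*sqrt(D(E)).

    Assumption: D(A) and D(E) are not given in the paper; these are the
    variance formulas usually used with this criterion.
    """
    n = len(xs)
    var_a = 6 * (n - 1) / ((n + 1) * (n + 3))
    var_e = 24 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    return (
        abs(excel_skew(xs)) <= 3 * math.sqrt(var_a)
        and abs(excel_kurt(xs)) <= 5 * math.sqrt(var_e)
    )
```

With n = 25 the skewness threshold is about 1.33, so a regional series dominated by one extreme value would fail the check.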
3. The matrix of Pearson correlation coefficients ($r_{xy}$) between dependent and explanatory features was obtained. The correlation strength and statistical significance of the correlation coefficients were determined. Possible variants of grouping the features into correlation-regression models were evaluated.
The strength of the correlation between the responding and factorial features was evaluated according to the Chaddock scale: with an absolute value of $r_{xy}$ up to 0.3, the correlation was considered weak; 0.3 to 0.5, moderate; 0.5 to 0.7, noticeable; 0.7 to 0.9, high; > 0.9, very high.
The hypothesis of the significance of the Pearson correlation coefficient ($r_{xy}$) for a sample of size n was tested using Student's t-test (t), which was calculated by the formula:

$$t = \frac{r_{xy}\sqrt{n-2}}{\sqrt{1-r_{xy}^{2}}}$$

From the table of critical values of Student's distribution (two-sided critical region), at the significance level α = 0.05 and the number of degrees of freedom f = n - 2, the corresponding critical value ($t_{crit}$) was found. When $t > t_{crit}$, the hypothesis $H_0: r_{xy} = 0$ was rejected and the alternative hypothesis was accepted, on the basis of which the correlation coefficient was concluded to be significant.

A 95% confidence interval (CI) was calculated for the correlation coefficient. The calculations were based on the Fisher z-transformation. The lower ($z_L$) and upper ($z_U$) limits of the transformed 95% confidence interval for the Pearson correlation coefficient are:

$$z_L = \frac{1}{2}\ln\frac{1+r_{xy}}{1-r_{xy}} - \frac{1.96}{\sqrt{n-3}}, \qquad z_U = \frac{1}{2}\ln\frac{1+r_{xy}}{1-r_{xy}} + \frac{1.96}{\sqrt{n-3}}$$

where ln = natural logarithm, n = sample size.

The value of the correlation coefficient for the general population, calculated from the sample, will in 95% of cases lie in the range:

$$\frac{e^{2z_L}-1}{e^{2z_L}+1} \le r \le \frac{e^{2z_U}-1}{e^{2z_U}+1}$$

where e = Euler's number (e ≈ 2.718). The correlation coefficient was considered statistically significant if its confidence interval did not include 0.

4. Using a graph of the empirical regression line, the form of the relationship between responding and factorial features was determined.
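The significance test and the Fisher interval for a correlation coefficient can be sketched as below. The sketch assumes the standard forms of both formulas and a normal quantile of 1.96; the function names are illustrative. As a check, it is applied to the coefficient r = 0.354 (n = 25) that the paper reports in the Discussion with a 95% CI of [-0.047; 0.657].

```python
import math


def pearson_t(r, n):
    """Student's t statistic for H0: r = 0 with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def fisher_ci(r, n, z_crit=1.96):
    """95% CI for r via the Fisher z-transformation (z_crit = 1.96 assumed)."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)

    def back(v):
        # Inverse transformation from the z scale to the r scale
        return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

    return back(z - half_width), back(z + half_width)


t_value = pearson_t(0.354, 25)
ci_low, ci_high = fisher_ci(0.354, 25)
print(f"t = {t_value:.3f}, 95% CI [{ci_low:.3f}; {ci_high:.3f}]")
```

The interval agrees with the reported one to within rounding and includes 0, and the t value stays below the tabulated two-sided critical value for 23 degrees of freedom (about 2.07), so both criteria lead to the same conclusion for this coefficient.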
5. The authors solved the regression equations and evaluated the coefficients of multiple correlation R (which reflect the degree of dependence of a responding feature y on all factor variables x), the coefficients of determination $R^2$ (which show the proportion of the variance of the responding feature attributable to the independent variables), and the regression coefficients (which show by how much y changes per unit change in x). It was assumed that the closer the coefficient of determination is to 1, the greater the share of the variability of the responding feature accounted for by the model and, consequently, the better its quality.
The significance of the coefficient of determination was tested using Fisher's F-test. The critical point was found for the significance level α = 0.05 and the degrees of freedom $f_1 = k$, $f_2 = n - k - 1$, where n = number of observations, k = number of factorial features. Provided that $F > F_{crit}$, the null hypothesis $H_0: R^2 = 0$ was rejected and the coefficient of determination was considered statistically significant. In that case, the variation of the responding feature y was considered to be influenced, in general, by the factorial features included in the model.
Regression coefficients were not considered statistically significant if the confidence intervals included both positive and negative values, the calculated values of the t-test were less than the critical value, and the standard errors exceeded half the values of the parameters.
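The F statistic for the coefficient of determination can be recovered from $R^2$, n and k. The sketch below assumes the usual identity F = (R² / (1 - R²)) · ((n - k - 1) / k); the function name is illustrative. Applied to the two models reported in the Results, it gives values close to the published F statistics.

```python
def f_statistic(r_squared, n, k):
    """F statistic for H0: R^2 = 0 with k and n - k - 1 degrees of freedom."""
    return (r_squared / (1 - r_squared)) * ((n - k - 1) / k)


# Values of R^2, n and k as reported in the Results section.
f_one_factor = f_statistic(0.796, 25, 1)   # one-factor model, f1 = 1, f2 = 23
f_two_factor = f_statistic(0.842, 25, 2)   # two-factor model, f1 = 2, f2 = 22
print(round(f_one_factor, 2), round(f_two_factor, 2))
```

For comparison, the tabulated critical values at α = 0.05 are about 4.28 for (1, 23) and about 3.44 for (2, 22) degrees of freedom, so both statistics exceed them by a wide margin.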
6. The theoretically predicted values of the responding feature were analyzed. The mean error of approximation E (average relative deviation of the predicted values from the actual values) was evaluated:

$$E = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \cdot 100\%, \qquad \varepsilon_i = y_i - \hat{y}_i$$

where $\hat{y}_i$ = the value of the responding indicator (theoretical value, calculated from the regression equation by substituting the corresponding actual values of the factors); $\varepsilon_i$ = the error present in the model because the responding indicator is also affected by other factors not taken into account in the regression equation.
With an approximation error of up to 10%, the precision of the regression model selection was considered high; 10 to 20%, good; 20 to 50%, satisfactory; more than 50%, unsatisfactory.
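The mean approximation error and the rating scale can be sketched as follows, assuming the usual definition of E as the mean absolute relative deviation in percent. The function names and the treatment of boundary values (10, 20 and 50% counted in the better category) are assumptions.

```python
def mean_approximation_error(actual, predicted):
    """Mean absolute relative deviation of predictions, in percent."""
    n = len(actual)
    return 100 / n * sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))


def rate_fit(error_pct):
    """Map an approximation error (%) to the rating scale used in the text."""
    if error_pct <= 10:
        return "high"
    if error_pct <= 20:
        return "good"
    if error_pct <= 50:
        return "satisfactory"
    return "unsatisfactory"


# The errors reported for the two models in the Results section.
print(rate_fit(44.9), rate_fit(26.9))
```

Because every deviation is divided by the actual value, units with very few registered cases can dominate E, which matches the sharp drop in E described in the Results once two regions are excluded.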
7. Comparative analysis of variances (total variance, factor variance, residual variance) was performed and the strength of relationship between the features included in the model was evaluated.
8. Insignificant factors were excluded and additional factors included, after which steps 1-6 were repeated. At this stage, the algorithm of sequential selection (Stepwise) was used [7]: at each step, after a new variable was included in the model, the significance of the variables entered earlier was tested. If their significance was not confirmed, such variables were removed from the model. After the list of variables included in the model was adjusted, another iteration of the search for a new variable satisfying the conditions for inclusion in the model was performed.
9. The statistical significance of regression parameters was evaluated.
10. The reliability of the regression equation was analyzed.
Durbin-Watson test and Breusch-Godfrey test were used to detect possible distortion of standard errors and t-statistics of regression due to autocorrelation between the levels of the studied variable.
The Durbin-Watson test was limited to testing the hypothesis $H_0$ of absence of autocorrelation, i.e. that the autocorrelation coefficient ρ was equal to 0 (ρ = 0). If $H_0$ was rejected, the alternative hypothesis $H_1$: ρ > 0 or ρ < 0 was accepted. The Durbin-Watson statistic was calculated as follows:

$$DW_{calc} = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

where $DW_{calc}$ = calculated value of the Durbin-Watson statistic, $e_t$ = residual of the regression model for observation t, $(e_t - e_{t-1})$ = difference between adjacent residuals (lag of one observation).
The critical lower ($DW_L$) and upper ($DW_U$) values of the Durbin-Watson test were found from the corresponding statistical tables at a significance level of α = 0.05 for the given number of observations n and number of factor variables k. If $DW_L < DW_{calc}$ and $DW_U < DW_{calc} < 4 - DW_U$, $H_0$ was accepted. Otherwise, $H_0$ was rejected and $H_1$ was accepted.
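The Durbin-Watson statistic and the acceptance rule can be sketched as below, assuming the standard definition of the statistic on the regression residuals. The function names are illustrative, and the upper bound passed to the decision function is a placeholder that would be read from the statistical tables for the given n and k.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def no_autocorrelation(dw, dw_upper):
    """Accept H0 (rho = 0) when DW_U < DW < 4 - DW_U."""
    return dw_upper < dw < 4 - dw_upper


# Alternating residuals illustrate strong negative autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

Values near 2 indicate no first-order autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation.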
The Breusch-Godfrey test was also used to test the null hypothesis of absence of serial correlation. If this hypothesis was true, then the distribution of the criterion $(n - f)R^2_{aux}$ (where $R^2_{aux}$ = coefficient of determination of the auxiliary model, f = autoregression order) for n observations was close to the chi-square distribution $\chi^2(f)$ with f degrees of freedom. The null hypothesis $H_0$ was rejected if the calculated value $(n - f)R^2_{aux}$ exceeded the critical value at the given significance level α. In practice, the observations lost when the Breusch-Godfrey auxiliary regression with lagged variables was built were subtracted from the original sample size.
Heterogeneity of observations, expressed as unequal (non-constant) variance of the random error of the regression model (heteroscedasticity), would make the obtained statistical conclusions inadequate. To test random deviations for heteroscedasticity, the White test was used, which regresses the squared random deviations on all exogenous variables, their squares, and their cross products.
To compare the degree of influence of factor variables x on dependent variables (in the two-factor linear regression equation), standardized regression coefficients were calculated. The standardized regression coefficient ($b_{st}$) shows by what fraction of its standard deviation $\sigma_y$ the dependent variable y changes when the corresponding factor x changes by one standard deviation $\sigma_x$, with the other factor included in the equation held constant. It was calculated by multiplying the regression coefficient $b_x$ by the standard deviation of the variable x ($\sigma_x$) and dividing the result by the standard deviation of the variable y ($\sigma_y$): $b_{st} = b_x \sigma_x / \sigma_y$. Standard deviations of factor and dependent variables are presented in Table 1.
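The standardization of a regression coefficient is a one-line computation. The sketch below uses hypothetical numbers, since the regression coefficients and Table 1 are not reproduced in this text; the function name is illustrative.

```python
def standardized_coefficient(b_x, sigma_x, sigma_y):
    """Regression coefficient rescaled to standard-deviation units."""
    return b_x * sigma_x / sigma_y


# Hypothetical inputs for illustration only (not values from Table 1).
b_st = standardized_coefficient(b_x=2.0, sigma_x=3.0, sigma_y=12.0)
print(b_st)
```

Because both factors are expressed in units of their own standard deviations, the resulting coefficients can be compared directly even when the raw predictors differ by orders of magnitude, as visit counts and laboratory test counts do.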
Results. According to serological studies conducted in Ukraine, the relative risk of HC virus infection (Table 2) is highest for people who use drugs and for children born to HCV-positive mothers: 6.5 and 5.4 times higher, respectively, than the risk for the rest of the population. The risk of HC infection in long-term inpatients and in patients with sexually transmitted diseases is more than 2 times higher than in the rest of the population. At the same time, the risk of infection in healthcare workers, pregnant women and donors is 1.5, 2, and 6.8 times lower, respectively, than in the rest of the population.
The applicability of correlation-regression analysis to the study of the relationship between responding and factorial features was assessed through the coefficients of skewness (A) and kurtosis (E) and the variances of these coefficients, D(A) and D(E). The calculations showed that the distribution of the studied random variables in the studied statistical series is close to the normal distribution (Table 3).
Analysis of the correlation coefficients between responding and factorial variables (Table 3)   6. Noticeable and moderate direct correlations between HC incidence and indicators that reflect the therapeutic and diagnostic work in hospitals (Table 4).

Table 3 - Characteristics of random variables distribution in the statistical series of acute and chronic HC incidence and HC virus seroprevalence according to the coefficients of skewness (A), kurtosis (E), and corresponding variances (D)
In the course of successive model modification by exclusion of insignificant and inclusion of significant factors, only the indicators that describe HC epidemiological process in Ukraine best of all were kept and thus the following regression equations were obtained.
1. Regression equation for the acute HC epidemic process:

$$y_1 = 0.000021\,x_5 - 11.353$$

where $y_1$ = annual number of patients with acute HC; $x_5$ = annual number of dentist visits.
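To illustrate how the equation is applied, the sketch below evaluates it for a hypothetical administrative unit. The coefficients are those printed above; the function name and the number of visits are illustrative only.

```python
def predict_acute_hc(dentist_visits, slope=0.000021, intercept=-11.353):
    """Predicted annual number of acute HC cases from annual dentist visits."""
    return slope * dentist_visits + intercept


# Hypothetical unit with one million dentist visits per year.
print(round(predict_acute_hc(1_000_000), 3))
```

With these coefficients the predicted value becomes negative below roughly 540,000 visits per year, so the equation is meaningful only within the range of visit counts observed across the 25 units.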
All regression coefficients were statistically significant, which was shown by the following data:
There was a statistically strong relationship between the dependent and factorial variables: R was equal to 0.892. The value of the coefficient of determination $R^2$ was 0.796. Thus, the regression equation accounted for 79.6% of the variance of the responding feature (incidence of acute HC), and its residual variance was 20.4%, which indicated a satisfactory approximation and adequacy of the model. The results of the F-test (F = 89.9; significance of F, p = 2.1 × 10⁻⁹, at degrees of freedom f1 = 1 and f2 = 23) also confirmed the statistical significance of the influence of the factorial variable x (the number of dentist visits) on the number of acute HC cases.
The value of the approximation error E (44.9%) did not exceed 50%, which characterized the model as satisfactory, as is clearly shown in Fig. 1. Interestingly, when only two regions were excluded from the regression analysis, Volyn and Ivano-Frankivsk oblasts, where the predicted incidence of acute HC was, respectively, 1.67 and 3.38 times higher than the actual values of this indicator (apparently due to significant underdiagnosis of acute HC in these areas), the regression coefficient $b_0$ changed to -9.8, and the approximation error decreased to 29.8%.

For the two-factor model, the multiple correlation coefficient R was equal to 0.92, which showed the linear nature of the relationship between responding and factorial variables. The coefficient of determination $R^2$ was 0.842; therefore, the regression equation accounted for 84.2% of the variance of the responding feature, and the residual variance of the dependent variable y was 15.8%, which indicated a satisfactory approximation and adequacy of the regression model. The statistical significance of the coefficient of determination was also confirmed by Fisher's test: F = 58.62 (significance of F, p = 1.53 × 10⁻⁹) with f1 = 2 and f2 = 22. On this basis the null hypothesis of no influence of factors $x_1$ and $x_2$ on the responding variable was rejected, and the alternative hypothesis, which confirms the statistical reliability of the regression equation estimate, was accepted.

Table 4 - Correlation coefficients between dependent (responding) and explanatory (factorial) indicators and corresponding 95% confidence intervals
The calculated value of the approximation error E for this model was equal to 26.9% and indicated its satisfactory adequacy, which was shown in Fig. 2.

Fig. 2 - ... to the number of patients with sexually transmitted diseases and the number of hematological and biochemical lab tests) incidence of infection in different administrative-territorial units of Ukraine
White test statistics: $R_{aux}$ = 0.148; W = 0.548; $\chi^2$(α = 0.05; f = 5) = 11.07; W < $\chi^2$, that is, the model was homoscedastic.
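The reported White statistic is reproduced by the usual form W = n · R²_aux if the printed value 0.148 is read as the multiple correlation coefficient of the auxiliary regression. That reading is an assumption, and the function name is illustrative; the critical value 11.07 is the one given in the text for 5 degrees of freedom.

```python
def white_statistic(n, r_aux):
    """White test statistic n * R^2 from the auxiliary regression."""
    return n * r_aux ** 2


w = white_statistic(25, 0.148)
chi2_critical = 11.07  # critical value quoted in the text (alpha = 0.05, f = 5)
homoscedastic = w < chi2_critical
print(round(w, 3), homoscedastic)
```

Five degrees of freedom correspond to the regressors of the auxiliary model for two factors: the two variables, their squares, and one cross product.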
In analyzing the obtained two-factor linear regression equation, the question was which of the two factors, $x_4$ or $x_{15}$, had the greater influence on the dependent variable $y_3$. To answer this question, the authors standardized the regression coefficients according to the above-mentioned method. The results showed that the standardized regression coefficient ($b_{st}$) for the variable $x_{15}$ was about twice as high as that for the variable $x_4$ (0.7 vs 0.3), which indicated that the artificial route of infection associated with blood sampling in laboratories had a more pronounced effect on the epidemic process of HC than sexual transmission.
When the chronic HC epidemic process was modeled, no statistically significant regression coefficients were found for any of the factors influencing the incidence: confidence intervals included both positive and negative values, the calculated t-test value was less than the critical value, and standard errors exceeded half of the parameter values. In our opinion, this is the result of underreporting of chronic HC cases in Ukraine.
Discussion. In the mathematical model of the epidemic process of acute HC, the authors demonstrated statistical significance of the effect of only one variable: the annual number of dentist visits. The possibility of being infected with parenteral hepatitis in dental offices seems quite plausible, although there is very little direct evidence of this in the literature. In 2013, the US Centers for Disease Control and Prevention (CDC) for the first time presented a documented report on patient-to-patient transmission of HC in a dental office [8]; the infection was confirmed by molecular epidemiology methods. Somewhat earlier, in 2007, molecular-epidemiological evidence of hepatitis B virus infection of a patient during dental surgery was presented [9]. The source of infection in the latter case was another patient operated on in the same office 161 minutes earlier. The details of the viral hepatitis transmission mechanism were not clarified in either case. Meanwhile, inadequate disinfection of high-speed dental handpieces is considered a potential threat in modern dental practice in the context of HC spread [10]. Their safe handling includes rinsing with water, mechanical cleaning with detergents, and sterilization; however, in many cases only one stage of treatment is used: disinfection (at best with alcohol-containing substances) without cleaning the instrument after each patient. In our opinion, the risk of cross-infection may also be related to aspiration at the moment the air supply to the air-turbine handpiece stops, if the handpiece has no means of preventing back suction: the pressure in the air duct becomes lower than in the turbine area, and biological fluids are sucked into the air duct, contaminating not only the turbine handpiece but also the coupling, tube and cleaning unit. This can lead to cross-infection even if the turbine handpiece is replaced after each patient.
According to serological studies conducted in Ukraine [4], about 4% of the population have antibodies to HC virus. Indicators of HC virus seroprevalence show that transmission of the virus in the population occurs much more often as compared to the data on the incidence of acute and chronic HC, so the regression model of the epidemic process of HC, which is based on annual number of HC virus seropositive persons, reflects the basic patterns of its hidden component.
When a model of the HC epidemic process based on the annual number of seropositive individuals was developed, statistical significance was demonstrated for only two variables: the annual number of sexually transmitted infections and the annual number of laboratory blood tests. Only a few reports of viral hepatitis C and B transmission associated with clinical laboratories were found in the available literature. For example, the risk of being infected with viral hepatitis C and B for laboratory staff was estimated to be about 10 times higher than that for the rest of the population [11] and almost 3 times higher than for other hospital staff [12]. In the context of our results, the unusual outbreak of hepatitis B described in 1974 among the staff of a clinical diagnostic laboratory who were involved in the processing of laboratory computer charts [13] is of great interest. According to that study, the determining and statistically valid factor of the outbreak among the laboratory staff was obvious and hidden skin lesions, which served as the gateway for the virus during direct contact with potentially blood-contaminated objects. In our opinion, a similar mechanism of pathogen transmission, from the hands of a laboratory assistant to the scarification wound of the patient, may occur during blood collection if the laboratory assistant reuses rubber gloves under conditions of glove shortage and a heavy flow of patients.
It seems quite logical that these factors (the annual number of sexually transmitted infections and the annual number of laboratory blood tests) did not fall into the first model describing the epidemiological process of acute HC. Obviously, when the virus gets into the scarification wound from the laboratory gloves during blood sampling or during sexual intercourse, a small number of HC pathogens enter the human body, so the infectious process will not be manifested by pronounced clinical symptoms.
The study showed a statistically insignificant influence of narcotics use, including opioid use, on the mechanism of HC transmission in Ukraine ($r_{xy}$ = 0.354, 95% CI [-0.047; 0.657] and $r_{xy}$ = 0.341, 95% CI [-0.062; 0.648], respectively), whereas the relative risk of being infected for this group is quite significant (RR = 6.5; 95% CI [6.39; 6.63]). In our opinion, this may be a consequence of the relative social isolation of this category of the population. Poor socialization of drug users makes active sexual contact with people outside the drug-user subculture unlikely and is accompanied by limited use of medical services, including dental services.

Conclusions
According to our data, at least 84% of HC virus infection cases in Ukraine occur through sexual contact and during laboratory blood sampling, and the role of the latter route of transmission in the HC virus spread was even more significant (standardized regression coefficients are 0.3 and 0.7, respectively).
Almost 80% of acute HC cases are associated with dental interventions.
Etiotropic treatment of patients with HC at the current level of treatment coverage can reduce the incidence of complications and the risk of death, but it is ineffective as a measure of influence on the first stage of the epidemiological process (source of infection).
Drug users have little effect on the intensity of the HC epidemiological process in Ukraine as a whole, despite the fact that the relative risk of HC among this population is quite significant (RR = 6.5; 95% CI [6.39; 6.63]).