EPG translation script - an open project - help required.

**jenseneverest** · Oct 28th 2024

With thousands of channels out there, many of which have dual audio, a way to translate EPG data has been on my wish list for a very long time.

Finally I have found a way to do so using argos-translate, see https://github.com/argosopente…nslate?tab=readme-ov-file

Argos translate is good in that it works offline, from a downloaded "dictionary", and more importantly is free with no API call limits.

All of the other translators I am aware of cost a fortune when you're trying to translate large files. For context, just the sly Germany EPG has over 300 000 lines of data, which is many millions of characters.

Below is the script I am currently using. Obviously you will need to install all the required modules and argos-translate; see its readme file for details.

I have tried several ways of speeding it up while still maintaining the correct XML format, but all my attempts to speed it up any further have failed.

I am not sure if argos-translate supports multithreading, but if it does I can't get it to work correctly. Half the issue is that it doesn't really support XML structure either.
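
One way to keep the XML structure out of the translator's hands is to parse the file first and pass only the text of selected elements to it, once per unique string. The following is a minimal sketch under assumptions, not a drop-in replacement: `translate_tree`, `TRANSLATABLE_TAGS` and `fake_translate` are hypothetical names, the tag list assumes the usual XMLTV text fields, and `fake_translate` stands in for a real call such as `underlying_translation.translate` from the script in this post.

```python
import xml.etree.ElementTree as ET

# Assumed XMLTV fields worth translating; channel names and ids are left alone.
TRANSLATABLE_TAGS = ("title", "sub-title", "desc")

def translate_tree(root, translate, tags=TRANSLATABLE_TAGS):
    """Translate the text of selected elements in place, once per unique string."""
    cache = {}
    for tag in tags:
        for elem in root.iter(tag):
            text = (elem.text or "").strip()
            if not text:
                continue
            if text not in cache:
                cache[text] = translate(text)  # the only place the translator is called
            elem.text = cache[text]
    return len(cache)  # number of translator calls actually made

def fake_translate(text):
    # Stand-in for a real translator so the sketch runs without extra packages.
    return text.upper()

SAMPLE = """<tv>
  <channel id="OUTtv.de"><display-name lang="de">OUTtv</display-name></channel>
  <programme channel="OUTtv.de"><title lang="de">Dudes</title><desc lang="de">Ein Yogalehrer.</desc></programme>
  <programme channel="OUTtv.de"><title lang="de">Dudes</title></programme>
</tv>"""

root = ET.fromstring(SAMPLE)
calls = translate_tree(root, fake_translate)
print(calls, ET.tostring(root, encoding="unicode"))
```

Because the translator never sees a tag or an attribute, the document stays well-formed whatever it returns. How much time this saves depends on how many repeated strings and non-programme elements a given file contains.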

From what I have read a decent GPU may be the fastest way to do things, but again I'm limited to my lappie with its built-in AMD chipset graphics.

So here is the issue: it takes over 4 hours to translate the 5 files in the script, and that is only for one EPG provider!
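
For a sense of scale, the figures from the run log later in this post can be totalled with a few lines of Python. This is only arithmetic on the logged element counts and processing times; the dictionary name `runs` is just for illustration.

```python
# (elements, seconds) per file, copied from the run log in this post
runs = {
    "rytecAT_Basic": (30436, 1150.00),
    "rytecDE_Basic": (87546, 5562.21),
    "rytecCH_Basic": (62480, 1969.37),
    "rytecDE_Common": (72050, 4720.23),
    "rytecDE_SportMovies": (39091, 2312.80),
}

total_elements = sum(elements for elements, _ in runs.values())
total_seconds = sum(seconds for _, seconds in runs.values())
hours = total_seconds / 3600
rate = total_elements / total_seconds  # average throughput across all files

print(f"{total_elements} elements in {hours:.2f} h ({rate:.1f} elements/s)")
```

That works out to 291603 elements in roughly 4.37 hours, an average of about 18.6 elements per second, which matches the "over 4 hours" figure.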

I could refine the data to only what I require, mostly sport, movie and documentary channels, but ultimately I would still want all the data, for all the sats, and that would still take forever.

So I am after help in refining the script to make it faster, while still maintaining good XML structure...

And/or... other users to also translate EPG data; then if we share the translations, we could all get more EPG sources translated to English.

Anyways, I'm probably waffling now. Script below:

Python

#!/usr/bin/env python3

import os
import requests
import lzma
import argostranslate.package
import argostranslate.translate
from lxml import etree
import time
from tqdm import tqdm

# Define the directory for downloading and saving files
directory = "/tmp"
os.makedirs(directory, exist_ok=True)

# List of URLs to download
urls = [
    "http://epgspot.com/rytec_epg/rytecDE_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecDE_OtherMovies.xz",
    "http://epgspot.com/rytec_epg/rytecCH_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecDE_Common.xz",
    "http://epgspot.com/rytec_epg/rytecDE_SportMovies.xz"
]

def process_file(url):
    """Download, extract, translate, and compress a file from a given URL."""
    local_filename = os.path.join(directory, os.path.basename(url))

    # Download the file
    download_file(url, local_filename)

    # Extract the downloaded file
    extracted_file = extract_file(local_filename)

    # Translate the extracted XML file
    translated_file = translate_xml(extracted_file)
    if translated_file is None:
        return  # Nothing to compress: translate_xml() reported missing languages

    # Compress the translated XML file
    compress_file(translated_file)

def download_file(url, local_filename):
    """Download a file from the specified URL."""
    response = requests.get(url)
    response.raise_for_status()  # Raise an error for bad responses
    with open(local_filename, 'wb') as f:
        f.write(response.content)  # Write the content to a local file

def extract_file(local_filename):
    """Extract a .xz file to its uncompressed form."""
    with lzma.open(local_filename) as f:
        output_filename = local_filename.replace('.xz', '')
        with open(output_filename, 'wb') as out_file:
            out_file.write(f.read())  # Write the decompressed content
    return output_filename

def translate_text(text, translation):
    """Translate a single text string using the provided translation object."""
    return translation.translate(text)

def translate_xml(file_path):
    """Translate the text within an XML file."""
    from_code = "de"  # Source language code (German)
    to_code = "en"    # Target language code (English)

    # Get installed languages and set up the translation
    installed_languages = argostranslate.translate.get_installed_languages()
    from_lang = next((lang for lang in installed_languages if lang.code == from_code), None)
    to_lang = next((lang for lang in installed_languages if lang.code == to_code), None)

    if from_lang is None or to_lang is None:
        print("Error: Specified languages are not installed.")
        return None

    underlying_translation = from_lang.get_translation(to_lang)  # Set the translation object

    print(f"Translating XML file: {file_path}")

    # Parse the XML file
    tree = etree.parse(file_path)
    root = tree.getroot()

    translation_cache = {}  # Cache for translated strings
    translation_count = 0
    max_translations = 9999000  # Limit for translations

    start_time = time.time()  # Record the start time
    total_elements = sum(1 for _ in root.iter())  # Count total XML elements

    with tqdm(total=total_elements, desc="Translating", unit="element") as pbar:
        for elem in root.iter():
            if translation_count >= max_translations:
                break  # Stop if max translations reached

            # Translate text if present
            if elem.text and elem.text.strip():
                original_text = elem.text.strip()
                if original_text not in translation_cache:
                    translated_text = translate_text(original_text, underlying_translation)
                    translation_cache[original_text] = translated_text  # Cache the translation
                else:
                    translated_text = translation_cache[original_text]
                elem.text = translated_text
                translation_count += 1  # Increment the translation count

            # Handle tail text (text following an element)
            if elem.tail and elem.tail.strip():
                original_tail = elem.tail.strip()
                if original_tail not in translation_cache:
                    translated_tail = translate_text(original_tail, underlying_translation)
                    translation_cache[original_tail] = translated_tail
                else:
                    translated_tail = translation_cache[original_tail]
                elem.tail = translated_tail
                translation_count += 1

            pbar.update(1)  # Update the progress bar

    # Write the translated XML to a new file
    translated_file_path = file_path.replace("rytec", "translated")
    tree.write(translated_file_path, pretty_print=True, xml_declaration=True, encoding='utf-8')

    end_time = time.time()  # Record the end time
    processing_time = end_time - start_time  # Calculate processing time
    print(f"Successfully translated XML file to '{translated_file_path}'.")
    print(f"Total processing time: {processing_time:.2f} seconds.")
    return translated_file_path

def compress_file(file_path):
    """Compress a file using LZMA compression."""
    compressed_filename = file_path + ".xz"
    with open(file_path, 'rb') as f:
        with lzma.open(compressed_filename, 'wb') as out_file:
            out_file.write(f.read())  # Write the content to the compressed file
    print(f"Compressed '{file_path}' to '{compressed_filename}'.")
    return compressed_filename

def main():
    """Main function to process each URL."""
    for url in urls:
        process_file(url)  # Process each file one by one

if __name__ == "__main__":
    main()  # Execute the main function


As I said earlier, it takes ages to translate the data. Here are my run details from earlier today:

Code

jason@Jason-Lappie:~/epgstuff$ python3 translator.py
Translating XML file: /tmp/rytecAT_Basic
Translating rytecAT_Basic: 100%|█████████████████████████████████████████████████████████████████████████████████| 30436/30436 [19:09<00:00, 26.47element/s]
Successfully translated XML file to '/tmp/translatedAT_Basic'.
Total processing time: 1150.00 seconds.
Compressed '/tmp/translatedAT_Basic' to '/tmp/translatedAT_Basic.xz'.
Translating XML file: /tmp/rytecDE_Basic
Translating rytecDE_Basic: 100%|███████████████████████████████████████████████████████████████████████████████| 87546/87546 [1:32:41<00:00, 15.74element/s]
Successfully translated XML file to '/tmp/translatedDE_Basic'.
Total processing time: 5562.21 seconds.
Compressed '/tmp/translatedDE_Basic' to '/tmp/translatedDE_Basic.xz'.
Translating XML file: /tmp/rytecCH_Basic
Translating rytecCH_Basic: 100%|█████████████████████████████████████████████████████████████████████████████████| 62480/62480 [32:49<00:00, 31.73element/s]
Successfully translated XML file to '/tmp/translatedCH_Basic'.
Total processing time: 1969.37 seconds.
Compressed '/tmp/translatedCH_Basic' to '/tmp/translatedCH_Basic.xz'.
Translating XML file: /tmp/rytecDE_Common
Translating rytecDE_Common: 100%|██████████████████████████████████████████████████████████████████████████████| 72050/72050 [1:18:40<00:00, 15.26element/s]
Successfully translated XML file to '/tmp/translatedDE_Common'.
Total processing time: 4720.23 seconds.
Compressed '/tmp/translatedDE_Common' to '/tmp/translatedDE_Common.xz'.
Translating XML file: /tmp/rytecDE_SportMovies
Translating rytecDE_SportMovies: 100%|███████████████████████████████████████████████████████████████████████████| 39091/39091 [38:32<00:00, 16.90element/s]
Successfully translated XML file to '/tmp/translatedDE_SportMovies'.
Total processing time: 2312.80 seconds.
Compressed '/tmp/translatedDE_SportMovies' to '/tmp/translatedDE_SportMovies.xz'.
jason@Jason-Lappie:~/epgstuff$


Here is the source as I have added it to /etc/epgimport

For now I am just storing the files locally.

XML

<?xml version="1.0" encoding="utf-8"?>
<sources>
 <!-- 
 translated epg data test
 --> 
    <mappings>
        <channel name="rytec.channels.xml.xz">
            <url>http://epgspot.com/rytec_epg/rytec.channels.xml.xz</url>
        </channel>
    </mappings>
    <sourcecat sourcecatname="Translated DE to EN @19.2east XMLTV">
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt1</description>
            <url>/media/hdd/translated/translatedDE_SportMovies.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt2</description>
            <url>/media/hdd/translated/translatedDE_Common.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt3</description>
            <url>/media/hdd/translated/translatedDE_Basic.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt4</description>
            <url>/media/hdd/translated/translatedCH_Basic.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt5</description>
            <url>/media/hdd/translated/translatedAT_Basic.xz</url>
        </source>
    </sourcecat>
</sources>


Ultimately you end up with this:

The translated data is not perfect, but it is mostly fine... besides, my broken English is way better than my broken German.

Anyways, I hope there is some interest in this; it would be a load easier for everyone if a few peeps get on board and help.

Let me know your thoughts... any help with it would be appreciated.

**Mahmoud Hussein** · Oct 28th 2024

This looks like a great project. For a long time, normal users have kept asking for bulk translation of EPG data to English, to get a proper understanding of current and upcoming events.

This will improve a lot of functions like poster downloads, ratings, event search, etc.

Unfortunately I am not a programmer, but I wonder: if we limit the scope of translation to the services located in bouquets instead of translating the entire EPG file, it should be much faster.

Also, limit the translation to a specific number of days instead of leaving it open; I think limiting it to 48 hrs would make sense.
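
Both ideas can be combined into a filter that runs before any translation. This is a rough sketch, assuming XMLTV timestamps of the form `20241028000500 +0200` as in the Rytec files; `programmes_in_scope`, `parse_xmltv_time` and `wanted_channels` are hypothetical names, and building the set of channel ids from the bouquets is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

def parse_xmltv_time(value):
    """Parse an XMLTV timestamp with offset, e.g. '20241028000500 +0200'."""
    return datetime.strptime(value, "%Y%m%d%H%M%S %z")

def programmes_in_scope(root, wanted_channels, now, hours=48):
    """Yield <programme> elements on wanted channels that overlap the time window."""
    window_end = now + timedelta(hours=hours)
    for programme in root.iter("programme"):
        if programme.get("channel") not in wanted_channels:
            continue
        try:
            start = parse_xmltv_time(programme.get("start", ""))
            stop = parse_xmltv_time(programme.get("stop", ""))
        except ValueError:
            continue  # unexpected timestamp format: leave the entry untranslated
        if start < window_end and stop > now:
            yield programme

SAMPLE = """<tv>
  <programme start="20241028000500 +0200" stop="20241028001200 +0200" channel="OUTtv.de"><title>Mixed Messages</title></programme>
  <programme start="20241028000500 +0200" stop="20241028001200 +0200" channel="Other.de"><title>Nachrichten</title></programme>
  <programme start="20241031000500 +0100" stop="20241031001200 +0100" channel="OUTtv.de"><title>Dudes</title></programme>
</tv>"""

root = ET.fromstring(SAMPLE)
# Fixed "now" so the example is repeatable; a real run would use datetime.now(timezone.utc).
now = datetime(2024, 10, 28, 0, 0, tzinfo=timezone(timedelta(hours=2)))
selected = list(programmes_in_scope(root, {"OUTtv.de"}, now))
print([p.findtext("title") for p in selected])
```

Only the first sample programme is selected: the second is on a channel outside the set, and the third starts more than 48 hours after `now`.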

I appreciate your time in starting this project and hope for more contributions from the experts here.

Regards

**Chris230291** · Oct 28th 2024

jenseneverest

Look into concurrent.futures for multithreading.

Here's an example:

Code

import concurrent.futures

strings = ["dog", "cat", "fish", "mouse", "elephant"]

def printyMcPrinter(string):
    print(string)       
    
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(printyMcPrinter, strings)

Or

Code

import concurrent.futures

strings = ["dog", "cat", "fish", "mouse", "elephant"]

def printyMcPrinter(string):
    return string       
    
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(printyMcPrinter, strings)

for result in results:
    print(result)


**Chris230291** · Oct 28th 2024

Here is a basic working example, using Google Translate (just easier for me).

Proxies are required to bypass rate limits. I find Tor works.

Python

import xml.etree.ElementTree as ET
import concurrent.futures
from deep_translator import GoogleTranslator

#####################################
#CONFIG##############################
desired_language = "en"
translate_titles = True
translate_sub_title = True
translate_descriptions = True
proxies = {"http": "192.168.1.200:8118", "https": "192.168.1.200:8118"}
#####################################

tree = ET.parse('tv.xmltv')
root = tree.getroot()
programmes = root.findall("programme")

def translate_programe(programme):
    if translate_titles:
        title_ele = programme.find("title")
        if title_ele is not None:
            title_text = title_ele.text
            title_lang = title_ele.get("lang", "auto")
            if title_text and title_lang != desired_language:
                title_translated = GoogleTranslator(source=title_lang, target=desired_language, proxies=proxies).translate(title_text)
                ET.SubElement(programme, "title", lang=desired_language).text = str(title_translated)

    if translate_sub_title:
        sub_title_ele = programme.find("sub-title")
        if sub_title_ele is not None:
            sub_title_text = sub_title_ele.text
            sub_title_lang = sub_title_ele.get("lang", "auto")
            if sub_title_text and sub_title_lang != desired_language:
                sub_title_translated = GoogleTranslator(source=sub_title_lang, target=desired_language, proxies=proxies).translate(sub_title_text)
                ET.SubElement(programme, "sub-title", lang=desired_language).text = str(sub_title_translated)

    if translate_descriptions:
        desc_ele = programme.find("desc")
        if desc_ele is not None:
            desc_text = desc_ele.text
            desc_lang = desc_ele.get("lang", "auto")
            if desc_text and desc_lang != desired_language:
                desc_translated = GoogleTranslator(source=desc_lang, target=desired_language, proxies=proxies).translate(desc_text)
                ET.SubElement(programme, "desc", lang=desired_language).text = str(desc_translated)
   
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Consume the iterator so exceptions raised in worker threads are not silently dropped
    list(executor.map(translate_programe, programmes))

ET.indent(tree, space="\t", level=0)
tree.write('translated_tv.xmltv', encoding="utf-8", xml_declaration=True)


This input:

XML

<?xml version="1.0" encoding="UTF-8"?>
<tv generator-info-name="Rytec" generator-info-url="https://forums.openpli.org" generator-info-partner="bStream-Panel">
<channel id="OUTtv.de">
<display-name lang="de">OUTtv</display-name>
</channel>

<programme start="20241028000500 +0200" stop="20241028001200 +0200" channel="OUTtv.de">
<title lang="de">Mixed Messages</title>
<sub-title lang="de">[Queer, Comedy]  (S1E5) [Keine Altersangabe]  [GB]</sub-title>
<desc lang="de">Als Ren mit einer Freundin ausgeht, trifft er zufällig auf eine alte bekannte Person Kanchi Wichmann.</desc>
</programme>

<programme start="20241028001200 +0200" stop="20241028002200 +0200" channel="OUTtv.de">
<title lang="de">Mixed Messages</title>
<sub-title lang="de">[Queer, Comedy]  (S1E6) [Keine Altersangabe]  [GB]</sub-title>
<desc lang="de">Einer von Rens Kurskameraden sendet widersprüchliche Signale, die Ren verwirren Kanchi Wichmann.</desc>
</programme>

<programme start="20241028002200 +0200" stop="20241028002900 +0200" channel="OUTtv.de">
<title lang="de">Mixed Messages</title>
<sub-title lang="de">[Queer, Comedy]  (S1E7) [Keine Altersangabe]  [GB]</sub-title>
<desc lang="de">Ren wird dieses Mal von ihrem neuen Date zu einem besonderen Bondage-Workshop mitgenommen Kanchi Wichmann.</desc>
</programme>

<programme start="20241028002900 +0200" stop="20241028004000 +0200" channel="OUTtv.de">
<title lang="de">Mixed Messages</title>
<sub-title lang="de">[Queer, Comedy]  (S1E8) [Keine Altersangabe]  [GB]</sub-title>
<desc lang="de">Ren sucht nach einer Romanze, aber in einer Welt von Tinder war Liebe noch nie so schwierig Kanchi Wichmann.</desc>
</programme>

<programme start="20241028004000 +0200" stop="20241028005500 +0200" channel="OUTtv.de">
<title lang="de">Dropping the Soap</title>
<sub-title lang="de">[Comedy] Man Musk (S1E3) (2017) [Keine Altersangabe]  [US]</sub-title>
<desc lang="de">Julian will mit dem Dreh eines Werbespots beginnen, als Kit behauptet, sie soll auch im Spot sein Paul Witten Kate Mines Suzanne Friedline Michael McKiddy.</desc>
</programme>

<programme start="20241028005500 +0100" stop="20241028012400 +0100" channel="OUTtv.de">
<title lang="de">Boystown</title>
<sub-title lang="de">[Queer, Drama]  (S1E6) (2013) [Keine Altersangabe]  [US]</sub-title>
<desc lang="de">Jake bekommt Angst, wenn Patrick unerwartet Ryan besucht. Patrick will Aufnahmezeiten mit Ryan besprechen. Zudem erreichen Ricks Fantasien mit seinem Klienten eine neue Dimension. Und Michael versucht, T.J. zu überzeugen, dass er Rick verlassen muss J. Hunter Ackerman Albertossy Espinoza Jim Patneaude Ricky Reidling Jesse Seann Atkinson.</desc>
</programme>

<programme start="20241028012400 +0100" stop="20241028015200 +0100" channel="OUTtv.de">
<title lang="de">Boystown</title>
<sub-title lang="de">[Queer, Drama]  (S1E7) (2013) [Keine Altersangabe]  [US]</sub-title>
<desc lang="de">Bain soll Chris helfen, die Originalkopie seiner Audition zu stehlen. Rick versucht, T.J. zu überzeugen, dass er Therapie braucht. Und dass T.J. Antidepressiva nimmt. Schließlich geraten Rick und T.J. wieder aneinander J. Hunter Ackerman Albertossy Espinoza Jim Patneaude Ricky Reidling Jesse Seann Atkinson.</desc>
</programme>

<programme start="20241028015200 +0100" stop="20241028020800 +0100" channel="OUTtv.de">
<title lang="de">Dudes</title>
<sub-title lang="de">[Comedy, Queer]  (S1E4) (2014) [Keine Altersangabe]  [US]</sub-title>
<desc lang="de">David verliebt sich in Lowell, einen Yogalehrer. Tyler findet eine Lösung für seine Frustration.</desc>
</programme>

<programme start="20241028020800 +0100" stop="20241028031600 +0100" channel="OUTtv.de">
<title lang="de">Toy Boy</title>
<sub-title lang="de">[Thriller, Drama] Im Kreis irren (S1E4) (2019) [Keine Altersangabe]  [ES]</sub-title>
<desc lang="de">Die Mutter eines alten Freundes wendet sich verzweifelt mit einer dringenden Bitte an Andrea. Währenddessen hat Hugo alle Hände voll damit, zu verbergen, dass er das Inferno ausgerechnet zum Zeitpunkt des erneuten Mordes verlassen hat Jesús Mosquera Cristina Castaño María Pedraza Álex Gadea Javier Mora Jose Manuel Seda.</desc>
</programme>
</tv>


Gives:

XML

<?xml version='1.0' encoding='utf-8'?>
<tv generator-info-name="Rytec" generator-info-url="https://forums.openpli.org" generator-info-partner="bStream-Panel">
    <channel id="OUTtv.de">
        <display-name lang="de">OUTtv</display-name>
    </channel>
    <programme start="20241028000500 +0200" stop="20241028001200 +0200" channel="OUTtv.de">
        <title lang="de">Mixed Messages</title>
        <sub-title lang="de">[Queer, Comedy]  (S1E5) [Keine Altersangabe]  [GB]</sub-title>
        <desc lang="de">Als Ren mit einer Freundin ausgeht, trifft er zufällig auf eine alte bekannte Person Kanchi Wichmann.</desc>
        <title lang="en">Mixed Messages</title>
        <sub-title lang="en">[Queer, Comedy] (S1E5) [No age specified] [GB]</sub-title>
        <desc lang="en">When Ren goes out with a friend, he accidentally meets an old acquaintance, Kanchi Wichmann.</desc>
    </programme>
    <programme start="20241028001200 +0200" stop="20241028002200 +0200" channel="OUTtv.de">
        <title lang="de">Mixed Messages</title>
        <sub-title lang="de">[Queer, Comedy]  (S1E6) [Keine Altersangabe]  [GB]</sub-title>
        <desc lang="de">Einer von Rens Kurskameraden sendet widersprüchliche Signale, die Ren verwirren Kanchi Wichmann.</desc>
        <title lang="en">Mixed Messages</title>
        <sub-title lang="en">[Queer, Comedy] (S1E6) [No age specified] [GB]</sub-title>
        <desc lang="en">One of Ren's classmates sends conflicting signals that confuse Ren Kanchi Wichmann.</desc>
    </programme>
    <programme start="20241028002200 +0200" stop="20241028002900 +0200" channel="OUTtv.de">
        <title lang="de">Mixed Messages</title>
        <sub-title lang="de">[Queer, Comedy]  (S1E7) [Keine Altersangabe]  [GB]</sub-title>
        <desc lang="de">Ren wird dieses Mal von ihrem neuen Date zu einem besonderen Bondage-Workshop mitgenommen Kanchi Wichmann.</desc>
        <title lang="en">Mixed Messages</title>
        <sub-title lang="en">[Queer, Comedy] (S1E7) [No age specified] [GB]</sub-title>
        <desc lang="en">This time Ren is taken to a special bondage workshop by her new date Kanchi Wichmann.</desc>
    </programme>
    <programme start="20241028002900 +0200" stop="20241028004000 +0200" channel="OUTtv.de">
        <title lang="de">Mixed Messages</title>
        <sub-title lang="de">[Queer, Comedy]  (S1E8) [Keine Altersangabe]  [GB]</sub-title>
        <desc lang="de">Ren sucht nach einer Romanze, aber in einer Welt von Tinder war Liebe noch nie so schwierig Kanchi Wichmann.</desc>
        <title lang="en">Mixed Messages</title>
        <sub-title lang="en">[Queer, Comedy] (S1E8) [No age specified] [GB]</sub-title>
        <desc lang="en">Ren is looking for romance, but in a world of Tinder, love has never been so difficult Kanchi Wichmann.</desc>
    </programme>
    <programme start="20241028004000 +0200" stop="20241028005500 +0200" channel="OUTtv.de">
        <title lang="de">Dropping the Soap</title>
        <sub-title lang="de">[Comedy] Man Musk (S1E3) (2017) [Keine Altersangabe]  [US]</sub-title>
        <desc lang="de">Julian will mit dem Dreh eines Werbespots beginnen, als Kit behauptet, sie soll auch im Spot sein Paul Witten Kate Mines Suzanne Friedline Michael McKiddy.</desc>
        <title lang="en">Dropping the Soap</title>
        <sub-title lang="en">[Comedy] Man Musk (S1E3) (2017) [No age specified] [US]</sub-title>
        <desc lang="en">Julian is about to start filming a commercial when Kit claims she should be in the commercial too Paul Witten Kate Mines Suzanne Friedline Michael McKiddy.</desc>
    </programme>
    <programme start="20241028005500 +0100" stop="20241028012400 +0100" channel="OUTtv.de">
        <title lang="de">Boystown</title>
        <sub-title lang="de">[Queer, Drama]  (S1E6) (2013) [Keine Altersangabe]  [US]</sub-title>
        <desc lang="de">Jake bekommt Angst, wenn Patrick unerwartet Ryan besucht. Patrick will Aufnahmezeiten mit Ryan besprechen. Zudem erreichen Ricks Fantasien mit seinem Klienten eine neue Dimension. Und Michael versucht, T.J. zu überzeugen, dass er Rick verlassen muss J. Hunter Ackerman Albertossy Espinoza Jim Patneaude Ricky Reidling Jesse Seann Atkinson.</desc>
        <title lang="en">Boystown</title>
        <sub-title lang="en">[Queer, Drama] (S1E6) (2013) [No age specified] [US]</sub-title>
        <desc lang="en">Jake gets scared when Patrick unexpectedly visits Ryan. Patrick wants to discuss recording times with Ryan. Rick's fantasies with his client reach a new dimension. And Michael tries to convince T.J. that he has to leave Rick J. Hunter Ackerman Albertossy Espinoza Jim Patneaude Ricky Reidling Jesse Seann Atkinson.</desc>
    </programme>
    <programme start="20241028012400 +0100" stop="20241028015200 +0100" channel="OUTtv.de">
        <title lang="de">Boystown</title>
        <sub-title lang="de">[Queer, Drama]  (S1E7) (2013) [Keine Altersangabe]  [US]</sub-title>
        <desc lang="de">Bain soll Chris helfen, die Originalkopie seiner Audition zu stehlen. Rick versucht, T.J. zu überzeugen, dass er Therapie braucht. Und dass T.J. Antidepressiva nimmt. Schließlich geraten Rick und T.J. wieder aneinander J. Hunter Ackerman Albertossy Espinoza Jim Patneaude Ricky Reidling Jesse Seann Atkinson.</desc>
        <title lang="en">Boystown</title>
        <sub-title lang="en">[Queer, Drama] (S1E7) (2013) [No age specified] [US]</sub-title>
        <desc lang="en">Bain is supposed to help Chris steal the original copy of his audition. Rick tries to convince T.J. that he needs therapy. And that T.J. is taking antidepressants. Eventually Rick and T.J. clash again J. Hunter Ackerman Albertossy Espinoza Jim Patneaude Ricky Reidling Jesse Seann Atkinson.</desc>
    </programme>
    <programme start="20241028015200 +0100" stop="20241028020800 +0100" channel="OUTtv.de">
        <title lang="de">Dudes</title>
        <sub-title lang="de">[Comedy, Queer]  (S1E4) (2014) [Keine Altersangabe]  [US]</sub-title>
        <desc lang="de">David verliebt sich in Lowell, einen Yogalehrer. Tyler findet eine Lösung für seine Frustration.</desc>
        <title lang="en">dudes</title>
        <sub-title lang="en">[Comedy, Queer] (S1E4) (2014) [Not age-appropriate] [US]</sub-title>
        <desc lang="en">David falls in love with Lowell, a yoga teacher. Tyler finds a solution to his frustration.</desc>
    </programme>
    <programme start="20241028020800 +0100" stop="20241028031600 +0100" channel="OUTtv.de">
        <title lang="de">Toy Boy</title>
        <sub-title lang="de">[Thriller, Drama] Im Kreis irren (S1E4) (2019) [Keine Altersangabe]  [ES]</sub-title>
        <desc lang="de">Die Mutter eines alten Freundes wendet sich verzweifelt mit einer dringenden Bitte an Andrea. Währenddessen hat Hugo alle Hände voll damit, zu verbergen, dass er das Inferno ausgerechnet zum Zeitpunkt des erneuten Mordes verlassen hat Jesús Mosquera Cristina Castaño María Pedraza Álex Gadea Javier Mora Jose Manuel Seda.</desc>
        <title lang="en">Toy Boy</title>
        <sub-title lang="en">[Thriller, Drama] The Circle (S1E4) (2019) [No age specified] [ES]</sub-title>
        <desc lang="en">The mother of an old friend turns to Andrea in desperation with an urgent request. Meanwhile, Hugo has his hands full trying to hide the fact that he left the inferno just at the time of the new murder. Jesús Mosquera Cristina Castaño María Pedraza Álex Gadea Javier Mora Jose Manuel Seda.</desc>
    </programme>
</tv>


You could use string.capwords() or similar to fix capitalisation, if that bothers you.
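
A small sketch of the string.capwords() suggestion follows. One caveat: capwords() lowercases everything after the first character of each word, so applied blindly it would turn a sub-title token like `(S1E4)` into `(s1e4)`. The helper below, `restore_title_case`, is a hypothetical name and only touches translations that came back entirely lowercase when the original was not.

```python
import string

def restore_title_case(original, translated):
    """Re-capitalise a translation only if it lost all of its capitals."""
    if translated.islower() and not original.islower():
        return string.capwords(translated)
    return translated

# The lowercase title from the sample output is repaired
print(restore_title_case("Dudes", "dudes"))
# Mixed-case strings such as sub-titles pass through untouched
print(restore_title_case("[Comedy] (S1E4) [US]", "[Comedy] (S1E4) [US]"))
# What an unconditional capwords() call would do to such a sub-title
print(string.capwords("[Comedy] (S1E4) [US]"))
```

Note that capwords() also collapses runs of whitespace into single spaces, which is harmless for titles but worth knowing.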

**KiddaC** · Oct 29th 2024

Here's my 2 pence worth, via ChatGPT. Some ideas for you.

This uses deep_translator
Concurrent futures
Asyncio
Autodetect language
~~Only translates today. Skips others~~ Enter the number of days you want. Skip others.
Outputs to console every 10 seconds to show how many it has translated.

Don't know how long it takes. It's been running now for 20 mins and has translated 5500 entries.
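
One possible reason for that rate: in the script in this post, `translate_texts` is handed to the executor as a single task, so the whole list is translated one string after another in one worker thread. A minimal sketch of fanning out one task per string instead is shown here; `translate_concurrently` and `fake_translate` are hypothetical names, and `fake_translate` stands in for a real call such as `GoogleTranslator(...).translate`. Whether a remote service tolerates that many parallel requests is a separate question.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def translate_concurrently(texts, translate_one, max_workers=8):
    """Run translate_one on every text in a thread pool; results keep input order."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [loop.run_in_executor(pool, translate_one, text) for text in texts]
        # return_exceptions=True keeps one failed string from aborting the batch
        results = await asyncio.gather(*tasks, return_exceptions=True)
    # Failed translations become None, mirroring the original script's convention
    return [None if isinstance(result, Exception) else result for result in results]

def fake_translate(text):
    # Stand-in for a real translator call so the sketch runs offline.
    return text.upper()

print(asyncio.run(translate_concurrently(["hund", "katze"], fake_translate)))
```

The returned list lines up with the input list, so it can be zipped with the programmes in the same way the original script does.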

Note: like Jensen's, this is Windows Python 3 code, not enigma2 Python code.

Python

# Install required packages
# pip install aiohttp
# pip install deep-translator
# pip install beautifulsoup4
# pip install lxml  # parser needed by BeautifulSoup(content, 'xml')
# lzma needs no install: it is part of the standard library since Python 3.3

import os
import aiohttp
import asyncio
import lzma
from deep_translator import GoogleTranslator
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from concurrent.futures import ThreadPoolExecutor

# List of URLs to download
urls = [
    "http://epgspot.com/rytec_epg/rytecDE_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecCH_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecDE_Common.xz",
    "http://epgspot.com/rytec_epg/rytecDE_SportMovies.xz"
]

successful_translations = 0
failed_translations = 0

# File paths
base_folder = "C:\\translator"
temp_folder = os.path.join(base_folder, "temp")
output_folder = os.path.join(base_folder, "output")

# Ensure base, temp, and output folders exist
os.makedirs(temp_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

# Set the number of days (including today)
number_of_days = 3  # Change this value as needed

async def download_file(session, url, save_path):
    print(f"Downloading {url} to {save_path}...")
    try:
        async with session.get(url, timeout=10) as response:
            response.raise_for_status()
            with open(save_path, 'wb') as file:
                while True:
                    chunk = await response.content.read(8192)
                    if not chunk:
                        break
                    file.write(chunk)
        print(f"Downloaded {url} successfully.")
    except Exception as e:
        print(f"Error downloading {url}: {e}")

def extract_xz(file_path, output_path):
    print(f"Extracting {file_path} to {output_path}...")
    try:
        with lzma.open(file_path) as xz_file:
            with open(output_path, 'wb') as extracted_file:
                extracted_file.write(xz_file.read())
        print(f"Extracted {file_path} successfully.")
    except (lzma.LZMAError, IOError) as e:
        print(f"Error extracting {file_path}: {e}")

def translate_texts(texts):
    """Translate a list of texts."""
    global successful_translations, failed_translations
    results = []
    for text in texts:
        try:
            translated_text = GoogleTranslator(source='auto', target='en').translate(text)
            results.append(translated_text)
            successful_translations += 1
        except Exception as e:
            print(f"Translation error for text '{text}': {e}")
            results.append(None)
            failed_translations += 1
    return results

async def translate_xml_descs(file_path):
    """Parse XML, detect and translate <desc> tags in place, and save translated XML."""
    global successful_translations, failed_translations
    print(f"Translating descriptions in {file_path}...")
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
            soup = BeautifulSoup(content, 'xml')

        today_date = datetime.now().date()
        print("Today's date:", today_date)

        descriptions_to_translate = []
        programmes = []

        for programme in soup.find_all('programme'):
            start_str = programme.get('start', '')
            try:
                start_date = datetime.strptime(start_str[:8], '%Y%m%d').date()
                if start_date > today_date + timedelta(days=number_of_days - 1):
                    print(f"Skipping programme starting on {start_date}, outside range.")
                    continue  # Skip if outside the range

                desc = programme.find('desc')  # No language filter
                if desc:
                    first_line = desc.text.split('\n')[0].strip()
                    if first_line:  # Check if the line is not empty
                        descriptions_to_translate.append(first_line)
                        programmes.append(programme)
                else:
                    # Use the start attribute here: programme.title can be None, which would raise
                    print(f"No <desc> tag found for programme starting {start_str}")

            except ValueError:
                print(f"Error parsing start date: {start_str}")
                continue

        # Use ThreadPoolExecutor to translate descriptions
        with ThreadPoolExecutor() as executor:
            translated_descriptions = await asyncio.get_running_loop().run_in_executor(executor, translate_texts, descriptions_to_translate)

        # Update the soup with translated descriptions
        for translated_text, programme in zip(translated_descriptions, programmes):
            if translated_text:
                desc = programme.find('desc')  # Updated line to find <desc> tag
                desc.string = translated_text
                desc['lang'] = 'en'

        # Save the updated XML file
        translated_file_path = os.path.join(temp_folder, "translated_" + os.path.basename(file_path))
        with open(translated_file_path, 'wb') as f:
            f.write(soup.prettify(encoding='utf-8'))
        print(f"Saved translated XML file: {translated_file_path}")

        return translated_file_path
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None

def compress_to_xz(file_path, output_path):
    """Compress a file back to .xz format."""
    print(f"Compressing {file_path} to {output_path}...")
    try:
        with open(file_path, 'rb') as input_file:
            with lzma.open(output_path, 'wb') as xz_file:
                xz_file.write(input_file.read())
        print(f"Compressed {file_path} successfully.")
    except (lzma.LZMAError, IOError) as e:
        print(f"Error compressing {file_path}: {e}")

async def process_file(session, url):
    """Process each file: download, extract, translate, and compress."""
    file_name = os.path.basename(url)
    download_path = os.path.join(temp_folder, file_name)
    extracted_path = download_path.replace('.xz', '.xml')
    output_xz_path = os.path.join(output_folder, file_name)

    # Step 1: Download the file
    await download_file(session, url, download_path)

    # Step 2: Extract if compressed (XZ format)
    if download_path.endswith('.xz'):
        extract_xz(download_path, extracted_path)

    # Step 3: Translate XML descriptions
    if os.path.exists(extracted_path):
        translated_xml_path = await translate_xml_descs(extracted_path)

        # Step 4: Compress back to .xz format
        if translated_xml_path:
            compress_to_xz(translated_xml_path, output_xz_path)

    # Cleanup
    for path in [download_path, extracted_path]:
        if path and os.path.exists(path):
            os.remove(path)

async def translation_counter():
    """Print the number of successful translations every 10 seconds."""
    while True:
        await asyncio.sleep(10)
        print(f"Translations completed: {successful_translations}, Failed: {failed_translations}")

async def main():
    # Start the translation counter task
    counter_task = asyncio.create_task(translation_counter())

    async with aiohttp.ClientSession() as session:
        tasks = [process_file(session, url) for url in urls]
        await asyncio.gather(*tasks)

    # Cancel the counter task after processing is complete
    counter_task.cancel()
    try:
        await counter_task
    except asyncio.CancelledError:
        pass

if __name__ == "__main__":
    asyncio.run(main())


**KiddaC** · Oct 29th 2024

and as I typed above. It gave me an update

**KiddaC** · Oct 29th 2024

OK, on my main computer rig, which is quite high end:
13th Gen Intel(R) Core(TM) i5-13400F, 2500 MHz, 10 Core(s), 16 Logical Processor(s)
and 32 GB of RAM.

Obviously I can't compare that with Jensen's laptop...

It took 36 mins to translate those 4 files, for just today's EPG entries.

**Chris230291** · Oct 29th 2024

Code

translated_text = GoogleTranslator(source='de', target='en').translate(text)

That's hard-coded. Use "auto" instead of "de".

Also, the translator fails silently, so are you sure it really did translate successfully?

Seems like ChatGPT really overcomplicated it… or maybe I'm just dumb.
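
One rough way to check for silent failures is to count the `lang` values on the `<desc>` elements of the output file, since the script above only sets `lang="en"` when a translation came back. This is a minimal sketch using the standard library; `count_desc_langs` and the sample XML are made up for illustration, and a real check would read the translated file instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_desc_langs(xml_text):
    """Return a Counter of lang attribute values across all <desc> elements."""
    root = ET.fromstring(xml_text)
    return Counter(desc.get("lang", "(none)") for desc in root.iter("desc"))


# Hypothetical sample: one description translated, one left in German.
sample = """<tv>
  <programme start="20241029180000 +0000" stop="20241030000000 +0000" channel="a">
    <title lang="de">Total Rap</title>
    <desc lang="en">Non-stop rap.</desc>
  </programme>
  <programme start="20241029200000 +0000" stop="20241029210000 +0000" channel="b">
    <title lang="de">Nachrichten</title>
    <desc lang="de">Die Nachrichten des Tages.</desc>
  </programme>
</tv>"""

print(count_desc_langs(sample))
```

A large count of `de` among programmes inside the chosen date range would suggest that some requests failed or were rate limited.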

**KiddaC** · Oct 29th 2024

It was auto... ChatGPT obviously changed it again in later amendments. It tries to be clever, even though you keep telling it to use autodetect. :)

And did it translate? Yep. I have 4 output files

4 new compressed files

And an example XML for today. It even changed the lang="en" in the output. Cool.

**Lululla** · Oct 29th 2024

I asked ChatGPT if it could speed up the downloads, and it replied:

Code

async def main():
    counter_task = asyncio.create_task(translation_counter())
    
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=10)) as session:
        tasks = [process_file(session, url) for url in urls]
        await asyncio.gather(*tasks)

    counter_task.cancel()
    try:
        await counter_task
    except asyncio.CancelledError:
        pass


and..

Code

with ThreadPoolExecutor(max_workers=10) as executor:
    translated_descriptions = await asyncio.get_running_loop().run_in_executor(executor, translate_texts, descriptions_to_translate)
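
One thing worth checking with this pattern: `run_in_executor(executor, translate_texts, descriptions_to_translate)` hands the pool a single call, so the whole list is translated one item after another on one worker thread, whatever `max_workers` is set to. Below is a rough sketch of giving each text its own task instead. `translate_one` is a hypothetical stand-in so the example runs offline; in the real script it would wrap the `GoogleTranslator(...).translate(text)` call.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_one(text):
    """Hypothetical stand-in for the real translator call (assumption)."""
    return text.upper()


def translate_texts_parallel(texts, translate=translate_one, max_workers=10):
    """Translate each text as its own pool task; a failed text becomes None."""

    def safe(text):
        try:
            return translate(text)
        except Exception as e:
            print(f"Translation error for text '{text}': {e}")
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(safe, texts))


print(translate_texts_parallel(["hallo", "welt"]))
```

From the async code this could be called as `await asyncio.get_running_loop().run_in_executor(None, translate_texts_parallel, descriptions_to_translate)`. More requests in flight may also mean hitting rate limits sooner, so the worker count is something to experiment with.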

**KiddaC** · Oct 29th 2024

On fast internet, downloading the XZ files is reasonably quick. The translations are the slow bit.

**Lululla** · Oct 29th 2024

Cache?

Code

translation_cache = {}

def translate_texts(texts):
    results = []
    for text in texts:
        if text in translation_cache:
            results.append(translation_cache[text])
        else:
            try:
                translated_text = GoogleTranslator(source='de', target='en').translate(text)
                translation_cache[text] = translated_text
                results.append(translated_text)
            except Exception as e:
                results.append(None)
    return results
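
A cache like this only pays off when the same string turns up more than once (repeats of the same programme, for example). A variation on the idea, as a sketch only: translate each distinct string once, then map the results back onto the full list. `translate_unique` and `fake_translate` are hypothetical names, and the fake translator just reverses the string so the example runs without network access.

```python
def translate_unique(texts, translate):
    """Translate each distinct string once and reuse the result for repeats."""
    cache = {}
    for text in dict.fromkeys(texts):  # distinct texts, in first-seen order
        try:
            cache[text] = translate(text)
        except Exception:
            cache[text] = None  # keep going; the caller can spot failures as None
    return [cache[text] for text in texts]


calls = []


def fake_translate(text):
    """Stand-in translator (assumption): records the call and reverses the text."""
    calls.append(text)
    return text[::-1]


result = translate_unique(["abc", "xyz", "abc"], fake_translate)
print(result, len(calls))
```

Counting the calls against the length of the input list also gives a quick idea of how much a cache could save on a given EPG file.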


**KiddaC** · Oct 29th 2024

I was just faffing. It's up to you guys to perfect. I don't use international EPGs.

**Chris230291** · Oct 29th 2024

Quote from KiddaC

It was auto... ChatGPT obviously changed it again in later amendments. It tries to be clever, even though you keep telling it to use autodetect. :)

Did every single programme that was meant to be translated actually get translated?

Scroll to the bottom and double check.

The reason I mention this is that I hit rate limits very quickly when I wasn't going via the Tor network.

I added to my example:

- Caching (not convinced it made a difference)

- Days to translate

- Workers variable

- Completion time

- Simplified it further.

I realise it doesn't handle the downloading, extracting, etc, but that's easily added.

The pre-downloaded basic Sky DE EPG took "0h 4m 24s" to translate all fields for 1 day with 60 workers (I have a feeling this is a little high).

Python

import xml.etree.ElementTree as ET
import concurrent.futures
from deep_translator import GoogleTranslator
from threading import Lock
from datetime import datetime

#####################################
# CONFIG##############################
desired_language = "en"
translate_titles = True
translate_sub_titles = True
translate_descriptions = True
proxies = {"http": "192.168.1.200:8118", "https": "192.168.1.200:8118"}
workers = 60
days_to_translate = 1
#####################################


def translate_programe(programme):
    def translate_attribute(attribute):
        element = programme.find(attribute)
        if element is not None:
            text = element.text
            lang = element.get("lang", "auto")
            # Reuse an earlier translation of the same string if there is one.
            translation = translation_cache.get(text)
            if translation is None and text is not None and lang != desired_language:
                translation = GoogleTranslator(
                    source=lang, target=desired_language, proxies=proxies
                ).translate(text)
            if translation:
                # Append a new element in the target language; the original is kept.
                ET.SubElement(programme, attribute, lang=desired_language).text = str(
                    translation
                )
                with lock:
                    translation_cache[text] = translation

    programme_stop = datetime.strptime(programme.get("stop"), "%Y%m%d%H%M%S %z")
    if (
        days_to_translate == 0
        or programme_stop.date() == today
        or (programme_stop.date() - today).days >= 0
        and (programme_stop.date() - today).days <= days_to_translate
    ):

        if translate_titles:
            translate_attribute("title")

        if translate_sub_titles:
            translate_attribute("sub-title")

        if translate_descriptions:
            translate_attribute("desc")


start_time = datetime.now()
today = start_time.date()
translation_cache = {}
lock = Lock()

tree = ET.parse("tv.xmltv")
root = tree.getroot()
programmes = root.findall("programme")

with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
    executor.map(translate_programe, programmes)

ET.indent(tree, space="\t", level=0)
tree.write("translated_tv.xmltv", encoding="utf-8", xml_declaration=True)

elapsed_time = datetime.now() - start_time
hours, remainder = elapsed_time.seconds // 3600, elapsed_time.seconds % 3600
minutes, seconds = remainder // 60, remainder % 60
print(f"Completed in: {hours}h {minutes}m {seconds}s")
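
One caveat with `executor.map(...)` when its results are never read: an exception raised inside a worker is only re-raised when that result is retrieved, so a failed translation can go unnoticed. The sketch below shows one possible way to collect failures with `submit` and `as_completed`. `run_all` and `flaky` are hypothetical names used only for the demonstration.

```python
import concurrent.futures


def run_all(func, items, max_workers=8):
    """Run func over items in a thread pool; return (item, exception) pairs for failures."""
    failures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, item): item for item in items}
        for future in concurrent.futures.as_completed(futures):
            exc = future.exception()
            if exc is not None:
                failures.append((futures[future], exc))
    return failures


def flaky(n):
    """Stand-in for translate_programe (assumption): fails on odd input."""
    if n % 2:
        raise ValueError(f"odd input {n}")


failures = run_all(flaky, [1, 2, 3, 4])
print(sorted(item for item, _ in failures))
```

In the script above, `run_all(translate_programe, programmes, max_workers=workers)` would be the equivalent call, and printing the length of the returned list at the end would show how many programmes were skipped because of errors.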


**KiddaC** · Oct 29th 2024

OK, so let's have a look.

_common was the last to finish.

Searching for today's date from the end of the file gives the last entry for today that needed translating:

Code

 <programme channel="DeluxeRap.de" start="20241029180000 +0000" stop="20241030000000 +0000">
  <title lang="de">
   Total Rap
  </title>
  <desc lang="en">
   Non-stop rap. Your after-work beats.
  </desc>
 </programme>

Before translation:

Code

  <programme start="20241029180000 +0000" stop="20241030000000 +0000" channel="DeluxeRap.de">
    <title lang="de">Total Rap</title>
    <desc lang="de">Non-Stop Rap. Deine Feierabend-Beats.

Rap nonstop, den ganzen Abend lang. Hier haben deine Lieblingsstars ein Zuhause, hier lebt der HipHop. Old School trifft New School, von Brooklyn bis Berlin, von Snoop Dogg bis RIN und über Nicki Minaj und Future wieder zurück.</desc>
  </programme>

As shown in the code above, I was also restricting translations to the text before the first line break, and only translating the programme description, not titles.
I was just doing a proof of concept.

**Chris230291** · Oct 29th 2024

Ah OK.

Perhaps it's not the number of requests, but the number of words in those requests.

I just ran mine set to 0 days (all programmes in file) and it took 12 mins to translate everything.

I don't know if the cache is doing work, or if the if statement to calculate whether to translate the programme is expensive.
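
On the date check: it is only a timestamp parse and a subtraction per programme, so it is probably cheap next to a network request. It can also be written a bit more simply. The sketch below should behave the same as the condition in the script, assuming `days_to_translate` is never negative; `in_window` is a hypothetical helper name.

```python
from datetime import date, datetime


def in_window(stop_str, today, days_to_translate):
    """True if the programme's stop date is 0..days_to_translate days after today.

    days_to_translate == 0 means no limit, matching the script's convention.
    """
    if days_to_translate == 0:
        return True
    stop_date = datetime.strptime(stop_str, "%Y%m%d%H%M%S %z").date()
    return 0 <= (stop_date - today).days <= days_to_translate


today = date(2024, 10, 29)
print(in_window("20241030000000 +0000", today, 1))  # stop is tomorrow
print(in_window("20241101000000 +0000", today, 1))  # stop is three days away
```

Timing a run with the cache disabled, and another with the date check replaced by `True`, would be one way to see which of the two actually matters.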

**KiddaC** · Oct 29th 2024

I am going to try my code with 60 workers.

**KiddaC** · Oct 29th 2024

And notes to Jensen:
You were originally using argos-translate. That is an offline translator, so the power of your device matters.
Deep Translator, which I am using, makes API calls, so it depends on a good internet connection. It also gives better translations.
The only concern, as Chris mentions, is getting blocked or hitting limits (I don't know what the limits are). So it's a case of finding the happy medium.

Slightly adjusting my original code, this now finishes in about 21 mins for 4 URLs, 1 day of EPG, 60 workers.
So we have given you a few things to experiment with.
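
If limits do turn out to be a problem, one common approach is to retry a failed request after a pause that doubles each time. This is a sketch only: `translate_with_retry`, the retry count and the delays are assumptions, not known limits of any service, and `flaky_translate` stands in for the real translator so the example runs offline.

```python
import time


def translate_with_retry(text, translate, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call translate(text), backing off exponentially between attempts.

    Returns None if every attempt fails.
    """
    for attempt in range(retries):
        try:
            return translate(text)
        except Exception as e:
            if attempt == retries - 1:
                print(f"Giving up on '{text}': {e}")
                return None
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default


attempts = []


def flaky_translate(text):
    """Stand-in translator (assumption): fails twice, then succeeds."""
    attempts.append(text)
    if len(attempts) < 3:
        raise RuntimeError("rate limited")
    return "ok"


# The sleep is replaced with a no-op here so the demo finishes instantly.
print(translate_with_retry("hallo", flaky_translate, sleep=lambda s: None))
```

Combined with a lower worker count, this trades some speed for fewer failed translations.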

Python

# Install required packages
# pip install aiohttp
# pip install deep-translator
# pip install beautifulsoup4
# pip install lxml  # needed for BeautifulSoup's 'xml' parser
# lzma is part of the Python 3 standard library, so there is nothing to install for it

import os
import aiohttp
import asyncio
import lzma
from deep_translator import GoogleTranslator
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from concurrent.futures import ThreadPoolExecutor

# List of URLs to download
urls = [
    "http://epgspot.com/rytec_epg/rytecDE_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecCH_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecDE_Common.xz",
    "http://epgspot.com/rytec_epg/rytecDE_SportMovies.xz"
]

successful_translations = 0
failed_translations = 0

# File paths
base_folder = "C:\\translator"
temp_folder = os.path.join(base_folder, "temp")
output_folder = os.path.join(base_folder, "output")

# Ensure base, temp, and output folders exist
os.makedirs(temp_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

# Set the number of days (including today)
number_of_days = 1  # Change this value as needed
max_workers = 60  # Adjust the number of workers as needed


async def download_file(session, url, save_path):
    print(f"Downloading {url} to {save_path}...")
    try:
        async with session.get(url, timeout=10) as response:
            response.raise_for_status()
            with open(save_path, 'wb') as file:
                while True:
                    chunk = await response.content.read(8192)
                    if not chunk:
                        break
                    file.write(chunk)
        print(f"Downloaded {url} successfully.")
    except Exception as e:
        print(f"Error downloading {url}: {e}")


def extract_xz(file_path, output_path):
    print(f"Extracting {file_path} to {output_path}...")
    try:
        with lzma.open(file_path) as xz_file:
            with open(output_path, 'wb') as extracted_file:
                extracted_file.write(xz_file.read())
        print(f"Extracted {file_path} successfully.")
    except (lzma.LZMAError, IOError) as e:
        print(f"Error extracting {file_path}: {e}")


def translate_texts(texts):
    """Translate a list of texts."""
    global successful_translations, failed_translations
    results = []
    for text in texts:
        try:
            translated_text = GoogleTranslator(source='auto', target='en').translate(text)
            results.append(translated_text)
            successful_translations += 1
        except Exception as e:
            print(f"Translation error for text '{text}': {e}")
            results.append(None)
            failed_translations += 1
    return results


async def translate_xml_descs(file_path):
    """Parse XML, detect and translate <desc> tags in place, and save translated XML."""
    global successful_translations, failed_translations
    print(f"Translating descriptions in {file_path}...")
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
            soup = BeautifulSoup(content, 'xml')

        today_date = datetime.now().date()
        print("Today's date:", today_date)

        descriptions_to_translate = []
        programmes = []

        for programme in soup.find_all('programme'):
            start_str = programme.get('start', '')
            try:
                start_date = datetime.strptime(start_str[:8], '%Y%m%d').date()
                if start_date > today_date + timedelta(days=number_of_days - 1):
                    print(f"Skipping programme starting on {start_date}, outside range.")
                    continue  # Skip if outside the range

                desc = programme.find('desc')  # No language filter
                if desc:
                    first_line = desc.text.split('\n')[0].strip()
                    if first_line:  # Check if the line is not empty
                        descriptions_to_translate.append(first_line)
                        programmes.append(programme)
                else:
                    print(f"No <desc> tag found for programme: {programme.title.string}")

            except ValueError:
                print(f"Error parsing start date: {start_str}")
                continue

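        # Run translate_texts in a worker thread so the event loop is not blocked.
        # Note: this submits one job for the whole list, so the texts are still
        # translated one after another, whatever max_workers is set to.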
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            translated_descriptions = await asyncio.get_running_loop().run_in_executor(executor, translate_texts, descriptions_to_translate)

        # Update the soup with translated descriptions
        for translated_text, programme in zip(translated_descriptions, programmes):
            if translated_text:
                desc = programme.find('desc')  # Updated line to find <desc> tag
                desc.string = translated_text
                desc['lang'] = 'en'

        # Save the updated XML file
        translated_file_path = os.path.join(temp_folder, "translated_" + os.path.basename(file_path))
        with open(translated_file_path, 'wb') as f:
            f.write(soup.prettify(encoding='utf-8'))
        print(f"Saved translated XML file: {translated_file_path}")

        return translated_file_path
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None


def compress_to_xz(file_path, output_path):
    """Compress a file back to .xz format."""
    print(f"Compressing {file_path} to {output_path}...")
    try:
        with open(file_path, 'rb') as input_file:
            with lzma.open(output_path, 'wb') as xz_file:
                xz_file.write(input_file.read())
        print(f"Compressed {file_path} successfully.")
    except (lzma.LZMAError, IOError) as e:
        print(f"Error compressing {file_path}: {e}")


async def process_file(session, url):
    """Process each file: download, extract, translate, and compress."""
    file_name = os.path.basename(url)
    download_path = os.path.join(temp_folder, file_name)
    extracted_path = download_path.replace('.xz', '.xml')
    output_xz_path = os.path.join(output_folder, file_name)

    # Step 1: Download the file
    await download_file(session, url, download_path)

    # Step 2: Extract if compressed (XZ format)
    if download_path.endswith('.xz'):
        extract_xz(download_path, extracted_path)

    # Step 3: Translate XML descriptions
    if os.path.exists(extracted_path):
        translated_xml_path = await translate_xml_descs(extracted_path)

        # Step 4: Compress back to .xz format
        if translated_xml_path:
            compress_to_xz(translated_xml_path, output_xz_path)

    # Cleanup
    for path in [download_path, extracted_path]:
        if path and os.path.exists(path):
            os.remove(path)


async def translation_counter():
    """Print the number of successful translations every 10 seconds."""
    while True:
        await asyncio.sleep(10)
        print(f"Translations completed: {successful_translations}, Failed: {failed_translations}")


async def main():
    # Start the translation counter task
    counter_task = asyncio.create_task(translation_counter())

    async with aiohttp.ClientSession() as session:
        # Create tasks for processing files
        tasks = [process_file(session, url) for url in urls]
        await asyncio.gather(*tasks)

    # Cancel the counter task after processing is complete
    counter_task.cancel()
    try:
        await counter_task
    except asyncio.CancelledError:
        pass

if __name__ == "__main__":
    asyncio.run(main())
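
A small side note on the extract and compress steps: both read the whole file into memory with `.read()`. For larger EPG files, a chunked copy with `shutil.copyfileobj` keeps memory use flat. A minimal sketch, with hypothetical `_streaming` function names and a tiny round trip through a temporary folder as a demonstration:

```python
import lzma
import os
import shutil
import tempfile


def extract_xz_streaming(file_path, output_path):
    """Decompress an .xz file to output_path in chunks."""
    with lzma.open(file_path, "rb") as xz_file, open(output_path, "wb") as out:
        shutil.copyfileobj(xz_file, out)


def compress_to_xz_streaming(file_path, output_path):
    """Compress a file to .xz in chunks."""
    with open(file_path, "rb") as src, lzma.open(output_path, "wb") as xz_file:
        shutil.copyfileobj(src, xz_file)


# Round trip on a tiny stand-in file to show the two functions agree.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "epg.xml")
    xz_path = os.path.join(tmp, "epg.xz")
    back_path = os.path.join(tmp, "epg_back.xml")
    with open(xml_path, "wb") as f:
        f.write(b"<tv></tv>")
    compress_to_xz_streaming(xml_path, xz_path)
    extract_xz_streaming(xz_path, back_path)
    with open(back_path, "rb") as f:
        round_trip = f.read()

print(round_trip)
```

This will not change the translation time, which is where nearly all of the run goes, but it avoids holding the uncompressed XML in memory twice.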


**Chris230291** · Oct 29th 2024

I just ran mine without using a proxy, every day and every attribute.

It completes in 0h 6m 58s and everything seems to be translated.

To be clear, my code is based on an old project of mine from years ago.

Back then, a proxy was 100% needed.

Seems like it's not any more?

**KiddaC** · Oct 29th 2024

So we both seem to be around the 5 min mark per URL.
I assume your message above was for just 1 URL.
I am just trying mine now, including titles and not breaking on line breaks.

**Edit:** titles and full text are taking a long time. 18 mins in and no output yet.

When jenseneverest returns to this thread he will be like ....
