Posted 2023-11-27Updated 2023-11-274 minutes read (About 664 words)

使用 Intel Control-flow Enforcement Technology (CET) 加强 C 程序的安全性

简介

Intel 的 Control-flow Enforcement Technology (CET) 是一种旨在提高软件安全性的硬件级特性，用于防止控制流劫持攻击，如返回导向编程（ROP）和跳转导向编程（JOP）。CET 主要通过两种机制实现：间接分支跟踪（IBT）和阴影栈（Shadow Stack）。

CET 实现原理

**间接分支跟踪 (IBT)**：要求所有间接跳转（如函数指针调用）必须跳转到用 ENDBRANCH 指令标记的目标。这阻止了通过篡改内存来创建恶意跳转的尝试。
**阴影栈 (Shadow Stack)**：为每个线程独立存储返回地址的副本。当函数返回时，处理器会比较当前的返回地址和阴影栈中的地址。如果它们不匹配，处理器会阻止返回操作，防止ROP攻击。

示例代码

下面的示例展示了一个简单的C程序，其中包含函数指针的使用和尝试非法访问的场景。

#include <stdio.h>
#include <stdlib.h>

void safe_function() {
    printf("Safe function called.\n");
}

void unsafe_function() {
    printf("Unsafe function called.\n");
}

int main() {
    // 正常的函数指针使用
    void (*func_ptr)() = safe_function;
    func_ptr();

    // 尝试非法修改函数指针
    // 在一个CET支持的环境中，这种修改会被检测并阻止
    func_ptr = (void (*)())((char*)safe_function + 2);
    func_ptr(); // 这里的调用在CET环境下应该会失败

    return 0;
}

在这个例子中，func_ptr 最初指向 safe_function。然后，代码尝试将 func_ptr 修改为 safe_function 地址的偏移，模拟一个非法的控制流修改尝试。在启用了 CET 的环境中，这种尝试应该会导致程序崩溃或异常终止。

编译和运行

使用支持 CET 的 GCC 编译器，编译上述代码：

1	gcc -fcf-protection=full -o cet_demo cet_demo.c

然后运行编译出的程序：

1	./cet_demo

测试 CET

在支持 CET 的环境中，程序中的第二次 func_ptr 调用（非法调用）将会失败，因为它尝试执行一个没有用 ENDBRANCH 指令标记的地址。这正是 CET 防止控制流劫持攻击的方式。

结论

CET 是在深度防御策略中的一个重要层面，但它并不是万能的。良好的编程实践和其他安全措施仍然是必要的。CET 提供了一个硬件级别的保护层，使得控制流劫持攻击更加困难，从而提高了软件的整体安全性。

Posted 2023-10-27Updated 2023-11-2727 minutes read (About 4064 words)

The Secure Path Forward for eBPF runtime: Challenges and Innovations

Yusheng Zheng

Extended Berkeley Packet Filter (eBPF) represents a significant evolution in the way we interact with and extend the capabilities of modern operating systems. As a powerful technology that enables the Linux kernel to run sandboxed programs in response to events, eBPF has become a cornerstone for system observability, networking, and security features.

However, as with any system that interfaces closely with the kernel, the security of eBPF itself is paramount. In this blog, we delve into the often-overlooked aspect of eBPF security, exploring how the mechanisms intended to safeguard eBPF can themselves be fortified. We’ll dissect the role of the eBPF verifier, scrutinize the current access control model, and investigate potential improvements from ongoing research. Moreover, we’ll navigate through the complexities of securing eBPF, addressing open questions and the challenges they pose to system architects and developers alike.

The Secure Path Forward for eBPF runtime: Challenges and Innovations

How eBPF Ensures Security with Verifier

The security framework of eBPF is largely predicated on the robustness of its verifier. This component acts as the gatekeeper, ensuring that only safe and compliant programs are allowed to run within the kernel space.

What the eBPF Verifier Is and What It Does

At its core, the eBPF verifier is a static code analyzer. Its primary function is to vet the BPF program instructions before they are executed. It scrutinizes a copy of the program within the kernel, operating with the following objectives:

Ensuring Program Termination

The verifier uses depth-first search (DFS) algorithms to traverse the program’s control flow graph, which it ensures is a Directed Acyclic Graph (DAG). This is crucial for guaranteeing that the program cannot enter into an infinite loop, thereby ensuring its termination. It meticulously checks for any unbounded loops and malformed or out-of-bounds jumps that could disrupt the normal operation of the kernel or lead to a system hang.
Ensuring Memory Safety

Memory safety is paramount in kernel operations. The verifier checks for potential out-of-bounds memory accesses that could lead to data corruption or security breaches. It also safeguards against use-after-free bugs and object leaks, which are common vulnerabilities that can be exploited. In addition to these, it takes into account hardware vulnerabilities like Spectre, enforcing mitigations to prevent such side-channel attacks.
Ensuring Type Safety

Type safety is another critical aspect that the verifier ensures. By preventing type confusion bugs, it helps maintain the integrity of data within the kernel. The eBPF verifier utilizes BPF Type Format (BTF), which allows it to accurately understand and check the kernel’s complex data structures, ensuring that the program’s operations on these structures are valid and safe.
Preventing Hardware Exceptions

Hardware exceptions, such as division by zero, can cause abrupt program terminations and kernel panics. To prevent this, the verifier includes checks for divisions by unknown scalars, ensuring that instructions are rewritten or handled in a manner consistent with aarch64 specifications, which dictate safe handling of such exceptions.

Through these mechanisms, the eBPF verifier plays a critical role in maintaining the security and stability of the kernel, making it an indispensable component of the eBPF infrastructure. It not only reinforces the system’s defenses but also upholds the integrity of operations that eBPF programs intend to perform, making it a quintessential part of the eBPF ecosystem.

How the eBPF Verifier Works

The eBPF verifier is essentially a sophisticated simulation engine that exhaustively tests every possible execution path of a given eBPF program. This simulation is not a mere theoretical exercise but a stringent enforcement of security and safety policies in kernel operations.

Follows control flow graph
The verifier begins its analysis by constructing and following the control flow graph (CFG) of the eBPF program. It carefully computes the set of possible states for each instruction, considering the BPF register set and stack. Safety checks are then performed depending on the current instruction context.

One of the critical aspects of this process is register spill/fill tracking for the program’s private BPF stack. This ensures that operations involving the stack do not lead to overflows or underflows, which could corrupt data or provide an attack vector.
Back-edges in control flow graph
To effectively manage loops within the eBPF program, the verifier identifies back-edges in the CFG. Bounded loops are handled by simulating all iterations up to a predefined limit, thus guaranteeing that loops will not lead to indefinite execution.
Dealing with potentially large number of states
The verifier must manage the complexity that comes with the large number of potential states in a program’s execution paths. It employs path pruning logic to compare the current state with prior states, assessing whether the current path is “equivalent” to prior paths and has a safe exit. This reduces the overall number of states that need to be considered.
Function-by-function verification for state reduction
To streamline the verification process, the verifier conducts a function-by-function analysis. This modular approach allows for a reduction in the number of states that need to be analyzed at any given time, thereby improving the efficiency of the verification.
On-demand scalar precision (back-)tracking for state reduction
The verifier uses on-demand scalar precision tracking to reduce the state space further. By back-tracking scalar values when necessary, the verifier can more accurately predict the program’s behavior, optimizing its analysis process.
Terminates with rejection upon surpassing “complexity” threshold
To maintain practical performance, the verifier has a “complexity” threshold. If a program’s analysis surpasses this threshold, the verifier will terminate the process and reject the program. This ensures that only programs that are within the manageable complexity are allowed to execute, balancing security with system performance.

Challenges

Despite its thoroughness, the eBPF verifier faces significant challenges:

Attractive target for exploitation when exposed to non-root users
As the verifier becomes more complex, it becomes an increasingly attractive target for exploitation. The programmability of eBPF, while powerful, also means that if an attacker were to bypass the verifier and gain execution within the OS kernel, the consequences could be severe.
Reasoning about verifier correctness is non-trivial
Ensuring the verifier’s correctness, especially concerning Spectre mitigations, is not a straightforward task. While there is some formal verification in place, it is only partial. Areas such as the Just-In-Time (JIT) compilers and abstract interpretation models are particularly challenging.
Occasions where valid programs get rejected
There is sometimes a disconnect between the optimizations performed by LLVM (the compiler infrastructure used to prepare eBPF programs) and the verifier’s ability to understand these optimizations, leading to valid programs being erroneously rejected.
“Stable ABI” for BPF program types
A “stable ABI” is vital so that BPF programs running in production do not break upon an OS kernel upgrade. However, maintaining this stability while also evolving the verifier and the BPF ecosystem presents its own set of challenges.
Performance vs. security considerations
Finally, the eternal trade-off between performance and security is pronounced in the verification of complex eBPF programs. While the verifier must be efficient to be practical, it also must not compromise on security, as the performance of the programs it is verifying is crucial for modern computing systems.

The eBPF verifier stands as a testament to the ingenuity in modern computing security, navigating the treacherous waters between maximum programmability and maintaining a fortress-like defense at the kernel level.

Other works to improve verifier

Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel: https://www.usenix.org/conference/osdi20/presentation/nelson
“Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers”, Vishwanathan et al. https://arxiv.org/abs/2105.05398
“Eliminating bugs in BPF JITs using automated formal verification”, Nelson et al. https://arxiv.org/abs/2105.05398
“A proof-carrying approach to building correct and flexible BPF verifiers”, Nelson et al. https://linuxplumbersconf.org/event/7/contributions/685/
“Automatically optimizing BPF programs using program synthesis”, Xu et al. https://linuxplumbersconf.org/event/11/contributions/944/
“Simple and Precise Static Analysis of Untrusted Linux Kernel Extensions”, Gershuni et al. https://linuxplumbersconf.org/event/11/contributions/951/
“An Analysis of Speculative Type Confusion Vulnerabilities in the Wild”, Kirzner et al. https://www.usenix.org/conference/usenixsecurity21/presentation/kirzner

Together, these works signify a robust and multi-faceted research initiative aimed at bolstering the foundations of eBPF verification, ensuring that it remains a secure and performant tool for extending the capabilities of the Linux kernel.

Other reference for you to learn more about eBPF verifier:

BPF and Spectre: Mitigating transient execution attacks: https://popl22.sigplan.org/details/prisc-2022-papers/11/BPF-and-Spectre-Mitigating-transient-execution-attacks

Limitations in eBPF Access Control

After leading Linux distributions, such as Ubuntu and SUSE, have disallowed unprivileged usage of eBPF Socket Filter and CGroup programs, the current eBPF access control model only supports a single permission level. This level necessitates the CAP_SYS_ADMIN capability for all features. However, CAP_SYS_ADMIN carries inherent risks, particularly to containers, due to its extensive privileges.

Addressing this, Linux 5.6 introduces a more granular permission system by breaking down eBPF capabilities. Instead of relying solely on CAP_SYS_ADMIN, a new capability, CAP_BPF, is introduced for invoking the bpf syscall. Additionally, installing specific types of eBPF programs demands further capabilities, such as CAP_PERFMON for performance monitoring or CAP_NET_ADMIN for network administration tasks. This structure aims to mitigate certain types of attacks—like altering process memory or eBPF maps—that still require CAP_SYS_ADMIN.

Nevertheless, these segregated capabilities are not bulletproof against all eBPF-based attacks, such as Denial of Service (DoS) and information theft. Attackers may exploit these to craft eBPF-based malware specifically targeting containers. The emergence of eBPF in cloud-native applications exacerbates this threat, as users could inadvertently deploy containers that contain untrusted eBPF programs.

Compounding the issue, the risks associated with eBPF in containerized environments are not entirely understood. Some container services might unintentionally grant eBPF permissions, for reasons such as enabling filesystem mounting functionality. The existing permission model is inadequate in preventing misuse of these potentially harmful eBPF features within containers.

CAP_BPF

Traditionally, almost all BPF actions required CAP_SYS_ADMIN privileges, which also grant broad system access. Over time, there has been a push to separate BPF permissions from these root privileges. As a result, capabilities like CAP_PERFMON and CAP_BPF were introduced to allow more granular control over BPF operations, such as reading kernel memory and loading tracing or networking programs, without needing full system admin rights.

However, CAP_BPF’s scope is also ambiguous, leading to a perception problem. Unlike CAP_SYS_MODULE, which is well-defined and used for loading kernel modules, CAP_BPF lacks namespace constraints, meaning it can access all kernel memory rather than being container-specific. This broad access is problematic because verifier bugs in BPF programs can crash the kernel, considered a security vulnerability, leading to an excessive number of CVEs (Common Vulnerabilities and Exposures) being filed, even for bugs that are already fixed. This response to verifier bugs creates undue alarm and urgency to patch older kernel versions that may not have been updated.

Additionally, some security startups have been criticized for exploiting the fears around BPF’s capabilities to market their products, paradoxically using BPF itself to safeguard against the issues they highlight. This has led to a contradictory narrative where BPF is both demonized and promoted as a solution.

bpf namespace

The current security model requires the CAP_SYS_ADMIN capability for iterating BPF object IDs and converting these IDs to file descriptors (FDs). This is to prevent non-privileged users from accessing BPF programs owned by others, but it also restricts them from inspecting their own BPF objects, posing a challenge in container environments.

Users can run BPF programs with CAP_BPF and other specific capabilities, yet they lack a generic method to inspect these programs, as tools like bpftool need CAP_SYS_ADMIN. The existing workaround without CAP_SYS_ADMIN is deemed inconvenient, involving SCM_RIGHTS and Unix domain sockets for sharing BPF object FDs between processes.

To address these limitations, Yafang Shao proposes introducing a BPF namespace. This would allow users to create BPF maps, programs, and links within a specific namespace, isolating these objects from users in different namespaces. However, objects within a BPF namespace would still be visible to the parent namespace, enabling system administrators to maintain oversight.

The BPF namespace is conceptually similar to the PID namespace and is intended to be intuitive. The initial implementation focuses on BPF maps, programs, and links, with plans to extend this to other BPF objects like BTF and bpffs in the future. This could potentially enable container users to trace only the processes within their container without accessing data from other containers, enhancing security and usability in containerized environments.

reference:

BPF and security: https://lwn.net/Articles/946389/
Cross Container Attacks: The Bewildered eBPF on Clouds https://www.usenix.org/system/files/usenixsecurity23-he.pdf
bpf: Introduce BPF namespace: https://lwn.net/Articles/927354/
ebpf-running-in-linux-namespaces: https://stackoverflow.com/questions/48815633/ebpf-running-in-linux-namespaces

Unprivileged eBPF

The concept of unprivileged eBPF refers to the ability for non-root users to load eBPF programs into the kernel. This feature is controversial due to security implications and, as such, is currently turned off by default across all major Linux distributions. The concern stems from hardware vulnerabilities like Spectre to kernel bugs and exploits, which can be exploited by malicious eBPF programs to leak sensitive data or attack the system.

To combat this, mitigations have been put in place for various versions of these vulnerabilities, like v1, v2, and v4. However, these mitigations come at a cost, often significantly reducing the flexibility and performance of eBPF programs. This trade-off makes the feature unattractive and impractical for many users and use cases.

Trusted Unprivileged BPF

In light of these challenges, a middle ground known as “trusted unprivileged BPF” is being explored. This approach would involve an allowlist system, where specific eBPF programs that have been thoroughly vetted and deemed trustworthy could be loaded by unprivileged users. This vetting process would ensure that only secure, production-ready programs bypass the privilege requirement, maintaining a balance between security and functionality. It’s a step toward enabling more widespread use of eBPF without compromising the system’s integrity.

Permissive LSM hooks: Rejected upstream given LSMs enforce further restrictions

New Linux Security Module (LSM) hooks specifically for the BPF subsystem, with the intent of offering more granular control over BPF maps and BTF data objects. These are fundamental to the operation of modern BPF applications.

The primary addition includes two LSM hooks: bpf_map_create_security and bpf_btf_load_security, which provide the ability to override the default permission checks that rely on capabilities like CAP_BPF and CAP_NET_ADMIN. This new mechanism allows for finer control, enabling policies to enforce restrictions or bypass checks for trusted applications, shifting the decision-making to custom LSM policy implementations.

This approach allows for a safer default by not requiring applications to have BPF-related capabilities, which are typically required to interact with the kernel’s BPF subsystem. Instead, applications can run without such privileges, with only vetted and trusted cases being granted permission to operate as if they had elevated capabilities.
BPF token concept to delegate subset of BPF via token fd from trusted privileged daemon

the BPF token, a new mechanism allowing privileged daemons to delegate a subset of BPF functionality to trusted unprivileged applications. This concept enables containerized BPF applications to operate safely within user namespaces—a feature previously unattainable due to security restrictions with CAP_BPF capabilities. The BPF token is created and managed via kernel APIs, and it can be pinned within the BPF filesystem for controlled access. The latest version of the patch ensures that a BPF token is confined to its creation instance in the BPF filesystem to prevent misuse. This addition to the BPF subsystem facilitates more secure and flexible unprivileged BPF operations.
BPF signing as gatekeeper: application vs BPF program (no one-size-fits-all)

Song Liu has proposed a patch for unprivileged access to BPF functionality through a new device, /dev/bpf. This device controls access via two new ioctl commands that allow users with write permissions to the device to invoke sys_bpf(). These commands toggle the ability of the current task to call sys_bpf(), with the permission state being stored in the task_struct. This permission is also inheritable by new threads created by the task. A new helper function, bpf_capable(), is introduced to check if a task has obtained permission through /dev/bpf. The patch includes updates to documentation and header files.
RPC to privileged BPF daemon: Limitations depending on use cases/environment

The RPC approach (eg. bpfd) is similar to the BPF token concept, but it uses a privileged daemon to manage the BPF programs. This daemon is responsible for loading and unloading BPF programs, as well as managing the BPF maps. The daemon is also responsible for verifying the BPF programs before loading them. This approach is more flexible than the BPF token concept, as it allows for more fine-grained control over the BPF programs. However, it is also more complex, bring more maintenance challenges and possibilities for single points of failure.

reference

Permissive LSM hooks: https://lore.kernel.org/bpf/[email protected]/
BPF token concept: https://lore.kernel.org/bpf/[email protected]/
BPF signing using fsverity and LSM gatekeeper: https://www.youtube.com/watch?v=9p4qviq60z8
Sign the BPF bytecode: https://lpc.events/event/16/contributions/1357/attachments/1045/1999/BPF%20Signatures.pdf
bpfd: https://bpfd.dev/

Conclusion

As we have traversed the multifaceted domain of eBPF security, it’s clear that while eBPF’s verifier provides a robust first line of defense, there are inherent limitations within the current access control model that require attention. We have considered potential solutions from the realms of virtualization, software fault isolation, and formal methods to WebAssembly or userspace eBPF runtime, each offering unique approaches to fortify eBPF against vulnerabilities.

However, as with any complex system, new questions and challenges continue to surface. The gaps identified between the theoretical security models and their practical implementation invite continued research and experimentation. The future of eBPF security is not only promising but also demands a collective effort to ensure the technology can be adopted with confidence in its capacity to safeguard systems.

We are github.com/eunomia-bpf, build open source projects to make eBPF easier to use, and exploring new technologies, toolchains and runtimes related to eBPF.
For those interested in eBPF technology, check out our tutorial code repository at https://github.com/eunomia-bpf/bpf-developer-tutorial and our tutorials at https://eunomia.dev/tutorials/ for practical understanding and practice.

Posted 2023-09-27Updated 2023-11-2720 minutes read (About 3045 words)

The Evolution and Impact of eBPF: A list of Key Research Papers from Recent Years

This is a list of eBPF related papers I read in recent years, might be helpful for people who are interested in eBPF related research.

eBPF (extended Berkeley Packet Filter) is an emerging technology that allows safe execution of user-provided programs in the Linux kernel. It has gained widespread adoption in recent years for accelerating network processing, enhancing observability, and enabling programmable packet processing.

This document list some key research papers on eBPF over the past few years. The papers cover several aspects of eBPF, including accelerating distributed systems, storage, and networking, formally verifying the eBPF JIT compiler and verifier, applying eBPF for intrusion detection, and automatically generating hardware designs from eBPF programs.

Some key highlights:

eBPF enables executing custom functions in the kernel to accelerate distributed protocols, storage engines, and networking applications with improved throughput and lower latency compared to traditional userspace implementations.
Formal verification of eBPF components like JIT and verifier ensures correctness and reveals bugs in real-world implementations.
eBPF’s programmability and efficiency make it suitable for building intrusion detection and network monitoring applications entirely in the kernel.
Automated synthesis of hardware designs from eBPF programs allows software developers to quickly generate optimized packet processing pipelines in network cards.

The papers demonstrate eBPF’s versatility in accelerating systems, enhancing security, and simplifying network programming. As eBPF adoption grows, it is an important area of systems research with many open problems related to performance, safety, hardware integration, and ease of use.

If you have any suggestions or adding papers, please feel free to open an issue or PR. The list was created in 2023.10, New papers will be added in the future.

Check out our open-source projects at eunomia-bpf and eBPF tutorials at bpf-developer-tutorial. I’m also looking for a PhD position in the area of systems and networking in 2024/2025. My Github and email.

XRP: In-Kernel Storage Functions with eBPF

With the emergence of microsecond-scale NVMe storage devices, the Linux kernel storage stack overhead has become significant, almost doubling access times. We present XRP, a framework that allows applications to execute user-defined storage functions, such as index lookups or aggregations, from an eBPF hook in the NVMe driver, safely bypassing most of the kernel’s storage stack. To preserve file system semantics, XRP propagates a small amount of kernel state to its NVMe driver hook where the user-registered eBPF functions are called. We show how two key-value stores, BPF-KV, a simple B+-tree key-value store, and WiredTiger, a popular log-structured merge tree storage engine, can leverage XRP to significantly improve throughput and latency.

OSDI ‘22 Best Paper: https://www.usenix.org/conference/osdi22/presentation/zhong

Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel

This paper describes our experience applying formal methods to a critical component in the Linux kernel, the just-in-time compilers (“JITs”) for the Berkeley Packet Filter (BPF) virtual machine. We verify these JITs using Jitterbug, the first framework to provide a precise specification of JIT correctness that is capable of ruling out real-world bugs, and an automated proof strategy that scales to practical implementations. Using Jitterbug, we have designed, implemented, and verified a new BPF JIT for 32-bit RISC-V, found and fixed 16 previously unknown bugs in five other deployed JITs, and developed new JIT optimizations; all of these changes have been upstreamed to the Linux kernel. The results show that it is possible to build a verified component within a large, unverified system with careful design of specification and proof strategy.

OSDI 20: https://www.usenix.org/conference/osdi20/presentation/nelson

λ-IO: A Unified IO Stack for Computational Storage

The emerging computational storage device offers an opportunity for in-storage computing. It alleviates the overhead of data movement between the host and the device, and thus accelerates data-intensive applications. In this paper, we present λ-IO, a unified IO stack managing both computation and storage resources across the host and the device. We propose a set of designs – interface, runtime, and scheduling – to tackle three critical issues. We implement λ-IO in full-stack software and hardware environment, and evaluate it with synthetic and real applications against Linux IO, showing up to 5.12× performance improvement.

FAST23: https://www.usenix.org/conference/fast23/presentation/yang-zhe

Extension Framework for File Systems in User space

User file systems offer numerous advantages over their in-kernel implementations, such as ease of development and better system reliability. However, they incur heavy performance penalty. We observe that existing user file system frameworks are highly general; they consist of a minimal interposition layer in the kernel that simply forwards all low-level requests to user space. While this design offers flexibility, it also severely degrades performance due to frequent kernel-user context switching.

This work introduces ExtFUSE, a framework for developing extensible user file systems that also allows applications to register “thin” specialized request handlers in the kernel to meet their specific operative needs, while retaining the complex functionality in user space. Our evaluation with two FUSE file systems shows that ExtFUSE can improve the performance of user file systems with less than a few hundred lines on average. ExtFUSE is available on GitHub.

ATC 19: https://www.usenix.org/conference/atc19/presentation/bijlani

Electrode: Accelerating Distributed Protocols with eBPF

Implementing distributed protocols under a standard Linux kernel networking stack enjoys the benefits of load-aware CPU scaling, high compatibility, and robust security and isolation. However, it suffers from low performance because of excessive user-kernel crossings and kernel networking stack traversing. We present Electrode with a set of eBPF-based performance optimizations designed for distributed protocols. These optimizations get executed in the kernel before the networking stack but achieve similar functionalities as were implemented in user space (e.g., message broadcasting, collecting quorum of acknowledgments), thus avoiding the overheads incurred by user-kernel crossings and kernel networking stack traversing. We show that when applied to a classic Multi-Paxos state machine replication protocol, Electrode improves its throughput by up to 128.4% and latency by up to 41.7%.

NSDI 23: https://www.usenix.org/conference/nsdi23/presentation/zhou

BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing

In-memory key-value stores are critical components that help scale large internet services by providing low-latency access to popular data. Memcached, one of the most popular key-value stores, suffers from performance limitations inherent to the Linux networking stack and fails to achieve high performance when using high-speed network interfaces. While the Linux network stack can be bypassed using DPDK based solutions, such approaches require a complete redesign of the software stack and induce high CPU utilization even when client load is low.

To overcome these limitations, we present BMC, an in-kernel cache for Memcached that serves requests before the execution of the standard network stack. Requests to the BMC cache are treated as part of the NIC interrupts, which allows performance to scale with the number of cores serving the NIC queues. To ensure safety, BMC is implemented using eBPF. Despite the safety constraints of eBPF, we show that it is possible to implement a complex cache service. Because BMC runs on commodity hardware and requires modification of neither the Linux kernel nor the Memcached application, it can be widely deployed on existing systems. BMC optimizes the processing time of Facebook-like small-size requests. On this target workload, our evaluations show that BMC improves throughput by up to 18x compared to the vanilla Memcached application and up to 6x compared to an optimized version of Memcached that uses the SO_REUSEPORT socket flag. In addition, our results also show that BMC has negligible overhead and does not deteriorate throughput when treating non-target workloads.

NSDI 21: https://www.usenix.org/conference/nsdi21/presentation/ghigoff

hXDP: Efficient Software Packet Processing on FPGA NICs

FPGA accelerators on the NIC enable the offloading of expensive packet processing tasks from the CPU. However, FPGAs have limited resources that may need to be shared among diverse applications, and programming them is difficult.

We present a solution to run Linux’s eXpress Data Path programs written in eBPF on FPGAs, using only a fraction of the available hardware resources while matching the performance of high-end CPUs. The iterative execution model of eBPF is not a good fit for FPGA accelerators. Nonetheless, we show that many of the instructions of an eBPF program can be compressed, parallelized or completely removed, when targeting a purpose-built FPGA executor, thereby significantly improving performance. We leverage that to design hXDP, which includes (i) an optimizing-compiler that parallelizes and translates eBPF bytecode to an extended eBPF Instruction-set Architecture defined by us; a (ii) soft-processor to execute such instructions on FPGA; and (iii) an FPGA-based infrastructure to provide XDP’s maps and helper functions as defined within the Linux kernel.

We implement hXDP on an FPGA NIC and evaluate it running real-world unmodified eBPF programs. Our implementation is clocked at 156.25MHz, uses about 15% of the FPGA resources, and can run dynamically loaded programs. Despite these modest requirements, it achieves the packet processing throughput of a high-end CPU core and provides a 10x lower packet forwarding latency.

OSDI 20: https://www.usenix.org/conference/osdi20/presentation/brunella

Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code

Microservices are becoming more complicated, posing new challenges for traditional performance monitoring solutions. On the one hand, the rapid evolution of microservices places a significant burden on the utilization and maintenance of existing distributed tracing frameworks. On the other hand, complex infrastructure increases the probability of network performance problems and creates more blind spots on the network side. In this paper, we present DeepFlow, a network-centric distributed tracing framework for troubleshooting microservices. DeepFlow provides out-of-the-box tracing via a network-centric tracing plane and implicit context propagation. In addition, it eliminates blind spots in network infrastructure, captures network metrics in a low-cost way, and enhances correlation between different components and layers. We demonstrate analytically and empirically that DeepFlow is capable of locating microservice performance anomalies with negligible overhead. DeepFlow has already identified over 71 critical performance anomalies for more than 26 companies and has been utilized by hundreds of individual developers. Our production evaluations demonstrate that DeepFlow is able to save users hours of instrumentation efforts and reduce troubleshooting time from several hours to just a few minutes.

SIGCOMM 23: https://dl.acm.org/doi/10.1145/3603269.3604823

Fast In-kernel Traffic Sketching in eBPF

The extended Berkeley Packet Filter (eBPF) is an infrastructure that allows to dynamically load and run micro-programs directly in the Linux kernel without recompiling it.

In this work, we study how to develop high-performance network measurements in eBPF. We take sketches as case-study, given their ability to support a wide-range of tasks while providing low-memory footprint and accuracy guarantees. We implemented NitroSketch, the state-of-the-art sketch for user-space networking and show that best practices in user-space networking cannot be directly applied to eBPF, because of its different performance characteristics. By applying our lesson learned we improve its performance by 40% compared to a naive implementation.

SIGCOMM 23: https://dl.acm.org/doi/abs/10.1145/3594255.3594256

SPRIGHT: extracting the server from serverless computing! high-performance eBPF-based event-driven, shared-memory processing

Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. ‘Cold-start’ latency is another deterrent.

We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF’s socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions ‘warm’ with negligible penalty.

Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for ‘cold-start’.

https://dl.acm.org/doi/10.1145/3544216.3544259

Programmable System Call Security with eBPF

System call filtering is a widely used security mechanism for protecting a shared OS kernel against untrusted user applications. However, existing system call filtering techniques either are too expensive due to the context switch overhead imposed by userspace agents, or lack sufficient programmability to express advanced policies. Seccomp, Linux’s system call filtering module, is widely used by modern container technologies, mobile apps, and system management services. Despite the adoption of the classic BPF language (cBPF), security policies in Seccomp are mostly limited to static allow lists, primarily because cBPF does not support stateful policies. Consequently, many essential security features cannot be expressed precisely and/or require kernel modifications.
In this paper, we present a programmable system call filtering mechanism, which enables more advanced security policies to be expressed by leveraging the extended BPF language (eBPF). More specifically, we create a new Seccomp eBPF program type, exposing, modifying or creating new eBPF helper functions to safely manage filter state, access kernel and user state, and utilize synchronization primitives. Importantly, our system integrates with existing kernel privilege and capability mechanisms, enabling unprivileged users to install advanced filters safely. Our evaluation shows that our eBPF-based filtering can enhance existing policies (e.g., reducing the attack surface of early execution phase by up to 55.4% for temporal specialization), mitigate real-world vulnerabilities, and accelerate filters.

https://arxiv.org/abs/2302.10366

Cross Container Attacks: The Bewildered eBPF on Clouds

The extended Berkeley Packet Filter (eBPF) provides powerful and flexible kernel interfaces to extend the kernel functions for user space programs via running bytecode directly in the kernel space. It has been widely used by cloud services to enhance container security, network management, and system observability. However, we discover that the offensive eBPF that have been extensively discussed in Linux hosts can bring new attack surfaces to containers. With eBPF tracing features, attackers can break the container’s isolation and attack the host, e.g., steal sensitive data, DoS, and even escape the container. In this paper, we study the eBPF-based cross container attacks and reveal their security impacts in real world services. With eBPF attacks, we successfully compromise five online Jupyter/Interactive Shell services and the Cloud Shell of Google Cloud Platform. Furthermore, we find that the Kubernetes services offered by three leading cloud vendors can be exploited to launch cross-node attacks after the attackers escape the container via eBPF. Specifically, in Alibaba’s Kubernetes services, attackers can compromise the whole cluster by abusing their over-privileged cloud metrics or management Pods. Unfortunately, the eBPF attacks on containers are seldom known and can hardly be discovered by existing intrusion detection systems. Also, the existing eBPF permission model cannot confine the eBPF and ensure secure usage in shared-kernel container environments. To this end, we propose a new eBPF permission model to counter the eBPF attacks in containers.

https://www.usenix.org/conference/usenixsecurity23/presentation/he

Comparing Security in eBPF and WebAssembly

This paper examines the security of eBPF and WebAssembly (Wasm), two technologies that have gained widespread adoption in recent years, despite being designed for very different use cases and environments. While eBPF is a technology primarily used within operating system kernels such as Linux, Wasm is a binary instruction format designed for a stack-based virtual machine with use cases extending beyond the web. Recognizing the growth and expanding ambitions of eBPF, Wasm may provide instructive insights, given its design around securely executing arbitrary untrusted programs in complex and hostile environments such as web browsers and clouds. We analyze the security goals, community evolution, memory models, and execution models of both technologies, and conduct a comparative security assessment, exploring memory safety, control flow integrity, API access, and side-channels. Our results show that eBPF has a history of focusing on performance first and security second, while Wasm puts more emphasis on security at the cost of some runtime overheads. Considering language-based restrictions for eBPF and a security model for API access are fruitful directions for future work.

https://dl.acm.org/doi/abs/10.1145/3609021.3609306

More about can be found in the first workshop: https://conferences.sigcomm.org/sigcomm/2023/workshop-ebpf.html

A flow-based IDS using Machine Learning in eBPF

eBPF is a new technology which allows dynamically loading pieces of code into the Linux kernel. It can greatly speed up networking since it enables the kernel to process certain packets without the involvement of a userspace program. So far eBPF has been used for simple packet filtering applications such as firewalls or Denial of Service protection. We show that it is possible to develop a flow based network intrusion detection system based on machine learning entirely in eBPF. Our solution uses a decision tree and decides for each packet whether it is malicious or not, considering the entire previous context of the network flow. We achieve a performance increase of over 20% compared to the same solution implemented as a userspace program.

https://arxiv.org/abs/2102.09980

Femto-containers: lightweight virtualization and fault isolation for small software functions on low-power IoT microcontrollers

Low-power operating system runtimes used on IoT microcontrollers typically provide rudimentary APIs, basic connectivity and, sometimes, a (secure) firmware update mechanism. In contrast, on less constrained hardware, networked software has entered the age of serverless, microservices and agility. With a view to bridge this gap, in the paper we design Femto-Containers, a new middleware runtime which can be embedded on heterogeneous low-power IoT devices. Femto-Containers enable the secure deployment, execution and isolation of small virtual software functions on low-power IoT devices, over the network. We implement Femto-Containers, and provide integration in RIOT, a popular open source IoT operating system. We then evaluate the performance of our implementation, which was formally verified for fault-isolation, guaranteeing that RIOT is shielded from logic loaded and executed in a Femto-Container. Our experiments on various popular micro-controller architectures (Arm Cortex-M, ESP32 and RISC-V) show that Femto-Containers offer an attractive trade-off in terms of memory footprint overhead, energy consumption, and security.

https://dl.acm.org/doi/abs/10.1145/3528535.3565242

Posted 2023-07-27Updated 2023-11-27an hour read (About 6952 words)

eBPF 软件应用市场设计方案

背景

eBPF（扩展伯克利数据包过滤器）是 Linux 内核的一项技术，它允许在内核空间运行一些预定义的、有限的程序，不需要修改内核代码或加载任何内核模块。由于其高效和灵活的特性，eBPF 被广泛应用于网络流量过滤、性能监控、安全和其他领域。然而，目前社区中的 eBPF 程序分发不够统一和规范，不同的组件和工具集都有自己的管理和打包方式，例如 cilium、bcc 和 openEuler 内核的 eBPF 插件。eBPF 程序也可能使用多种用户态语言开发（如 Go，Rust，C/C++，Python 脚本等），具有各种不同的接口，甚至并没有预先编译好的二进制程序，用户必须自行配置环境和编译才能使用。

这种分散和缺乏标准化的情况带来了一些问题：首先，eBPF 程序的升级和功能添加通常依赖于整体软件的发布，这可能导致升级周期过长，单个 eBPF 组件的发布需要等待整体软件的发布周期；其次，开发 eBPF 程序需要对内核 eBPF 程序框架有深入的理解，这增加了开发难度。因此，这个项目的目标是希望能借鉴 docker hub 的管理模式，提供一种统一的 eBPF 程序管理方式、openEuler 内核 eBPF 开发模板，以及一个编译和分发工具，以解决上述问题。

综上，整个 eBPF 生态缺少对新手友好的开发方案，和一个类似于 Github 或 Docker hub 的通用分发和托管平台。

项目产出要求

项目产出要求主要分为两个部分：

构建 openEuler 应用市场的基础设施，提供类似于 docker hub 的 eBPF 程序管理模式：这意味着我们需要创建一个可以公开存储、管理和分发 eBPF 程序的平台，就像 docker hub 对 docker 镜像所做的那样。这个平台应该允许开发者上传他们的 eBPF 程序，用户可以下载、安装和升级这些程序。此外，这个平台应该支持版本管理，以便用户可以选择安装特定版本的 eBPF 程序。
提供 openEuler eBPF 软件编写模板，简化编译和打包及分发流程：这意味着我们需要创建一个模板，来帮助开发者更容易地编写、编译、打包和分发他们的 eBPF 程序。这个模板应该包含基本的代码结构、编译和打包脚本，以及使用说明。这样，开发者只需要按照模板来编写他们的代码，然后使用脚本来编译、打包和分发他们的程序。对于初学者而言，提供一个模板也可以帮助更快速的上手进行开发工作。

需求分析

理解用户需求：项目的主要用户为两类：开发者和用户。开发者需要一个方便的平台来上传、管理和分发他们的 eBPF 程序，而用户需要一个便利的方式来搜索、下载、安装和更新 eBPF 程序。
功能需求定义：
- eBPF 程序存储和管理平台（网页前端）：平台需要支持开发者上传 eBPF 程序，并提供版本控制功能。用户应能下载、安装和更新程序。此外，平台应具备良好的用户界面和易用性，同时提供搜索功能，以帮助用户找到所需的 eBPF 程序。
- eBPF 软件编写模板：提供一个模板以帮助开发者更方便地编写、编译、打包和分发他们的 eBPF 程序。模板应包含基本的代码结构、编译和打包脚本，以及使用说明。同时，模板需要满足以下特性：可移植性、隔离性、跨语言支持和轻量级。
- 包管理器：用户可以用一行命令就能下载、启动程序，无需配置环境或重新编译，或者一行命令创建新项目、打包发布项目。管理器需要提供清晰的文档，方便用户使用。
非功能需求定义：
- 性能：平台需要能够快速处理上传和下载请求，即使在高并发请求的情况下，性能也不应下降。
- 安全性：所有上传和下载的 eBPF 程序都应保证安全性。
- 稳定性：平台需要具备高可用性，确保用户在任何时候都能访问。
- 兼容性：编写模板需要兼容多种用户态语言（如 C、Go、Rust、Java、TypeScript等），以适应不同开发者的需求。

打包发布格式与存储格式

打包发布格式与存储格式是项目的关键部分，因为它们决定了如何将 eBPF 程序打包、分发和存储。

OCI 镜像

OCI（Open Container Initiative）是一个开放的行业标准，旨在定义容器格式和运行时的规范，以确保所有容器运行时（如 Docker、containerd、CRI-O 等）之间的互操作性。OCI 规范主要包括两部分：

运行时规范（runtime-spec）：定义了容器运行时的行为，包括如何执行容器以及容器应该满足哪些条件等。
镜像规范（image-spec）：定义了容器镜像的格式，包括镜像的层次结构、配置、文件系统等。

OCI registry 则是用于存储和分发 OCI 镜像的服务。Docker Hub 和 Google Container Registry 都是 OCI registry 的例子。它们提供了一个公开的平台，用户可以在上面上传、存储和分发他们的容器镜像。

OCI 镜像格式主要由两部分组成：manifest 和 layers。Manifest 是一个 JSON 文件，描述了镜像的元数据，包括镜像的配置以及构成镜像的各个层。Layers 则是镜像的实际内容，每一层都是一个文件系统的增量变化。当运行一个 OCI 镜像时，这些层会被叠加在一起，形成一个统一的文件系统。

首先，我们来看一下 OCI 镜像的 manifest。Manifest 是一个 JSON 文件，它包含了镜像的元数据，例如镜像的配置和构成镜像的各个层。一个典型的 manifest 文件可能看起来像这样：

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 7023,
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 32654,
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0"
    },
    ...
  ]
}

在这个例子中，config 字段指向了一个包含镜像配置的 JSON 文件的摘要（digest），而 layers 字段则是一个数组，包含了构成镜像的各个层的信息。每一层都有一个媒体类型（mediaType）、大小（size）和摘要（digest）。

然后，我们来看一下 OCI 镜像的 layers。每一层都是一个文件系统的增量变化，它包含了从上一层到当前层的所有文件和目录的添加、修改和删除。当运行一个 OCI 镜像时，这些层会被叠加在一起，形成一个统一的文件系统。

例如，假设我们有一个 OCI 镜像，它有两层。第一层添加了一个文件 /etc/passwd，第二层修改了这个文件。当我们运行这个镜像时，我们会看到第二层的修改，因为它覆盖了第一层的文件。

这就是 OCI 镜像格式的基本概念。通过使用 manifest 和 layers，我们可以创建非常复杂和灵活的镜像，满足各种不同的需求。

Docker 和 OCI

Docker 镜像和 OCI（Open Container Initiative）镜像在很大程度上是相同的，因为 OCI 镜像规范实际上就是从 Docker 镜像规范中派生出来的。

Docker 是容器技术的先驱，它定义了自己的容器和镜像格式。然而，随着容器技术的发展和其他容器运行时（如 rkt、containerd 等）的出现，业界开始寻求一种标准化的容器和镜像格式，以确保不同的容器运行时之间的互操作性。这就是 OCI 的由来。

OCI 是一个开放的行业标准，它定义了容器格式和运行时的规范。OCI 镜像规范就是基于 Docker 镜像规范创建的，它保留了 Docker 镜像的主要特性，如镜像的层次结构、镜像的分发和存储等，同时也添加了一些新的特性，如更严格的规范定义、更多的安全特性等。

因此，你可以把 OCI 镜像看作是 Docker 镜像的一个超集。实际上，大多数现代的容器运行时，包括 Docker 自己，都支持 OCI 镜像格式。这意味着你可以在 Docker 中运行 OCI 镜像，也可以在其他支持 OCI 的容器运行时中运行 Docker 镜像。

标准容器的五个原则

定义了一个名为标准容器的软件交付单元。标准容器的目标是将软件组件及其所有依赖项封装在一个自描述和可移植的格式中，以便任何符合标准的运行时都可以在不需要额外依赖的情况下运行它，而不受底层机器和容器内容的影响。

标准容器的规范定义了：

配置文件格式
一组标准操作
执行环境。

这与运输行业使用的物理运输容器有很大的类比。运输容器是交付的基本单位，它们可以被举起、堆放、锁定、装载、卸载和标记。通过标准化容器本身，无论其内容如何，都可以定义一组一致、更流畅和高效的流程。对于软件，标准容器通过成为软件包的基本标准交付单元，提供了类似的功能。

1. 标准操作

标准容器定义了一组标准操作。可以使用标准容器工具创建、启动和停止它们；使用标准文件系统工具复制和快照它们；使用标准网络工具下载和上传它们。

2. 不受内容限制

标准容器是不受内容限制的:所有标准操作的效果都是相同的，无论其内容如何。无论其包含一个postgres数据库，一个带有其依赖项和应用程序服务器的php应用程序，还是Java构建工件，它们都以相同的方式启动。

3. 不受基础设施限制

标准容器是不受基础设施限制的:它们可以在任何OCI支持的基础设施中运行。例如，标准容器可以捆绑在笔记本电脑上，上传到云存储，由弗吉尼亚州的构建服务器运行和快照，上传到在自制私有云集群中的10个分段服务器，然后发送到3个公共云区域的30个生产实例。

4. 自动化设计

标准容器是为自动化设计的：因为它们提供相同的标准操作，无论内容和基础设施如何，标准容器都非常适合自动化。事实上，你可以说自动化是它们的秘密武器。

许多曾经需要耗费时间和容易出错的人力的事情现在可以编程完成。在标准容器之前，当一个软件组件在生产环境中运行时，它已经由10个不同的人在10台不同的计算机上分别构建、配置、打包、文档化、修补、供应商化、模板化、微调和仪器化。生成失败，库冲突，镜像崩溃，便笺丢失，日志错位，集群更新半破。这个过程缓慢、效率低下、花费巨大，而且完全取决于语言和基础设施提供者。

5. 工业级交付

标准容器使工业级交付成为现实。利用上述所有属性，标准容器使大型和小型企业能够简化和自动化其软件交付流程。无论是内部devOps流程还是外部基于客户的软件交付机制，标准容器正在改变社区对软件打包和交付的思考方式。.

https://github.com/opencontainers/runtime-spec/blob/main/runtime.md

eBPF OCI

我们可以为特定的 eBPF 程序定制专门的 OCI 镜像格式，并且使用标准的 OCI registry 来同时存储和分发不同的镜像，例如通常的 Docker 镜像，仅内核态的 eBPF 应用，eBPF 平台插件等。

要添加新的 eBPF OCI 类型，我们需要定义一个新的 OCI 镜像格式，这个格式应该包含运行 eBPF 程序所需的所有文件和配置。例如，我们可以定义一个包含 eBPF 程序、加载程序以及相关配置的 OCI 镜像。然后，我们可以使用标准的 OCI 工具（如 Docker 或 Buildah）来创建、管理和分发这些镜像。

以下是一些可能的存储格式：

用户态 + 内核态：这种格式的镜像包含了运行 eBPF 程序所需的所有用户态和内核态组件。这可能包括 eBPF 程序本身、加载程序到内核的工具（如 bpftool 或 libbpf）、以及任何必要的用户态库或服务。这种格式的优点是它提供了一个完整的运行环境，用户无需安装任何额外的依赖项。然而，这也可能使得镜像变得相对较大。
仅内核态：这种格式的镜像只包含 eBPF 程序本身和加载程序到内核的工具。这种格式的优点是它非常轻量，适合于资源受限的环境。然而，用户可能需要手动安装和配置任何必要的用户态组件。

打包、运行时分类：

仅内核态；
传统的 Docker 镜像；
内核态 + 一些配置文件、Shell 脚本，需要转发和解压到不同的地方；

优缺点

使用 OCI 镜像作为打包发布格式与存储格式的优点包括：

标准化：OCI 镜像格式是一个开放的标准，被广泛接受和使用。使用 OCI 镜像格式可以使得 eBPF 程序的打包、分发和部署流程与现有的容器化应用流程保持一致，降低了用户的学习成本。
易于管理：OCI 镜像可以被存储在各种容器镜像仓库中，可以利用现有的容器镜像管理工具进行管理。
易于分发：OCI 镜像可以被轻松地推送到远程的镜像仓库中，用户可以从镜像仓库中拉取镜像，进行部署。
对于内核态的应用而言，设计一种新的 OCI 镜像格式非常轻量级。
安全性：OCI 镜像格式支持数字签名和加密，可以确保镜像的完整性和安全性。
灵活性：OCI 镜像格式允许使用者根据自己的需求选择使用哪个镜像，以及如何配置镜像。由于OCI 镜像格式是开放的和标准化的，容器厂商和开发者可以基于此进行扩展和创新。

然而，使用 OCI 镜像作为打包发布格式与存储格式也可能有一些缺点：

镜像大小：OCI 镜像可能会比其他格式的二进制打包更大，这可能会增加存储和传输的成本。
兼容性问题：虽然 OCI 镜像格式是一个开放的标准，但是不同的容器运行时可能对 OCI 镜像的支持程度不同，这可能会导致一些兼容性问题。

总的来说，使用 OCI 镜像作为打包发布格式与存储格式是一个值得考虑的方案，它可以提供一种标准化、易于管理和分发的方式来处理 eBPF 程序。

案例

目前 bumblebee 项目和 eunomia-bpf 项目都使用了 OCI 镜像来进行存储，分别有 1.1k Github star 和 300+ Gitub star.

bumblebee

项目 bumblebee 是一个用于创建、管理和发布 eBPF 程序的工具，它使用 OCI 镜像作为打包发布格式与存储格式。这个项目的特点包括：

提供了一种新的方式来打包、分发和部署 eBPF 程序，使得管理 eBPF 程序变得更加方便。
使用 OCI 镜像作为打包发布格式与存储格式，这使得 eBPF 程序可以像容器镜像一样被管理和分发。
提供了一套完整的工具链，包括创建、构建、推送和运行 eBPF 程序的工具。

它的 OCI 镜像定义如下：

[
  {
    "mediaType": "application/ebpf.oci.image.config.v1+json",
    "digest": "sha256:d0a165298ae270c5644be8e9938036a3a7a5191f6be03286c40874d761c18abf",
    "size": 15,
    "annotations": {
      "org.opencontainers.image.title": "config.json"
    }
  },
  {
    "mediaType": "application/ebpf.oci.image.program.v1+binary",
    "digest": "sha256:5e82b945b59d03620fb360193753cbd08955e30a658dc51735a0fcbc2163d41c",
    "size": 1043056,
    "annotations": {
      "org.opencontainers.image.title": "program.o"
    }
  }
]

参考：https://github.com/solo-io/bumblebee

eunomia-bpf

eunomia-bpf 也提供了类似的使用方式，使用 OCI 镜像从云端运行 eBPF 程序，不过使用的是 wasm 的 OCI 镜像格式：

Untitled

用户体验设计

eBPF 软件编写模板

为了降低开发者的入门门槛并提高开发效率，我们提供了一系列的 eBPF 项目模板。这些模板基于不同的编程语言和框架，包括 C 语言和 libbpf、Go 语言和 cilium/ebpf、Rust 语言和 libbpf-rs，以及 C 语言和 eunomia-bpf。开发者可以根据自己的需求和熟悉的语言选择合适的模板进行开发。

这是目前 eunomia-bpf 社区已经提供的内容，我们准备了一系列 GitHub 模板，以便您快速启动一个全新的eBPF项目。只需在GitHub上点击 Use this template 按钮，即可开始使用。

https://github.com/eunomia-bpf/libbpf-starter-template C 语言和 libbpf 框架的eBPF 项目模板
https://github.com/eunomia-bpf/cilium-ebpf-starter-template：基于 Go 语言和cilium/框架的的 eBPF 项目模板
https://github.com/eunomia-bpf/libbpf-rs-starter-template：基于 Rust 语言和libbpf-rs 框架的 eBPF 项目模板
https://github.com/eunomia-bpf/eunomia-template：基于 C 语言和 eunomia-bpf 框架的eBPF 项目模板

以 libbpf-starter-template 为例，这是一个基于 C 语言和 libbpf 框架的 eBPF 项目模板。它提供了一套完整的项目结构，包括源代码目录、头文件目录、构建脚本等，以及一份详细的 README 文档，帮助开发者快速理解和使用模板。

此外，这个模板还内置了 Dockerfile 和 GitHub Actions，支持容器化环境的构建和自动化的构建、测试和发布流程。这意味着开发者可以专注于 eBPF 程序的开发，而无需花费大量时间在环境配置和流程管理上。

所有的模板都托管在 GitHub 上，并开放源代码，遵循 Apache-2.0 许可证。开发者可以自由使用和修改这些模板，以适应自己的项目需求。

我们计划将这些模板进一步完善，并且适配 Gitee 的基础设施，并在 openEuler 仓库中发布，以便更多的开发者可以使用。我们相信，这些模板将大大提高 eBPF 程序的开发效率，推动 eBPF 生态的发展。

包管理器

我们将使用 Docker 或 OCI（Open Container Initiative）来打包和存储 eBPF Hub 的内容。这些内容将被存储在 OCI 镜像仓库中，用户可以通过命令行工具在本地一键拉取和使用。

角色 1：普通用户/user

考虑一个开发人员的用例，他想使用 eBPF 二进制文件或程序，但不知道如何或在哪里找到它。他可以直接运行以下命令：

1	$ ecli run ghcr.io/eunomia-bpf/opensnoop:latest

这将运行一个名为 “opensnoop” 的程序。如果本地没有这个程序，命令将从网络上的相应仓库下载它。用户也可以指定版本号，使用 HTTP API，或者指定本地路径来运行程序。例如：

$ ecli install ghcr.io/eunomia-bpf/opensnoop:latest
--> cp xxx
--> mv yyy
$ ecli run ghcr.io/eunomia-bpf/opensnoop:latest
根据 config.json 去执行具体的运行时
--> docker run xxx
--> bee run xxx
--> ./run.sh xxxx
--> exporter xxxx
--> 裸机运行 ./aaa
$ ecli stop ghcr.io/eunomia-bpf/opensnoop:latest
--> exporter stop ....
$ ecli run ./opensnoop

用户还可以使用参数来运行程序，例如：

1	$ ecli run ghcr.io/eunomia-bpf/opensnoop:latest -h

“run” 命令实际上包含了 “pull” 命令。如果本地没有对应的 eBPF 文件，命令将从网络上下载它。如果本地有，命令将直接使用本地的文件。例如：

1 2	$ ecli pull ghcr.io/eunomia-bpf/sigsnoop:latest $ ecli run ghcr.io/eunomia-bpf/sigsnoop:latest

用户可以切换源，例如从 GitHub 切换到 ecli 静态服务器

角色 2：通用 ebpf 数据文件发布者/ebpf developer

我们的第二个角色是一个开发人员，他想要创建一个通用的 eBPF 工程，并在任何机器和操作系统上分发和运行它。这对于命令行工具或者可以直接在 Shell 中运行的任何东西都很有用，也可以作为大型项目的插件使用。

开发人员可以使用以下命令来生成 ebpf 数据文件：

$ ecli init opensnoop
$ cd opensnoop
$ ls
$ ecli build
$ sudo ./ecli run opensnoop -h

开发人员还可以发布 ebpf 数据文件。他们可以使用以下命令来登录，构建，发布，和推送新的文件：

$ ecli login
$ ecli build ghcr.io/eunomia-bpf/sigsnoop:latest
$ ecli publish
$ ecli push ghcr.io/eunomia-bpf/sigsnoop:latest

eBPF 程序存储和管理平台（网页前端）

我们希望创建一个平台，该平台可用于公开存储、管理和分发 eBPF 程序，就像 Docker Hub 对 Docker 镜像所做的那样。

我们将采用流行的前端技术来建立一个网页平台，以更友好的方式提供相关的发布和检索服务。

页面内容（以 Docker 举例）

主页：

主页将展示一些特色的 eBPF 程序，以及最新或最受欢迎的 eBPF 程序，以及简单的 logo 和介绍。此外，主页还将提供一个搜索框，用户可以输入关键词来搜索他们需要的 eBPF 程序。

例如：

Untitled

项目详情页面：

项目详情页面将展示特定 eBPF 程序的详细信息，包括其描述、版本、作者、源代码链接等。此外，这个页面还将提供一个“下载”按钮，用户可以点击这个按钮来下载 eBPF 程序。

Untitled

搜索结果页面：

搜索结果页面将展示用户搜索关键词的结果。每个结果将包括 eBPF 程序的名称、简短描述和一个链接，该链接将指向相应的项目详情页面。

Untitled

关于页面：

关于页面将提供关于这个平台的信息，包括其目的、如何使用、联系信息等。

额外页面

用户个人页面：

用户个人页面将展示用户的个人信息，包括他们上传的 eBPF 程序、他们的收藏、他们的关注等。用户可以在这个页面管理他们的 eBPF 程序。

Untitled

系统设计方案：Serverless 架构

我们希望采用前后端分离的方案，将前后端代码分别部署到 Vercel。这种设计架构具有灵活性、可维护性和高性能，可以让前端和后端代码分别部署和维护，更快地修复和更新代码，同时减少整个开发流程的时间成本和维护成本。

Serverless 架构本身具有弹性扩展能力、依托于 Vercel 平台的高可用性、更快的部署时间、更好的安全性，不仅让平台本身能够应对突发的高负载请求，而且可以让开发人员更好地应对业务需求和变化，提高开发效率和应用程序的性能和可维护性。

Untitled

API 设计

实现简单的登录注册功能、以及上传、查找 eBPF 项目的功能

Untitled

Serverless SQL 数据库

可以使用 Serverless SQL 数据库来存储元信息，例如 Vercel Postgres 作为数据库应用存储平台的关键信息。

Untitled

进度规划

主要分工为三个部分：

模板（已完成一部分内容）
命令行包管理器（目前已有一些 demo）
前端网页（正在开发）

Posted 2023-06-14Updated 2023-11-2739 minutes read (About 5820 words)

L7 Tracing with eBPF: HTTP and Beyond via Socket Filters and Syscall Tracing

In today’s technology landscape, with the rise of microservices, cloud-native applications, and complex distributed systems, observability of systems has become a crucial factor in ensuring their health, performance, and security. Especially in a microservices architecture, application components may be distributed across multiple containers and servers, making traditional monitoring methods often insufficient to provide the depth and breadth needed to fully understand the behavior of the system. This is where observing seven-layer protocols such as HTTP, gRPC, MQTT, and more becomes particularly important.

Seven-layer protocols provide detailed insights into how applications interact with other services and components. In a microservices environment, understanding these interactions is vital, as they often serve as the root causes of performance bottlenecks, failures, and security issues. However, monitoring these protocols is not a straightforward task. Traditional network monitoring tools like tcpdump, while effective at capturing network traffic, often fall short when dealing with the complexity and dynamism of seven-layer protocols.

This is where eBPF (extended Berkeley Packet Filter) technology comes into play. eBPF allows developers and operators to delve deep into the kernel layer, observing and analyzing system behavior in real-time without the need to modify or insert instrumentation into application code. This presents a unique opportunity to handle application layer traffic more simply and efficiently, particularly in microservices environments.

In this tutorial, we will delve into the following:

Tracking seven-layer protocols such as HTTP and the challenges associated with them.
eBPF’s socket filter and syscall tracing: How these two technologies assist in tracing HTTP network request data at different kernel layers, and the advantages and limitations of each.
eBPF practical tutorial: How to develop an eBPF program and utilize eBPF socket filter or syscall tracing to capture and analyze HTTP traffic.

As network traffic increases and applications grow in complexity, gaining a deeper understanding of seven-layer protocols becomes increasingly important. Through this tutorial, you will acquire the necessary knowledge and tools to more effectively monitor and analyze your network traffic, ultimately enhancing the performance of your applications and servers.

This article is part of the eBPF Developer Tutorial, and for more detailed content, you can visit here. The source code is available on the GitHub repository.

Challenges in Tracking HTTP, HTTP/2, and Other Seven-Layer Protocols

In the modern networking environment, seven-layer protocols extend beyond just HTTP. In fact, there are many seven-layer protocols such as HTTP/2, gRPC, MQTT, WebSocket, AMQP, and SMTP, each serving critical roles in various application scenarios. These protocols provide detailed insights into how applications interact with other services and components. However, tracking these protocols is not a simple task, especially within complex distributed systems.

Diversity and Complexity: Each seven-layer protocol has its specific design and workings. For example, gRPC utilizes HTTP/2 as its transport protocol and supports multiple languages, while MQTT is a lightweight publish/subscribe messaging transport protocol designed for low-bandwidth and unreliable networks.
Dynamism: Many seven-layer protocols are dynamic, meaning their behavior can change based on network conditions, application requirements, or other factors.
Encryption and Security: With increased security awareness, many seven-layer protocols employ encryption technologies such as TLS/SSL. This introduces additional challenges for tracking and analysis, as decrypting traffic is required for in-depth examination.
High-Performance Requirements: In high-traffic production environments, capturing and analyzing traffic for seven-layer protocols can impact system performance. Traditional network monitoring tools may struggle to handle a large number of concurrent sessions.
Data Completeness and Continuity: Unlike tools like tcpdump, which capture individual packets, tracking seven-layer protocols requires capturing complete sessions, which may involve multiple packets. This necessitates tools capable of correctly reassembling and parsing these packets to provide a continuous session view.
Code Intrusiveness: To gain deeper insights into the behavior of seven-layer protocols, developers may need to modify application code to add monitoring functionalities. This not only increases development and maintenance complexity but can also impact application performance.

As mentioned earlier, eBPF provides a powerful solution, allowing us to capture and analyze seven-layer protocol traffic in the kernel layer without modifying application code. This approach not only offers insights into system behavior but also ensures optimal performance and efficiency. This is why eBPF has become the preferred technology for modern observability tools, especially in production environments that demand high performance and low latency.

eBPF Socket Filter vs. Syscall Tracing: In-Depth Analysis and Comparison

eBPF Socket Filter

What Is It?
eBPF socket filter is an extension of the classic Berkeley Packet Filter (BPF) that allows for more advanced packet filtering directly within the kernel. It operates at the socket layer, enabling fine-grained control over which packets are processed by user-space applications.

Key Features:

Performance: By handling packets directly within the kernel, eBPF socket filters reduce the overhead of context switches between user and kernel spaces.
Flexibility: eBPF socket filters can be attached to any socket, providing a universal packet filtering mechanism for various protocols and socket types.
Programmability: Developers can write custom eBPF programs to define complex filtering logic beyond simple packet matching.

Use Cases:

Traffic Control: Restrict or prioritize traffic based on custom conditions.
Security: Discard malicious packets before they reach user-space applications.
Monitoring: Capture specific packets for analysis without affecting other traffic.

eBPF Syscall Tracing

What Is It?
System call tracing using eBPF allows monitoring and manipulation of system calls made by applications. System calls are the primary mechanism through which user-space applications interact with the kernel, making tracing them a valuable way to understand application behavior.

Key Features:

Granularity: eBPF allows tracing specific system calls, even specific parameters within those system calls.
Low Overhead: Compared to other tracing methods, eBPF syscall tracing is designed to have minimal performance impact.
Security: Kernel validates eBPF programs to ensure they do not compromise system stability.

How It Works:
eBPF syscall tracing typically involves attaching eBPF programs to tracepoints or kprobes related to the system calls being traced. When the traced system call is invoked, the eBPF program is executed, allowing data collection or even modification of system call parameters.

Comparison of eBPF Socket Filter and Syscall Tracing

Aspect	eBPF Socket Filter	eBPF Syscall Tracing
Operational Layer	Socket layer, primarily dealing with network packets received from or sent to sockets.	System call layer, monitoring and potentially altering the behavior of system calls made by applications.
Primary Use Cases	Mainly used for filtering, monitoring, and manipulation of network packets.	Used for performance analysis, security monitoring, and debugging of interactions with the network.
Granularity	Focuses on individual network packets.	Can monitor a wide range of system activities, including those unrelated to networking.
Tracking HTTP Traffic	Can be used to filter and capture HTTP packets passed through sockets.	Can trace system calls associated with networking operations, which may include HTTP traffic.

In summary, both eBPF socket filters and syscall tracing can be used to trace HTTP traffic, but socket filters are more direct and suitable for this purpose. However, if you are interested in the broader context of how an application interacts with the system (e.g., which system calls lead to HTTP traffic), syscall tracing can be highly valuable. In many advanced observability setups, both tools may be used simultaneously to provide a comprehensive view of system and network behavior.

Capturing HTTP Traffic with eBPF Socket Filter

eBPF code consists of user-space and kernel-space components, and here we primarily focus on the kernel-space code. Below is the main logic for capturing HTTP traffic in the kernel using eBPF socket filter technology, and the complete code is provided:

SEC("socket")
int socket_handler(struct __sk_buff *skb)
{
    struct so_event *e;
    __u8 verlen;
    __u16 proto;
    __u32 nhoff = ETH_HLEN;
    __u32 ip_proto = 0;
    __u32 tcp_hdr_len = 0;
    __u16 tlen;
    __u32 payload_offset = 0;
    __u32 payload_length = 0;
    __u8 hdr_len;

    bpf_skb_load_bytes(skb, 12, &proto, 2);
    proto = __bpf_ntohs(proto);
    if (proto != ETH_P_IP)
        return 0;

    if (ip_is_fragment(skb, nhoff))
        return 0;

    // ip4 header lengths are variable
    // access ihl as a u8 (linux/include/linux/skbuff.h)
    bpf_skb_load_bytes(skb, ETH_HLEN, &hdr_len, sizeof(hdr_len));
    hdr_len &= 0x0f;
    hdr_len *= 4;

    /* verify hlen meets minimum size requirements */
    if (hdr_len < sizeof(struct iphdr))
    {
        return 0;
    }

    bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, protocol), &ip_proto, 1);

    if (ip_proto != IPPROTO_TCP)
    {
        return 0;
    }

    tcp_hdr_len = nhoff + hdr_len;
    bpf_skb_load_bytes(skb, nhoff + 0, &verlen, 1);
    bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, tot_len), &tlen, sizeof(tlen));

    __u8 doff;
    bpf_skb_load_bytes(skb, tcp_hdr_len + offsetof(struct __tcphdr, ack_seq) + 4, &doff, sizeof(doff)); // read the first byte past __tcphdr->ack_seq, we can't do offsetof bit fields
    doff &= 0xf0; // clean-up res1
    doff >>= 4; // move the upper 4 bits to low
    doff *= 4; // convert to bytes length

    payload_offset = ETH_HLEN + hdr_len + doff;
    payload_length = __bpf_ntohs(tlen) - hdr_len - doff;

    char line_buffer[7];
    if (payload_length < 7 || payload_offset < 0)
    {
        return 0;
    }
    bpf_skb_load_bytes(skb, payload_offset, line_buffer, 7);
    bpf_printk("%d len %d buffer: %s", payload_offset, payload_length, line_buffer);
    if (bpf_strncmp(line_buffer, 3, "GET") != 0 &&
        bpf_strncmp(line_buffer, 4, "POST") != 0 &&
        bpf_strncmp(line_buffer, 3, "PUT") != 0 &&
        bpf_strncmp(line_buffer, 6, "DELETE") != 0 &&
        bpf_strncmp(line_buffer, 4, "HTTP") != 0)
    {
        return 0;
    }

    /* reserve sample from BPF ringbuf */
    e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;

    e->ip_proto = ip_proto;
    bpf_skb_load_bytes(skb, nhoff + hdr_len, &(e->ports), 4);
    e->pkt_type = skb->pkt_type;
    e->ifindex = skb->ifindex;

    e->payload_length = payload_length;
    bpf_skb_load_bytes(skb, payload_offset, e->payload, MAX_BUF_SIZE);

    bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, saddr), &(e->src_addr), 4);
    bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, daddr), &(e->dst_addr), 4);
    bpf_ringbuf_submit(e, 0);

    return skb->len;
}

When analyzing this eBPF program, we will explain it in detail according to the content of each code block and provide relevant background knowledge:

SEC("socket")
int socket_handler(struct __sk_buff *skb)
{
    // ...
}

This is the entry point of the eBPF program, defining a function named socket_handler that the kernel uses to handle incoming network packets. This function is located in an eBPF section named socket, indicating that it is intended for socket handling.

struct so_event *e;
__u8 verlen;
__u16 proto;
__u32 nhoff = ETH_HLEN;
__u32 ip_proto = 0;
__u32 tcp_hdr_len = 0;
__u16 tlen;
__u32 payload_offset = 0;
__u32 payload_length = 0;
__u8 hdr_len;

In this code block, several variables are defined to store information needed during packet processing. These variables include struct so_event *e for storing event information, verlen, proto, nhoff, ip_proto, tcp_hdr_len, tlen, payload_offset, payload_length, and hdr_len for storing packet information.

struct so_event *e;: This is a pointer to the so_event structure for storing captured event information. The specific definition of this structure is located elsewhere in the program.
__u8 verlen;, __u16 proto;, __u32 nhoff = ETH_HLEN;: These variables are used to store various pieces of information, such as protocol types, packet offsets, etc. nhoff is initialized to the length of the Ethernet frame header, typically 14 bytes, as Ethernet frame headers include destination MAC address, source MAC address, and frame type fields.
__u32 ip_proto = 0;: This variable is used to store the type of the IP protocol and is initialized to 0.
__u32 tcp_hdr_len = 0;: This variable is used to store the length of the TCP header and is initialized to 0.
__u16 tlen;: This variable is used to store the total length of the IP packet.
__u32 payload_offset = 0;, __u32 payload_length = 0;: These two variables are used to store the offset and length of the HTTP request payload.
__u8 hdr_len;: This variable is used to store the length of the IP header.

bpf_skb_load_bytes(skb, 12, &proto, 2);
proto = __bpf_ntohs(proto);
if (proto != ETH_P_IP)
    return 0;

Here, the code loads the Ethernet frame type field from the packet, which tells us the network layer protocol being used in the packet. It then uses the __bpf_ntohs function to convert the network byte order type field into host byte order. Next, the code checks if the type field is not equal to the Ethernet frame type for IPv4 (0x0800). If it’s not equal, it means the packet is not an IPv4 packet, and the function returns 0, indicating that the packet should not be processed.

Key concepts to understand here:

Ethernet Frame: The Ethernet frame is a data link layer (Layer 2) protocol used for transmitting data frames within a local area network (LAN). Ethernet frames typically include destination MAC address, source MAC address, and frame type fields.
Network Byte Order: Network protocols often use big-endian byte order to represent data. Therefore, data received from the network needs to be converted into host byte order for proper interpretation on the host. Here, the type field from the network is converted to host byte order for further processing.
IPv4 Frame Type (ETH_P_IP): This represents the frame type field in the Ethernet frame, where 0x0800 indicates IPv4.

1 2	if (ip_is_fragment(skb, nhoff)) return 0;

This part of the code checks if IP fragmentation is being handled. IP fragmentation is a mechanism for splitting larger IP packets into multiple smaller fragments for transmission. Here, if the packet is an IP fragment, the function returns 0, indicating that only complete packets will be processed.

static inline int ip_is_fragment(struct __sk_buff *skb, __u32 nhoff)
{
    __u16 frag_off;

    bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, frag_off), &frag_off, 2);
    frag_off = __bpf_ntohs(frag_off);
    return frag_off & (IP_MF | IP_OFFSET);
}

The above code is a helper function used to check if the incoming IPv4 packet is an IP fragment. IP fragmentation is a mechanism where, if the size of an IP packet exceeds the Maximum Transmission Unit (MTU) of the network, routers split it into smaller fragments for transmission across the network. The purpose of this function is to examine the fragment flags and fragment offset fields within the packet to determine if it is a fragment.

Here’s an explanation of the code line by line:

__u16 frag_off;: Defines a 16-bit unsigned integer variable frag_off to store the fragment offset field.
bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, frag_off), &frag_off, 2);: This line of code uses the bpf_skb_load_bytes function to load the fragment offset field from the packet. nhoff is the offset of the IP header within the packet, and offsetof(struct iphdr, frag_off) calculates the offset of the fragment offset field within the IPv4 header.
frag_off = __bpf_ntohs(frag_off);: Converts the loaded fragment offset field from network byte order (big-endian) to host byte order. Network protocols typically use big-endian to represent data, and the conversion to host byte order is done for further processing.
return frag_off & (IP_MF | IP_OFFSET);: This line of code checks the value of the fragment offset field using a bitwise AND operation with two flag values:
- IP_MF: Represents the “More Fragments” flag. If this flag is set to 1, it indicates that the packet is part of a fragmented sequence and more fragments are expected.
- IP_OFFSET: Represents the fragment offset field. If the fragment offset field is non-zero, it indicates that the packet is part of a fragmented sequence and has a fragment offset value.
  If either of these flags is set to 1, the result is non-zero, indicating that the packet is an IP fragment. If both flags are 0, it means the packet is not fragmented.

It’s important to note that the fragment offset field in the IP header is specified in units of 8 bytes, so the actual byte offset is obtained by left-shifting the value by 3 bits. Additionally, the “More Fragments” flag (IP_MF) in the IP header indicates whether there are more fragments in the sequence and is typically used in conjunction with the fragment offset field to indicate the status of fragmented packets.

bpf_skb_load_bytes(skb, ETH_HLEN, &

hdr_len, sizeof(hdr_len));
hdr_len &= 0x0f;
hdr_len *= 4;

In this part of the code, the length of the IP header is loaded from the packet. The IP header length field contains information about the length of the IP header in units of 4 bytes, and it needs to be converted to bytes. Here, it is converted by performing a bitwise AND operation with 0x0f and then multiplying it by 4.

Key concept:

IP Header: The IP header contains fundamental information about a packet, such as the source IP address, destination IP address, protocol type, total length, identification, flags, fragment offset, time to live (TTL), checksum, source port, and destination port.

if (hdr_len < sizeof(struct iphdr))
{
    return 0;
}

This code segment checks if the length of the IP header meets the minimum length requirement, typically 20 bytes. If the length of the IP header is less than 20 bytes, it indicates an incomplete or corrupted packet, and the function returns 0, indicating that the packet should not be processed.

Key concept:

struct iphdr: This is a structure defined in the Linux kernel, representing the format of an IPv4 header. It includes fields such as version, header length, service type, total length, identification, flags, fragment offset, time to live, protocol, header checksum, source IP address, and destination IP address, among others.

bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, protocol), &ip_proto, 1);
if (ip_proto != IPPROTO_TCP)
{
    return 0;
}

Here, the code loads the protocol field from the IP header to determine the transport layer protocol used in the packet. Then, it checks if the protocol field is not equal to the value for TCP (IPPROTO_TCP). If it’s not TCP, it means the packet is not an HTTP request or response, and the function returns 0.

Key concept:

Transport Layer Protocol: The protocol field in the IP header indicates the transport layer protocol used in the packet, such as TCP, UDP, or ICMP.

1	tcp_hdr_len = nhoff + hdr_len;

This line of code calculates the offset of the TCP header. It adds the length of the Ethernet frame header (nhoff) to the length of the IP header (hdr_len) to obtain the starting position of the TCP header.

1	bpf_skb_load_bytes(skb, nhoff + 0, &verlen, 1);

This line of code loads the first byte of the TCP header from the packet, which contains information about the TCP header length. This length field is specified in units of 4 bytes and requires further conversion.

1	bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, tot_len), &tlen, sizeof(tlen));

This line of code loads the total length field of the IP header from the packet. The IP header’s total length field represents the overall length of the IP packet, including both the IP header and the data portion.

__u8 doff;
bpf_skb_load_bytes(skb, tcp_hdr_len + offsetof(struct __tcphdr, ack_seq) + 4, &doff, sizeof(doff));
doff &= 0xf0;
doff >>= 4;
doff *= 4;

This piece of code is used to calculate the length of the TCP header. It loads the Data Offset field (also known as the Header Length field) from the TCP header, which represents the length of the TCP header in units of 4 bytes. The code clears the high four bits of the offset field, then shifts it right by 4 bits, and finally multiplies it by 4 to obtain the actual length of the TCP header.

Key points to understand:

TCP Header: The TCP header contains information related to the TCP protocol, such as source port, destination port, sequence number, acknowledgment number, flags (e.g., SYN, ACK, FIN), window size, and checksum.

1 2	payload_offset = ETH_HLEN + hdr_len + doff; payload_length = __bpf_ntohs(tlen) - hdr_len - doff;

These two lines of code calculate the offset and length of the HTTP request payload. They add the lengths of the Ethernet frame header, IP header, and TCP header together to obtain the offset to the data portion of the HTTP request. Then, by subtracting the total length, IP header length, and TCP header length from the total length field, they calculate the length of the HTTP request data.

Key point:

HTTP Request Payload: The actual data portion included in an HTTP request, typically consisting of the HTTP request headers and request body.

char line_buffer[7];
if (payload_length < 7 || payload_offset < 0)
{
    return 0;
}
bpf_skb_load_bytes(skb, payload_offset, line_buffer, 7);
bpf_printk("%d len %d buffer: %s", payload_offset, payload_length, line_buffer);

This portion of the code loads the first 7 bytes of the HTTP request line and stores them in a character array named line_buffer. It then checks if the length of the HTTP request data is less than 7 bytes or if the offset is negative. If these conditions are met, it indicates an incomplete HTTP request, and the function returns 0. Finally, it uses the bpf_printk function to print the content of the HTTP request line to the kernel log for debugging and analysis.

if (bpf_strncmp(line_buffer, 3, "GET") != 0 &&
    bpf_strncmp(line_buffer, 4, "POST") != 0 &&
    bpf_strncmp(line_buffer, 3, "PUT") != 0 &&
    bpf_strncmp(line_buffer, 6, "DELETE") != 0 &&
    bpf_strncmp(line_buffer, 4, "HTTP") != 0)
{
    return 0;
}

This piece of code uses the bpf_strncmp function to compare the data in line_buffer with HTTP request methods (GET, POST, PUT, DELETE, HTTP). If there is no match, indicating that it is not an HTTP request, it returns 0, indicating that it should not be processed.

1
2
3

e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
if (!e)
    return 0;

This section of the code attempts to reserve a block of memory from the BPF ring buffer to store event information. If it cannot reserve the memory block, it returns 0. The BPF ring buffer is used to pass event data between the eBPF program and user space.

Key point:

BPF Ring Buffer: The BPF ring buffer is a mechanism for passing data between eBPF programs and user space. It can be used to store event information for further processing or analysis by user space applications.

e->ip_proto = ip_proto;
bpf_skb_load_bytes(skb, nhoff + hdr_len, &(e->ports), 4);
e->pkt_type = skb->pkt_type;
e->ifindex = skb->ifindex;

e->payload_length = payload_length;
bpf_skb_load_bytes(skb, payload_offset, e->payload, MAX_BUF_SIZE);

bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, saddr), &(e->src_addr), 4);
bpf_skb_load_bytes(skb, nhoff + offsetof(struct iphdr, daddr), &(e->dst_addr), 4);
bpf_ringbuf_submit(e, 0);

return skb->len;

Finally, this code segment stores the captured event information in the e structure and submits it to the BPF ring buffer. It includes information such as the captured IP protocol, source and destination ports, packet type, interface index, payload length, source IP address, and destination IP address. Finally, it returns the length of the packet, indicating that the packet was successfully processed.

This code is primarily used to store captured event information for further processing. The BPF ring buffer is used to pass this information to user space for additional handling or logging.

In summary, this eBPF program’s main task is to capture HTTP requests. It accomplishes this by parsing the Ethernet frame, IP header, and TCP header of incoming packets to determine if they contain HTTP requests. Information about the requests is then stored in the so_event structure and submitted to the BPF ring buffer. This is an efficient method for capturing HTTP traffic at the kernel level and is suitable for applications such as network monitoring and security analysis.

Potential Limitations

The above code has some potential limitations, and one of the main limitations is that it cannot handle URLs that span multiple packets.

Cross-Packet URLs: The code checks the URL in an HTTP request by parsing a single data packet. If the URL of an HTTP request spans multiple packets, it will only examine the URL in the first packet. This can lead to missing or partially capturing long URLs that span multiple data packets.

To address this issue, a solution often involves reassembling multiple packets to reconstruct the complete HTTP request. This may require implementing packet caching and assembly logic within the eBPF program and waiting to collect all relevant packets until the HTTP request is detected. This adds complexity and may require additional memory to handle cases where URLs span multiple packets.

User-Space Code

The user-space code’s main purpose is to create a raw socket and then attach the previously defined eBPF program in the kernel to that socket, allowing the eBPF program to capture and process network packets received on that socket. Here’s an example of the user-space code:

/* Create raw socket for localhost interface */
sock = open_raw_sock(interface);
if (sock < 0) {
    err = -2;
    fprintf(stderr, "Failed to open raw socket\n");
    goto cleanup;
}

/* Attach BPF program to raw socket */
prog_fd = bpf_program__fd(skel->progs.socket_handler);
if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))) {
    err = -3;
    fprintf(stderr, "Failed to attach to raw socket\n");
    goto cleanup;
}

sock = open_raw_sock(interface);: This line of code calls a custom function open_raw_sock, which is used to create a raw socket. Raw sockets allow a user-space application to handle network packets directly without going through the protocol stack. The interface parameter might specify the network interface from which to receive packets, determining where to capture packets from. If creating the socket fails, it returns a negative value, otherwise, it returns the file descriptor of the socket sock.
If the value of sock is less than 0, indicating a failure to open the raw socket, it sets err to -2 and prints an error message on the standard error stream.
prog_fd = bpf_program__fd(skel->progs.socket_handler);: This line of code retrieves the file descriptor of the socket filter program (socket_handler) previously defined in the eBPF program. It is necessary to attach this program to the socket. skel is a pointer to an eBPF program object, and it provides access to the program collection.
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)): This line of code uses the setsockopt system call to attach the eBPF program to the raw socket. It sets the SO_ATTACH_BPF option and passes the file descriptor of the eBPF program to the option, letting the kernel know which eBPF program to apply to this socket. If the attachment is successful, the socket starts capturing and processing network packets received on it.
If setsockopt fails, it sets err to -3 and prints an error message on the standard error stream.

Compilation and Execution

The complete source code can be found at https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/23-http. To compile and run the code:

$ git submodule update --init --recursive
$ make
  BPF      .output/sockfilter.bpf.o
  GEN-SKEL .output/sockfilter.skel.h
  CC       .output/sockfilter.o
  BINARY   sockfilter
$ sudo ./sockfilter 
...

In another terminal, start a simple web server using Python:

1
2
3

python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [18/Sep/2023 01:05:52] "GET / HTTP/1.1" 200 -

You can use curl to make requests:

$ curl http://0.0.0.0:8000/
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Directory listing for /</title>
....

In the eBPF program, you can see that it prints the content of HTTP requests:

127.0.0.1:34552(src) -> 127.0.0.1:8000(dst)
payload: GET / HTTP/1.1
Host: 0.0.0.0:8000
User-Agent: curl/7.88.1
...
127.0.0.1:8000(src) -> 127.0.0.1:34552(dst)
payload: HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.11.4
...

It captures both request and response content.

Capturing HTTP Traffic Using eBPF Syscall Tracepoints

eBPF provides a powerful mechanism for tracing system calls at the kernel level. In this example, we’ll use eBPF to trace the accept and read system calls to capture HTTP traffic. Due to space limitations, we’ll provide a brief overview of the code framework.

struct
{
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, u64);
    __type(value, struct accept_args_t);
} active_accept_args_map SEC(".maps");

// Define a tracepoint at the entry of the accept system call
SEC("tracepoint/syscalls/sys_enter_accept")
int sys_enter_accept(struct trace_event_raw_sys_enter *ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    // ... Get and store the arguments of the accept call
    bpf_map_update_elem(&active_accept_args_map, &id, &accept_args, BPF_ANY);
    return 0;
}

// Define a tracepoint at the exit of the accept system call
SEC("tracepoint/syscalls/sys_exit_accept")
int sys_exit_accept(struct trace_event_raw_sys_exit *ctx)
{
    // ... Process the result of the accept call
    struct accept_args_t *args =
        bpf_map_lookup_elem(&active_accept_args_map, &id);
    // ... Get and store the socket file descriptor obtained from the accept call
    __u64 pid_fd = ((__u64)pid << 32) | (u32)ret_fd;
    bpf_map_update_elem(&conn_info_map, &pid_fd, &conn_info, BPF_ANY);
    // ...
}

struct
{
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, u64);
    __type(value, struct data_args_t);
} active_read_args_map SEC(".maps");

// Define a tracepoint at the entry of the read system call
SEC("tracepoint/syscalls/sys_enter_read")
int sys_enter_read(struct trace_event_raw_sys_enter *ctx)
{
    // ... Get and store the arguments of the read call
    bpf_map_update_elem(&active_read_args_map, &id, &read_args, BPF_ANY);
    return 0;
}

// Helper function to check if it's an HTTP connection
static inline bool is_http_connection(const char *line_buffer, u64 bytes_count)
{
    // ... Check if the data is an HTTP request or response
}

// Helper function to process the read data
static inline void process_data(struct trace_event_raw_sys_exit *ctx,
                                u64 id, const struct data_args_t *args, u64 bytes_count)
{
    // ... Process the read data, check if it's HTTP traffic, and send events
    if (is_http_connection(line_buffer, bytes_count))
    {
        // ...
        bpf_probe_read_kernel(&event.msg, read_size, args->buf);
        // ...
        bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                              &event, sizeof(struct socket_data_event_t));
    }
}

// Define a tracepoint at the exit of the read system call
SEC("tracepoint/syscalls/sys_exit_read")
int sys_exit_read(struct trace_event_raw_sys_exit *ctx)
{
    // ... Process the result of the read call
    struct data_args_t *read_args = bpf_map_lookup_elem(&active_read_args_map, &id);
    if (read_args != NULL)
    {
        process_data(ctx, id, read_args, bytes_count);
    }
    // ...
    return 0;
}

char _license[] SEC("license") = "GPL";

This code briefly demonstrates how to use eBPF to trace system calls in the Linux kernel to capture HTTP traffic. Here’s a detailed explanation of the hook locations and the flow, as well as the complete set of system calls that need to be hooked for comprehensive request tracing:

Hook Locations and Flow

The code uses eBPF Tracepoint functionality. Specifically, it defines a series of eBPF programs and binds them to specific system call Tracepoints to capture entry and exit events of these system calls.
First, it defines two eBPF hash maps (active_accept_args_map and active_read_args_map) to store system call parameters. These maps are used to track accept and read system calls.
Next, it defines multiple Tracepoint tracing programs, including:
- sys_enter_accept: Defined at the entry of the accept system call, used to capture the arguments of the accept system call and store them in the hash map.
- sys_exit_accept: Defined at the exit of the accept system call, used to process the result of the accept system call, including obtaining and storing the new socket file descriptor and related connection information.
- sys_enter_read: Defined at the entry of the read system call, used to capture the arguments of the read system call and store them in the hash map.
- sys_exit_read: Defined at the exit of the read system call, used to process the result of the read system call, including checking if the read data is HTTP traffic and sending events.
In sys_exit_accept and sys_exit_read, there is also some data processing and event sending logic, such as checking if the data is an HTTP connection, assembling event data, and using bpf_perf_event_output to send events to user space for further processing.

Complete Set of System Calls to Hook

To fully implement HTTP request tracing, the system calls that typically need to be hooked include:

socket: Used to capture socket creation for tracking new connections.
bind: Used to obtain port information where the socket is bound.
listen: Used to start listening for connection requests.
accept: Used to accept connection requests and obtain new socket file descriptors.
read: Used to capture received data and check if it contains HTTP requests.
write: Used to capture sent data and check if it contains HTTP responses.

The provided code already covers the tracing of accept and read system calls. To complete HTTP request tracing, additional system calls need to be hooked, and corresponding logic needs to be implemented to handle the parameters and results of these system calls.

The complete source code can be found at https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/23-http.

Summary

In today’s complex technological landscape, system observability has become crucial, especially in the context of microservices and cloud-native applications. This article explores how to leverage eBPF technology for tracing the seven-layer protocols, along with the challenges and solutions that may arise in this process. Here’s a summary of the content covered in this article:

Introduction:
- Modern applications often consist of multiple microservices and distributed components, making it essential to observe the behavior of the entire system.
- Seven-layer protocols (such as HTTP, gRPC, MQTT, etc.) provide detailed insights into application interactions, but monitoring these protocols can be challenging.
Role of eBPF Technology:
- eBPF allows developers to dive deep into the kernel layer for real-time observation and analysis of system behavior without modifying or inserting application code.
- eBPF technology offers a powerful tool for monitoring seven-layer protocols, especially in a microservices environment.
Tracing Seven-Layer Protocols:
- The article discusses the challenges of tracing seven-layer protocols, including their complexity and dynamism.
- Traditional network monitoring tools struggle with the complexity of seven-layer protocols.
Applications of eBPF:
- eBPF provides two primary methods for tracing seven-layer protocols: socket filters and syscall tracing.
- Both of these methods help capture network request data for protocols like HTTP and analyze them.
eBPF Practical Tutorial:
- The article provides a practical eBPF tutorial demonstrating how to capture and analyze HTTP traffic using eBPF socket filters or syscall tracing.
- The tutorial covers the development of eBPF programs, the use of the eBPF toolchain, and the implementation of HTTP request tracing.

Through this article, readers can gain a deep understanding of how to use eBPF technology for tracing seven-layer protocols, particularly HTTP traffic. This knowledge will help enhance the monitoring and analysis of network traffic, thereby improving application performance and security. If you’re interested in learning more about eBPF and its practical applications, you can visit our tutorial code repository at https://github.com/eunomia-bpf/bpf-developer-tutorial or our website at https://eunomia.dev/tutorials/ for more examples and complete tutorials.

Posted 2023-06-14Updated 2023-11-27an hour read (About 9165 words)

OpenAI 新发布GPT 最佳实践：落地大模型应用的策略和战术

在今年六月份，OpenAI 在其官方文档中更新了一篇关于提高 GPT 效果的策略和方法。这篇文章包含了六种核心策略，以及一些实际的提示词案例，和知识检索和代码执行等技术来优化GPT模型的最佳实践。通过使用这些最佳实践，用户可以更好地使用 GPT 模型，并提高其效果和性能。

大部分的示例主要针对 GPT-4 模型，但对于其他模型而言也会有不少参考价值。

本文主要翻译和整理自 OpenAI 的官方文档，原文地址：https://platform.openai.com/docs/guides/gpt-best-practices

一些相关的开源资料仓库：

关于提示词工程（prompt）的指南、论文、讲座、笔记本和资源大全：https://github.com/yunwei37/Prompt-Engineering-Guide-zh-CN

使用 OpenAI API 的例子和中文指南：https://github.com/yunwei37/openai-cookbook-zh-cn

关于大语言模型的安全问题：提示词对抗、破解攻防的工具和案例集合：https://github.com/yunwei37/prompt-adversarial-collections

文末附有更多相关参考资料。

提高结果的六种策略

编写清晰的指令

GPT 无法读取您的思想。如果它们的输出过长，请要求简洁回复。如果它们的输出过于简单，请要求专业水平的写作。如果您不喜欢某种格式，请展示您想要看到的格式。GPT 越少猜测您想要的内容，您获得的可能性就越大。

策略：

在查询中包含详细信息，以获得更相关的答案。
要求模型扮演某个角色。
使用分隔符清晰地表示输入的不同部分。
指定完成任务所需的步骤。
提供示例。
指定输出的期望长度。
提供参考文本。

提供参考文本

GPT 可以自信地编造假答案，特别是当被询问奇特的话题、引用和网址时。就像一张笔记可以帮助学生在考试中取得更好的成绩一样，为 GPT 提供参考文本可以帮助它以较少的虚构进行回答。

策略：

指示模型使用参考文本进行回答。
指示模型使用参考文本中的引用进行回答。

将复杂任务分解为简单子任务

就像在软件工程中将复杂系统分解为一组模块化组件一样，提交给 GPT 的任务也是如此。相比较而言，复杂任务的错误率往往较高。此外，复杂任务通常可以重新定义为一系列较简单任务的工作流程，其中早期任务的输出用于构建后续任务的输入。

策略：

使用意图分类来识别用户查询的最相关指令。
对于需要非常长对话的对话应用程序，总结或过滤以前的对话。
逐段概括长文档并递归构建完整概要。

给予 GPT 足够的时间进行“思考”

如果被要求计算 17 乘以 28，您可能无法立即知道答案，但可以通过时间来计算出来。同样，GPT 在试图立即回答问题时会出现更多的推理错误，而不是花时间思考答案。在得出答案之前，要求进行一连串的推理过程可以帮助 GPT 更可靠地推理出正确答案。

策略：

指示模型在得出结论之前自行解决问题。
使用内心独白或一系列查询来隐藏模型的推理过程。
询问模型是否在之前的处理中漏掉了任何内容。

使用外部工具

通过向 GPT 提供其他工具的输出来弥补 GPT 的不足之处。例如，文本检索系统可以向 GPT 提供相关文档信息。代码执行引擎可以帮助 GPT 进行数学计算和代码运行。如果通过工具而不是 GPT 可以更可靠或更高效地完成任务，则将其卸载以获得最佳结果。

策略：

使用基于嵌入的搜索来实现高效的知识检索。
使用代码执行来执行更准确的计算或调用外部 API。

系统地测试变更

如果您能够进行衡量，那么改进性能就会更容易。在某些情况下，对提示的修改可能会在一些孤立的示例上实现更好的性能，但在更具代表性的一组示例上导致更差的综合性能。因此，为了确保变更对性能有正面的影响，可能需要定义一个全面的测试套件（也称为“评估”）。

策略：

使用参考标准答案评估模型输出。

具体的示例

每个策略都可以通过具体的战术进行实施。这些战术旨在提供尝试的思路。它们并不是完全详尽的，您可以随意尝试不在此处列出的创造性想法。本文为每个具体的策略与战术提供了一些提示词示例。

策略：编写清晰的指令

战术：在查询中包含细节以获得更相关的回答

为了获得高度相关的回复，请确保请求提供任何重要的细节或上下文。否则，您将让模型猜测您的意思。

更差的指令	更好的指令
如何在Excel中添加数字？	如何在Excel中累加一行美元金额？我想要自动为整个工作表的行求和，所有总数都显示在右侧的名为”Total”的列中。
谁是总统？	2021年墨西哥的总统是谁？选举多久举行一次？
编写计算斐波那契数列的代码。	编写一个高效计算斐波那契数列的TypeScript函数。详细注释代码，解释每个部分的作用以及为什么这样编写。
总结会议记录。	用一段话总结会议记录。然后，使用Markdown列表列出发言者及其主要观点。最后，列出发言者建议的下一步行动或待办事项（如果有）。

战术：要求模型扮演角色

系统消息可以用于指定模型在回复中扮演的角色。

USER
写一封感谢信给我的螺栓供应商，感谢他们准时并在短时间内交货。这使我们能够交付一份重要的订单。

SYSTEM
当我请求帮助写东西时，你将在每个段落中至少加入一个笑话或俏皮话。

战术：使用分隔符清晰标示输入的不同部分

像三重引号、XML标记、节标题等分隔符可以帮助标示需要以不同方式处理的文本部分。

USER
使用三重引号中的文本撰写一首俳句。

"""在这里插入文本"""

SYSTEM
你将获得一对关于同一主题的文章（用XML标记分隔）。首先总结每篇文章的论点。然后指出哪篇文章提出了更好的论点，并解释为什么。

USER
<article>在这里插入第一篇文章</article>

<article>在这里插入第二篇文章</article>

SYSTEM
你将获得一份论文摘要和一个建议的标题。论文标题应该给读者一个很好的论文主题概念，同时也要引人注目。如果标题不符合这些标准，请

提出5个备选方案。

USER
摘要：在这里插入摘要

标题：在这里插入标题

对于像这样的简单任务，使用分隔符可能不会对输出质量产生影响。然而，任务越复杂，将任务细节澄清变得越重要。不要让 GPTs 努力理解您究竟在要求什么。

战术：指定完成任务所需的步骤

某些任务最好指定为一系列步骤。明确写出这些步骤可以让模型更容易跟随。

SYSTEM
使用以下逐步说明来回应用户输入。

步骤1 - 用户将用三重引号提供给您文本。用前缀"Summary: "对这段文本进行一句总结。

步骤2 - 将步骤1中的总结翻译成西班牙语，并使用前缀"Translation: "。

战术：提供示例

通常情况下，提供适用于所有示例的一般说明比通过示例演示任务的所有变体更高效，但在某些情况下，提供示例可能更容易。例如，如果您打算让模型复制一种难以明确描述的用户查询响应风格。这被称为”few-shot”提示。

SYSTEM
用一致的风格回答。

USER
教我什么是耐心。

ASSISTANT
刻出最深峡谷的河流起源于一处平凡的泉眼；最壮丽的交响乐源于一颗单独的音符；最复杂的挂毯始于一根孤立的线缕。

USER
教我关于海洋。

Open in Playground

战术：指定所需的输出长度

您可以要求模型生成指定长度的输出。目标输出长度可以根据词数、句子数、段落数、项目符号数等来指定。但请注意，指示模型生成特定数量的单词并不具有高精确度。模型更可靠地生成具有特定段落或项目符号数量的输出。

USER
用大约50个词总结由三重引号分隔的文本。

"""在这里插入文本"""

USER
用2个段落总结由三重引号分隔的文本。

"""在这里插入文本"""

USER
用3个项目符号总结由三重引号分隔的文本。

"""在这里插入文本"""

策略：提供参考文本

战术：指示模型使用参考文本回答问题

如果我们能够为模型提供与当前查询相关的可信信息，那么我们可以指示模型使用提供的信息来组成其答案。

SYSTEM
使用由三重引号分隔的所提供的文章来回答问题。如果答案在文章中找不到，写下"I could not find an answer."
USER
<插入文章，每篇文章由三重引号分隔>

问题：<插入问题>

鉴于GPT具有有限的上下文窗口，为了应用此策略，我们需要某种方式动态查找与被提问的问题相关的信息。可以使用嵌入来实现高效的知识检索。查看策略”使用基于嵌入的搜索实现高效知识检索”了解更多关于如何实现这一点的细节。

战术：指示模型使用参考文本的引文进行回答

如果输入已经被相关知识补充，直接要求模型通过引用所提供文档的段落来添加引文到其回答中就很简单了。请注意，可以通过在所提供的文档中进行字符串匹配来编程验证输出中的引文。

SYSTEM
你将得到一个由三重引号分隔的文档和一个问题。你的任务是只使用提供的文档来回答问题，并引用用来回答问题的文档段落。如果文档不包含回答此问题所需的信息，那么只需写下：“信息不足”。如果提供了问题的答案，必须用引文进行注释。使用以下格式引用相关段落 ({"citation": …})。
USER
"""<插入文档>"""

<插入问题>

策略：将复杂任务分解为更简单的子任务

战术：使用意图分类来识别用户查询最相关的指令

对于需要大量独立的指令集来处理不同情况的任务，首先分类查询类型并使用分类来确定需要哪些指令可能是有益的。这可以通过定义固定的类别并硬编码与处理给定类别任务相关的指令来实现。此过程也可以递归应用以将任务分解为一系列阶段。这种方法的优点是每个查询只包含执行任务的下一阶段所需的那些指令，这可能导致比使用单个查询执行整个任务时的错误率更低。这也可能导致成本更低，因为更大的提示运行成本更高（参见价格信息）。

假设例如，对于客户服务应用，查询可能被有用地分类如下：

SYSTEM
你将得到客户服务查询。将每个查询分类为主要类别和次要类别。以json格式提供你的输出，包含主要和次要两个关键字。

主要类别：计费，技术支持，账户管理，或一般咨询。

计费次要类别：
- 退订或升级
- 添加付款方式
- 收费解释
- 争议收费

技术支持次要类别：
- 故障排除
- 设备兼容性
- 软件更新

账户管理次要类别：
- 密码重置
- 更新个人信息
- 关闭账户
- 账户安全

一般咨询次要类别：
- 产品信息
- 价格
- 反馈
- 要求与人对话
USER
我需要让我的互联网再次工作。

基于客户查询的分类，可以向GPT模型提供一组更具体的指令来处理下一步。例如，假设客户需要帮助”故障排除”。

SYSTEM
你将得到需要在技术支持环境中进行故障排除的客户服务查询。通过以下方式帮助用户：

- 让他们检查到/从路由器的所有电缆是否已连接。注意，电缆随着时间的推移会常常松动。
- 如果所有电缆都已连接并且问题仍然存在，询问他们正在使用哪种路由器模型
- 现在你将建议他们如何重新启动他们的设备：
-- 如果型号是MTD-327J，建议他们按下红色按钮并保持5秒钟，然后等待5分钟后再测试连接。
-- 如果型号是MTD-327S，建议他们拔掉并重新插入，然后等待5分钟后再测试连接。
- 如果客户在重启设备并等待5分钟后的问题仍然存在，通过输出{"IT support requested"}将他们连接到IT支持。
- 如果用户开始提问与此主题无关的问题，那么请确认他们是否希望结束当前关于故障排除的聊天，并根据以下方案对他们的请求进行分类：

<插入上述主/次分类方案>
USER
我需要让我的互联网再次工作。

请注意，已经指示模型在会话状态改变时发出特殊的字符串。这使我们能够将我们的系统转变为状态机，其中状态决定哪些指令被注入。通过跟踪状态，什么指令在那个状态下是相关的，以及从那个状态允许什么状态转换，我们可以在用户体验周围设置保护，这在一种不太结构化的方法中很难实现。

战术

：对需要进行非常长对话的对话应用程序，对先前的对话进行汇总或过滤由于GPT具有固定的上下文长度，用户和助手之间的对话不能无限地继续，如果整个对话都包含在上下文窗口中。

解决这个问题的方法有很多，其中之一是对对话的前几个回合进行汇总。一旦输入的大小达到预定的阈值长度，这可能会触发一个查询，该查询会汇总对话的一部分，先前对话的汇总可以作为系统消息的一部分。或者，先前的对话可以在整个对话过程中异步地进行汇总。

另一个解决方案是动态选择与当前查询最相关的对话的先前部分。参见战术”使用基于嵌入的搜索来实现高效的知识检索”。

战术：分段汇总长文档并递归构造完整汇总

由于GPT具有固定的上下文长度，它们不能在单个查询中用来汇总超过上下文长度减去生成汇总长度的文本。

要汇总一个非常长的文档，如一本书，我们可以使用一系列的查询来汇总每一部分的文档。部分汇总可以被连接起来并进行汇总，产生汇总的汇总。这个过程可以递归地进行，直到整个文档被汇总。如果需要使用关于早期部分的信息来理解后来的部分，那么在对那一点的内容进行汇总时包含一个运行汇总的文本，这可能是一个有用的技巧。OpenAI在之前的研究中已经研究了这种用于汇总书籍的程序的有效性，使用的是GPT-3的变体。

策略：给予 GPT 时间“思考”

战术：在匆忙得出结论前，指导模型自行找出解决方案

有时候，我们明确地指导模型在得出结论之前先进行从头至尾的推理，会得到更好的结果。比如，假设我们希望模型评估一个学生解决数学问题的方法。最直观的做法就是直接询问模型学生的解决方案是否正确。

SYSTEM
判断学生的解答是否正确。

USER
问题描述：我正在建造一个太阳能发电装置，我需要帮助计算财务状况。
- 土地费用为每平方英尺100美元
- 我可以以每平方英尺250美元的价格购买太阳能板
- 我已经商定了一个每年固定费用为10万美元的维护合同，以及每平方英尺额外的10美元
第一年运营的总成本是多少，作为面积（平方英尺）的函数来表示。

学生的解答：设x为装置的面积（平方英尺）。
1. 土地费用：100x
2. 太阳能板费用：250x
3. 维护费用：100,000 + 100x
总成本：100x + 250x + 100,000 + 100x = 450x + 100,000

ASSISTANT
学生的解答是正确的。

但实际上，学生的解答并不正确！我们可以通过提示模型首先生成自己的解答，从而让模型成功地注意到这一点。

SYSTEM
首先自己解决这个问题，然后将你的解答与学生的解答进行比较，评估学生的解答是否正确。在你自己解决这个问题之前，不要判断学生的解答是否正确。

USER
问题描述：我正在建造一个太阳能发电装置，我需要帮助计算财务状况。
- 土地费用为每平方英尺100美元
- 我可以以每平方英尺250美元的价格购买太阳能板
- 我已经商定了一个每年固定费用为10万美元的维护合同，以及每平方英尺额外的10美元
第一年运营的总成本是多少，作为面积（平方英尺）的函数来表示。

学生的解答：设x为装置的面积（平方英尺）。
1. 土地费用：100x
2. 太阳能板费用：250x
3. 维护费用：100,000 + 100x
总成本：100x

 + 250x + 100,000 + 100x = 450x + 100,000

ASSISTANT
设x为装置的面积（平方英尺）。
1. 土地费用：100x
2. 太阳能板费用：250x
3. 维护费用：100,000 + 10x
总成本：100x + 250x + 100,000 + 10x = 360x + 100,000

学生的解答是不正确的。他们在计算维护费用时犯了错误，使用了100x而不是10x。第一年运营的正确总成本应为360x + 100,000。

战术：使用内心独白或一系列查询来隐藏模型的推理过程

前一个战术表明，模型有时需要在回答特定问题之前详细地推理一个问题。对于某些应用程序，模型用来得出最终答案的推理过程是不适合与用户分享的。例如，在辅导应用程序中，我们可能希望鼓励学生自己找出答案，但是模型对学生解答的推理过程可能会向学生透露答案。

内心独白是一种可以用来解决这个问题的战术。内心独白的想法是，指导模型将那些应该对用户隐藏的输出部分放入一种结构化的格式中，使其易于解析。然后，在向用户呈现输出之前，解析输出并只使部分输出可见。

SYSTEM
按照以下步骤回答用户的查询。

步骤 1 - 首先自己解决问题。不要依赖学生的解答，因为它可能是错误的。将此步骤的所有工作都包含在三引号（"""）中。

步骤 2 - 将你的解答与学生的解答进行比较，并评估学生的解答是否正确。将此步骤的所有工作都包含在三引号（"""）中。

步骤 3 - 如果学生犯了错误，确定你可以给学生什么提示，而不透露答案。将此步骤的所有工作都包含在三引号（"""）中。

步骤 4 - 如果学生犯了错误，将上一步的提示提供给学生（在三引号之外）。写"提示："，而不是"步骤4 - ...”。

USER
问题描述： <插入问题描述>

学生的解答： <插入学生的解答>

或者，这也可以通过一系列查询实现，其中所有查询的输出（除了最后一个）都对最终用户隐藏。

首先，我们可以让模型自己解决问题。

由于这个初始查询不需要学生的解答，所以可以省略它。这提供了额外的优势，即模型的解答不可能被学生尝试的解答偏倚。

1
2
3

USER
<插入问题描述>

接下来，我们可以让模型使用所有可用的信息来评估学生解答的正确性。

SYSTEM
将你的解答与学生的解答进行比较，然后评估学生的解答是否正确。

USER
问题描述："""<插入问题描述>"""

你的解答："""<插入模型生成的解答>"""

学生的解答："""<插入学生的解答>"""

最后，我们可以让模型使用自己的分析，构造一个有帮助的导师的回复。

SYSTEM
你是一位数学导师。如果学生犯了错误，以不透露答案的方式给学生提供一个提示。如果学生没有犯错误，简单地对他们提供鼓励性的评论。

USER
问题描述："""<插入问题描述>"""

你的解答："""<插入模型生成的解答>"""

学生的解答："""<插入学生的解答>"""

分析："""<插入模型在前一步生成的分析>"""

战术：询问模型是否在前几轮查询中遗漏了任何内容

假设我们正在使用模型列出与特定问题相关的源文本的摘录。在列出每一篇摘录之后，模型需要确定是否应开始编写另一篇，或者是否应该停止。如果源文档很大，模型通常会过早地停止，无法列出所有相关的摘录。在这种情况下，通过提示模型用后续查询找出它在前几轮中遗漏的摘录，通常可以获得更好的性能。

SYSTEM
你将获得一个用三引号分隔的文档。你的任务是选择与以下问题相关的摘录："人工智能历史上发生过哪些重大的范式转变。"

确保摘录包含所有相关的上下文，以便解释它们 - 换句话说，不要提取缺少重要上下文的小片段。以如下的JSON格式提供输出：

[{"摘录": "..."},
...
{"摘录": "..."}]

USER
"""<插入文档>"""

ASSISTANT
[{"摘录": "模型在这里写下一段摘录"},
...
{"摘录": "模型在这里写下另一段摘录"}]

USER
还有其他相关的摘录吗？请注意不要重复摘录。并确保

新的摘录确实与问题有关，而且在旧的摘录中还没有涵盖。

ASSISTANT
[{"摘录": "模型在这里写下一段摘录"},
...
{"摘录": "模型在这里写下另一段摘录"}]

注意，如果文档特别长，这个战术可能需要多次应用。

策略：使用外部工具

战术：利用基于嵌入的搜索实现高效的知识检索

模型可以利用作为其输入的外部信息源。这可以帮助模型生成更有依据和最新的响应。例如，如果用户询问关于特定电影的问题，将有关该电影的高质量信息（如演员、导演等）添加到模型的输入可能是有用的。嵌入可以用来实现高效的知识检索，以便在运行时动态地将相关信息添加到模型输入。

文本嵌入是一种可以测量文本字符串之间相关性的向量。相似或相关的字符串将比无关的字符串更接近。这个事实，再加上快速向量搜索算法的存在，意味着嵌入可以被用来实现高效的知识检索。具体来说，文本语料库可以被切割成块，每个块可以被嵌入并存储。然后，给定的查询可以被嵌入，向量搜索可以被执行，以找到与查询最相关的文本块（即，在嵌入空间中最接近的）。

实施示例可以在OpenAI Cookbook中找到。请参阅战术”Instruct the model to use retrieved knowledge to answer queries”，以获取如何使用知识检索来最小化模型制造错误事实的可能性的例子。

战术：使用代码执行进行更精确的计算或调用外部API

我们不能依赖GPT自己精确地进行算术或长时间的计算。在需要的情况下，可以指导模型编写和运行代码，而不是自己进行计算。特别是，可以指导模型将要运行的代码放入特定格式，如三重反引号。在产生输出后，可以提取并运行代码。最后，如果必要，可以将代码执行引擎（如Python解释器）的输出作为模型下一次查询的输入。

SYSTEM
你可以通过将代码包含在三重反引号中来编写和执行Python代码，例如 ```代码在此```。使用这种方式来进行计算。
USER
找出以下多项式的所有实数根：3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10。

代码执行的另一个好用途是调用外部API。如果模型在API的正确使用上得到了指导，它就可以编写使用这个API的代码。可以通过向模型提供文档和/或代码示例来指导模型如何使用API。

SYSTEM
你可以通过将代码包含在三重反引号中来编写和执行Python代码。另外注意，你可以使用以下模块帮助用户向朋友发送消息：

```python
import

 message
message.write(to="John", message="Hey, want to meetup after work?")

警告：执行由模型产生的代码本质上并不安全，任何希望执行此操作的应用都应该采取预防措施。特别地，需要一个沙箱化的代码执行环境来限制不受信任的代码可能导致的危害。

策略：系统地测试改变

有时候，很难确定一个改变——例如，新的指令或新的设计——是否使你的系统更好或更差。观察几个例子可能会暗示哪个更好，但是在小样本的情况下，很难区分真正的改进和随机运气。可能这种变化在某些输入上提高了性能，但在其他输入上却降低了性能。

评估程序（或“评估”）对优化系统设计很有用。良好的评估具有以下特性：

代表现实世界的使用情况（或至少多样化）
包含许多测试用例，以获得更大的统计能力（见下表作为指南）
易于自动化或重复

检测到的差异	95%置信度所需的样本大小
30%	~10
10%	~100
3%	~1,000
1%	~10,000

输出的评估可以由计算机、人或两者混合完成。计算机可以使用目标标准（例如，具有单一正确答案的问题）以及某些主观或模糊的标准自动化评估，其中模型输出由其他模型查询进行评估。OpenAI Evals 是一个开源软件框架，提供用于创建自动化评估的工具。

当存在一系列被认为是同等高质量的可能输出（例如，对于具有长答案的问题）时，基于模型的评估可能有用。哪些可以用基于模型的评估真实地进行评估，哪些需要人来评估的边界是模糊的，随着模型变得越来越有能力，这个边界正在不断地移动。我们鼓励进行实验，以确定基于模型的评估对你的用例有多大的效果。

战术：参照标准答案评估模型输出

假设已知一个问题的正确答案应该参考一组特定的已知事实。然后，我们可以使用模型查询来计算答案中包含了多少必需的事实。

例如，使用以下的系统消息：

SYSTEM
您将获得由三个引号界定的文本，这应该是问题的答案。检查以下的信息是否直接包含在答案中：

- 尼尔·阿姆斯特朗是第一个登上月球的人。
- 尼尔·阿姆斯特朗第一次走上月球的日期是1969年7月21日。

对于这些点，请执行以下步

骤：

1 - 重述这一点。
2 - 提供一个来自答案的引用，这个引用最接近这一点。
3 - 考虑一个不了解这个主题的人读了引用是否可以直接推断出这一点。在做决定之前，解释为什么或为什么不。
4 - 如果3的答案是肯定的，写“是”，否则写“否”。

最后，提供一个“是”的答案的数量。将这个数量作为{"count": <insert count here>}提供。

下面是一个例子，其中两个要点都得到了满足：

SYSTEM
<insert system message above>
USER
"""尼尔·阿姆斯特朗因为是第一个踏上月球的人而闻名。这个历史性的事件发生在1969年7月21日，是阿波罗11号任务的一部分。"""

这是一个只满足一个要点的输入示例：

SYSTEM
<insert system message above>
USER
"""尼尔·阿姆斯特朗在他从月球模块走下来时创造了历史，成为第一个在月球上行走的人。"""

这是一个没有满足任何要点的输入示例：

SYSTEM
<insert system message above>
USER
"""在'69年的夏天，一个宏大的旅程，
阿波罗11号，像传说的手一样大胆。
阿姆斯特朗迈出了一步，历史展开，
他说的'一个小步'，是为了一个新的世界。"""

这种类型的基于模型的评估有许多可能的变体。考虑下面这个跟踪候选答案和金标准答案之间的重叠种类，以及候选答案是否与金标准答案的任何部分矛盾的变体。

SYSTEM
按照以下步骤进行。

步骤1：逐步推理提交的答案与专家答案比较，是：不相交、子集、超集，还是有相等的信息集。

步骤2：逐步推理提交的答案是否与专家答案的任何部分矛盾。

步骤3：输出一个JSON对象，结构如下：{"containment": "disjoint" or "subset" or "superset" or "equal", "contradiction": True or False}

这是一个输入例子，其中的答案质量较差：

SYSTEM
<insert system message above>
USER
Question: """尼尔·阿姆斯特朗最著名的事件是什么，它发生在什么时候？假设UTC时间。"""

Submitted Answer: """他在月球上

走了一圈吗？"""

Expert Answer: """尼尔·阿姆斯特朗最为人所知的是他是第一个踏上月球的人。这一历史性的事件发生在1969年7月21日，是NASA的阿波罗11号任务的一部分。阿姆斯特朗踏上月球表面时说的名言："那是人类的一小步，却是人类的一大步"，至今仍被广泛引用。
"""

这是一个有好答案的输入示例：

SYSTEM
<insert system message above>
USER
Question: """尼尔·阿姆斯特朗最著名的事件是什么，它发生在什么时候？假设UTC时间。"""

Submitted Answer: """在1969年7月21日的大约02:56 UTC时间，尼尔·阿姆斯特朗成为第一个踏上月球表面的人，标志着人类历史上的一项伟大成就。大约20分钟后，奥尔德林也加入到他的行列。"""

Expert Answer: """尼尔·阿姆斯特朗最为人所知的是他是第一个踏上月球的人。这一历史性的事件发生在1969年7月21日，是阿波罗11号任务的一部分。"""

build and run

for x86

Below is an example of how you can modify your code to perform an inline hook for the my_function. This is a simplistic approach and works specifically for this case. This is just an illustrative example. For real-world scenarios, a more complex method would need to be employed, considering thread-safety, re-entrant code, and more.

void inline_hook(void *orig_func, void *hook_func) {
    // Store the original bytes of the function.
    unsigned char orig_bytes[5];
    memcpy(orig_bytes, orig_func, 5);

    // Make the memory page writable.
    mprotect(get_page_addr(orig_func), getpagesize(), PROT_READ | PROT_WRITE | PROT_EXEC);

    // Write a jump instruction at the start of the original function.
    *((unsigned char *)orig_func + 0) = 0xE9; // JMP instruction
    *((void **)((unsigned char *)orig_func + 1)) = (unsigned char *)hook_func - (unsigned char *)orig_func - 5;

    // Make the memory page executable only.
    mprotect(get_page_addr(orig_func), getpagesize(), PROT_READ | PROT_EXEC);
}

In this example, my_function is the original function that is hooked. my_hook_function is the function that gets called instead of my_function. The inline_hook function performs the actual hook by overwriting the start of my_function with a jump (JMP) instruction to my_hook_function.

When you now call my_function in your main, my_hook_function is called instead.

Please note that this code is simplified and makes a few assumptions:

The functions my_function and my_hook_function are in the same memory page. If they aren’t, the jump offset from my_function to my_hook_function might not fit in the 4 bytes available in the jump instruction.
The first 5 bytes of my_function can be safely overwritten. If there’s a multi-byte instruction that starts within the first 5 bytes but doesn’t end before the 6th byte, this will crash.
The functions my_function and my_hook_function don’t move in memory. If they do (for example, if they’re in a shared library that gets unloaded and reloaded at a different address), the jump instruction will jump to the wrong place and likely crash.

$ make
$ ./maps
Hello, world!
Hello from hook!
Hello, world!

for arm32

Note that in ARM32, the Program Counter (PC) is usually 2 instructions ahead, which is why we subtract 8 (2 instructions * 4 bytes/instruction) when calculating the offset. This might differ between different ARM versions or modes (Thumb vs ARM, etc.) so please adjust accordingly to your target’s specifics.

Also, you need to increase the SIZE_ORIG_BYTES from 16 to 20 because the minimal branch instruction in ARM is 4 bytes and you’re going to replace 5 instructions. This is needed because the branch instruction uses a relative offset and you cannot be sure how far your hook function will be. If your function and hook are within 32MB of each other, you could only replace the first 4 bytes with a branch and wouldn’t need to touch the rest.

Remember that manipulating code at runtime can be error-prone and architecture-specific. The code can behave differently based on where it’s loaded in memory, how the compiler has optimized it, whether it’s running in Thumb or ARM mode, and so on. Always thoroughly test the code in the exact conditions where it will be used.

$ make arm
$ ./maps-arm32
Hello, world!
Hello from hook!
Hello, world!

for arm64

Similar to ARM32, ARM64 uses the ARM instruction set. However, there are differences and specifics to consider for ARM64. For example, the encoding of the branch instruction is different and because of the larger address space, you have to create a trampoline for larger offsets that can’t be reached by a single branch instruction. The trampoline should be close to the original function so it can be reached by a branch instruction and from there, it will load the full 64 bit address of the hook function.

$ make arm64
$ ./maps-arm64
Hello, world!
Hello from hook!
Hello, world!

Posted 2023-06-14Updated 2023-11-2724 minutes read (About 3534 words)

eBPF Practical Tutorial: Capturing SSL/TLS Plain Text Data Using uprobe

With the widespread use of TLS in modern network environments, tracing microservices RPC messages has become increasingly challenging. Traditional traffic sniffing techniques often face limitations in accessing only encrypted data, preventing a genuine observation of the original communication content. This restriction poses significant obstacles to system debugging and analysis.

However, a new solution is now available. Through the use of eBPF technology and its capability to perform probing in user space, a method has emerged to regain plain text data, allowing us to intuitively view the pre-encrypted communication content. Nevertheless, each application might utilize different libraries, and each library comes in multiple versions, introducing complexity to the tracking process.

In this tutorial, we will guide you through an eBPF tracing technique that spans across various user-space SSL/TLS libraries. This technique not only allows simultaneous tracing of user-space libraries like GnuTLS and OpenSSL but also significantly reduces maintenance efforts for new library versions compared to previous methods.

Background Knowledge

Before delving into the main topic of this tutorial, we need to grasp some core concepts that will serve as the foundation for our subsequent discussions.

SSL and TLS

SSL (Secure Sockets Layer): Developed by Netscape in the early 1990s, SSL provides data encryption for communication between two machines on a network. However, due to known security vulnerabilities, SSL has been succeeded by its successor, TLS.

TLS (Transport Layer Security): TLS is the successor to SSL, aiming to provide stronger and more secure data encryption methods. TLS operates through a handshake process during which a client and a server select an encryption algorithm and corresponding keys. Once the handshake is complete, data transmission begins, with all data being encrypted using the chosen algorithm and keys.

Operation Principles of TLS

Transport Layer Security (TLS) is a cryptographic protocol designed to provide security for communication over computer networks. Its primary goal is to provide security, including privacy (confidentiality), integrity, and authenticity, for two or more communicating computer applications over a network using cryptography, such as certificates. TLS consists of two sub-layers: the TLS Record Protocol and the TLS Handshake Protocol.

Handshake Process

When a client connects to a TLS-enabled server and requests a secure connection, the handshake process begins. The handshake allows the client and server to establish security parameters for the connection using asymmetric cryptography. The complete process is as follows:

Initial Handshake: The client connects to the TLS-enabled server, requests a secure connection, and provides a list of supported cipher suites (encryption algorithms and hash functions).
Selecting Cipher Suite: From the provided list, the server chooses a cipher suite and hash function it also supports and notifies the client of the decision.
Providing Digital Certificate: Usually, the server then provides identity authentication in the form of a digital certificate. This certificate includes the server’s name, trusted certificate authorities (guaranteeing the certificate’s authenticity), and the server’s public encryption key.
Certificate Verification: The client verifies the certificate’s validity before proceeding.
Generating Session Key: To create a session key for a secure connection, the client has two methods:
- Encrypt a random number (PreMasterSecret) with the server’s public key and send the result to the server (only the server can decrypt it with its private key); both parties then use this random number to generate a unique session key for encrypting and decrypting data during the session.
- Use Diffie-Hellman key exchange (or its variant, Elliptic Curve DH) to securely generate a random and unique session key for encryption and decryption. This key has the additional property of forward secrecy: even if the server’s private key is exposed in the future, it can’t be used to decrypt the current session, even if a third party intercepts and records the session.

Once these steps are successfully completed, the handshake process concludes, and the encrypted connection begins. This connection uses the session key for encryption and decryption until the connection is closed. If any of the above steps fail, the TLS handshake fails, and the connection won’t be established.

TLS in the OSI Model

TLS and SSL don’t perfectly align with any single layer of the OSI model or the TCP/IP model. TLS “runs over some reliable transport protocol (such as TCP),” which means it sits above the transport layer. It provides encryption for higher layers, typically the presentation layer. However, applications using TLS often consider it the transport layer, even though applications using TLS must actively control the initiation of TLS handshakes and the handling of exchanged authentication certificates.

eBPF and uprobes

eBPF (Extended Berkeley Packet Filter): It’s a kernel technology that allows users to run predefined programs in the kernel space without modifying kernel source code or reloading modules. It creates a bridge that enables interaction between user space and kernel space, providing unprecedented capabilities for tasks like system monitoring, performance analysis, and network traffic analysis.

uprobes are a significant feature of eBPF, allowing dynamic insertion of probe points in user space applications, particularly useful for tracking function calls in SSL/TLS libraries.

User-Space Libraries

The implementation of the SSL/TLS protocol heavily relies on user-space libraries. Here are some common ones:

OpenSSL: An open-source, feature-rich cryptographic library widely used in many open-source and commercial projects.
BoringSSL: A fork of OpenSSL maintained by Google, focusing on simplification and optimization for Google’s needs.
GnuTLS: Part of the GNU project, offering an implementation of SSL, TLS, and DTLS protocols. GnuTLS differs from OpenSSL and BoringSSL in API design, module structure, and licensing.

OpenSSL API Analysis

OpenSSL is a widely used open-source library providing a complete implementation of the SSL and TLS protocols, ensuring data transmission security in various applications. Among its functions, SSL_read() and SSL_write() are two core API functions for reading from and writing to TLS/SSL connections. In this section, we’ll delve into these functions to help you understand their mechanisms.

1. SSL_read Function

When we want to read data from an established SSL connection, we can use the SSL_read or SSL_read_ex function. The function prototype is as follows:

1 2	int SSL_read_ex(SSL ssl, void buf, size_t num, size_t readbytes); int SSL_read(SSL ssl, void *buf, int num);

SSL_read and SSL_read_ex attempt to read up to num bytes of data from the specified ssl into the buffer buf. Upon success, SSL_read_ex stores the actual number of read bytes in *readbytes.

2. Function SSL_write

When we want to write data into an established SSL connection, we can use the SSL_write or SSL_write_ex functions.

Function prototype:

1 2	int SSL_write_ex(SSL s, const void buf, size_t num, size_t written); int SSL_write(SSL ssl, const void *buf, int num);

SSL_write and SSL_write_ex will write up to num bytes of data from the buffer buf into the specified ssl connection. Upon success, SSL_write_ex will store the actual number of written bytes in *written.

Writing eBPF Kernel Code

In our example, we use eBPF to hook the ssl_read and ssl_write functions to perform custom actions when data is read from or written to an SSL connection.

Data Structures

Firstly, we define a data structure probe_SSL_data_t to transfer data between kernel and user space:

#define MAX_BUF_SIZE 8192
#define TASK_COMM_LEN 16

struct probe_SSL_data_t {
    __u64 timestamp_ns;  // Timestamp (nanoseconds)
    __u64 delta_ns;      // Function execution time
    __u32 pid;           // Process ID
    __u32 tid;           // Thread ID
    __u32 uid;           // User ID
    __u32 len;           // Length of read/write data
    int buf_filled;      // Whether buffer is filled completely
    int rw;              // Read or Write (0 for read, 1 for write)
    char comm[TASK_COMM_LEN]; // Process name
    __u8 buf[MAX_BUF_SIZE];  // Data buffer
    int is_handshake;    // Whether it's handshake data
};

Hook Functions

Our goal is to hook into the SSL_read and SSL_write functions. We define a function SSL_exit to handle the return values of these two functions. This function determines whether to trace and collect data based on the current process and thread IDs.

static int SSL_exit(struct pt_regs *ctx, int rw) {
    int ret = 0;
    u32 zero = 0;
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;
    u32 tid = (u32)pid_tgid;
    u32 uid = bpf_get_current_uid_gid();
    u64 ts = bpf_ktime_get_ns();

    if (!trace_allowed(uid, pid)) {
        return 0;
    }

    /* store arg info for later lookup */
    u64 *bufp = bpf_map_lookup_elem(&bufs, &tid);
    if (bufp == 0)
        return 0;

    u64 *tsp = bpf_map_lookup_elem(&start_ns, &tid);
    if (!tsp)
        return 0;
    u64 delta_ns = ts - *tsp;

    int len = PT_REGS_RC(ctx);
    if (len <= 0)  // no data
        return 0;

    struct probe_SSL_data_t *data = bpf_map_lookup_elem(&ssl_data, &zero);
    if (!data)
        return 0;

    data->timestamp_ns = ts;
    data->delta_ns = delta_ns;
    data->pid = pid;
    data->tid = tid;
    data->uid = uid;
    data->len = (u32)len;
    data->buf_filled = 0;
    data->rw = rw;
    data->is_handshake = false;
    u32 buf_copy_size = min((size_t)MAX_BUF_SIZE, (size_t)len);

    bpf_get_current_comm(&data->comm, sizeof(data->comm));

    if (bufp != 0)
        ret = bpf_probe_read_user(&data->buf, buf_copy_size, (char *)*bufp);

    bpf_map_delete_elem(&bufs, &tid);
    bpf_map_delete_elem(&start_ns, &tid);

    if (!ret)
        data->buf_filled = 1;
    else
        buf_copy_size = 0;

    bpf_perf_event_output(ctx, &perf_SSL_events, BPF_F_CURRENT_CPU, data,
                            EVENT_SIZE(buf_copy_size));
    return 0;
}

The rw parameter here indicates whether it’s a read or write operation. 0 represents read, and 1 represents write.

Data Collection Process

Obtain the ID of the current process and thread, along with the ID of the current user.
Use trace_allowed to determine if tracing is allowed for this process.
Get the start time to calculate the execution time of the function.
Attempt to retrieve relevant data from the bufs and start_ns maps.
If data retrieval is successful, create or locate a probe_SSL_data_t structure to populate the data.
Copy the data from user space to the buffer, ensuring it doesn’t exceed the designated size.
Finally, send the data to user space.

Note: We use two user-level return probes uretprobe to respectively hook the returns of SSL_read and SSL_write:

SEC("uretprobe/SSL_read")
int BPF_URETPROBE(probe_SSL_read_exit) {
    return (SSL_exit(ctx, 0));  // 0 indicates read operation
}

SEC("uretprobe/SSL_write")
int BPF_URETPROBE(probe_SSL_write_exit) {
    return (SSL_exit(ctx, 1));  // 1 indicates write operation
}

Hooking into the Handshake Process

In SSL/TLS, the handshake is a special process used to establish a secure connection between a client and a server. To analyze this process, we hook into the do_handshake function to track the start and end of the handshake.

Entering the Handshake

We use a uprobe to set a probe for the do_handshake function:


SEC("uprobe/do_handshake")
int BPF_UPROBE(probe_SSL_do_handshake_enter, void *ssl) {
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;
    u32 tid = (u32)pid_tgid;
    u64 ts = bpf_ktime_get_ns();
    u32 uid = bpf_get_current_uid_gid();

    if (!trace_allowed(uid, pid)) {
        return 0;
    }

    /* store arg info for later lookup */
    bpf_map_update_elem(&start_ns, &tid, &ts, BPF_ANY);
    return 0;
}

The main functionality of this code is as follows:

Obtain the current pid, tid, ts, and uid.
Use trace_allowed to verify if the process is allowed to be traced.
Store the current timestamp in the start_ns map, which will be used to calculate the duration of the handshake process later.

Exiting the Handshake

Similarly, we’ve set a uretprobe for the return of do_handshake:

SEC("uretprobe/do_handshake")
int BPF_URETPROBE(handle_do_handshake_exit) {
    // Code to execute upon exiting the do_handshake function.
    return 0;
}

In this context, the uretprobe will execute the provided code when the do_handshake function exits.


SEC("uretprobe/do_handshake")
int BPF_URETPROBE(probe_SSL_do_handshake_exit) {
    u32 zero = 0;
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;
    u32 tid = (u32)pid_tgid;
    u32 uid = bpf_get_current_uid_gid();
    u64 ts = bpf_ktime_get_ns();
    int ret = 0;

    /* use kernel terminology here for tgid/pid: */
    u32 tgid = pid_tgid >> 32;

    /* store arg info for later lookup */
    if (!trace_allowed(tgid, pid)) {
        return 0;
    }

    u64 *tsp = bpf_map_lookup_elem(&start_ns, &tid);
    if (tsp == 0)
        return 0;

    ret = PT_REGS_RC(ctx);
    if (ret <= 0)  // handshake failed
        return 0;

    struct probe_SSL_data_t *data = bpf_map_lookup_elem(&ssl_data, &zero);
    if (!data)
        return 0;

    data->timestamp_ns = ts;
    data->delta_ns = ts - *tsp;
    data->pid = pid;
    data->tid = tid;
    data->uid = uid;
    data->len = ret;
    data->buf_filled = 0;
    data->rw = 2;
    data->is_handshake = true;
    bpf_get_current_comm(&data->comm, sizeof(data->comm));
    bpf_map_delete_elem(&start_ns, &tid);

    bpf_perf_event_output(ctx, &perf_SSL_events, BPF_F_CURRENT_CPU, data,
                            EVENT_SIZE(0));
    return 0;
}

Logic of this Function:

Obtain the current pid, tid, ts, and uid.
Use trace_allowed to recheck if tracing is allowed.
Look up the timestamp in the start_ns map for calculating handshake duration.
Use PT_REGS_RC(ctx) to get the return value of do_handshake and determine if the handshake was successful.
Find or initialize the probe_SSL_data_t data structure associated with the current thread.
Update the data structure’s fields, including timestamp, duration, process information, etc.
Use bpf_perf_event_output to send the data to user space.

Our eBPF code not only tracks data transmission for ssl_read and ssl_write but also focuses on the SSL/TLS handshake process. This information is crucial for a deeper understanding and optimization of the performance of secure connections.

Through these hook functions, we can obtain data regarding the success of the handshake, the time taken for the handshake, and related process information. This provides us with insights into the behavior of the system’s SSL/TLS, enabling us to perform more in-depth analysis and optimization when necessary.

User-Space Assisted Code Analysis and Interpretation

In the eBPF ecosystem, user-space and kernel-space code often work in collaboration. Kernel-space code is responsible for data collection, while user-space code manages, processes, and handles this data. In this section, we will explain how the above user-space code collaborates with eBPF to trace SSL/TLS interactions.

1. Supported Library Attachment

In the provided code snippet, based on the setting of the env environment variable, the program can choose to attach to three common encryption libraries (OpenSSL, GnuTLS, and NSS). This means that we can trace calls to multiple libraries within the same tool.

To achieve this functionality, the find_library_path function is first used to determine the library’s path. Then, depending on the library type, the corresponding attach_ function is called to attach the eBPF program to the library function.

if (env.openssl) {
    char *openssl_path = find_library_path("libssl.so");
    printf("OpenSSL path: %s\n", openssl_path);
    attach_openssl(obj, "/lib/x86_64-linux-gnu/libssl.so.3");
}
if (env.gnutls) {
    char *gnutls_path = find_library_path("libgnutls.so");
    printf("GnuTLS path: %s\n", gnutls_path);
    attach_gnutls(obj, gnutls_path);
}
if (env.nss) {
    char *nss_path = find_library_path("libnspr4.so");
    printf("NSS path: %s\n", nss_path);
    attach_nss(obj, nss_path);
}

This section primarily covers the attachment logic for the OpenSSL, GnuTLS, and NSS libraries. NSS is a set of security libraries designed for organizations, supporting the creation of secure client and server applications. Originally developed by Netscape, they are now maintained by Mozilla. The other two libraries have been introduced earlier and are not reiterated here.

2. Detailed Attachment Logic

The specific attach functions are as follows:

#define __ATTACH_UPROBE(skel, binary_path, sym_name, prog_name, is_retprobe)   \
    do {                                                                       \
      LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts, .func_name = #sym_name,        \
                  .retprobe = is_retprobe);                                    \
      skel->links.prog_name = bpf_program__attach_uprobe_opts(                 \
          skel->progs.prog_name, env.pid, binary_path, 0, &uprobe_opts);       \
    } while (false)
    
int attach_openssl(struct sslsniff_bpf *skel, const char *lib) {
    ATTACH_UPROBE_CHECKED(skel, lib, SSL_write, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, SSL_write, probe_SSL_write_exit);
    ATTACH_UPROBE_CHECKED(skel, lib, SSL_read, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, SSL_read, probe_SSL_read_exit);

    if (env.latency && env.handshake) {
        ATTACH_UPROBE_CHECKED(skel, lib, SSL_do_handshake,
                            probe_SSL_do_handshake_enter);
        ATTACH_URETPROBE_CHECKED(skel, lib, SSL_do_handshake,
                                probe_SSL_do_handshake_exit);
    }

    return 0;
}

int attach_gnutls(struct sslsniff_bpf *skel, const char *lib) {
    ATTACH_UPROBE_CHECKED(skel, lib, gnutls_record_send, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, gnutls_record_send, probe_SSL_write_exit);
    ATTACH_UPROBE_CHECKED(skel, lib, gnutls_record_recv, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, gnutls_record_recv, probe_SSL_read_exit);

    return 0;
}

int attach_nss(struct sslsniff_bpf *skel, const char *lib) {
    ATTACH_UPROBE_CHECKED(skel, lib, PR_Write, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, PR_Write, probe_SSL_write_exit);
    ATTACH_UPROBE_CHECKED(skel, lib, PR_Send, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, PR_Send, probe_SSL_write_exit);
    ATTACH_UPROBE_CHECKED(skel, lib, PR_Read, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, PR_Read, probe_SSL_read_exit);
    ATTACH_UPROBE_CHECKED(skel, lib, PR_Recv, probe_SSL_rw_enter);
    ATTACH_URETPROBE_CHECKED(skel, lib, PR_Recv, probe_SSL_read_exit);

    return 0;
}

We further examine the attach_ function and can see that they both use the ATTACH_UPROBE_CHECKED and ATTACH_URETPROBE_CHECKED macros to implement specific mounting logic. These two macros are used respectively for setting uprobe (function entry) and uretprobe (function return).

Considering that different libraries have different API function names (for example, OpenSSL uses SSL_write, while GnuTLS uses gnutls_record_send), we need to write a separate attach_ function for each library.

For instance, in the attach_openssl function, we set up probes for both SSL_write and SSL_read. If users also want to track handshake latency (env.latency) and the handshake process (env.handshake), we set up a probe for SSL_do_handshake.

In the eBPF ecosystem, perf_buffer is an efficient mechanism used to transfer data from kernel space to user space. This is particularly useful for kernel-space eBPF programs as they can’t directly interact with user space. With perf_buffer, we can collect data in kernel-space eBPF programs and then asynchronously read this data in user space. We use the perf_buffer__poll function to read data reported in kernel space, as shown below:

while (!exiting) {
    err = perf_buffer__poll(pb, PERF_POLL_TIMEOUT_MS);
    if (err < 0 && err != -EINTR) {
        warn("error polling perf buffer: %s\n", strerror(-err));
        goto cleanup;
    }
    err = 0;
}

Finally, in the print_event function, we print the data to standard output:

// Function to print the event from the perf buffer
void print_event(struct probe_SSL_data_t *event, const char *evt) {
    ...
    if (buf_size != 0) {
        if (env.hexdump) {
            // 2 characters for each byte + null terminator
            char hex_data[MAX_BUF_SIZE * 2 + 1] = {0};  
            buf_to_hex((uint8_t *)buf, buf_size, hex_data);
            
            printf("\n%s\n", s_mark);
            for (size_t i = 0; i < strlen(hex_data); i += 32) {
                printf("%.32s\n", hex_data + i);
            }
            printf("%s\n\n", e_mark);
        } else {
            printf("\n%s\n%s\n%s\n\n", s_mark, buf, e_mark);
        }
    }
}

You can find the complete source code here: https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/30-sslsniff

Compilation and Execution

To start using sslsniff, you need to first compile it:

make

Once done, follow these steps:

Start sslsniff

In a terminal, execute the following command to start sslsniff:

1	sudo ./sslsniff

Execute CURL command

In another terminal, execute:

1	curl https://example.com

Under normal circumstances, you will see output similar to the following:

<!doctype html>
<html>
<head>
    <title>Example Domain</title>
    ...
<body>
<div>
    ...
</div>
</body>
</html>

sslsniff Output

After executing the curl command, sslsniff will display the following content:

READ/RECV    0.132786160        curl             47458   1256  
----- DATA -----
<!doctype html>
...
<div>
    <h1>Example Domain</h1>
    ...
</div>
</body>
</html>

----- END DATA -----

Note: The displayed HTML content may vary depending on the specific content of the example.com page.

Displaying Latency and Handshake Process

To view latency and handshake process, execute the following command:

$ sudo ./sslsniff -l --handshake
OpenSSL path: /lib/x86_64-linux-gnu/libssl.so.3
GnuTLS path: /lib/x86_64-linux-gnu/libgnutls.so.30
NSS path: /lib/x86_64-linux-gnu/libnspr4.so
FUNC         TIME(s)            COMM             PID     LEN     LAT(ms)
HANDSHAKE    0.000000000        curl             6460    1      1.384  WRITE/SEND   0.000115400        curl             6460    24     0.014

Hexadecimal Output

To display data in hexadecimal format, execute the following command:

$ sudo ./sslsniff --hexdump
WRITE/SEND   0.000000000        curl             16104   24    
----- DATA -----
505249202a20485454502f322e300d0a
0d0a534d0d0a0d0a
----- END DATA -----

...

Summary

eBPF is a very powerful technology that can help us gain deeper insights into how a system works. This tutorial is a simple example demonstrating how to use eBPF to monitor SSL/TLS communication. If you’re interested in eBPF technology and want to learn more and practice further, you can visit our tutorial code repository at https://github.com/eunomia-bpf/bpf-developer-tutorial and tutorial website at https://eunomia.dev/zh/tutorials/.

References:

Posted 2023-06-14Updated 2023-11-27an hour read (About 7343 words)

自然语言编程: 从 AutoGPT 往前迈的一小步

这段时间看到了许多使用 AI 编写代码的故事或是示例，但也许自然语言编程并不需要是像之前想象那样，使用 AI 生成代码，并且自动执行它；全自动生成软件成品是一个科幻元素，但使用 AI 去生成代码有时可能也是个伪需求。也许我们要重新审视在 AI 时代下的编程、程序，乃至于软件工程意味着什么。

另一方面，如果我们能给 AI 明确的、复杂的多步任务和指导，让它遵照执行，也可能能极大地提升 AI 的逻辑分析能力与规划能力 —- 像 AutoGPT 那样全自动分解、规划和执行任务目标，可能并不是最好的方案，我们可以有更好的方式让 AI 完成我们的目标。

引子，与历史

自然语言编程（Natural Language Programming）是一种研究领域，它的目标是让计算机能够理解和执行人类的自然语言指令。这个领域的起源可以追溯到计算机科学的早期阶段。在 20 世纪 50 年代和 60 年代，人们开始尝试使用自然语言来与计算机进行交互。这个时期的研究主要集中在自然语言理解（Natural Language Understanding）上，目标是让计算机能够理解人类的语言。这包括了早期的聊天机器人，如ELIZA 和 SHRDLU。

然而，真正的自然语言编程需要计算机不仅能理解人类的语言，还能根据这些语言来执行任务。只有到最近，使用 LLM 调用 API、AutoGPT 的 Agent 等的出现，才使这一切成为可能。

AutoGPT 的限制

AutoGPT是一个实验性的开源应用，它使用 GPT-4 语言模型来自动执行用户设定的目标。AutoGPT 的核心是自动创建任务并执行任务，它具有接收模糊问题、任务拆解分析和自我成长能力。例如，AutoGPT 的 prompt 之一如下：

Your task is to devise up to 5 highly effective goals and an appropriate role-based name (_GPT) for an autonomous agent, ensuring that the goals are optimally aligned with the successful completion of its assigned task.

The user will provide the task, you will provide only the output in the exact format specified below with no explanation or conversation.

Example input:
Help me with marketing my business

Example output:
Name: CMOGPT
Description: a professional digital marketer AI that assists Solopreneurs in growing their businesses by providing world-class expertise in solving marketing problems for SaaS, content products, agencies, and more.
Goals:
- Engage in effective problem-solving, prioritization, planning, and supporting execution to address your marketing needs as your virtual Chief Marketing Officer.

- Provide specific, actionable, and concise advice to help you make informed decisions without the use of platitudes or overly wordy explanations.

- Identify and prioritize quick wins and cost-effective campaigns that maximize results with minimal time and budget investment.

- Proactively take the lead in guiding you and offering suggestions when faced with unclear information or uncertainty to ensure your marketing strategy remains on track.

这个prompt的目标是为一个自主代理设定高效的目标，并为其赋予一个基于角色的名称（_GPT）。这个代理的任务是帮助用户完成特定的任务，例如在这个例子中，帮助用户进行商业营销。输入是用户提供的任务，输出是代理的名称、描述和目标。输出的格式非常明确，不包含任何解释或对话；代理的名称是CMOGPT，这是基于其角色——虚拟首席营销官（Chief Marketing Officer）的缩写。描述部分详细阐述了这个代理的专业领域和服务对象，即为独立创业者提供营销问题解决方案。

目标部分列出了四个具体的目标，这些目标都与成功完成其分配任务——帮助用户进行商业营销——紧密相关。这些目标包括有效的问题解决、提供具体和可行的建议、识别和优先处理快速获胜和成本效益高的活动，以及在面临不清晰的信息或不确定性时主动引导用户和提供建议。

这只是 AutoGPT 的 prompt 的一部分，接下来，就可以根据划分的任务目标和具体的任务，进行分析和执行，完成一个复杂的需求。

作为首个完全自主运行的GPT-4示例，AutoGPT 的任务规划、分解、执行的方案是基于 GPT-4 的自然语言生成和理解能力，以及多种数据源和工具的调用。babyAGI 也是类似的思路，它是一个自然语言编程系统，通过创建三个代理（任务执行代理、任务创建代理和任务优先级代理）来执行目标。每个代理都有自己的提示和约束，包括来自每个相关任务执行的上下文。这个过程将循环执行，直到没有剩余的任务并且目标完成。

例如，我们可以尝试给 AutoGPT 两个问题：

测试问题1：ETH和特斯拉，哪一个值得在未来5年内投资？它一开始就会把问题拆成 3 个子任务：收集历史数据表现、分析数据趋势和模式、提供最终建议。对于比较的维度，一开始只有历史表现，然后逐步增加了更多维度，包括风险分析（市场条件、监管变化和其他）、新兴技术、公司财务、盈利能力、未来增长能力等，也就是说，AgentGPT 的思考变的越来越全面.在增加比较维度后，AgentGPT 会主动迭代其答案，给出最新建议。测试问题2：如何在加密货币世界中进行聪明的投资.这个问题比上一个更模糊。没有明确的比较标的，需要 AgentGPT 去寻找和定义。没有明确的投资期限，看 AgentGPT 如何处理。可以同样把这个问题抛给 ChatGPT 和 AgentGPT, 这次 AgentGPT 疯狂拆解和新增子任务，在运行的 20 分钟内生成了 53 个子任务，输出超过了 1.5w 字，最终被手动停止。一开始 AgentGPT 还比较正常，仍然把问题拆成 3 个子任务：收集历史数据表现、分析当前市场趋势和潜在发展、在风险管理策略下提供最终建议。但可能因为没有具体的标的，维度发散起来就收不住：历史数据、市场趋势、未来潜力、风险管理、监管影响、多元投资、市场波动、止损、基本面、流动性、经济事件、社区情绪、KOL 情绪…

很明显，AutoGPT 本身的架构并不是为了执行特定的任务或解决特定的问题而设计的，它会存在很多类似的问题：

由于GPT-4语言模型的 token 很贵，其使用费用可能很高，但实际上大多数费用可能被浪费在了无意义的探索上面。
作为一项自主实验，Auto-GPT可能会产生不符合现实世界商业惯例或法律要求的内容或采取的行动。您有责任确保基于本软件输出的任何行动或决定符合所有适用的法律、法规和道德标准。
AutoGPT 会重复类似的问题(没有足够好的记忆和推理、复用之前的结果的能力)、发散、不可预测。

可以发现，即使 AutoGPT 目前具有任务拆解和自我成长能力，但实际上，我们很难完全放手让 GPT 的 agent 在一个开放和模糊的空间中，自行探索和解决问题。这样大概率解决的方案和人类的意图是难以对齐的，最终不得不被停止。尽管还有缺陷，AutoGPT 显示出的能力，依然将人和 AI 的边界继续向前推进了一步。

下一步是什么？

有些对于 AutoGPT 和类似的 AI 系统的改进和解决方案，例如：

增加对应的任务记忆功能：通过增加记忆功能，AutoGPT可以避免重复执行相同的任务。也能大量节约 token。例如 GPTcache:

传统的缓存系统通常利用一个新的查询和一个缓存查询之间的精确匹配来确定所请求的内容在获取数据之前是否在缓存中可用。然而，由于LLM查询的复杂性和可变性，对LLM缓存使用精确匹配的方法不太有效，导致缓存命中率低。为了解决这个问题，GPTCache采用了语义缓存等替代策略。语义缓存可以识别并存储类似或相关的查询，从而提高缓存的命中率，增强整体缓存效率。

GPTCache采用嵌入算法将查询转换为嵌入，并使用向量存储对这些嵌入进行相似性搜索。这个过程允许GPTCache识别并从缓存存储中检索相似或相关的查询，如模块部分所说明的。

设定任务拆解的限制：通过在 AutoGPT 中设定任务拆解的限制，可以防止任务过于发散；同时，可以让 AI 自动规划不同的模型和通过token的使用，降低AutoGPT的运行成本。例如，复杂任务可以自动采用 GPT4 的模型，对于简单的翻译使用 GPT 3.5；或者通过 GPT4 进行任务规划和分析，而使用 3.5 执行具体的任务。

更常见的方案是引入人类监督和交互。可以通过人类每隔一段时间，或者在有需要的时候去监督一下 AI 的执行情况，并确保AutoGPT的行为符合现实世界的商业惯例和法律要求。如果不符合人类的意图的话，通过对话可以对 Agent 进行调整，要求它做更符合人类意图的事情（实际上这在多人合作完成任务的场景下，例如公司或组织中，也是非常常见的）。但相对来说，这种方式经常低效而且缓慢：如果我需要监督 AI 才能保证它不出错，为什么我不自己做呢？

有没有更好的方式？

但实际上，在现实世界中，也许我们有更好的方式让 AI agent 和人类的意图对齐。想象这样的场景：你希望一个对某件复杂任务的流程不了解的人，完成一项特定任务：例如上手一个代码项目的开发和环境配置，学习一门新的编程语言，或者编写一本长篇小说、分析某个商业投资的可行性等等。也许这种情况下，我们可以有一本手册或者教程，它并不需要是精确的、一步一步的指令，但是它会包含一个大致的流程和任务分解，让人类能够快速上手完成对应的任务。那么，我们为什么不能用非常轻松的方式，给 AI 一些大概的指示和任务描述，让它根据这些任务来完成对应的工作呢？

相比 AutoGPT，我们实际上需要的是：

更强的可控性，让它和人类意图进行充分的对齐；
比 CoT（思维链）走的更远，让 AI 能够完成更加复杂的任务，同时不仅限于一步步执行，还需要有递归、循环、条件判断等等特性；
可以由人类给予更加明确的指导，让 AI 完成复杂的任务，而不是让 AI 完全自己探索如何分解任务；
保留一定的灵活性和机动能力，同时让编写指令尽可能简单。实际上，我们可能并不需要用代码开发和精确的链条来介入这个流程，现实世界大多数充斥着模糊、不确定、近似和时刻需要动态调整的部分，设计一个合理的抽象反而是非常困难的。langchain 那种以代码开发为核心的形态，也许并不会是一个合理的答案，AutoGPT 的出现也证明了这一点。

根据 wiki 百科，计算机程序（Computer Program）可以定义为指一组指示电子计算机或其他具有消息处理能力的电子设备每一步动作的指令序列。也许，某种意义上它也是一种 “程序”，但并不是传统的编程语言：自然语言适合模糊化、灵活、可高效扩展的需求，而传统的程序设计语言实际上是一种精确的抽象和计算，二者缺一不可，它们可以相互转化，但是并不一定需要是从自然语言描述转化为精确的计算机指令。未来，每个人都可以是程序员，只要能够用自然语言描述出对应的需求和步骤，无论清晰或者模糊。

自然语言编程

自然语言编程不应该是：

+++ proc1
-- Return five random emojis
+++

+++ proc2
-- Modify proc1 to return random numbers instead
-- Let $n = [the number of countries in Latin America]
-- Instead of five, use $n
/execute proc1
+++

/proc2

用自然语言编程模拟计算机程序的编写没有意义，不需要有明确的定义、关键词、语法等等。

自然语言编程是：

1	把 asserts 目录下所有的以 txt 结尾的文档翻译成英文，然后喂给 AI 做训练

或者一个更复杂的例子：

3 由质谱推测分子结构的一般程序
1)综合评价质谱总图。通常样品的质谱分析，在总离子流色谱图(TIC)中会出现许多峰，由每个峰都可得到相应的质谱图，从这些质谱中选择出合理的质谱图：首先找出总离子流图中对应较强组分的质谱图；观察质谱图中是否出现分子离子峰；主要的碎片离子峰质量是否合理相关；排除不相关的杂质峰；如果谱图中全是奇质量，通常是分子离子峰未出现。
2)分子峰识别，辅助软电离技术，如CI-MS，FAB-MS，ESI等。
3)找出合理重排的奇电离子和结构特征离子峰。
4)重要的低质量区的特征碎片峰。
5)同位素的识别应用。
6)应用高分辨质谱技术可给出分子的元素组成和分子式。能获取高分辨质谱数据当然很理想，遗憾的是大部分双聚焦高分辨质谱仪因高分辨实验操作比较麻烦和仪器状态不佳，要获得1万的分辨率也很难实现；价格高昂的离子回旋共振FT-MS又不够普及。
7) MS-MS质谱-质谱联用。如果仪器具有质谱-质谱联用分析功能，充分利用这种技术，对有足够强度的峰，做二次和多次质谱分析，特别对样品中存在的杂质组分和混合物样品的质谱分析，是非常有效的质谱技术。
8)质谱谱库参考谱图检索。许多质谱仪配带十几万张标准化合物的谱库，可对得到的谱图进行方便的检索，给出相似度较大的化合物。但质谱谱库给出的只是已有化合物的EI-MS谱图，现有的谱库不能给出各种软电离谱图信息和完全未知化合物的结构信息。
9)综合研究能获取的所有光谱结构信息。质谱图要求对大部分较大的谱峰和某些结构特征的小峰作出合理解释；如有条件需要结合红外光谱图和核磁共振谱图进行综合分析，给出可信度最大的分子结构信息。
10)最终确认推测的分子结构，还要通过获取标准物质后做全部化学、光谱信息和应用理化性质作符合检查，最终确定推测化学结构的准确性。

这里是另外一个例子，我们希望使用自然语言，来指导 AI 完成一个长篇小说的创作。

有一个任务叫做“翻译和根据大纲创作小说章节”。
我会给你章节的大纲、背景知识，你需要：
1. 根据大纲和背景知识，写出章节的内容；
2. 把内容翻译成英文。要求翻译质量高，文笔出色。

我想要给你一个任务，叫做作品创作和翻译：
你是一个资深作家和翻译家，我希望你帮忙创建一个长篇小说，并且把它翻译成英文。
我会告诉你需要这个小说的题目是什么，以及大概需要多少个章节。你会按照我的指示一步步完成任务。
首先，这个小说的主题是什么？写出主题。
你要不要上网搜索一下这个小说的主题的一些相关背景？把搜索到的背景资料也整理一下。
之后，根据主题和背景资料，这篇小说的大纲是什么？把它的大纲列出来。
然后，对于每个章节，把章节的大纲和背景列出来，对每个章节继续“翻译和根据大纲创作小说章节”这个任务。
最后，把每个章节拼凑在一起，保存在文件里面。

现在，我想要完成以下的事情：
1. 作品创作和翻译一本关于“猫”的童话小说，一共有10个章节，每个章节大概 500 字。
2. 作品创作和翻译一本关于“猫娘”的科幻小说，一共有10个章节，每个章节大概 1000 字。

在这个情况下，我们可以将自然语言编写的一系列视为一种任务分解和执行的过程。我们可以通过定义一些特定的关键词和语法规则来实现这个目标。我们的目标是将自然语言脚本分解为一系列的指令和参数，然后执行这些指令。

自然语言编程不是，也不应该是通常意义上的编程语言编程。我们并不进行自然语言到代码的转换，没有确定的语法、语言和编程范式，大语言模型就是我们的解释器、CPU 和内存；自然语言适合处理需求模糊、信息密度高的应用，代码适合处理精确、可靠的部分。或者说，自然语言编程实际上是 prompt engineering 的高阶形态，自然语言的指令不再局限于单次和 AI 的交互上下文，并且希望能够借助它来增加和扩展 AI 完成复杂推理、复杂任务执行的能力。

以下是一些可能的步骤和考虑因素：

函数（任务）：在自然语言编程中，函数可以被视为一种任务或者目标。例如，在你的脚本中，“作品创作和翻译”就可以被视为一个函数，它需要接收一些参数（例如，主题、章节数量、每章字数等），然后执行一系列的步骤来完成这个任务。
变量：全局变量可以被视为在整个脚本中都可以访问的信息、概念、知识。在你的脚本中，例如“小说的主题”、“章节数量”、“每章字数”等都可以被视为全局变量。
指令：在自然语言编程中，指令可以被视为一种行为或者动作。例如，“写出主题”、“搜索背景资料”、“列出大纲”等都可以被视为指令。这些指令可以是顺序执行的，也可以是选择执行的，甚至可以是循环执行的。
执行流程：执行流程是指如何按照一定的顺序和逻辑来执行这些指令。在你的脚本中，执行流程可能是这样的：首先，执行“写出主题”这个指令，然后执行“搜索背景资料”这个指令，接着执行“列出大纲”这个指令，然后对于每个章节，执行“翻译和根据大纲创作小说章节”这个指令，最后执行“拼凑章节”和“保存文件”这两个指令。
分析自然语言脚本：分析自然语言脚本的目标是将脚本中的语句分解为一系列的指令和参数。这可能需要使用一些自然语言处理（NLP）的技术，例如语义分析、实体识别等。例如，从“作品创作和翻译一本关于“猫”的童话小说，一共有10个章节，每个章节大概 500 字。”这个语句中，我们可以识别出“作品创作和翻译”是一个函数，“猫”、“童话小说”、“10个章节”、“每章500字”是这个函数的参数。
执行指令：执行指令的目标是按照分析出的指令和参数来完成任务。这可能需要调用一些外部的API或者服务。例如，执行“翻译和根据大纲创作小说章节”这个指令可能需要调用一个文本生成的API，执行“翻译”这个指令可能需要调用一个翻译的API。

一个可能的思路

为了正确、可控地执行它，我们首先定义函数（任务）、变量和执行流程，然后通过一系列的提示来提取这些信息，并最后让AI处理对应的执行流程。以下是一种可能的方式，以及示例的提示信息（实际上的提示比这个复杂得多）：

定义和提取函数（任务）：
- 函数（任务）是一个具有明确输入和输出的可复用单元。在自然语言脚本中，函数（任务）通常是一些需要完成的目标或动作。
- 示例提示：请在你的脚本中找出所有的任务或目标。这些通常是一些动词短语，例如“写出主题”、“搜索背景资料”、“列出大纲”等。例如，”作品创作和翻译”可以被视为一个函数，它需要接收一些参数（例如，主题、章节数量、每章字数等），然后执行一系列的步骤来完成这个任务。
定义和提取变量：
- 变量是在整个脚本中都可以访问的信息，它可以是一些知识、概念、信息，一段文本等等。
- 示例提示：请在你的脚本中找出所有的变量。这些通常是一些名词或名词短语，例如“小说的主题”、“章节数量”、“每章字数”等。
定义和提取执行流程：
- 执行流程是指如何按照一定的顺序和逻辑来执行这些指令。在你的脚本中，执行流程可能是这样的：首先，执行“写出主题”这个指令，然后执行“搜索背景资料”这个指令，接着执行“列出大纲”这个指令，然后对于每个章节，执行“翻译和根据大纲创作小说章节”这个指令，最后执行“拼凑章节”和“保存文件”这两个指令。
- 示例提示：请在你的脚本中找出每个任务或目标的执行流程。这通常是一系列的步骤或指令，例如“首先，写出主题，然后，搜索背景资料，接着，列出大纲，然后对于每个章节，翻译和根据大纲创作小说章节，最后，拼凑章节，保存文件”。
处理执行流程：
- 根据提取出的执行流程，AI需要处理对应的执行流程。根据循环、分支、顺序执行做出不同的处理。
- 示例提示：现在，我将开始根据提取出的执行流程来处理每个任务。例如，对于“翻

译和根据大纲创作小说章节”这个任务，我将首先根据大纲创作章节的内容，然后将内容翻译成英文。

这样的流程可以帮助我们将自然语言脚本转化为可执行的任务。但是请注意，这需要一些自然语言处理和分析的技能，以及对脚本内容的深入理解。此外，AI的处理能力也会受到一些限制，例如，它可能无法直接处理一些过于复杂的任务（可以在脚本中进一步划分和明确任务），或者需要人类的帮助来完成一些步骤。

自然语言脚本编译器与运行时：langScript

我们做了一个简单的开源项目实验，来完成这个目标和执行自然语言编写的脚本 —- 它不需要有完全确定的语法和格式，也并不需要像学习一门编程语言一样学习它。本质上，它只是对于需求和任务稍微明确一些的指引和抽象，我们可以更方便地使用自然语言定义流程和指导 AI 完成复杂工作。

未来：AI 还是软件？

也许我们很快就要重新思考软件工程是什么这个问题：

AI 和更广义的 “软件” 之间有什么区别呢?
一个软件中的哪些部分可以被 AI 替代?
AI 可以怎样重塑软件的生命周期、开发方式？
自然语言的部分和代码的部分各有什么优势和劣势？哪些情况下我们可以直接使用自然语言，哪些情况下需要代码进行抽象？
在一个软件的组成部分中，用 AI 替换掉代码，能带来哪些变革? 会不会能更好的完成对应的需求?
更进一步，一个完整的信息系统（以处理信息流为目的的人机一体化系统，有软件、硬件和人类的部分）和 AI 的关系是什么? AI在信息中是什么样的地位? AI 会怎样从信息的角度重塑我们的社会结构和组织模式？

目前来看，我们还远远没有答案。

原 Netscape（网景公司）创始人 Marc Andreessen 说过一句经典的话：软件正在吞噬世界。人工智能领域知名科学家 Andrej Karpathy 在 2017 年为上面的话做了补充：软件（1.0）正在吞噬世界，现在人工智能（软件2.0）正在吞噬软件

Software (1.0) is eating the world, and now AI (Software 2.0) is eating software.

简介

CET 实现原理

示例代码

编译和运行

测试 CET

结论

Table of Contents

How eBPF Ensures Security with Verifier

What the eBPF Verifier Is and What It Does

How the eBPF Verifier Works

Challenges

Other works to improve verifier

Limitations in eBPF Access Control

CAP_BPF

bpf namespace

Unprivileged eBPF

Trusted Unprivileged BPF

Other possible solutions

MOAT: Towards Safe BPF Kernel Extension (Isolation)

Unleashing Unprivileged eBPF Potential with Dynamic Sandboxing

Kernel extension verification is untenable

Wasm-bpf: WebAssembly eBPF library, toolchain and runtime

bpftime: Userspace eBPF runtime for uprobe & syscall hook & plugin

Conclusion

XRP: In-Kernel Storage Functions with eBPF

Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel

λ-IO: A Unified IO Stack for Computational Storage

Extension Framework for File Systems in User space

Electrode: Accelerating Distributed Protocols with eBPF

BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing

hXDP: Efficient Software Packet Processing on FPGA NICs

Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code

Fast In-kernel Traffic Sketching in eBPF

SPRIGHT: extracting the server from serverless computing! high-performance eBPF-based event-driven, shared-memory processing

Programmable System Call Security with eBPF

Cross Container Attacks: The Bewildered eBPF on Clouds

Comparing Security in eBPF and WebAssembly

A flow-based IDS using Machine Learning in eBPF

Femto-containers: lightweight virtualization and fault isolation for small software functions on low-power IoT microcontrollers

背景

项目产出要求

需求分析

打包发布格式与存储格式

OCI 镜像

Docker 和 OCI

标准容器的五个原则

1. 标准操作

2. 不受内容限制

3. 不受基础设施限制

4. 自动化设计

5. 工业级交付

eBPF OCI

优缺点

案例

用户体验设计

eBPF 软件编写模板

包管理器

角色 1：普通用户/user

角色 2：通用 ebpf 数据文件发布者/ebpf developer

eBPF 程序存储和管理平台（网页前端）

页面内容（以 Docker 举例）

额外页面

系统设计方案：Serverless 架构

API 设计

Serverless SQL 数据库

进度规划

Challenges in Tracking HTTP, HTTP/2, and Other Seven-Layer Protocols

eBPF Socket Filter vs. Syscall Tracing: In-Depth Analysis and Comparison

eBPF Socket Filter

eBPF Syscall Tracing

Comparison of eBPF Socket Filter and Syscall Tracing

Capturing HTTP Traffic with eBPF Socket Filter

Potential Limitations

User-Space Code

Compilation and Execution

Capturing HTTP Traffic Using eBPF Syscall Tracepoints

Hook Locations and Flow

Complete Set of System Calls to Hook

Summary

提高结果的六种策略

`bpftime`: Userspace eBPF runtime for uprobe & syscall hook & plugin